Blockchains and the Question of Data Reliance

11 August 2016

Kesem Frank is a former blockchain specialist at Deloitte and COO and co-founder of nuco, a blockchain startup aiming to transform digital infrastructure.

In this opinion piece, Frank explores how blockchains could come to play the role of a trusted digital arbiter, and why this may be essential to global consumer protection as the Internet of Things comes of age.


I’m not going to shock anyone by saying we’re all living in the age of big data.

Data analytics and data-driven decision-making are the gold standard in an ever-expanding range of domains. From running organizations to deciding on medical treatment to dictating athletic training routines, data has become central to more and more aspects of life.

With the hope I haven’t bored you too much by restating what most of us take for granted, I’d like to ask a naïve question: how do we know we can actually rely on all of this data?

To clarify, I’m not at all questioning the use of data to optimize decisions. On the contrary, I think that’s exactly what we should all be doing. My question also doesn’t relate to the technical aspects of maintaining data. (Just Google terms such as “data integrity”, “data loss” or “data corruption”, and you’ll see there are countless standards and best practices already dealing with those.)

My question is much more fundamental.

Say we have a dataset relevant to a decision we face, and there’s nothing technically wrong with it. What indication do we have that we should actually rely on its contents?

Establishing trust

If you think about it, you quickly arrive at two compelling reasons to rely on data: vertical integration and a trusted third party.

Let me explain what I mean by way of two pretty simplistic examples.

Vertical integration covers any scenario in which we own the entire value chain that produces the dataset. Think of your Fitbit (or any similar activity tracker): you own the sensor generating the data, and you’re right there when the data is generated.

At the end of every day, you get a readout of your activity, so you might say something like, “I was moving around more today, so it makes sense I have 3,000 steps today versus my regular 2,000”. The fact that you own every stage in creating your activity dataset makes it pretty easy to accept the validity of such data.

Trusted third parties encompass any scenario where we trust the party generating or tracking the data, and so we accept the data as valid.

When I want to know how many visits my website is getting, I check Google Analytics. Because I trust Google (notwithstanding the question of whether I should), I accept that data as valid.

Gray areas

These are really simple examples, but the question of data reliance is absolutely huge in scope.

The scenarios I’ve illustrated above suggest that unless we own the data creation process, we are forced to trust other parties before we should ever rely on their data. The complexity of this situation becomes very clear when you think of the growing reliance on IoT-generated data.

In 2007, the I-35W Mississippi Bridge collapsed, tragically killing 13 people. The bridge has since been rebuilt and now incorporates over 500 sensors monitoring the bridge for strain, load distribution, vibrations, temperature, etc. Seemingly, this solves the problem. If there are ever any concerning indications, the sensors will tip us off in advance so we can dispatch maintenance crews and prevent disaster.

However, for this to work, we need to fully rely on the sensors’ data, and the question at issue is: why wouldn’t we?

Consider a scenario in which the sensors are malfunctioning and are constantly sending indications that the bridge is fine. For simplicity’s sake, let’s assume the sensors generate two types of data: a “Good” message meaning nothing is wrong, and a “Bad” message meaning we should dispatch maintenance crews.

Let’s also assume there is no “vertical integration” of data, i.e., the IoT sensors are owned by Company A while the maintenance crews are part of Company B.

If the bridge collapses even though the sensor sent “Good” messages, Company A is in hot water – its sensors did not deliver on their critical role.

However, if the sensors had sent “Bad” messages but no maintenance crews were dispatched, Company B is all of a sudden at fault. This creates a tangible incentive for Company A to “cheat” by altering the dataset to show that the sensors were alerting to a problem, but Company B disregarded them.

I’ve used this grievous example to illustrate how serious this question is, but the scenario I’ve described could just as easily have been applied to your TV, washing machine or any other IoT-enabled device.

Blockchain solutions

According to major market trends and projections, IoT-enabled devices will pretty soon be everything we own. I’d also like to note that the question of data reliance is not an academic one at all, nor can we afford the luxury of tagging it as a “let’s worry about this in the future” type of matter.

Law enforcement agencies around the world are already making extensive use of cellular data to place suspects at a crime scene.

The queried data can serve as supporting evidence, either further implicating a suspect or (sometimes) providing an alibi. This is exactly what a network engineer, convicted of murdering his wife, tried to exploit when he used his access to networking equipment to plant fake phone calls from his wife to himself after she was already dead.

I’d suggest that the question of data reliance is an absolutely critical one we should all be asking all the time.

As it happens, my position allows me to go beyond posing the question and actually propose a solution. To me, it is pretty apparent that we already have an architecture that allows us to decide what is true and what isn’t, and achieve consensus among multiple parties, without having to own all the data ourselves or to blindly trust others.

That architecture, of course, is blockchain technology, and I would argue that integrating it into many of our existing systems is absolutely critical for all of us to be able to take data reliance to the next level.
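To make the bridge scenario a little more concrete, here is a minimal sketch of the one building block that makes altering a dataset detectable: a hash chain, in which every record commits to the record before it. This is an illustration under simplifying assumptions, not a real blockchain. There is no network, no consensus and no digital signatures, and the function names (record_hash, append_message, first_invalid_index) are made up for this example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(index, message, prev_hash):
    """Hash a record's contents together with the previous record's hash."""
    payload = json.dumps(
        {"index": index, "message": message, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_message(chain, message):
    """Append a sensor message, linking it to the record before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append(
        {
            "index": index,
            "message": message,
            "prev_hash": prev_hash,
            "hash": record_hash(index, message, prev_hash),
        }
    )


def first_invalid_index(chain):
    """Return the index of the first record that fails verification, or None."""
    prev_hash = GENESIS_HASH
    for i, record in enumerate(chain):
        expected = record_hash(i, record["message"], prev_hash)
        if (
            record["index"] != i
            or record["prev_hash"] != prev_hash
            or record["hash"] != expected
        ):
            return i
        prev_hash = record["hash"]
    return None


# Company A's sensors report "Good" three times.
log = []
for msg in ["Good", "Good", "Good"]:
    append_message(log, msg)

# Company B keeps its own copy of the log (hypothetical setup).
company_b_copy = [dict(record) for record in log]

# After the fact, Company A edits its log to claim a "Bad" message was sent.
log[1]["message"] = "Bad"
print(first_invalid_index(log))             # 1: the edited record no longer matches its hash
print(first_invalid_index(company_b_copy))  # None: B's untouched copy still verifies

# Even if A rebuilds the whole chain so that it verifies on its own,
# the final hash no longer matches the one B holds.
forged = []
for msg in ["Good", "Bad", "Good"]:
    append_message(forged, msg)
print(forged[-1]["hash"] == company_b_copy[-1]["hash"])  # False
```

Note that the hash chain alone only helps because Company B holds its own copy. Replicating the log across parties that don't trust each other, and having them agree on what gets appended, is the part a blockchain adds on top.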

In particular, I believe that to unlock much more meaningful value out of data, a blockchain-based solution is required, with the following domains at the top of my list:

  • Data analytics
  • Insurance claims
  • Record management
  • Regulatory compliance.

In conclusion, I’ll restate that data reliance will become an increasingly critical issue, growing in importance as data becomes more and more central to our lives.

We have an urgent need to solve the issues raised by the question of data reliance, requiring better answers than either whole ownership of data or the assumption that another party is “probably trustworthy”.

Blockchain technology will provide an effective and highly applicable way to address this question, and will ultimately constitute a commonplace standard for data reliance.

