Antony Lewis is a bitcoin and blockchain consultant and blogger, who previously served as the director of business development at bitcoin exchange itBit.
In this article, Lewis attempts to break down some of the most common misconceptions circulating among institutions seeking to adapt distributed blockchain technology for alternative uses.
There are good reasons and bad reasons to use blockchains.
In conversations with people considering blockchain use cases, I have noticed common confusions arising from certain words. The issue is that these words were initially used in a narrow context (usually to describe bitcoin’s blockchain) and are now being interpreted more generically for other blockchains, in cases where they may no longer apply.
In this post, I hope to untangle some of these common misconceptions.
Writing data
Bitcoin has specific security features for writing data due to the burden of proof-of-work consensus. That is, in order to add blocks of transactions to the blockchain, you have to validate all the transactions within the block (easy) and then perform repeated calculations (called hashing) to find a magic number, known as a nonce, that makes your block valid and acceptable to the other participants according to the rules of the network (easy, but computationally expensive, therefore energy intensive, therefore costly). This proof-of-work burden, combined with the longest chain rule, makes it expensive to mine your own subversive chain.
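The hashing loop described above can be sketched in a few lines. This is a simplified illustration, not bitcoin's actual block format. It assumes a block is just a string of transaction data plus a nonce, and it uses a single SHA-256 (bitcoin uses double SHA-256 over an 80-byte header). Difficulty is expressed as a hypothetical number of leading zero hex digits. The asymmetry to notice is that finding a valid nonce takes many hashes, while checking one takes a single hash.

```python
import hashlib


def block_hash(block_data: str, nonce: int) -> str:
    """Hash the block contents together with a candidate nonce (single SHA-256, simplified)."""
    payload = f"{block_data}|{nonce}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mine(block_data: str, difficulty: int) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zero hex digits.

    Each extra zero digit multiplies the expected work by 16, which is what makes
    producing valid blocks (and therefore a rival chain) expensive.
    """
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = block_hash(block_data, nonce)
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1


def is_valid(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking someone else's work takes a single hash, so verification is cheap."""
    return block_hash(block_data, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    nonce, digest = mine("alice->bob:2BTC", difficulty=4)
    print(nonce, digest)
    print(is_valid("alice->bob:2BTC", nonce, difficulty=4))  # True
```

Rewriting history would mean redoing this search for every later block, faster than the honest network keeps extending the chain.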
Private blockchains on the other hand, with known block validators, may have other mechanisms replacing proof-of-work that limit the ability of others to subvert the chain.
These rules can specify that blocks need to be signed by one of a limited, known list of signatories. If those entities take turns to write blocks in round-robin fashion, that alone can be enough to discourage or limit unilateral bad behavior.
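A minimal sketch of this kind of rule, assuming a hypothetical network of three named validators that must take turns in a fixed order. To keep the example self-contained with Python's standard library, an HMAC with a per-validator secret stands in for a real digital signature. A production chain would use public-key signatures, so that anyone can verify a block without holding the signer's secret.

```python
import hashlib
import hmac

# Hypothetical known validators and their secrets (HMAC keys stand in for signing keys).
VALIDATOR_KEYS = {
    "bank_a": b"secret-a",
    "bank_b": b"secret-b",
    "bank_c": b"secret-c",
}
# Fixed round-robin order in which validators take turns to write blocks.
ROTATION = ["bank_a", "bank_b", "bank_c"]


def sign_block(validator: str, height: int, payload: str) -> str:
    """Produce the stand-in 'signature' for a block at a given height."""
    message = f"{height}|{payload}".encode("utf-8")
    return hmac.new(VALIDATOR_KEYS[validator], message, hashlib.sha256).hexdigest()


def expected_signer(height: int) -> str:
    """Block N must be written by the validator whose turn it is."""
    return ROTATION[height % len(ROTATION)]


def accept_block(height: int, payload: str, signer: str, signature: str) -> bool:
    """Accept a block only if it comes from a known validator, in turn, with a valid signature."""
    if signer not in VALIDATOR_KEYS:
        return False  # unknown party: not on the list of signatories
    if signer != expected_signer(height):
        return False  # out of turn: no validator can write every block unilaterally
    expected = sign_block(signer, height, payload)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    print(accept_block(1, "tx-batch-1", "bank_b", sign_block("bank_b", 1, "tx-batch-1")))  # True
    print(accept_block(2, "tx-batch-2", "bank_b", sign_block("bank_b", 2, "tx-batch-2")))  # False: bank_c's turn
```

The rotation does not stop a validator from misbehaving in its own slot. It does mean that a single bad actor controls only one block in three, and every one of its blocks is attributable to it.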
Reading data
Bitcoin, and blockchains in general, have no inherent security against read access. Indeed, blockchains are mechanisms for copying data to all relevant participants – this is what consensus is all about.
If you think you have cybersecurity headaches controlling read-access to one central database, then multiply that by the number of nodes in your blockchain to get the new attack surface area of your blockchain.
You can control read-access to some degree by encrypting certain elements on your blockchain and handing out the keys to the relevant participants. But consider the threat of industrial espionage, where keys are sold to a rival organization that also runs a node – now the rival can read your data without even penetrating your system, because the blockchain is copying the data right into his data centre!
There may be solutions here involving key rotation, but historical data, already copied to every node and encrypted under the old keys, also needs to be considered. The value of a trusted third party running a central database is that they can control access to the data more finely. They also provide a single entity to litigate against if they expose private data or breach their contractual obligations.
Denial of service
Blockchains are more resilient than centralised systems against denial-of-service attacks, due to their peer-to-peer, multi-redundant nature. If one node is taken offline, the others keep working.
Users connected to the disabled node will be unable to connect, unless there is a mechanism in place for them to find other nodes to fall back to.
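The fallback mechanism can be as simple as a client that holds a list of known nodes and tries each one in turn. The sketch below assumes hypothetical node addresses and a caller-supplied `send` function that raises `ConnectionError` when a node is unreachable. It does no real networking.

```python
from typing import Callable, Sequence


def submit_with_fallback(
    nodes: Sequence[str],
    send: Callable[[str, bytes], str],
    tx: bytes,
) -> tuple[str, str]:
    """Try each known node in order and return (node, response) from the first that answers.

    If every node is unreachable, raise a ConnectionError listing the failures.
    """
    failures = []
    for node in nodes:
        try:
            return node, send(node, tx)
        except ConnectionError as exc:
            failures.append(f"{node}: {exc}")  # node is down; fall back to the next one
    raise ConnectionError("all nodes unreachable: " + "; ".join(failures))


if __name__ == "__main__":
    offline = {"node1.example"}  # pretend this node has been knocked offline

    def fake_send(node: str, tx: bytes) -> str:
        if node in offline:
            raise ConnectionError("timed out")
        return f"accepted {tx.hex()}"

    print(submit_with_fallback(["node1.example", "node2.example"], fake_send, b"\x01"))
    # ('node2.example', 'accepted 01')
```

A client with only one node address has no such fallback. For that user, the network's redundancy does not help.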
There can be confusion between the cryptographic methods used in bitcoin (hashing, digital signatures) and data on blockchains being encrypted (stored as ciphertext).
This can lead people to think that data on a blockchain is by default encrypted.
In fact, data on blockchains is by default unencrypted, especially data that needs to be validated by the nodes. In bitcoin, transaction data is not encrypted, as you can see by looking at any transaction in bitcoin’s blockchain. (For a deeper explanation of the specific elements in a bitcoin transaction, see here).
The most apparent problem with encrypting data on a blockchain is that the encrypted data can’t be validated, because nodes need to know what they are validating.
For example, if I am validating the legitimacy of your payment of 2 BTC from your wallet, I need to know the contents of your wallet (ie your previous inbound transactions) and the fact that you are trying to spend 2 BTC (and which ones).
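Here is a toy validity check in the spirit of bitcoin's unspent-output model. It makes several simplifying assumptions: amounts are integer satoshis, outputs are identified by hypothetical `txid:index` strings, and there are no scripts or signatures. Even so, it shows that the validating node has to read the plaintext amounts and the outputs being spent. If those fields were encrypted, it could not run any of these checks.

```python
# Unspent outputs known to this node: "txid:index" -> (owner, amount in satoshis).
# Identifiers and owners are hypothetical.
UTXO_SET = {
    "a1:0": ("alice", 150_000_000),  # 1.5 BTC
    "b2:1": ("alice", 100_000_000),  # 1.0 BTC
}


def validate_spend(utxos: dict, spender: str, inputs: list[str], outputs: list[int]) -> bool:
    """Check that each input exists, is unspent, belongs to the spender, and covers the outputs."""
    if len(set(inputs)) != len(inputs):
        return False  # the same output cannot be spent twice in one transaction
    total_in = 0
    for ref in inputs:
        if ref not in utxos:
            return False  # unknown or already-spent output
        owner, amount = utxos[ref]
        if owner != spender:
            return False  # in real bitcoin this is a signature/script check
        total_in += amount
    # Outputs must be positive and cannot exceed inputs; any difference is the miner's fee.
    return all(v > 0 for v in outputs) and sum(outputs) <= total_in


if __name__ == "__main__":
    # Alice pays 2 BTC and sends 0.4999 BTC back to herself as change (0.0001 BTC fee).
    print(validate_spend(UTXO_SET, "alice", ["a1:0", "b2:1"], [200_000_000, 49_990_000]))  # True
    print(validate_spend(UTXO_SET, "alice", ["a1:0"], [200_000_000]))  # False: 1.5 BTC < 2 BTC
```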
In a private chain, if all validating nodes can decrypt your data by having decryption keys, then you need to consider why you are encrypting it in the first place.
There are solutions emerging from primary cryptographic research that allow for facts to be proven about data without knowing the underlying data itself, known as zero-knowledge proofs, but this technology is not currently mature.
If privacy is important, then consider what needs to be encrypted: All data at rest? Data in motion? The whole database? Data within specific database fields? And who will be able to decrypt it and when? How will permissions be granted? Can permissions be revoked? What happens if a third party gets a decryption key through a rogue staff member? What happens if a legitimate user loses a decryption key?
Key management is a crucial part of data security – even more so when the data is freely shared between (usually) competitors in an industry, and it needs to be carefully considered in a blockchain solution.
Many existing centralised solutions already do an excellent job of allowing access to data, with carefully controlled read and write access, and also a layer of accountability on the central owner of the data who can react to either moral imperatives or legal directives.
Facebook, for instance, is quite accessible globally, and it can take down hate speech or copyrighted material.
Blockchains can make access control to data more complex, and immutability is not without its downsides. In many potential use cases, nodes are run by separate entities or groups (if they’re not, then consider why you’re using a blockchain in the first place), and each entity controls and manages its own access control to the data.
There may be challenges around managing access control across all entities that have a copy of the blockchain data.
This narrative seems to have come from bitcoin’s whitepaper, which describes bitcoin’s purpose as allowing people to send digital cash from person to person without a specific financial intermediary.
If you count the miner adding the block as an intermediary who collects fees and rewards for his work, then there are intermediaries in bitcoin. But, the point is that they are not specific (one miner can substitute for another), and you are not beholden to a specific miner for your transactions to work or not.
For many private blockchains currently being described in industry, there are middlemen – these are the participants running the nodes, or the technology vendors clipping tickets to monetize their blockchain solutions.
I have occasionally heard ideas where users need to store blockchain data on their phones (especially for use cases where users should own their own data).
Beware the mobile phone blockchain, as it implies that the phone will be constantly chatting to the rest of the network, downloading and uploading other people’s data non-stop to remain in consensus.
In bitcoin, every transaction is indeed captured on the blockchain, because old transactions need to be tracked in order to determine the validity of new ones.
It is also the case that a bitcoin transaction only “happens” or settles if it is broadcast to the bitcoin network and is accepted into a block. Each event in bitcoin is a necessary event to build up the picture of the state of the ledger.
This does not mean that if you throw a blockchain at a random problem, you will automatically and accurately capture every single event.
Events need to be input by someone or something and then broadcast and accepted for them to be recorded.
Data on a blockchain doesn’t imply accuracy – events need to be recorded accurately in the first place. This is even more important when the record may be immutable.
This is a confusion around the use of the word “true”.
In bitcoin, “true” means that the network has agreed that a transaction has taken place, and nodes are in agreement or consensus that this has happened.
The concept of “truth” as applied to blockchains doesn’t extend to other meanings of “true”. If a heart-monitoring piece of hardware becomes faulty and records incorrect heart-rate readings onto a blockchain, do the readings become truth? Clearly not.
On a registry of car ownership, a blockchain may immutably record that a car has changed owner. If this transaction was made in error, or fraudulently because the owner’s phone was hacked, what is the state of the truth? If the police find the transaction to be fraudulent and it needs to be ‘unwound’, how will that be done, given the cryptographic security of digital signatures? (There are solutions, but they need to be thought through.)
In the case of blockchains, truth just means “what was originally recorded and agreed as valid by the majority of the nodes”.
Valid doesn’t necessarily mean true. Don’t confuse blockchain truth with “The Truth”. For a trivial but concrete example of an immutable lie on multiple levels, see here.
This is prevalent in the blockchains-for-KYC and blockchains-for-document-storage space.
Comments such as ‘This is stored on the blockchain’ can cause confusion when a hash of a document (PDF, JPEG, etc) is published to a blockchain. A hash is not an encrypted version of an original file; when a hash is stored, you can’t retrieve the original by decrypting the hash. The hash is a fingerprint of the data. If it is stored on a blockchain, someone who has kept an exact copy of that data (off chain) can prove that that specific data existed at the timestamp when the hash was stored on the blockchain.
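The distinction can be shown with Python's standard `hashlib`. In this sketch, the "blockchain" is just a hypothetical in-memory list of (timestamp, hash) records. Only the SHA-256 digest is published, and the document itself stays off chain. Anyone holding an exact copy can later check it against the recorded digest, but nothing here can recover the document from the digest.

```python
import hashlib
import time


def fingerprint(document: bytes) -> str:
    """SHA-256 digest of the document: a fixed-size fingerprint, not an encryption."""
    return hashlib.sha256(document).hexdigest()


# Hypothetical stand-in for on-chain records: (timestamp, document hash).
ledger: list[tuple[float, str]] = []


def anchor(document: bytes) -> str:
    """Publish only the hash, together with the time it was recorded."""
    digest = fingerprint(document)
    ledger.append((time.time(), digest))
    return digest


def proven_at(document: bytes):
    """Return the timestamp at which this exact document's hash was recorded, or None."""
    digest = fingerprint(document)
    for timestamp, recorded in ledger:
        if recorded == digest:
            return timestamp
    return None


if __name__ == "__main__":
    original = b"Example contract text, version 1"
    anchor(original)
    print(proven_at(original) is not None)         # True: exact copy matches
    print(proven_at(original + b" ") is not None)  # False: a one-byte change gives a different hash
```

If the off-chain copy is lost, the recorded hash proves nothing about its contents. The proof only works for whoever still holds the exact original bytes.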
While you can store whole documents on blockchains (after all a blockchain is just a database coupled with software that validates and shares new entries to the other participants), passing large chunks of data around at speed can create its own set of problems.
There can be confusion when the word “participants” is used.
Generally speaking, there are three main types of participants in blockchains:
It may be best to always spell out exactly which participants are being referred to.
Blockchains are great when multiple parties need to read the same information, but for whatever reason there can’t be or shouldn’t be any specific individual party in control of that data.
Gideon Greenspan has written a great article about avoiding the pointless blockchain project, and later described some genuine use cases in a follow-up post.
Go for it!
The only way the technology will improve is by people trying it and adapting it to fit problems better.
Try to understand and be aware of the limitations and complexities early and be careful about over-fitting a trendy technical solution to a problem.
This article originally appeared on Lewis’s blog ‘Bits on Blocks’, and has been republished here with his permission.