When Blockchains Go Down: Why Crypto Outages Are on the Rise

23 September 2018

Berniesanders (not to be confused with former presidential candidate and Vermont Senator Bernie Sanders) is an institution on blockchain-based blogging platform Steemit.

Steemit allows content creators to earn crypto – at least, the crypto native to the Steem blockchain, of which there are three kinds – for popular posts. While recent successes include waffle recipes, romantic fiction and crypto punditry, berniesanders gets a pretty steady paycheck (about $30 at a time) for his single-sentence, self-described “shit posts.”

A recent sampling: “Are you having fun? I’m having fun.” ($60), “I’m on a boat!” ($31), “Show me your shoes.” ($30) and “How many comments can a shit post get?” ($263 and 319 comments).

But for a few hours on September 17, the Steemit community was deprived of berniesanders’ wisdom.

On that day, Steemit became unavailable when Steem suffered an outage and stopped adding new blocks. The blockchain and the apps on top of it had gone dark.

Steem’s outage, the company explained, was related to an upcoming hard fork update. Some nodes were running the code for the fork in advance, and when certain safeguards failed, these nodes split off onto an incompatible chain. In effect, they hard forked the network early, leaving the network unable to come to consensus on new blocks.

“The blockchain was the piece that was halted in this case,” Ned Scott, the founder and CEO of Steemit, told CoinDesk. “But it caused a ripple effect, a domino effect on all the apps built on top.”

For the Steem blockchain, that’s 400 applications, according to Scott.

And several of those applications likely had confused, worried and sometimes angry users wondering why they couldn’t interact with their favorite blockchain-based tools. Case in point, once the Steemit network began functioning normally again, berniesanders returned with a post tagged “testingshitsteem,” “amateurshitdevs” and “deadchain.”

That’s perhaps a bit harsh.

Other users weren’t quite so critical. A Steemit user going by “alphasteem” (she of the waffle recipes) said:

“I guess that’s the way things work with new technology.”

The only problem is, that’s not how things are supposed to work with this particular piece of new technology. One of the most frequently cited advantages of blockchain networks is that they suffer zero downtime – or close to it.

For example, there’s a website dedicated to tracking bitcoin’s uptime since its launch in January 2009: 99.992559576 percent, at the time of writing. And the Ethereum Foundation describes the network’s applications as running “exactly as programmed without any possibility of downtime, censorship, fraud or third-party interference.”
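To put that percentage in perspective, here is a rough back-of-the-envelope sketch of the downtime it implies. It assumes the tracker counts from bitcoin's genesis block on 3 January 2009 up to this article's date; the site's exact method may differ, and the function name is purely illustrative.

```python
from datetime import datetime

def implied_downtime_hours(uptime_percent, start, end):
    """Hours of downtime implied by an uptime percentage over [start, end]."""
    total_hours = (end - start).total_seconds() / 3600
    return total_hours * (1 - uptime_percent / 100)

# Assumed window: genesis block date to the date of this article.
launch = datetime(2009, 1, 3)
as_of = datetime(2018, 9, 23)

hours = implied_downtime_hours(99.992559576, launch, as_of)
print(f"Implied downtime: about {hours:.1f} hours")
```

Under those assumptions, the figure works out to a matter of hours across nearly a decade, not days.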

In recent months, though, major blockchain networks have seen downtime, and the trend has some people wondering, WTF?

More outages

The incident on the Steem network is not the only recent example of a blockchain going down (in fact, it’s not the only time Steem has gone down in recent months).

In March, Neo’s blockchain was temporarily halted. This can happen, the project’s senior research and development manager Malcolm Lerider initially explained, “when a consensus node gets disconnected during the consensus.”

In response to pointed criticism – to the effect that, if just one of only seven consensus nodes on the Neo network can pause the chain by going offline, Neo is highly vulnerable – Lerider stepped that response back a bit. He said Neo could handle the loss of a consensus node, and that the circumstances leading to the incident were more complicated.
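Lerider's point can be illustrated with the classical Byzantine fault tolerance bound, under which a network of n consensus nodes tolerates f faulty or offline nodes as long as n is at least 3f + 1. The sketch below applies that textbook bound to seven nodes; whether Neo's dBFT implementation behaved this way during the incident is an assumption, not something established here.

```python
def max_faulty_nodes(n):
    """Largest f satisfying the classical BFT bound n >= 3f + 1."""
    return (n - 1) // 3

def quorum_size(n):
    """Number of nodes that must still agree for the chain to advance: n - f."""
    return n - max_faulty_nodes(n)

n = 7  # consensus nodes on the Neo network, per the criticism above
print(f"{n} nodes: tolerates {max_faulty_nodes(n)} failures, "
      f"needs {quorum_size(n)} in agreement")
```

On paper, then, a seven-node network should survive the loss of one or even two consensus nodes, which is consistent with Lerider's remark that the circumstances were more complicated.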

A few months later, the EOS blockchain also saw the production of new blocks halted for nearly five hours.

According to Thomas Cox, who at that time was the vice president of product at Block.One, the company behind the EOS protocol (he’s since left the company), the deferred transactions weren’t being checked correctly, which led to a “weird state” and “prevented further blocks from being created.”

This incident occurred just a couple of days after the EOS network went live in June.

Federated and delegated

These examples raise the question of why, nearly a decade into the existence of blockchains, the promise of zero downtime is starting to show cracks.

The answer may have to do with the emergence of new ways of achieving consensus: the process by which all the participants in a blockchain system come to agreement on the state of the network.

In bitcoin, ethereum and other proof-of-work (PoW) systems, the way consensus is achieved makes it extremely unlikely that a network will come to a halt – even if a high number of nodes drop off.

Speaking to this, Riccardo Spagni, project lead at monero (a proof-of-work cryptocurrency), told CoinDesk:

“PoW can handle things like the network partitioning and coming back together after some time. It’s incredibly robust.”

In contrast, a newer method – versions of which Neo, EOS and Steem all employ – designates a certain set of specialized nodes to determine the state of the network. Rather than “mining,” these nodes come to agreement through quicker and less energy-intensive processes, enabling faster and cheaper transactions than bitcoin or ethereum.

These systems are broadly known as federated or delegated protocols, with more specific labels applying based on the exact cryptographic methods involved: delegated Byzantine Fault Tolerance (dBFT) for Neo and delegated proof-of-stake (DPoS) for EOS and Steem.

Neo’s Lerider disputed the idea that federated blockchains are more susceptible to downtime in general. “Different consensus algorithms may be used in a federated chain,” he told CoinDesk, and “to know which ones that have potential to go down,” it’s necessary to look at the specific implementation.

Broadly, though, delegated consensus has brought something new to cryptocurrency: the potential to scale enough to accommodate use cases that only centralized providers were previously able to handle. For instance, Steem and EOS can support millions of transactions per day, according to the website Block’tivity.

Yet, at the same time, these new protocols have reintroduced a foible of centralized providers to the world of blockchain: downtime. When key nodes in a federated system go down or fall out of sync, the entire network can grind to a halt.

Availability or consistency?

That’s not to say these systems are necessarily inferior to traditional proof-of-work, however.

There is an important tradeoff at work, according to Eric Wall, blockchain and cryptocurrency lead at the Swedish fintech firm Cinnober.

“All distributed systems are fundamentally limited by the CAP theorem,” he told CoinDesk.

According to this theorem, which is often cited in discussions of blockchain networks, a given system can only optimize for two of three characteristics: consistency, availability and partition tolerance (hence the acronym “CAP”).

In reality, though, the range of choices is narrower. Partition tolerance – the ability to run a blockchain over a network that loses some messages, as the internet does – is “non-negotiable,” said Wall. So engineers can either favor availability, as in bitcoin and ethereum, or favor consistency, as in EOS, Steem and Neo.

Wall described what these options look like in practical terms, saying, “Many federated systems will simply halt in contingency situations, often requiring manual intervention to start running again. Bitcoin, on the other hand, will typically not halt, but instead bitcoin forks into two blockchains for a short period of time a couple of times a month.”

In other words, from the user’s perspective, the bitcoin network may never go down, but there’s no guarantee that a user hasn’t found themselves on a fork that will eventually be abandoned in favor of a canonical chain.

Most of the time, Wall continued, bitcoin’s lack of consistency isn’t a big deal. The network “does have eventual consistency,” he said, “which comes from the fact that the forks resolve themselves automatically after a short while.”

He added, “So while Bitcoin is not a true CAP system, it’s practically as good as one.”
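The tradeoff Wall describes can be sketched as a toy simulation of a network split in two. This is a deliberately simplified model, not how any of these blockchains is actually implemented: the function names, the two-thirds quorum rule and the assumption that each side's block output is proportional to its share of nodes are all illustrative.

```python
def run_partition(sides, rounds, policy):
    """Blocks each side of a partitioned network adds during the split.

    sides  -- node count on each side of the partition
    rounds -- how many block intervals the partition lasts
    policy -- "AP": every side keeps producing, in proportion to its share
              "CP": a side produces only if it holds more than 2/3 of all nodes
    """
    total = sum(sides)
    produced = []
    for size in sides:
        if policy == "AP":
            produced.append(rounds * size // total)
        elif 3 * size > 2 * total:
            produced.append(rounds)
        else:
            produced.append(0)  # no quorum: this side halts
    return produced

def heal(produced):
    """When the partition heals, the longest branch wins; the rest is discarded."""
    kept = max(produced)
    return kept, sum(produced) - kept

for policy in ("AP", "CP"):
    branches = run_partition((4, 3), rounds=14, policy=policy)
    kept, discarded = heal(branches)
    print(policy, "branches:", branches, "kept:", kept, "discarded:", discarded)
```

In this model the availability-favoring network never stops, but the blocks on the shorter branch are thrown away once the split heals. The consistency-favoring network discards nothing, at the price of halting whenever neither side holds a quorum.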

Then again, certain incidents have shown that favoring availability over consistency can get blockchains into trouble. Steemit’s Scott pointed to an incident in March 2013, when bitcoin forked in what Vitalik Buterin – then a journalist – called “one of the most serious hiccups that we have seen in the past four years.”

Echoing that, Wall suggested that such incidents may be an argument for consistency-favoring “CP” systems over availability-favoring “AP” ones:

“Two conflicting forks are a much bigger danger to the network than a single halted one.”

Showing off scars

What might seem notable here, though, is that bitcoin hasn’t suffered a similar incident since 2013, while younger networks continue to experience “hiccups.”

“The reason why these bugs have been more prevalent in federated systems than in PoW-based systems recently boils down to the fact that the Bitcoin codebase is more battle-tested, more stringently vetted and of superior quality than its federated counterparts,” Wall said.

Indeed, when the oldest DPoS blockchain, Bitshares, launched in 2015, bitcoin had already been live for more than six years.

But the younger networks might well catch up. “Steem is now a very battle-hardened blockchain,” Scott said following the recent outage.

“I don’t look back and say there weren’t bumps in the road,” he continued. “I look at those bumps and bruises as testament to our strength and resilience and our drive for innovation.”

Steem still intends to go ahead with the planned hard fork update – its 20th – on September 25.

It is also notable that, grizzled veteran though it may be, bitcoin narrowly avoided terrible consequences from a severe bug discovered this week, which could potentially have taken down large swathes of the network for a relatively low cost.

Speaking to this, Zooko Wilcox, founder and CEO of the Zcash company (zcash, like bitcoin, is a proof-of-work cryptocurrency), told CoinDesk that at the end of the day, no network is perfectly safe.

He concluded:

“There is a risk of software failures taking down any software system, including any blockchain such as Bitcoin, Ethereum or Zcash.”

Light image by Artur Matosyan on Unsplash