I delivered this talk at SAP Inside Track Berlin in September 2018. The audience was mostly enterprise developers, almost all of them specifically working with SAP as a platform.
Given how many companies I’ve seen flirting with the idea of distributed ledgers, I thought it would be useful to give people an idea of which cases I see as being a good fit for them, to give them a leg up the next time it enters the discussion.
We’re going to be talking about distributed ledgers.
Distributed ledgers have gained momentum in the last few years. They are powerful tools with interesting properties of their own, but these properties come with trade-offs. They’re not just a Fancy New Database that you’re going to swap in place of your Postgres instance and get a whole bunch of new qualities for free.
So to help clear up those trade-offs, I’m going to:
Give a quick overview of what they are, in case anyone is not familiar with them;
Talk about what I see as the requirements for them to be worth it;
Give some counterpoints about trade-offs you are making.
Hopefully this talk will give you enough to figure out if they’re a fit for your project. That way you’ll be better prepared the next time some consultant comes along and insists “Oh, we should put that on a blockchain!”.
Who am I?
I’m the Technical Director at Samsung NEXT Europe. My own background is not in venture capital, actually, but in tech. I’m a software engineer, I’ve started a few companies myself, and I’ve done a fair amount of consulting for enterprises.
As you can imagine, I’ve encountered my fair share of gross misapplications of technology, where someone managed to convince an enterprise to use a tool without discussing the trade-offs. When Oliver asked me to give a talk, I figured I’d save you all some time.
Now, a warning!
First of all: caveat emptor.
I’m going to place a couple of big red asterisks right here.
First, there are many approaches to distributed ledgers. I’m glossing over a lot of specifics, given the time we have, to focus on the characteristics that all or most of these approaches share.
Feel free to grab me after if you have a specific use case in mind and want to discuss it in depth.
A colleague asked me if these criteria were generally accepted.
That’s the second asterisk. This talk is positively teeming with personal opinion. Some people will agree with most of it; others will add criteria of their own.
I’m sure that for some people, just a couple of the criteria I’m going to list will make it worth it, because these criteria can also have ideological implications. Think of how there’s a camp that feels all software should be GPL.
What are these things even?
At their most general, distributed ledgers are a type of distributed database. They’re meant to be synchronized through a consensus mechanism. A key characteristic is that, much like an accounting ledger, you only append data to them but never edit past records.
Boring example! Bitcoin is probably the one you’ve heard of the most. In this case, what the ledger keeps is a record of coin transactions: their creation and their transfers among different owners.
This is only one use case. In the last few years there have been many examples of applications for distributed ledgers, from keeping track of diamonds to someone tokenizing a wine harvest to raise money for their vineyard.
Instead of going over specific examples, let’s dive right into the general characteristics, and we can discuss some specific examples later if we have time.
When are they worth it?
There are six requirements that I’d say need to be present for a distributed ledger to be worth the overhead.
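The system needs to:
Be distributed across organizations
Have absence of trust
Require consensus
Benefit from disintermediation
Keep track of provenance
Be radically auditable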
If you don’t fulfill at least four out of six, and someone is selling you on a distributed ledger… you may need to have a stern conversation with them about their reasons for proposing it.
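To make that rule of thumb concrete, here’s a minimal sketch in Python that scores a system against the six requirements. The requirement names and the `ledger_verdict` function are my own shorthand for illustration, not any standard tool, and the threshold of four simply mirrors the rule above.

```python
# The six requirements from this talk, phrased as yes/no questions about your system.
REQUIREMENTS = (
    "organizationally distributed",
    "absence of trust",
    "needs consensus",
    "disintermediation",
    "provenance",
    "radical auditability",
)


def ledger_verdict(answers: dict, threshold: int = 4) -> str:
    """Count how many of the six requirements hold and apply the rule of thumb."""
    unknown = set(answers) - set(REQUIREMENTS)
    if unknown:
        raise ValueError(f"unknown requirements: {sorted(unknown)}")
    # Missing answers count as "no"; True/False sum as 1/0.
    met = sum(bool(answers.get(name, False)) for name in REQUIREMENTS)
    if met >= threshold:
        return f"{met}/6: a distributed ledger may be worth the overhead"
    return f"{met}/6: time for a stern conversation about why it was proposed"


print(ledger_verdict({"provenance": True, "radical auditability": True}))
# 2/6: time for a stern conversation about why it was proposed
```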
Let’s go over them in detail.
Being distributed is obviously a key aspect. We’re talking about distributed ledgers, after all.
Distributed is a bit of a loaded word, though. Let’s narrow it down.
A distributed system could be one that we want to scale horizontally. You know the drill.
Maybe you have too much data for a single server to store or handle.
Maybe you want to run faster queries.
Maybe you want some redundancy.
You’ll notice the key thing that all those scenarios share: you. There’s usually someone in charge. Whether it’s you personally, your team, or your company, we can point at someone who owns the system.
While the data might be distributed, system management is (by and large) centralized.
That’s not the type of distribution we’re talking about.
In case of distributed ledgers we’re talking about a system that is distributed organizationally.
So our first characteristic is that we’re talking about a system which is spread out not only across multiple data centers but multiple organizations or individuals.
Every one of these stakeholders has a copy of the data, can read from it, and can potentially write back to the ledger as a whole.
These stakeholders may have agreed up-front on who gets to participate or, in the case of public ledgers, the group might be open to all comers.
Absence of trust
Which gets us to the second key requirement. Absence of trust.
There are permissioned ledgers, where only some parties get to write or read, but by and large, the majority of uses out there are in cases where there is no trust among participants.
It should be obvious to anyone who has designed a centralized enterprise system that there is a certain amount of trust involved, even if limited.
If Bob is in your system, you know who he is and what he can do.
If Alice is an administrator, you trust her not to intentionally mangle your database.
You might have a transaction log that you can reconstruct the database from, you might have backups, but these are just recovery mechanisms.
There’s still a certain fundamental assumption that malicious events are rare, and that we can control them by placing trust in the right people and assigning rights accordingly.
Distributed ledgers, on the other hand, are useful for when you can’t make those assumptions.
We’ll elaborate on this a bit more when we look at the other requirements and counterpoints.
Now, given all these participants don’t trust each other, you’re going to need some sort of mechanism to agree on what ends up getting recorded.
Ledgers are append-only databases, as I mentioned earlier. A database exists to record the state of the world. Consensus is the mechanism through which participants agree on what said state of the world is.
In a centralized approach, where you have control of the users, there’s no need for consensus. As long as Alice has the right permission, the database lets her write to the store.
In a distributed ledger, it’s more … byzantine.
There are several consensus mechanisms, and we don’t have time to go into them. A good starting point I’d recommend would be looking into Practical Byzantine Fault Tolerance.
Effectively, all nodes involved end up voting, and there are defined criteria for what’s considered a safe level of agreement.
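To give a feel for what “a safe level of agreement” means, here’s a small sketch of the usual Byzantine quorum arithmetic. It assumes the standard bound from the BFT literature: n nodes can tolerate f = floor((n - 1) / 3) faulty ones, and a value is only accepted once enough nodes back it that any two such quorums must share at least one honest node. For n = 3f + 1 that works out to the familiar 2f + 1 votes. The function names are hypothetical, and this only counts votes; it’s nowhere near a full PBFT implementation, which also needs multiple message rounds, signatures, and view changes.

```python
from collections import Counter
from typing import Dict, Optional


def max_faulty(n: int) -> int:
    """Largest f with n >= 3f + 1: how many Byzantine nodes n nodes can tolerate."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest vote count such that any two quorums overlap in at least f + 1 nodes
    (so at least one honest node). Equals 2f + 1 when n == 3f + 1."""
    f = max_faulty(n)
    return (n + f + 2) // 2  # integer form of ceil((n + f + 1) / 2)


def decide(votes: Dict[str, str], n: int) -> Optional[str]:
    """Return the value backed by a quorum of the n nodes, or None if none is."""
    if not votes:
        return None
    value, count = Counter(votes.values()).most_common(1)[0]
    return value if count >= quorum_size(n) else None


for n in (4, 5, 7, 10):
    print(n, "nodes: tolerate", max_faulty(n), "faulty, need", quorum_size(n), "votes")
```

Note how five nodes need four votes, not three: adding a node without raising f still has to keep every pair of quorums overlapping in an honest node.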
If the problem you are trying to solve inherently can use a consensus mechanism for deciding what gets recorded, then distributed ledgers might be a good fit.
There’s another quality which drops out of these, which is disintermediation.
Let’s define this first.
What are intermediaries?
There are many forms of intermediaries.
Banks are a trivial example, even if somewhat trite. I don’t think I need to convince any of you that if in order for me to send you money, a transaction needs to pass two banks and a connecting system, all of these are intermediaries.
Validators can be intermediaries too. Suppose I’m selling you my car. You probably want to make sure I actually own it, for which you would need to go to a trusted third-party (a vehicle registry).
Once you decide to buy it, back where I come from you’d need to involve two distinct intermediaries: a lawyer to draw up the transfer and then the vehicle registry to record it.
(I understand in Germany it’s simpler).
These are organizational examples, but we can have technical intermediaries as well. They share a common trait with organizational intermediaries in that they are gatekeepers between you and the task you want to accomplish.
In fact, I’d generalize it and say that anything you can’t directly control or fork, anything that sits behind an impenetrable API wall, acts as an intermediary.
For a distributed ledger to be worth it, the system’s requirements need to almost abhor this sort of gatekeeping.
Back to our list…
If we combine the three previous characteristics, we have a system where:
Everyone involved has a copy,
Everyone can try to write to the shared state of the world,
Every node involved agrees on whether that write takes place or not.
In this scenario, well… we no longer need a central control mechanism acting as an intermediary for which actions take place.
Keep in mind, of course, that this has business implications. Being an intermediary between people and data is a profitable business. Make sure your stakeholders are OK with an approach that will cut them out of that role entirely.
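Here’s a toy sketch of that scenario, reusing the car sale from earlier. Every node keeps its own full copy of the ledger, checks a proposed transfer against that copy, and the transfer is appended everywhere only if a quorum approves, with no vehicle registry in the middle. `Node`, `propose`, and the transaction format are all hypothetical. The plain loop that collects votes stands in for what, in a real system, would be a signed, distributed message exchange, since a single vote counter would just be a new intermediary.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    honest: bool = True
    ledger: list = field(default_factory=list)  # every node holds its own full copy

    def owner_of(self, vin):
        """Replay this node's copy of the log; the latest transfer of a vehicle wins."""
        owner = None
        for entry in self.ledger:
            if entry["vin"] == vin:
                owner = entry["to"]
        return owner

    def vote(self, tx):
        if not self.honest:
            return True  # a faulty node approves anything
        # A first registration has no seller (None); otherwise the seller must own the car.
        return self.owner_of(tx["vin"]) == tx["from"]


def propose(nodes, tx):
    """Append tx to every copy if enough nodes approve it; no registry involved."""
    n = len(nodes)
    f = (n - 1) // 3                 # faulty nodes we can tolerate
    quorum = (n + f + 2) // 2        # ceil((n + f + 1) / 2), i.e. 2f + 1 for n = 4
    approvals = sum(node.vote(tx) for node in nodes)
    if approvals < quorum:
        return False
    for node in nodes:
        node.ledger.append(dict(tx))
    return True


nodes = [Node("a"), Node("b"), Node("c"), Node("mallory", honest=False)]
propose(nodes, {"vin": "V1", "from": None, "to": "ricardo"})       # registration
propose(nodes, {"vin": "V1", "from": "ricardo", "to": "oliver"})   # Ricardo sells to Oliver
print(propose(nodes, {"vin": "V1", "from": "ricardo", "to": "mallory"}))  # False
```

With four nodes, one faulty node can’t push a transfer through on its own: the honest nodes can all see in their own copies that Ricardo no longer owns the car.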
So far we have:
A system where the data can be out of our control,
Where we don’t trust anyone who may be writing to it,
So we’re going to need to design some mechanism to agree on the world’s state,
And which almost refuses to live behind an API wall.
If you throw those four requirements into a document, chances are that the first question is going to be: well, how do I know where the hell the data came from in the first place?
If so, good, because ledgers are ideal when you are required to record provenance.
In fact, I referred to them earlier as a database. That’s accurate, but there’s a description that’s even closer: ledgers are append-only logs.
That’s what you’re doing - you’re logging changes, which means you get a full history of where the current state came from.
Now, logs aren’t exactly known for their query-ability, are they? They record what happened, but not necessarily the current state. If we have a bunch of logs, chances are we’re going to need some sort of indexing system on top, and a summary of the current state of the world.
So logs are useful, but they carry overhead.
We’ll come back to this when we go over the counterpoints.
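A minimal sketch of that overhead, assuming an in-memory Python structure (`AppendOnlyLog` is a hypothetical name): the log itself is only ever appended to, which gives you provenance for free, but answering “what’s the current value?” quickly means maintaining a state summary and an index alongside it on every write.

```python
class AppendOnlyLog:
    """Hypothetical in-memory ledger: the log is never edited, only appended to."""

    def __init__(self):
        self._entries = []   # (author, key, value) tuples, oldest first
        self._state = {}     # summary of the current state of the world
        self._index = {}     # key -> positions of the entries that touched it

    def append(self, author, key, value):
        position = len(self._entries)
        self._entries.append((author, key, value))
        # The extra bookkeeping below is the overhead: it exists only to make reads fast.
        self._state[key] = value
        self._index.setdefault(key, []).append(position)
        return position

    def current(self, key):
        return self._state.get(key)

    def history(self, key):
        """Provenance: every (author, value) that ever touched key, oldest first."""
        return [(self._entries[i][0], self._entries[i][2]) for i in self._index.get(key, [])]

    def rebuild_state(self):
        """The summary is derivable from the log alone, by replaying it from the start."""
        state = {}
        for _author, key, value in self._entries:
            state[key] = value
        return state


log = AppendOnlyLog()
log.append("alice", "balance:bob", 10)
log.append("carol", "balance:bob", 7)
print(log.current("balance:bob"))   # 7
print(log.history("balance:bob"))   # [('alice', 10), ('carol', 7)]
```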
We’ve just added a provenance characteristic to our already extremely open and rabidly public database.
Something that almost drops out of that is that this system is going to be radically auditable.
That’s the last requirement on my list.
Anyone who builds systems - especially in a financial context - would agree that auditability is great. I’m sure your users consider this something fundamental, so why am I even listing it among the requirements you might not have?
Usually, I’ve found that businesses tend to be more selective about who gets to conduct an audit.
Here’s the thing… Remember we’re talking about a fully distributed system where participants may not trust each other.
This system will also be fully auditable by everyone involved, even those other parties you don’t necessarily trust.
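One common way to make a shared log tamper-evident for every auditor is to chain entries by hash, so each entry commits to everything before it. The sketch below assumes SHA-256 over canonical JSON; the field names and helpers are hypothetical. Note what it does and doesn’t give you: anyone with a copy can detect that a past record was edited, but agreeing on which chain is the authoritative one is still the consensus mechanism’s job.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so every participant hashes exactly the same bytes.
    payload = json.dumps({"prev": prev_hash, "record": record},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def audit(chain: list) -> bool:
    """Recompute every link; editing any past record breaks the chain from there on."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


chain = []
append(chain, {"from": "alice", "to": "bob", "amount": 5})
append(chain, {"from": "bob", "to": "carol", "amount": 2})
print(audit(chain))                  # True
chain[0]["record"]["amount"] = 500   # someone quietly rewrites history
print(audit(chain))                  # False
```

Anyone who remembers the latest hash can spot a rewritten history, even if they trust none of the other participants.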
Why is this important?
Now, ledgers have all these qualities, so why is it important to look at their flipside?
Well, I keep throwing the word out there… Because of overhead!
Proponents of any new tech usually tend to gloss over its costs. As a developer, you’re going to get a lot of descriptions which boil down to two things:
“tool X has infinite potential applications!”, and
“you get all these things for free!”.
Whenever you hear that about anything, put your skeptical face on.
Yes, one could twist a tool into a lot of shapes. I could use Postgres to store my documents instead of my file system, but guess which one is going to be easier to access?
Remember TANSTAAFL: There Ain’t No Such Thing As A Free Lunch.
Overhead can come from different places.
Some of it will be technical and will have an impact on your application’s performance.
Some of it will come from the architecture and will force design decisions on you.
Some of it will simply be cognitive. These are new tools, so learning how to use them right has a cost. You’re going to be better off if you learn these trade-offs in a project that the tool is well-suited for.
Your job, then, is to figure out the trade-offs and decide if the overhead pays for itself.
Counterpoints and trade-offs
One of the most obvious cases in which this might break down is if your system is organizationally centralized.
If you only care about distributed data, but all that data should be under the central control of a single organization, look elsewhere.
You may want to have your data horizontally distributed, but if so, you’re better off using something like Couchbase, DynamoDB, or anything else designed specifically for that purpose.
Now, suppose you do have trusted parties, or a situation where you need to bake some fine-grained levels of trust into your application’s design.
In that case, I expect that whatever user taxonomy you have, it’s going to be easier to bake that taxonomy into the system than to twist a consensus mechanism into that shape.
Remember that consensus has a cost, which might manifest as write latency or design constraints. You don’t want to pay a price unless it buys you something.
No need for consensus
Consensus basically means that a majority of nodes in the network need to agree on what the truth is.
If you have trust, then you don’t need consensus. What you have instead is a central Access Control List which you use to validate any user’s actions.
As long as your system can use it to check Alice’s permissions before she tries to edit Bob’s records, that’s all the consensus you’re going to want.
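For contrast, here’s roughly what “all the consensus you’re going to want” looks like when there is trust: a single lookup against a central permission table. This is a minimal sketch; the `ACL` table, `can`, and `edit_record` are hypothetical names, and a real system would keep the table in its database or directory service.

```python
# Hypothetical permission table kept by a trusted, central system.
ACL = {
    ("alice", "edit", "bob"),   # e.g. Alice manages Bob's account
    ("bob", "edit", "bob"),     # Bob may edit his own record
}


def can(user: str, action: str, record_owner: str) -> bool:
    """One lookup against the central table: no voting, no quorum."""
    return (user, action, record_owner) in ACL


def edit_record(records: dict, user: str, owner: str, new_value: str) -> None:
    if not can(user, "edit", owner):
        raise PermissionError(f"{user} may not edit {owner}'s record")
    records[owner] = new_value
```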
This is where I see a lot of cases break down.
A sample use that you’ve probably heard of for distributed ledgers is sovereign identity. It keeps popping up.
Now, there are two types of identity that we could talk about.
The first is your real-world, government-issued identity.
The second is the set of any online identities you may want to create to identify yourself with systems.
The second case is one that fully belongs online. Using public key cryptography, we don’t need any intermediaries to determine that I am the user Ricardo. I can prove that directly by being in control of a private key.
There’s no reason I should need Facebook Login or Google Auth.
There’s also no reason for me to keep any online badges or recommendations or achievements tied to a single network. We should be able to take our online personas wherever we go, without having to ask for an intermediary’s permission.
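To illustrate proving control of a private key without an intermediary, here’s a toy Schnorr-style challenge-response in pure Python: the service sends a fresh random challenge, the user signs it, and the service checks the signature against the public key it has on file. The parameters (arithmetic modulo the Mersenne prime 2^127 - 1 with base 3) and all function names are assumptions chosen so the example runs with only the standard library; they are far too weak for real use, where you’d reach for an audited library and a standard scheme such as Ed25519.

```python
import hashlib
import secrets

# Toy parameters so this runs with the standard library only. NOT secure.
P = 2**127 - 1   # a Mersenne prime
G = 3
ORDER = P - 1    # by Fermat's little theorem, exponents can be reduced mod P - 1


def keygen():
    secret = secrets.randbelow(ORDER - 1) + 1   # private key, never leaves the user
    return secret, pow(G, secret, P)            # (private, public)


def _challenge(commitment: int, public: int, message: bytes) -> int:
    data = f"{commitment}|{public}|".encode() + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER


def sign(secret: int, message: bytes):
    nonce = secrets.randbelow(ORDER - 1) + 1    # must be fresh and random for every signature
    commitment = pow(G, nonce, P)
    e = _challenge(commitment, pow(G, secret, P), message)
    return commitment, (nonce + secret * e) % ORDER


def verify(public: int, message: bytes, signature) -> bool:
    commitment, response = signature
    e = _challenge(commitment, public, message)
    # For a signature made with the matching secret: G^response == G^nonce * (G^secret)^e.
    return pow(G, response, P) == (commitment * pow(public, e, P)) % P


# Login without an identity provider:
secret, public = keygen()                  # the service only ever stores `public`
challenge = secrets.token_bytes(32)        # sent by the service
signature = sign(secret, challenge)        # computed by the user
print(verify(public, challenge, signature))  # True
```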
Real-world identity, on the other hand, currently depends entirely on trusted intermediaries. Whether I use an ID card or a passport, anyone getting a copy of that has no way to trust the document itself. They only trust that they can go to a third party and ask it to validate the information.
Once there’s an external point of trust, and that point of trust is anyone else but the person making the claim, a distributed ledger becomes less relevant.
If you are building such a system, you have alternatives. You could just build it on a good old relational database and slap an API in front of it, for example.
And if your system can live entirely behind an API wall, and every user would be just as happy… then you’re going to have a harder time making the case for a ledger.
Other ways to determine provenance
There might also be other, more convenient ways to determine provenance.
Let’s go back to the personal ID example.
Yes, you could have the ID registrar vouch for you on a blockchain, and then you just point everyone to that cryptographic assertion.
That assumes that both the office validating your identity and those checking said assertion are tech-savvy enough, and that matters, because actual use will always trump tech.
If they are, though, do you still need a ledger for provenance when there’s a trusted intermediary? There are simpler solutions.
For instance, the ID office could just give you a PGP-signed statement, which anybody who wants to check could validate against the registrar’s public key.
We have also been tracking provenance for ages the old way - we simply keep a log of who altered what record.
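For comparison, a sketch of the old way using Python’s built-in `sqlite3`: every change to a record writes an audit row in the same transaction, noting who made it. The table layout and function names are hypothetical. Note the built-in assumption: whoever administers this database could also edit the audit table, which is exactly the kind of trust a ledger is designed to do without.

```python
import sqlite3


def open_registry() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE vehicles (vin TEXT PRIMARY KEY, owner TEXT NOT NULL);
        CREATE TABLE vehicle_audit (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            vin TEXT NOT NULL,
            old_owner TEXT,
            new_owner TEXT NOT NULL,
            changed_by TEXT NOT NULL,
            changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
    """)
    return conn


def set_owner(conn: sqlite3.Connection, vin: str, new_owner: str, changed_by: str) -> None:
    """Change the record and log who changed it, atomically."""
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute("SELECT owner FROM vehicles WHERE vin = ?", (vin,)).fetchone()
        old_owner = row[0] if row else None
        if row:
            conn.execute("UPDATE vehicles SET owner = ? WHERE vin = ?", (new_owner, vin))
        else:
            conn.execute("INSERT INTO vehicles (vin, owner) VALUES (?, ?)", (vin, new_owner))
        conn.execute(
            "INSERT INTO vehicle_audit (vin, old_owner, new_owner, changed_by) VALUES (?, ?, ?, ?)",
            (vin, old_owner, new_owner, changed_by),
        )


def history(conn: sqlite3.Connection, vin: str) -> list:
    return conn.execute(
        "SELECT old_owner, new_owner, changed_by FROM vehicle_audit WHERE vin = ? ORDER BY id",
        (vin,),
    ).fetchall()


conn = open_registry()
set_owner(conn, "V1", "ricardo", "clerk_7")
set_owner(conn, "V1", "oliver", "clerk_2")
print(history(conn, "V1"))  # [(None, 'ricardo', 'clerk_7'), ('ricardo', 'oliver', 'clerk_2')]
```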
If provenance is the only part of distributed ledgers that you care about, or there are other ways of obtaining it, reconsider.
You want to control sharing
As a counterpoint to their auditability: If you need to have fine-grained control of what information gets shared and replicated… look elsewhere.
If you’re choosing a distributed ledger as your data storage, start by working from the assumption that everyone can see all the data.
We’re effectively talking about the equivalent of giving everyone involved a copy of the entire database.
Depending on the system, that “everyone” might be five large institutions or the entire world. It’s also possible that the actual data logged is encrypted.
The fact still is that anyone with access gets to read every record. As I hope everyone knows, metadata can be very revealing.
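A small sketch of why encryption doesn’t solve this: even if every payload is unreadable, anyone holding a copy can still mine the cleartext metadata. The entries below are invented for illustration, and `os.urandom` bytes stand in for real ciphertext.

```python
import os
from collections import Counter


def opaque_entry(sender: str, receiver: str, day: str) -> dict:
    # os.urandom stands in for an encrypted payload that nobody here can read.
    return {"from": sender, "to": receiver, "day": day, "payload": os.urandom(32)}


# Invented example data: only the payloads are hidden.
LEDGER = [
    opaque_entry("acme", "supplier_x", "2018-09-03"),
    opaque_entry("acme", "supplier_x", "2018-09-10"),
    opaque_entry("acme", "law_firm_y", "2018-09-11"),
    opaque_entry("acme", "supplier_z", "2018-09-12"),
    opaque_entry("acme", "supplier_z", "2018-09-12"),
    opaque_entry("acme", "supplier_z", "2018-09-13"),
]


def relationships(entries) -> Counter:
    """Who transacts with whom, and how often, from cleartext metadata alone."""
    return Counter((e["from"], e["to"]) for e in entries)


def activity_by_day(entries, party: str) -> Counter:
    return Counter(e["day"] for e in entries if party in (e["from"], e["to"]))


print(relationships(LEDGER).most_common(1))  # [(('acme', 'supplier_z'), 3)]
```

Without decrypting anything, an observer can see which supplier is gaining business and when a law firm suddenly enters the picture.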
If you currently have to build several of these features yourself (ideally all six, but even four or five could be enough), then distributed ledgers are a great choice. You can let them do the heavy lifting in those areas and focus on the core of your system.
If you are not, and you just think they’d be nice to have, well…