So you see, you have a cat video, you want to share it with a friend. What do you do? You press on the share button, you get a link and you share it with your friend. Those actually summarize those two principles of the centralized web. The cat video is posted by someone, someone posted it on Youtube. Because it is inconvenient to host your own server, he uploaded it, because, well, maintaining a server is a lot of work. A lot of people are lazy, so they did the same, which actually summarizes the first step: those resources, those videos, are concentrated on one entity, which is Youtube. Another concept is that the link address is „youtube.com/catVideo“ – this is called host-based-addressing, which just means that the address of the cat video is based on where it is hosted, in this case youtube.com. So, what do you need to do to move the video from youtube.com to instagram.com? You have to upload it, again. Which is actually a no-brainer, this is what we all do. You have your video on Youtube, you have your video on Instagram – but what if I told you that you can upload your video to the internet without hosting it on Youtube or Instagram and you only have one link that you need? Welcome to decentralized web magic, by Nico Reindl. I’m a Full Stack Developer, I’m fascinated with decentralization, so that’s why I give this talk today.
What do we need? The core is all about hash functions. What is a hash function? A hash function takes an input and produces a fixed-length output, typically 32 bytes. We take this cat, we hash it, then you have the hash-cat, which looks like this. So… what? Well, you now have something called content-based addressing. Also sounds complicated, first I talked about host-based addressing – the address was based on the host, dependent on the host on which it was hosted, which was Youtube – now we have content-based addressing, which means the address is based on the content. In those hash functions, when you take one input, you always get the same output. When you take another input, you get a different output. You never get the same output from two different inputs, which means those hashes are unique. You can use them as addresses. It does not matter where it is stored.
So, there is you. You are in a peer-to-peer network, which means everyone can participate anonymously, on every computer, wherever. So, the person can take that video, he can split up the video as four parts and then distribute these across the network. In a peer-to-peer network, all those peers are anonymous and untrusted. So in this peer-to-peer network with this unique address, you can gather all four parts. But, well… can we trust strangers on the internet? How do we establish trust in the first place now? This is where Merkle Trees come into play: Merkle trees allow data verification across peers. So, what is a Merkle tree anyway? A Merkle tree is in essence a binary tree, except of the values, there are hashes stored. Hashes of what? In the ground layer you have the leave nodes. Those are those data parts and the hashes of these data parts is each one leave node. Every two leave nodes share one parent node, which is basically those two hashes concatenated and then hashed again. Rinse and repeat and you get something like a tree structure.
To clarify that, let’s build one for ourselves! You have these four parts of the video, everything gets put through a hash function. Now you have those leave nodes, you take two leave nodes each, first you hash them, concatenate them, hash them again. That’s these purple dots for the parent nodes. You do the same for the third and the fourth part. Now you take those parent nodes and hash them again – there you get the root node. The cool thing about the root node is that it is everything hashed together. And as mentioned before, when you change the input even slightly, the output is completely different. So when you change one of the leave nodes, the root node changes. How do you compare those two Merkle trees if they are the same? Correct, you just take their root nodes and compare them, if they are identical, you can be sure that they’re absolutely the same. How do we use it to download files? You get that root node from a trusted party, let’s say Bob, your friend, sends you the root node. You trust Bob. Then you go into the network and ask who has files associated with that one root node. The network responds and you get all those data chunks. With all those data chunks, you can’t be sure if they’re really correct. It’s the internet, they’re all strangers, you can’t trust anyone. What you then do is to reconstruct the Merkle tree: take all these sets you got, hash them again, get your leave nodes and reconstruct the Merkle tree. As I said before, the only thing you need to compare is the root hash with your root hash. You take the root hash of Bob and compare it with the one you got. If they’re the same, you can be 100% sure, they’re exactly the files you needed to download. You can be happy now, cool, you got your four parts of the video, put them together, get your cat video. But you might ask: „But Nico, why the tree? Why?“ You could just take all the parts together and hash them. Well, you are correct, you could do that and it would work. Let’s say you have 1000 files and you only need 2 of them. Well you have to download all of them, to hash them in order to validate them to be correct. The thing is, if you only need 2 files out of 1000, that becomes a drag. Trees allow you do partial validation, which is awesome, because you only need your root node of your friend, the relevant parent nodes to reconstruct the root node, and your requested files. If you put all that in a tree you see that you only need the parent node, your root node and those two files you request to download. And as you see, you don’t need to download these other files to reconstruct the root hash to validate it in the end. Awesome!
Some examples of projects that use the Merkle tree are the Interplanetary File System, Ethereum – a cryptocurrency –, uTorrent, Bitcoin – everybody knows nowadays what Bitcoin is. In Bitcoin for example you have the blockchain and in each block there is a transaction stored, which are these Tx0 blocks. This is something where these Merkle trees are used for example in Bitcoin.
To summarize, you have content-based addressing, which enables you consistent links no matter where the content is stored and you also have efficient verification with the root hashes. You can make a trustless distribution of datasets across a network. These are the two essences of the decentralized web magic.