Merkle trees (also known as hash trees) are a means of efficiently and securely managing/storing large amounts of data. The idea was created and patented by Ralph C. Merkle in 1979, with the patent expiring in 2002.
In a merkle tree, every non-leaf node is labelled with the hash of the labels of its child nodes (or the hash of the value of a leaf at the bottom of the tree), a process which repeats itself moving up the tree until a single hash remains: the merkle root.
In the figure above, the leaves are the 8 boxes at the bottom which have no children branching out beneath of them. A cryptographic hash function processes each leaf value to produce a hash output. This is then paired with another hash and run through the cryptographic hash function again to produce yet another hash, and so on moving up the tree, until only the merkle root (a[3,0]) remains. If a value of a leaf or non-leaf hash changes, the change impacts the hashes of every parent node in its branch all the way up to the top of tree, making it immediately obvious that the change has occurred simply by looking at the merkle root.
Merkle trees are useful in cryptography for several reasons, some of which include:
- Providing a means of data integrity/validity.
- Condensing large amounts of data so that less memory/disk space is needed to store it and it's easier / faster to transmit across a network.
- Proofs are computationally easy and fast.
In the Bitcoin whitepaper, Satoshi Nakamoto describes how merkle trees are used in Bitcoin in order to compact transaction data so that it's more feasible to store the entire Bitcoin transaction history on a relatively small amount of disk space.
Every signed Bitcoin transaction is hashed and has an output tied to the transaction identifier (TXID). Outputs are categorized as either Unspent Transaction Outputs (UTXOs) or spent transaction outputs. This makes it possible to quickly and easily validate future transactions, as validators just need to check that only UTXOs are used as inputs, implying that there isn't an attempted double-spend.
All of this transaction information is part of a merkle tree, and the root hash of that merkle tree is contained in the blockchain. Each block references the hash of the previous block, all the way back to the genesis block of the blockchain. If somebody attempts to change something about the data from any single transaction from the past, it will cause the root hash of that block to change, which subsequently changes the hashes of all of the blocks that followed it. This is a way to ensure the integrity of the blockchain's transaction history without needing to store the complete data of every transaction.