Other attributes
The first practical Byzantine Fault Tolerance (PBFT) algorithm was introduced in a 1999 academic paper by Miguel Castro and Barbara Liskov. The pBFT algorithm is robust in order to survive Byzantine faults in asynchronous environments and is optimized to have high processing speeds / low latency such that it can be used in everyday applications.
Prior to Castro and Liskov's work, existing BFT algorithms assumed that networks were synchronous and also typically had slow processing speeds, making them impractical for real world applications. With industry and government becoming increasingly reliant on internet-based services (i.e. asynchronous systems) in the late 1990's, malicious attacks and software errors could have potentially serious consequences.
The Castro-Liskov pBFT algorithm itself is a form of state machine replication, meaning that a system state is copied and distributed across multiple servers. The benefit of replicating and distributing a state machine is that it makes the system more robust, because failures such as Byzantine faults occur in isolation on one server rather than causing the entire system to fail. In other words, a system that is not replicated is only as fault tolerant as the single centralized server being used, whereas a single server failure in a replicated system won't cause the entire system to fail.
How the algorithm works can be summarized in four steps:
- A client sends a request to invoke a service operation to the primary replica.
- The primary replica sends the request to all of the involved backup replicas at the same time.
- The replicas execute the request and send a reply to the client
- The client waits for f + 1 replies from different replicas with the same result, where f is the maximum number of replicas that may be faulty in the system.
Although it was a big step forward in cryptography, the pBFT algorithm still has some shortcomings:
- pBFT is very communication heavy, meaning that the network has significant computational overhead during consensus and isn't scalable past a certain number of nodes.
- pBFT can be attacked by an adversary using a simple scheduling mechanism that can either delay or halt the consensus process.
- pBFT chooses a primary state machine replica in round-robin fashion, which requires nodes to agree on a "membership list" from which the primary can be chosen. This makes participation in a pBFT system permissioned rather than permissionless, meaning that it lacks a key property for decentralization.
Byzantine Fault Tolerance is a necessary property for decentralized systems such as blockchain networks. Blockchains are distributed ledgers which are safer than centrally stored ledgers because the networks that host them are robust enough to continue to operate even if one or more servers in the network are maliciously attacked or otherwise fail arbitrarily. However, BFT is typically only a property of the consensus mechanism used in the majority of blockchains. For example, Bitcoin's proof-of-work consensus mechanism is Byzantine fault tolerant and builds off of the pBFT algorithm, but it does not use the pBFT algorithm itself because of the shortcomings mentioned above.
A blockchain project called Hyperledger actually uses the practical byzantine fault tolerance algorithm as one of its primary consensus mechanisms. Unlike Bitcoin and most cryptocurrencies, Hyperledger is a private blockchain which doesn't prioritize being permissionless or scalable to a large quantity of nodes, which makes using the practical Byzantine Fault Tolerance algorithm, well... practical.
Another project, IoTeX, uses an advanced version of pBFT called Scalable Practical Byzantine Fault Tolerance with Short-Lived Signature Schemes, which was described in an academic paper published in CASCON in 2018. The IoTeX algorithm aims to achieve greater scalablility by reducing the amount of computational overhead required.