This is post III, which continues post I (introduction) and post II (leader elections) by diving into the open questions and trade-offs surrounding the consensus protocol for Starknet. The discussion is organized into four sections:
- Dual ledgers for safety and liveness
- Slashing equivocators
- Responsiveness
- Streamlining the critical path to consensus
We have tried to make the tone as informal as possible, citing technical references for the interested reader.
The next post will dive into thoughts and questions revolving around proofs in the protocol.
Dual ledgers for safety and liveness
In this section we outline the dual-ledger design for consensus protocols used by Ethereum and then discuss its applicability to Starknet.
Partial synchrony and CAP
Recall (from post I) the partially synchronous network model, which provides a convenient middle ground between synchrony and asynchrony by handling network faults of unknown but bounded duration. For instance, this model captures the idea of power outages of finite duration. Unfortunately, it also raises a seemingly insurmountable obstacle known as the CAP theorem: during asynchrony, no protocol can satisfy both safety and liveness. Asynchrony can cause network partitions, which provide a simple illustration of the underlying idea. Suppose the network is entirely honest and partitioned into parts A,B. If the protocol is live, then A,B progress independently, contradicting safety. If the protocol is safe, then A,B cannot progress independently, contradicting liveness.
Dual ledgers: circumventing CAP with checkpoints
As remarked above, the CAP theorem precludes a single consensus protocol from being both safe and live under asynchrony. This obstruction can be circumvented by cleverly coupling a live protocol with a safe protocol. Write \Pi_\text{safe},\Pi_\text{live} for safe and live protocols respectively. The output of \Pi_\text{live} is called the live ledger. The idea is to create a feedback loop:
- First, the live ledger is fed as the input to \Pi_\text{safe}. The output of the composition is a checkpointed live ledger.
- Next, the checkpointed live ledger is fed back into \Pi_\text{live}. This ensures the checkpointed live ledger is always a prefix of the live ledger.
- Repeat.
During asynchrony, the safe protocol \Pi_\text{safe} stalls. In this case, the safe prefix of the live ledger lags behind while the live ledger itself continues advancing. For more information, see e.g. this paper and this paper.
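To make the feedback loop concrete, here is a minimal Python sketch (the data structure and names are ours, purely for illustration): the live ledger grows whenever \Pi_\text{live} makes progress, while the checkpoint only advances when \Pi_\text{safe} can run; during asynchrony the checkpoint stalls and the checkpointed prefix lags behind.

```python
# Minimal illustration of the dual-ledger feedback loop (hypothetical names).
from dataclasses import dataclass, field

@dataclass
class DualLedger:
    live: list = field(default_factory=list)   # output of Pi_live
    checkpoint: int = 0                        # length of the checkpointed prefix

    def extend_live(self, block: str) -> None:
        """Pi_live keeps producing blocks, even during asynchrony."""
        self.live.append(block)

    def try_checkpoint(self, network_synchronous: bool) -> None:
        """Pi_safe checkpoints the current live ledger, but stalls under asynchrony."""
        if network_synchronous:
            # The checkpointed ledger is always a prefix of the live ledger.
            self.checkpoint = len(self.live)

    def checkpointed_prefix(self) -> list:
        return self.live[: self.checkpoint]

ledger = DualLedger()
for height in range(5):
    ledger.extend_live(f"block_{height}")
    # Suppose the network is asynchronous from height 2 onward: checkpoints stall.
    ledger.try_checkpoint(network_synchronous=height < 2)

print(ledger.live)                   # 5 blocks: liveness is preserved
print(ledger.checkpointed_prefix())  # 2 blocks: the safe prefix lags behind
```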
Ethereum has been using such a dual-ledger design for its consensus protocol since the introduction of Casper – a finality gadget in the form of a safe consensus protocol (essentially chained Tendermint).
Checkpoints for Starknet
In this subsection we will examine checkpoints in the context of a dual-ledger consensus protocol for Starknet. Two things to keep in mind:
- The consensus protocol will be purely PoS.
- Stake will be managed by an L1 contract.
L1 checkpoints
Under the partially synchronous model, designing the L2 consensus protocol of a validity rollup inevitably faces the same CAP limitations. The dual ledger design outlined above delivers the best of both worlds. But what about L1?
By design, security of validity rollups eventually maxes out at L1 security (upon L1 state updates). To approach this level of confidence with low latency, it is natural to consider L1 checkpoints, sent via Ethereum transactions. We see two main trade-offs involved:
- Cost vs confidence – interaction with L1 is expensive; on the other hand, L1 is safe and visible by assumption. Furthermore, L1 can enforce logic of all kinds, e.g. slashing rules.
- Cost vs latency – low latency requires increased frequency, which translates to increased costs.
To mitigate the cost factor, L1 checkpoints can be restricted to commitments certified by a majority of stake. In this case they will not provide any data availability about the state of Starknet, but only evidence of L2 consensus (via quorum certificate). For latency, we are fine with several minutes.
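As a rough illustration, here is a Python sketch (not actual contract code; the names are hypothetical) of the kind of check an L1 contract could perform before accepting a checkpoint: the attached quorum certificate must carry signatures from validators controlling a majority of the registered stake. The exact threshold (simple majority vs. ⅔, say) is a design parameter.

```python
# Illustrative sketch of L1 checkpoint acceptance via quorum certificate.
# All names, and the simple-majority threshold, are assumptions for the example.
from dataclasses import dataclass

@dataclass(frozen=True)
class QuorumCertificate:
    commitment: bytes    # commitment to an L2 block/state; no data availability
    signers: frozenset   # validator identifiers that signed the commitment

def checkpoint_is_valid(qc: QuorumCertificate, stake: dict) -> bool:
    """Accept the checkpoint iff signers control a majority of registered stake."""
    total = sum(stake.values())
    signed = sum(stake.get(v, 0) for v in qc.signers)
    return 2 * signed > total

stake = {"alice": 40, "bob": 35, "carol": 25}
qc = QuorumCertificate(commitment=b"\x12" * 32, signers=frozenset({"alice", "bob"}))
print(checkpoint_is_valid(qc, stake))  # True: 75 of 100 stake signed
```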
Here are some more questions of interest regarding L1 checkpoints.
- What is verified upon submission of L1 checkpoints? This is a cost/confidence trade-off, since L1 verification of the quorum certificate consumes L1 gas.
- To what extent can there be conflicting checkpoints? At one extreme is a single checkpoint chain. At the other extreme lies a checkpoint tree. Many intermediate alternatives are possible.
- Checkpoint durability? Should newer and/or larger quorum certificates be allowed to override existing ones, or are there rigid expiration dates?
Digression: checkpoints for fast finality
Starknet L1 state updates must follow the L2 consensus. Hence it is natural to couple proofs with checkpoints. Moreover, since the planned latency of Starknet L1 finality is in the realm of several hours, checkpoints can be used to design fast finality mechanisms. Our thoughts on such mechanisms are outlined separately in an upcoming post, which defines some desirable properties and suggests ways of attaining them.
Slashing equivocators
We have previously decided the Starknet consensus protocol will be leader-based with PoS Sybil resistance. This makes it vulnerable to the “nothing at stake” problem in the form of equivocation. Specifically, a leader can equivocate by producing conflicting blocks during its turn at negligible cost (compared to PoW). If a dual-ledger design is used, checkpoint equivocation is also possible.
Ethereum defends against equivocation with slashing rules. However, slashing instances must be part of the ledger in order to take effect, which means they can be censored by a ⅓ minority of stake. By maintaining the slashing logic on L1, Starknet will be able to defend against equivocation even by a malicious majority of stake. Note that even without any slashing, Starknet’s L1 state updates act as unforkable checkpoints, restricting the scope of backward simulation attacks in which alternative histories are suddenly presented.
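For concreteness, the core of any such slashing rule is an equivocation check: two validly signed blocks (or checkpoints) from the same leader for the same height are self-contained evidence that could be submitted to the L1 slashing logic. A minimal Python sketch, with hypothetical names:

```python
# Sketch of an equivocation proof: two signed headers from the same leader
# for the same height with different contents. Names are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class SignedHeader:
    height: int
    block_hash: bytes
    leader: str
    signature: bytes  # assumed to be verified against `leader` elsewhere

def is_equivocation(a: SignedHeader, b: SignedHeader) -> bool:
    """Evidence justifying a slash: same leader, same height, conflicting blocks."""
    return a.leader == b.leader and a.height == b.height and a.block_hash != b.block_hash

h1 = SignedHeader(7, b"\xaa" * 32, "leader_7", b"sig1")
h2 = SignedHeader(7, b"\xbb" * 32, "leader_7", b"sig2")
print(is_equivocation(h1, h2))  # True: submit both headers as slashing evidence
```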
Responsiveness
In our leader-based setting it would be nice for a leader to drive consensus at the pace of the actual network delay δ as opposed to the maximal network delay Δ, since this would decrease latency and increase throughput. Roughly speaking, responsiveness means that during synchrony, transaction confirmation time depends only on the actual network delay δ and not on any a priori upper bound Δ. For example, the Bitcoin protocol is non-responsive because its block difficulty is an explicit function of a pre-specified upper bound Δ. Consequently, transaction confirmation time suffers from the looseness in the estimate Δ.
Pass and Shi distill a formal definition of responsiveness and also establish an unfortunate trade-off between performance and security: responsive secure consensus protocols can only tolerate <⅓ corruptions, even during synchrony. The argument resembles the proof of CAP and we outline it for three nodes A,B,M with M malicious. Essentially, M partitions the network by presenting contradicting world views to A,B. Specifically:
- M causes communication A↔B to have maximal latency Δ while maintaining communications A↔M and B↔M at a network delay δ≪Δ.
- M behaves honestly in each of the communications A↔M and B↔M, but sends conflicting information to A,B.
- Consequently A thinks it and M are online and B is offline, and therefore perceives communication A↔M as an honest execution of the protocol at ⅓ failures. Similarly with A,B switched.
- Assuming the protocol is responsive at ⅓ failures, A,B perform conflicting updates to their ledger, breaking consistency.
In light of the above argument, the best we can hope for is optimistic responsiveness: transaction confirmation time depends only on the current maximum latency causable by malicious agents. Perhaps the first partially synchronous protocol which is both safe and optimistically responsive is a leader-based BFT protocol called HotStuff. As shown in the paper, the responsiveness property achieved in this leader-based context means honest leaders can drive the consensus forward at actual network delay δ. A refinement of HotStuff is used in production by Aptos. However, since HotStuff is not live during asynchrony, it cannot serve as \Pi_\text{live} in a dual-ledger design. We are not aware of any battle-tested PoS protocols that are optimistically responsive and live under asynchrony. Here’s a nice technical reference.
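The following toy sketch illustrates the difference (our own simplification, not a model of HotStuff itself): a non-responsive rule waits out the conservative timeout Δ before advancing, whereas an optimistically responsive rule advances as soon as a quorum of votes has arrived, i.e. after roughly the actual delay δ when enough validators are honest and online.

```python
# Toy comparison of a Delta-timeout rule vs. a quorum-driven (responsive) rule.
# Purely illustrative; the numbers and names are placeholders.

DELTA = 10.0   # pessimistic a-priori bound on network delay
delta = 0.2    # actual network delay during synchrony

def advance_time_non_responsive() -> float:
    """Advance only after the full conservative timeout Delta."""
    return DELTA

def advance_time_responsive(vote_arrival_times: list, quorum: int) -> float:
    """Advance as soon as a quorum of votes has arrived (optimistic case)."""
    return sorted(vote_arrival_times)[quorum - 1]

# 4 validators, quorum of 3; honest votes arrive at roughly the actual delay delta.
votes = [delta, delta * 1.1, delta * 1.3, DELTA]  # one validator is slow/faulty
print(advance_time_non_responsive())             # 10.0 : latency governed by Delta
print(advance_time_responsive(votes, quorum=3))  # ~0.26 : latency governed by delta
```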
We suspect responsiveness properties will play an important part in achieving lower latency and higher throughput. Motivated by this suspicion we are thinking of an optimistically responsive protocol suitable to produce Starknet’s live ledger.
Streamlining the critical path to consensus
The critical path to consensus is the sequence of operations that must be performed between the completion of the previous execution of the consensus protocol and the completion of the current one. For example, the leader’s critical path may include block-building and computing a commitment to the state obtained by executing the block. Every “obstacle” in the critical path to consensus reduces throughput. What can be taken out? Are the benefits worthwhile for Starknet?
State commitments
Every Ethereum block contains a commitment to the state following its execution. Hence the calculation of state commitments is in the critical path to consensus: it must be completed before the block is propagated. An obvious way to streamline the critical path for Starknet is to exclude any state commitments from the block body. A less extreme approach involves delayed computation, e.g. the body of block n+k must contain a commitment to the state following execution of block n. In this case, assuming some predictability, leader n+k may compute the commitment in advance or at least in parallel with building its own block. The price of the latter optimization is increased latency for non-executing nodes who wish to verify storage proofs with respect to the state at the tip of the chain, since they must now wait k blocks.
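A sketch of the delayed-commitment variant (our illustration; the lag k, the field names, and the stand-in hash are placeholders): the block at height n+k carries a commitment to the state obtained after executing block n, so its leader can compute that value ahead of, or in parallel with, building its own block.

```python
# Illustration of delayed state commitments: block n+k commits to the state
# after executing block n. Field names and the hash are placeholders.
import hashlib
from dataclasses import dataclass

K = 2  # commitment lag, a protocol parameter

@dataclass(frozen=True)
class Block:
    height: int
    txs: tuple
    delayed_state_commitment: bytes  # commitment to state after block (height - K)

def state_commitment_after(height: int) -> bytes:
    """Stand-in for executing the chain up to `height` and committing to the state."""
    return hashlib.sha256(f"state_after_{height}".encode()).digest()

def build_block(height: int, txs: tuple) -> Block:
    # The leader only needs the (already known) state after block height - K,
    # so this value can be computed in advance, off the critical path.
    commitment = state_commitment_after(height - K) if height >= K else b"\x00" * 32
    return Block(height, txs, commitment)

print(build_block(5, ("tx1", "tx2")).delayed_state_commitment.hex())
```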
DAGs: decoupling consensus from block proposal
“Monolithic” leader-based BFT consensus protocols like Tendermint and HotStuff place both block building and block propagation in the critical path to consensus: the leader propagates its block during its turn. The key idea behind DAG-based protocols is to abstract the block proposal layer away from the consensus layer and give it a life of its own. Consensus then becomes an overlay that “merely” orders proposed blocks. Apart from streamlining the critical path, this decoupling further benefits throughput by enabling many agents to simultaneously propose blocks for ordering. The DAG (directed acyclic graph) vertices are blocks, while edges are causality relations: a block may refer to several blocks, each reference signifying chronological precedence.
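To fix terminology, here is a minimal Python sketch of such a block DAG (a toy structure of our own, not Narwhal): each vertex is a block, its edges are references to previously seen blocks, and the consensus overlay only has to choose a total order consistent with these causality edges.

```python
# Toy block DAG: vertices are blocks, edges are references to earlier blocks.
# Any valid total order must respect the causality (reference) relation.
from dataclasses import dataclass

@dataclass(frozen=True)
class DagBlock:
    name: str
    parents: tuple = ()  # references to previously proposed blocks

def causal_order(tips: list) -> list:
    """Topological order of the DAG reachable from `tips` (ancestors first)."""
    order, seen = [], set()
    def visit(block):
        if block.name in seen:
            return
        seen.add(block.name)
        for parent in block.parents:
            visit(parent)
        order.append(block.name)
    for tip in tips:
        visit(tip)
    return order

genesis = DagBlock("genesis")
a = DagBlock("A", (genesis,))
b = DagBlock("B", (genesis,))  # proposed concurrently with A
c = DagBlock("C", (a, b))      # references both, merging the two branches
print(causal_order([c]))       # ['genesis', 'A', 'B', 'C']
```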
In the PoS setting, Narwhal is a DAG-based PoS mempool protocol serving as a block proposal layer. Bullshark is an elegant partially synchronous PoS consensus overlay for Narwhal. Both are used in production by Sui.
In the PoW setting, DAG KNIGHT is a beautiful DAG-based generalization of Nakamoto consensus used in production by Kaspa. Among several lucid insights, the paper observes the essential independence of the block proposal layer from any upper-bound on network delay Δ. DAG KNIGHT makes full use of this property of the DAG to achieve optimistic responsiveness.
Wen DAG-based Starknet?
Summary
Having described some of our thoughts regarding the consensus protocol for Starknet, we encourage the community to participate in the discussion, offer thoughts and feedback, and help design the Starknet decentralized protocol.
In upcoming posts we will share concrete proposals for a decentralized protocol that addresses some of the open questions presented in the series of posts.