A few months ago we shared a preliminary version of our thoughts about the decentralization of StarkNet. Since then we have absorbed feedback, consulted with experts, and expanded our thoughts to previously “blackboxed” areas such as leader election. The time is ripe for another round of feedback.
Our thought process about StarkNet’s decentralized protocol consists of three parts, to be presented in a series of three posts.
- Overview and tight requirements that reflect the tradeoffs in question.
- A concrete suggestion for leader elections.
- Our thoughts surrounding the remainder of the protocol, namely consensus and proofs.
We highly recommend starting by reading this post for an outline of the architecture, and the problems specific to StarkNet.
The (ongoing) planning for StarkNet’s decentralization has been challenging and we have received valuable help. Thanks to Bartek Kiepuszewski and Joachim Neu for illuminating discussions on consensus and high level architecture.
In a nutshell, StarkNet’s decentralized scheme will use PoS Sybil resistance and feature two sets of actors: block-proposers (often called sequencers) and provers who produce proofs of many blocks at a time. At a high level, the protocol answers three questions about each set:
- Who are the members?
- How are tasks assigned to members?
- What actions comprise each task?
In this section we will discuss the first two questions.
The fundamental obstruction to consensus is honest forking – when honest agents diverge from an agreed-upon state. Protocols widely vary in their treatment of honest forks. For example, bitcoin allows them and is content with arguing they are resolved with high probability as time passes. On the other hand, Tendermint prevents any honest forks while operating within its security assumptions. A sound consensus protocol must resolve each possibility of honest forking, and providing such an analysis is challenging but necessary.
The root of honest forks is conflicting information among agents. Such conflicts are caused either by a faulty network or by malicious agents. Some consensus protocols reduce conflict possibilities by limiting the set of agents who can propose blocks at each time slot. Specifically, they assign a leader to each slot in a publicly verifiable way, allowing honest agents to disregard block proposals by non-leaders.
In some protocols, honest agents can disagree on the leader for a given slot, resulting in honest forks. This happens in bitcoin: honest agents A,B may receive conflicting children of the same block from distinct authors, and accept them if the PoW is valid. On the other hand, Tendermint requires consensus on the upcoming leadership schedule. The distinction of whether the protocol involves consensus on future leaders is worthwhile, because a global leadership schedule simplifies protocol analysis: conflicting block proposals in a given slot are caused either by a faulty network or by a malicious leader.
For the above reason we have opted for a leader-based consensus protocol with global scheduling (the network agrees in advance). Our prioritized requirements are detailed below, following the requirements from the consensus protocol.
Having decided on PoS, StarkNet’s block proposers will be stakers, and their tasks for each slot will be assigned by the leader election protocol.
Remark. Recent leader-based BFT protocols (Bullshark and Fin) use DAGs to increase throughput by decoupling leadership from data propagation. Specifically, they allow all stakers to propose batches of transactions which are then ordered by the leader to form blocks (composed of batches). The symmetry of these protocols with respect to batchers also makes the DAG live during asynchrony and inactivity. We are considering some of these ideas for the StarkNet protocol. In this setting, one may envision a decoupling of “batchers” and “sequencers”, in analogy to the decoupling of “sequencers” and “provers”. This would introduce another set of actors into the protocol. In our view, the gap in resources required for batching and sequencing is relatively small and does not warrant explicit separation of these roles in the protocol. Thus, a DAG-based StarkNet will still consist of just “sequencers” and “provers”, and does not warrant a separate high-level treatment.
The StarkNet L1 state update mechanism requires a STARK proof (of a sequence of L2 blocks) verified by the L1 cairo-verifier contract. Such proofs will likely involve thousands, or even hundreds of thousands of transactions. Consequently, the provers who compute them will consume considerable resources.
While the use of leaders is motivated by fork mitigation, we see no analogous motivation for a particular procedure to choose provers that we consider uncontested. Instead, we have a multitude of trade-offs and open questions. We briefly mention those directly related to the aforementioned high-level questions.
- Relation between the sets of sequencers and provers – do we assume an inclusion in some direction, or maybe even equality?
- Assigning provers – do we use a variant of leader elections or another mechanism?
- Division of proofs among the provers – does the protocol associate a single proving task to each state update, or does it explicitly assign composite recursive proving tasks?
We are thinking hard about this part of the protocol. A detailed account of the various open questions and trade-offs will be given in the third post of the series.
To tackle the decentralization problem we went for a divide-and-conquer approach by tentatively dividing the protocol into three black boxes, and investigating each separately. These components are:
- Leader-based consensus protocol.
- Leader election protocol.
- Proof protocol.
The above decomposition was very helpful to us, but there is inevitable coupling. For example, a proof of consensus obviously depends on the consensus protocol, and changes drastically between longest chain and BFT fork choice rules.
This series of posts will start from these not-so-black boxes and gradually dive into their dependencies. We consider requirements under assumptions about the ambient network, which are detailed in the first appendix.
We began the process by trying to distill some important properties. Those pertaining to safety and liveness warrant a brief introduction, given in the second appendix. For emphasis, let us record the key point: we hope to use the assumption of L1 as an always available public board to circumvent the classical impossibility results on standalone consensus protocols.
As a rollup, our primary goal is to reduce user costs by scaling Ethereum.
Inexpensive – the consensus overhead should be sufficiently small for StarkNet to facilitate a 100x decrease in user L1 gas consumption.
StarkNet’s consensus will be strongly permissionless: participation will not require L2 consensus. Note this property is not satisfied e.g by Ethereum: new validators are recorded in the ledger, which requires consensus; hence a minority of validators can censor new participation by preventing a supermajority.
Strongly permissionless – participation requires only tokens and not L2 consensus.
There is little point in a permissionless protocol that can’t handle participation loads.
Sufficiently scalable – the protocol works well with 50 simultaneously active sequencers.
L1 state updates will require sequencer consensus. Hence, proofs of consensus must be verifiable on L1.
L1-verifiable – an L1 full node can verify proofs of StarkNet consensus.
L1 state updates must have the security of L1.
Final L1 state updates – L1 StarkNet state updates are final (safe under network partitions and inactivity) modulo L1 reorgs.
The protocol should ensure accountability even of a malicious_majority_ of stake.
Strongly accountable – safety and liveness violations are punishable by slashing, for any fraction of the participants (including malicious majority of stake).
Furthermore, a one-time malicious majority must not damage StarkNet forever. Instead, the protocol should spring to life soon after the malicious majority vanishes.
Springy – the protocol springs to life soon after a malicious majority vanishes.
Users should receive quick indications of transaction inclusion.
Fast indication of inclusion – indication of tx inclusion in an L2 block appears within 10 secs of inclusion.
The first practical barrier to act on txs is their validity. For mid-high-value txs one waits for some economic backing, but for low-value txs validity can be decoupled from finality. Some dual-ledger finality gadgets involve a process of sanitization that prevents inclusion of conflicting txs in the finalized ledger. In this case, validity of a tx is undetermined even upon inclusion in a block. Moreover, despite consensus on a block, sanitization may remove a particular tx from it in order to avoid conflicts with previous txs in the finalized ledger. We want users to know their txs cannot be omitted from a block after inclusion.
Atomic consensus on blocks – consensus on a block implies consensus on all constituent transactions.
For mid-value txs, we’re content with accountable economic backing of L2 without waiting too long.
Fast accountable L2 finality – accountable finality of a tx is discernible within a few minutes from tx inclusion.
There is not much use for liveness during network partitions due to obvious safety hazards. However, liveness under inactivity is still nice to have, e.g in the case of 40% inactive stake. For an L1 protocol, honest inactivity is indistinguishable from a network partition and therefore conflicts with safety. In our case, safety comes from L1, so it will be nice to maintain L2 liveness under dynamic participation.
L2-adaptive – the L2 protocol is live under dynamic participation.
Lastly, it will be nice to use tried-and-proven implementations.
Battle-tested – parts of the StarkNet protocol are based on implementations by systems operating for at least one year with at least 100M$ TVL.
Note these requirements are much more concise than those presented a few months back. Even more notably, they are not conflicting!
In order to maintain a pure PoS Sybil resistance, the leader election protocol must not be biased toward strong machines, but rather determined by stake and randomness only. Note we definitely want to incentivize sequencers to have strong machines (for reduced latency), and this will be achieved by a fee mechanism that encourages the inclusion of more txs per block.
Fair – uniform distribution determined only by stake; material resources offer no benefit.
Participants should be able to enforce consensus in real time. This applies in particular to correct leadership.
Locally verifiable – an L1+L2 full node can verify all leadership claims in real time.
In the same spirit, the StarkNet protocol must eventually assure mere observers (non-executors) that it has been followed.
Efficiently provable – correct block authorship (at each slot, the author of the block is the designated leader) is efficiently STARK provable.
A good decentralized consensus protocol is resilient to faulty/malicious behavior. In such a protocol, no single entity can have a dominant role for extended periods. In particular, leadership must be brief. On one hand, this is not a property of the black box labeled “leader elections” but rather a constraint on its use for consensus. On the other hand, the leader election protocol must facilitate sufficiently short slots/block times.
Brief leadership – nobody leads continuously for too long (i.e more than 10-15 secs).
StarkNet’s proofs enable an L1 node to trustlessly synchronize on the last proven StarkNet state (we disregard data availability). This streamlines the bootstrap procedure to become an L2 full node by eliminating the need for historical data. However, verifying consensus on a certain block may require historical knowledge of past stake distribution or actions. Longer dependence means a cumbersome bootstrap procedure due to a longer synchronization period. We’d like to mitigate this burden for consensus in general and for leader elections in particular.
Memoryless – any dependency of elections on historical data (L1 and L2) goes back at most several hours.
We have discarded some admittedly wonderful properties because they increase the likelihood of honest forks.** For us, fast finality is more important.** Still, we think they warrant the elaboration below.
In leader based consensus protocols, the role of a leader includes at least a choice of order on some set of data (usually a set of transactions), and often even a choice of the data itself (when the leader constructs the block it proposes). This choice has value and can be sold; it is often perceived as undesirable manipulation over consensus and we’d like to minimize its influence. Furthermore, this choice becomes more valuable if the leadership schedule is known in advance, because people can plan profitable manipulations in advance. Perhaps such creative schemes are best thwarted at the protocol level?
If information in a distributed system is available for only a short period, then different agents are likely to have conflicting information. This has advantages too. Start with a reasonable sounding property.
Globally unpredictable – can only calculate a future leader shortly before it acts.
A stronger local version is interesting.
Locally unpredictable – cannot predict future personal results.
An even stronger version, in case you really don’t like people scheming.
Secret – others’ results can never be calculated; the election outcome is decided only upon all candidates revealing their result.
In StarkNet, an L1 state transition requires an L1-verified proof to obtain L1-security. Since proofs are such a crucial component of the protocol, any decentralization requirement applies to the proof protocol. To this end we want a permissionless protocol. While a consensus protocol requires Sybil resistance for DDoS protection, a proof protocol has no analogue because proofs can not conflict with each other. Although PoS is tempting for strong accountability (as in the consensus protocol), it is not assumed. Instead we formulate a weaker version of the requirement, without specifying some resources underlying prover elections.
Strongly permissionless – participation does not require L2 consensus.
L1 state-updates must require sequencer consensus, so the proof protocol must in particular enforce proofs-of-consensus.
Proof of consensus – A proof submitted without proving sequencer consensus is rejected.
A permissionless protocol does little for decentralization if the most powerful machines monopolize proving.
Computationally inclusive – the protocol enables and incentivizes participation of machines of variable computational power.
High-value txs warrant the full economic backing of L1, for which the users may wait. This waiting period must be reasonable.
Hourly L1 finality – latency between a proof task and its completion is incentivized to be at most a few hours.
While STARK proofs cannot be forged, malicious inactivity is still possible (recall the dynamic participation model allows for honest inactivity). The protocol should be resilient against such inactivity attacks and/or disincentivize them. For the former:
Adaptive (dynamically available) – liveness under L2 network partitions and inactivity. If the unknown subset of active participants fulfills a specified honesty threshold, then the protocol can advance.
In this introductory post we have outlined some thoughts and requirements for a decentralized StarkNet. The next post will dive into these thoughts in more detail, focusing on open questions pertaining to consensus and the behavior of provers in the protocol. The last post will describe a concrete suggestion for a leader election protocol.
We work under the partially synchronous model of communication and take interest in dynamic participation. The definitions are recorded below for completeness.
The partially synchronous model of communication provides a convenient middle ground between synchrony and asynchrony. It does this by handling network faults of unknown but bounded duration.
Partial synchrony – the protocol admits a known delay bound ∆ and an unknown global stabilization time (GST) which satisfy the following conditions (for every protocol execution):
- Asynchronous phase – a message sent before GST is received by the recipient by GST+∆.
- Synchronous phase – a message sent after GST is received by the recipient within the delay bound ∆.
Classical BFT protocols involving signature collection require constant participation. Hence it is often assumed that honest agents are always active. In such settings, the only cause of apparent inactivity by an honest agent is actual asynchrony. This assumption may be unrealistic for large scale protocols. Longest chain protocols need not require constant participation; indeed it is possible for an honest bitcoin miner to sleep for some time and then resume activity. We don’t want honest participants to vanish for extended periods, but prefer to incentivize their activity instead of assuming it.
Dynamic participation – the protocol admits an unknown global awake time (GAT), beyond which every honest agent is awake.
We consider the underlying L1 network as a graph whose vertices are the L1 nodes and whose edges signify communication between nodes. It is a fundamental insight that from the perspective of a self-contained protocol, inactivity is indistinguishable from asynchrony.
Every L2 full node is in particular an L1 full node, because it needs to track stake (maintained on L1) and L1 state updates. Hence we consider the L2 network as an “overlayer” given by a subgraph of the underlying L1 network. This model is meant to emphasize that even during L2 network faults, L2 nodes may still admit L1 connections. Assuming L1 connectivity and liveness, an L2 full node can distinguish between asynchrony and L2 inactivity by observing L1. Specifically, observed L1 inactivity signifies “true” inactivity while observed L1 activity signifies an L2 network partition.
The two fundamental properties of consensus protocols in the context of blockchain are safety and liveness. Safety is a reformulation of the classical “agreement” axiom of BFT theory. In what follows, we informally define these properties and their specializations to communication and participation models.
Throughout, we assume a known “universe” of participants is specified via public key infrastructure (PKI). Lastly, the honesty threshold of a set of agents is the fraction of honest agents out of all the agents.
Safety – If the set of active participants fulfills a specified honesty threshold, then all active honest nodes agree (i.e there are no honest forks).
Liveness – If the set of active participants fulfills a specified honesty threshold, then the protocol can advance, i.e a malicious minority cannot freeze it indefinitely.
During synchrony and under static participation (every node is active), these are necessary properties for every worthwhile consensus protocol. However, their fulfillment during asynchrony and/or dynamic participation is much more subtle.
Finality – safety under L2 network partitions and inactivity. If the unknown subset of active participants fulfills a specified honesty threshold, then all active honest nodes agree during both synchrony and asynchrony.
Adaptivity (dynamic availability) – liveness under L2 network partitions and inactivity. If the unknown subset of active participants fulfills a specified honesty threshold, then the protocol can advance, i.e a malicious minority cannot freeze it indefinitely. (In our setting, this implies the protocol remains live if some of the stake becomes idle.)
For an L1, inactivity is indistinguishable from asynchrony. Consequently, the CAP theorem implies that adaptivity and finality are mutually exclusive properties. Recent literature on consensus protocols sidesteps this problem with user-level solutions, for instance using dual ledgers with a finality gadget as in PoS Ethereum. By introducing assumptions about the underlying L1, we hope to either resolve the availability/finality dilemma for L2 consensus at the protocol level, or to improve on existing user-dependent solutions.
Safety is crucial, but a protocol in which a safety violation is inconsequential to the violators is vulnerable. The following enhancement of the safety property incentivizes honest behavior.
Accountable safety – for every honest fork, the active honest part of the network contains evidence proving a large subset of nodes acted maliciously. (The punishment mechanism is not a part of the property, and can be social or protocol-level.)
The above argument shows that adaptivity and accountable finality are mutually exclusive properties for an L1. However, we want to employ L1 to get a strong form of accountability which applies even for a malicious majority of the stake.