StarkNet Decentralization - Tendermint based suggestion

Motivation

The primary motivation of this suggestion is to let StarkNet have fast finality, which is economically backed by a lot of stake. We also try to leverage Tendermint - a well-established BFT protocol deployed on numerous blockchains.

On the downside, due to the quadratic complexity of Tendermint, this suggestion limits (in practice) the sequencers amount to a few dozens. However, since sequencers can’t include invalid transactions as everything is proven to Ethereum, there is appealing to have a more performant algorithm with only a few active validators each time, compared to similar potential designs for L1s.

Stake management

Since Tendermint includes slashing, all the stake has to be managed on L2.

Sequencing

Proposers (i.e., sequencers) will be chosen according to their stake. The randomness source and algorithm for selecting the specific sequencer for each slot are out of scope.

Each proposer will be chosen for a concise period (around 2-10 seconds) and is expected to behave as described by the Tendermint protocol. It is recommended to read the Tendermint whitepaper if you are not familiar with it. If you need only a refreshment, here is the Tendermint flow in one draw:


To align honest and rational behaviors, we add the following slashing rules to Tendermint (similar to what’s the Cosmos SDK is doing):

  • Preventing double voting - sequencers that are signing on more than one block in the same round will be slashed
  • Preventing idle sequencers: the last round in which each sequencer signed will become part of the StarkNet state. Sequencers that are idle for too long will get slashed
  • Preventing inconsistency generated from not respecting “Tendermint locks”

Once a block of StarkNet is committed, it is final and cannot be reverted unless >⅓ of the stake of the system is slashed.

Proving

To couple incentives between the sequencer and prover in the best way possible, we want to create a protocol in which any sequencer might get requested to become a prover. Such property has several advantages:

  • It saves us the need to split the fees between sequencer and prover. Participants earn from being sequencers, and being a prover can be viewed as a “community service”.
  • It makes sure that sequencers would never blindly sign/not-verify - since there is a real chance the protocol will request them to prove it.

However, such design also has several notable disadvantages:

  • It means that every sequencer needs to be able to run a prover. This might, de-facto, reduce decentralization.
  • To “keep the element of surprise,” the protocol’s latency and amount of proved blocks each time would be suboptimal.

To be concrete:

  • L1 keeps track of the round number of the last block it expects to prove and the current round number on StarkNet (initially, they are both zero).
  • After at least MAX_ROUNDS have been managed on StarkNet, fresh randomness on L1 determines:
    • How many rounds would be proven between (1,…, MAX_ROUNDS)? We note this by R.
    • Who (among the quorum signed in the R-th round) would be the prover?

Crucially, observe that the sequencers don’t know when they participate in the Tendermint what block is the last block to be proven next, and how the prover will be shuffled from this block’s signers. This means that it’s impossible to be a sequencer without being a prover.

The consensus transactions (signatures of the signing nodes, proof that the weight for each block is sufficient, slashing transactions) are being proved alongside the “applicative transactions” as part of the proof. Also, a consensus on the block that comes after the last-to-be-proven block is proved as part of the proof. This guarantees that no sequencer “messes” with the signatures on the last block.

In addition, the identity of the eligible prover (i.e., the sampling algorithm given the signatures on the height-X block) is also part of the proof.

The selected prover has a limited time to generate a proof - T_1 hours after the previous proof has been submitted. If the selected prover fails to create proof (either because it is malicious or due to honest power failure, etc.), anyone can generate proof for T_1 additional hours. Shall this happen, the protocol slashes the original prover, and rewards the proof submitter.

As a rule of thumb, it is safe to assume that a proof would arrive at the second window, as such event would cause all the sequencers to lose some stake. A detailed treatment of this edge case is discussed below.

Note:

  1. There is an edge case hiding here, as not all Tendermint “Rounds” ends with a consensus on a new block - and if there wasn’t agreement on a round there is nothing to prove. Therefore, to make the proven quota well-defined, the last block included in the proof is actually the first block with a round number greater than R.
  2. Unlike the previous suggestion there is no “commitment” mechanism. Instead, L1 assumes a certain L2_blocks/L1_blocks rate, and consensus on the new state is attested in the proof itself.

When a proof doesn’t arrive - Handling inactivity in the network

Inactivity happens when L1 expects a new proof to be submitted, yet no such proof arrives. On L2, inactivity might occur due to the two possible scenarios:

  • A large portion of the stakes went offline at once, leaving the L2 state unable to progress. (Notice it must be at once. The slashing rules mitigate a gradual disappearance of stakers)
  • The stakers agreed on an invalid block that cannot be proven

We need to develop a mechanism that will allow the state to advance (and will reduce the stake of dishonest parties while not hurting the stake of honest parties).

This is challenging, as L1 is unaware of the reason behind the delay and has no information on who should be punished.

The suggestion is to allow this to happen through:

  1. Allowing state updates to occur even when less than ⅔ of the stake agree on a valid state. These unique “state updates” need to happen only when L1 “gave up” on continuing with ⅔ of the stake, and the margin for how much stake we need to progress has to advance slowly.
  2. Slash stakers that didn’t participate in the new state update. This is done so the stake of the sequencers that didn’t participate in the chain will decline fast, and we will return fast to the stable state.
  3. Slash, a bit, even the active stakers, to incentivize them to submit a proof as fast as they can (and not, for example, censor other major staker and wait for the next time) window).
24 Likes

Adding BFT-based sequencers is a great idea.

One small point:

On the downside, due to the quadratic complexity of Tendermint, this suggestion limits (in practice) the sequencers amount to a few dozens. However, since sequencers can’t include invalid transactions as everything is proven to Ethereum, there is appealing to have a more performant algorithm with only a few active validators each time, compared to similar potential designs for L1s.

Tendermint communication complexity is O(nlogn), not quadratic, and it’s been demonstrated that a Tendermint chain can have >100 validators (e.g. see the Cosmos Hub).

Deciding on the right way to do the sequencer <> prover interaction is also something of interest, especially given that in a BFT protocol there’s no real “leader” who commits the block. Maybe the prover is always the proposer for that round?

13 Likes

Thanks on the complexity correction!!

Regarding the Sequencer<>Prover, our line of thinking is that the prover would prove a sequence of blocks. WDYM by “commits the block?”

15 Likes

So my perspective on how to design these systems comes from less of “how does one decentralize a rollup sequencer” and more of a question of “ how does one improve the assumptions of the IBC client bridge model with modular data availability assumptions and STARK validity proofs”….

The current widely adopted Tendermint light client in IBC uses BFT assumptions of f + 1 honest validator voting power to ensure data availability and correctness of the state transition function. 2f+1 voting power provides finality/atomic broadcast.

Starknet would like to modularize this architecture in two ways. It would like to bridge interoperate into and out of a Cairo virtual machine that provides succinct validity proofs through Starks so that the bridge does not have rely on BFT assumptions for correctness.

Starknet would also like all transactions relevant to the bridge to be published on counterparty Ethereum chain thus relying on the Ethereum chain for data availability.

One thing to note is the forthcoming ABCI++ API in Tendermint 0.36 will allow modification of Tendermint consensus rules to require agreement on external oracle events this will make it possible for Tendermint application builders to for instance require agreement on the state of an external data availability provider before a new block is produced.

Another thing I observe is that efficient correctness/ fraud proofs of Cosmos staking, reward distribution, slashing and governance modules are very challenging because they touch a lot of state during normal state transition function. It’s very appealing to fall back to standard BFT assumptions for the correctness of these functions.

So all this analysis leads to a fairly clear architecture. There is a Cosmos SDK chain which contains a Starknet staking token. The Starknet staking token is bridged using standard BFT assumptions to the outside world. There would be a modified gravity bridge that informs enables transfers of the Starknet token to and from Ethereum and informs the Cairo bridge of the current stake weight on the Cosmos chain of validators. If you are willing have all the economic punishment on the Cosmos chain side then the Cairo bridge only needs to know about the current set of validator public keys that can propose a new Starknet block and details of Tendermint leader election can be omitted.

There is a Starknet specific module that will need to be written that handles agreement on Cairo related state within the Cosmos state. There is a bridge into the Cairo VM from Ethereum that provides Cairo execution with L2 security guarantees for everything in the Cairo VM.

It seems like this architecture would also enable adopting Cosmos ecosystem’s forthcoming MEV mitigation systems like an encrypted mempool could be incorporated into Starknet.

I tend to think of this architecture as a less ambitious intermediary step towards what Celestia is building.

16 Likes

The Celestia repo has a high level proposed alternative to Cosmos reward distribution module that is efficiently fraud provable.

Fraud provable staking alternative

13 Likes

Combining the prover and proposer is a clean and elegant solution. I wonder whether it does not incentivize “lazy censorship.” Namely, a proposer can propose a block that is later easy to prove to preserve the prover’s computational resources.

12 Likes

@gakonst How is the O(nlogn) achieved? I was reading/thinking about the Tendermint protocol and AFAIU it seems to have O(n^2) complexity in a “normal scenario”, and O(n^3) in the worse case

Thanks a lot :slight_smile:

13 Likes

Ok, this is what I found out. It may not be accurate, and it is possible that Tendermint does not use precisely the following method, but I am guessing that it uses something similar:

The complexity reduction is achieved by finessing the method used to broadcast messages. Instead of each node broadcasting its message to all other nodes, the following can be done:

  • Each node signs and sends its message to the block proposer
  • The block proposer sends all the signed messages to each other node
  • Now each other node has received the messages of all other nodes, together with their signatures. If the signatures are valid, the node accepts the messages.

This makes the communication complexity O(n), but note that the size of the messages sent is O(n^2)

14 Likes