Reflections on the Decentralized Starknet protocol

Reflections on the Decentralized Starknet protocol

In this post we would like to share some of our thoughts at Nethermind’s Research group around the Starknet Decentralized protocol. For a well organized and systematic presentation of the protocol, we invite you to read Ilia Volokh’s series of posts.

1. Soft inclusion mechanisms for forced staking updates

Context: The L1 Staking Contract can enforce staking updates on L2 via L1 → L2 messages. These forced updates have a deadline (measured in L1 blocks, around 24h). In the current proposal, when a deadline is reached without L1 receiving a confirmation from L2 of the forced update, honest L2 nodes perform a staker reset (i.e., revert their state to the last accepted block on L1 and reconfigure their staking set to include the stale updates).

Staker resets can affect the latency of the protocol and should be used as a last resort mechanism. Instead of optimistically hoping that forced updates will be included in proposals by altruistic nodes, some soft measures can be taken before the deadlines are reached. We identified two measures that are not mutually exclusive:

A) Decreasing rewards for blocks containing forced updates. Proposers who used some of their block space for including forced updates should get rewards for this. We propose a decreasing reward scheme in order to avoid the scenario in which proposers avoid including force updates in order to increase the rewards for them. Decreasing rewards can lead to faster inclusion of the forced updates into blocks.

B) Engage the consensus mechanism for including forced updates. Proposed blocks should contain forced updates if available. If a new proposal doesn’t contain a forced update (which has a deadline in some epsilon amount of time), during consensus, (honest) nodes will not vote for this proposal. One has to take into account the time it takes for nodes to sync their views of L1, so this mechanism can be engaged after an amount of time in which we are sure that all nodes learned about a new forced transaction.

2. Pool of proofs

Context: For a block to include txs, it must include a proof for some previous block(s). A block must contain a proof for the latest non-empty block and all intermediary empty blocks in its strand. A proposer is expected to have such a proof to be able to propose a block. Proposers learn in advance when they have to prepare proofs. A proposer can acquire a proof by computing it or by delegation. The protocol offers rewards for blocks and for proofs. The Starknet protocol is using Tendermint as a consensus protocol. We anticipate that proposers for rounds 0 for any height will be highly incentivized to acquire such proofs, but not proposers for rounds strictly greater than 0.

Nodes learn about a proof only when a proposer makes a proposal containing this proof. We identified some drawbacks around this mechanism. For example:

  1. The proposer who obtained a certain proof can be offline when they are supposed to propose a block, so the computation resources for that proof are wasted.
  2. Being a proposer for round 1 may be a privileged position when it comes to rewards. Suppose in round 0, there is a proposal for a valid block b with proof p , 2f+1 nodes lock on this proposal, but not all nodes commit to it. In this case, the proposer in round 1 is lucky as they can make the same proposal, reusing the proof p and get the rewards.
  3. Similar with the above situation, (rational) nodes can steal the proofs exposed by previous proposers in order to get rewards. For example, at round r I receive a PROPOSAL message with block b and proof p. I can collude with others and decide not to lock on b. In the next round I am the proposer. As I don’t want to waste computing power to generate a proof, I propose again (b,p) to collect rewards. Note that such a scenario breaks the trust assumption of 2f+1 honest nodes of Tendermint.
  4. When the upper bound on empty blocks (in a strand) is reached, only a block containing a proof can be proposed. In this case, the consensus protocol will be stuck for a considerate amount of time on a certain height before consensus can be reached on a value, as proposers for rounds greater than 0 may not have had enough time to acquire the necessary proof for making a proposal. Moreover, leaders for rounds greater than 0 have no guarantee that they will get rewards if they acquire a proof, as previous leaders may have already exposed the proof before them.

We think that making a proof a “public good” as soon as it is computed is a healthy alternative to avoid the above issues. Proofs should be added to a common pool of proofs available to all nodes and the entity posting the proof should receive the reward. One disadvantage for having a pool of proofs is that it introduces a new component into the protocol.

One can start computing the proof of a block as soon as the block is committed to the L2 chain. Let us describe what the proof for block H looks like, depending on the type of block.

  • Block H is non-empty. The proof for H is the recursive proof of the followings:
    • the proof of the execution of the txs in H
    • the proof of the verification of the proof in H
  • Block H is empty. The proof for H is the recursive proof of the followings:
    • the proof that block H is empty
    • the proof of the previous block in its strand, i.e., block H-k

With the above mechanism for computing a proof, a proposer for block H+k must include the proof for block H, independently of whether block H is empty or not.

In terms of who is in charge of computing the proofs, we can think of several approaches. The simplest one is the current Proof Scheduling mechanism (i.e., the proposer for round 0 of height H is in charge for computing the proofs for the previous non-empty block H-ikwith i > 0, and all empty blocks H-jk, with i > j > 0, in the strand of block H); note that this approach doesn’t solve the above case 4. Alternatively, one can allow a general race or a priority race (in which leaders for all rounds for height H+k have priority on proving block H) for computing proofs, enabling a faster access to proofs.

Unprovable blocks. One big concern around the Starknet protocol is due to unprovable blocks. Consensus can be reached on unprovable blocks due to, for example, inconsistencies between execution VMs and provers (e.g., an execution VM used during consensus for validating a block may output that the block is valid due to a bug, but the prover cannot actually prove the block). In such cases, a recovery mechanism must be triggered. In the current design, there is no mechanism for signaling that a block is unprovable. However, the system can recover by employing the reset mechanism used for forced staking updates which can take more than 24h.

A pool of proofs can help dealing with unprovable blocks. The entity who is supposed to provide a proof for a block can submit instead a proof showing that the block is unprovable. For example, if a prover fails to prove the execution of the txs in a block, then the entity can run a prover in a zkVM building a proof showing that the prover failed (and consequently, that the block is unprovable). Even though this mechanism is extremely computational expensive, using it in extreme rare situations can improve the current mechanism for dealing with unprovable blocks.

Resets can also be avoided when identifying an unprovable block. When there is a proof P in the pool showing that block H (already contained in the chain) is unprovable, nodes can treat block H as an empty block (i.e., a block that doesn’t change the state) and use the proof P in the recursive proof that will be needed in the next non-empty block.

3. Decrease the rewards for empty blocks

Context: When they are missing the expected proof for making a proposal, proposers can propose empty blocks.

Even though having the option of proposing empty blocks helps in not slowing down the protocol while waiting for proofs to be computed, it raises other problems For example, malicious/lazy nodes can propose only empty blocks for censoring transactions or for damaging the quality of the chain.

The protocol should discourage nodes for proposing (only) empty blocks when not necessary. A solution around this would be to halve the rewards for consecutive empty blocks proposed by the same node. The first time a node proposes an empty block, the reward should not be halved.

Summary

With this post, we wanted to point out some problems the Decentralized StarkNet protocol can encounter. We made some suggestions for circumventing them which hopefully can improve the robustness of the protocol. We look forward to hear your feedback on these reflections!

This should be easily incorporated into the decentralized sequencer implementation. A flavor of this behavior is well-understood and used in production systems that I’m familiar with (in a different system and different context: the system is called CometBFT which also implements Tendermint, the context is proof-of-stake L1 where validators can vote - via a step in consensus called ‘ProcessProposal’ - on the validity of a block proposal depending on external/off-chain validation criteria).

Overall very informative proposal! I’m still parsing the rest of it but initial thoughts are very positive.

Hello! Thank you for the feedback, it shows a deep understanding of the protocol. I have a few comments below.
1.A: Note that these forced updates are sent as regular L1->L2 messages. As such they contain economic incentives for running their respective handlers. Moreover, since the chain will fork if not included in time, the stakers’ stake will be affected and thus they have an even greater incentive. Giving extra rewards (I guess you meant ‘increasing’, not ‘decreasing’) is something we’ll consider.
1.B: As explained above, we think there is clear incentive to add forced updates. And it may indeed be that towards the deadline of the update honest nodes will not sign on new blocks if they don’t contain the update. But I don’t think this should be part of the protocol. We will probably have a period where we allow registrations without enabling them to be forced to see if there are problems. Note that since we’re talking about staking, with a minimal amount that will be large, there shouldn’t be a lot of these.
2.1 Correct, this is similar to it being offline when proposing a block and so that slot’s time is wasted. Same here. The next slot will probably be an empty block and then everything resumes. We have considered approaches to force proposers of the next heights to prepare proofs for such cases but the problem is that proof generation is more expensive than creating a block and it is very likely that knowing that most of the time their proofs will not be used they will choose to forgo these slots which may mean no height will be valid (note that a proof takes longer to create than a few slots). We could imagine higher rewards for these heights, but these need to be much higher since the work is wasted, or paying rewards to proposers of the next height even when not required, which requires a mechanism for them to prove they indeed created a proof, all of these are more complicated and as mentioned can be gamefied as well. Again, would love to hear other suggestions.
2.2. Note that in this case it needs to be exactly the same block. So the rewards (that are a state inside the blocks) are the same and go to the proposer of round 0. Moreover, since this situation is rare (2f+1 prevotes arrived but not precommits, so a glitch in the network at exactly the right time), I don’t think we should create elaborate mechanisms around it.
2.3. As you mentioned, this required f+1 dishonest nodes (and also will work only when they propose after an honest one). So if this happens then a lot of other bad things can happen. And again, they don’t get the reward.
2.4. If the proof is exposed it can be replayed. A case of a lot of empty blocks means the original block is probably unprovable (e.g. due to a bug). In this case the only solution is to fork before that block. This is the reason we put the upper bound, to force this fork before the network progresses more. The fork can be L2 only (by social consensus) or by using the L1 reset mechanism (which is automated)

We don’t object to proofs being public good, but I can’t see how we force it (and as explained above, I don’t think the cases mentioned above pause a risk)>

I’m not sure what is different in the mechanism for generating proofs you described, but I want to highlight some things for readers:

  1. prooving a block naturally achieves recursion as executing a block means also verifying the proof it contains. As such, this is not a case of a tree of recursion, so simpler.
  2. when there’s an empty block we expect the following proof to be a “real” recursive proof for the empty block as well as the one before it in the strand (we now call these proofchains, like blockchain). Since an empty block is easy to prove, then creating such a recursive proof should be easy (don’t need to run machines in parallel or something like that).

We have considered protocols of switching strands when a height 0 is missed, but these created more complex protocols and we think that recovering from an empty block the way we currently suggest is simpler.

Building a proof to show the prover failed will be very expensive as the prover takes much longer to run than a block. And not sure how it helps when done in a pool as you need to reach consensus anyways. We can have a fork rule to say that above some empty blocks the network automatically forks.

You can’t treat a block as empty in retrospect as following nodes depend on it (its hash in the chain).

Indeed, rewards to empty blocks will be minimal to discourage lazy nodes.

Note we have some thoughts about “proof of validation” so that nodes don’t lazily sign on blocks without validating them. E.g. requiring them to show the resulting state of a random transaction in the block before getting their signing reward. This is still being worked on.

Thank you again for the post!

B) Engage the consensus mechanism for including forced updates

We need to be very careful about allowing the progress of time to invalidate a proposal. An example:

  1. Imagine proposal P has been locked by f+1 nodes.
  2. Forced update inclusion is determined based on the system time of the sequencer.
  3. There is now a network failure for long enough that P would now be considered invalid since the forced update is old enough.
  4. The system is now locked on an invalid proposal.

This is manageable (E.g. validation time used is (H-1).timestamp ) but a subtle issue to keep in mind.

There is an important assumption behind this point, that a leader cannot acquire a proof during the time that it is the leader. In the case of a node building a proof this should be true, I cannot begin a round as a proposer and prepare a proof within timeoutPropose. On the other hand if there is a proof market, it is possible that a node can “purchase” a proof within the time window.

2f+1 nodes lock on this proposal, but not all nodes commit to it.

Actually our plan was for the block builder to be a field in the block, and he gets the rewards for the block and proof. This matters because once f+1 nodes lock on a given proposal, that must be the value decided by Tendermint. The case I think you actually care about is when a valid proposal is sent in round R but less than f+1 nodes lock. In this case the proposer for R+1 has the freedom to send whatever proposal it wants; meaning it can extract the proof from Proposal(R) and set itself as the block builder.

Interesting point that round 1 is privileged. In general (without proofs) a group of f+1 nodes could feel incentivised to collude such that they get extra turns as the proposer. Is your claim that in our case, with proofs, this incentive to collude increases?

If the proof pool is part of a decentralized network, wouldn’t we still have the same problem?

  1. The proof pool is a contract on Starknet and the validators just pay attention to this TX instead of the round 0 proposal.
  2. The proof pool has a separate instance of consensus, but I think this would still just reduce back to the initial problem.

The issue is revealing the proof before obtaining a commitment from the network. I think we need to invert this and allow someone to obtain a commitment to reward from the network before they reveal the proof. If we are worried about validator collusion, then this commitment from the network can only be given out once to avoid malicious validators handing out a commitment to then block the submission and steal the proof. What I derive from these constraints:

  1. User U builds a proof of block H, PH.
  2. U then builds a proof that he has such a proof ready, PPH
  3. U submits PPH in block H+i
  4. U now holds a monopoly on submiting PH to the pool.

This sounds like a race to produce PPH, which favors centralization, as the strongest provers will always win. We explicitly wanted to try to decentralizing proving by getting each proposer to do it. Of course we may be able to tweak the design. An example that pops into mind: the leader for H+k has a monopoly on submitting PPH until H+k/2.

Of course, we also need to incentivize the proofs to be ready before H+k, yet still have the incentive to produce the proof if that fails.

I fear proof pools add a lot of complexity and we should be strongly convinced of the need.

The insight of trying to prove a block as unprovable is an interesting idea. We cannot, as Ittay said, treat the block as empty if this is the case. We would still have to perform an L2 fork, it would just benefit from lower latency. As you pointed out, we already have the forced update mechanism which prevents Starknet from dying in this case. I therefore see this as optimizing a disaster scenario (i.e. low priority).

Could we avoid this entire issue by moving consensus from agreeing on a StateDiff to agreeing on an ordered list of Transactions? The former depends on a valid execution of transactions which opens up the unprovability criteria. The later runs consensus without any assumption of a valid execution.