Starknet Decentralized Protocol V - Checkpoints for Fast Finality

This series of posts started with an introduction, followed by a concrete candidate for leader elections, a discussion of consensus, and finally our thoughts about proofs in the protocol. This post focuses on using checkpoints (in the context of dual-ledger design) for fast finality. In this context we also recommend the prescient Ethereum research post on checkpoints using recursive proofs.

Motivation

Starknet’s current architecture can be summarized as follows:

  1. The sequencer constructs a blockchain by periodically creating Starknet blocks from user transactions.

  2. The prover periodically computes proofs of state transitions according to execution of many Starknet blocks with respect to Starknet’s previous L1 state.

  3. Each L1 state update occurs in two steps, where the first is followed almost immediately by the second.

  4. A proof is sent to an L1 verifier contract.

  5. An L1 state update transaction is sent to Starknet’s core contract. It executes successfully if the L1 verifier has verified a suitable proof and the proof’s input state coincides with the previous L1 state.

There is usually a latency of several hours between creation and propagation of a Starknet block on L2 by the sequencer, and the block’s L1 finalization (inclusion in an L1 state update).

While the sequencer and prover are operated by the same entity, users can trust the operator’s reputation that every Starknet block published on L2 will eventually be proven and updated on L1. In other words, users trust the operator not to fork. Assuming such trust, the latency between L2 propagation and L1 finality is significant only for L1/L2 operations e.g withdrawals.

An obvious step toward decentralization would involve replacing the single sequencer with a Sybil resistant consensus protocol. In such a scenario, we assume confidence in L2 consensus to be much smaller than confidence in L1 security, which is Starknet’s eventual security. Without trust, latency between L2 consensus and L1 finality is significant to user experience because it marks the latency of L1 security and L1/L2 interaction. This justifies a fast finality gadget which has a latency of several minutes at most and also inspires greater user confidence than the L2 consensus by itself.

In the consensus post, the section on dual-ledger designs mentions a suspiciously similar finality gadget – checkpoints of the live ledger. In this post we expand our attention to eventual L1 state updates, and argue that checkpoints can be coupled with them to inspire even more user confidence. We henceforth assume checkpoints include a block-hash, a commitment to the state obtained by its execution, and a quorum certificate. The certificate certifies the information as indeed having been output by the consensus protocol.

Goal and target audience

Throughout this post, the goal of the checkpoint protocol is to quickly (!) make L1 state updates predictable: a few minutes after a transaction reaches L2 consensus, users should be confident of eventual finalization.

The target audience consists of all L2 nodes who maintain the state at the tip of the chain. This includes not only sequencers and other executing nodes, but also stateful non-executing nodes. Note each L2 node is assumed to have an L1 endpoint to track the L1 state of Starknet.

The primary example of non-executing nodes are Starknet full-nodes, characterized by actively participating in the L2 network alongside maintaining the state. However, actively participating in the L2 network is not necessary: it suffices to merely know the state (possibly alongside the relevant blocks).

State without execution?

To trustlessly learn the state obtained by executing a list of transactions, a node must somehow verify the execution. In Ethereum, full nodes verify by executing. In Starknet, execution can be circumvented by verifying proofs.

It will also be desirable to give non-executing nodes access to the state at the tip of the chain at a level of L2 security, i.e before any proof verification. To this end, Starknet sequencers will propagate each block’s state-diffs (state changes following execution).

Confidence-inspiring properties

As explained above, our goal is to provide confidence of eventual finalization within a few minutes after L2 consensus. In other words, checkpoints should quickly (!) address the question of “what will be finalized?” and leave only the question of “when” (latency of L1 state updates).

The question of “what” has two parts: what can be finalized and what will be finalized. Below we formulate possible properties of checkpoints which inspire confidence in the answers to these questions. Subsequent sections discuss subtleties and trade-offs.

Four properties that address the “what can” question are:

  • Necessity – L1 state update proofs start and end at checkpoints, and go through all intermediate checkpoints.
  • Quorum – checkpoints are supported by a majority of stake.
  • Visibility – the L2 network cannot hide checkpoints.
  • Linearity – conflicting checkpoints cannot simultaneously exist.

Two properties that address the “what will” question are:

  • Slashing – unfinalized checkpoints eventually trigger staker slashing.
  • Opening – checkpoint finalization eventually becomes permissionless: anyone with the required proofs can finalize.

Quorum

The main trade-off we see is between cost and confidence, embodied by the following approaches.

  • Forced quorum – the quorum certificate is checked upon submission.
  • Incentivized quorum – the quorum certificate is not checked but invalid checkpoints can be retroactively deleted.

Incentivized quorums are cheaper in the happy flow. However, they do not preclude fraudulent checkpoints. On the other hand, forced quorums are costlier but preclude fraudulent attempts at quorum certificates.

Slashing

On one hand, slashing is a simple quantifiable economic incentive. On the other hand we see two drawbacks, both stemming from inability to identify failures as malicious. The first drawback is mostly resolved with the “opening” property.

  1. A 100% malicious prover layer can veto checkpoints from being proven and consequently trigger slashing, independently of the remaining stakers.

  2. We cannot distinguish between honest provers who don’t know the state or the transactions to prove from malicious provers who veto. This suggests assuming provers are a subset of sequencers.

  3. Assuming provers are a subset of sequencers, they may constitute a small minority. In this case, a minority stake can compromise liveness.

  4. Potentially, bugs will prevent checkpoints from being proven. In this case, honest sequencers may be slashed. Such possibilities may deter people from staking. In theory a governance mechanism could flag for bugs, but it would be nice to avoid this flow when possible.

Opening

The opening potentially inspires two levels of confidence, depending on assumptions.

  • Hopeful
    • Confidence: assuming others have sufficient information and incentives, only bugs will prevent a state update.
    • Protocol requirements: someone with incentives has sufficient information.
  • Self-reliant
    • Confidence: assuming I have sufficient information and incentives, only bugs will prevent a state update.
    • Protocol requirements: I have sufficient information.

In order for all execution-free stateful nodes to achieve self-reliance on checkpoints, the protocol must supply them with sufficient information to independently perform state updates. By assumption, execution is ruled out. Hence, knowing the ledger (the transactions themselves) is insufficient. This information void can be filled by frequent proofs using recursion.

For example suppose the current L1 state is S₀ and it is followed by four L2 blocks B₁,B₂,B₃,B₄. Suppose there is a checkpoint on state S₄, the result of sequentially executing these blocks with respect to S₀. For self-reliance it suffices to verify a proof of each block’s execution: if the need arises, the proofs can be merged recursively into a proof for the state transition S₀→S₄.

The burden (on the protocol) of providing such proofs must be analyzed. Nevertheless, a Starknet protocol in which stateful execution-free nodes are self-reliant with respect to checkpoints is the best we have come up with.

Self-reliance: cost vs latency

As discussed above, frequent dissemination of mini-proofs throughout the L2 network achieves self-reliance with respect to checkpoints. It is only natural to utilize mini-proofs for L1 state updates. Specifically, it is natural to recursively aggregate mini-proofs until submitting a recursive proof to L1 becomes profitable. However, there is tension between economic efficiency of using recursion for mini-proofs and their latency: to decrease mini-proof latency we must decrease their complexity, which in turn increases verification complexity. Since recursion amounts to proving verification, it becomes wasteful if verification complexity exceeds the complexity of the original mini-proof job.

Slashing vs opening

“Slashing” is correlated only to the stake while “opening” reflects actual user value, which may exceed stake value. For instance, perhaps stake value is 100M while a user values the success of his transaction at 500M. In that sense, “opening” provides a stronger guarantee.

On the other hand, slashing increases confidence in all cases since it occurs in the unhappy flow as well. In contrast, “self-reliance” only increases confidence in the happy flow where mini-proofs are propagated.

L1 checkpoints

The “visibility” property strongly suggests posting checkpoints on L1. This is further amplified by the “linearity” property, since L1 logic can be used to enforce it. Appending quorum certificates to L1 involves additional costs. On the other hand, it narrows the possible reasons for stalled L1 state updates to just two: a bug or a malicious majority of L2 stake. The “slashing” property is also greatly strengthened through L1 logic by holding even a malicious majority of L2 stake accountable. Lastly, the “self-reliance” variant of the “opening” property motivates the frequent propagation of mini-proofs, in which case “visibility” is useful to avoid useless computation, i.e generating proofs for checkpoints that won’t be finalized. Hence we think L1 checkpoints are an obvious design choice for fast finality.

Summary

This post explored ways of using checkpoints in the context of dual-ledger designs to quickly make eventual L1 state updates predictable and consequently provide fast finality to users. While not strictly necessary, we think a fast finality mechanism is desirable from the user perspective, especially if the sequencing and proving layers are not highly decentralized. We welcome your feedback and hope for suggestions on how to achieve fast finality on Starknet!

37 Likes

I don’t understand what you mean by “what can” and “what will”. Are the “what can” points additional properties that we may or may not enforce on checkpoints, whereas the “what will” properties will definitely be enforced on checkpoints?

Q1. If we are using a slot-based mechanism, why do we need a quorum certificate? Wouldn’t all blocks have already reached consensus?

Q2. Couldn’t “non-executing” blocks follow the tip of the chain /checkpoint blocks using the state-diffs? Assuming your trust the majority of Sequencers / consensus, shouldn’t this be enough?