Data availability with EIP4844

Data availability with EIP4844

Introduction - the data availability problem

The gist of the data availability problem is to convince a third party that some data has been made available to a given network of nodes without requiring it to naively verify this claim by requesting to download the entire data. This problem has been extensively discussed in the context of blockchains. You can find a more detailed overview in [1], [2], [3].

Data availability has a key role in the operation of rollups (both validity rollups and optimistic rollups). We are trying to avoid the following bad scenario: a malicious L2 sequencer (or a coalition of such) creates a block on L2 that eventually updates the state of the rollup contract on L1, but avoids publishing this block to any of their peers on L2. For validity rollups, this block must be valid (otherwise, it would have been rejected by the on-chain verifier), but now we are stuck with a state commitment on L1 where no one outside the malicious sequencers coalition is aware of the state behind it, making it impossible to create new blocks.

To avoid this scenario, we must make sure that the L1 contract, which in the case of Starknet is known as the Starknet Core contract (this contract holds the latest state commitment and can only update it after a proof has been verified on L1), does not accept a state update unless the associated block data was published. Today, Starknet sends the state-diff data to L1 as calldata, which accounts for more than 90% of the average transaction fee. With EIP4844, Ethereum introduces a new market for data, giving L2s like Starknet the opportunity to publish their data via another mechanism that will potentially be much cheaper, resulting in lower fees for end users.

The current mechanism

The Starknet core contract only allows a call to updateState to update the commitment from A to B if a corresponding fact was written to a registry contract. The fact that the core contract is looking for is roughly: “the Starknet OS ran successfully, starting from commitment A and reaching commitment B after executing a sequence of valid transactions”.

The statement being proven is actually a bit more complicated. We don’t only care that the OS ran successfully, but that it ran successfully with its memory containing some specific values that we expect. More formally, part of the proven statement is that the OS memory m:\mathcal{A}\rightarrow \mathbb{F} extends some m^*:\mathcal{A^*}\rightarrow\mathbb{F}, where \mathcal{A}^*\subseteq\mathcal{A} is the set of address whose contents we want to attest to in the proof. m^* allows us to express claims about the behavior of the OS execution. For example, if we want to claim that the initial commitment is A and the final commitment is B, it’s enough to know to what addresses the OS is expected to write these values; let’s call them a and b, and add m^*(a)=A and m^*(b)=B to the proven statement. For more information on how the public memory mechanism works, see the Cairo paper.

The (address, value) pairs in m^* are divided into pages, and each page is sent to the MemoryPageFactRegistry contract. The values in a page are then hashed, and all the page hashes are eventually hashed together to contribute to the final fact that represents the statement we’re proving.

The final step is connecting the values sent to the MemoryPageFactRegistry contract to a Starknet block state-diff, that is, to understand what m^* tells us about what happened in the block. To answer this, one needs to see exactly the values placed in the output segment by the Starknet OS and ensure that these values uniquely encode the state-diff constructed during the OS execution. This requires a deep review of the inner workings of the Starknet OS, but the interested reader can find the relevant code in output.cairo.

Using blobs - the commitment equivalence trick

With EIP4844, we’d prefer to send Starknet block’s state-diff as blobs rather than as calldata. This means that we must come up with a different mechanism than the public-memory one described in the previous section. The challenge we face is binding the blob data to the actual state-diff constructed during the OS execution.

Luckily, the simple commitment equivalence trick proposed by Vitalik in 2020 comes to our aid. The idea is to take both the KZG commitment C available on Ethereum when sending the blob transaction (via the new BLOBHASH opcode), and a different commitment C' which will be computed by the Starknet OS, and prove that C and C' are commitments to the same data. This idea is illustrated in the following diagram drawn by a 5-year old:


Image 1: commitment equivalence testing. On the one hand, we have the blob sent to L1 alongside the KZG commitment, and on the other hand, the Starknet OS which computed a different commitment to the block’s state diffs.

In general, checking commitment equivalence is no easy feat, and without knowing more about the structure of C and C' there isn’t much we can do. However, if both C and C' are polynomial commitments, then we can do it by checking that C and C' agree on the evaluation at a random point. To choose the random point, we can use the Fiat-Shamir heuristic and open both commitments at x=h(C, C') where h is some cryptographic hash function.

The KZG commitment C can be opened on Ethereum with the new point evaluation precompile. In order to use the commitment equivalence trick, the commitment computed by the Starknet OS, C', should satisfy the following properties:

  • STARK friendly - proving the computation of C' should not be too expensive. This rules out KZG since elliptic curve operations on the non-native curve are extremely expensive.
  • Easy to open on Ethereum - suppose C' is a commitment to the polynomial p, then in order to compare the opening of C to the opening of C', we need to have a gas-efficient verification that p(x_0)=y_0. This can always be done with a separate STARK proof attesting to this fact, but this is a complication we’d rather avoid.

One option for choosing C' is a FRI-based polynomial commitment, as described in [4]. In this suggestion, proving that C' commits to a polynomial p that evaluates to y_0 at x_0 (or put more simply, opening C' at x_0), involves engaging in the FRI protocol to prove that the quotient polynomial \frac{p-y_0}{x-x_0} is of low degree. This still leaves us with the question of what hash to use for constructing the FRI commitment on the Starknet OS side, as this same hash needs to be applied on Ethereum for the opening proof. Now, we’ve reduced the commitment equivalence problem to the infamous problem of finding a hash that is both zk-friendly and Ethereum-friendly. In the next section, we describe the simpler solution which we plan to incorporate in Starknet v0.13.1, which avoids these issues altogether.

Avoid computing fancy commitments

As we mentioned in the previous section, obtaining a polynomial commitment C' in the Starknet OS is not trivial. However, it turns out that we don’t really need C' to be structured. Suppose that the state diff we want to commit to is d_0,...,d_{n-1}\in\mathbb{F} , with the associated state diff polynomial defined as P_\text{diff}=\sum\limits_{i=0}^{n-1}d_ix^i. Instead of attempting to commit to P_\text{diff} in a sophisticated way, we can directly evaluate it at the “random” point x_0=h(C,C') where h is a STARK-friendly cryptographic hash and C' is a completely unstructured (but obviously binding) STARK-friendly commitment, for example a Poseidon chain hash of the d_i's.

Note that to compute x_0, we need C, the KZG commitment that will be accessible on the Ethereum blob transaction that carries the state diff. We already said that computing it in the OS is prohibitively expensive. The key point here is that the sequencer already knows C after it has finished producing the block. Once the block is closed and its associated state-diff is known, the sequencer can compute the KZG commitment ahead of time and give it as input to the prover. Thus, the Starknet OS can simply “get” C as input (or witness, if you’re more convenient with this terminology) without actually caring about
its source and whether or not it truly represents the KZG commitment of some data blob.

Given C, after the OS is done verifying the block, it can output the triplet \left(C, x_0, P_\text{diff}(x_0)\right). The randomness of x_0 follows from the fact that the OS itself computes it by hashing C' (which is a chain-hash of d_0,...,d_{n-1} and is thus bound to the state-diff) and C together. Since nothing was said about C up to this point, the important remaining step is, once the blob transaction carrying d_0,...,d_{n-1} is sent to Ethereum, to verify that the KZG commitment to the blob was indeed the same C that was used by the Starknet OS.

We summarize the resulting protocol in the next section, here we proceed to handle some inaccuracies in the previous paragraph. These clarifications are not essential to the understanding of the flow described so far, feel free to skip to the next section.

  • The OS outputs the evaluation of P_\text{diff}, this means that the blob we’re going to send to Ethereum will have to match the same polynomial. To satisfy this, the blob can’t include the state diff elements d_0,..., d_{n-1} directly, but rather P_\text{diff}(\omega_1), ..., P_\text{diff}(\omega_n), where \omega_i,...,\omega_{4096} are the first roots of unity of order 4096 over the BLS12-381 field. For more information, see blob_to_kzg_commitment in Ethereum’s consensus specs. We could have avoided that and define P_\text{diff} differently, but this would have required computing the polynomial interpolation of the d_i's in the OS, which adds to the proof complexity.
  • When we say that the OS outputs \left(C, x_0, P_\text{diff}(x_0)\right), technically, we mean that they’re part of the public memory segment. Instead of adding all the state-diff to m^* in the proven statement (see our overview of the current data availability mechanism), we’re only adding these three words. \left(C, x_0, P_\text{diff}(x_0)\right) will be sent as calldata to Ethereum using our pre-4844 data availability mechanism.

How everything fits together

We are now ready to describe the flow step by step:


  • The Starknet sequencer closes block B with state diff d_0,...,d_{n-1} (n\le 4096)

  • The sequencer computes the KZG commitment to P_\text{diff}(x)=\sum\limits_{i=0}^{n-1}d_ix^i, and gives it as an input to the Starknet OS along with all the necessary information to verify the execution of B

  • The Starknet OS is executed and outputs \left(C, x_0, P_\text{diff}(x_0)\right), where:

    • C is a witness given to the OS, when ran by the honest sequencer, it is the KZG commitment of P_\text{diff}

    • x_0=h(C,C') where C' is a chain-hash of d_0,...,d_{n-1}

  • A STARK proof attesting to the fact that the OS ran successfully with the output \left(C, x_0, P_\text{diff}(x_0),...\right) is sent to the on-chain verifier, and a corresponding fact is registered on the fact registry contract

  • A state update transaction is sent to the Starknet Core contract. This is a blob transaction, carrying the blob \widetilde{d_0},...,\widetilde{d_{n-1}} where \widetilde{d_i}=P_\text{diff}(\omega_i), where \omega_i,...,\omega_{4096} are the first roots of unity of order 4096 over the BLS12-381 field. The calldata in this transaction includes:

    • x_0, supposedly h(C,C') where C' is a chain-hash of d_0,...,d_{n-1}
    • y_0, supposedly P_\text{diff}(x_0)
    • C, supposedly the KZG commitment which was used by the OS to generate x_0
    • \pi, a proof for the opening of C at x_0, encoded in the format described in the EIP4844 specs
  • The core contract makes the following assertions:

    • A fact attesting to the OS execution with output C, x_0, y_0 was indeed registered on the fact registry contract

    • kzg_to_versioned_hash(C) equals BLOBHASH on blob index 0 (for more information on kzg_to_version_hash see the EIP4844-specs)

    • Calls the point evaluation precompile on \text{BLOBHASH} \; | \; x_0 \; | \; y_0 \; | \; C \; | \; \pi


It remains to explain why the above checks made by the core contract guarantee that a state update will only be accepted on L1 if the corresponding state diff data was published. The first assertion guarantees that C was indeed used in determining the evaluation point x_0, and that the value of P_\text{diff} at x_0 is indeed y_0 (here we assume the correct behavior of the Starknet OS, the correctness of the on-chain STARK verifier, and of course the soundness of the STARK protocol itself). The second assertion guarantees that C is not simply a sequence of random bits, but the KZG commitment to a blob that was just sent to L1. Let’s mark the blob by B and its corresponding state diff polynomial by P_B.

It remains to show that P_\text{diff} and P_B are the same polynomials, as this will convince us that the state diff constructed by the OS is indeed the polynomial represented by the blob that we sent. We already know that P_\text{diff} is of degree at most 4096 since this is verified in the Starknet OS. Hence it suffices to check if they agree on a random point. This will guarantee with probability \ge 1-\frac{d}{|G_r|}, where d is the degree of the polynomials and G_r is a prime order subgroup of the BLS12_381 curve (r is a 255 bit integer, you can read more about the pairing used for the opening proof here), that P_\text{diff} and P_B are identical.

The final assertion in the core contract verifies that P_B(x_0)=y_0 (we already know that P_\text{diff}(x_0)=y_0 since the OS ran successfully and outputted x_0,y_0). Recall that the OS constructed x_0 by hashing C,C' together, hence it is uniquely (up to hash collisions) determined by P_B and P_\text{diff} (as P_\text{diff} is bound to C' which is a chain hash of d_0,...d_{n-1}). This means that the first assertion guarantees that x_0 behaves like a random point, on which we saw that P_\text{diff} and P_B agree. Thus, P_\text{diff} and P_B agree on a random point, which concludes the proof.


Where is the data?

This section will be updated with the relevant contract addresses once Starknet mainnet is upgraded to v0.13.1. In the meantime, you can check out some blobs that were sent to the Sepolia testnet in this repository. There you will find a decoding script to read Starknet’s state diffs according to the format described in the docs, in addition to the inverse FFT required to translate the blob to the state diff encoding.

References

[1] What Is Data Availability?
[2] What is the data availability layer?
[3] Data Availability Sampling: From Basics to Open Problems
[4] Transparent Polynomial Commitment Scheme With Polylogarithmic Communication Complexity