Upcoming Feature: Starknet Applicative Recursion

Allow us to introduce a very cool upcoming feature: applicative recursion! We’ll justify the name (sometimes abbreviated as SNAR) in a minute, but for now let’s recount the two key benefits:

  1. Decoupling of L2 block frequency from L1 costs,
  2. Reduced fixed L1 operating costs.

Time to dive in.

Present and future

Present

We start off in the now, with L1 operating costs of roughly 215K gas fixed per block:

  1. 23K gas per fact registration – part of Verify Proof and Register,
  2. 56K gas per register SHARP memory page,
  3. 136K gas per State Update,
    1. 50K gas for the KZG precompile to sample a blob,
    2. 86K gas for running the state update function.

So SHARP consumes 79K gas per job for registering each job’s data, with job = block. Then the Starknet operator consumes an additional 136K gas per block to bump the state root in the Starknet core contract. Note that a Starknet block’s DA must fit into a single blob.
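As a sanity check, the per-block figures above can be tallied in a few lines (a sketch; the constants come straight from the breakdown above):

```python
# Fixed L1 operating costs per Starknet block today (gas), per the list above.
FACT_REGISTRATION = 23_000   # Verify Proof and Register
MEMORY_PAGE = 56_000         # register SHARP memory page
KZG_SAMPLE = 50_000          # KZG precompile to sample a blob
STATE_UPDATE_RUN = 86_000    # running the state update function

sharp_per_job = FACT_REGISTRATION + MEMORY_PAGE   # 79K per job (= block)
state_update = KZG_SAMPLE + STATE_UPDATE_RUN      # 136K per block
total_per_block = sharp_per_job + state_update    # 215K per block

print(total_per_block)  # 215000
```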

Unfortunately, “fixed per block” means that L1 operating costs are linear in the number of blocks – regardless of their mass (the resources they contain). Consequently, very frequent blocks would incur heavy daily operating costs.

Thus we find ourselves with an unpleasant trade-off between cost efficiency on the one hand and low block times on the other. Of course, sufficiently high demand brings frequent and massive blocks, but at current demand, many blocks are closed due to the 6-minute time limit without hitting any other resource limit.

You may also be thinking to yourself: can we batch several blocks into the fixed costs? Yes. Very yes.

Future

There’s a new recursive tree in town: the SNAR tree. Its leaves are Starknet blocks, and its internal nodes are special (applicative) recursive proofs that we’ll explain later. Each tree has two configurable limits: the number of leaves and the amount of DA (in units of blobs). A tree is closed when either limit is reached. What does it mean for a tree to be closed? Very simple: the SNAR root proof is sent to SHARP as a job. Thus Starknet jobs are now roots of SNAR trees instead of Starknet blocks.
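The closing rule can be sketched as a tiny predicate (hypothetical names; the actual limits are SHARP configuration):

```python
def tree_is_closed(num_leaves: int, num_blobs: int,
                   leaf_limit: int, blob_limit: int) -> bool:
    """A SNAR tree closes as soon as either configurable limit is reached."""
    return num_leaves >= leaf_limit or num_blobs >= blob_limit

# e.g. with a limit of 100 leaves and 3 blobs:
print(tree_is_closed(40, 3, 100, 3))   # True: blob limit reached
print(tree_is_closed(40, 2, 100, 3))   # False: still open
```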

The L1 operating cost structure is similar:

  1. Fact registration per job: 23K gas,
  2. SHARP memory page per job: 36K gas (reduced from 56K thanks to a SHARP optimization),
  3. State update per job: 86K gas + 50K gas × tree blob limit.

Thus, apart from the KZG precompile which remains at 50K gas per blob, the remaining fixed-per-job 145K gas is now amortized over all the blocks in a single SNAR tree.

Crucially, there is no per-block L1 operating cost above: we have decoupled blocks from L1 costs!
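To see the decoupling concretely, here is a rough amortized per-block cost under the new structure (a sketch using the figures from the list above; the blocks-per-tree value is an illustrative assumption):

```python
def l1_gas_per_block(blocks_per_tree: int, blobs_per_tree: int) -> float:
    """Amortized L1 operating gas per block under applicative recursion."""
    fixed_per_job = 23_000 + 36_000 + 86_000   # 145K, paid once per SNAR tree
    kzg = 50_000 * blobs_per_tree              # 50K per blob, not amortizable
    return (fixed_per_job + kzg) / blocks_per_tree

# One tree of 30 blocks and 3 blobs: roughly 9.8K gas per block, vs 215K today.
print(l1_gas_per_block(30, 3))
```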

A nice picture of a nice feature:

Cost efficiency

Fixed cost efficiency

Let’s compare the L1 operating cost of a blob’s worth of DA.

  • Right now, the cost is at least 215K gas per blob. The cost equals 215K if the blob is fully utilized by a single block, and increases if many blocks are required since each has its own fixed costs.
  • In the future, the cost is 145K / BPT + 50K gas, where BPT denotes blobs per SNAR tree.

Taking BPT = 3 gives a cost-per-blob of 98K gas: just 45% of the original operating cost. Taking BPT = 6 gives a cost-per-blob of 74K: just over ⅓ of the original cost!
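The per-blob comparison can be reproduced directly (a sketch of the formula above):

```python
def gas_per_blob(bpt: int) -> float:
    """L1 operating gas per blob with a SNAR tree of `bpt` blobs."""
    return 145_000 / bpt + 50_000

print(round(gas_per_blob(3)))   # 98333 -> ~98K
print(round(gas_per_blob(6)))   # 74167 -> ~74K
# Fraction of today's 215K fixed cost at BPT = 3:
print(round(gas_per_blob(3) / 215_000, 2))  # 0.46
```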

DA cost efficiency

If blocks have many storage writes, the expected number of repeated writes to the same storage cell increases. Since DA consists only of state diffs, there is a cancellation effect which grows with block size. Hence it’s more efficient to send DA for larger blocks. Right?

Nah. Applicative recursion squashes the state diffs of all included blocks at the root of the tree, so the end result is a state diff equal to the squashed diffs of all the leaves. In other words, the final state diff depends only on the transactions in the leaves, and not on their grouping into blocks. As for the “how” – that’s the applicative part: an applicative recursive proof proves the verification of its two child proofs, followed by the execution of a “diff squash” program that does what its name suggests. The squash program also does some bookkeeping: if its leaves B₁, B₂, B₃, B₄ induce a sequence of state transitions S₁→S₂→S₃→S₄→S₅, then the root proof attests to the validity of S₁→S₅.
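The squashing step is essentially a left-to-right merge of per-block state diffs, where a later write to a cell overrides an earlier one. A minimal sketch, with a hypothetical diff representation as address→value maps (a faithful squash would also drop writes that restore a cell’s original value, which this sketch ignores):

```python
def squash(diffs: list[dict]) -> dict:
    """Merge block state diffs in order; the last write to a cell wins."""
    result: dict = {}
    for diff in diffs:
        result.update(diff)
    return result

# Blocks B1..B3 each write some storage cells; the root diff depends only
# on the final values, not on how the writes were grouped into blocks.
b1 = {"cell_a": 1, "cell_b": 2}
b2 = {"cell_a": 5}               # overwrites cell_a -> cancellation
b3 = {"cell_c": 7}
print(squash([b1, b2, b3]))      # {'cell_a': 5, 'cell_b': 2, 'cell_c': 7}
```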

Summary

To conclude, applicative recursion unlocks frequent blocks and optimizes Starknet gas consumption. But some key questions remain:

  1. What should be the value of blobs per tree? Ethereum’s blob 1559 mechanism targets 3 blobs per block. If a single state update transaction consumes more than 3 blobs, the subsequent Ethereum block will face a higher blob price. Hence, by sending such transactions too frequently, we’d be shooting ourselves in the foot by raising our own prices. There are many interesting strategies here, ranging from never exceeding 3 blobs per tree to spacing out occasional uses of 6.
  2. What block times should we aim for? The natural reply may be “as low as you can go”, but there’s a trade-off: each proof of a block must prove the execution of SNOS, which is a significant overhead. How much is too much?

Thoughts? Opinions? Memes? Post below!

Thanks for this great write up on applicative recursion @ilia! I had a few questions

  1. Will applicative recursion involve changes in the SNOS itself or will this be some wrapper logic on top of the current SNOS?
  2. Why can’t we have multiple blobs per state update today? Is there something specific which SNARs enable? Or given the current block limits it just doesn’t happen?
  3. You mentioned reducing block times will have the overhead of running the SNOS for every block. However, if my blocks are mostly empty, is that a lot of overhead? Or, if we modify the SNOS to handle multiple blocks in one go (just thinking out loud here), does that reduce the overhead? I am trying to understand what the “overhead” mainly is: running the VM multiple times, or running a lot of txs in the SNOS.
  4. Are there any timelines for when we can expect this on testnet/mainnet :slight_smile:

Thanks in advance!

Hey @apoorvsadana great questions as expected!

  1. Wrapper logic. There will be an additional “squash” program which is the “applicative” part of the applicative recursion.
  2. It’s because of the implementation of the update state function in the core contract. The SNAR trees are managed in SHARP and the state update function is not aware of what goes on in the backend.
  3. Running SNOS itself incurs an overhead: part is “fixed” and part is “marginal”, i.e. dependent on the block contents. In the extreme case of 1 tx per block, the overhead will likely be very wasteful: most of what you’re proving will be OS steps as opposed to transactions. That would make us sad. As for SNOS being able to run on multiple blocks, that is certainly a future direction, but it’s not on the roadmap at the moment.
  4. Sooner than you think! Stay tuned :slight_smile:

Happy to continue discussing!

Thanks for the elaborate answers @ilia!

Hey Illia, I have a few questions:

  1. Is each block still tied to a single blob?
  2. I know this feature has been pushed to mainnet, but I’m wondering why the logic wasn’t to just wait until each block hit DA limits; it seems like this would avoid the SNOS overhead that the current design is subject to. Do any problems arise if transactions are executed in (say 1s) batches, the results are streamed to other nodes, and the final block is just one megablock bound by DA? I don’t see any practical reason why this would be an issue, as it would significantly save costs and fully decouple blocks from the L1.
  3. Please correct me if I’m wrong but it seems like there is still some coupling due to the number of leaves limit.

Yo @fikunmi_ap!

  1. No!
  2. This was the situation before applicative recursion! Blocks closed on a lifespan of 6 minutes or due to reaching some max DA, but updates about the pending block were propagated sooner. This left two things to be desired: a reduction in confirmation times and a reduction in block times. Momentarily disregarding the latter, we could have drastically reduced confirmation times by having more frequent updates to the pending block, e.g. every 1 sec. However, that would have required changes in the sequencer that are a work in progress. That being said, block times are also important, and applicative recursion solves this more difficult problem.
  3. There is indeed a weak coupling due to the configured width of the SNAR tree, but in practice it’s not really felt. SNAR trees are closed when we hit some limit of k blobs; currently k is bounded by the max number of blobs allowed in an L1 block, and is usually configured below the target.

(btw it’s ilia - just one ‘L’ :slight_smile:)

Ok @ilia.
It’s all much clearer now.