Canonized Post-Processing

Tl;dr - The StarkNet Operating System (OS) may benefit from long-range computations that are applied to prior blocks. This post explains the basic concept and gives a few examples.

Disclaimer: These are my personal thoughts, not something that StarkWare/Net has decided on doing. I.e., I’m suggesting a possible future improvement to the StarkNet ecosystem, as a member of this large and growing community.

Definitions

  • Chunk - the sequence of blocks, and transactions within, covered by a single StarkNet proof
  • Post-processing - any computation (expressed as a Cairo program) that takes as input a set of more than one chunk. Remarks:
    • One could generalise the definition of post-processing to include a single chunk
    • Usually the set will be a sequence of consecutive chunks, but certain use cases may use randomly sampled chunks (or, say, every even chunk, etc.)
    • Notice that anyone may write a post-processing program and execute it, which leads to the next definition
  • Canonized post-processing (CPP) - a post-processing program that is known to the StarkNet OS, used, trusted and needed by it.

Examples of useful CPPs, assuming 1 chunk covers, say, 1 hour (this time period is arbitrary)

  • Econometrics
    • Non-linear statistics, like median tx size, median fee, etc. Linear statistics, like the average, can be computed without a CPP, chunk by chunk. Non-linear statistics could be used for the following:
    • Fee Mechanism design - long-range statistics could feed into fee mechanism to enforce better estimation
  • Data availability
    • Daily state-diff, perhaps even using lossless compression
    • Daily volition/validium state diff, for users who are happy with daily use of Validium but at the end of each day wish to have data committed onchain
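To illustrate the linear vs. non-linear distinction in the Econometrics example, here is a minimal Python sketch (plain Python rather than Cairo, and not part of any StarkNet code). The fee values and the helper names `chunk_summary`, `mean_from_summaries` and `median_over_chunks` are assumptions made up for this illustration. It shows that a long-range mean can be assembled from small per-chunk summaries, while a long-range median needs the values from all chunks together.

```python
from statistics import median

# Hypothetical per-chunk fee values (arbitrary units), not real StarkNet data.
chunks = [
    [1, 2, 100],   # chunk 0
    [3, 4],        # chunk 1
    [5, 6, 7, 8],  # chunk 2
]

def chunk_summary(fees):
    """Constant-size summary that each chunk could publish on its own."""
    return (sum(fees), len(fees))

def mean_from_summaries(summaries):
    """Linear statistic: combine per-chunk (sum, count) pairs."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count

def median_over_chunks(chunks):
    """Non-linear statistic: needs the values of every chunk at once."""
    all_fees = [fee for chunk in chunks for fee in chunk]
    return median(all_fees)

summaries = [chunk_summary(c) for c in chunks]
long_range_mean = mean_from_summaries(summaries)

long_range_median = median_over_chunks(chunks)

# Combining per-chunk medians does not, in general, give the true median.
median_of_chunk_medians = median([median(c) for c in chunks])

print(long_range_mean, long_range_median, median_of_chunk_medians)
```

With these made-up numbers the true median over all chunks is 5, while the median of the per-chunk medians is 3.5, which is why this kind of statistic requires a computation that spans several chunks.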

Canonization process

  • Anyone may write a post-processing program, denoted P
  • That person/entity submits P for “canonization”, which goes through a process, TBD, that may include:
    • Canonization committee review
    • Minimal time on testnet
    • Define OS interaction - is the CPP something that must be used? Can be used? At what frequency?
    • Governance vote

Recursion, Economic Incentives

  • The output of a CPP needs to be proved. Assuming recursive STARKs, this proof can be submitted to some future block.
  • Proving a CPP is a BIG computation (because the input covers several chunks) and a service to the community. So asking the prover of this CPP to pay a fee for adding it to StarkNet doesn’t make sense. Rather, this particular prover should be incentivized from the fees (or other sources) flowing in StarkNet.

Why not add CPPs to L1 (say, Ethereum)?

  • Many platforms exist today that do offer post-processing services for Ethereum’s L1 (Infura, Alchemy, Etherscan, …). But none of these are canonized post-processing, as defined above, i.e., Ethereum (or Bitcoin, for that matter) does not accept them as part of its core protocol.
  • This is one area where validity proofs are different and better, for several reasons:
    • Validity proofs ensure that as long as the input is agreed (latest blocks), then the output is reported with integrity. If you don’t have validity proofs (and their verifiers) then things are harder.
    • An L1 could decide to add a consensus decision on CPPs, but this would require the participants in the consensus to naively re-execute the same CPP on the large sequence of blocks. While this is doable, it both complicates things in terms of security/consensus, and also comes at the cost of the ongoing computation needed to accept new blocks to the L1.
  • Since Validity Rollups rely on validity proofs in each and every step, adding CPPs and utilising them is much easier.

Love the open nature of the proposal — it’s great to see this coming from you as an individual actor rather than from Starkware as an entity!

Some questions below — both out of curiosity, and to help draw out more definition.

Design intent

I imagine that the kind of processing that CPPs would do could, in theory, be baked directly into the StarkNet OS and launched via the usual upgrade procedure. With that alternative in mind, is the design intended to:
(a) Make the protocol more directly modular/extensible, i.e., reduce the need for frequent StarkNet OS upgrades?
(b) Decentralize upgrades to the protocol, i.e., change who has power to make modifications?
(c) Other?

Canonization

Would new CPPs be canonized purely via on-chain tx, or would StarkNet client developers need to release new clients that support a new CPP? (The former sounds a bit like Tezos-style governance where the core protocol can be upgraded on-chain; the latter sounds a bit more like Ethereum-style governance)

Fees

Proving a CPP is a BIG computation (because the input covers several chunks) and a service to the community. So asking the prover of this CPP to pay a fee for adding it to StarkNet doesn’t make sense. Rather, this particular prover should be incentivized from the fees (or other sources) flowing in StarkNet.

If a CPP provides data availability on e.g. a daily basis, it might require a lot of L1 calldata space and thus cost a lot (prover fees aside). Because it’s a large expense, it might be important to charge each transactor roughly in proportion to their impact on the size of the diff — however, these impacts aren’t known at tx time, but rather only after all chunks have been processed. How might this quandary be handled? Would CPPs need a way to reserve “worst case” fees from each transactor at the time of their transaction, and later offer refunds?
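To make the “reserve worst case, refund later” idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Tx` record, the `settle` function, the fee units, and the choice to split the cost pro rata by the number of storage cells each transactor contributes to the final diff. It is not a description of how StarkNet charges fees.

```python
from dataclasses import dataclass

@dataclass
class Tx:
    sender: str
    reserved: int    # worst-case fee reserved at tx time (hypothetical units)
    diff_cells: int  # cells this tx adds to the final diff, known only after post-processing

def settle(txs, total_da_cost):
    """Split total_da_cost pro rata by diff contribution and refund the rest.

    Assumption: a tx is never charged more than it reserved. Integer division
    rounds shares down, so a small shortfall can remain; who covers it is
    left open here.
    """
    total_cells = sum(tx.diff_cells for tx in txs)
    refunds = {}
    for tx in txs:
        if total_cells == 0:
            share = 0
        else:
            share = total_da_cost * tx.diff_cells // total_cells
        charge = min(share, tx.reserved)
        refunds[tx.sender] = refunds.get(tx.sender, 0) + tx.reserved - charge
    return refunds

# Hypothetical day: three transactors each reserve 100 units up front.
txs = [Tx("alice", 100, 3), Tx("bob", 100, 1), Tx("carol", 100, 0)]
refunds = settle(txs, total_da_cost=120)
print(refunds)
```

In this made-up example alice touched three of the four cells in the final diff, so she pays 90 of the 120 units and gets 10 back, bob gets 70 back, and carol, whose writes did not survive into the diff, gets her full reservation back.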

Further examples

Are there other sorts of CPPs that come to mind, aside from non-linear statistics and data availability? Is it likely that we’d end up with a relatively small static set of CPPs, or are there reasons why a bunch of them might be added over time?

Thanks, @RoboTeddy , answers/comments below:

With that alternative in mind, is the design intended to:
(a) Make the protocol more directly modular/extensible, i.e., reduce the need for frequent StarkNet OS upgrades?
(b) Decentralize upgrades to the protocol, i.e., change who has power to make modifications?
(c) Other?

It’s (b). Let me elaborate: making any change to the OS is a really big deal, with a lot of implications. My suggestion allows any anon to safely suggest a PP that will later become a CPP. First, the ecosystem will use it outside of the OS, and later, after scrutiny and discussion, it’ll be easier to accept it into the OS and have the OS rely on it.

Would new CPPs be canonized purely via on-chain tx, or would StarkNet client developers need to release new clients that support a new CPP?

A good CPP will be something whose output is written into the OS (say, median tx fee / size over the past day). Then, some clients will decide to also display/use it. But that really would depend on the specific program executed.

How might this quandary be handled? Would CPPs need a way to reserve “worst case” fees from each transactor at the time of their transaction, and later offer refunds?

This is another example of needing to price things now, that will be paid later. This issue also arises with standard StarkNet txs, that are paid for at sequencing time, but loaded onchain in bulk later as part of a proof. Requires more thought… One option is to have a CPP funding pool that funds this. Another point is that some CPPs won’t necessarily cost all that much (like posting median tx).

Sure, I think @bbrandtom had a cool example, I’ll let him share it :slight_smile:


Thanks.
A few questions:

  1. Are canonized PPs always run? For all chunks?
    Are they uniform in the chunks they run on?
    That is, can we correlate the results of different PP programs, because they always relate to the same chunks?
  2. I’m not entirely clear on the data flow here: is the result of a PP, run in the OS, available as input to other transactions? Or to other PPs?
    (Also related to the previous point) Do you envision CPPs being composed?
  3. Can a CPP be removed after it’s canonized?

Part of canonization also has to answer this question, i.e., some CPPs may need to be run every day, while others might be out there for anyone to use if there’s a need. For example, maybe you don’t need to know the median tx size every day. I do think that a CPP should be deterministic, i.e., that no matter who runs it on a certain chunk, you’ll get the same answer.

I’d say the answer is yes. If the output isn’t consumed by any further app, then no point in having it canonized.

I suppose the answer should be Yes, through the same governance mechanism that got it in there in the first place.

I think we’ll also need to define whether runs can be correlated.
Especially if we want some sort of composition between PPs, or other applications.
I think they should be, which would simplify the design (with a trade-off of some loss of generality).


Won’t this create an issue of backward compatibility?
At least you’ll need to track dependencies on PPs, so when you remove a PP, you’ll know what breaks.
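As a rough illustration of what tracking dependencies could look like, here is a small Python sketch. The registry contents and the names `depends_on` and `broken_by_removal` are hypothetical; nothing like this is specified in the proposal. Given a map from each CPP to the CPPs whose output it consumes, it finds everything that would break, directly or transitively, if one CPP were removed.

```python
# Hypothetical registry: each CPP maps to the CPPs whose output it consumes.
depends_on = {
    "median_fee": [],
    "fee_estimator": ["median_fee"],
    "daily_state_diff": [],
    "fee_dashboard": ["fee_estimator"],
}

def broken_by_removal(depends_on, removed):
    """Return every CPP that directly or transitively depends on `removed`."""
    # Invert the edges: for each CPP, which CPPs consume its output?
    dependents = {name: set() for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    # Walk the inverted graph starting from the removed CPP.
    broken = set()
    stack = [removed]
    while stack:
        current = stack.pop()
        for dependent in dependents.get(current, ()):
            if dependent not in broken:
                broken.add(dependent)
                stack.append(dependent)
    return broken

print(sorted(broken_by_removal(depends_on, "median_fee")))
print(sorted(broken_by_removal(depends_on, "daily_state_diff")))
```

Here removing the made-up `median_fee` program would break both `fee_estimator` and `fee_dashboard`, while removing `daily_state_diff` would break nothing. Contracts or other applications that consume CPP output would need to be tracked in the same way.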

Another question:
What if there’s a bug in some CPP, say an unforeseen edge case?
Does this mean proofs for state updates (runs of the OS) will fail until this CPP is fixed/removed?