Optimistic parallelization revived

TL;DR

  • Optimistic parallelization is now being implemented on the Starknet sequencer in Rust
  • More TPS in Starknet 0.13.2

Introduction

Back in Starknet 0.10.2, we introduced optimistic parallelization to the sequencer as part of a series of efforts to improve the system’s throughput.

In parallel, we worked to migrate the larger parts of the codebase to Rust. The result of this effort was the blockifier, Starknet’s new execution engine, which in turn relies on the Rust implementation of the cairo-vm by Lambdaclass.

Given that the transition from Python to Rust outweighed most of the potential performance improvements, the first iterations of the blockifier did not contain optimistic parallelization. After a few Starknet versions that focused on stability and fee reduction, we are now returning to parallelization in order to reach the full potential of the Starknet sequencer.

What is optimistic parallelization?

Naively, executing a block of transactions in parallel is impossible as different transactions may be dependent. This is illustrated in the following example. Consider a block with three transactions from the same user:

  • Transaction A: swap USDC for ETH
  • Transaction B: pay ETH for an NFT
  • Transaction C: swap USDT for BTC

Clearly, Tx A must happen before Tx B, but Tx C is entirely independent of both and can be executed in parallel. If each transaction requires 1 second to execute, then the block production time can be reduced from 3 seconds to 2 seconds by introducing parallelization.

The crux of the problem is that we do not know the transaction dependencies in advance. In practice, only when we execute transaction B from our example do we see that it relies on changes made by transaction A. More formally, the dependency follows from the fact that transaction B reads from storage cells that transaction A has written to. We can think of the transactions as forming a dependency graph, where there is an edge from transaction A to transaction B iff A writes to a storage cell that is read by B, and thus has to be executed before B. The following figure shows an example of such a dependency graph:

Figure 1: an example transaction dependency graph.

In the above example, each column can be executed in parallel, and this is the optimal arrangement (while naively, we would have executed transactions 1–9 sequentially).
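
To make the read/write condition above concrete, here is a minimal Rust sketch of detecting a dependency edge from recorded read and write sets. The types and names (`StorageKey`, `TxAccess`, `depends_on`) are illustrative, not the blockifier's actual API:

```rust
use std::collections::HashSet;

// Hypothetical storage key; real Starknet keys identify a (contract, slot) pair.
type StorageKey = u64;

// Per-transaction read and write sets, recorded during execution.
struct TxAccess {
    reads: HashSet<StorageKey>,
    writes: HashSet<StorageKey>,
}

// There is an edge A -> B in the dependency graph iff A writes to a
// storage cell that B reads, i.e. the write set of the earlier transaction
// intersects the read set of the later one.
fn depends_on(earlier: &TxAccess, later: &TxAccess) -> bool {
    earlier.writes.intersection(&later.reads).next().is_some()
}
```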

To overcome the fact that the dependency graph is not known in advance, we introduce optimistic parallelization, in the spirit of BLOCK-STM developed by Aptos Labs, to the Starknet sequencer. Under this paradigm, we optimistically attempt to run transactions in parallel and re-execute upon finding a collision. For example, we may execute transactions 1–4 from figure 1 in parallel, only to find out afterward that Tx4 depends on Tx1. Its execution was therefore wasted: we ran it against the same state as Tx1, when we should have run it against the state resulting from applying Tx1. In that case, we re-execute Tx4.
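
As a rough illustration of the optimistic idea, here is a single-threaded simulation (not BLOCK-STM itself; all names and the toy `Tx` shape are hypothetical). Every transaction first executes against the block's initial snapshot; transactions are then committed in order, and any transaction whose reads turn out to be stale is re-executed:

```rust
use std::collections::HashMap;

type Key = u64;
type State = HashMap<Key, u64>;

// Toy transaction: reads `src` and writes `src`'s value plus `amount` into `dst`.
#[derive(Clone, Copy)]
struct Tx {
    src: Key,
    dst: Key,
    amount: u64,
}

// Returns (written key, written value, read set).
fn execute(tx: &Tx, state: &State) -> (Key, u64, Vec<Key>) {
    let read = *state.get(&tx.src).unwrap_or(&0);
    (tx.dst, read + tx.amount, vec![tx.src])
}

fn run_block(txs: &[Tx], mut state: State) -> State {
    // Optimistic pass: every transaction executes against the same snapshot
    // (in the real system this happens in parallel across worker threads).
    let snapshot = state.clone();
    let mut results: Vec<_> = txs.iter().map(|tx| execute(tx, &snapshot)).collect();
    // Validation pass: commit in order, re-executing on stale reads.
    for (i, tx) in txs.iter().enumerate() {
        let stale = results[i].2.iter().any(|k| state.get(k) != snapshot.get(k));
        if stale {
            // A prior commit invalidated this transaction's reads: re-execute.
            results[i] = execute(tx, &state);
        }
        let (dst, val, _) = &results[i];
        state.insert(*dst, *val);
    }
    state
}
```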

Parallelization in the blockifier

The entry point to the new parallelization infrastructure in the blockifier can be found in concurrency.rs. The Rust implementation follows the original BLOCK-STM paper more closely than the previous version of our optimistic parallelization did. The scheduler tracks the tasks that remain to be done; these can be either validation tasks or execution tasks. Worker threads then query the scheduler for the next task and communicate with a versioned state to obtain the latest known storage values. This is illustrated in the following diagram:

Diagram: worker threads request the next task, and work on a versioned state (version i contains the writes of tx i). The execution itself is handled by the same code that handled sequential execution.
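
The versioned-state read rule can be sketched as follows (illustrative types, not the blockifier's actual `VersionedState` API): a read by transaction i returns the latest value written by any transaction with a lower index, falling back to the block's initial state:

```rust
use std::collections::BTreeMap;

type Key = u64;

// Sketch of a versioned state: besides the block's initial state, each key
// maps transaction indices to the value that transaction wrote.
struct VersionedState {
    base: BTreeMap<Key, u64>,
    versioned_writes: BTreeMap<Key, BTreeMap<usize, u64>>,
}

impl VersionedState {
    // A read by transaction `tx_index` sees the latest write by any earlier
    // transaction, falling back to the initial state of the block.
    fn read(&self, tx_index: usize, key: Key) -> Option<u64> {
        self.versioned_writes
            .get(&key)
            .and_then(|writes| writes.range(..tx_index).next_back().map(|(_, v)| *v))
            .or_else(|| self.base.get(&key).copied())
    }

    fn write(&mut self, tx_index: usize, key: Key, value: u64) {
        self.versioned_writes.entry(key).or_default().insert(tx_index, value);
    }
}
```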

The blockifier’s implementation contains an additional commit phase that isn’t part of the original BLOCK-STM paper. The motivation for this phase is to delay fee transfers to the sequencer (which, if not handled separately, cause any two transactions to collide by reading and writing the sequencer’s balance). Once all transactions before transaction i are committed, the fee transfer is applied, alongside an additional check of whether the transaction fits into the block. If both checks pass, the i’th transaction is committed.
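
The commit rule can be sketched roughly as follows (the names and fields are hypothetical, and a real fee transfer is an ERC-20 storage update rather than a plain counter): transactions commit strictly in order, and the sequencer-balance update happens only at commit time, so parallel execution never touches it:

```rust
// Sketch of the sequential commit phase. `try_commit` is called for
// transaction i only after transactions 0..i have committed.
struct CommitPhase {
    sequencer_balance: u128,
    block_weight: u64,
    max_block_weight: u64,
}

impl CommitPhase {
    // Returns true if the transaction was committed.
    fn try_commit(&mut self, fee: u128, weight: u64) -> bool {
        // Check that the transaction still fits into the block.
        if self.block_weight + weight > self.max_block_weight {
            return false;
        }
        self.block_weight += weight;
        // Apply the fee transfer only now, so concurrent execution never
        // reads or writes the sequencer's balance.
        self.sequencer_balance += fee;
        true
    }
}
```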

What’s next

Several potential improvements to the blockifier’s implementation of BLOCK-STM will be added down the road. At the moment, accesses to VersionedState are mutually exclusive; this can be relaxed to locking only the particular storage addresses we want to read from or write to.

Another direction to explore is adding language features that will allow contract developers to write more “parallelizable” code. For example, when performing a commutative operation on storage values today, one has to read the value, apply the operation, and write the new value. This means that although the underlying operation is commutative, any two transactions that perform it will have a read/write collision. If, however, the smart contract language has primitives that encapsulate “atomic storage manipulation”, then these can be used to enhance parallelization. For example, consider u256 addition. If we want to add some value to a storage slot (as happens when updating the sequencer’s balance due to fee payments), we can call an ad-hoc increment(storage_address, value) operation that does not necessarily lead two transactions that apply increment to the same address to collide. Collision will only happen if the first operation results in an overflow. This direction was explored in more depth in a recent paper by AptosLabs, where they introduce the notion of “deferred object” to the smart contract language.
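
A sketch of such a hypothetical increment primitive (the name and signature are illustrative, not an existing Cairo or blockifier API): the caller never observes the current value, so two incrementing transactions need not conflict, and only an overflow forces them to actually be ordered:

```rust
// Hypothetical atomic-increment primitive for a u128 balance slot. Because the
// caller only supplies a delta and never reads the result, two transactions
// applying `checked_increment` to the same slot have no read/write collision;
// only an overflow (the error case) would force them to be ordered.
fn checked_increment(slot: &mut u128, value: u128) -> Result<(), &'static str> {
    match slot.checked_add(value) {
        Some(new) => {
            *slot = new;
            Ok(())
        }
        None => Err("overflow: the colliding transactions must be ordered"),
    }
}
```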

Finally, while mostly targeted for the sequencing layer, the parallelization infrastructure can be used by full nodes to optimize re-execution. The improvement is even more significant for nodes, since they can get the optimal ordering via the dependency graph produced by the original execution.

Summary

As of Starknet 0.13.2, parallelization is back on the menu. Parallelization, along with AOT compilation via the cairo-native project that is planned for 0.13.3, will unleash the full potential of Starknet, reaching throughput that is only possible on a validity rollup.

Gm,

Excited about these improvements, looking forward to seeing them and playing with them.

I’m wondering something because there are some cases where I have to read the storage value, store it in a temp variable (e.g.: to emit it in an event later on) then write something else in it.
Wondering if it would also be possible to have a write fn that’d return the old value?
Although this might come with its own security concerns I guess…

Thanks in advance,
G.

For arbitrary manipulation, I assume the answer is that it cannot be parallelized, but for operations like addition, we thought of potential storage aggregators that will have some dedicated syntax in Cairo and will allow “atomic” updates. Increments and decrements for ERC20 transfers are classical examples (imagine, e.g., dedicated syscalls that can fail for increment and decrement of a u128). The recent paper from Aptos re deferred storage may offer more general abstractions that can be used in our contracts as well.