L1 to L2 message cancellation

The Problem

In the happy path, sending an L1 to L2 message triggers an L1 message handler on L2. If the handler results in an unprovable computation, it won’t be executed. This can pose a problem for L1 contracts that rely on L1 to L2 communication. For example, on deposit, a token bridge on L1 will usually first put the L1 funds into an escrow and then send a message to L2 in order to mint L2 tokens. If the L2 side of the bridge is unable to handle the message (for example, it is misconfigured or has reached some limit imposed by business logic), the message will be effectively “swallowed” and no action will be taken on either L2 or L1. This in turn means that the funds will be stuck on L1 if no extra logic exists to compensate for this particular edge case.

Proposed Solution

In order to take a compensating action (in the bridge example, returning the funds to the L1 user), the L1 logic needs to be sure that the message will never be executed (the sequencer might just be delaying the message, which would lead to a double-spending problem). The way to achieve this is to remove the message hash from the message queue in the StarkNet core contract (the sequencer won’t be able to commit a state change that includes the removed message).

In order to prevent malicious use of this mechanism (to stop the sequencer from committing a state update), there should be a delay before which cancellations are not accepted.
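To make the proposal concrete, here is a minimal Python sketch of the idea. It is a toy model, not the StarkNet core contract: the class and method names are hypothetical, and it assumes the delay is counted from the moment the message is sent, which the proposal does not specify.

```python
class CancellationError(Exception):
    """Raised when a cancellation or consumption is not allowed."""


class CoreContractModel:
    """Toy model of the L1 message queue (hypothetical, for illustration only)."""

    def __init__(self, cancellation_delay):
        self.cancellation_delay = cancellation_delay  # in seconds
        self.queue = {}  # msg_hash -> time at which the message was sent

    def send_message(self, msg_hash, now):
        self.queue[msg_hash] = now

    def consume(self, msg_hash):
        """Models the sequencer committing a state update that includes the message."""
        if msg_hash not in self.queue:
            raise CancellationError("message not in queue; state update rejected")
        del self.queue[msg_hash]

    def cancel(self, msg_hash, now):
        """Removes the message hash so it can never be executed on L2."""
        if msg_hash not in self.queue:
            raise CancellationError("message unknown or already consumed")
        if now < self.queue[msg_hash] + self.cancellation_delay:
            raise CancellationError("cancellation delay has not elapsed")
        del self.queue[msg_hash]
```

Once `cancel` succeeds, a later `consume` of the same hash fails, which is what lets the L1 side refund safely without risking a double spend.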

I cannot comment on the proposed solution, but I encountered the same issue when designing an NFT bridge. In that case it is even more important to avoid locking the NFT in the contract, since it cannot be refunded, so having a mechanism for cancelling the bridging transaction would be great.

Thanks for the suggestion.
Should anyone be able to request to cancel a message? I think that if the time delay is long enough (e.g. 1 day), then this is fine.
Would love to hear your thoughts on this.

I think that if the time delay is long enough (e.g. 1 day), then this is fine.

Agreed: if the delay is long enough, it should be OK.

I wonder though if the delay needs to be long enough to handle any potential StarkNet (or Ethereum) liveness failures (e.g. due to bugs, DoS attacks, etc) — this isn’t the simplest system on earth and I expect there’ll be some problems along the way.

Some applications might rely very strongly on messages sent getting through if they are provable, so to be more sure perhaps the delay should be something closer to a week.

A bit off topic here, but IMO L2 contracts that receive L1 messages should be written in such a way that they don’t ever hard-fail on receiving messages, as long as the messages themselves are valid.

For example, messages that failed to be consumed by the business logic can be:

  • stored inside a buffer in contract storage, so that additional attempts can be made later, or they can be “returned” to L1; or,
  • immediately bounced back to L1 directly via an L2-to-L1 message.

Or even better, only put the message into a buffer in the message handling transaction, and deal with the business logic / returning of the message in a separate one, removing the risk of the business logic messing up the whole transaction. (We can even build a single message hub contract for everyone to use, similar to the L1 core contract that handles all the messages.)

Of course this won’t solve the problem of sequencer censoring transactions, and message cancellation is still needed.
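A rough Python sketch of the buffer pattern described above, with all names hypothetical: the message handler only stores the payload, and a separate step runs the business logic and bounces the message back to L1 if that logic fails.

```python
from collections import deque


class BufferedReceiver:
    """Toy model of an L2 contract that never hard-fails on valid L1 messages."""

    def __init__(self, business_logic):
        self.business_logic = business_logic  # callable that may raise
        self.buffer = deque()                 # messages awaiting processing
        self.outbox_to_l1 = []                # models L2-to-L1 messages

    def handle_l1_message(self, payload):
        """L1 handler: only buffers, so business logic cannot make it fail."""
        self.buffer.append(payload)

    def process_next(self):
        """Separate transaction: run business logic, or return the message to L1."""
        if not self.buffer:
            return None
        payload = self.buffer.popleft()
        try:
            self.business_logic(payload)
            return True
        except Exception:
            self.outbox_to_l1.append(("return_to_l1", payload))
            return False


def make_capped_mint(cap):
    """Example business logic (an assumption): minting fails above a cap."""
    minted = []

    def mint(payload):
        if payload["amount"] > cap:
            raise ValueError("mint cap exceeded")
        minted.append(payload)

    return mint, minted
```

This variant bounces failed messages immediately; keeping them in the buffer for a later retry would be an equally valid choice.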

I agree. Since this is a worst-case mechanism, which I don’t expect users to use in the normal flow, I don’t see a reason not to make it longer.

This is a good point, I will take a look at our token bridge implementation to see how it handles it.

This is similar to the mechanism from L2 to L1, where the message is sent to L1 and stored on the StarkNet Core contract for later consumption by a different transaction.
Do you think it also makes sense for L1 to L2? I think it won’t allow message cancellation.

It doesn’t allow message cancellation by itself, but it does allow us to build a message “cancellation” protocol on top. It obviously can’t deal with sequencer censorship, but such a pattern can avoid messages getting stuck due to business logic (as long as the recipient contract supports such “cancellation” of course), as stated by OP.

I don’t think that should be the case. What if the sender contract isn’t cancellation-aware?

Or maybe we can invoke a callback function on the sender contract on cancellation, and revert the tx if it fails. This way we can allow anyone to initiate the cancellation.
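The callback idea could look roughly like the following Python sketch. Everything here is a hypothetical model: the callback name `on_message_cancelled` is an assumption, the cancellation delay is left out for brevity, and a real revert would also roll back any state the callback changed, which this toy does not do.

```python
class Reverted(Exception):
    """Models an L1 transaction revert."""


class CallbackCoreModel:
    """Toy core contract where anyone may cancel, but the sender must accept it."""

    def __init__(self):
        self.queue = {}  # msg_hash -> sender contract object

    def send_message(self, sender, msg_hash):
        self.queue[msg_hash] = sender

    def cancel(self, msg_hash):
        if msg_hash not in self.queue:
            raise Reverted("message unknown or already consumed")
        sender = self.queue[msg_hash]
        callback = getattr(sender, "on_message_cancelled", None)
        if callback is None:
            raise Reverted("sender is not cancellation-aware")
        try:
            callback(msg_hash)  # sender runs its compensating logic
        except Exception as exc:
            raise Reverted("sender callback failed") from exc
        del self.queue[msg_hash]


class RefundingBridge:
    """Cancellation-aware sender: refunds the escrowed deposit."""

    def __init__(self):
        self.escrow = {}   # msg_hash -> (user, amount)
        self.refunds = []  # list of (user, amount)

    def on_message_cancelled(self, msg_hash):
        self.refunds.append(self.escrow.pop(msg_hash))


class LegacySender:
    """Sender with no cancellation callback; its messages cannot be cancelled."""
```

In a real contract the callback would also have to check that it is being called by the core contract, otherwise anyone could trigger refunds directly.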

Arbitrum has Retryable Tickets. They aim to solve all of these issues: cancellation, reverts, refunds, etc. I don’t know how well they work in practice but it looks like a very developed concept so they’re probably worth looking into.

As @xJonathanLEI already pointed out, that should not be the case. The whole point of message cancellation is to maintain invariants; for a token bridge, the invariant would be that no tokens are stuck in the L1 escrow. So the contract that sent the message should be aware of the message cancellation in order to execute the logic that maintains the invariant.

How do we make sure that the compensating logic will be executed on message cancellation? Of the two ways mentioned by @xJonathanLEI, permissionless cancellation with a callback to the sender or permissioned cancellation (where only the message sender is allowed to cancel), I think the latter is simpler (with a callback, the sender contract needs to verify that the cancellation callback comes from the right contract).
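A minimal Python sketch of the permissioned variant, with hypothetical names and the cancellation delay omitted for brevity: the core contract only accepts a cancellation from the address that sent the message, so the bridge can remove the message and refund in a single step.

```python
class SenderOnlyCoreModel:
    """Toy core contract: only the original sender may cancel its message."""

    def __init__(self):
        self.queue = {}  # msg_hash -> address of the L1 contract that sent it

    def send_message(self, sender, msg_hash):
        self.queue[msg_hash] = sender

    def cancel(self, caller, msg_hash):
        # Also fails for unknown or already consumed messages.
        if self.queue.get(msg_hash) != caller:
            raise PermissionError("only the original sender may cancel")
        del self.queue[msg_hash]


class BridgeModel:
    """Toy L1 bridge that keeps the 'no funds stuck in escrow' invariant."""

    address = "bridge"  # placeholder for the bridge's L1 address

    def __init__(self, core):
        self.core = core
        self.escrow = {}    # msg_hash -> (user, amount)
        self.refunded = {}  # user -> total amount refunded
        self.nonce = 0

    def deposit(self, user, amount):
        msg_hash = f"deposit:{user}:{amount}:{self.nonce}"  # stand-in for a real hash
        self.nonce += 1
        self.escrow[msg_hash] = (user, amount)
        self.core.send_message(self.address, msg_hash)
        return msg_hash

    def cancel_deposit(self, user, msg_hash):
        owner, amount = self.escrow[msg_hash]
        if owner != user:
            raise PermissionError("only the depositor may cancel")
        self.core.cancel(self.address, msg_hash)  # raises if already consumed
        del self.escrow[msg_hash]
        self.refunded[user] = self.refunded.get(user, 0) + amount
```

Because the refund runs only after `core.cancel` succeeds, no callback authentication is needed.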

BTW, I see no reason for making the cancellation delay overly long. Its purpose is to defend the sequencer from malicious use of the mechanism; for this, a small multiple of the batch time will suffice.