Protecting L3 chains from L2 censorship

Background

The Arbitrum challenge protocol, which is run in the rare case where validators are staked on conflicting claims about the chain history, creates a “chess clock” for each side in a challenge. The chess clock is a total time allocation for that side to make its moves. The chess clock is running when it is that side’s turn to move, and if the chess clock runs out the side forfeits the challenge.

Each move in the challenge is a transaction on the parent chain (i.e. Ethereum for an L2 chain’s challenge protocol, and an L2 chain for an L3 chain’s challenge protocol). The chess clock is given enough time so that a side can make all of its moves in the challenge, even in the presence of a worst-case censorship attack against the parent chain.
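The chess-clock rule described above can be sketched as a toy model. This is only an illustration, not the actual challenge contract: the class name, the side labels "A" and "B", and the 48-hour allowance in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ChessClock:
    """Toy two-sided challenge clock; all times are in hours."""
    remaining: Dict[str, float]  # time left per side (hypothetical allowances)
    to_move: str                 # side whose clock is currently running

    def move(self, elapsed: float) -> Optional[str]:
        """Charge `elapsed` hours to the side on move.

        Returns the forfeiting side if its clock ran out, else None.
        """
        self.remaining[self.to_move] -= elapsed
        if self.remaining[self.to_move] < 0:
            return self.to_move
        # The move landed in time, so the other side's clock starts running.
        self.to_move = "B" if self.to_move == "A" else "A"
        return None
```

For example, starting from `ChessClock({"A": 48.0, "B": 48.0}, "A")`, a move by A that takes 24 hours to land leaves A with 24 hours; if A's next move is delayed by 25 hours, A forfeits.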

The issue

A question that arises for L3 Orbit chains is: what is the worst-case censorship attack against the underlying L2 chain, such as Arbitrum One?

The worst case happens if the L2 sequencer is malicious, and tries to censor every move made by one side in the challenge. Of course the L2 sequencer can’t censor a transaction forever, but it can delay the transaction’s inclusion by the force-inclusion delay (currently 24 hours, though there are proposals to cut this to, say, 8 hours). A transaction that has been waiting in the delayed inbox for the force-inclusion delay (or longer) can be forced into the sequence whether the sequencer likes it or not. (Anyone can force inclusion of such a message, by making a special call to the inbox contracts that live on L1 Ethereum.)

To cause this kind of delay for a transaction, the L2 sequencer would have to refuse to include the transaction in its sequence, and also refuse to include any transactions submitted through the delayed inbox. That means not including any asset deposits to the L2 chain either, so the censored transaction could only get in through a forced inclusion after the 24 hour delay.

While this is less than the 7 day maximum that is commonly assumed for censorship on Ethereum—that’s the reason for the 7 day challenge period in Arbitrum One’s settlement protocol—the difference is that in theory the L2 sequencer could cause this delay repeatedly, for every move by one side in the challenge game, adding up to many days of delay.
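As a rough back-of-the-envelope sketch of how these delays add up: the number of moves one side makes in a challenge is not stated here, so the value of 10 in the usage note is purely a hypothetical input, and the function name is made up for illustration.

```python
FORCE_INCLUSION_DELAY_HOURS = 24  # current value cited in the post


def worst_case_delay_days(moves: int,
                          delay_hours: float = FORCE_INCLUSION_DELAY_HOURS) -> float:
    """Total delay in days if every move is censored for the full delay."""
    return moves * delay_hours / 24
```

With a hypothetical 10 censored moves, `worst_case_delay_days(10)` gives 10 days at the current 24 hour setting, and `worst_case_delay_days(10, 8)` gives about 3.3 days under the proposed 8 hour setting.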

How to fix it

Fortunately this issue is not too hard to fix. We simply need to prevent the L2 sequencer from carrying out a worst-case censorship attack repeatedly.

The Arbitrum Security Council already has the power to fire the L2 sequencer and appoint a replacement, so in the short to medium term, we can rely on the Security Council to replace the L2 sequencer if it tries to do repeated long censorship attacks. This is a reasonable assumption because these attacks would affect many users of the L2.

In the longer run, the sequencer protocol can be changed to prevent repeated 24-hour (or whatever) delays. For example, whenever an abnormal delay of length D occurred, the protocol could subtract D/2 from the maximum delay parameter going forward. This would guarantee that a series of delays could add up to no more than double the initial force-inclusion delay. (The subtracted-off amount could be added back slowly afterward, so a well-behaved sequencer would get back its safety margin.) Other approaches are possible, so presumably the DAO will debate which approach is best.
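Here is a minimal simulation of the D/2 idea, under simplifying assumptions: the sequencer always stalls for the full allowed delay, every such stall counts as "abnormal", and the slow add-back is ignored. The function name is hypothetical.

```python
def simulate_repeated_censorship(initial_max_delay: float, attacks: int) -> float:
    """Total delay (hours) when every attack uses the full allowed delay."""
    max_delay = initial_max_delay
    total = 0.0
    for _ in range(attacks):
        d = max_delay        # worst case: the sequencer uses the whole allowance
        total += d
        max_delay -= d / 2   # the proposed rule: subtract D/2 going forward
    return total
```

Starting from 24 hours, the delays are 24, 12, 6, and so on, so the total approaches but never exceeds 48 hours. More generally, each delay D removes D/2 from an allowance that cannot go below zero, which is why the sum of delays is bounded by twice the initial allowance.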

For now, it seems safe to assume that the Security Council will replace the sequencer if it tries to commit repeated censorship of time-critical transactions.

Appendix: Wait, why is the force-inclusion delay 24 hours anyway?

You might be wondering why the force-inclusion delay is 24 hours. Why not allow a neglected transaction to be forced into the sequence sooner?

The answer is that forcing inclusion of a transaction typically causes a reorg of the L2 chain, and Arbitrum chains are designed to almost never reorg—that’s why most people decide to trust the sequencer’s immediate transaction result. Over the history of Arbitrum One, there has been only one reorg, affecting a handful of transactions.

So the force-inclusion delay is currently set to be longer than the reasonable worst-case sequencer downtime. The current sequencer uses a highly redundant devops setup, with multiple hot spares in diverse locations, but that wouldn’t prevent downtime if there is some bug in the sequencer that causes it to get into a stuck state that can’t be fixed by rebooting. In that case, engineers would need to diagnose and fix the bug before the sequencer could resume.

Under normal conditions, the force-inclusion delay doesn’t matter because an honest sequencer will include delayed-inbox messages once they have finality on Ethereum, with a delay on the order of ten minutes. So the initial judgment was that preventing reorgs was important relative to the worst-case inclusion delay that might result from a serious bug.


Maybe a little off topic, but regarding “why is the force-inclusion delay 24 hours anyway?”:
What sort of batch information gets posted to the L1?

In a based roll-up for example, all of the L2 transaction information is posted to the L1, so the forced-inclusion path is in the order of the finality of the L1. It feels like we should be able to extend the semantics of the L2 batch to include specific transaction information, such as transactions in the force inclusion queue. Would this be realistic?

We could place the additional burden of “bigger batches” as a result of this 1-to-1 mapping from the forced inclusion queue on the users entering the forced inclusion queue. If that is seen as too high a barrier, we could introduce a fast and slow force-inclusion queue: fast performing as I mentioned, requiring more information per-batch, and slow acting as is currently the case.

The L2 batches do include (by reference) transactions from the L1 delayed inbox, but this is at the discretion of the sequencer, because we want the sequencer to know when and where these messages will appear in the sequence. If their inclusion came as a surprise to the sequencer, this would create a reorg of the L2 chain, which we want to avoid.

(L2 reorgs aren’t impossible on Arbitrum but they are extremely rare. There has been only one L2 reorg in the entire history of Arbitrum One, and none on Nova.)

Normally the sequencer includes a delayed inbox message into the sequence as soon as that message’s arrival in the delayed inbox has finality on L1, which typically takes 12 to 19 minutes. But if the sequencer does not do this for some reason, inclusion can be forced after 24 hours.

Why 24 hours? In practice, the scenario where the force inclusion delay is likely to matter is if the sequencer has a bug that prevents it from operating, or there is some bug with batch posting that prevents batches from being posted. (The sequencer, though logically centralized, uses multiple redundancy, so a sequencer crash causes a fast failover to one of the hot spares. That’s why the concern is not about a crash but rather a bug that prevents all of the instances from making progress.)

24 hours is meant to be enough time to fix any such bug, allowing recovery without a reorg. There have been proposals to reduce the force inclusion delay to, say, 8 hours, which seems safe to do.


I wonder if there is an L1 contract design which can incorporate something to the effect of:

“Check the force inclusion list up until the last L1 block. If it is non empty, verify that each of those transactions is included in the L2 batch. If the tests pass, include the L2 block, else reject.”

If there was an L1 reorg and the inclusion list changed, the subsequent L2 update could include the proofs for the new list. That wouldn’t invalidate previous updates, just require additional/different proofs. It feels like Arbitrum would be flexible enough to enforce that without too much burden on the L2 sequencer. Even in a “very surprising” scenario, the sequencer could just append the inclusion list to the batch.

If an L2 batch failed, the L2 could continue to process transactions, but L1 finality can’t be provided until the inclusion restrictions are obeyed.
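To make the rule quoted above concrete, here is a rough sketch of the check. It is written as plain Python rather than contract code, and it does not reflect how the existing inbox contracts work; the function name and the use of string transaction IDs are assumptions for illustration.

```python
from typing import Iterable, Set


def accept_batch(force_inclusion_list: Iterable[str], batch_tx_ids: Set[str]) -> bool:
    """Accept a batch only if it includes every pending force-inclusion tx.

    An empty force-inclusion list always passes.
    """
    return all(tx_id in batch_tx_ids for tx_id in force_inclusion_list)
```

A rejected batch here just means the update is not accepted on L1 yet; the sequencer could resubmit with the missing transactions appended.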

Standard optimistic design is a delay on force inclusion on a scale of hours or days, so perhaps I’m missing something. If I had to guess, the L2 is prioritizing an L1-agnostic L2 sequencer that can almost always publish L2 updates. What I am proposing would require an L2 sequencer who reads from the L1 every 12s I think. If there are more detailed discussions on this decision, please link them. Keen to learn :pray:

It’s an interesting question: Can the protocol force the L2 sequencer to include delayed-inbox transactions when they should be included?

I think that would require some extra bookkeeping in the inbox contract (to keep information in contract storage so the contract can check it, rather than just storing a hash/merkle commitment to the information as is currently done). And if the rule is that inclusion should happen after L1 finality, then enforcement would require that the finality of an L1 block be available to the L1 execution layer.

That said, it would be interesting to see a specific proposal for how to do that kind of enforcement, to see how much it would cost, and open a conversation with the community about whether that cost was worth paying.


Is the reason that L2 reorgs are rare mainly that the sequencer can only reorg pre-confirmations, before the data/root has been posted to Ethereum?

The current sequencer is run by Offchain Labs. Do we know who is/are the “backup” sequencer(s) that could replace the active one?

Relying only on the Security Council for censorship might not be optimal from a user perspective imho.
What do you think about initially implementing say 2 or 3 sequencers in a round-robin style/with a turn-based approach, and then possibly increasing their number over time?

Additionally, you have an issue if the sequencer whose turn it is starts censoring certain users. If these turns are long (e.g. a given sequencer gets a monopoly for several hours) and the sequencer censors, we will be sad, and even if it goes down, it would be difficult for the protocol to recover. If turns are short, other sequencers could step in and catch up where the primary sequencer was censoring or went down.
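A rough sketch of why turn length matters in a round-robin setup. This is a toy model with hypothetical names, not an existing protocol: it assumes a fixed rotation order, equal-length turns, and that a censored transaction gets in as soon as an honest sequencer's turn starts.

```python
from typing import List, Set


def sequencer_for_slot(slot: int, sequencers: List[str]) -> str:
    """Fixed round-robin rotation: slot i belongs to sequencer i mod n."""
    return sequencers[slot % len(sequencers)]


def worst_case_wait_hours(sequencers: List[str], censoring: Set[str],
                          turn_hours: float) -> float:
    """Longest stretch of consecutive censoring turns, in hours."""
    n = len(sequencers)
    if all(s in censoring for s in sequencers):
        return float("inf")  # nobody honest ever gets a turn
    longest = run = 0
    # Walk the rotation twice so runs that wrap around the end are counted.
    for slot in range(2 * n):
        if sequencer_for_slot(slot, sequencers) in censoring:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest * turn_hours
```

With three sequencers and one censoring, 4 hour turns give a worst-case wait of 4 hours, while 6 minute turns (0.1 hours) cut that to 6 minutes.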