[FINAL] Protocol Upgrade #7: Fault Proofs

The bonds are sized based on the anticipated cost of posting a counter-claim, and to deter spamming of invalid claims. There are more details in the specs. As an example, on op-sepolia the game 0xcf8f181497DAD07277781517A76cb131C54A1BEE shows the escalating bond sizes. The list-claims subcommand of op-challenger gives a good view of the claims in the game:

./op-challenger/bin/op-challenger list-claims --l1-eth-rpc <SEPOLIA_L1> --game-address 0xcf8f181497DAD07277781517A76cb131C54A1BEE

The maximum depth of a game is 73, but there can be any number of claims and counter-claims within a dispute game. Due to the permissionless structure where many different actors can participate in the same game, a single claim may be countered by any number of different counter-claims, effectively combining multiple disputes into a single game.
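To illustrate how one game can hold several disputes at once, here is a minimal Python sketch of the claim tree. It is not the on-chain data structure: the `Claim` class, its fields, and the choice to put the root claim at depth 0 are all assumptions made for illustration. Only the depth limit of 73 comes from the post above.

```python
from dataclasses import dataclass, field
from typing import List

MAX_GAME_DEPTH = 73  # maximum depth quoted in the post above


@dataclass
class Claim:
    """One node in a hypothetical in-memory model of a dispute game."""
    claimant: str
    depth: int = 0  # assumption: the root claim sits at depth 0
    counters: List["Claim"] = field(default_factory=list)

    def counter(self, claimant: str) -> "Claim":
        """Attach a new counter-claim one level deeper and return it."""
        if self.depth >= MAX_GAME_DEPTH:
            raise ValueError("cannot go deeper than the maximum game depth")
        child = Claim(claimant=claimant, depth=self.depth + 1)
        self.counters.append(child)
        return child


def count_claims(claim: Claim) -> int:
    """Total number of claims in the subtree rooted at `claim`."""
    return 1 + sum(count_claims(c) for c in claim.counters)


def deepest(claim: Claim) -> int:
    """Greatest depth reached anywhere in the subtree rooted at `claim`."""
    return max([claim.depth] + [deepest(c) for c in claim.counters])


# Two independent actors counter the same root claim, and the
# proposer answers only one of them: two disputes, one game.
root = Claim("proposer")
a = root.counter("alice")
b = root.counter("bob")
a.counter("proposer")
print(count_claims(root), deepest(root))  # prints "4 2"
```

The depth is capped, but the number of counters attached to any single claim is not, which is why the total number of claims in a game is unbounded.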

Users are able to complete the full withdrawal cycle without depending on any privileged action. The caveat is that the Guardian role has the ability to override the system by either blacklisting games or reverting to a permissioned system. So the trust assumption is reduced to requiring only that the Guardian role does not intervene, but it is still a trust assumption, in line with the Stage 1 requirements.

It’s not expected that normal users run op-proposer to regularly propose output roots. They’d generally just propose a single output root if they need to withdraw and the chain operator isn’t proposing outputs for them. They can do that through direct calls to the DisputeGameFactory, even via Etherscan, or by using the create-game subcommand of op-challenger.

For running op-challenger, there aren’t detailed docs currently. Definitely an area to build up between the official docs and community content.

3 Likes

There isn’t a specific timeline. The upgrade will require coordination with the security council to obtain the appropriate signatures. As mentioned in the proposal, the recommendation is to perform the upgrade shortly after the end of the veto period and I’m definitely excited to see it go live as soon as possible.

Thanks for your reply. I’m also very excited to see it implemented. When is the end of the veto period?

The veto period ends on June 5th I believe.

1 Like

We are the SEED Latam delegation. As we have communicated here, with @Joxes being an Optimism delegate with sufficient voting power, we believe this proposal is ready to move towards a vote.

1 Like

Hi all,

I shared an objective overview of the facts of this upgrade above on behalf of the Developer Advisory Board, but wanted to follow up with a personal message explaining why I plan to vote no on this proposal.

Here are a few facts for context:

  • If this proposal passes, deployment will occur on June 10th.
  • The Fault Dispute Game itself has not been audited.
  • There is an audit contest for the Fault Dispute Game that is scheduled for July.
  • The safeguards themselves have received a thorough audit and no High Severity issues were found.
  • This audit was focused only on the on chain components and assumed that off chain monitoring and execution worked perfectly.
  • In the event that there is a bug in the Fault Dispute Game and the safeguards need to be used, the Optimism Bridge will go down temporarily and all pending withdrawals will need to be reproven.

These facts lead me to the conclusion that there is quite a serious downside risk of things going wrong after this upgrade, with very limited upside.

In terms of risks, I would suggest two major categories:

1) Reputation

I think it is highly likely (80%+) that there are critical bugs still in the Fault Dispute Game. This is no fault of the OP Labs dev team. It’s incredibly complex code and hasn’t been audited. It would honestly be astonishing if there were no serious bugs.

In the event that a single critical bug is found and requires shutting down the bridge, it would cause major reputational damage for Optimism. Even though there are safeguards, mainnet bridge code having a critical bug would be very bad for public perception.

If this happens more than once (which doesn’t seem out of the question), the reputational damage would be quite a bit worse.

2) Security Risk

I participated in the contest focused on the safeguards, and feel very confident in the on chain components functioning safely.

However, the whole system relies on two facts:

  • Off chain monitoring will catch any fraudulent games
  • The guardian role will be able to move quickly enough to override a fraudulent game before the withdrawal is executed

In general, I feel uncomfortable about the idea that proactive action is needed to secure a multibillion dollar protocol. Although it is unlikely, it feels like there is just too much that could go wrong with this much at stake.

For example, after Protocol Upgrade #8, the guardian role is performed by the 10/13 security multisig. For a normal protocol, this might be sufficient, but with billions of dollars on the line, it doesn’t seem out of the question that an attacker could take 4 signers temporarily offline for a few days until their withdrawal is processed.

(Note that I believe the Foundation has an override here, so this example may not be exactly right, but my point is that anything that requires proactive action does not feel like adequate security to me.)
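The arithmetic behind this example can be sketched in a few lines of Python. The function names are hypothetical, and the sketch deliberately ignores any Foundation override mentioned in the note above. It only shows why 4 is the relevant number for a 10/13 multisig.

```python
def signers_needed_offline(total_signers: int, threshold: int) -> int:
    """Smallest number of unavailable signers that makes the threshold unreachable."""
    return total_signers - threshold + 1


def can_reach_threshold(total_signers: int, threshold: int, offline: int) -> bool:
    """True if the signers still available can meet the threshold."""
    return total_signers - offline >= threshold


# 10-of-13 multisig, as described in the post above.
print(signers_needed_offline(13, 10))   # prints 4
print(can_reach_threshold(13, 10, 3))   # prints True  (10 signers remain)
print(can_reach_threshold(13, 10, 4))   # prints False (only 9 remain)
```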


I want to make clear that I have the utmost respect for the OP Labs team and think the Fault Dispute Game is incredibly well built. I am excited for it to be deployed.

But I think it is rushed to try to perform this deployment before the Fault Dispute Game audit is complete. No code this complex is perfect, and the downside of there being issues feels too large to justify taking the risk.

For these reasons, I plan to vote no on this proposal, and look forward to voting yes later in the summer when the risks have been addressed.

8 Likes

Following the explanation by the posters in the topic I am voting for this proposal.

Thanks for the proposal and best of success to this community!

Hey @ajsutton! I think this link is no longer working. Can you please update it?

Thanks @zachobront for your personal thoughts. I share the same concerns. As much as I want to see Fault Proofs go live, I can’t help but feel that we are moving too fast with too much on the line.
I do appreciate all the work put into getting Fault Proofs and all its components, like the Fault Dispute Game, to the level they are at today. I’d be comfortable voting for the proposal after the FDG audit.

Unfortunately I will be voting against this time round.

Thanks for raising great points. We have a question about the timing of the deployment.

According to Adrian’s clarification, the deployment will be coordinated with the security council. If the fact that the Fault Dispute Game isn’t fully audited is deemed a blocker, could the deployment be delayed to July? We believe we can vote for the proposal and time the deployment appropriately, relying on the judgement and coordination of the council and OP Labs.

One question: when proposing an output root, could we store only the minimum necessary data on-chain and delay creating the game contract until it is challenged?

The rationale is that the current create() method of DisputeGameFactory takes about 429k gas, which is more than 4x the 93k of L2OutputOracle.proposeL2Output().
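As a quick check of the comparison, here is a small Python sketch. The two gas figures are the ones quoted above; the 20 gwei gas price is purely an illustrative assumption, and `cost_eth` is a hypothetical helper.

```python
CREATE_GAS = 429_000   # DisputeGameFactory.create(), figure quoted above
PROPOSE_GAS = 93_000   # L2OutputOracle.proposeL2Output(), figure quoted above


def cost_eth(gas: int, gas_price_gwei: float) -> float:
    """Convert a gas amount to ETH at the given gas price (1 ETH = 1e9 gwei)."""
    return gas * gas_price_gwei / 1e9


ratio = CREATE_GAS / PROPOSE_GAS
print(f"create() uses about {ratio:.2f}x the gas of proposeL2Output()")

# Assumed gas price of 20 gwei, for illustration only.
print(cost_eth(CREATE_GAS, 20), cost_eth(PROPOSE_GAS, 20))
```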

The link should be working again now. It appears to have been a temporary issue with the Sherlock site.

1 Like

Hey @zachobront, thanks for sharing your perspective on this proposal. We appreciate it, especially given your position.

Many, if not all, of the risks you described are very valid points.

As far as I know, the proposal follows the previously published audit framework, falling into the safety and reputational category. A bug does not immediately compromise the system since it implements various defensive mechanisms, including the Guardian setup itself. Therefore, some bugs are expected, and the security setup needs to be prepared for them.

Right now, I would appreciate having a complete understanding of your perspective. If this upgrade does not move forward, what would be viable actions to resubmit it? Should we wait for the audit contest to be completed or pay for a new audit?

I am happy to see a careful evaluation here, reconciling the security efforts made by core developers with what is believed to be concretely left to do. It is impossible to determine when code is completely bug-free, even with several audits completed. However, we should ensure that our reputation is not easily put at risk (e.g., if several bugs are found shortly after implementation).

For us, if there is a clear risk that we can feasibly avoid, we should make every effort to mitigate it.

I love seeing these conversations happen and am happy that other delegates share their perspectives based on what is described above. CC: L2Beat team @kaereste and @GFXlabs, as well as others.

4 Likes

https://www.coinbase.com/blog/strengthening-our-defenses-with-seal-wargame-simulations

TL;DR: We conducted a successful SEAL wargame with Base and Optimism (OP), testing our response to a simulated vulnerability in bridge withdrawals.

2 Likes

Hi @Joxes — thanks so much for the thoughtful response. I’ll do my best to share specific details of my thinking below, but please let me know anywhere you disagree or you feel my explanation is unclear.

I largely agree with your assessment here. The core Fault Dispute Game (because of the training wheels) is probably best described by the Safety / Reputational quadrant, which seems to have undefined behavior in the audit framework.

But there is a subtle and important distinction in what is meant by training wheels. The current safeguards require proactive action to block a fraudulent proposal. It’s important to recognize that any “black swan” risk to this off chain system would immediately move all Fault Dispute Game code into the Safety / Existential quadrant.

I assume the recent SEAL Wargames focused on the robustness of these systems, but we can’t have the same auditable security guarantees for an off chain system requiring human intervention as on chain code.

My perspective is that at least one high quality audit is required for the (large) reputational risk and (small) safety risk to be worthwhile.

I believe waiting for the already funded audit contest in mid-July is the most effective way to do that. As long as that contest doesn’t uncover a large number of deep issues requiring major rewrites, I would be comfortable voting yes as soon as it’s complete.

For further clarity, I would like to state that I am 100% on board with @Tane’s suggestion here. If OP Labs decides to time the deployment for after the scheduled audit, that would be sufficient for me to feel comfortable with the current proposal.

5 Likes

Hi, I’m Mofi, a protocol engineer at OP Labs. I do not represent or speak on behalf of the Optimism Foundation.

“I shared an objective overview of the facts of this upgrade above on behalf of the Developer Advisory Board, but wanted to follow up with a personal message explaining why I plan to vote no on this proposal.”

First and foremost, we sincerely appreciate you taking the time to evaluate the system and share your concerns. It is incredibly valuable to have experts like you participate in the governance system and thoroughly review proposals. We understand your concerns here and want to address them directly so that governance participants can make an informed decision.

“I think it is highly likely (80%+) that there are critical bugs still in the Fault Dispute Game. This is no fault of the OP Labs dev team. It’s incredibly complex code and hasn’t been audited. It would honestly be astonishing if there were no serious bugs.”

The Fault Dispute Game is certainly a complex system. It’s important to consider that audits aren’t the only way to ensure code quality, nor are they a silver bullet. The OP Labs Audit Framework is based on the idea that security starts with developers and an audit is a final assurance, not the basis for considering a system secure.

In this case, OP Labs ran a Sherlock contest focused on the safety nets and overrides — the components of the fault proof for which safety is of existential importance. The Fault Dispute Game is considered a reputational risk but not an existential risk. The Audit Framework explicitly notes “fault proofs with training wheels” as the sole example in the “reputational/safety” quadrant (image reproduced below) of the audit rubric.

Most of the code outside of the Fault Dispute Game contracts falls into the liveness/reputational risk quadrant. The majority of code in the fault dispute system is in op-program (a combination of the consensus-critical parts of op-node and op-geth) and op-challenger. Experience has shown that audits are less effective for this type of code than what is learned by getting it running in the real world. That’s why we’ve been rolling this out incrementally for some time now, including many internal deployments, the public Fault Proof Alpha release, and the op-sepolia upgrade.

It’s also worth noting that the Immunefi bounty program is being updated to include these contracts so that auditors and bug hunters will have an opportunity to disclose issues in the Fault Dispute Game contracts before any mainnet launch.

“In the event that a single critical bug is found and requires shutting down the bridge, it would cause major reputational damage for Optimism. Even though there are safeguards, mainnet bridge code having a critical bug would be very bad for public perception.”

Pausing withdrawals is the final safety measure, but not the only one. The Guardian also has the ability to blacklist specific games, allowing withdrawals proven against valid output roots to continue unaffected. The respected game type can also be changed to a permissioned game which does invalidate already proven withdrawals but allows them to be re-proven and therefore does not shut down the bridge. The exact way in which the system should respond depends on the severity and impact of the bug.

We do believe it’s important that governance participants understand that reputational risk does exist here no matter what — it’s never a good thing to need to fix bugs in production. Our position is that the purpose of Stage 1 is to make it possible to deploy fault proofs into production without requiring perfection. OP Stack developers will learn a lot from seeing this system on mainnet. Even Stage 2 explicitly assumes that there will be bugs in the fault proof system and uses multiple independent proof systems to mitigate this risk.

“In general, I feel uncomfortable about the idea that proactive action is needed to secure a multibillion dollar protocol. Although it is unlikely, it feels like there is just too much that could go wrong with this much at stake.”

“For example, after Protocol Upgrade #8, the guardian role is performed by the 10/13 security multisig. For a normal protocol, this might be sufficient, but with billions of dollars on the line, it doesn’t seem out of the question that an attacker could take 4 signers temporarily offline for a few days until their withdrawal is processed.”

Proactive action for security is ultimately part and parcel of an optimistic rollup. Rather than using a positive proof of validity, users permissionlessly publish and challenge output root proposals. Challengers must be able to detect invalid proposals in the same way that the Guardian must be able to detect these proposals. In a meaningful way, it is not possible to separate an optimistic rollup from proactive action.

Even today, under the permissioned proposer model, it is assumed that the proposer may be faulty, and a delay is added so that the Guardian role can intervene to remove a proposal if needed. While the contracts that support this process are simpler than the fault dispute game contracts, calculating the correct output root requires correct operation of op-node, op-geth, and op-proposer, none of which are audited before each release. Great care is taken in the development of these components, but the Guardian role is the backstop to ensure security even if they fail — just as in the proposed Stage 1 fault proof system.

“I want to make clear that I have the utmost respect for the OP Labs team and think the Fault Dispute Game is incredibly well built. I am excited for it to be deployed.”

Thank you for your kind words, and again thank you for the work you have done to look into the proposed system and share your concerns.

8 Likes

It is possible that could be done as an optimisation in the future. For simplicity, we haven’t done so at this stage. Note that if this were done, the proposer’s bond would need to be increased to cover the cost of the challenger creating the game to ensure the system remains compatible. So the upfront cost would likely be similar, but you could reclaim that bond if the output root is valid.

In terms of costs to operate a chain, there is also the balancing factor that, unlike the L2OutputOracle, the DisputeGameFactory does not require proposals to be made at a fixed interval. So proposals could be made less often on chains that don’t typically see many withdrawals, to save costs, and then made more frequently as chain usage grows. The timeliness of proposals is also less important because proposing is no longer restricted to a privileged actor, so users can propose their own output root if they don’t want to wait.
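A rough Python sketch of that trade-off follows. The gas figures are the ones quoted earlier in the thread (about 429k for DisputeGameFactory create() and 93k for proposeL2Output()). The one-hour baseline interval and the six-hour alternative are assumptions chosen for illustration, and both function names are hypothetical.

```python
CREATE_GAS = 429_000   # per DisputeGameFactory proposal, figure from the thread
PROPOSE_GAS = 93_000   # per L2OutputOracle proposal, figure from the thread


def daily_gas(gas_per_proposal: int, interval_hours: float) -> float:
    """Gas spent per day when one proposal is made every `interval_hours`."""
    return gas_per_proposal * 24 / interval_hours


def break_even_interval(baseline_interval_hours: float) -> float:
    """Interval at which DisputeGameFactory proposals cost the same per day
    as L2OutputOracle proposals made every `baseline_interval_hours`."""
    return baseline_interval_hours * CREATE_GAS / PROPOSE_GAS


# Assumed baseline: one L2OutputOracle proposal per hour.
print(daily_gas(PROPOSE_GAS, 1))        # prints 2232000.0
# Assumed alternative: one dispute game every six hours.
print(daily_gas(CREATE_GAS, 6))         # prints 1716000.0
print(round(break_even_interval(1), 2)) # prints 4.61
```

Under these assumptions, proposing less often than roughly every 4.6 hours would use less gas per day than the hourly baseline, before accounting for bonds.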

1 Like

The following reflects the views of L2BEAT’s governance team, composed of @kaereste and @Sinkas, and it’s based on the combined research, fact-checking, and ideation of the two.

While we understand and appreciate @zachobront’s points and concerns, we’ll vote in favor of the proposal.

Given that the vote has no immediate consequences, as it’s not an on-chain executable vote, we treat it as a way to signal support for the overall direction of implementing fault proofs. We trust the Foundation’s and OP Labs’ judgement as to whether the implementation is ready to be moved to production, and on the timing of the actual upgrade.

1 Like

Firstly, thank you for the proposal. We consider every step taken towards permissionlessness to be important. We really like the structure of the new mechanism, especially its allowance for anyone to propose output roots and participate in the dispute system. However, we still have concerns about the exit window. As mentioned in the Optimism Docs under Privileged Roles, this structure may create some delays. Nevertheless, as the ITU Blockchain Delegation Committee, we are aware of the value of this development and support this proposal.

1 Like