[FINAL] Protocol Upgrade #7: Fault Proofs

Hi, I’m Mofi, a protocol engineer at OP Labs. I do not represent or speak on behalf of the Optimism Foundation.

I shared an objective overview of the facts of this upgrade above on behalf of the Developer Advisory Board, but wanted to follow up with a personal message explaining why I plan to vote no on this proposal.

First and foremost, we sincerely appreciate you taking the time to evaluate the system and share your concerns. It is incredibly valuable to have experts like you participate in the governance system and thoroughly review proposals. We understand your concerns here and want to address them directly so that governance participants can make an informed decision.

I think it is highly likely (80%+) that there are critical bugs still in the Fault Dispute Game. This is no fault of the OP Labs dev team. It’s incredibly complex code and hasn’t been audited. It would honestly be astonishing if there were no serious bugs.

The Fault Dispute Game is certainly a complex system. It’s important to consider that audits aren’t the only way to ensure code quality, nor are they a silver bullet. The OP Labs Audit Framework is based on the idea that security starts with developers and an audit is a final assurance, not the basis for considering a system secure.

In this case, OP Labs ran a Sherlock contest focused on the safety nets and overrides — the components of the fault proof for which safety is of existential importance. The Fault Dispute Game is considered a reputational risk but not an existential risk. The Audit Framework explicitly notes “fault proofs with training wheels” as the sole example in the “reputational/safety” quadrant (image reproduced below) of the audit rubric.

Most of the code outside of the Fault Dispute Game contracts falls into the liveness/reputational risk quadrant. The majority of code in the fault dispute system is in op-program (a combination of the consensus-critical parts of op-node and op-geth) and op-challenger. Experience has shown that audits are less effective for this type of code than what is learned by running it in the real world. That’s why we’ve been rolling this out incrementally for some time now, including many internal deployments, the public Fault Proof Alpha release, and the op-sepolia upgrade.

It’s also worth noting that the Immunefi bounty program is being updated to include these contracts so that auditors and bug hunters will have an opportunity to disclose issues in the Fault Dispute Game contracts before any mainnet launch.

In the event that a single critical bug is found and requires shutting down the bridge, it would cause major reputational damage for Optimism. Even though there are safeguards, mainnet bridge code having a critical bug would be very bad for public perception.

Pausing withdrawals is the final safety measure, but not the only one. The Guardian also has the ability to blacklist specific games, allowing withdrawals proven against valid output roots to continue unaffected. The respected game type can also be changed to a permissioned game. This does invalidate already proven withdrawals, but it allows them to be re-proven and therefore does not shut down the bridge. The exact way in which the system should respond depends on the severity and impact of the bug.
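To make the difference between these three responses concrete, here is a rough toy model. It is only a sketch: `ToyPortal`, `ProvenWithdrawal`, and the game type strings are hypothetical names invented for illustration, not the actual contract interface, and it ignores game resolution and finalization delays entirely.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProvenWithdrawal:
    withdrawal_id: str
    game_id: str    # dispute game the withdrawal was proven against
    game_type: str  # type of that game (hypothetical labels)


@dataclass
class ToyPortal:
    """Hypothetical, heavily simplified stand-in for the bridge."""
    respected_game_type: str = "permissionless"
    paused: bool = False
    blacklisted_games: set = field(default_factory=set)

    def can_finalize(self, w: ProvenWithdrawal) -> bool:
        if self.paused:
            return False  # final safety measure: nothing finalizes
        if w.game_id in self.blacklisted_games:
            return False  # only withdrawals tied to this game are blocked
        # A withdrawal proven against a no-longer-respected game type
        # must be re-proven against a game of the respected type.
        return w.game_type == self.respected_game_type


portal = ToyPortal()
good = ProvenWithdrawal("w1", "game-a", "permissionless")
bad = ProvenWithdrawal("w2", "game-b", "permissionless")

# Response 1: blacklist one game; other withdrawals are unaffected.
portal.blacklisted_games.add("game-b")
after_blacklist = (portal.can_finalize(good), portal.can_finalize(bad))

# Response 2: switch the respected game type; existing proofs are
# invalidated, but a re-proven withdrawal can still go through.
portal.respected_game_type = "permissioned"
after_switch = portal.can_finalize(good)
reproven = ProvenWithdrawal("w1", "game-c", "permissioned")
after_reprove = portal.can_finalize(reproven)

# Response 3: pause; the bridge stops finalizing anything.
portal.paused = True
after_pause = portal.can_finalize(reproven)
```

In this simplified picture, only the pause stops every withdrawal. The other two responses narrow the impact to specific games or require a re-prove.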

We do believe it’s important that governance participants understand that reputational risk does exist here no matter what — it’s never a good thing to need to fix bugs in production. Our position is that the purpose of Stage 1 is to make it possible to deploy fault proofs into production without requiring perfection. OP Stack developers will learn a lot from seeing this system on mainnet. Even Stage 2 explicitly assumes that there will be bugs in the fault proof system and uses multiple independent proof systems to mitigate this risk.

In general, I feel uncomfortable about the idea that proactive action is needed to secure a multibillion dollar protocol. Although it is unlikely, it feels like there is just too much that could go wrong with this much at stake.

For example, after Protocol Upgrade #8, the guardian role is performed by the 10/13 security multisig. For a normal protocol, this might be sufficient, but with billions of dollars on the line, it doesn’t seem out of the question that an attacker could take 4 signers temporarily offline for a few days until their withdrawal is processed.
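The "4 signers" figure follows from the 10/13 threshold: with 4 of 13 signers offline, only 9 remain, one short of the 10 needed. A minimal sketch of that arithmetic (the function names are made up for illustration and are not part of any multisig tooling):

```python
def can_reach_threshold(total_signers: int, threshold: int, offline: int) -> bool:
    """True if enough signers remain online to meet the signing threshold."""
    return total_signers - offline >= threshold


def min_offline_to_block(total_signers: int, threshold: int) -> int:
    """Smallest number of offline signers that leaves fewer than `threshold` online."""
    return total_signers - threshold + 1


blocking = min_offline_to_block(13, 10)          # signers an attacker must take offline
still_ok = can_reach_threshold(13, 10, 3)        # 10 signers left, threshold met
blocked = not can_reach_threshold(13, 10, 4)     # 9 signers left, threshold missed
```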

Proactive action for security is ultimately part and parcel of an optimistic rollup. Rather than using a positive proof of validity, users permissionlessly publish and challenge output root proposals. Challengers must be able to detect invalid proposals in the same way that the Guardian must be able to detect these proposals. In a meaningful way, it is not possible to separate an optimistic rollup from proactive action.

Even today, under the permissioned proposer model, it is assumed that the proposer may be faulty, and a delay is added so that the Guardian role can intervene to remove a proposal if needed. While the contracts that support this process are simpler than the fault dispute game contracts, calculating the correct output root requires correct operation of op-node, op-geth, and op-proposer, none of which are audited before each release. Great care is taken in the development of these components, but the Guardian role is the backstop to ensure security even if they fail, just as in the proposed Stage 1 fault proof system.

I want to make clear that I have the utmost respect for the OP Labs team and think the Fault Dispute Game is incredibly well built. I am excited for it to be deployed.

Thank you for your kind words, and again thank you for the work you have done to look into the proposed system and share your concerns.
