Hi, I'm maurelian, a member of the EVM Safety team at OP Labs.
Those familiar with executing upgrades to the Optimism Superchain will know how deeply we go into verifying the expected result of an upgrade. In this post, we'll cover the most comprehensive update to this process we've made since its inception, and discuss why we think these changes improve our security posture and reduce signer overhead. Most critically, we replace previously manual validation steps with a smart contract.
TLDR
Previously: All upgrades required signers to perform "exhaustive manual state diff validation" by simulating and checking every single state diff resulting from an upgrade.
Now: We have the OPContractsManager, a verifiable contract which contains all of the logic used to execute an upgrade, and so we can simply verify that logic and then trust that state changes will be executed as expected.
By analogy: instead of tasting every single dish that comes out of a kitchen, we can simply verify the recipe and trust that following it correctly will consistently produce good meals.
This approach is safer and more scalable, as it removes our reliance on a heavily manual process and replaces it with smart contract automation.
Outline
This article will cover the following topics:
- Goals for a secure upgrade process
- The early history of our upgrade process
- What is exhaustive manual state diff validation and why earlier upgrades required it
- Recent improvements to our upgrade process (the OPCM)
- How these improvements enable us to move away from manual state diff validation
Goals for a secure upgrade process
The goal for any upgrade process is to enable shipping L1 contract upgrades without bugs or malicious code being introduced. For this document's purposes, those are taken to mean:
- Bugs: unintended behaviours which are accidentally introduced.
- Malicious code: unintended behaviours which are intentionally introduced, whether by members of the collective, or outside parties attempting to compromise our software supply chain.
To fulfill these goals, it is crucial to minimize the number of manual steps required of both developers and signers, as each manual step introduces the potential for error.
Next, we'll look at how upgrades were performed early in the life of the OP Stack, and then cover why that necessitated exhaustive state diff validation.
The early history of our upgrade process
Until Q1 of this year, each L1 contracts upgrade was written in a bespoke script. Here is an example script from a previous upgrade with over 500 LoC.
For each OP Chain (the individual L2 chains in the Superchain) this script had to be duplicated and then customized for that specific chain. There was very little means of enforcing consistency between one upgrade and the next.
Perhaps worst of all, the upgrade script was typically written by a different set of engineers from those who had actually written the features that necessitated the upgrade.
This created many opportunities for errors to creep in: incorrectly duplicating another upgrade script, forgetting to update an address after duplicating the script, or simply not properly understanding the intended upgrade path.
Because this process was so error-prone, we needed a rigorous process to detect such errors.
What is exhaustive manual state diff validation and why earlier upgrades required it
Exhaustive Manual State Diff Validation is the process by which every signer manually verifies every single change introduced by an upgrade transaction.
At a high level, this process involves:
- Writing a document describing why each diff is safe and necessary. See the Task State Changes section of this document for an example of the guidance which was written for each upgrade transaction.
- Signers then simulate the transaction to calculate the resulting state diff, and review it against that document. The State Diff tab of this transaction in Tenderly provides a good example of a simulated state diff for a typical transaction.
The links above correspond to the same single transaction, which only upgrades two OP Chains, and requires validating over 40 state changes for each chain. Signers reported to us that completing the full validation process would require several hours for each transaction, and sometimes, round-trip communication with engineers during the signing process.
This highlights a tradeoff between the benefits of rigor and the costs of complexity. While exhaustive manual state diff validation adds rigor to the search for bugs or malicious code, our experience shows that asking signers to verify too many things increases the risk that they fail to verify the most important things, such as the Expected Domain and Message Hashes, or the Upgrade Transaction Inputs (including prestateHash).
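To make the scale of this check concrete, the review step described above amounts to a two-way comparison between the simulated state diff and the documented one. The following is a minimal Python sketch of that comparison, not the tooling signers actually used. The `StateChange` type, the `review_state_diff` function, and the placeholder addresses are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateChange:
    """One storage write from a simulated transaction (hypothetical shape)."""
    contract: str  # address of the contract whose storage changed
    slot: str      # storage slot key
    before: str    # value before the transaction
    after: str     # value after the transaction

    def normalized(self) -> "StateChange":
        # Hex strings are case-insensitive, so compare lowercase forms.
        return StateChange(
            self.contract.lower(), self.slot.lower(),
            self.before.lower(), self.after.lower(),
        )


def review_state_diff(simulated, documented):
    """Return (unexplained, missing) changes.

    unexplained: changes seen in simulation that the document does not justify.
    missing: changes the document promises that the simulation did not produce.
    """
    sim = {change.normalized() for change in simulated}
    doc = {change.normalized() for change in documented}

    def order(changes):
        return sorted(changes, key=lambda c: (c.contract, c.slot))

    return order(sim - doc), order(doc - sim)


# Placeholder values, not taken from any real upgrade.
documented = [
    StateChange("0xAAAA", "0x00", "0x01", "0x02"),
    StateChange("0xBBBB", "0x05", "0x00", "0x01"),
]
simulated = [
    StateChange("0xaaaa", "0x00", "0x01", "0x02"),  # matches the document
    StateChange("0xcccc", "0x09", "0x00", "0xff"),  # not explained anywhere
]

unexplained, missing = review_state_diff(simulated, documented)
print(len(unexplained), len(missing))
```

Either list being non-empty is a reason to stop and ask questions. Signers had to perform the equivalent of this comparison by eye, for over 40 changes per chain.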
To continue improving our upgrade security posture while increasing the frequency, signer count, and chain count of upgrades, we needed a better way: a process for upgrades that did not require so many manual steps, each of which introduces the potential for error.
The OPCM: A major upgrade process improvement
The OPContractsManager (OPCM) (first introduced for Upgrade 13) significantly improves our security posture, allowing us to left-shift verification work and avoid the danger of repetitive manual steps. By verifying the OPCM's logic once, we create a trusted computing base that eliminates the need to manually verify every state change during upgrades.
Concretely, the OPCM is a smart contract that serves as a standardized upgrade mechanism. For each new release of the L1 contracts (e.g. op-contracts/v4.0.0), we deploy a new OPCM on L1. This OPCM is then used to execute all on-chain management operations (including deploying new chains and upgrading existing chains).
A critical advantage of this approach is that the OPCM is developed and tested in the same repository as the L1 contracts. This allows us to run the full suite of tests against both a freshly deployed system and one which is forked from the contracts currently running on mainnet and then upgraded. Although this does not mean that an upgrade is safe simply because it uses an OPCM, it enables more parties to verify the safety of an upgrade, more rigorously, and earlier in the process, without time pressure.
Importantly, OP Chains do not need to grant any authorization to the OPCM. Rather, the OPCM is a stateless on-chain script that wholly describes both:
- the correct deployment or upgrade logic for each release
- the implementation logic of the post-upgrade system
Thus, in order to upgrade an OP Chain, its L1 contracts owner must DELEGATECALL the OPCM's upgrade() method.
Because upgrades are wholly defined by the logic of the OPCM's upgrade method, it is sufficient to verify the logic of the OPCM being called during an upgrade. This shifts our security focus left: from verifying every upgrade transaction "on the clock" while signing, to verifying the mechanism that produces those transactions rigorously and in advance of actual signing dates.
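The reason a DELEGATECALL removes the need to authorize the OPCM can be shown with a toy model. This is a conceptual Python sketch, not EVM code and not the OPCM itself. `OwnerGatedAdmin`, `opcm_upgrade`, and the `"proxy-a"`/`"impl-v2"` values are hypothetical stand-ins. The only property modeled is that delegated code runs with the caller's identity, while a plain call runs with the callee's.

```python
class OwnerGatedAdmin:
    """Toy stand-in for a chain's owner-gated L1 admin contract."""

    def __init__(self, owner: str):
        self.owner = owner
        self.implementations = {}

    def set_implementation(self, sender: str, proxy: str, implementation: str):
        # Only the chain's L1 contracts owner may change implementations.
        if sender != self.owner:
            raise PermissionError(f"{sender} is not the owner")
        self.implementations[proxy] = implementation


def opcm_upgrade(executing_as: str, admin: OwnerGatedAdmin, release: dict):
    """Stateless upgrade logic: it holds no state, only the steps to perform."""
    for proxy, implementation in release.items():
        admin.set_implementation(executing_as, proxy, implementation)


def delegatecall(caller: str, code, *args):
    # DELEGATECALL: the borrowed code runs with the caller's identity.
    return code(caller, *args)


def call(callee: str, code, *args):
    # Plain CALL: the code runs with its own contract's identity.
    return code(callee, *args)


release = {"proxy-a": "impl-v2", "proxy-b": "impl-v2"}  # placeholder values

admin = OwnerGatedAdmin(owner="owner-safe")
delegatecall("owner-safe", opcm_upgrade, admin, release)
print(admin.implementations)

try:
    call("opcm", opcm_upgrade, admin, release)
except PermissionError as err:
    print("plain call rejected:", err)
```

In this model the upgrade logic never needs to appear on any allow-list: it only succeeds when the owner chooses to run it as itself. That is why the logic being run is the thing worth verifying.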
How these improvements enable us to move away from manual state diff validation
Now that we've covered both the historical context and how the OPCM changes things, we can talk about better approaches for protecting against bugs and malicious code.
1. Defending against bugs
Preventing bugs from being accidentally introduced to the code is the area we've traditionally focused on the most. The mitigations occur throughout our software development lifecycle, and include the use of specs, design docs, failure mode analyses, threat modeling, code review, testing and auditing.
Assumption: If at least one of the developer and the security reviewer is honest, then a review report which attests to a specific git commit provides confidence that the commit is believed to be free of bugs.
2. Defending against malicious code
To defend against malicious code being intentionally introduced, we have implemented a couple of crucial safeguards.
1. VerifyOPCM:
The new VerifyOPCM script can be used to verify that a deployment of the OPCM (at a specific address) was deployed from a trusted git commit. The script also verifies the bytecode of all other contracts and blueprints referenced by the OPCM.
Per the assumption of the "Defending against bugs" section, the VerifyOPCM script is trusted to do this because any changes to it MUST be included in reviews of the L1 contracts (the first review to include it can be found in both our repo and Spearbit's).
Since the VerifyOPCM script can be used to gain trust in the OPCM, you can trust that the OPCM will correctly perform the upgrade without introducing any bugs or malicious code.
Importantly, this script provides protection against a malicious entity tricking signers into executing an upgrade using a malicious OPCM with logic which does not match that of the reviewed commit.
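The core idea behind this kind of check can be sketched in a few lines of Python. This is not the VerifyOPCM script, whose implementation is not reproduced here. It is a simplified illustration that assumes the trusted build output and the deployed code have already been fetched and can be compared byte for byte. A real check may need to handle details this sketch ignores, such as values embedded in the bytecode at deployment time. The function name and the contract names are hypothetical.

```python
def find_bytecode_mismatches(trusted_build: dict, onchain_code: dict) -> list:
    """Compare on-chain bytecode with bytecode built from a trusted commit.

    Both arguments map a contract name to its bytecode (bytes). Returns one
    message per contract that is missing on chain or whose code differs.
    """
    problems = []
    for name in sorted(trusted_build):
        expected = trusted_build[name]
        actual = onchain_code.get(name)
        if actual is None:
            problems.append(f"{name}: no deployed code found")
        elif actual != expected:
            problems.append(f"{name}: deployed code differs from trusted build")
    return problems


# Placeholder bytecode, not real contract code.
trusted_build = {
    "manager": bytes.fromhex("6001"),
    "blueprint": bytes.fromhex("6002"),
}
honest_deployment = dict(trusted_build)
tampered_deployment = {**trusted_build, "blueprint": bytes.fromhex("60ff")}

print(find_bytecode_mismatches(trusted_build, honest_deployment))
print(find_bytecode_mismatches(trusted_build, tampered_deployment))
```

An empty result means every referenced contract matches the reviewed commit. Any entry at all means the deployment should not be used for an upgrade until the difference is explained.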
2. Comprehensive release checklists: Starting with Upgrade 16, OP Labs has begun publishing a release checklist which includes:
- Verification of the deployed OPCM
- Verification of non-OPCM contracts (i.e. Safe Modules)
Those checklists can be found here:
- Contracts Release Checklist op-contracts/v4.0.0-rc.8
- Sepolia Release Checklist op-contracts/v4.0.0-rc.8
Importantly, anyone with git and forge installed on their machine can replicate the steps in these checklists.
The mechanisms discussed in this section are the main reasons we don't need to exhaustively manually check every single state diff in every single upgrade transaction.
Instead we can simply use the tools above to determine that we trust the OPCM, and that executing upgrades using the OPCM will behave as expected.
It is important to underscore that these changes apply only to upgrades and other actions (such as adding a new dispute game) which are managed by the OPCM. For other multisig transactions we will continue to perform exhaustive manual state diff validation.
Crucially, for all multisig transactions it remains essential to verify that the domain and message hashes match the expected values, regardless of whether or not the transaction uses the OPCM. A significant motivation for this change is to allow signers to place more emphasis on verifying these hashes.
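The hash check itself is a full-length equality comparison, which is easy to get subtly wrong when done by eye (for example, by comparing only the first few characters). The following Python sketch shows one strict way to compare the two values. It assumes the expected and displayed hashes are available as hex strings. The dictionary keys and function names are hypothetical, and how the expected values are published or displayed on a signing device is out of scope.

```python
import re

# A 32-byte hash is exactly 64 hex characters, with an optional 0x prefix.
_HASH_RE = re.compile(r"(0x)?[0-9a-f]{64}")


def normalize_hash(value: str) -> str:
    """Return a 32-byte hash as 64 lowercase hex characters, or raise ValueError."""
    cleaned = value.strip().lower()
    if not _HASH_RE.fullmatch(cleaned):
        raise ValueError(f"not a 32-byte hex hash: {value!r}")
    return cleaned[2:] if cleaned.startswith("0x") else cleaned


def hashes_match(expected: dict, displayed: dict) -> bool:
    """True only if both the domain hash and the message hash match in full."""
    return all(
        normalize_hash(expected[key]) == normalize_hash(displayed[key])
        for key in ("domain_hash", "message_hash")
    )


# Placeholder hashes, not taken from any real transaction.
expected = {"domain_hash": "0x" + "ab" * 32, "message_hash": "0x" + "cd" * 32}
same = {"domain_hash": "AB" * 32, "message_hash": "CD" * 32}
different = {"domain_hash": "ab" * 32, "message_hash": "cd" * 31 + "ce"}

print(hashes_match(expected, same))
print(hashes_match(expected, different))
```

Rejecting anything that is not a complete 32-byte value means a truncated or partially copied hash fails loudly instead of appearing to match.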
Summary
We've shown that the historical practice of exhaustive manual state diff validation is no longer necessary given our modern tooling.
Anyone can now use publicly available tools and processes to gain trust in an OPCM deployment, which provides stronger guarantees to a larger swath of people with less manual labor. This approach scales, and can be sustained alongside the growing Superchain ecosystem.
