Retro Funding is an economic flywheel that rewards positive impact to the Optimism Collective. The Collective is running Retro Funding as an ongoing experiment where insights and learnings from each round inform the design of future experiments. Below is a retrospective on one of our latest experiments.
Round details
- Voting period: June 27th - July 11th
- Reward amount: 10M OP, with a maximum reward of 500,000 OP per project and a minimum reward of 1,000 OP for a project to be eligible.
- Impact period: October 1st, 2023 - May 31st, 2024
- Voting transparency: Votes were public
Retro Funding 4: Onchain builders
Retro Funding 4 rewarded onchain builders that deployed their own contracts on the Superchain. These builders play a crucial role in driving Superchain adoption and generating sequencer revenue. For this round, builders on Base, Zora, Mode, Fraxtal, Metal, and OP mainnet were eligible.
Results overview
- Strong power law distribution: The top 20% of applicants received around 70% of OP rewards. The median project received 13K OP, down from 45K OP in Retro Funding 3, further demonstrating the shift towards a power law distribution.
- New builders: This round saw a significant influx of first-time applicants, with 76% of projects participating in the Retro Funding program for the first time. In a survey among round recipients, 41% responded that they did not apply to previous rounds because they were unaware that Retro Funding rewarded onchain builders.
- Coverage: Retro Funding 4 rewarded ~20% of the Superchain’s onchain activity, achieving high coverage of overall activity. Of the remaining ~80% of Superchain activity, roughly 10% comes from large projects that did not apply, and roughly 70% comes from unknown projects, P2P transfers, and more.
Metrics-based voting: The experiment
Retro Funding 4 experimented with Metrics-based voting, testing the hypothesis that by leveraging quantitative metrics, badgeholders are able to more accurately express their preferences for the types of impact they want to reward, as well as make more accurate judgements of the impact delivered by individual contributors.
Voter sentiment
The experiment received an overwhelmingly positive sentiment on the voter side, with surveyed badgeholders unanimously signaling that Retro Funding should run further experiments utilizing metrics-based voting.
Badgeholders signaled that they found a metrics-based voting approach to be more accurate in determining impact, with 71% of surveyed badgeholders agreeing that metrics-based voting yielded more accurate measurements than the individual project reviews conducted in previous rounds. The average results satisfaction rating from badgeholders was 4.8 out of 7.
“Metrics-based impact is a tremendous achievement and I think these will be some of the best results ever! I have very high confidence in these results. I think future rounds will be even better, especially as the trusted user models improve.”
The most prominent feedback on the metric-based voting approach was the lack of “human touch.” Badgeholders wanted more human input to help interpret the raw onchain data and shape metrics to reward impact accurately.
“I don’t think we have all the formulas correct yet. But I think it’s much better to be allocating based on data that we collect, rather than trusting what is provided by individual projects. I feel that we must strive to independently verify data and easily identify “games” being played to skew metrics in their own favor.”
Recipient sentiment
In addition to being perceived as more accurate, metrics-based voting gave participating builders much higher predictability and transparency. While in previous rounds there was little visibility into how project impact was evaluated, this round provided a clear signal to builders on what was being prioritized and considered. In the post-round recipient survey, 60% of participants responded that they were satisfied or very satisfied with the round’s overall reward allocation, while 40% said they expected their project to receive more rewards.
Metric creation process
Badgeholders shaped the creation and curation of impact metrics by participating in a workshop and multiple surveys. The opportunity to test the voting UI yielded even more feedback, ensuring the final metrics framework truly reflected the community’s values and preferences.
The impact metric setup was led by OS Observer, which leveraged public datasets, meaning anyone can verify the integrity of the source data. All of the code powering impact metrics is open source, allowing anyone to audit the models and query logic. For each metric, you will find a detailed description and a link to its calculation within the voting application.
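For illustration, the sketch below shows what a metric’s query logic can look like over a public transaction dataset. The table layout and column names are hypothetical stand-ins, not the actual OS Observer schema; the real model definitions are linked from the voting application.

```python
# A hypothetical, simplified example of impact-metric query logic.
# Column names and the DataFrame layout are illustrative, not the OS
# Observer schema used in the round.
import pandas as pd

def gas_fees_metric(txs: pd.DataFrame) -> pd.Series:
    """Total gas fees (in ETH) generated per project during the impact period."""
    return txs.groupby("project")["gas_fee_eth"].sum()

def trusted_transactions_metric(txs: pd.DataFrame) -> pd.Series:
    """Transactions sent by addresses flagged by the trusted user model."""
    return txs[txs["is_trusted_user"]].groupby("project")["tx_hash"].count()

# Toy data to show the shape of the calculation
txs = pd.DataFrame({
    "project":         ["A",   "A",   "B",   "B",   "B"],
    "tx_hash":         ["0x1", "0x2", "0x3", "0x4", "0x5"],
    "gas_fee_eth":     [0.002, 0.001, 0.004, 0.003, 0.001],
    "is_trusted_user": [True,  False, True,  True,  False],
})
print(gas_fees_metric(txs))
print(trusted_transactions_metric(txs))
```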
This process highlighted the importance of an iterative approach to creating metrics. Only when badgeholders evaluated how different projects performed against certain metrics were they able to give feedback on how well a metric performed at achieving its goal.
Enhancing metrics and balancing use cases
Badgeholders flagged that a more balanced approach to metrics was needed, with concerns that overly simplistic metrics could be gamed. Solely relying on metrics such as gas fees or trusted users could end up disproportionately rewarding projects that performed well on a single vector. Combining multiple qualitative and quantitative criteria for a single metric was a popular request, with one participant stating, “Any metric can be gameable; and sophistication needs to happen.” While this round focused on simple one-dimensional metrics, a new frontier is emerging with metrics that express more comprehensive measurements of impact. Overall, however, badgeholders rated their confidence levels in Round 4’s impact metrics as 5 out of 7.
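As a hedged illustration of what combining criteria could look like (not a mechanism used in Round 4), a composite score might blend several normalized sub-metrics so that spiking on a single vector is not enough:

```python
# Illustrative only: a composite impact score blending several normalized
# sub-metrics, so a project cannot win by gaming a single vector.
# The sub-metrics, maxima, and weights are hypothetical, not Round 4's.
from math import prod

def normalize(value: float, max_value: float) -> float:
    """Scale a raw metric to [0, 1] relative to the round's top value."""
    return value / max_value if max_value else 0.0

def composite_score(metrics: dict[str, float], maxima: dict[str, float],
                    weights: dict[str, float]) -> float:
    """Weighted geometric mean: balanced profiles beat single-metric spikes."""
    return prod(normalize(metrics[k], maxima[k]) ** weights[k] for k in weights)

maxima = {"gas_fees": 120.0, "trusted_users": 40_000, "retained_users": 9_000}
weights = {"gas_fees": 0.3, "trusted_users": 0.4, "retained_users": 0.3}

spiky = {"gas_fees": 120.0, "trusted_users": 500, "retained_users": 100}
balanced = {"gas_fees": 40.0, "trusted_users": 25_000, "retained_users": 6_000}
print(composite_score(spiky, maxima, weights))     # ~0.04: tops gas fees, little else
print(composite_score(balanced, maxima, weights))  # ~0.53: stronger overall profile
```

A weighted geometric mean is only one option; a weighted sum or a blend with human-curated qualitative scores could serve the same purpose.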
Confidence in data sources and trust signals varied. On average, badgeholders rated clarity and transparency around the impact metrics 6/7. One participant stated, “Folks were generally happy with the metrics coverage,” but there’s room for improvement regarding both transparency and education.
However, the resulting incentives of rewarding impact via a particular metric can be complex and therefore hard to grasp. For example, some badgeholders selected metrics that were on a logarithmic scale instead of a linear one. Discussions among badgeholders towards the end of the voting process highlighted that the implications of rewarding impact using a log scale were not clearly understood among voters.
More in-depth analysis is necessary to understand the effect of rewarding impact via specific metrics. It’s questionable if the current structure involving many voters quickly choosing what they prefer reflects a sufficient level of analysis to set the right incentives.
Trusted user model: Refinements and challenges
The trusted user model aimed to identify addresses that are likely to belong to a real human. 10 out of the 16 impact metrics relied on the trusted user model. The model faced criticism for either being too narrow, excluding genuine users, or being too broad, including users that could be bots.
Some badgeholders wanted to see a trusted user model which was more inclusive, sharing that they think the criteria for “trusted users” were too narrow and excluded many genuine users.
“Most users might not meet all these requirements, especially if we want to achieve mass adoption.”
Other badgeholders, however, were concerned the criteria weren’t exclusionary enough. They worried the trusted user model still captured users that demonstrated bot or farming behavior.
There is a tradeoff between a narrow model – which has a high probability of only capturing real humans but also excludes many genuine users – and a broad model, which captures more users but is more likely to include bots.
For future iterations, a multitude of trusted user models that enable more customization could help badgeholders express what user behavior they deem impactful. Identifying sybil behavior among addresses has been a large focus within the broader industry, and while promising efforts have been made, it is unlikely a model will emerge which will not face a tradeoff between narrow and broad criteria.
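As a rough sketch of this tradeoff, consider a hypothetical threshold-based model; the trust signals and thresholds below are illustrative assumptions, not the round’s actual criteria:

```python
# Hypothetical threshold-based trusted user model, to illustrate the
# narrow-vs-broad tradeoff. The trust signals and thresholds are
# illustrative assumptions, not the round's actual criteria.
from dataclasses import dataclass

@dataclass
class AddressProfile:
    has_attestation: bool   # e.g. an onchain identity attestation
    account_age_days: int   # time since the address's first transaction
    active_chains: int      # distinct chains with activity
    balance_eth: float      # current balance

def trust_signals(a: AddressProfile) -> int:
    """Count how many independent trust signals an address satisfies."""
    return sum([
        a.has_attestation,
        a.account_age_days >= 90,
        a.active_chains >= 2,
        a.balance_eth >= 0.01,
    ])

def is_trusted(a: AddressProfile, min_signals: int) -> bool:
    """Narrow model: high min_signals. Broad model: low min_signals."""
    return trust_signals(a) >= min_signals

new_real_user = AddressProfile(has_attestation=False, account_age_days=10,
                               active_chains=1, balance_eth=0.05)
print(is_trusted(new_real_user, min_signals=3))  # False: the narrow model excludes them
print(is_trusted(new_real_user, min_signals=1))  # True: the broad model includes them (and more bots)
```

Raising the threshold filters out more bots but also drops newer genuine users, which is exactly the tension badgeholders described.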
Challenges around account abstraction
Challenges with the trusted user model were highlighted by projects which integrated account abstraction (ERC-4337). The typical account abstraction user experience involves creating a new address for interacting with an application. This address usually has no reputation attached to it and is indistinguishable from bots or sybils. As a result, most account abstraction transactions did not meet the requirements of the trusted user model. As account abstraction sees more adoption, and users increasingly transact from addresses with no attached reputation, it becomes questionable whether trusted user models will remain valuable.
Distribution curves & maximum rewards
Of the 12 unique metrics measured in this round, 4 were also provided on a logarithmic distribution curve. While logarithmic metrics were used significantly, they received mixed reviews.
Distribution curves can allow badgeholders to set different incentive scales. A logarithmic curve can be perceived as harsh when evaluating impact, because it results in statements such as: a project with 1,000 transactions should receive half the rewards of a project with 1,000,000 transactions. This raised questions around fairness, and it was unclear whether the implications of selecting a logarithmic curve were obvious to voters.
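A minimal sketch of that difference, assuming the logarithmic curve scores a project as log10 of its raw metric value (the round’s exact normalization may differ):

```python
# Compare reward shares under a linear vs. a logarithmic distribution curve,
# assuming the log curve scores a project as log10(raw value). The round's
# exact normalization may differ.
from math import log10

projects = {"small": 1_000, "large": 1_000_000}  # e.g. transaction counts

def shares(scores: dict[str, float]) -> dict[str, float]:
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

print(shares(projects))                                    # linear: ~0.1% vs ~99.9%
print(shares({n: log10(v) for n, v in projects.items()}))  # log: ~33% vs ~67%
```

Under the log curve the smaller project receives half the larger project’s reward despite having 1,000x less activity, which is the “harshness” voters debated.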
Further exploration of distribution curves could enable more nuanced decision-making.
Open source multiplier
Badgeholders had the option to add an “open source multiplier” to their ballot, which multiplied the rewards of projects that had an open source license within their code repo.
Even though the open source multiplier had significant shortcomings, it remained popular with badgeholders: 80% of ballots included it.
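One plausible way such a multiplier could interact with a ballot (the scaling and renormalization below are assumptions, not the round’s exact mechanics) is to boost open source projects’ scores and then normalize allocations back to the round’s budget:

```python
# A plausible sketch of how an open source multiplier could be applied:
# boost open source projects' scores by the badgeholder's chosen factor,
# then renormalize so allocations still sum to the round's budget.
# The mechanics and the 2.0 factor are assumptions, not the round's rules.
def apply_os_multiplier(scores: dict[str, float], open_source: set[str],
                        multiplier: float, budget_op: float) -> dict[str, float]:
    boosted = {p: s * (multiplier if p in open_source else 1.0)
               for p, s in scores.items()}
    total = sum(boosted.values())
    return {p: budget_op * s / total for p, s in boosted.items()}

scores = {"closed_project": 100.0, "oss_project": 100.0}
print(apply_os_multiplier(scores, {"oss_project"}, multiplier=2.0,
                          budget_op=10_000_000))
# {'closed_project': ~3.33M OP, 'oss_project': ~6.67M OP}
```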
Finding an applicable rule-set for classifying a project as open source posed a significant challenge. The rule-set used for this round was as follows:
- The Github repo containing the contract code must include an open source license, as defined by the Open Source Initiative, in its repo license.md.
- The Github repo containing the contract code had to be created and made public before May 1st, 2024.
There was no requirement for in-depth proof that the code in the provided Github repo was an exact match to the code deployed onchain. Furthermore, file-level licenses were not accounted for – an individual file in a repo might carry a license identifier that is not open source and that overrides the repo license.
Of the 94 projects which qualified for the open source multiplier, an audit by Bixbite and the Velodrome team identified 20 issues.
- 5 projects had not verified their contracts via a block explorer, making it impossible to verify that the code within the relevant Github repo matched the onchain contracts.
- 13 projects had one or multiple files within their repo with conflicting licenses.
- 2 projects did not have their full contract code within their Github repository.
At scale, it becomes unviable to audit each project’s code within the onchain builder round; a scalable methodology for identifying open source projects is needed.
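As a sketch of what a more scalable check could look like, assuming a local clone of each repo and SPDX license identifiers (this covers the repo-level and per-file licensing issues above, but not whether the repo matches the deployed bytecode):

```python
# Sketch of a scalable open source check over a local clone of a project's
# repo. OSI_APPROVED is a small stand-in for the full list of OSI-approved
# SPDX identifiers, and the substring match on the license file is a
# deliberate simplification.
import re
from pathlib import Path

OSI_APPROVED = {"MIT", "Apache-2.0", "GPL-3.0-only", "BSD-3-Clause", "MPL-2.0"}
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([\w.\-+]+)")

def repo_license_ok(repo: Path) -> bool:
    """Repo-level check: a license file naming an OSI-approved license."""
    for name in ("LICENSE", "LICENSE.md", "license.md"):
        f = repo / name
        if f.exists():
            text = f.read_text(errors="ignore")
            return any(lic in text for lic in OSI_APPROVED)
    return False

def conflicting_file_licenses(repo: Path) -> list[Path]:
    """File-level check: Solidity files whose SPDX tag is not OSI-approved."""
    conflicts = []
    for sol in repo.rglob("*.sol"):
        match = SPDX_TAG.search(sol.read_text(errors="ignore"))
        if match and match.group(1) not in OSI_APPROVED:
            conflicts.append(sol)
    return conflicts

def is_open_source(repo: Path) -> bool:
    return repo_license_ok(repo) and not conflicting_file_licenses(repo)
```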
“I believe there should be some reward here, but I’m yet unconvinced the framework we have for it is perfect.”
Rewarding impact retroactively
The goal of retroactively rewarding impact instead of proactively rewarding projects based on their future promises was underscored throughout the entire round 4 voting and evaluation process. One badgeholder noted, “[Retro Funding is] about rewarding impact directly, not rewarding the little guy.”
In this round, badgeholders did not have the opportunity to allocate outsized rewards to projects that are “in need of rewards” or “are just getting started.” The fact that this topic came up within Retro Funding’s fourth iteration demonstrates how hard it is to align all voters on a retroactive rewards model.
Voting experience
The voting experience has seen strong improvements round over round, with badgeholders rating Round 4’s voting experience at 8/10, compared to 6/10 in Round 3.
Voting required significantly less time than in previous rounds as well, with badgeholders spending an average of 2 hours on voting in Round 4, compared to 16 hours in Round 3.
Some badgeholders suggested that showcasing project performance against metrics created a biased voting experience, where voters could reverse engineer a ballot to reward the projects they preferred. On the flipside, some voters voiced that seeing how different projects performed against the metrics gave them relevant insights into the quality of the raw metrics. Check out a deep dive on badgeholder voting behavior, created by OpenSource Observer.
Sign up experience
Round 4 featured a narrow scope with explicit eligibility criteria. A significant share of applicants, around 40%, did not meet the requirements, and therefore were not eligible for rewards. Applicants could not easily verify if they met the eligibility criteria, which led to significant frustration with the process.
Contract attributions
When applying, builders had to prove they deployed the contracts they submitted. This involved having builders sign a message using the contract’s deployer address. “Downstream contracts” were considered as well, which involved builders claiming factory contracts and receiving attribution for contracts deployed by the factory.
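A minimal sketch of the deployer-signature check using the eth_account Python library; the message format and the verification flow are assumptions, not the round’s actual sign-up implementation:

```python
# Sketch of verifying that an applicant controls a contract's deployer
# address: recover the signer of an attestation message and compare it to
# the deployer recorded onchain. The message format is an assumption.
from eth_account import Account
from eth_account.messages import encode_defunct

def verify_deployer_attestation(contract_address: str, deployer_address: str,
                                signature: str, project_id: str) -> bool:
    message = encode_defunct(
        text=f"I deployed {contract_address} for project {project_id}"
    )
    recovered = Account.recover_message(message, signature=signature)
    return recovered.lower() == deployer_address.lower()
```

Downstream attribution could then credit contracts created by a claimed factory to the same project, based on onchain deployment traces.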
Some builders voiced concerns with this model. For example, NFT creators who use platforms like Zora and Sound.xyz were not able to claim their NFT contracts, as these are attributed to Zora’s and Sound’s factory contracts. However, both Zora and Sound.xyz forwarded a portion of their rewards to creators.
Builders that issue transactions on behalf of their users, a popular model for Farcaster Frames, also faced challenges with contract attributions, as their contracts did not meet the minimum requirement for unique address interactions. While unfortunate, there is no current methodology for easily verifying that the transactions were initiated by unique users.
As frontends and interfaces evolve, and as protocols become more modular, builders might increasingly start leveraging onchain contracts they do not own. The Optimism Collective should explore how it can reward user interfaces for their impact.
Resulting workstreams
This round has surfaced different workstreams necessary to improve impact measurement for future onchain builder rounds. Based on the feedback outlined above, we want to answer the following questions in our research:
- Humans in-the-loop: How might we leverage subjective human input to improve metrics? How can we align raw onchain data with subjective, qualitative evaluation?
- Trusted user models: How might we iterate on the trusted user model to serve different needs? What is the most appropriate balance between the narrow and broad scopes discussed above?
- EIP-4337: How might we support transactions within the account abstraction value chain?
- Contract attribution & ownership verification: How might we upgrade our contract attribution and ownership verification to keep up with industry innovations?
- Dapps & frontends: How might we measure the impact of interfaces which allow users to interact with onchain applications?
- Open source classifier: How might we accurately classify contract code as open source? How might we measure the impact of open source contract code?
Final thoughts
Through this fourth round of Retro Funding, we were able to test the efficacy of quantitative impact measurement in increasing voting accuracy and transparency. This first metrics-based round showed the promise of the approach while surfacing many shortcomings that will lead to more accurate impact measurement in the future.
Stay up to date on Retro Funding and participate in discussions on round experiments and more through the Optimism Governance forum. We look forward to continuous iteration and improvement!
As always,
Stay Optimistic!