Futarchy v1 Preliminary Findings

Optimism’s v1 experiment with Futarchy (our “Futarchy Grants Contest”) has officially concluded with the closure of Butter’s play money markets at the end of Optimism governance Season 7 on June 12, 2025. To recap, the Futarchy experiment ran parallel to a traditional Optimism Grants Council cycle, with each decision-making cohort (Futarchy and Grants Council) selecting 5 projects to each receive one 100K OP grant aimed at increasing Superchain TVL. Our overarching questions, as outlined in the initial forum post, were:

  • Is Futarchy a more effective way to make grant allocation decisions than existing approaches?
  • Specifically, do projects selected via Futarchy see a greater increase in Superchain TVL than projects selected by the Optimism Grants Council?

This post summarizes some of the key learnings from this first experiment with Futarchy for Collective decision-making, including descriptive forecaster participation patterns, indications of prediction accuracy of forecasters, and an observational comparison between Futarchy and Grants Council project performance based on increase in Superchain TVL 3 months after the experiment.

In a separate analysis (forthcoming as a comment to the S7 Grants Council Impact Analysis), we also share findings from a preliminary impact analysis of the effects of Futarchy grant allocation on subsequent Superchain TVL growth. While sample size constraints, treatment spillover effects, confounding variables (i.e., the many other factors influencing Superchain TVL), and market volatility limit our ability to make definitive causal claims, the observational data provides interesting insights about new decision-making methods and sets the stage for future iterations and learnings, in addition to the many valuable operational and product learnings from v1.

Who Participated in Futarchy v1?

Our Futarchy experiment attracted 430 forecasters after filtering out 4,122 suspected bots from voluntary sign-ups (likely sybils proved to be a persistent challenge given the play money environment). Overall, forecasters demonstrated consistent engagement throughout the experiment, with 5,898 total trades (we will follow up with a more granular analysis of the distribution of trades per forecaster). Unique visitors to Butter’s site spanned 10 countries (across 4 continents). New daily users ranged from 1 to 79, averaging 36 new users per day.


Source: Butter

Forecaster backgrounds varied significantly: 88.6% were first-time Optimism governance participants who received the baseline 50 OP-PLAY (play token) allocation, while only 11.4% were existing Optimism community members who earned additional OP-PLAY tokens through OP attestations. We had initially anticipated the opposite: that existing Optimism governance participants would participate at higher rates, given that they were eligible for a higher starting play token balance, had more context about the Superchain-growth-focused question and metric, and had shown interest in previous governance initiatives. Based on sign-up information about affiliated organizations, self-reported backgrounds ranged from various crypto projects and DAOs to many reporting “none” or “unaffiliated.”

:light_bulb:Key takeaway: We successfully attracted over 400 active and diverse forecasters, with the majority coming from outside our governance ecosystem.

Futarchy versus Grants Council Comparison

The experiment required the Futarchy contest and the Optimism Grants Council to each select 5 projects, from the same total set of 22 projects, to receive a grant. Futarchy and the Grants Council picked 2 of the same projects (Rocket Pool and SuperForm), leaving 3 unique selections for comparison.

Because there was a max award of one 100K OP grant per project, some of the projects that ultimately received a grant from the Grants Council were actually ranked outside the Council’s initial Top 5: higher-ranked picks that overlapped with Futarchy’s selections had already been funded, so the Council’s grants cascaded down its ranking (a small sketch of this cascade follows the tables below). For the comparative analysis, we compare the initial Top 5 selections for each group, which are listed in the tables below.

Futarchy Top 10 Rankings

| Rank | Project | Predicted Superchain TVL Increase after 84 days | Grant Amount Received from Futarchy |
|---|---|---|---|
| 1 | Rocket Pool | $59M | 100k OP |
| 2 | SuperForm | $48M | 100k OP |
| 3 | Balancer & Beets | $47M | 100k OP |
| 4 | Avantis | $44M | 100k OP |
| 5 | Polynomial | $41.2M | 100k OP |
| 6 | Silo | $41.2M | - |
| 7 | Extra Finance | $39.7M | - |
| 8 | Reservoir Protocol | $38.8M | - |
| 9 | Sushiswap | $37.7M | - |
| 10 | Gyroscope | $37.4M | - |

Grants Council Top 10 Rankings

| Rank | Project | Grants Council Aggregate Ranking Score | Grant Amount Received from Grants Council |
|---|---|---|---|
| 1 | Extra Finance | 34 | 100k OP |
| 2 | Rocket Pool | 27 | (100k OP from Futarchy) |
| 3 | SuperForm | 26 | (100k OP from Futarchy) |
| 4 | Gyroscope Protocol | 19 | 100k OP |
| 5 | Reservoir Protocol | 17 | 100k OP |
| 6 | Balancer & Beets | 16 | (100k OP from Futarchy) |
| 7 | QiDAO | 11 | 100k OP |
| 8 | Silo | 10 | 100k OP |
| 9 | HAI Protocol (Reflexer) | 9 | - |
| 10 | SynFuture | 7 | - |
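To make the cascade concrete, here is a minimal sketch of how the one-grant-per-project cap appears to have played out. The skip-already-funded rule and the function name `allocate_council_grants` are our own illustration, not a confirmed implementation detail; rankings are taken from the two tables above.

```python
# Hypothetical reconstruction of the one-grant-per-project cascade described above.
futarchy_top5 = ["Rocket Pool", "SuperForm", "Balancer & Beets", "Avantis", "Polynomial"]

council_ranking = [
    "Extra Finance", "Rocket Pool", "SuperForm", "Gyroscope Protocol",
    "Reservoir Protocol", "Balancer & Beets", "QiDAO", "Silo",
    "HAI Protocol (Reflexer)", "SynFuture",
]

def allocate_council_grants(ranking, already_funded, n_grants=5):
    """Walk down the Council ranking, skipping projects already funded via Futarchy,
    until n_grants 100K OP grants have been placed."""
    grants = []
    for project in ranking:
        if project in already_funded:
            continue  # max one 100K OP grant per project
        grants.append(project)
        if len(grants) == n_grants:
            break
    return grants

print(allocate_council_grants(council_ranking, set(futarchy_top5)))
# ['Extra Finance', 'Gyroscope Protocol', 'Reservoir Protocol', 'QiDAO', 'Silo']
```

Under this reading, the Council’s funded set matches the grant column in the table above.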

Comparing Futarchy vs Grants Council Project Selection

The goal of this comparative analysis is to explore which decision-making method selected projects that saw a higher increase in Superchain TVL after ~3 months. Increase in Superchain TVL is our pre-specified outcome of interest, stemming from the Optimism Season 7 intent. The ~3-month (84-day) time period was constrained by operational launch delays that pushed the start date to mid-March rather than February, and by the need to end by June 12 to align with the end of Optimism Governance Season 7.

Examining Superchain TVL data, we see that the Futarchy cohort showed significantly higher aggregate TVL growth (a difference of about $32.5 million). This difference is driven primarily by Futarchy selecting Balancer & Beets on Base, an outlier that the Grants Council ranked 6th, missing their Top 5 cutoff by a single point on their scoring rubric. Because that margin was so narrow, the gap between the two decision-making mechanisms may be smaller than the raw TVL difference suggests, especially if the exercise were repeated many times or the number of grant recipients were increased. Caution should always be used when comparing very small sample sizes (3 unique projects in each group).


Source: OSO; data from DefiLlama

Another takeaway from comparing the Top 10 rankings of each decision-making cohort is that 6 of the Top 10 (out of 22 total projects) overlapped between cohorts, though precise rankings varied. Both Futarchy and the Grants Council ranked Rocket Pool highly (first and second, respectively), yet it has remained at $0 Superchain TVL. It appears there was some confusion between Rocket Pool’s efforts to incentivize rETH (Rocket Pool-staked ETH) Superchain TVL and increasing the Rocket Pool protocol’s own Superchain TVL.

(Our definition, as displayed on Butter’s UI: Difference in 7-day trailing average TVL on Major Superchain L2s between 2025-03-20 and 2025-06-12 in USD as reported on DefiLlama. “Major Superchain L2s” are defined as: BOB, Base, Ink, Lisk, Mode, OP Mainnet, Polynomial, Soneium, Swellchain, Unichain, World Chain.)
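For readers who want to recompute this metric from exported daily TVL data, here is a minimal sketch of the calculation. The function names and the pre-summed per-day input are our own simplification; fetching the per-chain DefiLlama series and summing across chains is left out.

```python
from datetime import date, timedelta

# "Major Superchain L2s" per the definition above; per-chain TVL is assumed to be
# summed into a single daily USD series before being passed in.
MAJOR_SUPERCHAIN_L2S = [
    "BOB", "Base", "Ink", "Lisk", "Mode", "OP Mainnet",
    "Polynomial", "Soneium", "Swellchain", "Unichain", "World Chain",
]

def trailing_7d_avg(daily_tvl_usd, end_day):
    """Average of the 7 daily TVL values ending on end_day (inclusive)."""
    return sum(daily_tvl_usd[end_day - timedelta(days=i)] for i in range(7)) / 7

def superchain_tvl_increase(daily_tvl_usd,
                            start=date(2025, 3, 20),
                            end=date(2025, 6, 12)):
    """Difference in 7-day trailing average Superchain TVL (USD) between start and end,
    where daily_tvl_usd maps a date to TVL summed across the chains listed above."""
    return trailing_7d_avg(daily_tvl_usd, end) - trailing_7d_avg(daily_tvl_usd, start)
```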

Finally, examining TVL by chain, the increase in Superchain TVL occurred primarily on Base followed by Optimism.


Source: OSO; data from DefiLlama

:light_bulb:Key takeaway: Futarchy and the Grants Council selected 2 of the same projects. Of the remaining 3 unique selections, Futarchy outperformed the Grants Council in identifying and selecting projects that drove a greater 84-day Superchain TVL increase. This effect was driven primarily by Futarchy’s selection of the outlier Balancer & Beets.

The v1 data offers suggestive evidence that, at minimum, Futarchy does not select projects that are very different from, or any worse than (in terms of Superchain TVL increase), those selected by the Grants Council.

In addition, as depicted above, Futarchy demonstrated higher variance in project selection outcomes. While Futarchy successfully identified the top 2 Superchain TVL performers, it also selected the single worst-performing project across everything selected by either cohort. The Grants Council showed more consistent but moderate results, selecting projects that performed closer to the median, avoiding major failures while also missing top performers.

  • This pattern suggests that forecasters may have been more willing to bet on high-risk, high-reward projects,
  • while the Grants Council’s committee structure may have favored lower-variance selections with more predictable outcomes.

Both mechanisms failed to identify several mid-tier winners that saw substantial TVL growth.

:light_bulb:Key takeaway: Futarchy demonstrated higher variance in project selection outcomes than Grants Council selections.
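For anyone checking the variance claim against the data linked below, a minimal sketch of the comparison. The per-project numbers here are placeholders, not the actual 84-day results.

```python
from statistics import mean, pstdev

# Placeholder 84-day Superchain TVL increases (USD millions) for the three unique
# selections in each cohort; substitute the actual values from the linked data.
futarchy_unique = {"Balancer & Beets": 0.0, "Avantis": 0.0, "Polynomial": 0.0}
council_unique = {"Extra Finance": 0.0, "Gyroscope Protocol": 0.0, "Reservoir Protocol": 0.0}

for label, cohort in [("Futarchy", futarchy_unique), ("Grants Council", council_unique)]:
    values = list(cohort.values())
    print(f"{label}: total={sum(values):.1f}M  mean={mean(values):.1f}M  stdev={pstdev(values):.1f}M")
```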

Access the data to dive deeper here :right_arrow_curving_down::

Note that both OSO and Butter source their data from DefiLlama, though small discrepancies in Superchain TVL increases may arise from differing date-interpolation methods and asset USD valuations at query time.
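As a small illustration of how interpolation choices alone can move the numbers (assumed example values, not actual data):

```python
# Assumed example: a chain reports $100M TVL on day 0 and $110M on day 2, with no
# reading on day 1. Two common ways to fill the gap:
carry_forward = 100.0                  # reuse the last known value
linear_interp = (100.0 + 110.0) / 2    # midpoint between neighbours, 105.0

# A 7-day trailing average that includes day 1 then differs by (105 - 100) / 7,
# roughly $0.7M, between the two pipelines, even though both read DefiLlama.
print(carry_forward, linear_interp)
```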

Prediction Accuracy

Another interesting data point lies in the forecaster predictions themselves, as a key goal of Futarchy is to drive more accurate information aggregation. In our play money experiment, Futarchy predictions significantly overestimated actual TVL growth across all projects. Market participants predicted substantially higher TVL increases than the projects themselves estimated. This may have been due to the play money structure creating no downside risk for inflated predictions. Additionally, unlike prediction markets, decision markets allow forecasters to influence actual future outcomes (in this case, the allocation of multiple 100K OP grants), which may incentivize participating projects to game outcomes.

Additionally, the 50M play-token initial liquidity may have served as an anchor that skewed predictions and impacted price discovery. Strategic voting to influence Top 5 grant allocations may have further distorted forecast accuracy. We do not have Grants Council data on predicted TVL increase amounts, as this is not part of their typical assessment process.


Source: OSO; data from DefiLlama

:light_bulb:Key takeaway: One of our goals was to explore a novel mechanism for making better predictive decisions (e.g., allocating grants). Looking at the actual predictions (predicted TVL increase), the aggregate prediction was approximately $200M higher than the realized increase.
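As a concrete sketch of how that aggregate overestimate is computed, using the predicted values from the Futarchy Top 10 table above and placeholder actuals to be filled from the linked data:

```python
# Predicted 84-day Superchain TVL increases (USD millions) for the five
# Futarchy-funded projects, from the Futarchy Top 10 table above.
predicted = {"Rocket Pool": 59.0, "SuperForm": 48.0, "Balancer & Beets": 47.0,
             "Avantis": 44.0, "Polynomial": 41.2}

# Placeholder actual increases (USD millions); substitute values from the linked data.
actual = {"Rocket Pool": 0.0, "SuperForm": 0.0, "Balancer & Beets": 0.0,
          "Avantis": 0.0, "Polynomial": 0.0}

aggregate_overestimate = sum(predicted[p] - actual[p] for p in predicted)
print(f"Aggregate overestimate: ${aggregate_overestimate:.1f}M")
```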

Next steps: Improving upon v1

A key learning from v1 was that prediction accuracy was low in our Futarchy Contest. We believe future iterations could benefit from incorporating real skin-in-the-game incentives to improve prediction accuracy. We could also explore stronger experimental design improvements, such as using an outcome metric less influenced by external forces, implementing longer evaluation periods to better assess grant impact, and expanding sample sizes for higher-powered comparisons between various decision-making structures. Specifically:

  • Real skin-in-the-game incentives: Future iterations could improve forecaster accuracy by incorporating downside for inaccurate predictions (i.e., real money incentives), especially when the incentives to influence future decisions are strong.
  • Forecaster quality > quantity: v2 could improve by prioritizing overall forecaster quality over quantity through enhanced sybil protection and clearer instructions and project information shared with forecasters. Real skin-in-the-game directly addresses the sybil problem and incentivizes high-quality forecasters to participate.
  • Stronger experimental design = higher confidence learnings: If we want to better compare decision-making structures, or answer questions about the impact of grant decisions on subsequent outcome metrics, we should design future experiments accordingly, e.g.:
    • Increase sample size
    • Improve wording of the decision question
    • Thoughtfully select an outcome metric that is less influenced by external factors

Wrapping this all up

Exploring our v1 Futarchy data suggests that Futarchy performs at least as well as traditional decision-making structures (i.e. the Grants Council) at identifying which projects would most increase Superchain TVL ~3 months after receiving a grant. There is suggestive evidence that Futarchy selected higher-performing projects — at least in the context of this single experiment. To recap several key takeaways:

  • What Futarchy did well: identifying the projects with the largest Superchain TVL increase after 84 days
  • What Futarchy did not do well: accurately predicting the magnitude of projects’ Superchain TVL increases
  • Futarchy selections showed higher variance in project TVL increase than Grants Council selections.
  • Please note that an impact analysis of the projects selected is part of a separate forthcoming analysis

Moving forward, given that Futarchy is a more decentralized, scalable, and transparent decision-making structure than existing alternatives, we’re interested in continuing to learn more about how and when it might make sense for the Optimism Collective to lean into Futarchy.

We believe this could make sense for decisions that:

  • Benefit from the wisdom of the crowds (concave decisions)
  • Need incentivized accuracy to draw out otherwise inaccessible information
  • Require scalability

The council structure may continue to be appropriate for decisions that:

  • Suffer from forced compromises among large groups of non-experts (convex decisions)
  • Require a high amount of historical or Optimism-specific context
  • Are supported by high-touch processes

Of course, a huge thank you to everyone who participated (forecasters — please take 5 mins to complete the forecaster experience survey here). While we still have much to learn about Futarchy in practice, you have helped us take a big step forward. As always, please feel free to drop any thoughts or suggestions in the comments below :folded_hands: :right_arrow_curving_down:


For the sake of understanding what drove these numbers, could you please also see what they look like in other reference currencies? Euros and ETH would both be useful.

It’s possible that a lot of this is price appreciation within a DEX or even currency effects. For example, the USD is down a full 10% vs the Euro from the opening of Futarchy to today.

It also looks like the Futarchy market was burdened not just with choosing directionally, but also with predicting actual numbers. It looks like it predicted a ~$239m TVL increase, against a realized gain of ~$31m.

Finally, it’s also important to remember that the Grants Council’s expertise is in sourcing, filtering, and closely evaluating specific grant plans. This had strong results.

Contrast with Futarchy, where Grants Council was provided a list of projects without details on what those projects would do with any grant. In fact, two of the Grants Council’s selections apparently had no way to actually accrue TVL according to the methodology, yet were still included on the list of choices. This highlights the low-information environment the GC was asked to select in, when the GC is normally tasked with selecting from a high-information environment, which is just a different skill set.

In short, there was no coherent selection process on the Grants Council side. Speaking only for ourselves, we did not even vote for a full slate of choices because we were essentially presented with names on a spreadsheet without much else, opting to leave voting slots blank. With no upside, no downside, and no information, we would stress that the Grants Council makes a poor control group for this experiment.

The better control would probably be random selection from the list at the start of the season?

Anyway, interesting data, but it’s hard to draw many direct conclusions given the limitations. We would recommend trying to figure out how to mitigate those limitations prior to running another round.


> Contrast with Futarchy, where Grants Council was provided a list of projects without details on what those projects would do with any grant. In fact, two of the Grants Council’s selections apparently had no way to actually accrue TVL according to the methodology, yet were still included on the list of choices. This highlights the low-information environment the GC was asked to select in, when the GC is normally tasked with selecting from a high-information environment, which is just a different skill set.

> In short, there was no coherent selection process on the Grants Council side. Speaking only for ourselves, we did not even vote for a full slate of choices because we were essentially presented with names on a spreadsheet without much else, opting to leave voting slots blank. With no upside, no downside, and no information, we would stress that the Grants Council makes a poor control group for this experiment.

To address this element of the experiment: the Grants Council was given full autonomy as to how to run this application process (for the projects also participating in the Futarchy contest), and the Foundation advised the Grants Council to run the application process for these projects as the Council usually does. That was, to your point, because the Grants Council usually does have access to more information-rich applications, which is part of what we wanted to compare between the different methods of decision making. If the Grants Council chose to run the application process for these projects any differently than usual, that was a decision made by the Council. We agree that this would reduce the difference in access to information between the two methods, which is something we were interested in observing.

> The Grants Council’s expertise is in sourcing, filtering, and closely evaluating specific grant plans.

To underscore what the findings above indicate, I agree that the ability to source, provide higher-touch service, build relationships, and negotiate individual grant terms/plans is a strength of the council method of decision making. Futarchy has different strengths (and different weaknesses) and, therefore, it is likely each method is most effectively applied to different types of decision making.

Finally, we intentionally partnered with Uniswap on this experiment to allow us to learn more when they launch their version shortly. They were able to refine their experiment based on learnings from ours (they changed the prediction question, etc.) and we’ll be able to observe how their experiment performs without the constraint of play money. All of the shared learnings will also be considered when evaluating what a v2 might look like and if it makes sense in light of the given set of constraints.
