Experimenting with Futarchy* for Optimism Grant Allocation Decisions

Participating in the futarchy has been a really fun and interesting experience so far. :slightly_smiling_face:

I may add some reflections later, but for now:

Can you share a ballpark figure for how much Superchain TVL increase the grant council expects from the different grantees?

It would be very interesting to see how these expert evaluations compare to those of the futarchy.

5 Likes

Hi all – lajarre from Butter here.

Thanks to all the participants!

The code used to compute the 24-hour TWAP values is in this GitHub repo.

This implements the usual Uniswap v2 TWAP calculation partially off-chain for a 24-hour window ending on March 20 at 12:00 UTC.
It then transforms the resulting value into a 0-100% value using the invariant UP + DOWN = 1.
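
In rough terms, and purely as an illustration (this is not the actual repo code; the function names and the assumption that the TWAP is quoted as UP priced in DOWN are mine), the pipeline looks like this:

```python
# Minimal sketch of a Uniswap-v2-style TWAP turned into a 0-100% value.
# Assumes the accumulator values are already decoded from the pool's
# UQ112.112 fixed-point format and that price is quoted as UP in DOWN.

def v2_twap(cum_start: float, cum_end: float, t_start: int, t_end: int) -> float:
    """Time-weighted average price over [t_start, t_end] from price accumulators."""
    return (cum_end - cum_start) / (t_end - t_start)

def to_percent(up_price_in_down: float) -> float:
    """Map the UP/DOWN price ratio to 0-100% using the invariant UP + DOWN = 1."""
    return 100.0 * up_price_in_down / (1.0 + up_price_in_down)

# Example: if UP traded at an average of 1.5 DOWN per UP over the 24h window,
# the market's implied value is 60%.
print(to_percent(v2_twap(0.0, 1.5 * 86_400, 0, 86_400)))  # 60.0
```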

The repo includes the resulting files for transparency, in the ./data folder.
Happy to provide some more explanations if needed!

7 Likes

Already forecasted, hoping to be among the top.

2 Likes

We’d like to share some reflections on the recent futarchy experiment.

Overall, we found it to be an engaging and promising initiative. It was enjoyable to participate in, and it represents an interesting first step toward leveraging futarchy to inform decision-making in the Collective.

Using TVL as the main evaluation metric aligned well with the goals of the Season, particularly the intention to grow the Superchain’s TVL. The approach focused directly on that metric, which we appreciated.

We’re supportive of the use of futarchy for future rounds, but we believe it would benefit from clearer participation criteria. We share the view that having a diverse set of perspectives contributes positively to the process:

While inclusivity is a core value in Optimism, fully open participation can sometimes lead to lower-quality outcomes. For example, there were TVL predictions that seemed overly optimistic—some protocols with around $1M in Superchain TVL received 4x growth forecasts in just a few weeks, which felt somewhat unrealistic. Additionally, the lack of barriers to entry made it easy for teams to coordinate votes in favor of their own protocol, which could distort results.

We think it’s important to consider introducing mechanisms that ensure participants have sufficient context and alignment with the goals of the Collective. It might also be useful to analyze what percentage of participation came from Superchain-aligned builders, bootstrappers, and enthusiasts to better understand the level of meaningful engagement. We’re curious whether any participant composition criteria were considered during the selection process. For instance, was there any target ratio based on participant profiles, such as 50% Superchain builders, 35% bootstrappers, and 15% enthusiasts? Understanding whether such balances were taken into account could help improve future iterations of the experiment.

It’s understandable that setting up the smart contracts for this type of experiment involves significant technical complexity, but from the user’s perspective, having to sign six transactions just to interact with a single candidate project feels excessive. That said, the interface showed noticeable progress over time, which we appreciated.

We truly appreciate that the experiment was conducted on Unichain, where fees are negligible, but even so, we found this to be the clearest pain point in the entire experience.

In short, we see a lot of potential in this experimental approach. With a few refinements in participation structure and a bit more friction to encourage thoughtful engagement, futarchy could become a valuable tool in the Collective’s governance toolkit.

4 Likes

For reference, copying over from Telegram the link to Butter’s Notion doc, which includes the starting Superchain TVLs for each protocol as of March 20th, based on the 7-day trailing average.

2 Likes

Thanks for sharing your feedback on the experiment.

To me, it seems that your question about “participant composition” was meant to be addressed via the weighting of how many OP-PLAY tokens different people started with. While anyone could participate, known Citizens, Retro Funders, etc., were given more OP-PLAY to start with, and therefore had a larger influence. More details in @elizaoak’s initial post.

  • All forecasters will have the opportunity to influence additional grant decisions
  • Optimism governance contributors with OP attestations (including Citizens, Top 100 Delegates, Council and Committee members, Retro Funding Guest Voters, and NumbaNERDs) are eligible for higher amounts of PLAY tokens (voting power)

However, it’s not clear that those closer to Optimism in the past or with additional attestations have better information when it comes to these predictions.

So far, some of the initial best performers (traders) on the leaderboard didn’t have any extra attestations and only started with 50 OP-PLAY. But we’ll see how that holds when we get the final results!

2 Likes

The calculations for the leaderboard are a bit of a mystery to me. It seems like half of the time the results come out the way I would expect them to, and half of the time they don’t. This time they are really confusing.

Right now, we are only a few days into the evaluation period, but as of this moment, none of the protocols have reached the goals that the futarchy had set for them.

Since I have only shorts, I would expect to have only profits, no losses - but according to the leaderboard I have a 46% loss. How is that possible?

I would truly appreciate it if someone would explain the maths, or share a link to the official calculation. Maybe @elizaoak ?

As a general piece of feedback, I think it would be helpful in any future repetitions of such an experiment to openly share all algorithms used to calculate PnL for participants. This could help raise the general level of understanding of how the futarchy works, leading to more reliable voter behavior. My sense is that, this time, a lot of people may have been making predictions without a clear understanding of what their bets ‘mean’.

4 Likes

@Michael should be able to help explain leaderboard calcs

3 Likes

Hey @joanbp,

Responded in TG but will respond here for visibility.

The main point is that the leaderboard hasn’t been updated since trading stopped, and it was using the market price, which was set by users (not based on fundamental TVL data).

In the future I was considering adding some kind of profit estimation using the DefiLlama TVL data, so we could see, as time goes on, who is “winning”. But that hasn’t happened yet.

The secondary point is that when I pushed the update to integrate the daily protocol TVL growth updates, it reset the leaderboard data to an older timestamp. That is fixed now; the current data shown on the leaderboard is based on market predictions at market closing.
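
To make the distinction concrete, here is a rough sketch of the two PnL views (illustrative only, with placeholder names; this is not the actual leaderboard code):

```python
# Two ways to value the same positions: mark-to-market (what the leaderboard
# showed, using the last user-set market prices) vs. settlement against
# fundamental TVL data. A short-only book can show a loss in the first view
# even if the second view would eventually favor it.

def mark_to_market_pnl(positions: dict[str, float],
                       market_prices: dict[str, float],
                       cost_basis: float) -> float:
    """Value each token at its last traded market price."""
    value = sum(qty * market_prices[token] for token, qty in positions.items())
    return value - cost_basis

def settlement_pnl(positions: dict[str, float],
                   resolved_values: dict[str, float],
                   cost_basis: float) -> float:
    """Value each token at its resolution value (e.g. derived from final TVL)."""
    value = sum(qty * resolved_values[token] for token, qty in positions.items())
    return value - cost_basis

# Example: 100 DOWN tokens bought for 50 PLAY. The market still prices DOWN
# at 0.30, so mark-to-market shows a loss, even though a resolution value of
# 0.80 would turn it into a profit.
pos = {"DOWN": 100.0}
print(mark_to_market_pnl(pos, {"DOWN": 0.30}, 50.0))  # -20.0
print(settlement_pnl(pos, {"DOWN": 0.80}, 50.0))      # 30.0
```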

3 Likes

I thought I was using an extreme example to make a point in this post.

But it’s becoming reality.

The steadily falling OP token price is disrupting the Futarchy experiment more and more.

1 Like

A cool, related thread from Shutter Network:

1 Like

GM! :v:

Here are some thoughts that arose during the process of participating in the Futarchy experiment. Better late than never, right? :sweat_smile:

Of course, I join in the congratulations and widespread celebration of us exploring this prediction mechanism, which I’m sure will bring many contributions to the OP collective and the web3 ecosystem in general. :clap: :clap: :clap:

Here are three tensions and their corresponding suggestions from my humble perspective:

1.- Making it painful to spend money.

  • Tension: I found it somewhat counterintuitive to have a set amount of Play tokens for each project. I think it gave me a false sense of abundance that I wouldn’t experience if I had a limited amount of money.

  • Suggestion: Have a fixed amount that everyone decides how to use. Of course, that would require the ability to “sell” an up/down token within a project’s market to seek profit and then jump to another market with the proceeds.

  • Bonus: Instead of UP/DOWN tokens (Polymarket style), it could be just a token with the project name that would start at the price of the project’s own estimate of its growth. I think that would make it more intuitive and easier for forecasters to understand.

2.- Make the prediction happen before defining the UP/DOWN ratio

  • Tension: When the experiment began, there wasn’t much published information about the projects and how they would use the resources to increase TVL. For this reason, my first UP/DOWN purchases were based more on balancing prices that I felt were exaggerated on both sides. In other words, I felt like I was betting on the entire project (and not analyzing how it would use the 100K grant). This, I think, favors more mature projects with a higher level of mindshare in the evaluation; ironically, those may be the ones that need the grant the least.

  • Suggestion: It might be interesting if the platform’s UX prompted you to review the projects’ proposals (one by one) and asked forecasters to predict whether or not this will increase the TVL (with that translating into the number of UP/DOWN tokens to buy), more like a blind bet, instead of starting from the current market price.

  • Bonus: I even think the initial market TWAP should be set by the projects themselves (in their proposal/info disclosure) instead of everyone starting with the same (50M) number. This way, one could analyze whether the project is under/overvaluing its TVL increase forecast. There could even be incentives for projects whose estimates are closer to reality (to prevent them from setting a very high initial amount).

3.- Make whales have less influence on price changes.

  • Tension: At some point, I realized how much influence I had on the market (if I swapped 100% of my UP tokens for DOWN tokens), giving me the feeling of being a whale (even though I started with 100 PLAY). I can already imagine what it is like for those who had 500. At one point, I hesitated to sell because that would change the top five, which I find unrealistic in a market with so many participants. Furthermore, a ‘degen’ attitude could exploit this, looking only for personal gain (by using that chaos) regardless of who wins the grant (and benefits the ecosystem).

  • Suggestion: Add extra (passive) liquidity to prevent such abrupt movements; see the toy sketch after this list. I assume that would also define the maximum percentage of influence a forecaster could have on the decision.

  • Bonus: I also thought it would be interesting, to add more realism, to include bots or players that act more randomly. I understand that Robin Hanson himself recognizes these actors as “noise traders” (the same ones that exist in the real market).
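
As a toy illustration of the liquidity point (all numbers are made up, and it assumes the markets behave like a constant-product UP/DOWN pool), the same swap moves the implied price far less when extra passive liquidity is seeded:

```python
# Toy x*y=k example: price impact of one swap in a thin pool vs. a pool
# seeded with 10x passive liquidity. Pool sizes and the swap amount are
# illustrative, not taken from the actual markets.

def implied_up_value_after_swap(up_reserve: float, down_reserve: float,
                                down_in: float) -> float:
    """Swap `down_in` DOWN into the pool, then return the implied UP value
    (UP priced in DOWN, mapped to 0-1 via UP + DOWN = 1)."""
    k = up_reserve * down_reserve
    new_down = down_reserve + down_in
    new_up = k / new_down
    price = new_down / new_up        # DOWN per UP after the swap
    return price / (1.0 + price)

thin = implied_up_value_after_swap(500.0, 500.0, 100.0)    # small pool
deep = implied_up_value_after_swap(5000.0, 5000.0, 100.0)  # 10x liquidity
print(round(thin, 3), round(deep, 3))  # ~0.59 vs ~0.51, from a 50% start
```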


Of course, I assume that some of these suggestions have technical limitations that I’m unaware of, or may be based on misunderstandings. I hope they are read as an honest account of my experience as a participant, offered with the intention of improving future iterations.

Needless to say, I welcome any comments or corrections. :nerd_face:

3 Likes

In principle, making TVL predictions through Futarchy from within Japan could be considered gambling and may lead to criminal liability.
It should be fine if only valueless tokens, like experimental tokens, are being staked.

However, in actual production use, it’s important to be aware of the number of countries where this might be illegal. If that number is significant, it would be necessary to issue appropriate warnings.

5 Likes

:trophy: Announcing Forecaster Rewards

UPDATED June 26, 2025: Results table corrected due to a calculation error where the rewards script incorrectly processed percent profit values as decimals (e.g., 790% read as 7.9 instead of 790), significantly underweighting the percentage profit component. Rankings and reward distributions have been adjusted accordingly in the table below. The methodology and weighting (25% nominal / 75% percentage) remain unchanged as originally announced. The original Twitter announcement reflects outdated results.

Thank you again to everyone who participated in our Futarchy Grants Contest, which officially wrapped up at the end of Optimism Governance Season 7 on June 12 with the play money markets closing. As announced in February prior to the experiment, “At the end of Season 7 (June 2025), top forecasters will be eligible for rewards, based on the accuracy of their predictions.”

We are happy to announce the following forecaster reward distribution:

:1st_place_medal:First place: 3K OP

:2nd_place_medal:Second place: 2K OP

:3rd_place_medal:Third place: 1K OP

:trophy:4th-30th place: 4K OP distributed based on prediction accuracy (combination of profit % and nominal profit)

These reward amounts are not indicative of rewards in future iterations.

| Final Ranking | Address | Nominal Profit | % Profit | Final Reward (OP) |
| --- | --- | --- | --- | --- |
| 1 | skydao.eth | 395.34 | 790.67% | 3,000.00 |
| 2 | 0x87b7f62ce23a8687eaf0e2c457ad0c22ca3554bf | 102.11 | 204.22% | 2,000.00 |
| 3 | beefybadger.eth | 86.37 | 172.74% | 1,000.00 |
| 4 | 0xe422d6c46a69e989ba6468ccd0435cb0c5c243e3 | 263.68 | 75.34% | 241.48 |
| 5 | galechus.eth | 61.80 | 123.59% | 213.31 |
| 6 | alexsotodigital.eth | 106.37 | 106.37% | 209.81 |
| 7 | launamu.eth | 242.30 | 53.85% | 199.15 |
| 8 | pgov.eth | 155.24 | 77.62% | 191.38 |
| 9 | brichis.eth | 223.78 | 49.73% | 183.92 |
| 10 | pumbi.eth | 91.16 | 91.16% | 179.81 |
| 11 | graven.eth | 51.94 | 103.87% | 179.27 |
| 12 | james.eth | 124.47 | 71.13% | 166.61 |
| 13 | mubaris.eth | 47.26 | 94.53% | 163.15 |
| 14 | 0x3750b281e14b0fcd4880399210ce03a7f570fbd7 | 47.04 | 94.09% | 162.39 |
| 15 | 0x8bb5bd528c067708a206cac909f13522d9390da0 | 45.11 | 90.22% | 155.71 |
| 16 | kapanv.eth | 41.77 | 83.55% | 144.20 |
| 17 | trsantos.eth | 38.60 | 77.19% | 133.23 |
| 18 | 0xfd1af514b8b2bf00d1999497668bff26ccdf4c8a | 37.88 | 75.76% | 130.76 |
| 19 | 0x5de5f5fea5af07d799a9eb41f436cdf79ccfa133 | 37.07 | 74.14% | 127.96 |
| 20 | 0x5bbdab0419f4a7811e02a0d3bdfadf3b5dbdeab0 | 35.43 | 70.85% | 122.28 |
| 21 | zenbit.eth | 34.69 | 69.39% | 119.76 |
| 22 | gr8collector.eth | 34.51 | 69.03% | 119.14 |
| 23 | ailadady.eth | 32.66 | 65.32% | 112.74 |
| 24 | 0xjean.eth | 31.66 | 63.32% | 109.28 |
| 25 | fenya.eth | 31.32 | 62.64% | 108.11 |
| 26 | cryptoleks.eth | 31.01 | 62.01% | 107.03 |
| 27 | 0x79b2562f934cfebaecdd643ed39165863c417b0d | 30.71 | 61.43% | 106.02 |
| 28 | 0x9080f29e6ad496ff50c398afc7326e192973d155 | 30.71 | 61.41% | 105.99 |
| 29 | 0x0936289d6c53b6907001f6f5060084de52edd99b | 30.07 | 60.13% | 103.78 |
| 30 | tekr0x.eth | 30.05 | 60.10% | 103.73 |

Note: Final results were updated on June 26 to correct a technical error in how percentage values were processed by our rewards calculation script. The methodology and weighting (25% nominal / 75% percentage) remain unchanged as originally announced.

We will be in touch with recipients via email to kick off the KYC and rewards delivery process.

To define prediction accuracy and determine these rankings, we computed a composite score called Combined Profit, which incorporates both nominal profit and percent profit. We weighted percent profit at 75% and nominal profit at 25% to reflect a balance between forecasting efficiency and overall impact on price.

This weighting was chosen to mitigate the outsized influence of participants who held more play tokens and thus had greater profit potential, while still acknowledging that scale carries some value. To recap, forecasters with select OP gov attestations were eligible for higher baseline amounts of play tokens, in an effort to proxy for higher conviction in a play money setting with no real money deposits to quantify conviction and expertise.

Percent profit serves as a proxy for accuracy relative to resources used, making it especially relevant in our context, where many high-performing forecasters had no access to OP attestations and thus couldn’t increase their play token allocation. Thus, Combined Profit is calculated as follows:

Combined Profit_i = 0.25(Nominal Profit_i) + 0.75(Percent Profit_i)

If someone is ranked first, second, or third by Combined Profit, they receive a fixed amount of 3K, 2K, or 1K OP respectively. For 4th-30th ranked forecasters, we calculate the reward share from the remaining pool of 4K OP based on Combined Profit:

Reward Share_i = (Combined Profit_i / Total Combined Profit_N) * 4000
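
As a minimal sketch of this calculation (placeholder inputs, not the actual results; percent profit is expressed in percentage points, e.g. 790.67 for 790.67%):

```python
# Sketch of the reward formula described above. Inputs are placeholders.
# Percent profit is in percentage points (e.g. 790.67 for 790.67%), per the
# June 26 correction note.

FIXED_REWARDS = {1: 3000.0, 2: 2000.0, 3: 1000.0}
POOL_4_TO_30 = 4000.0

def combined_profit(nominal: float, percent: float) -> float:
    """25% weight on nominal profit, 75% on percent profit."""
    return 0.25 * nominal + 0.75 * percent

def reward(rank: int, nominal: float, percent: float,
           total_combined_4_to_30: float) -> float:
    """Fixed prizes for the top 3; pro-rata share of the 4K OP pool otherwise."""
    if rank in FIXED_REWARDS:
        return FIXED_REWARDS[rank]
    share = combined_profit(nominal, percent) / total_combined_4_to_30
    return share * POOL_4_TO_30

# Example: a 10th-place forecaster with 100 nominal profit and 80% profit,
# in a 4th-30th cohort whose combined profits sum to 2000:
print(round(reward(10, 100.0, 80.0, 2000.0), 2))  # 170.0
```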

Note that for the Final Rewards calculations we used the final on-chain data directly from Butter; the Leaderboard PnL was indicative only. Final PnL calculations apply the on-chain CFM contract rules, which reward participants based on the final Superchain TVL Increase values observed on June 12th and reported on-chain via an oracle. Earlier leaderboard calculations from Butter’s script might have varied slightly because the TVL values were based on approximations. The Leaderboard has been updated to reflect the final on-chain values for Nominal Profit and % Profit.

And for everyone who participated: please take 5 minutes to complete this forecaster experience survey and help us improve future versions. Of course, don’t hesitate to share your thoughts on this experience, including the rewards calculation, and as always, stay optimistic! :red_circle::sparkles:

3 Likes

Congrats to all the winners, and well done to the other participants (there are no losers IMO). It was indeed a fun experience: engaging with the community, learning about the mechanics of futarchy, and following the progress until resolution.

However, at the risk of sounding entitled, the rewards from OP are quite disappointing. They don’t really suggest much interest or enthusiasm about the Futarchy experiment.

This was a well-publicised campaign, and rewards were a key part of that promise.

The participation process was demanding: analysis and research of 22 projects, voting Yes and No on them multiple times, and each transaction needing to be signed 6 times, over a month-long process. The signing process was so atrocious that, in my view, all 476 participants deserve at least 100 OP just for participating.

The winners haven’t fared much better: rewarding the top 30 of 476 whilst sharing only 4K OP amongst everyone outside the top 3 is flat-out wishy-washy. A lot more reward has gone out for minting $1 NFTs, so it’s probably just a matter of priorities.

A lot of people probably would have participated simply to explore a concept they were curious about, Futarchy. But to gamify it, heighten expectations of rewards, ask for constant check-ins over 3 months, and then not really deliver any reward feels a bit off.

It definitely feels like it might have been better to play with real money. But even then, as Robin Hanson points out in a recent article (will share), organisations seeking the help of a decision market to find the best path should in fact commit to providing substantial liquidity to the markets, as a form of expense in the quest for accuracy. The equivalent of such liquidity in a market with play tokens should be rewards that incentivise truthful and rigorous participation.

Perhaps we might need a Futarchy for “Futarchy Rewards”.

@Rb.E thanks for your thoughtful feedback. A futarchy for futarchy rewards does sound interesting.

We’re sorry to hear that you’re disappointed with the reward amounts, and we genuinely appreciate your participation. We were working within many non-negotiable constraints during this experiment, and the rewards budget was one of them.

We hope to be able to have much more flexibility on things like reward amounts if we run a future experiment.

Btw, you mentioned a recent Hanson article on the topic of liquidity and accuracy – please feel free to share it here!

1 Like