Experimenting with Futarchy* for Optimism Grant Allocation Decisions

Participating in the futarchy has been a really fun and interesting experience so far. :slightly_smiling_face:

I may add some reflections later, but for now:

Can you share a ballpark figure for how much Superchain TVL increase the Grants Council expects from the different grantees?

It would be very interesting to see how these expert evaluations compare with those of the futarchy.

4 Likes

Hi all – lajarre from Butter here.

Thanks to all the participants!

The code used to compute the 24-hour TWAP values is in this GitHub repo.

This implements the usual Uniswap v2 TWAP calculation partially off-chain for a 24-hour window ending on March 20 at 12:00 UTC.
It then transforms the resulting value into a 0-100% value using the invariant UP + DOWN = 1.
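
For intuition, here is a minimal sketch of that computation (illustrative only, not the repo's actual code), assuming Uniswap v2 style cumulative-price observations at the start and end of the window:

```python
# Minimal sketch (not the repo's actual code): Uniswap v2 style TWAP from two
# cumulative-price observations, then normalisation to a 0-100% value using
# the UP + DOWN = 1 invariant.

def twap(cum_start: float, cum_end: float, t_start: int, t_end: int) -> float:
    """Time-weighted average price over [t_start, t_end].

    Uniswap v2 pairs accumulate price * elapsed_seconds on every update, so
    the average price over a window is the difference between two cumulative
    observations divided by the elapsed time.
    """
    return (cum_end - cum_start) / (t_end - t_start)

def up_percentage(twap_up: float, twap_down: float) -> float:
    """Map the UP and DOWN TWAPs to a 0-100% value.

    A full set of conditional tokens satisfies UP + DOWN = 1, so the
    normalised UP price can be read as a percentage.
    """
    return 100 * twap_up / (twap_up + twap_down)

# Purely illustrative numbers for a 24-hour window:
WINDOW = 24 * 60 * 60
up = twap(0.0, 0.62 * WINDOW, 0, WINDOW)    # UP averaged ~0.62 over the day
down = twap(0.0, 0.38 * WINDOW, 0, WINDOW)  # DOWN averaged ~0.38
print(f"{up_percentage(up, down):.1f}%")    # -> 62.0%
```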

The repo includes the resulting files for transparency, in the ./data folder.
Happy to provide some more explanations if needed!

6 Likes

Already forecasted, hoping to be among the top!

1 Like

We’d like to share some reflections on the recent futarchy experiment.

Overall, we found it to be an engaging and promising initiative. It was enjoyable to participate in, and it represents an interesting first step toward leveraging futarchy to inform decision-making in the Collective.

Using TVL as the main evaluation metric aligned well with the goals of the Season, particularly the intention to grow the Superchain’s TVL. We appreciated that the approach focused directly on that metric.

We’re supportive of the use of futarchy for future rounds, but we believe it would benefit from clearer participation criteria. We share the view that having a diverse set of perspectives contributes positively to the process:

While inclusivity is a core value in Optimism, fully open participation can sometimes lead to lower-quality outcomes. For example, some TVL predictions seemed overly optimistic: protocols with around $1M in Superchain TVL received forecasts of 4x growth within just a few weeks, which felt somewhat unrealistic. Additionally, the lack of barriers to entry made it easy for teams to coordinate votes in favor of their own protocol, which could distort results.

We think it’s important to consider introducing mechanisms that ensure participants have sufficient context and alignment with the goals of the Collective. It might also be useful to analyze what percentage of participation came from Superchain-aligned builders, bootstrappers, and enthusiasts to better understand the level of meaningful engagement. We’re curious whether any participant composition criteria were considered during the selection process. For instance, was there a target ratio based on participant profiles, such as 50% Superchain builders, 35% bootstrappers, and 15% enthusiasts? Understanding whether such balances were taken into account could help improve future iterations of the experiment.

It’s understandable that setting up the smart contracts for this type of experiment involves significant technical complexity, but from the user’s perspective, having to sign six transactions just to interact with a single candidate project feels excessive. That said, the interface showed noticeable progress over time, which we appreciated.

We truly appreciate that the experiment was conducted on Unichain, where fees are negligible, but even so, we found this to be the clearest pain point in the entire experience.

In short, we see a lot of potential in this experimental approach. With a few refinements in participation structure and a bit more friction to encourage thoughtful engagement, futarchy could become a valuable tool in the Collective’s governance toolkit.

2 Likes

For reference, copying over from Telegram the link to Butter’s Notion doc, which includes the starting Superchain TVLs for each protocol as of March 20th, based on the 7-day trailing average.

1 Like

Thanks for sharing your feedback on the experiment.

To me, it seems that your question about “participant composition” was meant to be addressed via the weights, i.e. how many OP-PLAY tokens different people started with. While anyone could participate, known Citizens, Retro Funders, etc., were given more OP-PLAY to start with and therefore had a larger influence. More details in @elizaoak’s initial post.

  • All forecasters will have the opportunity to influence additional grant decisions
  • Optimism governance contributors with OP attestations (including Citizens, Top 100 Delegates, Council and Committee members, Retro Funding Guest Voters, and NumbaNERDs) are eligible for higher amounts of PLAY tokens (voting power)

However, it’s not clear that those closer to Optimism in the past or with additional attestations have better information when it comes to these predictions.

So far, some of the initial best performers (traders) on the leaderboard didn’t have any extra attestations and only started with 50 OP-PLAY. But we’ll see how that holds when we get the final results!

1 Like

The calculations for the leaderboard are a bit of a mystery to me. It seems like half the time the results come out the way I would expect them to, and half the time they don’t. This time they are really confusing.

Right now, we are only a few days into the evaluation period, but as of this moment, none of the protocols have reached the goals that the futarchy had set for them.

Since I have only shorts, I would expect to have only profits, no losses - but according to the leaderboard I have a 46% loss. How is that possible?

I would truly appreciate it if someone would explain the maths, or share a link to the official calculation. Maybe @elizaoak ?

As a general piece of feedback, I think it would be helpful in any future repetitions of such an experiment to openly share all algorithms used to calculate PnL for participants. This could help raise the general level of understanding of how the futarchy works, leading to more reliable voter behavior. My sense is that, this time, a lot of people may have been making predictions without a clear understanding of what their bets ‘mean’.

2 Likes

@Michael should be able to help explain leaderboard calcs

2 Likes

Hey @joanbp,

Responded in TG but will respond here for visibility.

The main point is that the leaderboard hasn’t been updated since trading stopped, and it was using the market price, which was set by users (not based on fundamental TVL data).

For the future, I was considering adding some kind of profit estimation using the DefiLlama TVL data, so we could see, as time goes on, who is “winning”. But that hasn’t happened yet.

A secondary point: when I pushed the update to integrate the daily protocol TVL growth updates, it reset the leaderboard data to an older timestamp. That is fixed now; the current data shown on the leaderboard is based on market predictions at market close.
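
For intuition on why a short can show a loss before resolution, here is a rough mark-to-market sketch (illustrative only, not the actual leaderboard code), assuming positions are simply valued at the closing market price:

```python
# Rough mark-to-market sketch (not the actual leaderboard code): positions are
# valued at the market price when trading closed, not at the eventual TVL
# outcome, so a DOWN position bought above the closing DOWN price shows a
# paper loss even if the metric later resolves DOWN.

from dataclasses import dataclass

@dataclass
class Position:
    token: str        # "UP" or "DOWN"
    quantity: float   # number of conditional tokens held
    cost: float       # PLAY spent to acquire them

def mark_to_market_pnl(pos: Position, closing_price: dict) -> float:
    """Unrealised PnL = value at the closing market price minus cost."""
    return pos.quantity * closing_price[pos.token] - pos.cost

# Purely illustrative numbers:
close = {"UP": 0.65, "DOWN": 0.35}                         # prices at market close
short = Position(token="DOWN", quantity=100.0, cost=65.0)  # DOWN bought at ~0.65
print(mark_to_market_pnl(short, close))                    # -30.0, i.e. ~46% of cost
```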

2 Likes

I thought I was using an extreme example to make a point in this post.

But it’s becoming reality.

The ever-lower OP token price is disrupting the Futarchy experiment more and more.

A cool, related thread from Shutter Network:

GM! :v:

Here are some thoughts that arose during the process of participating in the Futarchy experiment. Better late than never, right? :sweat_smile:

Of course, I join in the congratulations and widespread celebration of us exploring this prediction mechanism, which I’m sure will bring many contributions to the OP collective and the web3 ecosystem in general. :clap: :clap: :clap:

Here are three tensions and their corresponding suggestions from my humble perspective:

1.- Make it painful to spend money.

  • Tension: I found it somewhat counterintuitive to have a set amount of PLAY tokens for each project. I think it gave me a sense of false abundance that I wouldn’t have experienced with a single limited budget.

  • Suggestion: Have a fixed amount that everyone decides how to allocate. Of course, that would require the ability to “sell” an UP/DOWN token within a project’s market to take profit and then jump to another market with the proceeds.

  • Bonus: Instead of UP/DOWN tokens (Polymarket style), it could be just a token with the project’s name that would start at the price of the project’s own estimate of its growth. I think that would make it more intuitive and easier for forecasters to understand.

2.- Make the prediction happen before defining the UP/DOWN ratio

  • Tension: When the experiment began, there wasn’t much published information about the projects and how they would use the resources to increase TVL. For this reason, my first UP/DOWN purchases were based more on rebalancing prices that I felt were exaggerated on both sides. In other words, I felt like I was betting on the entire project (and not analyzing how it would use the 100K grant). This, I think, means more mature projects with a higher level of mindshare tend to be evaluated more favorably, which, ironically, may make them the ones that need the grant the least.

  • Suggestion: It might be interesting if the platform’s UX prompted you to review the projects’ proposals (one by one) and asked forecasters to predict whether or not each will increase TVL (with that prediction translating into the number of UP/DOWN tokens to buy), more like a blind bet, instead of starting from the current market price.

  • Bonus: I even think the initial market TWAP should be set by the projects themselves (in their proposal/info disclosure) instead of everyone starting with the same (50M) number. This way, one could analyze whether the project is under/overvaluing its TVL increase forecast. There could even be incentives for projects whose estimates are closer to reality (to prevent them from setting a very high initial amount).

3.- Make whales have less influence on price changes.

  • Tension: At some point, I realized how much influence I had on the market (if I swapped 100% of my UP tokens for DOWN tokens), giving me the feeling of being a whale (even though I started with 100 PLAY). I can already imagine what it is like for those who started with 500. At one point, I hesitated to sell because that would change the top five, which I find unrealistic in a market with so many participants. Furthermore, a ‘degen’ attitude could exploit this, looking only for personal gain (by exploiting that chaos) regardless of who wins the grant (and benefits the ecosystem).

  • Suggestion: Add extra (passive) liquidity to dampen such abrupt movements. I assume that would effectively cap the maximum percentage of influence any single forecaster could have on the decision (see the rough sketch after this list).

  • Bonus: I also thought it would be interesting, for added realism, to include bots or players that act more randomly. I understand that Robin Hanson himself recognizes these actors as “noise traders” (the same ones that exist in real markets).
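
To illustrate the passive-liquidity point, here is a rough sketch (assuming the markets use Uniswap v2 style constant-product pools, consistent with the v2 TWAP mentioned earlier in the thread; the numbers are purely illustrative): the deeper the pool, the less a single large swap moves the price.

```python
# Rough sketch (assumed constant-product pools; illustrative numbers only):
# the deeper the passive liquidity, the less one large swap moves the price.

def price_after_swap(reserve_in: float, reserve_out: float, amount_in: float) -> float:
    """Spot price of the output token (in units of the input token) after a
    swap into a constant-product pool, ignoring fees for simplicity."""
    k = reserve_in * reserve_out
    new_in = reserve_in + amount_in
    new_out = k / new_in
    return new_in / new_out

def price_impact(pool_depth: float, amount_in: float) -> float:
    """Relative price move caused by one swap into a balanced (equal-reserve)
    pool, whose spot price starts at 1."""
    return price_after_swap(pool_depth, pool_depth, amount_in) - 1.0

# The same 100-token swap, against pools of increasing depth:
for depth in (1_000.0, 10_000.0, 100_000.0):
    print(f"pool depth {depth:>9,.0f}: price impact {price_impact(depth, 100.0):+.2%}")
    # -> roughly +21%, +2%, and +0.2% respectively
```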


Of course, I assume that some of these suggestions have technical limitations I’m unaware of, or may be based on misunderstandings. I hope they are read as an honest account of my experience as a participant, shared with the intention of making things clearer in future iterations.

Needless to say, I welcome any comments or corrections. :nerd_face:

1 Like

In principle, making TVL predictions through Futarchy from within Japan could be considered gambling and may lead to criminal liability.
It should be fine if only valueless tokens, like experimental tokens, are being staked.

However, in actual production use, it’s important to be aware of the number of countries where this might be illegal. If that number is significant, it would be necessary to issue appropriate warnings.

3 Likes