Retro Funding S7 – Onchain Builders – Eval Algos

Hey everyone, Carl from Open Source Observer here, sharing the evaluation algorithms for Retroactive Public Goods Funding (Retro Funding) Season 7: Onchain Builders. The Onchain Builders Mission seeks to reward protocols and dapps that demonstrate meaningful onchain usage, drive TVL growth, and support interop across the Superchain. See here for more on the mission’s objectives.

This season, Optimism’s Retro Funding is pioneering a “metrics-driven, humans-in-the-loop” approach. Instead of voting on projects, citizens vote on evaluation algorithms. Each algorithm represents a different strategy for rewarding projects. Algorithms will evolve in response to voting, community feedback, and new ideas over the course of the season.

Here’s a tl;dr of the three algorithms:

| Algorithm | Goal | Best For | Emphasis |
| --- | --- | --- | --- |
| Superscale | Reward clear-cut leaders | Large, established projects with high usage | Adoption (favors projects with high recent activity) |
| Acceleratooor | Prioritize fast-growing projects | New and emerging projects gaining traction | Growth (favors projects with the largest net increase in impact) |
| Goldilocks | Achieve a more balanced distribution | Consistently active projects with steady engagement | Retention (favors projects maintaining impact over time) |

Metrics Overview

Project-Level Metrics

Each project’s score is based on four key metrics:

  1. TVL. The average Total Value Locked (in USD) during the measurement period, focusing on ETH, stablecoins, and eventually other qualified assets.
  2. Transaction Counts. The count of unique, successful transaction hashes that result in a state change and involve one or more of a project’s contracts on the Superchain.
  3. Gas Fees. The total L2 gas fees (gas consumed * gas price) for all successful transactions that result in a state change and involve one or more of a project’s contracts on the Superchain.
  4. Monthly Active Users. The count of unique Farcaster IDs linked to addresses that initiated an event producing a state change with one or more of a project’s contracts.
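
For anyone who wants to poke at the data themselves, here is a rough Python sketch of how these four metrics could be aggregated from transaction-level data. The column names are made up for illustration; the actual pipeline lives in the OSS / Retro Funding repos.

```python
import pandas as pd

# Hypothetical schema, for illustration only:
#   txs: one row per successful, state-changing transaction touching a project contract,
#        with columns project_id, tx_hash, l2_fee_eth (gas consumed * gas price), farcaster_id
#   tvl: one row per project per day, with columns project_id, tvl_usd
def project_metrics(txs: pd.DataFrame, tvl: pd.DataFrame) -> pd.DataFrame:
    per_project = txs.groupby("project_id").agg(
        transaction_count=("tx_hash", "nunique"),          # unique successful tx hashes
        gas_fees=("l2_fee_eth", "sum"),                    # total L2 gas fees
        monthly_active_users=("farcaster_id", "nunique"),  # unique FIDs linked to senders
    )
    average_tvl = tvl.groupby("project_id")["tvl_usd"].mean().rename("average_tvl")
    return per_project.join(average_tvl, how="outer").fillna(0)
```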

Time-Based Variants

For each core metric (i.e., transactions, gas fees, TVL, user counts), we present three variants that compare the current period’s value against the previous period’s. Each algorithm weights these variants differently; a minimal code sketch of the variants follows the list.

  1. Adoption. The value for the current period only (e.g., the current month’s transaction count). This captures the current level of adoption.
  2. Growth. The positive difference between the current period and the previous period (e.g., net increase in TVL). Values are clipped at 0 if they decrease (no penalty for a decline, but no bonus, either).
  3. Retention. The minimum between the current period and the previous period (e.g., how much of last month’s TVL is still here). This variant rewards projects that post more consistent metrics over time.
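
In code, the three variants are tiny functions. Assuming `current` and `previous` hold a project’s value for one metric in the two measurement periods:

```python
def adoption(current: float, previous: float) -> float:
    # Current-period value only.
    return current

def growth(current: float, previous: float) -> float:
    # Net increase, clipped at zero: no penalty for a decline, but no bonus either.
    return max(current - previous, 0.0)

def retention(current: float, previous: float) -> float:
    # How much of last period's value is still here.
    return min(current, previous)

# Example: TVL drops from $2.0M to $1.5M.
# adoption(1.5e6, 2.0e6) -> 1,500,000; growth(...) -> 0; retention(...) -> 1,500,000
```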

Proposed Algorithms

We’ve developed an initial set of three candidate algorithms for evaluating impact. Each algorithm highlights different priorities: Superscale focuses on established, high-volume projects; Acceleratooor emphasizes rapid growth; and Goldilocks tries to balance adoption, growth, and retention.

We’ve included details on each algorithm below. We also have a Hex dashboard where you can see data on each algorithm.
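
To give a feel for the mechanics, here is a minimal sketch of how an algorithm turns the metric variants into a project score. The weights below are purely illustrative, not the weights used by Superscale, Acceleratooor, or Goldilocks; see the Hex dashboard and the repo for the real ones.

```python
# Illustrative weights only -- not the actual weights of any proposed algorithm.
EXAMPLE_WEIGHTS = {
    ("tvl", "adoption"): 0.25,
    ("transactions", "adoption"): 0.25,
    ("gas_fees", "growth"): 0.20,
    ("tvl", "retention"): 0.15,
    ("users", "retention"): 0.15,
}

def project_score(normalized: dict[tuple[str, str], float],
                  weights: dict[tuple[str, str], float] = EXAMPLE_WEIGHTS) -> float:
    """Weighted sum over (metric, variant) pairs, each pre-normalized to 0-1."""
    return sum(w * normalized.get(key, 0.0) for key, w in weights.items())
```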

Important: as projects are still signing up for the current measurement period, we simulated the results for each algorithm using past Retro Funding participants and historical data. The “top projects” below are based on applications from RF4 and data from Dec / Jan.

Superscale

This algorithm aims to reward projects with significant current usage and established impact. It places heavier weights on the most recent values of TVL and transaction metrics, and less weight on user numbers. This strategy aims to embody the philosophy of “it’s easier to agree on what was useful than what will be useful”. You should vote for this algorithm if you want to keep things simple and give the projects that are having the most impact right now the recognition they deserve.

Weightings & Sample Results

Top projects (using simulated data):

  1. Aerodrome
  2. Aave
  3. Virtuals Protocol
  4. Moonwell
  5. Zora

Acceleratooor

This algorithm seeks to reward projects experiencing rapid growth over the current measurement period. In particular, it emphasizes growth (i.e., net increases) in TVL and transaction volume. The goal is to spot breakout stars and accelerate them. This is a good algorithm to vote for if you want to create a strong signal for rising projects that the Superchain is the place to be.

Weightings & Sample Results

Top projects (using simulated data):

  1. Aave
  2. Aerodrome
  3. Virtuals Protocol
  4. Compound Finance
  5. Paragraph

Goldilocks

This algorithm seeks to evenly balance various aspects of impact. It places a moderate weight on each metric and prioritizes retention over sheer growth. The goal is to support steady, sustained contributions across the board rather than “spiky” projects that only excel in one area. This is a good algorithm to vote for if you want to support a wide range of projects.

Weightings & Sample Results

Top projects (using simulated data):

  1. Aerodrome
  2. Zora
  3. Aave
  4. Layer3
  5. LI.FI

How to Vote

Now it’s time for Citizens to choose which algorithm to use.

Here is the link to the Snapshot vote for Citizens (Badgeholders) to decide which algorithm(s) best aligns with our collective goals.

One of the proposed evaluation algorithms for the Onchain Builders mission will be chosen using approval voting. This means you can approve one or multiple options. For an algorithm to be selected, it must meet the following conditions:

  • Quorum: At least 30% (44) of badgeholders must participate in the vote.
  • Approval Threshold: An algorithm must receive approval from at least 51% of quorum (22 badgeholders).

The algorithm with the most approvals, provided it meets the thresholds, will be used for the remainder of Season 7.
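
In pseudo-code form, the selection rule looks roughly like this (using the quorum and approval numbers above):

```python
def winning_algorithm(approvals: dict[str, int], voters: int,
                      quorum: int = 44, approval_threshold: int = 22) -> str | None:
    """Return the selected algorithm, or None if the conditions above are not met."""
    if voters < quorum:  # at least 30% of badgeholders must participate
        return None
    winner = max(approvals, key=approvals.get)
    # The winner needs approval from at least 51% of quorum, i.e. 22 badgeholders.
    return winner if approvals[winner] >= approval_threshold else None
```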


To learn more about this mission, please see the Onchain Builders Mission Details. We also have more in-depth documentation on these algorithms in the Retro Funding repo.

If you have ideas on refining these models, let us know in the comments. And if you’d like to contribute to making these algorithms better, you can submit a PR or open an issue here.

We’re excited to see what the community thinks and will keep iterating based on your feedback!


Hi Carl

Thank you for making these things so accessible.

Looking at the hex dashboard, would I be right to think that all three proposed algorithms produce highly similar overall distributions?

As I understand it, of the 133 projects in your simulation, Aerodrome, Aave, Virtuals Protocol, Zora, Layer3 and LI.FI would all be in the top 10 no matter what and receive 30% of the round budget (all of them hitting the round cap of 5%)?

So, really, the choice of the voters will mostly affect the distribution between the projects that are in the top half, but not in the absolute top, is that correct?


Hi Joan

First, let me emphasize that:

  • this is a simulation based on January data and applications from RF4, and
  • there will be 6 measurement periods, so even if the algorithms stayed exactly the same (which they won’t), we should expect significant movement in the top projects from month-to-month

That said, projects that post two consecutive months of strong growth will do well under any of the three algorithms. The projects you mentioned definitely fit that description. And, generally, if a project can keep that streak going through July, it should do very well in the overall mission (eg, get 5% of the total pool).

Looking at the hex dashboard, would I be right to think that all three proposed algorithms produce highly similar overall distributions?

Yes, all three algos do produce a “capped power law” distribution. However, the graph looks a bit compressed on mobile. You can see more of the variance on desktop. As an example, the 30th project gets 50% more funding under one of the simulated algos.

If you plot the Y axis on log scale, you can see the relative differences more clearly – and they get bigger as you go further down the curve.

So, really, the choice of the voters will mostly affect the distribution between the projects that are in the top half, but not in the absolute top, is that correct?

I don’t think so. Consider a project that grows really fast from a low base – it should do well in 2/3 of the algos (Acceleratooor and Superscale) but poorly in Goldilocks. Then consider a project that has high numbers, but no growth in the current month – it should do well in 2/3 of the algos (Superscale and Goldilocks) but poorly in Acceleratooor.


Thank you! That’s very helpful. 🙂


Perhaps this is the wrong place to be bringing this up as a general point, but it’s relevant, so will do it here.

Why is there such emphasis on Farcaster? It has become a gating mechanism to access many parts of governance. Not only is this poor UX, but it undermines the idea that Optimism and the Superchain will maintain credible neutrality. There are no other instances where users or contributors are forced to utilize a specific chain and/or protocol in order to access governance.

Specific to this case, what is the problem that reliance on Farcaster association is solving?


This probably warrants a separate thread, but from a retro funding perspective let me say the following:

  • user numbers have the lowest weight overall across algos
  • we are happy to use any other user-labeling model built off the superchain data or other public datasets
  • this is a great area for data scientists to contribute; the best place to start is to open an issue here and link a notebook or query that demonstrates the idea

Ok, but you still didn’t answer

It’s 10%-25% of the project-level score, and presumably wasn’t added by accident. What is the problem that is solved for by using Farcaster IDs? Maybe it’s the right answer, but it’s not clear what the question is.

What is the problem that is solved for by using Farcaster IDs? Maybe it’s the right answer, but it’s not clear what the question is.

The intent is to have several measures of network activity, including one that captures user numbers. All other things being equal, $1M in TVL deposited by a single address with no history is less impactful than $1M in TVL deposited by a bunch of users with some prior reputation.

Options for estimating users include:

  1. Using addresses
  2. Using something else linked to addresses (eg, FIDs)
  3. Using a filtered address model or combination of models (eg, removing likely Sybils, bots, etc)

There was strong pushback against 1, so we started with 2. We’d love to see more work on 3.
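
To make the three options concrete, here is a rough sketch with hypothetical column names (this is not the production query):

```python
import pandas as pd

# events: one row per state-changing event, with the initiating address and, where
# available, a linked Farcaster ID. Column names are made up for illustration.
def user_counts(events: pd.DataFrame, flagged_addresses: set[str]) -> dict[str, int]:
    return {
        "addresses": events["from_address"].nunique(),                 # option 1
        "farcaster_users": events["farcaster_id"].dropna().nunique(),  # option 2
        "filtered_addresses": events.loc[                              # option 3
            ~events["from_address"].isin(flagged_addresses), "from_address"
        ].nunique(),
    }
```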

Why would that be the case? Is impact not equal to economic activity?

If you’re trying to estimate users, though, relying on a database with fairly low user numbers seems pretty strange. Farcaster captures a very small number of users.

Also, why would a Farcaster account be particularly suited to identifying unique users? They’re easy to create. We created one specifically to get our OP Atlas identifier, and it’s done zero activity beyond account creation. That is separate from another product-specific Farcaster account we have. (Note that Farcaster has been, to some extent, polluted by requiring anyone accessing OP governance to create throwaway accounts.)

Not trying to be difficult. Just don’t understand where you’re coming from yet, and trying to get there from here.


Haha no worries, appreciate the back and forth.

I think there are two distinct questions here. Let me unpack them.

1. Why are we considering user numbers in the first place?

In RF4, many citizens cared about capturing user numbers somehow. So we’re starting this mission with some weight given to a user metric. Perhaps there’s a future iteration that includes an algo with zero weight to users.

2. If there has to be a user metric, why are we using Farcaster IDs?

Yep it’s easy to spin up a lot of Farcaster accounts (though it costs $7 a pop, so not a particularly cheap attack vector). It’s even easier (and costs $0) to spin up a lot of wallet addresses. As I explained earlier, we’re simply starting somewhere. I would love to see more robust, non-Farcaster based models get proposed.

One technical point to clarify is that we are using Farcaster IDs as a feature in a model. Every feature gets normalized to 0-1, so the actual number is not the important thing. It’s the distribution that matters. Our hope is that active Farcaster users are a better proxy for actual users than active addresses.
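
For example, a simple min-max scaling (one common way to do the 0-1 normalization; not necessarily the exact transform we use) looks like this:

```python
import pandas as pd

def normalize_0_1(feature: pd.Series) -> pd.Series:
    """Scale a feature across projects to 0-1 so only the distribution matters."""
    lo, hi = feature.min(), feature.max()
    if hi == lo:
        return pd.Series(0.0, index=feature.index)
    return (feature - lo) / (hi - lo)
```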

We can also compare different user models side-by-side and see if we think they offer improved signal. For example, here’s a quick chart I threw together (using the test data) comparing monthly active addresses vs. Farcaster users. There’s a pretty strong correlation between the two metrics, with a few outliers that I’ve highlighted on either side. This means that we could use either metric and for most projects the result would be the same. But for some projects (eg, pods.media and Dmail) the choice of metric makes a big difference. Ideally this is the type of analysis that other community members do and that leads us to better metrics / weights.
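
If you want to reproduce that kind of comparison, here is a rough sketch (column names are hypothetical):

```python
import pandas as pd

def compare_user_models(df: pd.DataFrame, top_k: int = 5) -> tuple[float, pd.DataFrame]:
    """Correlate two user metrics and surface the projects where they disagree most."""
    corr = df["active_addresses"].corr(df["farcaster_users"])
    rank_gap = (df["active_addresses"].rank() - df["farcaster_users"].rank()).abs()
    outliers = df.assign(rank_gap=rank_gap).nlargest(top_k, "rank_gap")
    return corr, outliers
```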

Finally, the numbers in the Dune chart you showed seem too low. I’m not able to click into it to see the actual query, but I tried to reconstruct it from raw Farcaster data and I get roughly 150K MAUs in January.

Don’t get me wrong: I am not here to defend Farcaster IDs as a good proxy for actual human users. I’m here to say: help us find a better one. Or phase it out entirely.


Guess our key critique here is that we don’t believe the distribution of Farcaster addresses is simply a random selection of actual humans, and that relying on it will skew the results in ways that are not intended or desirable. Farcaster is a specialized social network, and there’s no particular reason to think that activity or presence on Farcaster maps to economically important/active users onchain.

As a thought experiment, replace Farcaster with Facebook. Would you have any reason to think Facebook identities tied to onchain addresses would be meaningful? What about Twitter? Or Truth Social? Or Blue Sky?

We would probably all agree that would not be useful in isolation and to the exclusion of other methods. Just because Farcaster uses an onchain component itself doesn’t suggest that the users there are especially relevant to Optimism activity. For all we know, it could even be that the lowest value users are clustered there, making it a negative signal and not mere noise. We don’t know, and no one has made a case.

This is not directed at you specifically, but it’s probably time for disclosures about conflicts of interest whenever anyone from the Foundation pushes Farcaster as a gating mechanism or otherwise gives it special treatment that would never be given to any other project building on the Superchain. We would, however, happily support it as one option or one measurement metric amongst many.


First off, I’m excited to be part of another RPGF round, big thanks to Jonas and everyone at the foundation for your continued efforts keeping retro funding alive! Also excited to see Optimism continue to take big swings in the experiment that is RPGF. Next, thank you Carl and OSS team for the work on this, and for the extremely well-written writeup.

Here’s how I voted:

  • Onchain builders: Superscale
  • Dev tooling: Arcturus

I made these choices because rewarding projects that have already demonstrated significant usage and impact is exactly what RPGF is all about. That being said, I have concerns:

  1. Since projects are still signing up for the current measurement period, we can’t definitively see how our chosen algorithm will reward projects. While this reduces bias, it makes it difficult for me to understand how I am actually rewarding projects.
  2. I say this every round - I strongly feel that revenue should be deducted from impact. Not deducting revenue puts public goods projects which cannot generate revenue on unequal footing with other projects that can. To be clear, my intention is not to exclude projects that generate revenue, just to reward them less. I would love RPGF to incentivize projects to maximize their potential to do good, as opposed to pursuing revenue opportunities. See here for my full thoughts on this.
  3. Why is there no longer an “open source reward multiplier”? Are all these projects open source? I understand that the multiplier received some backlash for potentially being flawed, but I still strongly believe we should prioritize rewarding open source projects.
  4. I’m not in favor of using Farcaster as a measure of users for Sybil resistance. My Farcaster address is unique to Farcaster, and I believe everyone should use one unique address per service for privacy and security reasons. Encouraging the use of a single address for all activities is risky and could lead to significant losses if not managed carefully. Instead, it might be worthwhile to consider a community-run competition to identify and exclude Sybils, scammers etc.
  5. You have given badgeholders extremely little control this round, which opens the program up to potential gaming. Inputs like stars or forks should not be considered, as they can be easily manipulated. If there is a lack of trust in badgeholders to make subjective judgments, perhaps we should replace them with a panel of experts.

Thanks for reading 🫡


We would, however, happily support it as one option or one measurement metric amongst many.

100% and this is super actionable. TY!


Thanks for the write up!

Just a quick clarification: inputs like stars and forks (in the devtooling round) are only used as an initial “pretrust” seed for EigenTrust. Basically, if two projects had exactly the same usage and engagement from onchain builders, then the one with higher pretrust would rank higher. For more details, see here.

Of course, there will always be ways to try to game the system, but simply accruing lots of stars / forks or publishing useless NPM packages won’t work here.
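
For intuition, here is the textbook EigenTrust recurrence showing where a pretrust seed enters. This is a generic sketch, not the exact Retro Funding implementation (see the repo for that):

```python
import numpy as np

def eigentrust(C: np.ndarray, pretrust: np.ndarray,
               alpha: float = 0.15, iters: int = 50) -> np.ndarray:
    """Generic EigenTrust-style power iteration.

    C[i, j] is project i's normalized trust in project j (rows sum to 1);
    `pretrust` is the seed vector -- in the devtooling round, this is where
    signals like stars and forks would enter before trust propagation.
    """
    p = pretrust / pretrust.sum()
    t = p.copy()
    for _ in range(iters):
        t = (1 - alpha) * C.T @ t + alpha * p
    return t
```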

I would like to echo this too.

While Farcaster has positive aspects (it is built on the OP Stack, and I am in favor of supporting protocols within the OP Stack), it should not be the exclusive onboarding mechanism, nor should it gatekeep participation.

If Sybil attacks and bots are the primary concern, alternative metrics could be implemented. For example:

  • KYC-verified users: Leverage Gnosis Pay cardholders (all KYC-compliant), though their total numbers are limited. Analyze their connected addresses to map relationships and identify patterns. [https://dune.com/queries/4120201]
  • CEX-linked addresses: Use addresses associated with centralized exchanges (Coinbase, Kraken, Binance) as a broader dataset. Refine this pool using criteria like Optimism activity, account age, balance, attestations, or overall engagement across the Superchain. Another data source could be Superseed (part of the Superchain) participants.
  • Non-KYC addresses: Even non-KYC addresses could be filtered by activity history, account age, and asset holdings to ensure legitimacy (a rough sketch of this idea follows below).

Just sharing a couple of thoughts here; I’m sure that if we think about it, there are many other options to consider.
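
For the non-KYC filtering idea above, a rough sketch with made-up thresholds and column names (not an official model) could look like this:

```python
import pandas as pd

# Hypothetical address-level features; thresholds are purely illustrative.
def filter_addresses(addresses: pd.DataFrame,
                     min_age_days: int = 90,
                     min_tx_count: int = 10,
                     min_balance_usd: float = 10.0) -> pd.DataFrame:
    """Keep addresses that look established: old enough, active enough, holding something."""
    return addresses[
        (addresses["age_days"] >= min_age_days)
        & (addresses["tx_count"] >= min_tx_count)
        & (addresses["balance_usd"] >= min_balance_usd)
    ]
```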

Thanks for the feedback @OPUser!
I’ve tried to capture your comments in this issue.
Feel free to comment or add more ideas there.