Retro Funding S7 – Dev Tooling – Eval Algos

Hey everyone, Carl from Open Source Observer here, sharing the evaluation algorithms for Retroactive Public Goods Funding (Retro Funding) Season 7: Dev Tooling.

The Dev Tooling mission seeks to reward projects based on the utility they provide to builders developing onchain applications. See here for more on the mission’s objectives.

This season, Optimism’s Retro Funding is pioneering a “metrics-driven, humans-in-the-loop” approach. Instead of voting on projects, citizens vote on evaluation algorithms. Each algorithm represents a different strategy for rewarding projects. Algorithms will evolve in response to voting, community feedback, and new ideas over the course of the season.

Here’s a tl;dr of the three algorithms:

| Algorithm | Goal | Best For | Emphasis |
| --- | --- | --- | --- |
| Arcturus | Reward widely adopted projects | Established, high-impact devtooling projects | Prioritizes total dependents and the economic weight of those dependents |
| Bellatrix | Prioritize fast-growing tools | New or rapidly expanding devtooling projects | Applies a steep decay factor to older GitHub engagement; favors Rust over npm |
| Canopus | Balance various forms of contribution | Tools with high developer collaboration & contributions | Puts stronger emphasis on GitHub engagement and developer reputation |

Metrics Overview

The algorithms all rely on a value chain graph linking dev tools to onchain builders and use OpenRank’s EigenTrust implementation to distribute trust between projects iteratively.
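For intuition, the core EigenTrust update can be summarized in a few lines. This is a minimal sketch only, not the actual OpenRank implementation:

```python
import numpy as np

def eigentrust(local_trust, pretrust, alpha=0.5, iters=50):
    """Minimal EigenTrust sketch (not the actual OpenRank implementation).

    local_trust: square, row-stochastic matrix; row i holds node i's outgoing
                 edge weights, normalized to sum to 1.
    pretrust:    seed trust vector, normalized to sum to 1.
    alpha:       how strongly the result stays anchored to the pretrust seed.
    """
    t = pretrust.copy()
    for _ in range(iters):
        t = (1 - alpha) * local_trust.T @ t + alpha * pretrust
    return t
```

Each iteration propagates trust along the graph’s edges, while the pretrust seed keeps the scores anchored to the assumptions described below.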

Nodes

There are three types of core nodes in the graph:

  • Onchain Projects, which hold economic pretrust and pass it on.
  • Developers, who gain partial trust from the onchain projects they contribute to.
  • Devtooling Projects, which receive trust both directly from onchain projects, via package dependencies, and indirectly from developers’ GitHub engagement.

Edges

Links between nodes, i.e., edges, are derived from the following relationships:

  1. Onchain Projects → Devtooling Projects (Package Dependencies). For each package in the onchain project’s Software Bill of Materials (SBOM), if that package is owned by a devtooling project, we add an edge from the onchain project to the devtooling project.
  2. Onchain Projects → Developers (Commits). Whenever a developer commits code (or merges PRs) in an onchain project’s GitHub repository, we add an edge from the onchain project to that developer.
  3. Developers → Devtooling Projects (GitHub Engagement). Whenever a developer (from the set recognized above) engages with a devtooling project (commits, PRs, issues, forks, stars, or comments), we add an edge from that developer to the devtooling project.
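As a rough sketch of how these three edge types could be assembled (field names here are hypothetical, not the actual OSO schema):

```python
# Field names below are illustrative, not the actual OSO schema.
def build_edges(sbom_rows, commit_rows, engagement_rows, package_owners):
    """Collect directed edges for the three relationship types."""
    edges = []
    # 1. Onchain project -> devtooling project (package dependency)
    for row in sbom_rows:            # e.g. {"onchain_project": ..., "package": ...}
        owner = package_owners.get(row["package"])
        if owner is not None:
            edges.append((row["onchain_project"], owner, "package_dependency"))
    # 2. Onchain project -> developer (commits / merged PRs)
    for row in commit_rows:          # e.g. {"onchain_project": ..., "developer": ...}
        edges.append((row["onchain_project"], row["developer"], "commit"))
    # 3. Developer -> devtooling project (GitHub engagement)
    for row in engagement_rows:      # e.g. {"developer": ..., "devtooling_project": ..., "event_type": ...}
        edges.append((row["developer"], row["devtooling_project"], row["event_type"]))
    return edges
```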

Pre-trust Assumptions

Metrics about projects and developers are used to seed the EigenTrust algorithm with pretrust assumptions. All other things being equal, edges that involve nodes with higher pretrust values will be weighted more heavily in the EigenTrust algorithm.

Pretrust metrics are applied to each node in the graph:

  1. Onchain Projects. Transaction volumes, gas fees, and user counts over the past 180 days.
  2. Devtooling Projects. Count of published packages and GitHub metrics (stars, forks).
  3. Developers. Active months contributing to one or more onchain projects.
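One simple way to turn raw metrics like these into pretrust values is to normalize each metric across nodes and combine them per node. The sketch below is illustrative; the actual normalization in the Retro Funding repo may differ:

```python
def seed_pretrust(node_metrics):
    """node_metrics: {node_id: {metric_name: raw_value}}.

    Illustrative only: min-max normalize each metric across nodes, average the
    normalized metrics per node, then scale so the pretrust vector sums to 1.
    """
    metric_names = {m for metrics in node_metrics.values() for m in metrics}
    bounds = {
        m: (min(v.get(m, 0) for v in node_metrics.values()),
            max(v.get(m, 0) for v in node_metrics.values()))
        for m in metric_names
    }
    scores = {}
    for node, metrics in node_metrics.items():
        normed = [
            (metrics.get(m, 0) - lo) / (hi - lo) if hi > lo else 0.0
            for m, (lo, hi) in bounds.items()
        ]
        scores[node] = sum(normed) / len(normed) if normed else 0.0
    total = sum(scores.values()) or 1.0
    return {node: score / total for node, score in scores.items()}

# Example: pretrust for three onchain projects based on 180-day activity
print(seed_pretrust({
    "project_a": {"gas_fees": 120.0, "active_users": 5000},
    "project_b": {"gas_fees": 30.0, "active_users": 800},
    "project_c": {"gas_fees": 5.0, "active_users": 100},
}))
```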

Weighting Assumptions

After constructing the graph and computing pretrust scores, we assign final weights to each edge before EigenTrust runs:

  1. Alpha: the portion of the EigenTrust output taken from the pretrust values vs. the iterative trust propagation step.
  2. Time Decay: an exponential decay factor so that newer events get weighted more heavily than older events.
  3. Link and Event Types: coefficients that weight certain types of links more heavily (e.g., Rust imports over npm imports).

Each algorithm has its own set of weighting assumptions.
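As a purely hypothetical illustration (the parameter names and values below are not the shipped configurations for Arcturus, Bellatrix, or Canopus), a set of weighting assumptions might look like this:

```python
# Illustrative only: parameter names and values are hypothetical,
# not the shipped configurations for Arcturus, Bellatrix, or Canopus.
example_weighting = {
    "alpha": 0.2,            # share of the final score anchored to pretrust
    "time_decay": 0.8,       # half-life (in years) applied to event ages
    "link_type_weights": {
        "package_dependency_rust": 1.5,   # e.g. weight Rust/crates imports...
        "package_dependency_npm": 1.0,    # ...more heavily than npm imports
        "github_engagement": 0.5,
    },
}
```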

Proposed Algorithms

We’ve developed an initial set of three candidate algorithms for evaluating impact. Each algorithm highlights different priorities: Arcturus focuses on established adoption, Bellatrix puts more emphasis on emerging toolchains, and Canopus looks more closely at the quality of the developer community around a project.

We’ve included details on each algorithm below. We also have a Hex dashboard where you can see data on each algorithm.

Important: as projects are still signing up for the current measurement period, we simulated the results for each algorithm using past Retro Funding participants and historical data. The “top projects” below are based on onchain builder applications from RF4 and devtooling projects from RF3.

Arcturus

Arcturus rewards projects with significant current adoption by focusing on total dependents and downstream impact. It emphasizes the number of developers and onchain users benefiting from these tools, ensuring that cornerstone infrastructure is properly recognized. By applying only a modest discount to older events, Arcturus is best suited for rewarding established projects that many builders rely on.

Weightings & Sample Results

Top projects (using simulated data):

  1. Ethers.js
  2. OpenZeppelin
  3. wevm
  4. Hardhat
  5. Prettier Solidity

Bellatrix

Bellatrix is designed to spotlight projects and toolchains that are gaining in momentum. It applies a steep decay factor on older GitHub engagements and favors modern tooling preferences, notably prioritizing Rust over npm packages. This approach makes Bellatrix ideal for giving an edge to devtooling projects that are on the rise.

Weightings & Sample Results

Top projects (using simulated data):

  1. wevm
  2. Ethers.js
  3. OpenZeppelin
  4. Hardhat
  5. Foundry

Canopus

Canopus recognizes projects that have large active developer communities. It considers package dependencies alongside developer interactions, but puts extra weight on key GitHub engagement signals – such as commits, pull requests, and issue discussions. The result is a more balanced distribution of rewards across a broader variety of projects (not just the very biggest or newest). This approach can be a bit more susceptible to noise, since it values a wider range of contribution types, but it aims to ensure broader open source contributions are well-represented.

Weightings & Sample Results

Top projects (using simulated data):

  1. wevm
  2. Foundry
  3. Ethers.js
  4. OpenZeppelin
  5. DefiLlama

How to Vote

Now it’s time for Citizens to choose which algorithm to use.

One of the proposed evaluation algorithms for the Dev Tooling mission will be chosen using approval voting. This means you can approve one or multiple options. For an algorithm to be selected, it must meet the following conditions:

  • Quorum: At least 30% (44) of badgeholders must participate in the vote.
  • Approval Threshold: An algorithm must receive approval from at least 51% of quorum (22 badgeholders).

The algorithm with the most approvals, provided it meets the thresholds, will be used for the remainder of Season 7.


Here is the link to the Snapshot vote for Citizens (Badgeholders) to decide which algorithm(s) best aligns with our collective goals.

To learn more about this mission, please see the Dev Tooling Mission Details. We also have more in-depth documentation on these algorithms in the Retro Funding repo.

If you have ideas on refining these models, let us know in the comments. And, if you want to contribute to making these algorithms better, submit a PR or open an issue here.

We’re excited to see what the community thinks and will keep iterating based on your feedback!

8 Likes

I’m curious if direct contributions to the OP Stack within these devtools are considered in the weighting of these models? For example, Viem maintains an OP Stack Extension which is used on the major OP Stack bridges and applications. If not, just a suggestion to improve the model!

2 Likes

Coming soon, I hope! We considered having projects submit a link to their changelog showing direct OP Stack contributions. Curious if you have ideas for a trust-minimized way of verifying this?

1 Like

First off, I’m excited to be part of another RPGF round, big thanks to Jonas and everyone at the foundation for your continued efforts keeping retro funding alive! Also excited to see Optimism continue to take big swings in the experiment that is RPGF. Next, thank you Carl and OSS team for the work on this, and for the extremely well-written writeup.

Here’s how I voted:

  • Onchain builders: Superscale
  • Dev tooling: Arcturus

I made these choices because rewarding projects that have already demonstrated significant usage and impact is exactly what RPGF is all about. That being said, I have concerns:

  1. Since projects are still signing up for the current measurement period, we can’t definitively see how our chosen algorithm will reward projects. While this reduces bias, it makes it difficult for me to understand how I am actually rewarding projects.
  2. I say this every round: I strongly feel that revenue should be deducted from impact. Not deducting revenue puts public goods projects that cannot generate revenue on unequal footing with other projects that can. To be clear, my intention is not to exclude projects that generate revenue, just to reward them less. I would love RPGF to incentivize projects to maximize their potential to do good, as opposed to pursuing revenue opportunities. See here for my full thoughts on this.
  3. Why is there no longer an “open source reward multiplier”? Are all these projects open source? I understand that the multiplier received some backlash for potentially being flawed, but I still strongly believe we should prioritize rewarding open source projects.
  4. I’m not in favor of using Farcaster as a measure of users for Sybil resistance. My Farcaster address is unique to Farcaster, and I believe everyone should use one unique address per service for privacy and security reasons. Encouraging the use of a single address for all activities is risky and could lead to significant losses if not managed carefully. Instead, it might be worthwhile to consider a community-run competition to identify and exclude Sybils, scammers etc.
  5. You have given badgeholders extremely little control this round, which opens the program up to potential gaming. Inputs like stars or forks should not be considered, as they can be easily manipulated. If there is a lack of trust in badgeholders to make subjective judgments, perhaps we should replace them with a panel of experts.

Thanks for reading 🫡

7 Likes

In the current implementation, are any extra points given to projects building natively on the OP Stack? I understand there was a discussion about giving preference to devs building on the OP Stack to promote usage of the Superchain.

3 Likes

Currently, no. But definitely on the table for future iterations.

See my response to another post above.

1 Like

@ccerv1 can you please elaborate on how month-over-month changes are reflected in this category? Specifically, what monthly changes to dev tool performance can we anticipate resulting in changes in algo results?

In the Onchain Builders category, it’s clear how changes in project performance result in changes in their respective scores in the various algorithms. But since — at least in my understanding — the algos in Dev Tooling are based on trust values accrued over a longer time frame, it’s much less clear to me.

Are the following statements accurate?

  • all else equal, a higher Alpha weight will result in lesser changes each month
  • all else equal, a higher Time Decay weight will result in greater changes each month

Hi @spengrah

Responding to both points:

all else equal, a higher Alpha weight will result in lesser changes each month

Correct. When Alpha is higher, the impact of any particular timestamped event will be less.

all else equal, a higher Time Decay weight will result in greater changes each month

I think your intuition is correct, except that the Time Decay (decay_factor) parameter goes in the other direction: a higher value makes the model less sensitive to recent events. Perhaps we should give the variable a more explicit name, e.g., event_half_life_in_years.

Anyway, here’s a more complete explanation:


The code applies an exponential decay model to weight events. The basic mathematical formula is:

v_edge = event_weight * exp(-λ * t)

where:

  • v_edge is the decayed event weight.
  • event_weight is the weight assigned based on the event type (from event_type_weights).
  • t is the time difference in years (i.e., how long ago the event occurred).
  • λ (lambda) is the decay constant.

The decay constant is computed as:

λ = ln(2) / decay_factor

This makes the decay_factor the half-life of the event’s weight—the time it takes for the weight to drop to half its initial value.

Examples

Case 1: decay_factor = 1.0

For decay_factor = 1.0, the decay constant is:

λ = ln(2) / 1.0 ≈ 0.6931

Then the weight decay is calculated as follows:

  • At t = 0 years:

    exp(-0.6931 * 0) = exp(0) = 1.0
    

    So, v_edge = event_weight * 1.0.

  • At t = 1 year:

    exp(-0.6931 * 1) ≈ 0.5
    

    So, v_edge ≈ event_weight * 0.5.

  • At t = 2 years:

    exp(-0.6931 * 2) ≈ 0.25
    

    So, v_edge ≈ event_weight * 0.25.

Case 2: decay_factor = 0.5

For decay_factor = 0.5, the decay constant is:

λ = ln(2) / 0.5 ≈ 1.3863

Then the weight decay is:

  • At t = 0 years:

    exp(-1.3863 * 0) = exp(0) = 1.0
    

    So, v_edge = event_weight * 1.0.

  • At t = 0.5 years:

    exp(-1.3863 * 0.5) ≈ exp(-0.6931) ≈ 0.5
    

    So, v_edge ≈ event_weight * 0.5.

  • At t = 1 year:

    exp(-1.3863 * 1) ≈ 0.25
    

    So, v_edge ≈ event_weight * 0.25.

Summary

  • For decay_factor = 1.0:
    The decay constant is approximately 0.6931, meaning the weight halves every 1 year.

  • For decay_factor = 0.5:
    The decay constant is approximately 1.3863, meaning the weight halves every 0.5 years.

This exponential decay function adjusts the event’s weight based on how long ago the event occurred, while scaling it by the event’s inherent importance from event_type_weights.
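In code, the same calculation looks roughly like this (a minimal sketch of the formula above, not the exact implementation):

```python
import math

def decayed_weight(event_weight, years_ago, decay_factor):
    """Exponential decay: the event's weight halves every `decay_factor` years."""
    lam = math.log(2) / decay_factor
    return event_weight * math.exp(-lam * years_ago)

# Reproduces the worked examples above (event_weight = 1.0):
print(decayed_weight(1.0, 1.0, decay_factor=1.0))  # ≈ 0.50
print(decayed_weight(1.0, 2.0, decay_factor=1.0))  # ≈ 0.25
print(decayed_weight(1.0, 0.5, decay_factor=0.5))  # ≈ 0.50
```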

2 Likes

thanks! So with values of 0.8, 0.8, and 0.5, respectively, each of the algos should have pretty slow changes on that factor relative to the monthly timeframe?

Also, one other question (apologies if I’ve missed the answer elsewhere): when does the event data start? Does it start at the beginning of each project, or at the beginning of the evaluation period?

2 Likes

I voted for Arcturus and Canopus.

Rationale: We should encourage diversity and fresh ideas; however, I don’t see any reason to favor particular technologies or languages (Bellatrix). I like the idea of rewarding ongoing collaboration and current reputation (Canopus), but I’m unsure if the available data is strong enough to reliably and meaningfully support the evaluation of these things. For that reason, I would also accept a focus on established, well-loved devtools (Arcturus).

As for the distribution, please see my comments in this post. Basically, I would prefer more winners, rather than different winners.

2 Likes

Sorry for the delay, closing the loop on your questions:

when does the event data start?

The developer activity goes all the way back to the start of the project. You can see it here in this sample data.

So with values of 0.8, 0.8, and 0.5, respectively, each of the algos should have pretty slow changes on that factor relative to the monthly timeframe?

In theory, yes, though it will depend on the mix of projects that end up in the round and how much developer attention they attract. FWIW, this is a good problem for a data scientist to model and run some sensitivity analysis on. Going to create an issue to look into this!

1 Like

Since I don’t see a lot of people here appreciating the Bellatrix algorithm, I’ll share my humble perspective as a potential recipient in the program.

When we first saw the announcement that Season 7 would be focused on dev tooling, we got extremely excited. We had been working on a powerful time-traveling debugger for several years, and we always hoped we could eventually open-source it. The Optimism RetroPGF program looked like the perfect opportunity to do so, so we meticulously planned our release around it.

Our first release was published in early March and, boy, did we get a positive reception. In just two weeks, we reached nearly 1,000 stars on GitHub, topped the Hacker News home page for several hours, and got into conversations about eventually supporting nearly every blockchain ecosystem. If you are curious what’s driving this excitement, you can check out the 6-minute intro video for our debugger that’s available here:

I very much like the reasoning behind all three proposed algorithms, and I think a combination of them would work best.

The hope of getting this very positive reception is what motivated us to release CodeTracer as an open-source project, and I hope we can serve as an example for other teams considering turning their proprietary software into a public good. Bellatrix is the algorithm that can provide a much-needed kick-start for such projects.

2 Likes