Retro Funding 4: Impact Metrics, a Collective Experiment

Since one of our Collective Values is Iterative Innovation, we are continuing to refine retroactive public goods funding (retro funding) with round 4. The focus here is on collaboratively curating impact metrics, guided by the principle of Metrics-based Evaluation. Unlike previous Retro Funding rounds, Retro Funding 4 shifts from voting on individual projects to engaging badgeholders in selecting and weighting metrics that measure various types of impact.

The following run-through showcases the stages of this process—from the initial surveys and workshop insights to post-workshop adjustments—and how we’re integrating your feedback to shape a metrics framework that accurately reflects the impact we aim to reward.

Retro Funding 4’s New Direction: Metrics-based evaluation

The core hypothesis of Retro Funding 4 is that by using quantitative metrics, our community can more accurately express preferences for the types of impact they wish to see and reward. This approach marks a significant shift from earlier Retro Funding iterations where badgeholders voted directly on projects. In RF4, the emphasis is on leveraging data to power impact evaluation, ensuring that metrics are robust, verifiable, and truly representative of the community’s values.

One important thing to keep in mind is that this is one of many experiments we’ll be running this year, and you can find all the details of what’s to come right here.

The Initial Survey: Establishing the baseline

The initial survey we sent out was a means of gauging community perspectives on qualitative and quantitative measurement approaches and gathering input on a potential first draft list of metrics.

Key findings included:

  • Balance in metrics: Badgeholders showed a relatively balanced interest in metrics assessing both quantitative growth and qualitative impact, indicating a need for an evaluation framework that incorporates both dimensions.

  • Innovation and Open Source: Metrics related to innovation and open source scored significantly higher, reflecting a community preference for collaborative efforts.

  • Concerns about gaming metrics: A notable concern was the potential for gaming the system, highlighting the need for metrics that are difficult to manipulate and that genuinely reflect impactful contributions.

  • Engagement and a focus on quality: There was a strong preference for metrics that measure sustained engagement and ongoing quality vs one-time actions, highlighting the value placed on long-term impact.

The Workshop: Data Deep Dive and Refining Metrics

Building on the survey insights, the workshop we ran on May 7th enabled badgeholders to provide more in-depth feedback on the data-driven aspects of the impact metrics system as well as take a pass at selecting the most meaningful metrics grouped around:

Network growth

These are metrics designed to quantify the expansion and scalability of the network. They focus on observable, direct measurements and aim to capture high-level activity.

Network quality

These metrics aim to evaluate the trustworthiness and integrity of interactions on the network, focusing on depth and reliability.

User growth

The focus here is on measuring new, active and consistent participation in the network.

User quality

The quality measurements are aimed at assessing the value and consistency of user contributions, not just their numbers.
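
To make these four families a bit more concrete, here is a minimal sketch of how one metric from each family might be computed from a per-transaction table. The column names, thresholds, and data are hypothetical illustrations, not OSO’s actual schema or pipeline:

```python
# Minimal sketch with a hypothetical events table -- not OSO's actual pipeline.
import pandas as pd

# Hypothetical columns: one row per transaction attributed to a project.
events = pd.DataFrame({
    "project":         ["app_a", "app_a", "app_b", "app_b", "app_b"],
    "from_address":    ["0x01",  "0x02",  "0x01",  "0x03",  "0x03"],
    "gas_fee_eth":     [0.002,   0.001,   0.004,   0.003,   0.001],
    "is_trusted_user": [True,    False,   True,    True,    True],
})

per_project = events.groupby("project").agg(
    transaction_count=("from_address", "size"),              # network growth
    total_gas_fees=("gas_fee_eth", "sum"),                   # network growth
    trusted_tx_share=("is_trusted_user", "mean"),            # network quality
    unique_addresses=("from_address", "nunique"),            # user growth
    txs_per_address=("from_address",
                     lambda s: len(s) / s.nunique()),        # user quality (repeat usage)
)
print(per_project)
```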

For a detailed look at the survey results, key moments and discussions, you can review the workshop space here.

You can also explore this blog post by Open Source Observer that recaps the work they’ve done so far to map out the metrics. OSO is collaborating with the Optimism Collective and its Badgeholder community to develop the impact metrics for assessing projects in RF4.

Post-Workshop Survey: Refining metrics

Following the workshop discussions and feedback, and since not all Badgeholders were able to attend, a detailed survey was sent out to get further feedback on refined metrics, divided into the four identified segments:

  • Network Growth: Feedback suggests a need for simple, intuitive, and directly measurable metrics like gas fees and transaction counts

  • Network Quality: There is general support for metrics that consider the quality and trustworthiness of transactions, with suggestions to refine how ‘trusted transactions’ are defined and measured

  • User Growth: A preference for metrics that accurately capture genuine user engagement emerged, with calls for clearer definitions of what constitutes an ‘active’ or ‘engaged’ user

  • User Quality: Responses highlighted the challenge of measuring user quality without arbitrary metrics, suggesting a need for ongoing discussion and refinement to develop quality metrics that truly reflect contributions to the ecosystem

Refining and implementing feedback

The insights from the surveys, together with the discussions in the workshop, are leading us to the next stage of metrics refinement. A key focus across the workshop and the second survey has been the granularity of the trusted users element. To respond to and integrate these insights, Open Source Observer (OSO) is working on the following:

Trusted user model
Based on suggestions and ideas from the second survey for how to improve the “trusted user” model, OSO is exploring several additional sources of data for the model.

Important Note: Open Source Observer is only considering sources that leverage existing public datasets (no new solutions), are privacy-preserving, and can be applied to a broad set of addresses (>20K).

Action items:

  • Extending the trusted user model: The current model relies on three trust signals (Farcaster IDs, Passport Scores, and EigenTrust by Karma3 Labs). OSO will now move forward with a larger set of potential trust signals, such as:
      • Optimist NFT holders
      • Sybil classifiers used by the OP data team
      • Social graph analysis
      • Pre-permissionless Farcaster users
      • ENS ownership
  • OSO will run some light scenario analysis and choose the best model(s). They will also model a variety of options and share a report on the results.

Small fixes:
In the second survey, badgeholders also made a number of straightforward suggestions for how metrics could be standardized and implemented more consistently. These include:

  • Limiting all relevant metrics to the 6 month Retro Funding 4 evaluation window
  • Creating versions of some metrics on both linear and logarithmic scales (see the sketch below)
  • Creating versions of “user growth” metrics that look at a larger set of addresses vs just the “trusted user” set
  • Copy-editing to make descriptions clearer

Action items:

  • OSO will proceed directly with making these changes
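
As an illustration of the linear vs. logarithmic point above, a metric can be published in two versions so badgeholders can choose how strongly very large projects should dominate. The numbers below are made up for illustration:

```python
# Illustrative only: hypothetical transaction counts per project.
import math

transaction_counts = {"app_a": 1_200, "app_b": 45_000, "app_c": 180}

linear_version = transaction_counts
# log10(x + 1) compresses the range so one very large project does not dwarf the rest.
log_version = {p: round(math.log10(c + 1), 2) for p, c in transaction_counts.items()}

print(linear_version)  # {'app_a': 1200, 'app_b': 45000, 'app_c': 180}
print(log_version)     # {'app_a': 3.08, 'app_b': 4.65, 'app_c': 2.26}
```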

New metrics:

There were several metrics that had previously been considered and were proposed again in the feedback survey. These include looking at gas efficiency; power users; a larger set of addresses than just trusted users; and protocols that are favored by multisig users.

Action items:

  • OSO will implement these new metrics where feasible

Next Steps: Testing metrics out

The next step in this process will be to take the metrics set for a test drive. In the week of June 3rd, Badgeholders will have the chance to test out the metrics through an interface that simulates the real voting experience. Badgeholders will also have the opportunity to submit extra feedback on each metric as they test it out.

Once this final round of feedback is collected and analyzed, we will be able to come back together in a session to discuss and decide on the final set of metrics to be used in Retro Funding 4.

Just wanted to give an update: since this post was drafted, we have made good progress incorporating all of your feedback:

  • all of the small fixes have been completed
  • there have been a number of improvements to the trusted user model.

Bonus: If you want to see how the metrics are derived, we recently created a lineage graph. You can also view the logic and metadata about each model from here.

Finally, let me just say that we feel immense pressure to do our part in making this experiment a success. All the feedback we’ve gotten on metrics has been so valuable – and your continued engagement is essential.

Thank you!

I love the openness of the OP platform ❤️. My only concern is: how can I serve more?

Quick note here - the team working on the UI is fixing some bugs before testing can begin. We’re now aiming to start testing on June 10th.

Thank you & stay tuned :hourglass:

Testing of Voting Functionality

We ask the community to help us test the retro round 4 voting interface from June 10th to June 16th. While we’re processing data from project applications to power relevant metrics, this round of testing works with dummy data.

Providing feedback on Metrics

Retro round 4 is experimenting with metric-based voting, with the hypothesis that by leveraging quantitative impact metrics, badgeholders are able to more accurately express their preferences for the types of impact they want to reward, as well as make more accurate judgements of the impact delivered by individual contributors.

After many iterations of the impact metrics based on badgeholder feedback (full overview here), we present a first set of 8 metrics in the interface, with 4 more to be available in the following days. You can use the comment feature to provide feedback on the implementation of these metrics!

Testing Functionality

  1. Wallet connect: Connect your wallet.
    a. Were you able to connect your wallet without any issues?
  2. Discovering Metrics: Search for metrics, sort metrics, and use the navigation bar to jump between metrics. Give us a sense of the experience:
    a. How easy was it to search for specific metrics?
    b. Were you able to sort metrics effectively?
    c. Did the navigation bar function as expected when jumping between metrics?
    d. Any suggestions to improve the discovery of metrics?
  3. Interacting with Metrics: Add metrics to the ballot, comment on metrics (by clicking on each metric to open the metric page and access the comments box), explore their calculation, etc.
    a. Were you able to comment on metrics without any issues?
    b. Did the exploration of metric calculations provide the information you needed?
    c. Any difficulties or suggestions for interacting with metrics?
  4. Manage your ballot: Weight metrics and see how the allocation changes, and use the OS multiplier (a sketch of how weights might translate into allocations appears below).
    a. Is it clear how to weight the metrics on your ballot? Are the implications of weighting metrics clear?
    b. Did you find enough information on the open source multiplier to make an informed decision about using it?
    c. Any challenges or recommendations for managing the ballot?
  5. Submit your ballot: Complete the survey and submit your ballot.
    a. Were the steps to complete the survey and the survey questions clear and straightforward?
    b. Did you encounter any issues while submitting your ballot?
    c. Any feedback on the final submission step?

If you find an issue, submit here: https://deform.optimism.io/voting-testing
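
For step 4 above, here is a minimal sketch of one way metric weights and the open source multiplier could translate into per-project allocations. The formula, project names, scores, and multiplier value are assumptions for illustration only, not the official RF4 calculation:

```python
# Assumed mechanics for illustration -- not the official RF4 allocation formula.
metric_scores = {   # hypothetical, already normalized so each metric sums to 1.0 across projects
    "gas_fees":      {"app_a": 0.7, "app_b": 0.2, "app_c": 0.1},
    "trusted_users": {"app_a": 0.3, "app_b": 0.5, "app_c": 0.2},
}
ballot_weights = {"gas_fees": 0.6, "trusted_users": 0.4}   # badgeholder's weights, summing to 1.0
is_open_source = {"app_a": True, "app_b": False, "app_c": True}
os_multiplier = 1.5                                        # hypothetical multiplier value

# Weighted score per project, then boost open source projects and renormalize.
raw = {
    p: sum(w * metric_scores[m][p] for m, w in ballot_weights.items())
    for p in is_open_source
}
boosted = {p: v * (os_multiplier if is_open_source[p] else 1.0) for p, v in raw.items()}
total = sum(boosted.values())
ballot_allocation = {p: round(v / total, 3) for p, v in boosted.items()}  # share of the round's OP

print(ballot_allocation)
```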

Compared to the last RetroPGF, this feels amazing. I would love to have the option to exclude projects once my ballot is finished, just by adding a checkbox next to each project name, but maybe this defeats the entire purpose of choosing metrics only.

Comments sorted by “Top Comments” put the top comment at the end when it should be at the top.

Not a bug, but it would be nice to see the entire description of a ballot on the first page without having to go into it

The intro mentions 20 options but there are only 12

If weights must add up to 100%, please add a counter showing the current state of my ballot

Also, I have no idea what the locks next to the ballot % do. If I click on them, the ballot changes; maybe a “mouse over” description is needed.

Can you elaborate on this question:

Can we add some Share function for Twitter or Farcaster at the end?

A few updates on RF4 Impact Metrics:

  1. We’ve updated the interactive docs for all our metrics models. You can find them here.

  2. We’ve made a series of refinements to the Trusted User model based on badgeholder feedback. The working definition is below (a minimal sketch of this rule follows the list). Note: this description will be displayed in the voting UI for all relevant metrics.

    A “trusted user” represents an address linked to an account that meets a certain threshold of reputation. Currently, there are several teams in the Optimism ecosystem building reputation models in a privacy-preserving way. This metric aggregates reputation data from multiple platforms (Farcaster, Passport, EigenTrust by Karma3Labs) and the Optimist NFT collection. In order to be considered a trusted user, an address must meet at least two of the following requirements as of 2024-05-21: have a Farcaster ID of 20939 or lower, have a Passport score of 20 points or higher, have a Karma3Labs EigenTrust GlobalRank in the top 50,000 of Farcaster users, or hold an Optimist NFT in their wallet.

  3. We’ve added a new experimental impact metric based on the badgeholder social graph.

    OpenRank Trusted Users: Count of addresses in the badgeholder “web of trust” who have interacted with the project over the RF4 scope period (October 2023 - June 2024). EigenTrust, aka OpenRank, is a reputation algorithm being applied by Karma3Labs to the Farcaster social graph. To seed the “web of trust”, we begin with a list of 132 badgeholder addresses, look up their Farcaster IDs (present for 68 of the 132 addresses), and use OpenRank to identify those users’ 100 closest connections. The result is a set of around 5000 addresses that have the closest social connection to the badgeholder community. Finally, we count the number of addresses in the web of trust who have interacted with a given project. Note: this is an experimental metric that yields an even smaller but potentially higher-signal subset of users than the “trusted user” model applied elsewhere.
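
To make the working definition in item 2 concrete, here is a minimal sketch of the two-of-four rule. The threshold constants mirror the definition above; the input structure and helper name are hypothetical:

```python
# Minimal sketch of the two-of-four "trusted user" rule described above.
# Thresholds mirror the working definition; the input data shape is hypothetical.
FARCASTER_ID_CUTOFF = 20939      # pre-permissionless Farcaster IDs
PASSPORT_THRESHOLD = 20          # minimum Passport score
EIGENTRUST_TOP_N = 50_000        # top GlobalRank among Farcaster users

def is_trusted_user(address_profile: dict) -> bool:
    """Return True if the address meets at least two of the four trust signals."""
    signals = [
        address_profile.get("farcaster_id") is not None
            and address_profile["farcaster_id"] <= FARCASTER_ID_CUTOFF,
        address_profile.get("passport_score", 0) >= PASSPORT_THRESHOLD,
        address_profile.get("eigentrust_global_rank") is not None
            and address_profile["eigentrust_global_rank"] <= EIGENTRUST_TOP_N,
        address_profile.get("holds_optimist_nft", False),
    ]
    return sum(signals) >= 2

# Example: passes on Passport score + Optimist NFT
print(is_trusted_user({"passport_score": 25, "holds_optimist_nft": True}))  # True
```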

Over the coming days we will continue to make additional improvements to the metrics and incorporate user feedback from the voting UI.

I apologize if this is not the right place to lodge this complaint, but I believe viewing gas & transaction counts from a purely top-level perspective is a mistake. It encourages apps to deploy their own thin wrappers around (or replacements of) onchain utilities, and it disincentivizes building public goods that are meant to function via calls from other contracts. I am not sure if I have seen this discussed anywhere, but it felt like a pretty big miss for a round that was supposed to support “onchain builders” broadly speaking (vs, say, “onchain apps”).

I agree that just looking at gas is not a good idea. It penalizes contracts that are gas-optimized, leading to bad outcomes.
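
To illustrate the concern raised above: if activity is attributed only by a transaction’s top-level target, contracts that are reached through internal calls get no credit. The data shapes and addresses below are hypothetical:

```python
# Minimal sketch of the concern above: attributing activity only by a
# transaction's top-level `to` address gives zero credit to contracts that
# are reached through internal calls. Data shapes are hypothetical.
transactions = [
    # each tx has a top-level target and the full call trace it produced
    {"to": "0xAppWrapper", "trace": ["0xAppWrapper", "0xSharedLibrary"]},
    {"to": "0xAppWrapper", "trace": ["0xAppWrapper", "0xSharedLibrary"]},
]

top_level_counts: dict[str, int] = {}
trace_counts: dict[str, int] = {}
for tx in transactions:
    top_level_counts[tx["to"]] = top_level_counts.get(tx["to"], 0) + 1
    for contract in set(tx["trace"]):
        trace_counts[contract] = trace_counts.get(contract, 0) + 1

print(top_level_counts)  # {'0xAppWrapper': 2} -- the shared library is invisible
print(trace_counts)      # {'0xAppWrapper': 2, '0xSharedLibrary': 2}
```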