Joan's RPGF5 Reflections

In line with what I did for RPGF3 and RPGF4, I will use this thread to share my reflections concerning RPGF5.

In this first post I will collect my takeaways from the process of participating as a reviewer and appeals reviewer in RPGF5.

And then I'll return later to share my reflections on the voting process in a second post.

Preconditions

Round scope

RPGF5 is designed to reward contributors to the OP Stack. Impact must have been generated between October 2023 and August 2024, and the following three categories are recognized:

  • Ethereum Core Contributions
  • OP Stack Research & Development
  • OP Stack Tooling

As application turnout for the round was lower than expected, the round design was updated to allow voters to decide on the round size, with the stipulation that the token allocation should be no less than 2M OP and no more than 8M OP.

Basic voting design

Along with the total allocation, each voter is asked to vote on the distribution of funds among the three categories, and on the distribution of funds among the individual applicants within one of those categories.

Each category is assigned two groups of voters: a group of expert voters (guests and citizens) and a group of non-expert voters (citizens only).

Review Process

Main overall impression

From a reviewer's standpoint, the RPGF4 review left me with two main wishes: more clarity around what reviewers need or need not check, and better software tools available to reviewers.

In RPGF5, we took a big step forward with regard to clarity on the review task. The software tools also improved somewhat, though I believe there is still room for further improvement.

Clarity, objectivity and consistency

While I'm generally in favour of recognizing subjective evaluation as a perfectly valid part of the voting process, I see reviewers as administrative workers tasked with upholding a set of predefined rules and criteria on behalf of the community. To do this reliably and consistently, the guidelines must be clear and actionable, and there should be little room or need for personal opinions in the process.

In RPGF5 I was very happy to see the introduction of a reviewer's checklist, which attempted to turn the abstract application rules into concrete checks for reviewers to perform, as well as a list of specific rejection reasons that corresponded 1:1 with the checklist items.

Much discussion came out of this - arguably, the round eligibility criteria lacked some nuance, forcing reviewers to reject applications that they would have very much liked to include in the round. Looking at the bigger picture, though, I consider this review round a success:

The review task was much clearer to me than in previous rounds, discussions among reviewers were much more focused on the defined criteria and their limits, and it seems to me that the outcome was a higher degree of consensus among reviewers on which projects must be rejected and why.

Also, when you can't just make exceptions based on personal whims, any problems with the defined rules become very clear. I'm sure this will help shape the rules and checklists in future rounds.

Software

I was invited to help check out Charmverse's updates before the review was kicked off. Thanks, @Jonas and @ccarella!

I enjoyed doing a bit of testing, some issues were definitely improved in this round, and I would be happy to take part in this bit of the process again in the future.

It would be awesome if the test process could, over time, become a bit more organized, with clearer follow-up on the issues found. At the very least, what you always wish to know as a tester, once you have done your bit, is: which issues will the developer team try to tackle, and (when) is further testing needed?

There is a great difference between "trying the new software to confirm that it works" and "testing with the objective of finding both obvious and less obvious issues and potential improvements". I could see this turning into a regular OP contribution path over time.

Hours spent

I didn't time myself, but I believe I spent about 25 hours in total on the review process (not including the testing mentioned above, but including watching the kickoff recording, reviewing ~100 applications, doing some background research, discussing with other reviewers, handling appeals and taking notes for this post).

A norm of 15 minutes per review had been discussed in the Collective Feedback Commission prior to the review round, and I think this round supported that estimate.

In contrast to previous rounds, I felt that my time was generally spent on productive and focused work in this round.

Improvements and future requests

Below, I will share a list of quick notes on what I see as the biggest achievements in this review round, and the most important things that I would love to see improve in future rounds.

I'm happy to comment on any of these if needed - just ask. :slight_smile:

Improvements

  • As already mentioned, the reviewer checklist and the rejection reasons that mapped 1:1 to its items were a BIG step up!

  • There were quite a large number of applications per reviewer, but assigning reviewers to 1-2 categories was helpful in easing the mental load.

  • Having small predefined reviewer teams supported accountability (visibility and, I think, a nice sense of team spirit).

  • It was great to have private Telegram channels for reviewer discussions! Good conversations, supportive atmosphere. Also, nice to be able to see channels for all categories for context and inspiration.

  • The new 'My Work' tab in Charmverse was awesome!

Future requests

  • I would like more clarity around how to handle projects that seem to (somewhat) fit several categories. If we include them everywhere, they are likely to be rewarded multiple times for the same impact; if we exclude them, they run the risk of not being accepted in a future round and ending up without rewards.

  • The application rules / eligibility criteria should say something about the eligibility of projects that have already been rewarded in previous RPGF rounds.

  • It would be good for reviewers and voters alike to have API access to all application data. OP Atlas was mentioned once or twice in the review process, but I don't know how to access the data there... is there a way to do that?

  • There were still some issues around data integrity (missing GitHub and organizational information). Issues were handled as they surfaced, but for this reason too it would be good to be able to access the original data somehow in cases of doubt.

  • Often, important details are clarified in the review process. Reviewers and applicants both contribute various comments and discussions. Appeals also tend to contain a lot of important context. It would be good, I think, to spend some time considering how this information can best be carried over into the voting round.

  • It would be nice for all reviewers to have full read access to comments related to all applications under review. Right now, it seems like we only have access to the applications to which we are directly assigned. Having access to all would give better transparency and context.

  • While the new 'My Work' tab was great, I missed being able to sort applications based on various criteria, especially Category.

  • I sorely missed being able to open applications in a new tab/window in Charmverse.

  • The UI should make sure that reviewers always leave a text comment to explain their thinking when they reject an application, along with the specific rejection reason from the predefined list.

  • There were some difficulties related to the way applicants were asked to represent themselves in terms of organizational structure and projects. Reviewers need to be able to easily see which organization a project/application belongs to, and a list of all projects/applications belonging to a given organization.

  • Even so, applicants should be advised to make sure that all needed information is included in every application! There were cases where applicants referred to funding information given in another application - which not only makes the reviewer's job more complex, but also poses an obvious problem for voters if only one of these applications actually makes it into the voting round.

  • In Charmverse, it would be nice to be able to see the voting status for each application in the overview (one's own vote AND the total number of votes for and against).

  • In the list of rejection reasons, adding a "Wrong category" option would probably help streamline the process.

  • As a very small detail, the 'hidden' private "Reviewer notes" in Charmverse (top right corner link in the individual application view) are not useful and should probably just be removed to avoid potential misunderstandings.

  • Oh... And the bug that causes all reviewer votes to be reset if the last reviewer decides to undo their vote really should be fixed. :wink:

12 Likes

Voting Process

Voting format and my role as a voter

In RPGF5, the overall goal was to reward OP Stack contributions, i.e. the core technical infrastructure of Optimism and the Superchain, including its research and development.

Non-expert citizens were separated from expert citizens and guest voters. Experts were instructed not to interact with non-experts, so as not to mess with the experiment design.

I was in the group of non-experts.

Within each main group, three sub-groups were formed to address each of the three round categories: Ethereum Core Contributions, OP Stack R&D and OP Stack Tooling.

I was in the Ethereum Core Contributions sub-group.

As it happens, I had been assigned to the other two categories as a reviewer, so I got a nice overview of all applications, which helped me during the voting process.

Voters were asked to vote on a) the total round budget, b) the splitting of that budget between the three categories, and c) the allocation of funds to projects within the category to which the voter was assigned.

Voting rationale

Total round budget

I voted for the maximum allowed round budget of 8M OP.

While there were fewer eligible applications than expected, my impression is that the quality was high. Even in a larger pool, these applications would probably have attracted most of the funding (in RPGF3, the top 40 OP Stack projects received 6.5M+ OP in funding).

Especially for Ethereum Core Contributions and OP Stack R&D, we are looking at massive open source software projects with hundreds of GitHub contributors each.

There are other contributors in the Optimism ecosystem who deserve retro funding, but I can think of no one who deserves it more than these developer teams. Without them there would quite literally be no Superchain.

Thus, while I do have some doubts about the 10M OP awarded for any kind of onchain activity in RPGF4, and about the 10M+ OP recently distributed in Airdrop 5, I believe that RPGF5 aims to reward precisely the kind of public goods that retroactive public goods funding was originally invented to support.

What eventually made me settle on the maximum budget of 8M OP was this comparison with RPGF4 and Airdrop 5. Let's keep the big picture in view:

RPGF was not designed to directly incentivize onchain activity, demand for blockspace or sequencer revenue, but rather to secure the public goods that are needed to create value for developers and users alike and thus, over time, support more and better (values-aligned) onchain activity.

That's the flywheel we should be aiming for.

So. The Foundation had hoped or expected to see more RPGF5 applications; let's incentivize more such projects (and applications) in the future.

Category budget split

I voted to allocate 40% of the total budget to Ethereum Core Contributions (30 projects), 45% to OP Stack R&D (29 projects) and 15% to OP Stack Tooling (20 projects).

The first two categories had more applications, and those projects were generally bigger, more substantial and had more GitHub contributors compared to those in the OP Stack Tooling category. They were also more consistently free to use. The budget should reflect all of that.

I gave some extra weight to OP Stack R&D based on the rationale that Ethereum contributors can apply for funding across the entire Ethereum ecosystem, whereas OP Stack R&D must be sustained by the Superchain.
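
To make the arithmetic concrete, here is a tiny sketch of what that split would mean at the 8M OP budget I voted for (the actual amounts of course depend on how all ballots are aggregated, not on my ballot alone):

```python
# Illustrative only: my voted category split applied to an assumed 8M OP budget.
# The final round budget and split come from aggregating all ballots.
TOTAL_OP = 8_000_000

categories = {
    # name: (share of total budget, number of projects in the category)
    "Ethereum Core Contributions": (0.40, 30),
    "OP Stack R&D": (0.45, 29),
    "OP Stack Tooling": (0.15, 20),
}

for name, (share, n_projects) in categories.items():
    budget = TOTAL_OP * share
    print(f"{name}: {budget:,.0f} OP total, "
          f"~{budget / n_projects:,.0f} OP per project on average")
```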

Project allocations (Ethereum Core Contributions)

My process towards allocating funds within the category assigned to me was:

  • Read all applications
  • Group similar applications (programming languages, consensus and execution clients, major guilds/organizations/research groups, library implementations, etc.)
  • Consider the relative impact of these groups and the projects within them

After this, I used the voting UI to sort the projects into impact groups and manually adjusted the suggested allocations.

I used the metrics provided by Open Source Observer (especially the GitHub contributor count, star counts and the age of the projects), as well as some basic research of my own around market penetration and such for context.

I also made a principled decision to support diversity (of languages, implementations, etc.) by rewarding certain 'smaller' implementations (smaller market share, fewer contributors) equally with some of their larger 'competition'. Diversity and alternatives will keep us alive in the long run.
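
To illustrate the kind of grouping-and-normalizing described above, here is a minimal, purely hypothetical sketch; the project names, weights and budget are made up, and the real judgment calls (relative impact, diversity) happened in my head and in the voting UI rather than in a script:

```python
# Hypothetical sketch: place projects in impact groups, adjust individual weights
# manually, then normalize so the allocations sum to the category budget.
CATEGORY_BUDGET_OP = 3_200_000  # e.g. 40% of an assumed 8M OP round

impact_groups = {
    # group weight: [placeholder project names]
    5: ["major-execution-client", "major-consensus-client"],
    3: ["smaller-client", "core-research-group"],
    1: ["library-implementation", "spec-tooling"],
}

weights = {project: weight
           for weight, projects in impact_groups.items()
           for project in projects}

# Manual adjustment, e.g. bumping a 'smaller' implementation up to match its
# larger competition for the sake of client diversity.
weights["smaller-client"] = 5

total_weight = sum(weights.values())
for project, weight in sorted(weights.items(), key=lambda x: -x[1]):
    allocation = CATEGORY_BUDGET_OP * weight / total_weight
    print(f"{project}: {allocation:,.0f} OP")
```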

I considered previous funding but decided to only use it to support my general understanding of the 'scale' of the projects. RPGF5 is only meant to reward recent impact, so there should be no need to subtract funding given in RPGF3. The rules offered no guidance on how to handle applications that had also been rewarded in RPGF4 using the same impact time scope as RPGF5. Besides, only a few projects are careful to specifically point out their recent impact in the application, so it was hard to use this as a basis for nuanced allocation.

I would like to see future versions of the application questionnaire require applicants to describe a) their overall impact AND b) their impact within the round's specified time frame. And as mentioned in my previous post, the round eligibility rules should make it clear how reviewers and voters are expected to evaluate projects that have already received retro funding in other retro rounds with overlapping time scope. These improvements would help everyone better understand what impact voters should be rewarding.

Voting UX

Cohesion

The voting UI clearly improves from round to round.

In this round, I liked the more cohesive voting experience where the UI offered to take us step by step through the process of choosing a budget, scoring impact and finally allocating funds.

Flexibility

I enjoyed the flexibility of being able to go back and re-evaluate the budget after having studied the categories and projects more carefully. Similarly, there was nice flexibility in being able to pick a basic allocation method and then customize to your heart's content. And it was even possible to re-submit your ballot as a whole if you had a change of heart after having submitted it the first time.

I missed having that same flexibility in the impact scoring step; there was no link to take you back to the previous project, and no way to reset and go through the impact scoring process as a whole again. In theory you would only perform this step once, but when you work with a new UI, it is always preferable to be able to explore a bit and then go back and "do it right". Conversely, it is never nice to be led forward by an interface that will not allow you to go back and see what just happened, or change a decision.

(As a side note, being able to go back and possibly reset things also makes it easier to test things before the round, as it allows you to reproduce and explore errors before reporting them.)

Speaking of flexibility, I would also have liked to be able to skip the impact scoring step entirely and go directly to allocation using some custom method of my own.

Furthermore, I personally find it very difficult to think about impact in an absolute sense, as is necessary when scoring projects one by one without first going through all of them. I understand and appreciate that this design was a deliberate choice, but maybe in a similar round in the future there could be an alternative impact scoring option that presents an overview of projects in one list view, with a second column for setting the impact scores (potential conflict-of-interest declarations could be in a third column, or an option in the impact score column). The project names in the list should link to the full project descriptions, to be opened in a separate tab.

I imagine it would be amazing for voters to be able to choose to assess projects one by one, or by looking at the category as a whole and comparing the relative impact. You might even allow people to go back and forth between the two processes/views and get the best of both worlds.

(Pairwise already offers a third option of comparing two projects at a time and leaving it to the algorithm to keep track of things for you. Offering a choice between multiple methodologies is awesome. Being able to try out all of them, go back and forth and mix and match would be incredible!)
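
I don't know the details of Pairwise's actual algorithm, but just to show why delegating the bookkeeping to an algorithm is attractive, here is a naive, hypothetical sketch that folds a batch of pairwise choices into per-project scores using a simple win share:

```python
from collections import Counter

# Hypothetical pairwise choices: (preferred project, other project) per comparison.
# Project names are placeholders, not actual applicants.
choices = [
    ("project-a", "project-b"),
    ("project-a", "project-c"),
    ("project-b", "project-c"),
    ("project-c", "project-d"),
]

wins = Counter(winner for winner, _ in choices)
appearances = Counter(project for pair in choices for project in pair)

# Naive relative score: share of comparisons the project won.
scores = {project: wins[project] / appearances[project] for project in appearances}
for project, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{project}: {score:.2f}")
```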

Metrics

I loved that this round experimented with providing the human voter with both objective/quantitative data (from Open Source Observer) and qualitative data (from Impact Garden). Another provider of qualitative testimonials is Devouch.

The qualitative data available is still too sparse to be really useful, but Iā€™m sure that will change over time.

For me, the way to go is this combination: responsible, human, gracefully subjective and hopefully diverse, multidimensional decisions, made on the basis of objective data presented in a clear and helpful way.

In that sense, I think RPGF5 was the best round we have had so far, and I hope to see lots and lots of incremental improvement in the future, continuing down that road.

One specific thing that I would love to see is applicants declaring an estimate of the number of people who have contributed to their project - or maybe the number of people who stand to receive a share of any retro rewards they might get? Obviously, rewards are free profit, and projects can do with them as they please (I like that), but it would be good context for voters. In this round, OSO kindly provided GitHub stats, which definitely work as useful heuristics, but a project could have many more contributors than just the people who commit things to GitHub. Some types of projects are not code projects at all. It would be very cool to know more about the human scale of the operations that funding goes towards.

Other notes

Discussion among badgeholders

I felt that there was a remarkable lack of debate among voters this time. The Telegram channels were almost entirely silent. There was a kickoff call and one Zoom call to discuss the budget - only a handful of voters participated in the latter. (Thanks to Nemo for taking the initiative!)

I don't know whether the silence was partly due to the official request not to discuss voting in public, since that could ruin the experiment with guest voters.

In any case, I find it a shame. Surely, better decisions are made when people exchange ideas and learn from one another. And working together tends to be a pleasant way to spend time on a subject.

As for the guest voter experiment, I look forward to learning more when it has been evaluated! In the future, I would love to see some experimentation with mixing experts and 'regular' citizens and encouraging discussions and learning on both sides.

Transparency

I like the balance struck by having public voting for the total budget and the category split, but private voting for the individual project allocations.

Time spent

The UI was nice and efficient. As mentioned, I did some reading and pre-processing of my own before using the UI, and there were the two Zoom calls. And some time is needed afterwards for reflection and evaluation of the process (resulting, among other things, in this post).

In total, I may have spent about 10 hours on the voting process of RPGF5.

It is relevant to note that I had the benefit of already knowing the projects of the two other categories and having spent time on the eligibility criteria of the round during the review process. Without this, I would have needed more time for reading and researching prior to voting.

In future rounds, I would be happy to spend a bit more time on (sync/async) deliberations with other badgeholders.

6 Likes

as a badgeholder - thanks for these thoughtful reflections! i was surprised by the low volume of discussion as well, maybe a contributing factor is that the telegram groups were split in two?

I also contribute at Gitcoin, which developed the voting UI, and I felt similarly about the workflow for going back and re-evaluating projects! good notes for future rounds.

1 Like

Hey Joan, we have been experimenting with different variants of "contributor". Of course, the hard part is that there has to be some public proof of contribution. When it comes to GitHub contributions, we have versions that look at code contributions as well as non-code contributions (e.g., opening issues, commenting on issues, etc.). We are also experimenting with different versions based on patterns of activity (e.g., someone who comments on an issue once is not the same as someone who is responding to issues regularly). Personally, I loved the collaboration with OpenRank (see Create your own Developer Reputation Scores - OpenRank Web Tool Intro) as a way of creating a pool of the most trusted developers. Curious for your thoughts on this and ideas for future iterations!

2 Likes

This is a fantastic write up. Thanks for your insights and feedback Joan.

1 Like

Yeah. That's a tough one.

On the one hand - not all kinds of contributions inherently come with any kind of public proof. Some contributions don't even take place in public. That's a fact of life.

On the other hand - we do need to be able to justify our decisions, and basing them on objective data is the least contentious way to do that.

I will give it some thought.

It feels to me like we should strive to make systems that can handle the complex reality; not reduce reality to what our systems can handle.

Maybe we can come up with ways of making objective data that speak to the things we can't directly count and measure. Some kind of phenomenological approach, maybe.

In a way, that's central to what Impact Garden and Devouch do. We can't always directly measure the usefulness of a project, but we can collect subjective statements from a lot of people, and this then becomes our objective data: "These people said this or that under these conditions, and here is what we know about each of them..."
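
A tiny, hypothetical sketch of what that could look like as data - the fields and the aggregation below are made up and not based on how Impact Garden or Devouch actually structure their attestations:

```python
from dataclasses import dataclass

# Hypothetical attestation record: a subjective statement plus objective context
# about who made it and under which conditions.
@dataclass
class Attestation:
    project: str
    attester: str
    statement: str
    attester_is_verified_contributor: bool

attestations = [
    Attestation("some-tooling-project", "attester-1", "We use this daily in CI.", True),
    Attestation("some-tooling-project", "attester-2", "Saved us weeks of work.", False),
]

# The subjective statements become countable, filterable data points.
for project in sorted({a.project for a in attestations}):
    relevant = [a for a in attestations if a.project == project]
    verified = sum(a.attester_is_verified_contributor for a in relevant)
    print(f"{project}: {len(relevant)} testimonials, {verified} from verified contributors")
```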

The GitHub contributor count and similar objective data points taken from the public internet are a great place to start! But there must be ways to

  1. create objective data from the subjective (or just not inherently public) parts of reality

and

  2. make systems that can acknowledge the fact that important parts of life take place in ways that are not visible to the algorithm - but relevant to it. Possibly this is where human citizens will truly get a chance to make themselves indispensable. Because we see things that the algorithm doesn't, and they matter to us. And if we are smart, we make the algorithm care about what matters to us.

I know. That was more philosophy than data science. But it's just a starting point. Like the GitHub contributor count. Your words made me think. I'll think some more. :slightly_smiling_face:

2 Likes