Joan's RPGF4 Reflections

In this post I will share some of the takeaways from my participation as a reviewer and appeals reviewer in RPGF4.

In a second post (below) I will share my reflections on the voting process.

Preconditions

Round scope

The aim of RPGF4 is to reward onchain builders who have deployed contracts to the Superchain and contributed to the success of Optimism by increasing demand for blockspace and driving value to the Collective.

Metrics-based evaluation

Along with a narrower focus, this round also differed from the previous one in that the voting design prescribed a new form of automated, metrics-based evaluation.

Details on the thinking behind the metrics-based design can be found here, and I would like to repeat what is pointed out in that document:

A metrics based voting experience is only viable for a subset of impact today and can’t be applied to evaluate all contributions to the Optimism Collective.

At the same time, an objective and metrics-based approach - for better and for worse - rules out subjective human judgement at times when it might be tempting to use it. There have definitely been times during this round when I felt frustrated about this.

With that said, I appreciate the idea of smaller, more targeted rounds and the possibility to experiment with different ways of evaluating impact.

Impact definition

I’m glad that the decision was made to use the Foundation’s fallback definition of “profit = impact”, implying that no previous grants would be subtracted from this round’s OP allocation.

While I do think that previous funding should ideally be taken into account, I believe that the road towards formally defining ‘impact’ and ‘profit’ is still long, and we should not rush it or pretend that we are already there.

Experimental attitude

As mentioned, I value the experimental attitude behind Optimism’s RPGF.

Part of this, it seems to me, is to accept the possibility of failure.

While this round does have some features that I find problematic, it is essential in any experiment to allow things to run their course and not accidentally hide possible design flaws by trying to apply common sense on an individual basis.

Review process

I spent quite a lot of hours on the review process, as a reviewer and appeals reviewer.

Two main things, I think, could relatively easily be improved in future rounds: more clarity around what reviewers do and do not need to check, and better software tools for reviewers.

What must reviewers check?

As reviewers, we were offered a handy reviewer guide. But while this document contained just a short list of fairly specific things to look for (“What to review projects on”), reviewers were also expected to ensure that applications complied with the general application rules, and it is my impression that this left quite a bit of room for subjective interpretation - and uncertainty.

Some specific points around which there seemed to be ambiguity:

  • GitHub repo age. The guidelines say: “Does the repo hold no code or is it newly created, raising a red flag?” What is a red flag? A warning signal? A disqualification?

  • Unverified contracts. Reviewers were asked to verify that the applicant has “made their contract code available in a public Github repo”. What happens when a contract shows as unverified on e.g. Optimistic Etherscan? How would a reviewer know whether the deployed contract corresponds to the code in the GitHub repo?

  • OSO slug. This was arguably the most mysterious topic in the review process.

  • Downstream contracts. During the review process, it surfaced that the automatic verification performed by OSO considered anything downstream of a project’s stated deployer as in scope. While it is understandable that explicitly listing every contract in an application could be impractical for some projects, this does raise a number of questions about the review process and the task of the reviewers. In particular, exactly what code do reviewers need to attest to having seen?

Ideally, reviewers should be given instructions that clearly specify what to look for and what action to take when they find it.

This requires the reviewers to understand the automatic verification process clearly, such that they know what they can safely ignore.

Especially in a round like this, with so much emphasis on objectivity and quantitative metrics, it seems important that e.g. overlapping impact, transaction counts and transaction timing should be checked automatically - and possibly subjected to random sample tests by appointed badgeholders with the necessary skillset.

Other badgeholders could be asked to notify these specialists in case of doubt, but abstain from rejecting applications based on their own tests.

I think more clarity will lead to more consistency and delegation of responsibility. Feedback and questions from the human reviewers can be used to inform the automatic process - and the automatic process can apply the learnings consistently.

Reviewer tooling

Suggestions for improvement of the Charmverse UI that was used in the review process:

  • Filtering. Reviewers absolutely need to be able to easily see which projects are assigned to them and the current status of these projects. The same is true for appeals reviewers.

  • Notifications. It would be awesome if there was some kind of automatic notification in case a reviewer has missed a review, and also whenever new projects are assigned to a person.

  • Rejection reasons. We need the option to reject an application due to missing or incomplete information. It does not feel good to choose ‘spam’ or ‘deceiving badgeholders’ in cases where an application is simply incomplete for reasons that the reviewer can’t know.

  • Reviewer notes. The current interface offers three kinds of free text comments: a) the almost invisible ‘reviewer notes’ in the top right corner; b) the public comments at the bottom of the screen; and c) the explanation attached to rejections. I think the reviewer notes are of little use at this point, and it might be better to remove them completely to avoid confusion. It would be nice to have the option of also offering an explanation when you accept an application. The public notes at the bottom of the page actually proved quite useful in one or two cases where an applicant pitched in and explained something.

  • Application links. Links to project website, Github, contracts etc. really should be shown as links and not just as text. Having to copy-paste these things into new browser tabs every time very, very quickly grows tiresome!

  • Changing votes. The flow around a reviewer changing his opinion could be better. In some cases, trying to change a vote also resulted in the cancellation of the four other reviewers’ votes - which was all the more problematic because, due to anonymized user names, it wasn’t always possible to see whom you needed to contact if you had accidentally done this…

  • Data integrity. It is really important that reviewers can see exactly what applicants put in their applications. In a few cases, bugs in the UI prevented reviewers from accurately seeing parts of the application data. Applications were mistakenly being rejected because of this until the bugs were discovered.

In a future round, maybe a few reviewers could be employed to test out the software before the start of the actual review process?

It would also be good to allow/request applicants to review their own applications in the same UI(s) that reviewers and voters will be using - that way, any data integrity issues are much more likely to be discovered early on.

Also, maybe some sort of collaboration could happen between Charmverse and Retrolist.app. In this round, I would say that Retrolist did a better job at reliably mirroring the application data and providing helpful links etc. But reviewers still had to use Charmverse because that’s where the voting functionality was.


Voting process

Voting format

In RPGF4, badgeholders were asked to vote on metrics, not applications or projects.

16 possible metrics were available, and each badgeholder could mix and match these as he liked, giving each chosen metric a weight (a percentage, such that the weights of all chosen metrics added up to 100%).

In addition, there was the option of applying an ‘open source reward multiplier’ which would multiply the effects of applying the chosen metrics across open source projects by up to 3x.
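
To make the mechanics concrete, here is a minimal sketch of how a single ballot like this could translate into per-project shares. This is my own illustration, not the official RPGF4 tallying code: the function name, the per-metric normalization and the exact way the multiplier is applied are assumptions.

```python
# Hypothetical sketch of one badgeholder's ballot; NOT the official RPGF4
# tallying algorithm. Assumes each chosen metric distributes its weight
# proportionally to the (optionally multiplier-boosted) per-project values.

def ballot_shares(metric_values, weights, is_open_source, os_multiplier=1.0):
    """metric_values: {metric_name: {project: raw_value}}
    weights: {metric_name: fraction}, fractions summing to 1.0 (i.e. 100%)
    is_open_source: {project: bool}
    os_multiplier: between 1.0 and 3.0, applied to open source projects
    Returns {project: share of this ballot's allocation}."""
    projects = {p for values in metric_values.values() for p in values}
    shares = {p: 0.0 for p in projects}
    for metric, weight in weights.items():
        raw = metric_values[metric]
        # Boost the values of open source projects for this metric.
        boosted = {
            p: raw.get(p, 0.0) * (os_multiplier if is_open_source.get(p, False) else 1.0)
            for p in projects
        }
        total = sum(boosted.values()) or 1.0
        # Each metric hands out exactly its chosen weight, pro rata.
        for p in projects:
            shares[p] += weight * boosted[p] / total
    return shares


# Example ballot: 60% gas fees, 40% trusted users, 2x open source multiplier.
example = ballot_shares(
    metric_values={
        "gas_fees": {"proj_a": 100.0, "proj_b": 50.0},
        "trusted_users": {"proj_a": 10.0, "proj_b": 40.0},
    },
    weights={"gas_fees": 0.6, "trusted_users": 0.4},
    is_open_source={"proj_a": True, "proj_b": False},
    os_multiplier=2.0,
)
print(example)  # the shares of one ballot always sum to 1.0
```

In this sketch each chosen metric distributes exactly its weight pro rata, so a single ballot always adds up to 100%; how the individual ballots were then aggregated across badgeholders is outside the scope of the example.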

Time spent

Compared to previous rounds, the voting process was extremely ‘efficient’: It does not take long to add a few metrics to a ballot and submit them.

Including the pre-voting workshop on metrics, a kickoff call, time to read up on each metric and reflect on its usefulness, some experimenting in the actual voting interface, as well as a bit of discussion on Telegram and a few badgeholder calls to exchange notes and ideas - I probably spent less than 10 hours in total on the voting part of RPGF4. For comparison, I spent more time on the review process.

(Writing this takes time too, of course, but hey -)

Discussions

Aside from a couple of dedicated discussion calls with a handful of participants, it is my impression that there was less discussion in the badgeholder group in this round than in the previous round.

Values and ethics

My main critique in this round is that the format rules out ‘subjective’ evaluations - i.e. ethics.

Rewarding any and all kinds of onchain activity feels wrong to me. What kind of Phoenix will we be summoning like that?

Debating what actually constitutes a ‘public good’ takes time and energy, and we will likely never agree - but the same is true of ‘good objective metrics’. We need the discussions, and we need to find ways to make decisions that don’t just ignore the complex human reality.

Generating gas fees, or initiating transactions, or even having many human users, does not necessarily mean that your impact is good.

And even though we may not be able to collectively define a ‘public good’ in the same concise way that we define ‘gas fee’, we can certainly create mechanisms that allow human citizens to weigh in on what is or isn’t a public good - in the same way that we, in this round, were allowed to weigh in on what is or isn’t a good metric.

Public voting

Votes will be public this time. I don’t feel good about that.

It is true that choosing metrics rather than applications makes things less ‘personal’. And the risk of voters selling votes or voting under threat would be there in any case, as the online voting happens unsupervised and anyone can take a photo of their own ballot and share it (which means that they can also be pressured into doing so).

Still, with a ‘secret’ vote - possibly subject to checks performed by appointed scrutineers working in confidentiality - badgeholders would have a fighting chance of remaining relatively anonymous and just doing their work. Or even be public figures who keep their actual voting to themselves.

Different badgeholders may approach these things differently, or even adjust their approach over time according to their life situation.

With a public vote, all badgeholders are being pushed out into the public spotlight and placed in a position where they could have to defend their votes. That may be appropriate for delegates (who are representatives), but imo not for citizens.

RPGF involves allocating large sums of money, and debates on Twitter and Discord do get very heated at times. Badgeholders are humans, and I’m seeing simultaneous moves from various sides towards wanting to know more about us, analyze our onchain activity, require us to use a Farcaster account with verified addresses, pull in our ENS, etc. In the near future, whether we like it or not, many of us will be very easy to track down by a disgruntled applicant, online and offline.

In the past months, between RPGF3 and RPGF4, I have received messages on various platforms from people who reached out to me only because I’m an Optimism badgeholder. That is what it is - I don’t necessarily see anything wrong with reaching out. I have also received an offer of free tickets for a real life event. To be honest, I have no idea when a friendly offer like that crosses the line to something that might be understood as a bribe. My educational background offers no help with this.

In any case, there is certainly a gray area. Some people will go to some lengths to try and ensure support for their projects in the future. I do my best to avoid conflicts of interest. Knowing that my vote in RPGF3 was secret/anonymous has helped me stay neutral at times when I might have otherwise felt under pressure. With public voting an important layer of protection falls away. Psychologically, and in a worst-case scenario also physically.

Metrics under consideration

In line with my view on public voting I will not share my ballot here. If anyone really badly wants to know, they can look it up later when all of the votes are disclosed.

But I’m happy to share my reflections on the metrics. (And I have obviously voted according to the views that I present here).

Gas fees and transactions

Gas fees will feed future retro funding rounds. As such it is a very clean and obvious metric to use. Even if a contract does nothing meaningful or good aside from that, by generating gas fees it does help keep Optimism alive. (I hope we all want Optimism to do more than just grow fat, but it certainly needs to feed itself).

‘Transactions’ is a simpler metric than gas fees. Transactions must take place for gas fees to be generated, but gas fees also depend on the size, timing and perceived value of transactions. Many transactions could mean a lot of meaningful activity - or spam, or clumsy coding. I don’t think this metric captures anything important that is not already captured, better, by ‘gas fees’.

Addresses vs. users

I’m torn here.

On the one hand, I would like to see Optimism as a great habitat for humans. Not just bots. Many human users could mean that a project has the support of - and/or supports - a lot of humans. That would be a potential indicator of a public good.

On the other hand, machines can do great work too. They can certainly contribute to public goods, and in many contexts we absolutely need them. Using machines for good should be rewarded.

Even more importantly: Humans, too, can be used in many ways. Depending on the context, a user can be a customer, an employee, a slave or…?

Addictive, bribing and attention-grabbing platforms may have many users.

In the physical world, you would (I hope) never blindly reward someone for having many employees without asking about the company’s product and the working conditions.

I’m very wary about rewarding the use of humans without some kind of holistic human evaluation of the process.

Trusted users, power users and badgeholders

We can think of different ways to recognize a human user, and we can even go a step further and discuss what particular ‘types’ of users indicate about a project.

This could be very useful - but only when we have some context.

Applications operated by machines obviously don’t have trusted users.

Users who value their privacy will not register as ‘trusted’, and applications that offer privacy will therefore not have a lot of trusted users.

Furthermore, by making OP rewards dependent on specific groups of users, we might end up incentivizing platforms to optimize around the use of those specific humans. Being offered free tickets may still be good fun, but do we really want to encourage all of the platforms out there to be constantly grabbing for the attention and action of a few known badgeholders and their friends - by making their funding dependent on it? I don’t.

Daily vs. monthly

I see no reason why a good contract would necessarily require constant activity.

A combined requirement for ‘daily’ and ‘human’ could potentially create some very unhealthy incentives. (Of course, some great projects might also score high on such a metric. But the metric in itself does not tell us if a project is great or dystopian.)

Linear vs. logscale

A logscale has the very nice property that rewards are distributed according to merit; projects that score higher on some metric M will get more than those that score lower, but rewards are not only funnelled towards a few giants that dwarf everyone else. Instead, the funds are distributed in a way that incentivizes initiatives of all sizes, promoting diversity and competition.

Let’s be clear: Impact = profit cannot mean that any project which generates $x in gas fees should receive $x in retro funding. That would be silly, dangerous and very bad business.
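
To illustrate the difference, here is a toy comparison with made-up numbers (not round data): splitting a fixed pool proportionally to raw gas fees versus log-scaled gas fees.

```python
import math

# Toy example, not the official scoring: splitting a fixed pool of OP
# proportionally to raw gas fees vs. log-scaled gas fees.
gas_fees = {"giant": 1_000_000.0, "mid": 10_000.0, "small": 100.0}
pool = 100.0  # arbitrary units

def split(values, transform):
    scored = {p: transform(v) for p, v in values.items()}
    total = sum(scored.values())
    return {p: round(pool * s / total, 1) for p, s in scored.items()}

print("linear:", split(gas_fees, lambda v: v))
# {'giant': 99.0, 'mid': 1.0, 'small': 0.0} -> the giant takes almost everything
print("log:   ", split(gas_fees, lambda v: math.log10(1 + v)))
# roughly {'giant': 50.0, 'mid': 33.3, 'small': 16.7} -> far flatter
```

The linear split hands nearly the entire pool to the largest project, while the log-scaled split keeps smaller projects meaningfully in the game.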

Open source multiplier

There has been a lot of discussion around this metric. I recognize that it may not entirely work as intended yet. But that is true for RPGF in general.

I would like to point out that ‘open source’ is defined as a collective value of Optimism - one of the values that badgeholders are currently being asked to promise to uphold.

Given this, we should take our own medicine and create an incentive for all of us to get better at defining and recognizing open source. So - I endorse the use of this metric. That should make it worthwhile to put some effort into improving the details.

Going forward

Now that we have tried this once, maybe we could experiment with more complex metrics. Maybe something like “many human users is great, but only if [something]”?

I could also imagine a setup where some kind of ‘subjective’ human filter is applied first (similar to what we saw in RPGF3, but perhaps in a more binary way, like the initial review, to collectively decide which projects are indeed public goods), and objective metrics are then applied to decide who gets how much.

What I would really love, though, is to experiment with using objective metrics to inform the decisions of humans, without mistaking the metric for the goal.


Thank you for taking the time to provide such thoughtful and actionable feedback. We’ll be sure to work with @Jonas and the team to improve the experience for the next round.


Awesome!

Thank you for your care and effort. 🙂
