Pairwise: Community Signaling in Retro Funding 4 - No badge required 😉

Pairwise is making voting to distribute RetroPGF simple and fun. We are now pivoting to make RetroPGF signaling accessible to everyone in the Optimism community! Pairwise is an open-source, off-chain voting dapp (like Snapshot) that streamlines community signaling by letting users choose between just two options at a time and then aggregating their choices into a quantifiable result. Designed to be user-friendly and intuitive, Pairwise converts simple subjective inputs into objective, measurable outputs, minimizing the cost and cognitive burden of voting.

Pairwise's real potential lies in aggregating data from many voters, allowing each of them to concentrate on the areas where they have expertise.

This has incredible potential to increase project discovery and network effects within our community, which is why, with this new version, we aim to give more community stakeholder groups the opportunity to participate in RetroPGF. We will amplify the voices of token holders, delegates, RetroPGF recipients, and badgeholders through community signaling.

Community Signaling as a Metric in Retro Funding 4

Retro Funding 4 is all about metrics, and one of the challenges of a metrics-heavy approach is that metrics can be manipulated and reductionist. Goodhart's Law states: when a measure becomes a target, it ceases to be a good measure.

Dive deeper into this in this post from Owocki.

Furthermore, impact is more than TVL and number of unique transactions. Not every $100 transaction has the same impact. The impact of a project facilitating remittances would be considered by most to be more valuable than a meme coin, even if the onchain data says otherwise.

Pairwise :handshake: Retro Funding 4

We are eager to put qualitative data onchain and give badgeholders the option to factor in community signaling as an onchain metric for Retro Funding 4, and we have a beautiful new version of our app to do it with!

With Pairwise, the community can engage, discover, review, and rank the types of projects that interest them in Retro Funding 4. We will break the applicants into several categories, each containing 15-30 projects; the community will rank the projects within each category and then rank the categories as well. This way, each voter can analyze just a portion of the projects, and we can aggregate the results into one final ranking.

Pairwise WIP demo

Major V2 improvements:

  • Mobile first
  • Voter feedback
  • Tinder-style filtering
  • Sound effects & background music
  • Pseudonymous account abstraction

Also, the community will not allocate a number of tokens to each project; instead, they will only rank the projects using filtering and pairwise assessment. This greatly reduces the cognitive load of voting and gamifies the user experience.

The challenges to overcome are the tight timeline and engagement. To maximize participation, we will launch a marketing campaign with rewards funded from our RetroPGF 3 rewards!

Our proposal is to create four metrics, one for each of four stakeholder groups: Badgeholders Rank, OP Hodlers Rank, Builders Rank & Delegates Rank. However, if adding four metrics to the voting UI is seen as too much, we could instead create a single metric that gives each stakeholder group 25% of the influence over an aggregated result. Check the voting power distribution below.

Voting power distribution


OP Holders

  • Link: Optimistic Etherscan Token Balances
    • Whale: 10M OP, 50 holders, Votes: 100
    • Diamond: 1M OP, 190 holders, Votes: 50
    • Platinum: 100k OP, 1000ish holders, Votes: 25
    • Gold: 10k OP, Countless Holders, Votes: 10
    • Silver: 1k OP, Countless Holders, Votes: 5
    • Bronze: 500 OP, Countless Holders, Votes: 3
  • Snapshot to be taken May 1st.
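The holder tiers above can be read as a simple threshold lookup. A minimal sketch (function and table names are mine; the exact snapshot and rounding rules used by Pairwise are not specified here):

```python
# OP holder tiers from the table above: (minimum balance, vote weight).
# Checked from largest to smallest so a balance falls into its top tier.
HOLDER_TIERS = [
    (10_000_000, 100),  # Whale
    (1_000_000, 50),    # Diamond
    (100_000, 25),      # Platinum
    (10_000, 10),       # Gold
    (1_000, 5),         # Silver
    (500, 3),           # Bronze
]

def holder_votes(op_balance):
    """Return the vote weight for an OP balance at the snapshot."""
    for threshold, votes in HOLDER_TIERS:
        if op_balance >= threshold:
            return votes
    return 0  # below Bronze: no holder votes
```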


Delegates

  • Link: Optimism Tally Delegates
    • Diamond: 1M Delegated OP, 23 delegates, Votes: 8
    • Platinum: 100k Delegated OP, 40 delegates, Votes: 5
    • Gold: 15k Delegated OP, 64 delegates, Votes: 3
    • Silver: 5k Delegated OP, 175 delegates, Votes: 2
    • Bronze: 2.5k Delegated OP, 185 delegates, Votes: 1
  • Snapshot to be taken June 18th.


Badgeholders

  • 1 address 1 vote
  • Retro Funding 4 Badgeholders on June 18th


Recipients

  • 1 address 1 vote
  • Recipient projects in Retro Funding 4

Note: If a user holds multiple badges, they will be accounted for as Badgeholder, Delegate, or Recipient in that order. Everyone except delegates can also be accounted for as an OP holder. The reason for this is to protect user anonymity; otherwise, we could risk potential counter-engineering of anonymity.

Every one of those stakeholder groups will have a Pairwise ranking, and those rankings will be aggregated. In order for a bucket to provide relevant information to the collective, we need:

  • Badgeholders: 30 participating and at least 5 finalizing each category
  • Recipients: 100 participating and 10 finalizing each category
  • Delegates: 100 participating in Pairwise and 10 finalizing each category
  • Holders: 400 participating and 20 finalizing each category

If a group doesn't meet the criteria, it will not qualify as a metric for voting in Retro Funding 4. At least one group needs to meet the minimum engagement requirements, or no data will be added to the round.


The app will be ready for community signaling as soon as possible after the projects are finalized. We propose to begin signaling before the official vote begins and to end it a week before voting closes, so that the final results are solidified for the last week of badgeholder voting.

Ranking Math

To determine the final rankings, we are debating three potential distributions that we will share shortly in the comments below. Our goal is to gather feedback from the community on which model gives the ideal distribution.

Reward Distribution for Pairwise Voting

The reward system is designed to promote stakeholder engagement. Pairwise is as fun as voting gets, and we want to spice it up by rewarding stakeholders for signaling their preferences to the collective. We are allocating a total of 4,000 OP from our RetroPGF 3 rewards, with 1,000 OP designated for each stakeholder group (Badgeholders, Recipients, Delegates, and Holders). Each participant's chance of winning increases with the number of categories they vote in.

Raffle Tickets:

  • For every 10 points earned, participants receive one raffle ticket.
  • More tickets mean higher chances of winning.

How to Get Points:

  • Category Participation: Finalizing the first category gives 10 points.
  • Weighted Bonus Points (For Delegates and Hodlers bucket):
    • Whale: +20 points per category
    • Diamond: +15 points per category
    • Platinum: +10 points per category
    • Gold: +5 points per category
    • Silver: +3 points per category
    • Bronze: No additional points
  • Max Tickets: 25% of the total number of categories. Once you have ranked 25% of the categories, you do not get extra points for ranking more.
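The points-to-tickets rules above can be sketched in a few lines. One reading ambiguity to flag: we treat "finalizing the first category gives 10 points" as 10 points per finalized category up to the 25% cap; if only the first category scores, drop the multiplication. Function and variable names are mine:

```python
# Per-category bonus for the Delegates and Holders buckets.
BONUS = {"Whale": 20, "Diamond": 15, "Platinum": 10, "Gold": 5, "Silver": 3, "Bronze": 0}

def raffle_tickets(categories_done, total_categories, tier=None):
    """Raffle tickets earned: 10 base points per finalized category,
    plus the tier bonus, capped at 25% of all categories."""
    # Points stop accruing once 25% of the categories have been ranked.
    capped = min(categories_done, max(1, round(total_categories * 0.25)))
    points = capped * (10 + BONUS.get(tier, 0))
    return points // 10  # one raffle ticket per 10 points
```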

Note that each stakeholder group has an equal total of 1,000 OP for distribution. A Holder may accumulate more points than a Badgeholder, but this will not affect the distribution among Badgeholders—it only enhances their chances within the Holder category.

At the end of the voting period, 10 winners will be randomly selected from the raffle tickets in each group. This ensures every participant has a chance to win, with active members enjoying improved odds thanks to their contributions to the collective. If a participant is both a Badgeholder and a Delegate, 1 vote will put them in both raffles!

Anti-Gaming Measures and Transparency

  • The Pairwise voting process is not like Snapshot: using traditional analytics tooling, it will be very easy to track how users use the application and separate authentic users from bots and from people who are just clicking through to promote their own projects.
  • If a ballot submission looks like the voter is only farming the reward, or only voting for their own projects without putting in an authentic effort to judge other projects, we will not give rewards to that voter, but we will still count the ballot. For the first round we will count every ballot, even if we believe someone is farming the system for rewards.
  • We will maintain complete transparency in the reward calculation and distribution process. Everything will be public.

Join us in Shaping the Future of Pairwise

We would love to hear your feedback and shape this experiment collectively. We’ve received massive support from the collective and want to continue evolving to suit the needs of the community to the best of our capacity. We’ve been working on this prototype that will internally launch on May 15.

You can share your comments here, or DM me at @Zeptimusq on Telegram, where we can go more in-depth and schedule a demo call.


Thanks for this! We completely support the gamification of RPGFs and believe that this will be a good way to go forward. However, we have some concerns on the algorithm behind pairwise selection.

From a game theory perspective, the outcomes are entirely dependent on the order in which choices are presented to voters. In cases where there is no Condorcet winner, a pairwise selection process will not necessarily yield the social optimum. Consider the following example:

Voter 1 prefers A (7) > B (2) > C (1)
Voter 2 prefers B (4) > C (3.5) > A (2.5)
Voter 3 prefers C (5) > A (4) > B (1)

Even though the social optimum is A (with total utility 13.5), we can reach any of the three choices:

  • We can reach A by pairing (B, C) in Round 1, then (A, B) in Round 2.
  • We can reach B by pairing (A, C) in Round 1, then (B, C) in Round 2.
  • We can reach C by pairing (A, B) in Round 1, then (A, C) in Round 2.
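The agenda-dependence in this example can be checked directly. A minimal sketch (function names and structure are mine, not part of any Pairwise implementation):

```python
# Cardinal utilities for each voter, as in the example above.
utilities = {
    "V1": {"A": 7, "B": 2, "C": 1},
    "V2": {"B": 4, "C": 3.5, "A": 2.5},
    "V3": {"C": 5, "A": 4, "B": 1},
}

def head_to_head(x, y):
    """Simple-majority winner of one pairwise matchup."""
    wins_x = sum(1 for u in utilities.values() if u[x] > u[y])
    return x if 2 * wins_x > len(utilities) else y

def eliminate(agenda):
    """Sequential pairwise elimination: the first two options face off,
    and the survivor meets each remaining option in agenda order."""
    winner = agenda[0]
    for challenger in agenda[1:]:
        winner = head_to_head(winner, challenger)
    return winner
```

Running the three agendas reproduces all three outcomes, confirming the cycle A beats B, B beats C, C beats A.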

Given this, we'd love more clarification on the transparency of the pairwise selection mechanism, and we're happy to iterate on the design :slight_smile:

– Jay, Stanford Blockchain | Twitter


Pairwise uses the Budget Box algorithm, which is very similar to the Elo rating system, the dominant way of doing this kind of ranking in the default world (chess, video games, professional and college sports, etc.). Pairwise converts these simple subjective inputs into objective, measurable outputs, minimizing the cost and cognitive burden of voting.

The main difference between the two algorithms is that Budget Box takes into account all previous comparisons, not only the two options facing off at that moment, giving a more accurate result with fewer comparisons.

What we are doing for this version is using pairwise comparison to order the list, which users can then drag and drop to organize according to their preferences before casting their votes. To decide the rankings, we are exploring different distributions that we will share here ASAP. I love your passion for engaging in iterating on the design!!

Has Pairwise been used in previous RPGF rounds?

Hey @jackanorak
Yes, it was deployed, and some users created ranked attested lists; if those users were badgeholders, their lists showed up in the RPGF3 Voting App UI.

Also, we processed all applicants and provided 31 detailed categories, all in a very short time window after project validation and before the round started.

Now we are going to run it with a renovated UX based on the feedback we received. The main improvements were:

  • Easier and more accessible: we brought it to mobile, and you can pause and resume whenever you want.
  • Anonymity via a zkProof method.
  • An enhanced Budget Box algorithm for better distribution.
  • Added roles like holders, so everyone can vote and send a signal to badgeholders.
  • Weighting of votes based on the different roles.

This makes governance more fun, more accurate, and more engaging for the whole OP community. For better visibility, I added some screenshots, but please don't miss the PROTOTYPE.


@Zeptimus while experimentation around Retro Funding voting is always welcome, I want to share some guidelines and concerns around this.

Guidelines for impact metrics:

  1. Impact metrics should be complete, meaning they need to include every project
  2. Metrics need to be reproducible and verifiable
  3. Be ready by June 7th so it can be ingested and experimented with like all the other metrics. We’re aiming to start the final curation of metrics with badgeholders on June 12th or earlier.

Some concerns and risks I want to highlight:

  1. This would create an attack vector for the round, in which projects will aim to achieve a high rank on Pairwise.
  2. The sign up for projects will close on June 6th, so the timeline for reviewing all projects and getting the metric ingested into the system seems very problematic.

Here’s a personal suggestion on how Pairwise could run an informative experiment here. You could explore the research question of how the outcomes of the Pairwise voting system compare to the outcomes of Retro Funding 4 voting using impact metrics.
Retro Funding 4 is experimenting with the hypothesis that by leveraging quantitative metrics, citizens are able to more accurately express their preferences for the types of impact they want to reward, as well as make more accurate judgements of the impact delivered by individual contributors. You could run a Pairwise vote on retro funding projects adjacent to Retro Funding 4 voting and ask the community which voting outcomes they prefer! This could give the community relevant insights into which system produces preferable outcomes.


Thanks Jonas, for your guidance.

The idea of running an experiment to compare the outcomes of Pairwise with Retro Funding metrics is excellent. Pairwise has shown promising results in individual tests, and we are eager to see how it performs on a larger scale aggregating the results.

This would apply to any metric, with special emphasis on quantitative ones: they are gameable. What is so strong about Pairwise is that in order to get a high rank, you have to be voted for. There is nothing you can do as a project owner to raise your Pairwise rank other than do such a great job that stakeholders want to vote for your project.

We can't get it done by June 7. One challenge we've been facing is the reactive nature of our workflow in response to announcements. Let's experiment in this round, and if the outcomes are positive, we can refine the product for future Retro Funding initiatives.


Patch Notes: Pairwise Update

Update Highlights:

  • Removed Whale Category: The Whale category has been removed from the Delegates bucket.
  • Voting Power Adjustment: The voting power of Diamond Delegates has been increased from 7 to 8.

New Rule Implemented:

  • If a user holds multiple badges, they will be accounted for as Badgeholder, Delegate, or Recipient in that order. Everyone except delegates can also be accounted for as an OP holder. The reason for this is to protect user anonymity; otherwise, we could risk potential counter-engineering of anonymity.

Hello! First, I want to congratulate you for making reviewing proposals and projects simpler and more fun. And I believe that paying attention to a friendly and intuitive user experience is something that can mean a lot in onboarding new people with less technical profiles.

I’d also like to share a question / reaction / suggestion that came to me.

Is there a reason why you chose the ‘tinder’ style for ‘Project Filtering’? How does someone swiping left affect the ‘score’?

I mention it because, although it is quick and common, it leaves out what seems to me to be the 'heart of good governance': the reasons for, and the integration of, why you 'disagree' with or 'don't like' something.

My hypothesis: if we increase our ability to differentiate ‘lack of preference’ (not what you would like but still in the tolerable range) from ‘objections’ (the belief that this decision can negatively affect established goals) and our ability to adapt the proposals so that those red lights go out, then we will have greater learning and a higher level of anti-fragility.

A suggestion could be to have ‘multiple options’ (instead of the dichotomy of right / wrong) such as:
a) you love it
b) it seems good enough to you
c) you have an objection
(and in that last one open a section to express what it is, maybe using hashtags/cloud words or more ‘multiple options’).

These inputs can be reviewed by the team and/or trigger channels for the co-creation of agreements.

I understand that this complicates the process, but I think it would give us better data, wouldn't it?


This was a requested feature from feedback we got when we launched the first version for RetroPGF 3 list creation. Several people told us that there were some projects they just didn’t believe deserved any funding and having them come up in the pairwise assessment portion time and again was “annoying.” So we made a filter section that very explicitly states that if you click our red trash button, you do not want to give that project any points. We had to remove the swiping functionality as that was hard to do well in a web app, and our initial user research showed people would not download an app.

I love the suggestion! In future versions we absolutely do want to have something like this. I wouldn't conflate it with the initial filtering, as that is specifically there to make the pairwise voting better (fewer choices make for faster voting), but what we were thinking is to make a more complex points system where voters would earn points for giving feedback on the projects they reject and the projects they rank highly.

We are very excited for the gamification of Pairwise. We are laying out the basic infrastructure in this version, but hopefully in the next version we can really gamify the experience and gather this kind of data.


Determining the Final Ranking

To calculate the distribution of a voter's voting power, we originally used the formula in the Budget Box whitepaper. However, for this application, to improve UX, we want to avoid the extra pairwise comparisons that method would require. Instead, we will create a simple scoring algorithm in which each rank within each category gets a specific number of points, and each category's rank also gets a specific number of points. This will speed up the voting experience and reduce cognitive overload.

  • Each category will have 15-30 projects.
  • Voters will filter projects before ranking.
  • Projects that do not pass the filter stage will receive 0 points from that voter.
  • At least 21% of the projects in a category must pass the filter and be ranked.
  • Delegates and OP Token Holders have different voting power weights whereas Recipients and Badgeholders are 1 address - 1 vote.

Pairwise Voting Overview

  1. The voter will be ranking the projects within each category, and after they have ranked two categories, they will also create a category ranking for the categories they have ranked.
  2. There are four stakeholder groups and each will have its own completely independent voting result.
  3. A voter that is part of multiple stakeholder groups only has to vote once, and their vote will be used in each group that they qualify for.
  4. Each voter has a voting weight in their stakeholder group. Badgeholders and Recipients all have a weight of 1 vote, OP Holders and Delegates have different voting weights depending on OP amounts held/delegated (see post above).
  5. If someone has a voting weight of 10 as a delegate, their ballot will be counted 10 times.
  6. Every voter is submitting a ballot for each category that they rank. The projects they rank within a category each receive a score and those scores are combined with the other voters’ ballots for that category to give a weighted average score for each project within each category.
  7. Every voter is also submitting a ballot for ranking the categories that they reviewed. Their overall category rank ballots are also calculated by a weighted average score just like projects.
  8. The weighted average category score for each category is then multiplied by the weighted average score for each project within each category to get a final score for each project.
  9. The final outcome will greatly depend on the voting algorithm that is used to give scores for each rank within each category.
  10. Each category will have a set score for each rank, this scoring algorithm will depend on how many projects are in the category. The scores for the ranking of categories will follow the same pattern as the scores for the ranking of projects within the category (with the exception of the handicap modification). This scoring algorithm is discussed in detail in the next sections.
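The aggregation in steps 4-8 can be sketched concretely. This is a hedged illustration, not the team's implementation: `weighted_average_scores` and `final_project_score` are my names, and the per-rank scores stand in for whichever distribution is chosen below.

```python
def weighted_average_scores(ballots):
    """ballots: list of (voter_weight, {item: score}) tuples.
    Returns each item's weighted average score across all ballots,
    as in steps 6-7 (a ballot with weight 10 counts 10 times)."""
    totals, weight_sum = {}, 0
    for weight, scores in ballots:
        weight_sum += weight
        for item, s in scores.items():
            totals[item] = totals.get(item, 0.0) + weight * s
    return {item: t / weight_sum for item, t in totals.items()}

def final_project_score(category_score, project_score):
    """Step 8: the category's weighted average score multiplies each
    project's in-category weighted average score."""
    return category_score * project_score
```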


Scoring formulas we are considering

We reviewed several different scoring formulas and have narrowed it down to three: Gaussian, Poisson, and our own custom algorithm. We have not chosen the final scoring formula yet and would love your feedback! The first decision to make is which baseline distribution formula to use.

1. Normal (Gaussian) Distribution

  • f(x): The score for a project (or category) with rank 𝑥 (first, sixth, eighteenth, etc)
  • 𝑥: The rank number of a project inside a category (for scoring projects) OR the rank number of a category among all categories (for scoring categories).
  • 𝜎: The desired standard deviation of the distribution. The smaller this value is set, the more weight goes to top ranked projects. We expect most configurations will use a value between 3 & 7 however any positive number can be used.

The Gaussian distribution is the middle way of the three: comparing the extremes of the expected configuration space (ignoring the handicap), it gives the top 7 to 14 projects a score above 1%.
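The formula itself isn't reproduced in this text, so here is a hedged sketch of a Gaussian rank-scoring function built from the variable definitions above, assuming a bell curve peaked at rank 1; the team's exact formula may differ:

```python
import math

def gaussian_score(x, sigma):
    """Assumed form: f(x) = exp(-(x - 1)^2 / (2 * sigma^2)).
    Rank 1 scores highest; smaller sigma concentrates weight at the top."""
    return math.exp(-((x - 1) ** 2) / (2 * sigma ** 2))
```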

2. Poisson Distribution

  • λ: The average number of events. The smaller this value is set, the more weight goes to top ranked projects. We expect most configurations will use a value between 1.7 & 1.98 however any number between 1 & 2 can be used.

The Poisson distribution is very top heavy: the winners of each category get much larger scores, and only the top 5 to 6 projects get a score above 1% across the expected configuration space (ignoring the handicap). This means that projects not in the top 5 of someone's ballot for a category get almost none of that voter's voting power.
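Again the formula is not shown in this text; a plausible sketch evaluates the Poisson probability mass function at the rank. Keeping λ between 1 and 2 (as the proposal requires) makes the scores decrease monotonically from rank 1, which matches the top-heavy behavior described:

```python
import math

def poisson_score(x, lam):
    """Assumed form: the Poisson pmf, f(x) = lam^x * e^(-lam) / x!.
    With 1 <= lam <= 2 the score is largest at rank 1 and falls quickly."""
    return (lam ** x) * math.exp(-lam) / math.factorial(x)
```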

3. Our Custom Distribution

Our custom distribution is the most egalitarian: it gives the top 8 to 23 projects a score above 1% when comparing the extremes of the expected configuration space (ignoring the handicap). In this distribution, preferences expressed lower in the rankings have more impact, but winning a category on someone's ballot has the least impact of the three distributions.

  • r: The total number of projects in the category or when ranking categories, the total number of categories. Determined by round data.
  • δ: The angle of the slope. The larger this value is set, the more weight goes to top ranked projects. We expect most configurations will use a value between 0.5 & 10 however any number between 0 & 180 can be used.
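The custom formula has not been published in this thread, so the following is purely speculative: a straight-line decline whose steepness is set by the angle δ in degrees, which at least matches the stated parameters (δ between 0 and 180, larger δ favoring top ranks) and the egalitarian shape described:

```python
import math

def custom_score(x, r, delta_deg):
    """Speculative sketch only: a linear ramp from rank 1 downward,
    with slope tan(delta). Steeper angle => more weight to top ranks.
    Not the team's actual formula."""
    slope = math.tan(math.radians(delta_deg))
    return max(0.0, 1.0 - slope * (x - 1) / r)
```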

To better adjust these scoring formulas to the task at hand we made several modifications, although we are also considering using the normal distribution without modifications.

Modifications on top of the Algos

While the normal distribution could be used without modifications, we wanted to further customize the scoring algorithms to our use case using the following modifications.

Modification #1: Add a minimum to bump up the long tail of projects.

In these distributions, for a high r, the long tail of projects was given a very small amount. Because the projects being ranked passed the first filter, we assume the user wants to give the bottom projects more votes than the distributions were providing. This simple modification allows us to do that.

  • f(x): The score for rank x given the chosen distribution method
  • m: The minimum percentage every project should get from the voting algorithm. The higher this number the more votes are allocated to the long tail of lower ranked projects. We expect most configurations will use a value between 0.5% & 1.5% however any number between 0 & 3% can be used.
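A minimal sketch of Mod 1, assuming "minimum percentage every project should get" means flooring each score at m (whether Pairwise floors or adds m is my assumption):

```python
def mod1_minimum(scores, m):
    """Mod 1 sketch: floor every rank's score at the minimum share m,
    so the long tail of ranked projects still receives votes."""
    return [max(s, m) for s in scores]
```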

Modification #2: Add a variance to give rankings more importance

In these distributions, for a low r, the difference between ranks was smaller than expected (e.g., if r = 2, the top rank gets 52% of the votes and the bottom rank gets 48%, which is not different enough), and with modification #1 the long tail was also getting almost the same amount for the last several ranks. To create a larger difference between the ranks, we created this simple modification, which brings more of a power-law shape to the distribution.

  • f’(x): The score for rank x given the chosen distribution method with Mod1
  • v: The variance. The higher this number the more votes are allocated to the top ranked projects and the bigger the difference is between top ranked projects. We expect most proposed configurations will use a value between 0.5 & 8 however any number between 0 & 1000 can be used.
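A sketch of Mod 2, assuming the "power law shape" is achieved by raising each post-Mod-1 score to the power v (the exact operator is not specified in the thread):

```python
def mod2_variance(scores, v):
    """Mod 2 sketch: raise each score to the power v, stretching the
    gap between ranks; v > 1 pushes weight toward the top ranks."""
    return [s ** v for s in scores]
```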

Modification #3: Normalize to a percentage & add a handicap for projects in larger categories

Without this modification, a project that gets first place out of 15 projects actually scores the same as, or more points than, a project that gets first place out of 30 projects, when the result should be the exact opposite. This simple modification gives a linear handicap to categories with more projects: if a category has 30 projects, each project receives a boost of h percent multiplied by what its final score would be without the handicap; with 25 projects the boost is 2h/3; with 20 projects, h/3; and with 15 projects, no handicap at all. Before the handicap is applied, the scores from mods 1 and 2 are normalized to a percentage. The ranking of categories does not need this mod.

  • f’’(x): The score for rank x given the chosen distribution method with Mod 1 & Mod 2
  • ∑ f′′(x_i): Sum of scores: all the scores for each project from the previous step, added together so that the project scores can be normalized to a percentage.
  • h: The handicap. The higher this number, the larger the bonus that goes to projects from large categories. We expect most proposed configurations will use a value between 15% & 175% however any positive number can be used.
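The four boost values given above (r = 30 gets h, 25 gets 2h/3, 20 gets h/3, 15 gets none) all fit the linear form h·(r − 15)/15, so Mod 3 can be sketched as follows (function name is mine):

```python
def mod3_normalize_handicap(scores, r, h):
    """Mod 3 sketch: normalize post-Mod-2 scores to percentages, then
    apply a linear size handicap: boost = h * (r - 15) / 15, which gives
    h at r=30, 2h/3 at r=25, h/3 at r=20, and 0 at r=15."""
    total = sum(scores)
    normalized = [s / total for s in scores]
    boost = h * (r - 15) / 15
    return [p * (1 + boost) for p in normalized]
```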

Play with our Dashboard!

I know this sounds crazy, but we want to open this up to the Optimism community: play with, and even propose, your own Pairwise voting algorithm! Our team will pick the best distribution on the 22nd of June.

We have created this dashboard for everyone to copy and play with to propose a configuration.

When playing with the dashboard, the best strategy is to focus on one distribution at a time and tune the parameters for that single scoring formula (either Normal, Modified Normal, Modified Poisson or Custom). The parameters impact each distribution in very different ways. For example, 110% handicap for our custom distribution is totally reasonable, whereas for Modified Poisson, it doesn’t make any sense at all.

Here is a video walk-thru to explain how to use the spreadsheet

We are excited to see what other data analysis enthusiasts, like @StanfordBlockchain and others come up with! Your thoughts and insights will absolutely help us choose our final distribution.

We will also post our proposed algorithms here as well. Once some proposals come through, feel free to review them here on the forum AND FORK THEM. It might be easier to iteratively improve other people’s algo than starting with our dashboard (which is a little raw).


Ok, here is a first draft of some proposed parameters using the custom distribution:

I used the playground ballot to compare the scoring algo for categories with 30 projects and categories with 15 projects and tried to iterate until it looked fair.

If you are in the top 3 projects in a small category, you are at a slight disadvantage, but if you are not in the top 3 you are slightly better off being in the smaller category.

That's what I tuned for in this one.


Thanks for creating the dashboard @griff! It was fun playing with the parameters, and those are the best ones!

Zep M.Poisson Voting Algo

I focused on 15-20 projects, as the data from applications indicate that this is the right number for our categories in Pairwise. Why did I max every variable in the algorithm? Because I wanted to “sum” up all the possibilities! :smiley:

What I like about those parameters is the clear distinction they create among the top-ranked projects. While I might be compromising slightly on the lower-ranked projects, that's not as critical. Participants often prioritize their top choices, and the lower-ranked ones are still accounted for; it's not like these parameters ignore them. This is the best way to ensure that the top preferences of stakeholders are effectively signaled.


I made a new version, and I think this one is even better, using the Mod'd Normal distribution.

It has a nice balance between rewarding the top winners and making sure the bottom votes count.


If you are in a category with 30 projects, there is an advantage to being in the top 4 over being in the top 2 of a category with only 15, but if you are in the bottom 20 you are at a disadvantage, and the in-between area is about the same.

Another version with our custom distribution formula

This one optimizes the version I made last week, and I think it looks really nice.

I focused on making it so that if a project gets first place in a category of 15 projects, it gets about the same as 2nd place in a category of 30 projects.
2nd place in a category of 15 projects, gets about the same as 4th place in a category of 30 projects.
3rd place in a category of 15 projects, gets about the same as 6th place in a category of 30 projects.

and so on…

I like this one.