[grant update] bleu's Farcaster sybil detection

bleu_builders · October 1, 2024, 5:22pm

Hello Optimism community!

The bleu team is excited to start building a sybil detection algorithm for Farcaster using social graph data. Our goal is to enhance the integrity of Farcaster and the ecosystem around it.

We’ll post bi-weekly updates here as comments. Your feedback is invaluable to us!

For transparency, you can track our progress on our GitHub repository.

We look forward to your insights as we go!

bleu_builders · October 1, 2024, 5:24pm

Farcaster Social Graph Project Update #1

Hello everyone!

We’ve officially kicked off the Farcaster Social Graph project and have begun working on our first milestone: “Requirements Definition and Data Extraction”. This milestone should be complete in the next few weeks!

Milestones Tracker

Milestone	Type	Status	ETA
Requirements Definition and Data Extraction	Critical	In progress	2 weeks
Algorithm Design and Architecture	Critical	Planned	3 weeks
Algorithm Development and Evaluation	Critical	Planned	5 weeks
Algorithm Integration and Monitoring	Critical	Planned	2 weeks

Data Extraction

We plan to run an instance of Hubble to get Farcaster links data and replicate it into a Postgres database for further analysis. This approach seems to be our best path forward for building the graph and applying Sybil detection algorithms.

Upcoming Tasks

Before our next update, we aim to complete the following tasks:

Run a private instance of the Farcaster network with Hubble
Extract Warpcast users along with their connections
Create a script to track Warpcast user activity

bleu_builders · October 15, 2024, 12:10am

Farcaster Social Graph Project Update #2

Hey folks! We’ve made significant progress on the project since our last update. We’re wrapping up our first milestone and are well into the next phases of the project.

Milestones Tracker

Milestone	Type	Status	ETA
Requirements Definition and Data Extraction	Critical	Wrapping up	2 weeks
Algorithm Design and Architecture	Critical	In progress	3 weeks
Algorithm Development and Evaluation	Critical	In progress	5 weeks
Algorithm Integration and Monitoring	Critical	Planned	2 weeks

Data Extraction

We’ve pivoted from our initial plan to run a private Hubble instance. Instead, we’re now using the Neynar API, which provides data in Parquet format. This shift allows for more efficient processing and reduces infrastructure costs. Key achievements in data extraction:

Extracted key metrics about Farcaster users (likes, casts, recasts, follower/following counts)
Built a dataset of over 11,000 labeled users (sybil/benign) using BotOrNot and manual verification
Transformed Farcaster data into an undirected social network (focusing on mutual connections between users)
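As an illustration of the last step, here is a minimal sketch of keeping only mutual connections. It assumes the follow data has already been loaded as (follower_fid, followed_fid) pairs; the Neynar Parquet schema is not described in this update, so the input shape and the function name `mutual_follow_graph` are hypothetical.

```python
from collections import defaultdict


def mutual_follow_graph(follows):
    """Build an undirected graph that keeps only reciprocal follow links.

    follows: iterable of (follower_fid, followed_fid) pairs (hypothetical input shape).
    Returns a dict mapping each fid to the set of fids it mutually follows.
    """
    directed = set()
    for src, dst in follows:
        if src != dst:  # ignore self-follows
            directed.add((src, dst))
    graph = defaultdict(set)
    for src, dst in directed:
        if (dst, src) in directed:  # keep the edge only if it is reciprocated
            graph[src].add(dst)
            graph[dst].add(src)
    return dict(graph)


links = [(1, 2), (2, 1), (1, 3), (3, 4), (4, 3), (5, 1)]
graph = mutual_follow_graph(links)
```

One-way follows (1 → 3, 5 → 1) are dropped, so accounts that only follow others without being followed back end up with no edges at all.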

SybilSCAR Implementation

After exploring several graph-based Sybil detection algorithms, including SybilRank and SybilBelief, we’ve selected SybilSCAR as our primary method. SybilSCAR offers the best balance of speed, memory usage, and accuracy for our needs.
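For readers unfamiliar with SybilSCAR, the sketch below shows our reading of its residual-form local rule from the original paper: each node starts from a prior residual (positive for known sybils, negative for known benign users, zero otherwise) and repeatedly adds a weighted sum of its neighbors’ residuals. The prior magnitude `theta`, the constant homophily residual `w_hat`, the iteration count and the clipping step are illustrative assumptions, not the parameters bleu uses; the paper’s variants also differ in how edge weights are chosen.

```python
def sybilscar(graph, sybil_seeds, benign_seeds, theta=0.4, w_hat=0.1, iterations=20):
    """Propagate sybil probabilities with a SybilSCAR-style residual local rule.

    graph: dict mapping node -> set of neighbors (undirected).
    Update per node v:  r_v <- prior_v + 2 * w_hat * sum(r_u for u in neighbors(v))
    where r = p - 0.5 is the residual of the probability of being sybil.
    """
    prior = {v: 0.0 for v in graph}
    for v in sybil_seeds:
        if v in prior:
            prior[v] = theta  # a known sybil pushes its residual up
    for v in benign_seeds:
        if v in prior:
            prior[v] = -theta  # a known benign user pushes its residual down
    residual = dict(prior)
    for _ in range(iterations):
        # Synchronous update: every node reads its neighbors' previous residuals.
        residual = {
            # Clip so that residual + 0.5 stays a valid probability.
            v: max(-0.5, min(0.5, prior[v] + 2 * w_hat * sum(residual[u] for u in nbrs)))
            for v, nbrs in graph.items()
        }
    return {v: r + 0.5 for v, r in residual.items()}


# Toy graph: a benign triangle (1, 2, 3) and a sybil triangle (4, 5, 6)
# joined by a single attack edge 3-4.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
scores = sybilscar(toy, sybil_seeds={5}, benign_seeds={1})
```

With `w_hat = 0.1`, the update is guaranteed to settle when every node has fewer than five neighbors (2 * w_hat * degree < 1); on real graphs with hub accounts, `w_hat` would need to shrink or be normalized by degree.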

We’ve begun applying it to our transformed Farcaster social network data and are developing a Sybil detection API. This API will manage and update datasets while providing an endpoint to check whether a Farcaster user is classified as a Sybil or a benign user.

Upcoming Tasks

Document first milestone
Extract users’ Gitcoin Passports
Conduct thorough testing of the API results
Deploy the Sybil detection API to production
Enhance the training dataset to improve overall accuracy

bleu_builders · October 31, 2024, 6:24pm

Farcaster Social Graph Project Update #3

Hey folks! We’ve been refining the graph-based algorithm, testing it, and improving our training dataset. Although we accomplished less than we wanted over the last two weeks, we’ll pick up speed now that another teammate is joining the project!

Milestones Tracker

Milestone	Type	Status	ETA
Requirements Definition and Data Extraction	Critical	Wrapping up	2 weeks
Algorithm Design and Architecture	Critical	In progress	3 weeks
Algorithm Development and Evaluation	Critical	In progress	5 weeks
Algorithm Integration and Monitoring	Critical	Planned	2 weeks

Key Progress

Gitcoin Stamps Extraction

We extracted Gitcoin stamps for noisy users detected by the graph-based algorithm, enriching our training and test datasets. These stamps provide an additional layer of verification for Warpcast users who have registered at least one stamp on Gitcoin.

Improved Graph-based Model and Model Testing

Since the API must handle a large amount of data, we need an algorithm that can run asynchronously while still providing fast benign/sybil checks. We’ve built on top of the graph-based SybilSCAR algorithm to reduce its memory footprint on the full dataset, so that we can return a sybil response for every Farcaster user in that dataset with subsecond response times.
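The update does not say how the memory footprint was reduced. One common pattern, shown here only as a hypothetical sketch, is to compute scores offline and serve them from compact typed arrays, so each request is a binary search rather than a graph computation. The class name `SybilScoreStore` is invented for illustration.

```python
from array import array
from bisect import bisect_left


class SybilScoreStore:
    """Precomputed sybil probabilities held in two compact typed arrays (hypothetical design).

    Scores are assumed to be computed offline in a batch job and then served
    from memory, so each lookup is a binary search instead of a graph run.
    """

    def __init__(self, scores):
        # scores: dict mapping fid -> probability of being sybil.
        fids = sorted(scores)
        self._fids = array("q", fids)  # 8-byte signed ints, sorted for binary search
        self._probs = array("f", (scores[f] for f in fids))  # 4-byte floats

    def lookup(self, fid):
        """Return the stored probability for fid, or None if the fid is unknown."""
        i = bisect_left(self._fids, fid)
        if i < len(self._fids) and self._fids[i] == fid:
            return self._probs[i]
        return None


store = SybilScoreStore({1: 0.1, 3: 0.9, 2: 0.4})
```

Storing probabilities as 4-byte floats trades a little precision (0.9 comes back as roughly 0.8999999762) for half the memory of Python’s 8-byte floats, which is usually irrelevant for a sybil/benign threshold.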

Upcoming Tasks

Warpcast Bot Reporting Frame
First milestone documentation
Start developing a machine learning-based detection model

bleu_builders · November 19, 2024, 1:04pm

Farcaster Social Graph Project Update #4

Hello everyone! We’ve developed a frame for reporting sybil accounts on Warpcast, started documenting our key decisions, and made important progress on inspecting and improving the dataset.

Milestone tracker

Milestone	Type	Status	ETA
Requirements Definition and Data Extraction	Critical	Wrapping up	2 weeks
Algorithm Design and Architecture	Critical	Wrapping up	1 week
Algorithm Development and Evaluation	Critical	In progress	5 weeks
Algorithm Integration and Monitoring	Critical	Planned	2 weeks

Key progress

Farcaster Frame

We launched Report Sybil, a Farcaster frame that enables users to identify and report sybils. Through community participation, we aim to use these reports to improve our dataset’s quality in two key ways: first, by expanding detection beyond automated bots to include manually operated Sybil accounts, and second, by leveraging human judgment, which typically identifies these cases more accurately. There is also an Add Report Sybil frame for installing this cast action on Warpcast.
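The rule for turning reports into labels is not specified in this update. A minimal sketch, assuming each report is a (reporter_fid, target_fid, verdict) triple, might count one vote per reporter and only label accounts with enough distinct reporters and a clear majority; the function name and the thresholds `min_reporters` and `min_agreement` are hypothetical.

```python
from collections import defaultdict


def labels_from_reports(reports, min_reporters=3, min_agreement=0.8):
    """Turn community reports into training labels (hypothetical aggregation rule).

    reports: iterable of (reporter_fid, target_fid, verdict), verdict in {"sybil", "human"}.
    A target is labeled only if enough distinct reporters weighed in and a
    clear majority agrees; everything else stays unlabeled.
    """
    latest = {}
    for reporter, target, verdict in reports:
        if reporter == target:
            continue  # ignore self-reports
        latest[(reporter, target)] = verdict  # keep each reporter's last verdict per target
    votes = defaultdict(lambda: {"sybil": 0, "human": 0})
    for (_, target), verdict in latest.items():
        votes[target][verdict] += 1
    labels = {}
    for target, counts in votes.items():
        total = counts["sybil"] + counts["human"]
        if total < min_reporters:
            continue
        for verdict in ("sybil", "human"):
            if counts[verdict] / total >= min_agreement:
                labels[target] = verdict
    return labels


example_reports = [
    (1, 10, "human"), (1, 10, "sybil"), (2, 10, "sybil"), (3, 10, "sybil"),
    (1, 11, "sybil"), (2, 11, "human"), (3, 11, "human"),
    (5, 12, "human"), (12, 12, "sybil"),
]
labels = labels_from_reports(example_reports)
```

Counting each reporter once per target limits how much a single account can skew a label by reporting repeatedly.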

Documentation of the first milestone

We started documenting the progress and decisions made so far in the Project Documentation. The idea is to record the main technical decisions made during development and to provide a technical overview.

Label reliability

Our primary challenge at the moment lies in obtaining reliable data labels to use in the Machine Learning model development. A manual inspection revealed that approximately 75% of samples labeled as ‘human’ in the Bot or Not dataset were actually bots. Given this high rate of mislabeling, the dataset proves unreliable for human samples. While we attempted to identify human users by filtering Warpcast users with valid ENS names, this approach also yielded unsatisfactory results as the filtered dataset still contained bots. We need to explore alternative methods for collecting genuine human samples.
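Because mislabeling rates like this come from manually inspecting a sample, it can help to attach an uncertainty estimate. The sketch below computes a Wilson score interval for a proportion; the 75-out-of-100 figures in the example are hypothetical, since the update does not state how many samples were inspected.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical example: 75 of 100 inspected "human" labels turned out to be bots.
low, high = wilson_interval(75, 100)
```

The interval narrows as more samples are inspected, which gives a rough sense of how many accounts need manual review before a mislabeling estimate is trustworthy.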

For now, we aim to obtain higher-quality human labels from reporters using the new cast action, as well as from other types of verification, such as Gitcoin stamps.

Challenges

Report Sybil community adoption

While the Report Sybil cast action is ready for use, its effectiveness relies on community participation and organic sharing within the Warpcast ecosystem. Since the bleu team’s reach alone is limited in this environment, we’re seeking community support to expand our tool’s visibility and usage. Our first step is to contact Warpcast community members who are committed to addressing the Sybil challenge and encourage them to share our reporting tool; we’ll also share it in Discord groups.

Upcoming tasks

Improve reliability of human-labeled samples
Machine Learning model development

bleu_builders · December 19, 2024, 8:52pm

Farcaster Social Graph Project Update #5

Hi folks! Excited to share that we’ve finished our initial milestones and are now actively working on development and integration. Our ensemble ML model is showing promising results, with a 0.98 AUC score on the test set, and we’ve created an initial API endpoint. The next steps are to improve the model and integrate the API with the previously developed frame.

Milestones Tracker

Milestone	Type	Status	ETA
Requirements Definition and Data Extraction	Critical	Finished	Finished
Algorithm Design and Architecture	Critical	Finished	Finished
Algorithm Development and Evaluation	Critical	In progress	2 weeks
Algorithm Integration and Monitoring	Critical	In progress	2 weeks

Key progress

List of human users

We attempted several strategies to compile a high-confidence list of human accounts, including:

Filtering for accounts with validated ENS names
Selecting users with high Gitcoin scores (above 40)
Identifying accounts with validated ownership of an X account

However, even after applying these conditions, the resulting lists still consisted predominantly of bots (over 80%). Further investigation of a random sample of users with casts revealed an even higher proportion of bots (greater than 90%). Given these challenges, we determined that the most reliable approach was to manually inspect filtered samples, particularly focusing on examining the accounts they followed. This manual verification process resulted in a validated list of 416 human accounts, which were incorporated as labels together with the ~4k bot labels from Bot or Not (with 90% accuracy after inspection).
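A sketch of the filtering step described above, assuming per-user flags for a verified ENS name and a verified X account plus a numeric Gitcoin score (the `UserProfile` fields and strategy names are hypothetical). Each strategy produces its own candidate list, and a reproducible random sample from each list can then go to manual review.

```python
import random
from dataclasses import dataclass


@dataclass
class UserProfile:
    fid: int
    has_verified_ens: bool  # hypothetical field names
    gitcoin_score: float
    has_verified_x: bool


# One filter per strategy; each yields a separate candidate list.
STRATEGIES = {
    "verified_ens": lambda u: u.has_verified_ens,
    "gitcoin_score_above_40": lambda u: u.gitcoin_score > 40,
    "verified_x": lambda u: u.has_verified_x,
}


def candidate_lists(users):
    """Map each strategy name to the fids of users that pass it."""
    return {name: [u.fid for u in users if rule(u)] for name, rule in STRATEGIES.items()}


def review_sample(fids, k, seed=0):
    """Draw a reproducible random sample of fids for manual inspection."""
    rng = random.Random(seed)
    return rng.sample(fids, min(k, len(fids)))


users = [
    UserProfile(1, True, 10.0, False),
    UserProfile(2, False, 55.0, False),
    UserProfile(3, False, 0.0, True),
    UserProfile(4, True, 45.0, True),
]
lists = candidate_lists(users)
```

Fixing the random seed keeps the review sample stable, so different reviewers inspect the same accounts and their verdicts can be compared.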

Development of initial machine learning model

Our experimental ensemble model combines XGBoost, Random Forest, and LightGBM classifiers, leveraging over 70 features extracted from the Neynar dataset. The ensemble achieved a strong AUC score of 0.98 on the test set. However, it’s important to note that our training data represents only approximately 0.5% of total Farcaster users, so these results should be interpreted with caution until we can validate the model’s performance across a broader user base.
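The update does not state how the three classifiers’ outputs are combined. Averaging their predicted probabilities (soft voting) is one common choice, shown below as an assumption alongside a from-scratch AUC computation: the AUC equals the probability that a randomly chosen sybil (label 1) is scored higher than a randomly chosen human (label 0), with ties counting half. For real workloads, scikit-learn’s `roc_auc_score` computes the same quantity far more efficiently than this quadratic loop.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-model sybil probabilities (soft voting), optionally weighted."""
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def roc_auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy outputs from two hypothetical models on four users (1 = sybil, 0 = human).
labels = [1, 1, 0, 0]
model_a = [0.9, 0.7, 0.4, 0.2]
model_b = [0.8, 0.5, 0.3, 0.1]
ensemble = soft_vote([model_a, model_b])
```

Because the AUC only depends on how scores are ranked, a high AUC on a small labeled set says little about calibration or about users unlike those in the training data, which is why the caution above about the 0.5% coverage matters.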

The current feature set spans the five main categories shown below. The full list of features in each category is available in the project’s repo.

User identity metrics, which capture account-specific characteristics and verification status
Network analysis metrics that examine user interactions and connection patterns
Temporal behavior metrics, which track posting frequency and activity patterns over time
Content engagement metrics that analyze statistics about replies, reactions and participation on the social network in general
Reputation meta metrics, which capture verifications and score values
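To make the temporal behavior category concrete, here is a sketch of a few features computed from one user’s cast timestamps. The feature names are hypothetical and are not taken from the project’s repo; the coefficient of variation of inter-cast gaps is included because very regular posting intervals can hint at scheduled, automated activity.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, pstdev


def temporal_features(cast_times):
    """Compute illustrative temporal-behavior features from one user's cast timestamps.

    cast_times: list of timezone-aware datetimes (hypothetical input shape).
    """
    if len(cast_times) < 2:
        return {"casts_per_day": None, "mean_gap_hours": None, "gap_cv": None}
    times = sorted(cast_times)
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    # Use at least one day as the span so bursts do not produce huge rates.
    span_days = max((times[-1] - times[0]).total_seconds() / 86400, 1.0)
    avg_gap = mean(gaps)
    return {
        "casts_per_day": len(times) / span_days,
        "mean_gap_hours": avg_gap,
        # Coefficient of variation of gaps: values near 0 mean metronome-like posting.
        "gap_cv": pstdev(gaps) / avg_gap if avg_gap > 0 else 0.0,
    }


# Example: five casts exactly six hours apart.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
example = temporal_features([t0 + timedelta(hours=6 * i) for i in range(5)])
```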

In the coming weeks, we plan to expand this research by evaluating additional models, refining our feature selection to identify the most impactful variables, and incorporating new features focused on content engagement metrics.

Integration of ML model and SybilSCAR

We developed a Python library for detecting sybil accounts using machine learning, transforming our initial notebook experiments into production code. This library has been successfully integrated with our existing social graph API, completing the design and architecture phase of the project.

Deployment of basic API

We have deployed an initial version of the API to enable early testing and feedback. It consists of an endpoint that takes a user’s fid and returns that user’s sybil probability.
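A minimal sketch of such an endpoint using only Python’s standard library. The `/sybil-probability?fid=<fid>` route, the JSON field names and the in-memory `SCORES` dict are hypothetical placeholders, not the deployed API’s actual interface; the request logic lives in a plain `handle` function so it can be tested without starting a server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit

# Placeholder scores; a real service would load precomputed model output.
SCORES = {3: 0.92, 7: 0.04}


def handle(path, scores):
    """Return (status_code, payload) for GET /sybil-probability?fid=<fid>."""
    url = urlsplit(path)
    if url.path != "/sybil-probability":
        return 404, {"error": "not found"}
    values = parse_qs(url.query).get("fid")
    if not values or not values[0].isdecimal():
        return 400, {"error": "fid must be a non-negative integer"}
    fid = int(values[0])
    if fid not in scores:
        return 404, {"error": f"no score for fid {fid}"}
    return 200, {"fid": fid, "sybil_probability": scores[fid]}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = handle(self.path, SCORES)
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it locally, run `HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()` and request `http://127.0.0.1:8000/sybil-probability?fid=3`.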

Upcoming tasks

Provide a public endpoint for the API
Integration of the API with the developed Farcaster frame
Improvements to the current machine learning model:
- Feature selection to decrease computational costs
- Engineering of new features
- Experiments to verify that the current results generalize to all users

bleu_builders · January 7, 2025, 7:59pm

Farcaster Social Graph Project Update #6

Hi everyone! We’re happy to announce that the project is nearing completion. In the past weeks, we focused on upgrading the initial frame to Frames v2, which delivers an enhanced user experience. This improvement is particularly important because the frame is meant to be one of the main interaction points with our social graph algorithm for sybil detection.

Milestones Tracker

Milestone	Type	Status	ETA
Requirements Definition and Data Extraction	Critical	Finished	Finished
Algorithm Design and Architecture	Critical	Finished	Finished
Algorithm Development and Evaluation	Critical	Wrapping up	1 week
Algorithm Integration and Monitoring	Critical	Wrapping up	1 week

Key progress

Frame upgrade

We successfully migrated to Frames v2 in Farcaster, enhancing user interaction capabilities. The upgraded frame includes the following improvements:

Integration of EAS (Ethereum Attestation Service) attestations to validate user feedback: users now create an attestation when they report someone as sybil or human
Implementation of reCAPTCHA to replace the previous text-based verification system
Enhanced user profile display showing fid, fname, profile picture, and sybil probability
Architecture prepared for database integration and social graph algorithm API connectivity
The overall user experience is more fluid and significantly better for mobile users

The frame can be accessed by clicking here, and after entering the fname to report, it will appear as shown in the image below.

Note that sybil probability values are not currently displayed since the social graph API hasn’t been deployed yet. We’re making final improvements to the API before its public launch, which will be coming soon. Also, we have some other UX improvements in mind.

Upcoming tasks

Launch sybil probability public API endpoint
Develop comprehensive analytics dashboard
Feature selection to reduce machine learning model computational overhead
API and algorithm final documentation
