[grant update] bleu's Farcaster sybil detection

Hello Optimism community!

The bleu team is excited to start building a sybil detection algorithm for Farcaster using social graph data. Our goal is to enhance the integrity of Farcaster and the ecosystem around it.

We’ll post bi-weekly updates here as comments. Your feedback is invaluable to us!

For transparency, you can track our progress on our GitHub Repository.

We look forward to your insights as we go!


Farcaster Social Graph Project Update #1

Hello everyone!

We’ve officially kicked off the Farcaster Social Graph project and have begun working on our first milestone: “Requirements Definition and Data Extraction”. This milestone should be complete in the next few weeks!

Milestones Tracker

| Milestone | Type | Status | ETA |
|---|---|---|---|
| :bar_chart: Requirements Definition and Data Extraction | Critical | In progress | 2 weeks |
| :abacus: Algorithm Design and Architecture | Critical | Planned | 3 weeks |
| :robot: Algorithm Development and Evaluation | Critical | Planned | 5 weeks |
| :rocket: Algorithm Integration and Monitoring | Critical | Planned | 2 weeks |

Data Extraction

We plan to run an instance of Hubble to get Farcaster links data and work on replicating it into a Postgres database for further analysis. This approach seems to be our best path forward to get the graph and apply the Sybil Detection algorithms.

Upcoming Tasks

Before our next update, we aim to complete the following tasks:

  • Run a private instance of the Farcaster network with Hubble
  • Extract Warpcast users along with their connections
  • Create a script to track Warpcast user activity

Farcaster Social Graph Project Update #2

Hey folks! We’ve made significant progress on the project since our last update. We’ve completed our first milestone and are well into the next phases of the project.

Milestones Tracker

| Milestone | Type | Status | ETA |
|---|---|---|---|
| :bar_chart: Requirements Definition and Data Extraction | Critical | Wrapping up | 2 weeks |
| :abacus: Algorithm Design and Architecture | Critical | In progress | 3 weeks |
| :robot: Algorithm Development and Evaluation | Critical | In progress | 5 weeks |
| :rocket: Algorithm Integration and Monitoring | Critical | Planned | 2 weeks |

Data Extraction

We’ve pivoted from our initial plan to run a private Hubble instance. Instead, we’re now using the Neynar API, which provides data in Parquet format. This shift allows for more efficient processing and reduces infrastructure costs. Key achievements in data extraction:

  1. Extracted key metrics about Farcaster users (likes, casts, recasts, follower/following counts)
  2. Built a dataset of over 11,000 labeled users (sybil/benign) using BotOrNot and manual verification
  3. Transformed Farcaster data into an undirected social network (focusing on mutual connections between users)
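To make the third point concrete, here is a minimal sketch of reducing directed follow links to an undirected graph of mutual connections. The function name `build_mutual_graph` and the `(follower_fid, followed_fid)` pair format are assumptions for illustration, not the project's actual schema or code.

```python
from collections import defaultdict


def build_mutual_graph(follows):
    """Return an undirected adjacency map that keeps only reciprocated follows.

    `follows` is an iterable of (follower_fid, followed_fid) pairs
    (a hypothetical input format).
    """
    directed = set(follows)
    graph = defaultdict(set)
    for follower, followed in directed:
        # Keep the edge only when both users follow each other; skip self-follows.
        if follower != followed and (followed, follower) in directed:
            graph[follower].add(followed)
            graph[followed].add(follower)
    return dict(graph)


# Toy input: 1<->2 and 3<->4 are mutual, 2->3 is one-way, 5->5 is a self-follow.
follows = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3), (5, 5)]
mutual = build_mutual_graph(follows)
```

One-way follows are dropped because a follow that is not reciprocated is cheap for a sybil to create, while a mutual connection needs the other account to cooperate.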

SybilSCAR Implementation

After exploring several graph-based Sybil detection algorithms, including SybilRank and SybilBelief, we’ve selected SybilSCAR as our primary method. SybilSCAR offers the best balance of speed, memory allocation, and accuracy for our needs.

We’ve begun applying it to our transformed Farcaster social network data and are developing a Sybil detection API. This API will manage and update datasets while providing an endpoint to check whether a Farcaster user is classified as a Sybil or a benign user.
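For readers unfamiliar with this family of algorithms, the sketch below shows the general shape of a SybilSCAR-style propagation: each node starts from a residual prior (positive for labeled sybils, negative for labeled benign users, zero otherwise) and repeatedly adds a weighted sum of its neighbours' scores. This is a simplified illustration, not our production implementation. The function name, the constant edge weight, and the default parameters are assumptions chosen so the iteration stays stable on a toy graph.

```python
def sybilscar_sketch(graph, sybil_seeds, benign_seeds, theta=0.4, iterations=10):
    """Simplified SybilSCAR-style score propagation (illustrative only).

    graph: dict mapping each node to the set of its neighbours (undirected,
           every node present as a key).
    Returns a dict node -> residual score; > 0 leans sybil, < 0 leans benign.
    """
    nodes = list(graph)
    max_degree = max((len(graph[u]) for u in nodes), default=0)
    # Assumed constant factor, small enough that the update contracts:
    # each node's neighbour sum is scaled by less than 1 / degree.
    factor = 1.0 / (max_degree + 1)

    # Residual priors from the labeled seeds.
    prior = {u: 0.0 for u in nodes}
    for u in sybil_seeds:
        prior[u] = theta
    for u in benign_seeds:
        prior[u] = -theta

    score = dict(prior)
    for _ in range(iterations):
        # New score = own prior + scaled sum of the neighbours' current scores.
        score = {
            u: prior[u] + factor * sum(score[v] for v in graph[u])
            for u in nodes
        }
    return score


# Toy graph: two triangles joined by a single bridge edge (3-4).
graph = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
scores = sybilscar_sketch(graph, sybil_seeds={6}, benign_seeds={1})
```

In this toy example the unlabeled nodes inherit the sign of the seed in their own cluster, which is the homophily assumption these algorithms rely on.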

Upcoming Tasks

  • Document first milestone
  • Extract users’ Gitcoin Passports
  • Conduct thorough testing of the API results
  • Deploy the Sybil detection API to production
  • Enhance the training dataset to improve overall accuracy

Farcaster Social Graph Project Update #3

Hey folks! We’ve been refining the graph-based algorithm, testing, and improving our training dataset. While we’ve been able to accomplish less than we wanted in the last two weeks, we’ll pick up some speed as another teammate is jumping on the project!

Milestones Tracker

| Milestone | Type | Status | ETA |
|---|---|---|---|
| :bar_chart: Requirements Definition and Data Extraction | Critical | Wrapping up | 2 weeks |
| :abacus: Algorithm Design and Architecture | Critical | In progress | 3 weeks |
| :robot: Algorithm Development and Evaluation | Critical | In progress | 5 weeks |
| :rocket: Algorithm Integration and Monitoring | Critical | Planned | 2 weeks |

Key Progress

Gitcoin Stamps Extraction

We extracted Gitcoin stamps for the noisy users flagged by the graph-based algorithm, enriching our training and test datasets. These stamps provide an additional layer of verification for Warpcast users who have registered at least one stamp on Gitcoin.

Improved Graph-based Model and Model Testing

Since the API must handle a large amount of data, the algorithm needs to run asynchronously and provide fast benign/sybil checks. We’ve built on top of the graph-based SybilSCAR algorithm to reduce its memory footprint on the full dataset, so that we can return a sybil response for each Farcaster user in that dataset with sub-second response times.

Upcoming Tasks

  • Warpcast Bot Reporting Frame
  • First milestone documentation
  • Start developing a machine learning-based detection model

Farcaster Social Graph Project Update #4

Hello everyone! We’ve developed a frame to report sybil accounts in Warpcast, started documenting our key decisions, and made important progress on inspecting and improving the dataset.

Milestone tracker

| Milestone | Type | Status | ETA |
|---|---|---|---|
| :bar_chart: Requirements Definition and Data Extraction | Critical | Wrapping up | 2 weeks |
| :abacus: Algorithm Design and Architecture | Critical | Wrapping up | 1 week |
| :robot: Algorithm Development and Evaluation | Critical | In progress | 5 weeks |
| :rocket: Algorithm Integration and Monitoring | Critical | Planned | 2 weeks |

Key progress

Farcaster Frame

We launched Report Sybil, a Farcaster frame that enables users to identify and report sybils. Through community participation, we aim to use these reports to enhance our dataset’s quality in two key ways: first, by expanding detection beyond automated bots to include manually operated Sybil accounts, and second, by leveraging human judgment, which typically provides more accurate identification in these cases. There is also an Add Report Sybil frame that adds this cast action on Warpcast.

Documentation of the first milestone

We started documenting the progress and decisions made so far in the Project Documentation. The idea is to record the main technical decisions made during development and to provide a technical overview.

Label reliability

Our primary challenge at the moment lies in obtaining reliable data labels to use in the Machine Learning model development. A manual inspection revealed that approximately 75% of samples labeled as ‘human’ in the Bot or Not dataset were actually bots. Given this high rate of mislabeling, the dataset proves unreliable for human samples. While we attempted to identify human users by filtering Warpcast users with valid ENS names, this approach also yielded unsatisfactory results as the filtered dataset still contained bots. We need to explore alternative methods for collecting genuine human samples.

Currently, we aim to obtain higher-quality human labels from the reporters using the new cast action, as well as from other types of verification, such as Gitcoin stamps.

Challenges

Report Sybil community adoption

While the Report Sybil cast action is ready for use, its effectiveness relies on community participation and organic sharing within the Warpcast ecosystem. As the bleu team’s reach alone is limited within this environment, we’re seeking community support to help expand our tool’s visibility and usage. Our first step is to contact Warpcast community members who are committed to addressing the Sybil challenge and encourage them to share our reporting tool, in addition to sharing it in Discord groups.

Upcoming tasks

  • Improve reliability of human-labeled samples
  • Machine Learning model development

Farcaster Social Graph Project Update #5

Hi folks! Excited to share that we’ve finished all our initial milestones and we’re now actively working on development and integration. Our ensemble ML model is showing promising results with a 0.98 AUC score on the test set, and we’ve created an initial API endpoint. The next steps are to improve the model and integrate the API with the previously developed frame.

Milestones Tracker

| Milestone | Type | Status | ETA |
|---|---|---|---|
| :bar_chart: Requirements Definition and Data Extraction | Critical | Finished | Finished |
| :abacus: Algorithm Design and Architecture | Critical | Finished | Finished |
| :robot: Algorithm Development and Evaluation | Critical | In progress | 2 weeks |
| :rocket: Algorithm Integration and Monitoring | Critical | In progress | 2 weeks |

Key progress

List of human users

We attempted several strategies to compile a high-confidence list of human accounts, including:

  • Filtering for accounts with validated ENS names
  • Selecting users with high Gitcoin scores (above 40)
  • Identifying accounts with validated ownership of an X account

However, even after applying these conditions, the resulting lists still consisted predominantly of bots (over 80%). Further investigation of a random sample of users with casts revealed an even higher proportion of bots (greater than 90%). Given these challenges, we determined that the most reliable approach was to manually inspect filtered samples, particularly focusing on examining the accounts they followed. This manual verification process resulted in a validated list of 416 human accounts, which were incorporated as labels together with the ~4k bot labels from Bot or Not (with 90% accuracy after inspection).

Development of initial machine learning model

Our experimental ensemble model combines XGBoost, Random Forest, and LightGBM classifiers, leveraging over 70 features extracted from the Neynar dataset. The ensemble achieved a strong AUC score of 0.98 on the test set. However, it’s important to note that our training data represents only approximately 0.5% of total Farcaster users, so these results should be interpreted with caution until we can validate the model’s performance across a broader user base.
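For context on the metric, the sketch below shows one common way to combine several classifiers (averaging their predicted probabilities, often called soft voting) and how an AUC score is computed from labels and scores. Whether the ensemble averages probabilities in exactly this way is an assumption. The function names are hypothetical, and the numbers are toy values for illustration, not project results.

```python
def average_probabilities(*model_outputs):
    """Soft voting: average the per-sample sybil probabilities of each model."""
    return [sum(probs) / len(probs) for probs in zip(*model_outputs)]


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. Labels are 1 (sybil) or 0 (benign).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and made-up probabilities from three hypothetical models.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
model_b = [0.7, 0.6, 0.6, 0.3, 0.4, 0.1]
model_c = [0.8, 0.7, 0.5, 0.4, 0.3, 0.1]

ensemble = average_probabilities(model_a, model_b, model_c)
auc = roc_auc(labels, ensemble)
```

An AUC close to 1 means the model ranks almost every sybil above almost every benign user, but it says nothing about how representative the labeled sample is, which is why the caveat about the 0.5% coverage matters.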

The current feature set encompasses the five main categories shown below. The full list of features in each category is available in the project’s repo.

  • User identity metrics, which capture account-specific characteristics and verification status
  • Network analysis metrics that examine user interactions and connection patterns
  • Temporal behavior metrics, which track posting frequency and activity patterns over time
  • Content engagement metrics that analyze statistics about replies, reactions and participation on the social network in general
  • Reputation meta metrics, which capture verifications and score values
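As a rough illustration of what such features can look like, here is a minimal sketch that derives one feature for each of the first four categories from a single user record. The field names (`registered_at`, `follower_count`, and so on) and the feature names are hypothetical and do not reflect the actual Neynar schema or the project's feature list.

```python
from datetime import datetime, timezone


def extract_features(user, now):
    """Derive a few illustrative features from one (hypothetical) user record."""
    # Clamp to at least one day so brand-new accounts do not divide by zero.
    account_age_days = max((now - user["registered_at"]).days, 1)
    casts = user["cast_count"]
    return {
        # User identity: does the account have any verified address?
        "has_verified_address": int(bool(user["verified_addresses"])),
        # Network analysis: followers relative to accounts followed.
        "follower_following_ratio": user["follower_count"] / max(user["following_count"], 1),
        # Temporal behavior: average posting rate over the account's lifetime.
        "casts_per_day": casts / account_age_days,
        # Content engagement: share of casts that are replies.
        "reply_share": user["reply_count"] / max(casts, 1),
    }


user = {
    "registered_at": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "verified_addresses": ["0xabc"],
    "follower_count": 50,
    "following_count": 100,
    "cast_count": 20,
    "reply_count": 5,
}
features = extract_features(user, now=datetime(2024, 1, 11, tzinfo=timezone.utc))
```

Reputation features are left out of the sketch because they depend on external scores rather than on the user record itself.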

In the coming weeks, we plan to expand this research by evaluating additional models, refining our feature selection to identify the most impactful variables, and incorporating new features focused on content engagement metrics.

Integration of ML model and SybilSCAR

We developed a Python library for detecting sybil accounts using machine learning, transforming our initial notebook experiments into production code. This library has been successfully integrated with our existing social graph API, completing the design and architecture phase of the project.

Deployment of basic API

We have deployed an initial version of the API to enable early testing and feedback. It consists of an endpoint that takes an fid and returns the user’s sybil probability.
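To give an idea of the request and response shape, here is a minimal sketch of a handler for such an endpoint. The route `/sybil-probability/<fid>`, the JSON field names, and the in-memory `SCORES` table are all assumptions for illustration. The deployed API may differ in every one of these details.

```python
import json

# Hypothetical precomputed scores keyed by fid; the values are placeholders.
SCORES = {1: 0.02, 2: 0.97}


def handle_request(path, scores=SCORES):
    """Return (status_code, json_body) for a path like /sybil-probability/<fid>."""
    prefix = "/sybil-probability/"
    if not path.startswith(prefix):
        return 404, json.dumps({"error": "unknown route"})
    raw_fid = path[len(prefix):]
    # fids are integers, so reject anything that is not plain ASCII digits.
    if not (raw_fid.isascii() and raw_fid.isdigit()):
        return 400, json.dumps({"error": "fid must be a non-negative integer"})
    fid = int(raw_fid)
    if fid not in scores:
        return 404, json.dumps({"error": "fid not found"})
    return 200, json.dumps({"fid": fid, "sybil_probability": scores[fid]})


status, body = handle_request("/sybil-probability/2")
```

Serving precomputed scores from a lookup table keeps each request cheap, since the expensive model inference happens ahead of time.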

Upcoming tasks

  • Provide a public endpoint for the API
  • Integration of the API with the developed Farcaster frame
  • Improvements on the current machine learning model:
    • Feature selection to decrease computational costs
    • Engineering of new features
    • Experiments to guarantee the current results are applicable to all users

Farcaster Social Graph Project Update #6

Hi everyone! We’re happy to announce that the project is nearing completion. In the past weeks, we focused on upgrading the initial frame to Frames-v2, which delivers an enhanced user experience. This improvement is particularly important because the frame should be one of the main interaction points with our social graph algorithm for sybil detection.

Milestones Tracker

| Milestone | Type | Status | ETA |
|---|---|---|---|
| :bar_chart: Requirements Definition and Data Extraction | Critical | Finished | Finished |
| :abacus: Algorithm Design and Architecture | Critical | Finished | Finished |
| :robot: Algorithm Development and Evaluation | Critical | Wrapping up | 1 week |
| :rocket: Algorithm Integration and Monitoring | Critical | Wrapping up | 1 week |

Key progress

Frame upgrade

We successfully migrated to Frames-v2 in Farcaster, enhancing user interaction capabilities. The upgraded frame system includes the following improvements:

  • Integration of EAS attestations for user feedback validation: users now create an attestation when they report someone as sybil or human
  • Implementation of reCAPTCHA to replace the previous text-based verification system
  • Enhanced user profile display showing fid, fname, profile picture, and sybil probability
  • Architecture prepared for database integration and social graph algorithm API connectivity
  • The overall user experience is more fluid and significantly better for mobile users

The frame can be accessed by clicking here, and after entering the fname to report, it will appear as shown in the image below.

Note that sybil probability values are not currently displayed since the social graph API hasn’t been deployed yet. We’re making final improvements to the API before its public launch, which will be coming soon. Also, we have some other UX improvements in mind.

Upcoming tasks

  • Launch sybil probability public API endpoint
  • Develop comprehensive analytics dashboard
  • Feature selection to reduce machine learning model computational overhead
  • API and algorithm final documentation