x-algorithm-enhancements

Research enhancements to xAI's open-sourced recommendation algorithm. This fork adds two features on top of the vendored Phoenix/Grok system:

Enhancements

KV-Cache Optimization

Inference optimization for the Phoenix transformer, targeting latency and memory:

10.3x JIT speedup (103.8 ms → 10.0 ms per forward pass)
9.6x KV-cache speedup with full key-value tensor caching
58% memory reduction via INT8 quantization (~90% top-3 score agreement)
See enhancements/optimization/ and docs/f1/
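The caching idea can be illustrated with a toy example. This is a minimal sketch in plain Python, not the code in enhancements/optimization/: it assumes single-head attention, identity projections, and that only the user-context keys and values are cached. `ContextKVCache`, `attend`, and `projection_calls` are hypothetical names introduced for illustration.

```python
import math

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * value[d] for w, value in zip(weights, values)) / total
        for d in range(len(values[0]))
    ]

class ContextKVCache:
    """Holds key/value projections of the user context so they are computed once."""

    def __init__(self, key_fn, value_fn):
        self.key_fn = key_fn
        self.value_fn = value_fn
        self.keys = []
        self.values = []
        self.projection_calls = 0  # counts context projections only

    def fill(self, context_tokens):
        for token in context_tokens:
            self.keys.append(self.key_fn(token))
            self.values.append(self.value_fn(token))
            self.projection_calls += 1

    def attend_candidate(self, candidate):
        # The candidate sees the cached context plus itself, never other candidates.
        keys = self.keys + [self.key_fn(candidate)]
        values = self.values + [self.value_fn(candidate)]
        return attend(candidate, keys, values)

def identity(token):
    # Stand-in for a learned projection (assumption for this sketch).
    return list(token)

cache = ContextKVCache(identity, identity)
cache.fill([[1.0, 0.0], [0.0, 1.0]])
outputs = [cache.attend_candidate(c) for c in ([1.0, 1.0], [2.0, 0.0])]
```

Scoring two candidates projects the two context tokens once rather than once per candidate, which is where a cache of this kind saves work.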

Multi-Stakeholder Reward Modeling

Bradley-Terry preference learning for multi-stakeholder recommendation. Preprint: When More Data Hurts: A Directional Goodhart Condition for Multi-Stakeholder Preference Learning.

Directional Goodhart condition: The sign of the cosine between BT-trained stakeholder weight vectors determines whether additional training data helps or harms a hidden stakeholder. Validated at 100% accuracy (32/32, |cos|>0.2) under softmax-weighted evaluation across 4 datasets.
Selection mechanism matters: Under hard top-K selection, the condition can reverse for stakeholders with high utility variance near the selection boundary (3 of 32 pairs on MIND). The reversal vanishes at softmax temperature T >= 1.
Labels, not loss: Stakeholder differentiation comes from training labels, not the loss function. 3 BT loss variants converge to near-identical weights across 16 stakeholder configurations on 4 datasets.
Data budget: A median of 34 preference pairs achieves 50% recovery of hidden stakeholder harm (range: 14 to >500 pairs depending on stakeholder geometry).
Audit toolkit: For X's 18-action engagement space, the Goodhart risk reduces to one observable — whether the platform treats negative signals (blocks, reports) as positive engagement.
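The directional check above can be sketched in a few lines. This is an illustrative sketch, not the code in enhancements/reward_modeling/: it assumes linear utilities, and it assumes that a positive cosine means more data helps the hidden stakeholder and a negative cosine means it harms. The 0.2 threshold comes from the validation range quoted above; the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two stakeholder weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bt_loss(w, preferred, rejected):
    """Bradley-Terry negative log-likelihood for one pair under a linear utility w.x."""
    margin = sum(wi * (p - r) for wi, p, r in zip(w, preferred, rejected))
    return math.log1p(math.exp(-margin))

def goodhart_direction(w_target, w_hidden, threshold=0.2):
    """Classify the predicted effect of more training data on the hidden stakeholder.

    Assumption: positive cosine -> "helps", negative cosine -> "harms".
    Pairs with |cos| <= threshold are reported as "inconclusive".
    """
    c = cosine(w_target, w_hidden)
    if abs(c) <= threshold:
        return "inconclusive"
    return "helps" if c > 0 else "harms"
```

For example, `goodhart_direction([1.0, 0.0], [-1.0, 0.0])` returns `"harms"`, since the two weight vectors point in opposite directions.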

Validated on 4 datasets spanning 3 feature families:

Dataset	Domain	D	K	Pool
MovieLens-100K	Movies	19	3	1,305
MovieLens-1M	Movies	19	3	3,347
MIND-small	News	35	5	12,261
Amazon Kindle	E-commerce	32	5	17,425

See enhancements/reward_modeling/ and docs/f2/

Architecture

See docs/architecture.md for system diagrams (Mermaid).

Quick Start

# Install dependencies
uv sync

# Run reward modeling tests
make test

# Run all tests (includes optimization)
make test-all

# Quality gates
make all    # test + lint + typecheck

# Train a reward model
uv run python scripts/training/train_reward_model.py

# Run the 87-experiment loss function comparison
uv run python scripts/experiments/run_loss_experiments.py

# Run partial observation analysis
uv run python scripts/analysis/analyze_partial_observation.py --exp 4

Repository Structure

enhancements/               # Enhancement code
├── optimization/           # KV-cache, JIT, INT8 quantization
├── reward_modeling/        # BT training, stakeholder utilities, Pareto analysis
├── data/                   # Data adapters (synthetic, MovieLens, MIND, Amazon Kindle)
├── analysis/               # Trajectory & sensitivity analysis
├── verification/           # Test suites for synthetic verification
└── training/               # Training harness

scripts/                    # Training, analysis, and experiment scripts
tests/                      # Pytest suite (reward_modeling, analysis, optimization)
results/                    # Experiment outputs (JSON, PNG)
docs/                       # Documentation (design, results, retrospectives)

# Vendored (xAI, read-only):
phoenix/                    # Grok-based transformer model
home-mixer/                 # Rust orchestration layer
thunder/                    # Rust in-memory post store
candidate-pipeline/         # Rust pipeline framework

Vendored System: X For You Feed Algorithm

The sections below describe the original xAI system. Our enhancements build on top of this architecture.

Overview

The For You feed algorithm retrieves, ranks, and filters posts from two sources:

In-Network (Thunder): Posts from accounts you follow
Out-of-Network (Phoenix Retrieval): Posts discovered from a global corpus

Both sources are combined and ranked together using Phoenix, a Grok-based transformer model that predicts engagement probabilities for each post. The final score is a weighted combination of these predicted engagements.

We have eliminated every single hand-engineered feature and most heuristics from the system. The Grok-based transformer does all the heavy lifting by understanding your engagement history (what you liked, replied to, shared, etc.) and using that to determine what content is relevant to you.

System Architecture

┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│                                    FOR YOU FEED REQUEST                                     │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
                                               │
                                               ▼
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│                                         HOME MIXER                                          │
│                                    (Orchestration Layer)                                    │
├─────────────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────────────────────┐   │
│   │                                   QUERY HYDRATION                                   │   │
│   │  ┌──────────────────────────┐    ┌──────────────────────────────────────────────┐   │   │
│   │  │ User Action Sequence     │    │ User Features                                │   │   │
│   │  │ (engagement history)     │    │ (following list, preferences, etc.)          │   │   │
│   │  └──────────────────────────┘    └──────────────────────────────────────────────┘   │   │
│   └─────────────────────────────────────────────────────────────────────────────────────┘   │
│                                              │                                              │
│                                              ▼                                              │
│   ┌─────────────────────────────────────────────────────────────────────────────────────┐   │
│   │                                  CANDIDATE SOURCES                                  │   │
│   │         ┌─────────────────────────────┐    ┌────────────────────────────────┐       │   │
│   │         │        THUNDER              │    │     PHOENIX RETRIEVAL          │       │   │
│   │         │    (In-Network Posts)       │    │   (Out-of-Network Posts)       │       │   │
│   │         │                             │    │                                │       │   │
│   │         │  Posts from accounts        │    │  ML-based similarity search    │       │   │
│   │         │  you follow                 │    │  across global corpus          │       │   │
│   │         └─────────────────────────────┘    └────────────────────────────────┘       │   │
│   └─────────────────────────────────────────────────────────────────────────────────────┘   │
│                                              │                                              │
│                                              ▼                                              │
│   ┌─────────────────────────────────────────────────────────────────────────────────────┐   │
│   │                                      HYDRATION                                      │   │
│   │  Fetch additional data: core post metadata, author info, media entities, etc.       │   │
│   └─────────────────────────────────────────────────────────────────────────────────────┘   │
│                                              │                                              │
│                                              ▼                                              │
│   ┌─────────────────────────────────────────────────────────────────────────────────────┐   │
│   │                                      FILTERING                                      │   │
│   │  Remove: duplicates, old posts, self-posts, blocked authors, muted keywords, etc.   │   │
│   └─────────────────────────────────────────────────────────────────────────────────────┘   │
│                                              │                                              │
│                                              ▼                                              │
│   ┌─────────────────────────────────────────────────────────────────────────────────────┐   │
│   │                                       SCORING                                       │   │
│   │  ┌──────────────────────────┐                                                       │   │
│   │  │  Phoenix Scorer          │    Grok-based Transformer predicts:                   │   │
│   │  │  (ML Predictions)        │    P(like), P(reply), P(repost), P(click)...          │   │
│   │  └──────────────────────────┘                                                       │   │
│   │               │                                                                     │   │
│   │               ▼                                                                     │   │
│   │  ┌──────────────────────────┐                                                       │   │
│   │  │  Weighted Scorer         │    Weighted Score = Σ (weight × P(action))            │   │
│   │  │  (Combine predictions)   │                                                       │   │
│   │  └──────────────────────────┘                                                       │   │
│   │               │                                                                     │   │
│   │               ▼                                                                     │   │
│   │  ┌──────────────────────────┐                                                       │   │
│   │  │  Author Diversity        │    Attenuate repeated author scores                   │   │
│   │  │  Scorer                  │    to ensure feed diversity                           │   │
│   │  └──────────────────────────┘                                                       │   │
│   └─────────────────────────────────────────────────────────────────────────────────────┘   │
│                                              │                                              │
│                                              ▼                                              │
│   ┌─────────────────────────────────────────────────────────────────────────────────────┐   │
│   │                                      SELECTION                                      │   │
│   │                    Sort by final score, select top K candidates                     │   │
│   └─────────────────────────────────────────────────────────────────────────────────────┘   │
│                                              │                                              │
│                                              ▼                                              │
│   ┌─────────────────────────────────────────────────────────────────────────────────────┐   │
│   │                              FILTERING (Post-Selection)                             │   │
│   │                 Visibility filtering (deleted/spam/violence/gore etc)               │   │
│   └─────────────────────────────────────────────────────────────────────────────────────┘   │
│                                                                                             │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
                                               │
                                               ▼
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│                                     RANKED FEED RESPONSE                                    │
└─────────────────────────────────────────────────────────────────────────────────────────────┘

Components

Home Mixer

Location: home-mixer/

The orchestration layer that assembles the For You feed. It leverages the CandidatePipeline framework with the following stages:

Stage	Description
Query Hydrators	Fetch user context (engagement history, following list)
Sources	Retrieve candidates from Thunder and Phoenix
Hydrators	Enrich candidates with additional data
Filters	Remove ineligible candidates
Scorers	Predict engagement and compute final scores
Selector	Sort by score and select top K
Post-Selection Filters	Final visibility and dedup checks
Side Effects	Cache request info for future use

The server exposes a gRPC endpoint (ScoredPostsService) that returns ranked posts for a given user.

Thunder

Location: thunder/

An in-memory post store and realtime ingestion pipeline that tracks recent posts from all users. It:

Consumes post create/delete events from Kafka
Maintains per-user stores for original posts, replies/reposts, and video posts
Serves "in-network" post candidates from accounts the requesting user follows
Automatically trims posts older than the retention period

Thunder enables sub-millisecond lookups for in-network content without hitting an external database.
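The behaviour described above can be sketched as a small in-memory store. Thunder itself is written in Rust and consumes Kafka events; the Python class below is only a toy model of the same idea. `PostStore` and its methods are hypothetical names, and the sketch assumes posts for each author arrive in timestamp order.

```python
from collections import defaultdict, deque

class PostStore:
    """Toy per-author post store with retention trimming."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        # author_id -> deque of (created_at, post_id), oldest first
        self._posts = defaultdict(deque)

    def add(self, author_id, post_id, created_at):
        self._posts[author_id].append((created_at, post_id))

    def delete(self, author_id, post_id):
        kept = (p for p in self._posts[author_id] if p[1] != post_id)
        self._posts[author_id] = deque(kept)

    def trim(self, now):
        """Drop posts older than the retention period."""
        cutoff = now - self.retention_seconds
        for posts in self._posts.values():
            while posts and posts[0][0] < cutoff:
                posts.popleft()

    def in_network(self, following, now):
        """Return post IDs from followed authors, newest first."""
        self.trim(now)
        found = [p for author in following for p in self._posts.get(author, ())]
        return [post_id for _, post_id in sorted(found, reverse=True)]

store = PostStore(retention_seconds=100)
store.add("a", "p1", created_at=0)
store.add("a", "p2", created_at=150)
store.add("b", "p3", created_at=160)
store.add("c", "p4", created_at=170)
recent = store.in_network(["a", "b"], now=200)
```

Here `p1` falls outside the retention window and `p4` comes from an author the user does not follow, so neither is returned.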

Phoenix

Location: phoenix/

The ML component with two main functions:

1. Retrieval (Two-Tower Model)

Finds relevant out-of-network posts:

User Tower: Encodes user features and engagement history into an embedding
Candidate Tower: Encodes all posts into embeddings
Similarity Search: Retrieves top-K posts via dot product similarity
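The similarity search step can be sketched as a brute-force top-K over dot products. This is an illustration only: a production system would normally use an approximate nearest-neighbour index, and the embeddings below are made up. `retrieve_top_k` is a hypothetical name.

```python
import heapq

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def retrieve_top_k(user_embedding, post_embeddings, k):
    """Return the k post IDs whose embeddings have the largest dot product with the user."""
    scored = (
        (dot(user_embedding, embedding), post_id)
        for post_id, embedding in post_embeddings.items()
    )
    return [post_id for _, post_id in heapq.nlargest(k, scored)]

# Made-up embeddings standing in for the user tower and candidate tower outputs.
posts = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
top_two = retrieve_top_k([1.0, 0.2], posts, k=2)
```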

2. Ranking (Transformer with Candidate Isolation)

Predicts engagement probabilities for each candidate:

Takes user context (engagement history) and candidate posts as input
Uses special attention masking so candidates cannot attend to each other
Outputs probabilities for each action type (like, reply, repost, click, etc.)

See phoenix/README.md for detailed architecture documentation.

Candidate Pipeline

Location: candidate-pipeline/

A reusable framework for building recommendation pipelines. Defines traits for:

Trait	Purpose
`Source`	Fetch candidates from a data source
`Hydrator`	Enrich candidates with additional features
`Filter`	Remove candidates that shouldn't be shown
`Scorer`	Compute scores for ranking
`Selector`	Sort and select top candidates
`SideEffect`	Run async side effects (caching, logging)

The framework runs sources and hydrators in parallel where possible, with configurable error handling and logging.
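The stage ordering can be sketched as follows. The real framework is a Rust crate built around the traits listed above; this Python version is a simplified analogue that omits hydrators and side effects. `run_pipeline` and the dictionary-based candidate format are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(query, sources, filters, scorers, top_k):
    """Fetch from all sources, filter, score sequentially, and select the top K."""
    # Sources are independent of each other, so fetch them concurrently.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda source: source(query), sources))
    candidates = [c for batch in batches for c in batch]

    for keep in filters:
        candidates = [c for c in candidates if keep(query, c)]

    # Scorers run in order; each returns an updated copy of the candidate.
    for scorer in scorers:
        candidates = [scorer(query, c) for c in candidates]

    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:top_k]

def in_network(query):
    return [
        {"id": 1, "author": "a", "score": 0.0},
        {"id": 2, "author": query["user"], "score": 0.0},
    ]

def out_of_network(query):
    return [{"id": 3, "author": "b", "score": 0.0}]

def not_self(query, candidate):
    return candidate["author"] != query["user"]

def toy_scorer(query, candidate):
    # Placeholder score; a real scorer would call the ranking model.
    return {**candidate, "score": candidate["id"] * 0.1}

feed = run_pipeline(
    {"user": "me"}, [in_network, out_of_network], [not_self], [toy_scorer], top_k=2
)
```

Keeping each stage as a plain callable mirrors the separation between pipeline execution and business logic that the framework aims for.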

How It Works

Pipeline Stages

Query Hydration: Fetch the user's recent engagement history and metadata (e.g., following list)
Candidate Sourcing: Retrieve candidates from:
- Thunder: Recent posts from followed accounts (in-network)
- Phoenix Retrieval: ML-discovered posts from the global corpus (out-of-network)
Candidate Hydration: Enrich candidates with:
- Core post data (text, media, etc.)
- Author information (username, verification status)
- Video duration (for video posts)
- Subscription status
Pre-Scoring Filters: Remove posts that are:
- Duplicates
- Too old
- From the viewer themselves
- From blocked/muted accounts
- Containing muted keywords
- Previously seen or recently served
- Ineligible subscription content
Scoring: Apply multiple scorers sequentially:
- Phoenix Scorer: Get ML predictions from the Phoenix transformer model
- Weighted Scorer: Combine predictions into a final relevance score
- Author Diversity Scorer: Attenuate repeated author scores for diversity
- OON Scorer: Adjust scores for out-of-network content
Selection: Sort by score and select the top K candidates
Post-Selection Processing: Final validation of post candidates to be served
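The author diversity step in the scoring stage can be illustrated with a simple attenuation rule. The actual formula is not given in this README; the sketch below assumes a geometric decay per repeated author and non-negative scores. `attenuate_repeated_authors` and `decay` are hypothetical.

```python
def attenuate_repeated_authors(candidates, decay=0.5):
    """Down-weight each author's 2nd, 3rd, ... post by decay, decay**2, ...

    candidates: list of (post_id, author_id, score) tuples with non-negative scores.
    Returns the re-scored tuples sorted by the new score, highest first.
    """
    seen = {}
    rescored = []
    # Walk in score order so each author's best post keeps its full score.
    for post_id, author_id, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        rank = seen.get(author_id, 0)
        rescored.append((post_id, author_id, score * decay ** rank))
        seen[author_id] = rank + 1
    return sorted(rescored, key=lambda c: c[2], reverse=True)

ranked = attenuate_repeated_authors(
    [("p1", "a", 1.0), ("p2", "a", 0.9), ("p3", "b", 0.8)]
)
```

In this example the second post from author `a` drops from 0.9 to 0.45, so the post from author `b` moves ahead of it.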

Scoring and Ranking

The Phoenix Grok-based transformer model predicts probabilities for multiple engagement types:

Predictions:
├── P(favorite)
├── P(reply)
├── P(repost)
├── P(quote)
├── P(click)
├── P(profile_click)
├── P(video_view)
├── P(photo_expand)
├── P(share)
├── P(dwell)
├── P(follow_author)
├── P(not_interested)
├── P(block_author)
├── P(mute_author)
└── P(report)

The Weighted Scorer combines these into a final score:

Final Score = Σ (weight_i × P(action_i))

Positive actions (like, repost, share) have positive weights. Negative actions (block, mute, report) have negative weights, pushing down content the user would likely dislike.
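The weighted sum above can be written directly. The weights below are invented purely to show the sign convention; the production weights are not listed in this README. `weighted_score` and `EXAMPLE_WEIGHTS` are hypothetical names.

```python
def weighted_score(predictions, weights):
    """Sum weight_i * P(action_i); actions without a weight contribute 0."""
    return sum(weights.get(action, 0.0) * prob for action, prob in predictions.items())

# Hypothetical weights, chosen only to illustrate positive vs. negative actions.
EXAMPLE_WEIGHTS = {"favorite": 1.0, "reply": 2.0, "report": -10.0}

predictions = {"favorite": 0.5, "reply": 0.1, "report": 0.02, "dwell": 0.9}
score = weighted_score(predictions, EXAMPLE_WEIGHTS)
```

With these made-up numbers, a 2% predicted report probability cancels the contribution of a 10% reply probability, which shows how a large negative weight pushes content down.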

Filtering

Filters run at two stages:

Pre-Scoring Filters:

Filter	Purpose
`DropDuplicatesFilter`	Remove duplicate post IDs
`CoreDataHydrationFilter`	Remove posts that failed to hydrate core metadata
`AgeFilter`	Remove posts older than threshold
`SelfpostFilter`	Remove user's own posts
`RepostDeduplicationFilter`	Dedupe reposts of same content
`IneligibleSubscriptionFilter`	Remove paywalled content user can't access
`PreviouslySeenPostsFilter`	Remove posts user has already seen
`PreviouslyServedPostsFilter`	Remove posts already served in session
`MutedKeywordFilter`	Remove posts with user's muted keywords
`AuthorSocialgraphFilter`	Remove posts from blocked/muted authors

Post-Selection Filters:

Filter	Purpose
`VFFilter`	Remove posts that are deleted/spam/violence/gore etc.
`DedupConversationFilter`	Deduplicate multiple branches of the same conversation thread

Key Design Decisions

1. No Hand-Engineered Features

The system relies entirely on the Grok-based transformer to learn relevance from user engagement sequences, with no manual feature engineering for content relevance. This significantly reduces the complexity of our data pipelines and serving infrastructure.

2. Candidate Isolation in Ranking

During transformer inference, candidates cannot attend to each other—only to the user context. This ensures the score for a post doesn't depend on which other posts are in the batch, making scores consistent and cacheable.
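One way to express this constraint is as a boolean attention mask. The sketch below assumes the sequence is laid out as context positions followed by candidate positions, that context positions attend to each other freely, and that each candidate also attends to itself; the actual Phoenix mask may differ in those details. `candidate_isolation_mask` is a hypothetical name.

```python
def candidate_isolation_mask(num_context, num_candidates):
    """Build mask[i][j], True when position i may attend to position j.

    Positions 0..num_context-1 are user context; the rest are candidates.
    """
    total = num_context + num_candidates
    return [
        # Every position may see the context; beyond that, only itself.
        [j < num_context or i == j for j in range(total)]
        for i in range(total)
    ]

mask = candidate_isolation_mask(num_context=2, num_candidates=2)
```

Because no candidate row has a True entry in another candidate's column, removing or adding candidates leaves the remaining rows unchanged, which is what makes the scores batch-independent.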

3. Hash-Based Embeddings

Both retrieval and ranking use multiple hash functions for embedding lookup.
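A minimal sketch of a multi-hash embedding lookup follows. The README does not say how Phoenix hashes IDs or combines the rows, so this example assumes one table per hash function, seeded SHA-256 hashing, and summation of the looked-up rows. `hash_bucket` and `lookup_embedding` are hypothetical names.

```python
import hashlib

def hash_bucket(entity_id, seed, num_buckets):
    """Map an ID to a bucket using a seeded hash (one seed per hash function)."""
    digest = hashlib.sha256(f"{seed}:{entity_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def lookup_embedding(entity_id, tables):
    """Sum the rows selected by each hash function, one table per function."""
    embedding = [0.0] * len(tables[0][0])
    for seed, table in enumerate(tables):
        row = table[hash_bucket(entity_id, seed, len(table))]
        embedding = [a + b for a, b in zip(embedding, row)]
    return embedding

# Two toy tables with 4 buckets each and 3-dimensional rows.
tables = [[[float(t * 10 + b)] * 3 for b in range(4)] for t in range(2)]
vector = lookup_embedding("post-1", tables)
```

Using several hash functions means two IDs that collide in one table are unlikely to collide in all of them, so their combined embeddings still differ.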

4. Multi-Action Prediction

Rather than predicting a single "relevance" score, the model predicts probabilities for many actions.

5. Composable Pipeline Architecture

The candidate-pipeline crate provides a flexible framework for building recommendation pipelines with:

Separation of pipeline execution and monitoring from business logic
Parallel execution of independent stages and graceful error handling
Easy addition of new sources, hydrations, filters, and scorers

License

This project is licensed under the Apache License 2.0. See LICENSE for details.
