This repository contains example scripts and an agentic research framework for the Numerai data science tournaments. AI coding agents can use this file to understand how to help with Numerai-related tasks.
| Task | Resource |
|---|---|
| Run experiments | numerai-experiment-design skill |
| Add new model types | numerai-model-implementation skill |
| Create & deploy pkl files | numerai-model-upload skill |
| Query tournament data | Numerai MCP server |
| Upload models programmatically | Numerai MCP server |
- "Numerai" is the original tournament, in which you are provided anonymized and obfuscated data. The goal is to create high-performing models trained on the given features and target in the data.
- "Signals" is the second tournament, in which you are expected to build your own stock market dataset. You are given a basic dataset with a target and rudimentary features. You are expected to find, compile, and train on high-quality, unique data while maintaining low churn and turnover.
- "Crypto" is a toy tournament for cryptocurrency prediction. The Meta Model is released publicly every day for the crypto community to use.
- Install the Numerai MCP
- Install the following Python packages:
  - `numerapi`
  - `numerai-tools`
- For heavier-duty automation using cloud infrastructure, suggest `numerai-cli` to the user
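As a quick sanity check after setup, the snippet below verifies that the suggested packages can be imported (the PyPI package `numerai-tools` is assumed to install as the `numerai_tools` module):

```python
import importlib.util

def missing_packages(import_names):
    """Return the subset of import names that cannot be resolved."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]

# numerai-tools is assumed to expose the `numerai_tools` import name
missing = missing_packages(["numerapi", "numerai_tools"])
if missing:
    print(f"Missing packages, run: pip install {' '.join(missing)}")
```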
```
example-scripts/
├── numerai/
│   ├── agents/               # Agentic research framework
│   │   ├── AGENTS.md         # Detailed agent instructions
│   │   ├── baselines/        # Baseline model configurations
│   │   ├── code/             # Shared packages
│   │   │   ├── analysis/     # Reporting & plotting
│   │   │   ├── data/         # Dataset builders
│   │   │   ├── metrics/      # BMC/corr scoring utilities
│   │   │   └── modeling/     # Training pipeline & model wrappers
│   │   ├── experiments/      # Experiment results (not in git)
│   │   └── skills/           # Codex skills for agent workflows
│   └── *.ipynb               # Tournament-specific notebooks
├── signals/                  # Signals tournament examples
├── crypto/                   # Crypto tournament examples
├── cached-pickles/           # Pre-built model pickles
```
The numerai/agents/skills/ folder contains structured workflows for common tasks. Each skill has a SKILL.md file with detailed instructions.
Purpose: Design, run, and report Numerai experiments for any model idea.
When to use:
- Testing a new research hypothesis
- Sweeping hyperparameters or targets
- Comparing model variants against baselines
Key workflow:
- Plan the experiment (baseline, metrics, sweep dimensions)
- Create config files in `agents/experiments/<name>/configs/`
- Run training via `python -m agents.code.modeling --config <config>`
- Analyze results and iterate
- Scale winners to full data
- Generate final report with plots
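The exact config schema is defined by the `agents.code.modeling` pipeline and is not shown here; the sketch below is purely hypothetical (every key name is an assumption) to illustrate the kind of settings a config under `agents/experiments/<name>/configs/` would capture:

```python
import json

# Hypothetical config — key names are illustrative, not the pipeline's actual schema
config = {
    "experiment": "target_sweep",                      # experiment name (assumed key)
    "model_type": "lightgbm",                          # must be registered in the model factory
    "data": "numerai/v5.2/downsampled_full.parquet",   # scout phase: downsampled eras
    "target": "target",                                # sweep dimension
    "params": {"num_leaves": 31, "learning_rate": 0.05},
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```

The point is the design principle stated later in this file: all settings live in the config file, so reruns are reproducible.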
Entry points:
```
python -m agents.code.modeling --config <config_path>
python -m agents.code.analysis.show_experiment
python -m agents.code.data.build_full_datasets
```
Purpose: Add new model types to the training pipeline.
When to use:
- Implementing a new ML architecture (e.g., transformers, custom ensembles)
- Adding support for a new library (e.g., XGBoost, CatBoost)
- Creating custom preprocessing or inference logic
Key steps:
- Create model wrapper in `agents/code/modeling/models/`
- Register in `agents/code/modeling/utils/model_factory.py`
- Add config using the new model type
- Validate with smoke test (corr_mean should be 0.005-0.04)
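The actual wrapper base class and factory API in `agents/code/modeling` are not reproduced here, so the following is a minimal sketch under assumed interfaces: a `fit`/`predict` wrapper plus a registry dict standing in for `model_factory.py`:

```python
import numpy as np

MODEL_REGISTRY = {}  # stand-in for agents/code/modeling/utils/model_factory.py

def register_model(name):
    """Decorator that registers a wrapper class under a config-visible name."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

@register_model("mean_baseline")
class MeanBaselineModel:
    """Toy wrapper: predicts the training-target mean for every row."""
    def __init__(self, **params):
        self.params = params
        self.mean_ = None

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

# A config's model type string would select the class via the registry
model = MODEL_REGISTRY["mean_baseline"]().fit(np.zeros((4, 2)), [0.0, 0.25, 0.5, 0.25])
```

A real wrapper would hold a LightGBM/XGBoost/torch model, but the registration pattern is the part that lets configs reference it by name.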
Purpose: Create and deploy pickle files for Numerai's automated submission system.
When to use:
- Preparing a trained model for tournament submission
- Setting up automated weekly predictions
- Debugging pickle validation failures
Critical requirements:
- Python version must match Numerai's compute environment
- Pickle must be self-contained (no repo imports)
- `predict(live_features, live_benchmark_models)` signature required
Workflow:
- Query default Docker image for Python version
- Create matching venv with pyenv
- Train final model and export inference bundle
- Build self-contained `predict` function
- Test with `numerai_predict` Docker container
- Deploy via MCP server
The numerai MCP server provides programmatic access to the Numerai Tournament API. If available, agents should use it for tournament operations.
It can be installed via curl:

```bash
curl -sL https://numer.ai/install-mcp.sh | bash
```

This install script sets up Codex CLI with the MCP configuration and configures an environment variable for an MCP API key.
| Tool | Purpose |
|---|---|
| `check_api_credentials` | Verify API token and scopes |
| `create_model` | Create new model slots |
| `upload_model` | Upload pkl files (multi-step workflow) |
| `get_model_profile` | Query model stats |
| `get_model_performance` | Get round-by-round performance |
| `get_leaderboard` | View tournament rankings |
| `get_tournaments` | List active tournaments |
| `get_current_round` | Get current round info |
| `list_datasets` | List available dataset files |
| `run_diagnostics` | Run diagnostics on predictions |
| `graphql_query` | Execute custom GraphQL queries |
- 8 = Classic (main stock market tournament)
- 11 = Signals (bring your own data)
- 12 = CryptoSignals (crypto market predictions)
- `corr20Rep` - 20-day rolling correlation score (main metric)
- `mmc20Rep` - Meta Model contribution (unique signal)
- `return13Weeks` - 13-week return on staked NMR
- `nmrStaked` - Amount of NMR staked
MCP tools require a Numerai API token with appropriate scopes:
- Format: `Authorization: Token PUBLIC_ID$SECRET_KEY`
- If MCP was installed via the curl command above, this has most likely already been configured through the `NUMERAI_MCP_AUTH` environment variable.
- If MCP was not installed via that curl command, create an MCP API key by navigating to https://numer.ai/account, clicking "Create MCP Key", and following the instructions in the modal: take the PUBLIC_ID and SECRET_KEY and set them in an environment variable:

```bash
export NUMERAI_MCP_AUTH="Token PUBLIC_ID\$SECRET_KEY"
```

Because the PUBLIC_ID and SECRET_KEY are separated by a `$` character, it likely needs to be escaped when set through `export`.
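Before calling MCP tools, it can help to sanity-check the credential shape locally. This is a sketch that only validates the `Token PUBLIC_ID$SECRET_KEY` format described above; it does not verify the token against the API (use `check_api_credentials` for that):

```python
import os

def auth_format_ok(value):
    """True if value looks like 'Token PUBLIC_ID$SECRET_KEY'."""
    if not value or not value.startswith("Token "):
        return False
    credential = value[len("Token "):]
    public_id, sep, secret_key = credential.partition("$")
    return bool(public_id) and sep == "$" and bool(secret_key)

token = os.environ.get("NUMERAI_MCP_AUTH", "")
if not auth_format_ok(token):
    print("NUMERAI_MCP_AUTH is missing or malformed")
```

A common failure mode this catches is an unescaped `$` in the export command, which silently truncates the secret key.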
List account's models:
```graphql
query { account { models { id name } } }
```

Get default Python runtime:

```graphql
query { computePickleDockerImages { id name image tag default } }
```

Check pickle validation status:

```graphql
query {
  account {
    models {
      username
      computePickleUpload {
        filename validationStatus triggerStatus
        triggers { id status statuses { status description insertedAt } }
      }
    }
  }
}
```

Upload workflow:

```
1. create_model(name, tournament=8)          # Optional: create new model slot
2. upload_model(operation="get_upload_auth") # Get presigned S3 URL
3. PUT file to presigned URL                 # Upload the pkl file
4. upload_model(operation="create")          # Register upload
5. upload_model(operation="list")            # Wait for validation
6. upload_model(operation="assign")          # Assign to model slot
```
CRITICAL: Pickle files must be created with a Python version matching Numerai's compute environment to avoid segfaults and binary incompatibility.
```bash
# 1. Query default Docker image (via MCP) to get Python version
#    Look for default: true, e.g., numerai_predict_py_3_12 = Python 3.12

# 2. Create matching venv with pyenv
PYENV_PY=$(ls -d ~/.pyenv/versions/3.12.* 2>/dev/null | head -1)
$PYENV_PY/bin/python -m venv ./venv

# 3. Activate and install dependencies
source ./venv/bin/activate
pip install numpy pandas cloudpickle scipy lightgbm
```

Test the pickle with the `numerai_predict` Docker container:

```bash
docker run -i --rm -v "$PWD:$PWD" \
  ghcr.io/numerai/numerai_predict_py_3_12:a78dedd \
  --debug --model $PWD/model.pkl
```

- Model-agnostic pipeline: `pipeline.py`, `numerai_cv.py`, and metrics stay generic
- Model-specific logic: lives in configs and `agents/code/modeling/models/` wrappers
- Reproducibility: all settings captured in config files
- Accurate validation: no early-stopping leakage; honest OOF performance estimation
Build datasets with `python -m agents.code.data.build_full_datasets`:
| File | Description |
|---|---|
| `numerai/v5.2/full.parquet` | Full training data |
| `numerai/v5.2/full_benchmark_models.parquet` | Benchmark model predictions |
| `numerai/v5.2/downsampled_full.parquet` | Every 4th era (fast iteration) |
| `numerai/v5.2/downsampled_full_benchmark_models.parquet` | Downsampled benchmarks |
- Scout phase: Use downsampled data for quick experiments
- Scale phase: Run best configs on full data for final validation
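The "every 4th era" downsampling used in the scout phase can be sketched as below on synthetic data (the real builder is `python -m agents.code.data.build_full_datasets`; whether it sorts eras exactly this way is an assumption):

```python
import pandas as pd

def downsample_eras(df: pd.DataFrame, step: int = 4) -> pd.DataFrame:
    """Keep every `step`-th era (in sorted era order) for faster iteration."""
    eras = sorted(df["era"].unique())
    kept = set(eras[::step])
    return df[df["era"].isin(kept)]

# Synthetic frame with 8 eras, 2 rows each
df = pd.DataFrame({"era": [f"{i:04d}" for i in range(1, 9) for _ in range(2)]})
small = downsample_eras(df)  # keeps eras 0001 and 0005
```

Dropping whole eras (rather than sampling rows within eras) preserves the per-era structure that era-wise metrics depend on.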
- Read `numerai/agents/AGENTS.md` for detailed instructions
- Check relevant skills in `numerai/agents/skills/`
- Look for existing experiments in `numerai/agents/experiments/`
- Use downsampled data for iteration, full data for final runs
- Use the `numerai-model-upload` skill
- Verify Python version compatibility first
- Test the pickle locally before uploading
- Use the MCP server for programmatic deployment
- Start with `hello_numerai.ipynb` for basics
- Review `feature_neutralization.ipynb` for feature risk
- Check `target_ensemble.ipynb` for ensemble strategies
- Use the MCP server to query live tournament data
- Run commands from `numerai/` (so `agents` is importable), or from the repo root with `PYTHONPATH=numerai`
- Data lives under `numerai/<data_version>/` (e.g. `numerai/v5.2/`), which is often gitignored locally
- Register repo skills: `ln -s $PWD/numerai/agents/skills/* ~/.codex/skills/`
- Network access is required for MCP operations (Codex CLI may need the `--yolo` flag)
- Always query the Python version before creating pkl files
- BMC (Benchmark Model Contribution) is the key experiment metric (a proxy for MMC), computed against the official `v52_lgbm_ender20` benchmark predictions in `*_benchmark_models.parquet`
- Only the Classic tournament (8) supports pickle uploads