Author: Rishabh Patil
Quant Alpha Factory is a production-style quantitative research platform built around Microsoft Qlib and Microsoft RD-Agent workflows. It is designed to show the engineering discipline behind credible alpha research: data validation, reproducible experiments, leakage controls, cost-aware portfolio analysis, controlled agent integration, and artifact-backed reporting.
This is not a notebook backtest. It is a tested research platform skeleton that can run without live market data or LLM credentials, while still supporting real Qlib execution and opt-in RD-Agent workflows when users provide their own environment and keys.
For the detailed live-integration setup path, see docs/real_execution.md.
- Production-grade Python package structure with typed, composable modules.
- Validated CSV/Parquet ingestion for daily OHLCV data.
- Qlib-compatible data conversion with a native file-storage fallback for current pyqlib.
- Real Qlib baseline workflow execution through controlled qrun wrappers.
- RD-Agent command wrappers for health_check, fin_factor, fin_model, fin_quant, fin_factor_report, and ui.
- Experiment manifests with config hashes, data hashes, git commit when available, date splits, costs, metrics, artifact paths, command lineage, status, and failure reason.
- Research controls for date splits, fit/test overlap, leakage patterns, and missing transaction cost assumptions.
- Factor registry with economic rationale, leakage notes, Qlib expressions, and IC/rank IC helpers.
- Cost-aware portfolio metrics, turnover, drawdown, benchmark metadata, and Markdown tearsheets.
- Optional FastAPI and Streamlit readers over stored artifacts.
- Docker, Compose, GitHub Actions CI, and Makefile targets for repeatable local workflows.
No live tradable performance is claimed. Bundled metrics and demo outputs use deterministic synthetic data and exist to validate the workflow.
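The OHLCV validation listed among the features can be pictured with a minimal sketch. The column names, the function name `validate_ohlcv_rows`, and the specific checks are assumptions for illustration; the project's own schema and rules may differ.

```python
import csv
import io
from datetime import date

# Assumed column names; the project's real schema may differ.
REQUIRED = ("date", "symbol", "open", "high", "low", "close", "volume")


def validate_ohlcv_rows(rows):
    """Return a list of human-readable problems found in daily OHLCV rows."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        missing = [col for col in REQUIRED if row.get(col) in (None, "")]
        if missing:
            problems.append(f"row {i}: missing {missing}")
            continue
        try:
            day = date.fromisoformat(row["date"])
            o, hi, lo, c = (float(row[k]) for k in ("open", "high", "low", "close"))
            vol = float(row["volume"])
        except ValueError:
            problems.append(f"row {i}: unparseable date or number")
            continue
        key = (row["symbol"], day)
        if key in seen:
            problems.append(f"row {i}: duplicate symbol/date {row['symbol']} {day}")
        seen.add(key)
        if min(o, hi, lo, c) <= 0 or vol < 0:
            problems.append(f"row {i}: non-positive price or negative volume")
        if hi < max(o, c, lo) or lo > min(o, c, hi):
            problems.append(f"row {i}: high/low inconsistent with open/close")
    return problems


# Made-up sample: row 1 duplicates row 0, row 2 has high below open.
sample = io.StringIO(
    "date,symbol,open,high,low,close,volume\n"
    "2024-01-02,AAA,10,11,9,10.5,1000\n"
    "2024-01-02,AAA,10,11,9,10.5,1000\n"
    "2024-01-03,AAA,10,9,9.5,10,1000\n"
)
problems = validate_ohlcv_rows(list(csv.DictReader(sample)))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a file at once.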
Quant research infrastructure fails when results are not reproducible, data quality assumptions are implicit, leakage checks are missing, or backtests are ranked by gross returns alone. This project focuses on those controls first. It is built to communicate that alpha research should be:
- reproducible from manifests,
- validated before modeling,
- separated into train, validation, and test periods,
- evaluated after costs,
- reviewed before agent-generated ideas are promoted,
- reported from artifacts rather than memory or screenshots.
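As a rough illustration of the train, validation, and test separation above, the sketch below checks that three date ranges are well-formed and strictly ordered. The function name `check_splits` and the inclusive `(start, end)` convention are assumptions, not the project's actual API.

```python
from datetime import date


def check_splits(train, valid, test):
    """Each split is an inclusive (start, end) date pair. Returns a list of issues."""
    issues = []
    named = [("train", train), ("valid", valid), ("test", test)]
    for name, (start, end) in named:
        if start > end:
            issues.append(f"{name}: start {start} is after end {end}")
    # Consecutive splits must be in chronological order and share no date.
    for (a_name, (_, a_end)), (b_name, (b_start, _)) in zip(named, named[1:]):
        if a_end >= b_start:
            issues.append(f"{a_name} overlaps or follows {b_name}: {a_end} >= {b_start}")
    return issues


ok = check_splits(
    (date(2018, 1, 1), date(2020, 12, 31)),
    (date(2021, 1, 1), date(2021, 12, 31)),
    (date(2022, 1, 1), date(2022, 12, 31)),
)
bad = check_splits(
    (date(2018, 1, 1), date(2021, 6, 30)),  # train runs into the validation period
    (date(2021, 1, 1), date(2021, 12, 31)),
    (date(2022, 1, 1), date(2022, 12, 31)),
)
```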
The no-secret path works with synthetic data and does not require Qlib datasets, live market data, RD-Agent credentials, or internet-dependent tests.
python -m venv .venv
source .venv/bin/activate
make install
make quickstart
make quickstart runs lint, tests, CLI status, sample data validation, research checks, the Qlib demo dry-run, RD-Agent dry-run commands, and the deterministic synthetic demo.
After installation, the console script is also available:
qaf status
qaf doctor --component all --allow-missing-llm --skip-docker-daemon
qaf demo synthetic
qaf qlib demo --dry-run
Before publishing or handing off the repo, run:
make validate-release
Real Qlib execution is opt-in. The local validation path has been tested with Python 3.11, pyqlib, LightGBM, synthetic Qlib file storage, and Qlib/MLflow artifacts.
make install-real
make qlib-demo-real
Equivalent explicit commands:
python -m src.cli doctor --component qlib --strict
python -m src.cli qlib demo --execute
The Qlib demo validates data/sample/prices.csv, converts it into Qlib-compatible storage, runs research checks, executes qrun, parses Qlib/MLflow metrics, and writes a project manifest under artifacts/experiments/<experiment-id>/manifest.json.
Metrics from this command are synthetic workflow-validation metrics, not live-market claims.
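The manifest idea can be sketched as follows: hash the config and the input data so that two runs can be compared byte for byte. The field names and helper names here are illustrative assumptions and are not taken from the project's manifest schema.

```python
import hashlib
import json


def stable_hash(obj):
    """SHA-256 of a JSON-serializable object, independent of dict key order."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def file_hash(path):
    """SHA-256 of a file's bytes, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(experiment_id, config, data_path, status, metrics=None, failure_reason=None):
    """Assemble a manifest dict. Field names are hypothetical, for illustration only."""
    return {
        "experiment_id": experiment_id,
        "config_hash": stable_hash(config),
        "data_hash": file_hash(data_path),
        "status": status,
        "metrics": metrics or {},
        "failure_reason": failure_reason,
    }
```

Serializing with `sort_keys=True` means two configs that differ only in key order produce the same hash, so a changed hash signals a real change in settings.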
RD-Agent real workflows require user-provided LLM credentials. Users can still verify installation and Docker readiness without secrets:
DS_CODER_COSTEER_ENV_TYPE=docker \
python -m src.cli doctor --component rdagent --allow-missing-llm --strict
Dry-run commands do not call an LLM:
python -m src.cli rdagent health --dry-run
python -m src.cli rdagent run --mode fin_factor --loop-n 1 --dry-run
Real agent execution remains explicit:
python -m src.cli rdagent run --mode fin_factor --loop-n 1 --execute
Generated hypotheses are not promoted automatically. The repository includes a human review layer for rationale, risk notes, reviewer decision, and promotion status.
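One way to picture the human review layer is a record that blocks promotion until a reviewer has explicitly approved it. The class and field names below are hypothetical and the example hypothesis text is made up; the repository's own review model may be structured differently.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class HypothesisReview:
    """Hypothetical review record for one agent-generated hypothesis."""

    hypothesis_id: str
    rationale: str
    risk_notes: str
    decision: Decision = Decision.PENDING
    reviewer: str = ""

    def can_promote(self):
        """Promotion needs an explicit approval, a named reviewer, and written notes."""
        return (
            self.decision is Decision.APPROVED
            and bool(self.reviewer.strip())
            and bool(self.rationale.strip())
            and bool(self.risk_notes.strip())
        )


# Illustrative records only.
draft = HypothesisReview("h-001", "Example rationale text", "Example risk note")
approved = HypothesisReview(
    "h-001", draft.rationale, draft.risk_notes, Decision.APPROVED, "reviewer-a"
)
```

Defaulting `decision` to `PENDING` means a freshly generated hypothesis can never be promoted by omission.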
make install
make quickstart
make validate-release
make install-real
make qlib-demo-real
make docker-build
make docker-test
python -m src.cli data validate --input data/sample/prices.csv
python -m src.cli data convert --input data/sample/prices.csv --output data/qlib_bin/sample --dry-run
python -m src.cli research check --config configs/qlib/baseline_lightgbm_alpha158.yaml
python -m src.cli qlib demo --dry-run
python -m src.cli experiments list
python -m src.cli experiments leaderboard --metric net_return
python -m src.cli report build --experiment-id synthetic-demo
| Area | What is implemented |
|---|---|
| Data | CSV/Parquet loading, schema normalization, OHLCV validation, duplicate/date checks |
| Qlib | Conversion, native file storage fallback, qrun wrapper, real qrun demo on synthetic data, metric parsing |
| RD-Agent | Safe command construction, dry-run/execute manager, logs, manifests, human review |
| Experiments | JSON manifests, hashes, artifacts, failure recording, leaderboard |
| Validation | Split checks, leakage heuristics, fit/test overlap checks, cost assumption checks |
| Factors | Versioned registry, baseline factor families, IC/rank IC evaluation |
| Portfolio | Top-K, turnover constraint, cost adjustment, drawdown, Sharpe/IR, tearsheets |
| Reporting | Markdown reports from manifests, explicit missing-metric handling |
| Interfaces | Typer CLI, optional FastAPI, optional Streamlit dashboard |
| MLOps | Makefile, Dockerfile, Compose, GitHub Actions, no-secret CI |
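The cost adjustment and drawdown entries in the Portfolio row can be illustrated with a small sketch. It assumes a linear cost charged in basis points on total traded weight; the function names and this cost convention are assumptions, not the project's actual model.

```python
def traded_weight(prev_weights, new_weights):
    """Total absolute weight change between two portfolios (buys plus sells)."""
    names = set(prev_weights) | set(new_weights)
    return sum(abs(new_weights.get(n, 0.0) - prev_weights.get(n, 0.0)) for n in names)


def net_return(gross_return, traded, cost_bps):
    """Subtract an assumed linear cost: cost_bps basis points per unit of traded weight."""
    return gross_return - traded * cost_bps / 10_000.0


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


# Swapping half the book from B to C trades 0.5 out and 0.5 in.
traded = traded_weight({"A": 0.5, "B": 0.5}, {"A": 0.5, "C": 0.5})
net = net_return(0.01, traded, cost_bps=10)
drawdown = max_drawdown([0.10, -0.20, 0.05])
```

Ranking on `net` rather than the gross figure is what keeps a high-turnover strategy from looking better than it would after trading costs.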
flowchart LR
A[CSV / Parquet Data] --> B[src.data validation]
B --> C[Qlib conversion]
C --> D[Qlib provider URI]
D --> E[Qlib qrun baseline]
E --> F[Qlib / MLflow artifacts]
F --> G[Experiment manifest]
H[RD-Agent adapter] --> G
I[Factor registry] --> E
J[Research checks] --> E
G --> K[Leaderboard]
G --> L[Markdown reports]
G --> M[API / Dashboard]
Qlib and RD-Agent are integration targets, not hidden global dependencies. CI and local tests use synthetic data and mocked/dry-run execution paths.
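A dry-run execution path of this kind might look like the sketch below: build an argument list from an allow-listed mode, and only invoke a subprocess when execution is requested. The helper names and the `rdagent <mode>` command shape are assumptions for illustration; the project's wrappers may build commands differently.

```python
import shlex
import subprocess

# Modes named in this README; anything else is rejected.
ALLOWED_MODES = {"health_check", "fin_factor", "fin_model", "fin_quant", "fin_factor_report", "ui"}


def build_command(mode):
    """Build an argv list for an allow-listed mode; never a shell string."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    return ["rdagent", mode]  # assumed command shape


def run_command(argv, execute=False):
    """Dry-run by default: return the planned command without running it."""
    record = {"command": shlex.join(argv), "executed": False, "returncode": None}
    if execute:
        # No shell is involved, so arguments are passed to the program verbatim.
        completed = subprocess.run(argv, capture_output=True, text=True, check=False)
        record.update(executed=True, returncode=completed.returncode)
    return record


plan = run_command(build_command("fin_factor"))
```

Because the dry-run branch never touches `subprocess`, tests can assert on the planned command without the target tool being installed.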
Latest local validation:
ruff: All checks passed
pytest: 160 passed
make quickstart: passed with Python 3.11
The suite covers data validation, Qlib command construction and demo orchestration, Qlib config checks, RD-Agent command construction, no-LLM readiness checks, failed-run manifests, leakage checks, factor registry behavior, experiment manifests, leaderboard sorting, reporting, API/dashboard readers, Docker/CI contract files, and the synthetic end-to-end demo.
Real and validated locally:
- Python 3.11 project environment.
- Qlib installation and qrun execution on deterministic synthetic data.
- Qlib/MLflow artifact parsing into project manifests.
- RD-Agent package installation and Docker readiness checks without LLM keys.
Dry-run by default:
- Qlib conversion and qrun commands.
- RD-Agent health and finance workflow commands.
- API/dashboard serving commands.
Synthetic:
- data/sample/prices.csv.
- python -m src.cli demo synthetic.
- python -m src.cli qlib demo --execute.
- Any bundled demo metrics or generated reports.
Requires user setup:
- Real market research requires licensed or user-provided market data.
- Real RD-Agent workflows require user-provided LLM provider credentials.
- Docker must be running for RD-Agent isolated execution.
Important files:
configs/qlib/baseline_lightgbm_alpha158.yaml
configs/qlib/conversion_sample.yaml
.env.example
pyproject.toml
Makefile
Environment variables for RD-Agent are documented in .env.example and
docs/real_execution.md. Do not commit .env.
The optional API and dashboard read stored experiment manifests and leaderboard rows. They do not recompute experiments by default.
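Reading stored artifacts rather than recomputing can be sketched as follows: load each manifest.json under an experiments directory and rank by a chosen metric, with runs that lack the metric listed last instead of being scored as zero. The `metrics` and `experiment_id` keys and the function names are assumptions about the manifest layout.

```python
import json
from pathlib import Path


def load_manifests(root):
    """Read every manifest.json one level below root; skip unreadable files."""
    manifests = []
    for path in sorted(Path(root).glob("*/manifest.json")):
        try:
            manifests.append(json.loads(path.read_text(encoding="utf-8")))
        except (OSError, json.JSONDecodeError):
            continue
    return manifests


def leaderboard(manifests, metric):
    """Rank by metric, descending. Runs missing the metric sort last."""

    def sort_key(manifest):
        value = manifest.get("metrics", {}).get(metric)
        # False sorts before True, so runs with a value come first.
        return (value is None, -(value or 0.0))

    return sorted(manifests, key=sort_key)
```

A call such as `leaderboard(load_manifests("artifacts/experiments"), "net_return")` would then return rows ordered by net return, under the assumed layout.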
python -m src.cli api serve --dry-run
python -m src.cli dashboard run --dry-run
Install optional dependencies with:
python -m pip install -e ".[api,dashboard]"
docker build -t quant-alpha-factory:local .
docker run --rm quant-alpha-factory:local python -m pytest
docker compose run --rm quant-alpha-factory python -m src.cli --help
The default Docker image installs developer, API, and dashboard dependencies. Real Qlib and RD-Agent dependencies remain opt-in to keep the default image practical.
- Synthetic metrics are not investment claims.
- No licensed market dataset is bundled.
- RD-Agent live execution is intentionally gated behind user credentials.
- The dashboard is intentionally lightweight and reads stored artifacts only.
- The transaction cost model is simple and should be expanded for venue-specific assumptions.
- Qlib/MLflow artifact parsing is conservative and should be extended as more real recorder outputs are observed.
- Architecture
- Methodology
- Leakage Policy
- Real Execution Guide
- Recruiter Summary
- Research Memo Template
- Project Walkthrough
- Limitations
For a quick technical review:
make install
make quickstart
python -m src.cli experiments leaderboard --metric net_return
python -m src.cli report build --experiment-id synthetic-demo
For Qlib integration review:
make install-real
make qlib-demo-real
For RD-Agent integration review without keys:
DS_CODER_COSTEER_ENV_TYPE=docker \
python -m src.cli doctor --component rdagent --allow-missing-llm --strict