cross-signal-data

The labeled Polymarket crash-recovery dataset behind a 80.2% win-rate live trading bot.

308 closed trades. Real Polymarket markets. Real entry triggers. Real outcomes. Public for anyone who wants to build their own mean-reversion bot, replicate our results, or prove us wrong.

What's in here

A single CSV (data/crashes_v1.csv) with one row per closed trade on Polymarket where the crash-recovery bot entered. Each row has:

The market (public Polymarket market_id and question text)
The signal (pre_crash_high, entry_price, drop_pct)
The outcome (exit_price, exit_reason, pnl_usd, is_profitable)
Time features (entry_hour_utc, entry_dow, hold_hours)

Stat	Value
Total trades	308
Profitable	247 (80.2%)
Date range	March 2026 – April 2026
Median hold	~3 hours
Avg drop_pct at entry	~22%
Avg recovered_to_pct_of_high	~85%

Exit reason	Count
RECOVERY (price came back)	235
TIMEOUT_48H (held 48h, exited)	62
TIMEOUT (early TIMEOUT exit)	11

Why this exists

Most prediction-market datasets are either:

Synthetic (generated for academic papers, no real money behind them), or
Aggregate (volume, liquidity at hourly resolution — useless for tactical signals)

This is neither. It's the actual labeled examples of a single specific signal — Polymarket markets that crashed N% from a recent high — paired with the actual outcome of trading the recovery. If you want to study whether mean-reversion works on prediction markets, this is the data.

Install

pip install cross-signal-data

Quick use (Python)

from cross_signal_data import load

df = load()
print(df.shape)              # (308, 19)
print(df.columns.tolist())   # full list of fields

# Filter to RECOVERY-only trades
recovered = df[df["exit_reason"] == "RECOVERY"]

# What entry-price bucket has the best win rate?
buckets = df.groupby(df["entry_price"].round(2)).agg(
    n=("trade_id", "count"),
    win_rate=("is_profitable", "mean"),
)
print(buckets)

If you don't have pandas:

from cross_signal_data import load
rows = load(as_pandas=False)  # list of dicts
print(len(rows), rows[0])

Quick use (any language)

The file is plain CSV. Just download it:

curl -o crashes_v1.csv https://raw.githubusercontent.com/LuciferForge/cross-signal-data/main/data/crashes_v1.csv

Schema

See docs/schema.md for full column-by-column documentation.

Key columns:

entry_price — the price-per-share when the bot entered (0–1)
pre_crash_high — the recent local-window high
drop_pct — (pre_crash_high − entry_price) / pre_crash_high × 100
exit_reason — RECOVERY, TIMEOUT_48H, TIMEOUT, or STOP
is_profitable — 1 if pnl_usd > 0 else 0
recovered_to_pct_of_high — exit_price / pre_crash_high × 100

Methodology

See docs/methodology.md for:

How the crash signal is defined
Entry/exit rules
Known biases (survivorship: only triggers that fired are recorded; a different threshold might surface different examples)
What's NOT in the data (slippage cost — see pnl-truthteller for the slippage layer)

Reproducibility

The script that generated this dataset is in scripts/extract.py. Anyone with the source positions.json from the bot can rerun it:

python scripts/extract.py \
    --positions /path/to/positions.json \
    --output data/crashes_v1.csv

Baseline notebook

notebooks/baseline_model.py trains a logistic regression and random forest on the dataset to predict is_profitable.

Result: ~79.9% cross-validated accuracy with simple features — essentially matching the bot's 80.2% WR. Translation: most of the alpha is in the entry trigger itself (which already filters to high-WR setups), not in further feature engineering. If you want to beat this dataset, you almost certainly need features the bot doesn't currently log (orderbook depth, market category, time-to-resolution).

Top feature importances from the random forest:

Feature	Importance
`drop_pct`	0.254
`shares`	0.200
`entry_price`	0.174
`pre_crash_high`	0.171
`entry_hour_utc`	0.110
`entry_dow`	0.059

A clean, exploitable insight from the diurnal column: win rate at hours 16, 21, 22 UTC reaches ~100% (small samples though); hour 8 UTC dips to ~55%. Off-peak hours are punishing. Adjust your live-firing schedule accordingly.

pip install cross-signal-data[ml]
python notebooks/baseline_model.py

Versioning

Version	Date	Trades	Notes
v1	2026-04-28	308	Initial public release

Future versions will add more trades, more features (orderbook depth at entry, market category, time-to-resolution) and possibly per-market metadata. Pin to a specific version if reproducibility matters: load(version="v1").

License

Code: MIT. Use the loader, the extraction script, and the baseline notebook however you want.

Data: MIT. Public on-chain prediction market data, transformed into a labeled dataset. Cite if you use it in research.

Citation

@dataset{cross_signal_data_2026,
    title  = {cross-signal-data: Polymarket crash-recovery labeled dataset},
    author = {LuciferForge},
    year   = {2026},
    url    = {https://github.com/LuciferForge/cross-signal-data}
}

About the author

Built by LuciferForge, running a public-audited Polymarket crash bot (308 closed trades, 80.2% WR, all data here). Also runs:

polymarket-mcp — MCP server for live Polymarket data
pnl-truthteller — slippage audit tool
polymarket-v2-migration — V1→V2 cookbook
protodex.io — public MCP-server index

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
assets		assets
data		data
docs		docs
notebooks		notebooks
scripts		scripts
src/cross_signal_data		src/cross_signal_data
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
HF_UPLOAD_GUIDE.md		HF_UPLOAD_GUIDE.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cross-signal-data

What's in here

Why this exists

Install

Quick use (Python)

Quick use (any language)

Schema

Methodology

Reproducibility

Baseline notebook

Versioning

License

Citation

About the author

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

cross-signal-data

What's in here

Why this exists

Install

Quick use (Python)

Quick use (any language)

Schema

Methodology

Reproducibility

Baseline notebook

Versioning

License

Citation

About the author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages