Lightweight Python research repository for building and comparing Binance spot crypto strategies. The repo now covers:
- data acquisition, cleaning, and return construction
- pair selection and pairs backtesting
- multi-asset trend-following backtesting
- transaction cost modelling with Roll slippage estimates
- report-ready tables and figures for the chosen final strategies
The chosen final strategies are:
- Pairs: `AVAXUSDT/ICPUSDT`
- Trend: 4-asset trend strategy on `BTCUSDT`, `ETHUSDT`, `SOLUSDT`, `BNBUSDT`
The latest cost-adjusted comparison outputs are saved in:
- `data/processed/costs/strategy_comparison.csv`
- `data/processed/costs/roll_slippage_summary.csv`
- `outputs/figures/costs/`
There are two different performance layers in this repo, and they should not be compared as if they are the same quantity:
- The original strategy sweep files, such as `data/processed/trend/strategy/trend_strategy_parameter_sweep.csv`, report the strategy-selection results from the original backtest configuration.
- The strategy cost files, such as `data/processed/costs/strategy_comparison.csv`, report a separate rerun that applies the Roll-model slippage stage and computes both gross and net performance.
Example for the chosen trend strategy:
- In the parameter sweep, `test_cumulative_return = 0.897441`, which is the original net test return under the original sweep setup.
- In the strategy comparison, `gross_return_pct = 107.21%` and `net_return_pct = 57.84%`.
So:
- `89.74%` is the original sweep net test return
- `107.21%` is the strategy-comparison gross test return
- `57.84%` is the strategy-comparison net test return after Roll slippage
These differ because the cost stage reruns the chosen strategies with asset-level Roll slippage costs and reports gross and net performance separately.
The transaction cost stage uses the turnover form:
Cost_t = s * sum_i |theta_t^i - theta_{t-1}^i (1 + r_{t-1}^i)|
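As a minimal sketch of the turnover form above (not the repository's actual implementation), the function below computes the cost for one rebalance. The function name, the dictionary-based inputs, and the use of a single scalar `s` are assumptions made for illustration.

```python
def turnover_cost(theta_prev, theta_new, returns_prev, slippage):
    """Cost_t = s * sum_i |theta_t^i - theta_{t-1}^i * (1 + r_{t-1}^i)|.

    theta_prev / theta_new: notional exposure per asset (hypothetical dict inputs).
    returns_prev: the per-asset return that drifts the previous exposure.
    slippage: the proportional cost s applied to traded notional.
    """
    assets = set(theta_prev) | set(theta_new)
    traded_notional = 0.0
    for asset in assets:
        # Exposure held before trading, after it has drifted with the asset return.
        drifted = theta_prev.get(asset, 0.0) * (1.0 + returns_prev.get(asset, 0.0))
        traded_notional += abs(theta_new.get(asset, 0.0) - drifted)
    return slippage * traded_notional


# Illustrative numbers only: 1000 USDT of BTCUSDT drifts to 1100, target is 1200.
cost = turnover_cost(
    theta_prev={"BTCUSDT": 1000.0},
    theta_new={"BTCUSDT": 1200.0},
    returns_prev={"BTCUSDT": 0.10},
    slippage=0.001,
)
print(f"cost = {cost:.4f} USDT")
```

Measuring the trade against the drifted exposure means that a position which simply moves with the market, without rebalancing, incurs no cost.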
Implementation details:
- `theta` is notional exposure by asset
- unallocated capital stays in `USDT` cash
- profits and losses are reinvested through the live equity state
- exposure remains constrained by the fixed gross exposure cap
- Roll slippage is estimated per asset from the negative first-order autocovariance of price changes
- if the estimated autocovariance is non-negative, the slippage estimate is clipped to zero
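The Roll estimate and its clipping rule can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: it uses the classic Roll form, in which the effective spread is `2 * sqrt(-cov)`, and returns the half-spread `sqrt(-cov)` in price units. The function name and the choice to report the half-spread are hypothetical.

```python
from math import sqrt


def roll_half_spread(prices):
    """Half-spread from the first-order autocovariance of price changes.

    Assumes the classic Roll form: effective spread = 2 * sqrt(-cov).
    Returns 0.0 when the autocovariance is non-negative (the clipping rule).
    """
    changes = [b - a for a, b in zip(prices, prices[1:])]
    if len(changes) < 3:
        return 0.0  # not enough data to form a sample autocovariance
    current, lagged = changes[1:], changes[:-1]
    mean_current = sum(current) / len(current)
    mean_lagged = sum(lagged) / len(lagged)
    autocov = sum(
        (c - mean_current) * (l - mean_lagged) for c, l in zip(current, lagged)
    ) / (len(current) - 1)
    if autocov >= 0:
        return 0.0  # clip: the Roll model is undefined for non-negative autocovariance
    return sqrt(-autocov)


# A price bouncing between two levels has negative autocovariance.
print(roll_half_spread([100, 101, 100, 101, 100, 101]))
# A steadily trending price has non-negative autocovariance, so the estimate is clipped.
print(roll_half_spread([100, 101, 102, 103, 104]))
```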
The methodology note is saved in:
Combined cost comparison:
Combined gross vs net PnL during the test period:
Roll slippage by asset:
Pairs sensitivity:
Trend sensitivity:
- `config/`: YAML configuration for the research pipeline
- `data/raw/binance/`: raw Binance OHLCV parquet files
- `data/raw/risk_free/`: raw risk-free parquet files
- `data/interim/binance/`: cleaned and repaired per-symbol parquet files
- `data/interim/risk_free/`: aligned per-step risk-free parquet files
- `data/processed/binance/`: per-symbol parquet files with returns and excess returns
- `data/processed/pairs/`: pairs selection and chosen-pair backtest outputs
- `data/processed/trend/`: trend parameter sweeps and final backtest outputs
- `data/processed/costs/`: Roll slippage summaries, comparison tables, and cost sensitivity outputs
- `outputs/figures/prices/`: cleaned price charts with repaired bars highlighted
- `outputs/figures/candles/`: recent-window candlestick charts
- `outputs/figures/report/`: stacked multi-symbol report charts
- `outputs/figures/returns/`: return time-series charts
- `outputs/figures/volatility/`: rolling-volatility charts
- `outputs/figures/costs/`: slippage and cost-adjusted performance figures
- `scripts/`: runnable project scripts
- `src/crypto_trader/`: research code for config, data, signals, backtests, analysis, and plots
- `tests/`: unit tests for accounting, signal logic, and slippage estimation
Create a virtual environment and install the package in editable mode:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -e .
After installation, run:
python scripts/run_data_pipeline.py
To regenerate figures only from existing processed parquet files:
python scripts/run_plot_pipeline.py
To plot only one symbol:
python scripts/run_plot_pipeline.py --symbol BTCUSDT
To generate stacked report plots for selected symbols:
python scripts/run_plot_pipeline.py --symbol BTCUSDT --symbol ETHUSDT --symbol BNBUSDT --report
For the dedicated report workflow, with default symbols BTCUSDT, ETHUSDT, and BNBUSDT:
python scripts/run_report_plots.py
To override the default report symbols:
python scripts/run_report_plots.py --symbol BTCUSDT --symbol ETHUSDT --symbol BNBUSDT
The pipeline will:
- download Binance spot OHLCV data for the configured symbols
- reuse an exact cached raw Binance parquet instead of downloading again when `start_date`, `end_date`, and `interval` already match an existing raw file
- save raw Binance parquet files to `data/raw/binance/`
- clean, validate, and repair bad bars before saving cleaned parquet files to `data/interim/binance/`
- download a short-rate proxy, save it to `data/raw/risk_free/`, and align it to the asset timestamps in `data/interim/risk_free/`
- compute simple returns and lagged-risk-free excess returns
- save processed parquet files to `data/processed/binance/`
- save price, candlestick, return, and rolling-volatility plots to `outputs/figures/`
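The cache-reuse step can be pictured as an exact filename match. The sketch below is hypothetical: the helper names are invented for illustration, it assumes the `NAME` part of the `NAME_YYYY-MM-DD_YYYY-MM-DD_interval.parquet` pattern is the symbol, and the dates and interval in the example are made up.

```python
from pathlib import Path


def raw_parquet_name(symbol, start_date, end_date, interval):
    """Build a file name following NAME_YYYY-MM-DD_YYYY-MM-DD_interval.parquet."""
    return f"{symbol}_{start_date}_{end_date}_{interval}.parquet"


def find_cached_raw(raw_dir, symbol, start_date, end_date, interval):
    """Return the cached raw file if an exact match exists, otherwise None."""
    candidate = Path(raw_dir) / raw_parquet_name(symbol, start_date, end_date, interval)
    return candidate if candidate.is_file() else None


# Illustrative request only; the dates and interval are not taken from the repo config.
cached = find_cached_raw("data/raw/binance", "BTCUSDT", "2023-01-01", "2024-01-01", "1h")
if cached is None:
    print("no exact cached file, a download would be needed")
else:
    print(f"reusing {cached}")
```

Because the match is exact, changing any one of the start date, end date, or interval would trigger a fresh download rather than slicing an existing file.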
Pairs research:
python scripts/run_pairs_research.py
Trend strategy research:
python scripts/run_trend_strategy.py
Trend report plots:
python scripts/run_trend_report_plots.py
Pairs report plot for a saved run:
python scripts/run_pairs_report_plot.py --pair-slug avaxusdt_icpusdt --run-key a00a6c7066b7
Cost analysis and comparison outputs:
python scripts/run_cost_analysis.py
This produces:
- Roll slippage summary by asset
- cost-adjusted pair and trend summaries
- slippage sensitivity tables
- combined pair vs trend comparison table
- report-ready cost figures
- Raw, interim, and processed parquet files use the naming pattern `NAME_YYYY-MM-DD_YYYY-MM-DD_interval.parquet`.
- Figure files use the matching pattern `NAME_YYYY-MM-DD_YYYY-MM-DD_interval_plot-name.png`.
- The default plotting style is applied project-wide through Matplotlib `rcParams` and can still be overridden per figure when needed.
- Report plots use a consistent per-symbol color mapping across stacked and overlaid comparisons.
- Report figures default to single-column LaTeX sizing and can be adjusted through `plotting.report_single_column_width`, `plotting.report_stack_panel_height`, and `plotting.report_overlay_height`.
- Raw parquet files contain standardized Binance spot OHLCV data with UTC timestamps and columns: `timestamp`, `open`, `high`, `low`, `close`, `volume`, `symbol`.
- Interim Binance parquet files contain cleaned and repaired OHLCV data after sorting, deduplication, missing-bar handling, OHLC validation, and simple outlier repair, plus a readable `datetime_utc` column.
- Interim risk-free parquet files contain the aligned per-step rate, its lagged `rf_{t-1}` version, and the underlying annualized source fields used to form excess returns.
- Processed Binance parquet files contain the cleaned OHLCV data plus `simple_return`, the lagged `risk_free_rate`, and `excess_return`.
- Chosen pair backtests include bar-level PnL, turnover, transaction cost, and equity paths.
- Final trend backtests include per-asset positions, turnover, trade IDs, transaction costs, and equity paths.
- Cost outputs include gross and net PnL series, cumulative transaction costs, and sensitivity tables at multiple slippage multipliers.
Excess returns are formed as simple_return_t - rf_{t-1} so that the risk-free rate used at timestamp t is information available before the return over period t is realised.
At mid to high frequency, the per-bar risk-free accrual is usually extremely small relative to crypto return volatility, so it is often ignored in practical intraday research. It is still included here for completeness and methodological correctness, especially when comparing results across sampling frequencies.
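To make the lag convention concrete, here is a minimal sketch that forms `simple_return_t - rf_{t-1}` from plain Python lists. The function name and the list inputs are assumptions for illustration, and the prices and rates are made-up values, not repository data.

```python
def excess_returns(closes, rf_per_step):
    """Excess return at t = simple_return_t - rf_{t-1}.

    closes: close prices, one per bar.
    rf_per_step: per-bar risk-free rate aligned to the same timestamps.
    The first bar has no prior close, so the output starts at t = 1.
    """
    out = []
    for t in range(1, len(closes)):
        simple_return = closes[t] / closes[t - 1] - 1.0
        # Use the rate known at the previous timestamp, i.e. before period t is realised.
        out.append(simple_return - rf_per_step[t - 1])
    return out


# Illustrative values only.
closes = [100.0, 110.0, 121.0]
rf_per_step = [0.001, 0.002, 0.003]
print(excess_returns(closes, rf_per_step))
```

Here the rate at the final timestamp is never used, because no return has yet been realised over the period that follows it.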
The current combined comparison file is:
Current headline values:
| Strategy | Gross PnL (USDT) | Net PnL (USDT) | Gross Return | Net Return | Total Transaction Cost (USDT) |
|---|---|---|---|---|---|
| Pairs (AVAX/ICP) | 3816.84 | 2573.95 | 38.17% | 25.74% | 1242.89 |
| Trend (BTC, ETH, BNB, SOL) | 10720.61 | 5784.21 | 107.21% | 57.84% | 4936.40 |
Those values come from the rerun with Roll-model slippage and therefore differ from the original sweep files that were used to choose the strategies.
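The headline figures can be cross-checked with simple arithmetic: net PnL should equal gross PnL minus total transaction cost. The sketch below does this check with the numbers from the table. The initial capital of 10,000 USDT is an assumption inferred from the ratio of PnL to return; it is not stated in this README.

```python
# Headline values copied from the table above (USDT).
rows = {
    "Pairs": {"gross_pnl": 3816.84, "net_pnl": 2573.95, "cost": 1242.89},
    "Trend": {"gross_pnl": 10720.61, "net_pnl": 5784.21, "cost": 4936.40},
}

# Assumption: inferred from PnL / return, not documented in the README.
ASSUMED_INITIAL_CAPITAL = 10_000.0


def reconciliation_gap(row):
    """Difference between reported net PnL and gross PnL minus cost."""
    return row["net_pnl"] - (row["gross_pnl"] - row["cost"])


def implied_return_pct(pnl, capital=ASSUMED_INITIAL_CAPITAL):
    """Return in percent implied by a PnL figure and the assumed starting capital."""
    return 100.0 * pnl / capital


for name, row in rows.items():
    gap = reconciliation_gap(row)
    net_pct = implied_return_pct(row["net_pnl"])
    print(f"{name}: gap = {gap:.2f} USDT, implied net return = {net_pct:.2f}%")
```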




