Rah9742/Crypto-Stat-Arb

Crypto Trading Research Repo

Lightweight Python research repository for building and comparing Binance spot crypto strategies. The repo now covers:

  • data acquisition, cleaning, and return construction
  • pair selection and pairs backtesting
  • multi-asset trend-following backtesting
  • transaction cost modelling with Roll slippage estimates
  • report-ready tables and figures for the chosen final strategies

Current Final Strategies

  • Pairs: AVAXUSDT/ICPUSDT
  • Trend: 4-asset trend strategy on BTCUSDT, ETHUSDT, SOLUSDT, BNBUSDT

The latest cost-adjusted comparison outputs are saved in:

Key Result Note

There are two different performance layers in this repo, the original parameter sweep and the cost-adjusted strategy comparison, and they should not be compared as if they were the same quantity.

Example for the chosen trend strategy:

  • In the parameter sweep, test_cumulative_return = 0.897441, which is the original net test return under the original sweep setup.
  • In the strategy comparison, gross_return_pct = 107.21% and net_return_pct = 57.84%.

So:

  • 89.74% is the original sweep net test return
  • 107.21% is the strategy-comparison gross test return
  • 57.84% is the strategy-comparison net test return after Roll slippage

These differ because the strategy comparison stage reruns the chosen strategies with asset-level Roll slippage costs and reports gross and net performance separately.

Cost Logic

The transaction cost stage uses the turnover form:

Cost_t = s * sum_i |theta_t^i - theta_{t-1}^i (1 + r_{t-1}^i)|
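
As a minimal sketch of the turnover form above (not the repository's actual implementation), the cost at one step can be computed with NumPy. The function name `turnover_cost` and its arguments are hypothetical. Passing `s` as a per-asset array, so that each asset carries its own slippage rate inside the sum, is an assumption based on the asset-level Roll estimates described in this README.

```python
import numpy as np


def turnover_cost(theta_prev, theta_now, r_prev, s):
    """Cost_t = s * sum_i |theta_t^i - theta_{t-1}^i * (1 + r_{t-1}^i)|.

    theta_prev, theta_now: notional exposure per asset at t-1 and t.
    r_prev: per-asset simple return applied to the previous exposure.
    s: slippage rate, a scalar or (assumption) one value per asset.
    """
    theta_prev = np.asarray(theta_prev, dtype=float)
    theta_now = np.asarray(theta_now, dtype=float)
    r_prev = np.asarray(r_prev, dtype=float)
    # Previous exposure after it has drifted with the asset return.
    drifted = theta_prev * (1.0 + r_prev)
    # Only the traded difference between target and drifted exposure is charged.
    return float(np.sum(s * np.abs(theta_now - drifted)))


# Illustrative numbers only: rebalance a long/short book back to +/-5000 USDT.
cost = turnover_cost([5000.0, -5000.0], [5000.0, -5000.0], [0.02, -0.01], 0.001)
```

In this example the exposures drift to 5100 and -4950, so restoring the targets trades 150 USDT of notional in total and the charge is that turnover times `s`.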

Implementation details:

  • theta is notional exposure by asset
  • unallocated capital stays in USDT cash
  • profits and losses are reinvested through the live equity state
  • exposure remains constrained by the fixed gross exposure cap
  • Roll slippage is estimated per asset from the negative first-order autocovariance of price changes
  • if the estimated autocovariance is non-negative, the slippage estimate is clipped to zero
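
The last two bullets can be illustrated with a short sketch of a Roll-style estimator. It assumes the classic Roll form, in which the half-spread is the square root of the negated first-order autocovariance of price changes. Whether the repository works in price units, log prices, or a fraction of price, and whether it uses the full or half spread, is not stated here, so the scaling below is an assumption and `roll_half_spread` is a hypothetical name.

```python
import numpy as np


def roll_half_spread(prices):
    """Roll-style half-spread estimate in price units (0.0 when clipped)."""
    dp = np.diff(np.asarray(prices, dtype=float))  # price changes
    if dp.size < 3:
        return 0.0  # too few changes to estimate an autocovariance
    # First-order autocovariance between consecutive price changes.
    autocov = np.cov(dp[1:], dp[:-1])[0, 1]
    if autocov >= 0:
        return 0.0  # non-negative autocovariance: clip the estimate to zero
    return float(np.sqrt(-autocov))


# Synthetic bid-ask bounce: constant mid price of 100 with trades landing
# randomly 0.05 above or below it (illustrative data, not repository data).
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=20_000)
estimate = roll_half_spread(100.0 + 0.05 * signs)

# A steadily trending series has positive autocovariance and is clipped.
clipped = roll_half_spread([100, 101, 103, 106, 110, 115])
```

On the synthetic bounce series the estimate should land close to the simulated half-spread of 0.05, while the trending series returns zero through the clipping rule.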

The methodology note is saved in:

Report Figures

  • [Figure] Combined cost comparison
  • [Figure] Combined gross vs net PnL during the test period
  • [Figure] Roll slippage by asset
  • [Figure] Pairs sensitivity
  • [Figure] Trend sensitivity

Structure

  • config/: YAML configuration for the research pipeline
  • data/raw/binance/: raw Binance OHLCV parquet files
  • data/raw/risk_free/: raw risk-free parquet files
  • data/interim/binance/: cleaned and repaired per-symbol parquet files
  • data/interim/risk_free/: aligned per-step risk-free parquet files
  • data/processed/binance/: per-symbol parquet files with returns and excess returns
  • data/processed/pairs/: pairs selection and chosen-pair backtest outputs
  • data/processed/trend/: trend parameter sweeps and final backtest outputs
  • data/processed/costs/: Roll slippage summaries, comparison tables, and cost sensitivity outputs
  • outputs/figures/prices/: cleaned price charts with repaired bars highlighted
  • outputs/figures/candles/: recent-window candlestick charts
  • outputs/figures/report/: stacked multi-symbol report charts
  • outputs/figures/returns/: return time-series charts
  • outputs/figures/volatility/: rolling-volatility charts
  • outputs/figures/costs/: slippage and cost-adjusted performance figures
  • scripts/: runnable project scripts
  • src/crypto_trader/: research code for config, data, signals, backtests, analysis, and plots
  • tests/: unit tests for accounting, signal logic, and slippage estimation

Setup

Create a virtual environment and install the package in editable mode:

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -e .

Run The Data Pipeline

After installation, run:

python scripts/run_data_pipeline.py

To regenerate figures only from existing processed parquet files:

python scripts/run_plot_pipeline.py

To plot only one symbol:

python scripts/run_plot_pipeline.py --symbol BTCUSDT

To generate stacked report plots for selected symbols:

python scripts/run_plot_pipeline.py --symbol BTCUSDT --symbol ETHUSDT --symbol BNBUSDT --report

For the dedicated report workflow, with default symbols BTCUSDT, ETHUSDT, and BNBUSDT:

python scripts/run_report_plots.py

To override the default report symbols:

python scripts/run_report_plots.py --symbol BTCUSDT --symbol ETHUSDT --symbol BNBUSDT
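
The repeated `--symbol` flag and the `--report` switch used in the commands above map naturally onto `argparse`. The sketch below is an assumption about how such a command line could be parsed, not a copy of the repository's scripts. `build_parser`, `resolve_symbols`, and `DEFAULT_REPORT_SYMBOLS` are hypothetical names.

```python
import argparse

# Defaults quoted in this README for the report workflow.
DEFAULT_REPORT_SYMBOLS = ["BTCUSDT", "ETHUSDT", "BNBUSDT"]


def build_parser():
    parser = argparse.ArgumentParser(description="Plot pipeline (illustrative).")
    # action="append" collects every --symbol occurrence into one list.
    parser.add_argument("--symbol", action="append", dest="symbols", default=None,
                        help="Symbol to plot; repeat the flag for several symbols.")
    parser.add_argument("--report", action="store_true",
                        help="Generate stacked report plots.")
    return parser


def resolve_symbols(args, defaults):
    """Use the symbols given on the command line, else fall back to defaults."""
    return list(args.symbols) if args.symbols else list(defaults)


args = build_parser().parse_args(
    ["--symbol", "BTCUSDT", "--symbol", "ETHUSDT", "--report"]
)
selected = resolve_symbols(args, DEFAULT_REPORT_SYMBOLS)
```

Leaving the default as `None` rather than a list avoids the `argparse` pitfall where appended values are added on top of a list default instead of replacing it.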

The pipeline will:

  • download Binance spot OHLCV data for the configured symbols
  • reuse an exact cached raw Binance parquet instead of downloading again when start_date, end_date, and interval already match an existing raw file
  • save raw Binance parquet files to data/raw/binance/
  • clean, validate, and repair bad bars before saving cleaned parquet files to data/interim/binance/
  • download a short-rate proxy, save it to data/raw/risk_free/, and align it to the asset timestamps in data/interim/risk_free/
  • compute simple returns and lagged-risk-free excess returns
  • save processed parquet files to data/processed/binance/
  • save price, candlestick, return, and rolling-volatility plots to outputs/figures/
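
The cleaning step in the list above can be sketched with pandas. This is a simplified illustration under stated assumptions: it covers sorting, deduplication, and one OHLC consistency repair only, it omits missing-bar handling and outlier repair, and the `is_repaired` flag column is a hypothetical name. The repository's actual validation and repair rules may differ.

```python
import numpy as np
import pandas as pd


def clean_ohlcv(df):
    """Sort by timestamp, drop duplicate bars, and repair inconsistent high/low."""
    out = (
        df.sort_values("timestamp", kind="stable")  # stable keeps arrival order
        .drop_duplicates(subset="timestamp", keep="last")
        .reset_index(drop=True)
    )
    body_high = out[["open", "close"]].max(axis=1)
    body_low = out[["open", "close"]].min(axis=1)
    # A valid bar has high >= max(open, close) and low <= min(open, close).
    out["is_repaired"] = (out["high"] < body_high) | (out["low"] > body_low)
    # Repair by widening high/low just enough to enclose the candle body.
    out["high"] = np.maximum(out["high"], body_high)
    out["low"] = np.minimum(out["low"], body_low)
    return out


# Illustrative bars: one duplicated timestamp and one bar whose high < close.
raw = pd.DataFrame({
    "timestamp": [2, 1, 1],
    "open": [10.0, 9.0, 9.0],
    "high": [10.5, 9.5, 9.6],
    "low": [9.8, 8.9, 8.8],
    "close": [11.0, 9.2, 9.3],
    "volume": [1.0, 1.0, 1.0],
})
cleaned = clean_ohlcv(raw)
```

Keeping a flag for repaired bars makes it possible to highlight them later, as the price charts in outputs/figures/prices/ are described as doing.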

Run Strategy Research

Pairs research:

python scripts/run_pairs_research.py

Trend strategy research:

python scripts/run_trend_strategy.py

Trend report plots:

python scripts/run_trend_report_plots.py

Pairs report plot for a saved run:

python scripts/run_pairs_report_plot.py --pair-slug avaxusdt_icpusdt --run-key a00a6c7066b7

Cost analysis and comparison outputs:

python scripts/run_cost_analysis.py

This produces:

  • Roll slippage summary by asset
  • cost-adjusted pair and trend summaries
  • slippage sensitivity tables
  • combined pair vs trend comparison table
  • report-ready cost figures

Data Outputs

  • Raw, interim, and processed parquet files use the naming pattern NAME_YYYY-MM-DD_YYYY-MM-DD_interval.parquet.
  • Figure files use the matching pattern NAME_YYYY-MM-DD_YYYY-MM-DD_interval_plot-name.png.
  • The default plotting style is applied project-wide through Matplotlib rcParams and can still be overridden per figure when needed.
  • Report plots use a consistent per-symbol color mapping across stacked and overlaid comparisons.
  • Report figures default to single-column LaTeX sizing and can be adjusted through plotting.report_single_column_width, plotting.report_stack_panel_height, and plotting.report_overlay_height.
  • Raw parquet files contain standardized Binance spot OHLCV data with UTC timestamps and columns: timestamp, open, high, low, close, volume, symbol.
  • Interim Binance parquet files contain cleaned and repaired OHLCV data after sorting, deduplication, missing-bar handling, OHLC validation, and simple outlier repair, plus a readable datetime_utc column.
  • Interim risk-free parquet files contain the aligned per-step rate, its lagged rf_{t-1} version, and the underlying annualized source fields used to form excess returns.
  • Processed Binance parquet files contain the cleaned OHLCV data plus simple_return, the lagged risk_free_rate, and excess_return.
  • Chosen pair backtests include bar-level PnL, turnover, transaction cost, and equity paths.
  • Final trend backtests include per-asset positions, turnover, trade IDs, transaction costs, and equity paths.
  • Cost outputs include gross and net PnL series, cumulative transaction costs, and sensitivity tables at multiple slippage multipliers.
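
The two naming patterns in the first bullets above can be sketched as small helpers. The function names are hypothetical, and the dates and interval in the example are illustrative rather than taken from the repository's configuration.

```python
from datetime import date


def parquet_name(name, start, end, interval):
    """NAME_YYYY-MM-DD_YYYY-MM-DD_interval.parquet"""
    return f"{name}_{start.isoformat()}_{end.isoformat()}_{interval}.parquet"


def figure_name(name, start, end, interval, plot_name):
    """NAME_YYYY-MM-DD_YYYY-MM-DD_interval_plot-name.png"""
    return f"{name}_{start.isoformat()}_{end.isoformat()}_{interval}_{plot_name}.png"


# Illustrative values only.
data_file = parquet_name("BTCUSDT", date(2024, 1, 1), date(2024, 6, 30), "1h")
plot_file = figure_name("BTCUSDT", date(2024, 1, 1), date(2024, 6, 30), "1h", "returns")
```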

Excess returns are formed as simple_return_t - rf_{t-1} so that the risk-free rate used at timestamp t is information available before the return over period t is realised.
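
A minimal pandas sketch of this construction follows. The column names simple_return, risk_free_rate, and excess_return match those listed for the processed files, while the function name `add_excess_returns` and the assumption that the per-step rate is already aligned to the asset index are illustrative.

```python
import pandas as pd


def add_excess_returns(df, rf_per_step):
    """Add simple_return, lagged risk_free_rate, and excess_return columns.

    rf_per_step: per-bar risk-free rate on the same index as df (assumption).
    """
    out = df.copy()
    out["simple_return"] = out["close"].pct_change()
    # shift(1) gives rf_{t-1}: the rate already known before bar t's return.
    out["risk_free_rate"] = rf_per_step.shift(1)
    out["excess_return"] = out["simple_return"] - out["risk_free_rate"]
    return out


# Illustrative numbers: two 10% bars against small per-step rates.
bars = pd.DataFrame({"close": [100.0, 110.0, 121.0]})
rf = pd.Series([0.001, 0.002, 0.003])
result = add_excess_returns(bars, rf)
```

The first row is NaN in all three new columns because neither a return nor a lagged rate exists for the first bar.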

Risk-Free Rate Note

At mid to high frequency, the per-bar risk-free accrual is usually extremely small relative to crypto return volatility, so it is often ignored in practical intraday research. It is still included here for completeness and methodological correctness, especially when comparing results across sampling frequencies.

Latest Comparison

The current combined comparison file is:

Current headline values:

| Strategy | Gross PnL (USDT) | Net PnL (USDT) | Gross Return | Net Return | Total Transaction Cost (USDT) |
|---|---|---|---|---|---|
| Pairs (AVAX/ICP) | 3816.84 | 2573.95 | 38.17% | 25.74% | 1242.89 |
| Trend (BTC, ETH, BNB, SOL) | 10720.61 | 5784.21 | 107.21% | 57.84% | 4936.40 |

Those values come from the rerun with Roll-model slippage and therefore differ from the original sweep files that were used to choose the strategies.
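
As a quick arithmetic check on the headline values, the short script below confirms that net PnL equals gross PnL minus total transaction cost for both strategies. It also backs out the starting capital implied by the reported returns, which comes to roughly 10,000 USDT. That capital figure is an inference from the numbers above, not something stated in this README.

```python
# Headline values copied from the comparison table above.
rows = {
    "Pairs (AVAX/ICP)": {
        "gross": 3816.84, "net": 2573.95, "cost": 1242.89, "gross_ret": 0.3817,
    },
    "Trend (BTC, ETH, BNB, SOL)": {
        "gross": 10720.61, "net": 5784.21, "cost": 4936.40, "gross_ret": 1.0721,
    },
}

# Residual of the identity: net = gross - cost (should be zero to the cent).
residuals = {
    name: round(r["gross"] - r["cost"] - r["net"], 2) for name, r in rows.items()
}

# Starting capital implied by gross PnL / gross return (inferred, not stated).
implied_capital = {name: r["gross"] / r["gross_ret"] for name, r in rows.items()}
```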

About

Systematic crypto trading research using Binance spot data, focused on cointegration and trend-following strategies with robust data cleaning, excess return construction, and cost-aware evaluation.
