Skip to content

fix(validation): restore LEAN case-study benchmark#26

Open
stefan-jansen wants to merge 3 commits into
mainfrom
fix/benchmark-zero-cost-reset
Open

fix(validation): restore LEAN case-study benchmark#26
stefan-jansen wants to merge 3 commits into
mainfrom
fix/benchmark-zero-cost-reset

Conversation

@stefan-jansen

Copy link
Copy Markdown
Contributor

Summary

  • restore the LEAN case-study validation harness
  • clear benchmark cost fields when zero-cost mode is enabled
  • report decoded LEAN fills as trade counts rather than round trips

Validation

  • pre-commit run --all-files
  • uv run pytest tests/benchmark/test_case_study_lean.py -q
  • uv run pytest tests/ -q

stefan-jansen and others added 3 commits June 15, 2026 16:53
The benchmark's no-cost path set commission_type/slippage_type to NONE and
zeroed only the rate, leaving a preset's per-share/minimum commission intact
(e.g. zipline_strict carries per_share=0.005, minimum=1.0). Since b16 the
broker auto-activates a cost model whenever any direct field is non-zero, so
those residual fields silently re-enabled a $1-minimum-per-fill commission —
$2 per round-trip, ~$453K over the 250x20yr parity run. That dropped
ml4t[zipline_strict] to $266,548 vs the reference zipline's $720,033 (a 63%
gap) while backtrader, whose preset carries no per-share commission, stayed at
exact parity.

Clear every direct commission/slippage field (per_share, per_trade, minimum,
fixed, spread, spread_by_asset) in the disable paths so an explicit zero-cost
request cannot be reactivated by leftover preset fields. Restores zipline
parity to $720,044 vs $720,033 ($11, 0.0016%), matching the validated README
figure; backtrader unchanged at $746,797.21.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Restore the case-study LEAN parity harness lost in the validation reorg (the
notebook's validation.adapters.* / validation.weights imports were never
committed anywhere). Reconstructed in committed form so it can't vanish again.

case_study_lean.py runs ml4t-backtest[lean] on real case-study target weights
and compares against actual LEAN on the sorted daily fill multiset plus terminal
value. Prices are decoded from each chapter16 LEAN workspace's own daily zips so
both engines consume byte-identical inputs; the strategy mirrors the workspace
main.py exactly, including LEAN's fill-forward (a name dropped from the universe
is sized off its last close and liquidated at the next real bar, until delisting).

Verified exact parity on all three daily case studies vs the committed LEAN runs:
  etfs               $1,119,459.54 == LEAN, 10,701 fills, multiset match
  sp500_eqopt        $991,275.92   == LEAN,  1,848 fills, multiset match
  us_equities_panel  $1,228,617.38 == LEAN, 35,103 fills, multiset match

test_case_study_lean.py asserts this (no Docker; LEAN side is the committed
reference). Marked no_invariant_check: the generic equity-decomposition invariant
does not model 2x margin, but parity is asserted against LEAN's terminal value.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rips

benchmark_lean reported LEAN's tradeStatistics.totalNumberOfTrades, which
counts closed round-trip trades, while every other engine in the suite
reports individual fills via len(trades_df). On the canonical 250x20yr
benchmark this made LEAN appear to have 226,723 trades vs ml4t's 428,459
fills — an apples-to-oranges gap — when the decoded LEAN fill log in fact
contains exactly 428,459 fills, matching ml4t-backtest[lean] at the fill
level (terminal-value delta $1.57 / 0.0218 bps).

Report the decoded fill count (len of filled order_events) as num_trades
so the cross-engine parity surface is consistent. load_lean_artifacts is
unchanged (its round-trip default is still a valid general loader); the
substitution is local to benchmark_lean. test_lean_adapter passes.

Underpins third_edition nb15 (15_lean_engine_parity) genuine parity.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 21, 2026 17:13

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Restores and hardens the LEAN validation/benchmarking surface so cross-engine parity comparisons remain consistent, and adds a LEAN case-study regression test + harness code under ml4t.backtest._validation.

Changes:

  • Add helpers to fully disable commission/slippage fields in validation/benchmark_suite.py to prevent cost-model auto-activation from preset leftovers.
  • Adjust LEAN benchmark reporting to treat decoded fills as “trade count” for parity with other engines.
  • Add a LEAN case-study parity harness module and a regression test that compares fill multisets + terminal portfolio value.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
validation/benchmark_suite.py Centralizes “zero-cost mode” resets and adjusts LEAN trade-count reporting for cross-engine parity.
tests/benchmark/test_case_study_lean.py Adds a regression test intended to validate ml4t’s LEAN reconstruction against committed LEAN artifacts.
src/ml4t/backtest/_validation/case_study_lean.py Implements the case-study harness: workspace parsing, LEAN zip decoding, ml4t replay, and parity comparison utilities.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +2118 to +2119
if trades_df is not None and not trades_df.empty:
num_trades = int(len(trades_df))
Comment on lines +32 to +35
workspace_dir = WORKSPACE / project
if not (workspace_dir / "ml4t_order_events.csv").exists():
pytest.skip(f"LEAN reference artifacts missing for {project}")

Comment on lines +8 to +12
* **LEAN side** -- run the workspace project through actual LEAN (Docker) via
:func:`run_lean_backtest`; the algorithm logs its fills to ``ml4t_order_events.csv``
and equity to ``ml4t_daily_equity.csv``.
* **ml4t side** -- decode the workspace's own daily zips (so both engines consume
byte-identical prices), then replay the identical target-weight strategy through
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants