fix(validation): restore LEAN case-study benchmark#26
Open
stefan-jansen wants to merge 3 commits into
Open
Conversation
The benchmark's no-cost path set commission_type/slippage_type to NONE and zeroed only the rate, leaving a preset's per-share/minimum commission intact (e.g. zipline_strict carries per_share=0.005, minimum=1.0). Since b16 the broker auto-activates a cost model whenever any direct field is non-zero, so those residual fields silently re-enabled a $1-minimum-per-fill commission — $2 per round-trip, ~$453K over the 250x20yr parity run. That dropped ml4t[zipline_strict] to $266,548 vs the reference zipline's $720,033 (a 63% gap) while backtrader, whose preset carries no per-share commission, stayed at exact parity. Clear every direct commission/slippage field (per_share, per_trade, minimum, fixed, spread, spread_by_asset) in the disable paths so an explicit zero-cost request cannot be reactivated by leftover preset fields. Restores zipline parity to $720,044 vs $720,033 ($11, 0.0016%), matching the validated README figure; backtrader unchanged at $746,797.21. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Restore the case-study LEAN parity harness lost in the validation reorg (the notebook's validation.adapters.* / validation.weights imports were never committed anywhere). Reconstructed in committed form so it can't vanish again. case_study_lean.py runs ml4t-backtest[lean] on real case-study target weights and compares against actual LEAN on the sorted daily fill multiset plus terminal value. Prices are decoded from each chapter16 LEAN workspace's own daily zips so both engines consume byte-identical inputs; the strategy mirrors the workspace main.py exactly, including LEAN's fill-forward (a name dropped from the universe is sized off its last close and liquidated at the next real bar, until delisting). Verified exact parity on all three daily case studies vs the committed LEAN runs: etfs $1,119,459.54 == LEAN, 10,701 fills, multiset match sp500_eqopt $991,275.92 == LEAN, 1,848 fills, multiset match us_equities_panel $1,228,617.38 == LEAN, 35,103 fills, multiset match test_case_study_lean.py asserts this (no Docker; LEAN side is the committed reference). Marked no_invariant_check: the generic equity-decomposition invariant does not model 2x margin, but parity is asserted against LEAN's terminal value. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rips benchmark_lean reported LEAN's tradeStatistics.totalNumberOfTrades, which counts closed round-trip trades, while every other engine in the suite reports individual fills via len(trades_df). On the canonical 250x20yr benchmark this made LEAN appear to have 226,723 trades vs ml4t's 428,459 fills — an apples-to-oranges gap — when the decoded LEAN fill log in fact contains exactly 428,459 fills, matching ml4t-backtest[lean] at the fill level (terminal-value delta $1.57 / 0.0218 bps). Report the decoded fill count (len of filled order_events) as num_trades so the cross-engine parity surface is consistent. load_lean_artifacts is unchanged (its round-trip default is still a valid general loader); the substitution is local to benchmark_lean. test_lean_adapter passes. Underpins third_edition nb15 (15_lean_engine_parity) genuine parity. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Restores and hardens the LEAN validation/benchmarking surface so cross-engine parity comparisons remain consistent, and adds a LEAN case-study regression test + harness code under ml4t.backtest._validation.
Changes:
- Add helpers to fully disable commission/slippage fields in
validation/benchmark_suite.pyto prevent cost-model auto-activation from preset leftovers. - Adjust LEAN benchmark reporting to treat decoded fills as “trade count” for parity with other engines.
- Add a LEAN case-study parity harness module and a regression test that compares fill multisets + terminal portfolio value.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
validation/benchmark_suite.py |
Centralizes “zero-cost mode” resets and adjusts LEAN trade-count reporting for cross-engine parity. |
tests/benchmark/test_case_study_lean.py |
Adds a regression test intended to validate ml4t’s LEAN reconstruction against committed LEAN artifacts. |
src/ml4t/backtest/_validation/case_study_lean.py |
Implements the case-study harness: workspace parsing, LEAN zip decoding, ml4t replay, and parity comparison utilities. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+2118
to
+2119
| if trades_df is not None and not trades_df.empty: | ||
| num_trades = int(len(trades_df)) |
Comment on lines
+32
to
+35
| workspace_dir = WORKSPACE / project | ||
| if not (workspace_dir / "ml4t_order_events.csv").exists(): | ||
| pytest.skip(f"LEAN reference artifacts missing for {project}") | ||
|
|
Comment on lines
+8
to
+12
| * **LEAN side** -- run the workspace project through actual LEAN (Docker) via | ||
| :func:`run_lean_backtest`; the algorithm logs its fills to ``ml4t_order_events.csv`` | ||
| and equity to ``ml4t_daily_equity.csv``. | ||
| * **ml4t side** -- decode the workspace's own daily zips (so both engines consume | ||
| byte-identical prices), then replay the identical target-weight strategy through |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Validation