fix(validation): restore LEAN case-study benchmark by stefan-jansen · Pull Request #26 · ml4t/backtest

stefan-jansen · 2026-06-21T17:13:52Z

Summary

restore the LEAN case-study validation harness
clear benchmark cost fields when zero-cost mode is enabled
report decoded LEAN fills as trade counts rather than round trips

Validation

pre-commit run --all-files
uv run pytest tests/benchmark/test_case_study_lean.py -q
uv run pytest tests/ -q

The benchmark's no-cost path set commission_type/slippage_type to NONE and zeroed only the rate, leaving a preset's per-share/minimum commission intact (e.g. zipline_strict carries per_share=0.005, minimum=1.0). Since b16 the broker auto-activates a cost model whenever any direct field is non-zero, so those residual fields silently re-enabled a $1-minimum-per-fill commission — $2 per round-trip, ~$453K over the 250x20yr parity run. That dropped ml4t[zipline_strict] to $266,548 vs the reference zipline's $720,033 (a 63% gap) while backtrader, whose preset carries no per-share commission, stayed at exact parity. Clear every direct commission/slippage field (per_share, per_trade, minimum, fixed, spread, spread_by_asset) in the disable paths so an explicit zero-cost request cannot be reactivated by leftover preset fields. Restores zipline parity to $720,044 vs $720,033 ($11, 0.0016%), matching the validated README figure; backtrader unchanged at $746,797.21. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Restore the case-study LEAN parity harness lost in the validation reorg (the notebook's validation.adapters.* / validation.weights imports were never committed anywhere). Reconstructed in committed form so it can't vanish again. case_study_lean.py runs ml4t-backtest[lean] on real case-study target weights and compares against actual LEAN on the sorted daily fill multiset plus terminal value. Prices are decoded from each chapter16 LEAN workspace's own daily zips so both engines consume byte-identical inputs; the strategy mirrors the workspace main.py exactly, including LEAN's fill-forward (a name dropped from the universe is sized off its last close and liquidated at the next real bar, until delisting). Verified exact parity on all three daily case studies vs the committed LEAN runs: etfs $1,119,459.54 == LEAN, 10,701 fills, multiset match sp500_eqopt $991,275.92 == LEAN, 1,848 fills, multiset match us_equities_panel $1,228,617.38 == LEAN, 35,103 fills, multiset match test_case_study_lean.py asserts this (no Docker; LEAN side is the committed reference). Marked no_invariant_check: the generic equity-decomposition invariant does not model 2x margin, but parity is asserted against LEAN's terminal value. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…rips benchmark_lean reported LEAN's tradeStatistics.totalNumberOfTrades, which counts closed round-trip trades, while every other engine in the suite reports individual fills via len(trades_df). On the canonical 250x20yr benchmark this made LEAN appear to have 226,723 trades vs ml4t's 428,459 fills — an apples-to-oranges gap — when the decoded LEAN fill log in fact contains exactly 428,459 fills, matching ml4t-backtest[lean] at the fill level (terminal-value delta $1.57 / 0.0218 bps). Report the decoded fill count (len of filled order_events) as num_trades so the cross-engine parity surface is consistent. load_lean_artifacts is unchanged (its round-trip default is still a valid general loader); the substitution is local to benchmark_lean. test_lean_adapter passes. Underpins third_edition nb15 (15_lean_engine_parity) genuine parity. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot

Pull request overview

Restores and hardens the LEAN validation/benchmarking surface so cross-engine parity comparisons remain consistent, and adds a LEAN case-study regression test + harness code under ml4t.backtest._validation.

Changes:

Add helpers to fully disable commission/slippage fields in validation/benchmark_suite.py to prevent cost-model auto-activation from preset leftovers.
Adjust LEAN benchmark reporting to treat decoded fills as “trade count” for parity with other engines.
Add a LEAN case-study parity harness module and a regression test that compares fill multisets + terminal portfolio value.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`validation/benchmark_suite.py`	Centralizes “zero-cost mode” resets and adjusts LEAN trade-count reporting for cross-engine parity.
`tests/benchmark/test_case_study_lean.py`	Adds a regression test intended to validate ml4t’s LEAN reconstruction against committed LEAN artifacts.
`src/ml4t/backtest/_validation/case_study_lean.py`	Implements the case-study harness: workspace parsing, LEAN zip decoding, ml4t replay, and parity comparison utilities.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    if trades_df is not None and not trades_df.empty:
+        num_trades = int(len(trades_df))


+    workspace_dir = WORKSPACE / project
+    if not (workspace_dir / "ml4t_order_events.csv").exists():
+        pytest.skip(f"LEAN reference artifacts missing for {project}")
+


+* **LEAN side** -- run the workspace project through actual LEAN (Docker) via
+  :func:`run_lean_backtest`; the algorithm logs its fills to ``ml4t_order_events.csv``
+  and equity to ``ml4t_daily_equity.csv``.
+* **ml4t side** -- decode the workspace's own daily zips (so both engines consume
+  byte-identical prices), then replay the identical target-weight strategy through


stefan-jansen and others added 3 commits June 15, 2026 16:53

Copilot AI review requested due to automatic review settings June 21, 2026 17:13

Copilot started reviewing on behalf of stefan-jansen June 21, 2026 17:14 View session

Copilot AI reviewed Jun 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(validation): restore LEAN case-study benchmark#26

fix(validation): restore LEAN case-study benchmark#26
stefan-jansen wants to merge 3 commits into
mainfrom
fix/benchmark-zero-cost-reset

stefan-jansen commented Jun 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if trades_df is not None and not trades_df.empty:
		num_trades = int(len(trades_df))

Conversation

stefan-jansen commented Jun 21, 2026

Summary

Validation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants