TODO (human): why we're doing this, in your words.
Note
Drafted by AI (Claude) from our design discussion; decisions are the maintainers'.
Goal
Split the local-dev analytics (plotting, cross-version sweep, the memray engine, snapshot/compare, the CLI) out of linopy into a standalone, reusable benchmarking package (benchkit, name TBD), so that:
linopy stays lean — only the benchmark content (specs, phases, pytest drivers, conftest) + the CodSpeed workflows. (Maintainer ask: "ship CI first, CLI later".)
the local CLI survives as pip install benchkit rather than being dropped.
the tool is reusable on other repos (cross-repo interest from review).
Status: foundation done (#34, draft)
Two structural changes have already landed on the foundation branch, neither of which changes behaviour:
Decoupled the CI baseline from analytics — the CodSpeed baseline imports zero of {memory, sweep, plotting, snapshot, bench, cli}; the dependency arrow only points analytics → core.
One measurements() contract — phases own phase_cases(phase) -> Iterable[PhaseCase] (each a setup→action→teardown context manager); the pytest drivers and the memray engine consume the same source. Node ids byte-identical; the id-alignment band-aid test is gone.
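As a rough illustration of that shared contract, the sketch below shows one `phase_cases` generator feeding both a pytest-style id list and an engine-style runner. Only `phase_cases` and `PhaseCase` are names from the description; the `node_id` and `scope` fields, the `build[n=10]` id format, and the two consumer functions are hypothetical stand-ins, not the code in #34.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, ContextManager, Iterable, Iterator


@dataclass(frozen=True)
class PhaseCase:
    """One measurable unit: a node id plus a setup -> action -> teardown scope."""
    node_id: str                                             # hypothetical field name
    scope: Callable[[], ContextManager[Callable[[], object]]]  # hypothetical field name


def phase_cases(phase: str) -> Iterable[PhaseCase]:
    """Single source of cases; both consumers below iterate over this."""
    for n in (10, 100):  # hypothetical size axis
        @contextmanager
        def _scope(n: int = n) -> Iterator[Callable[[], object]]:
            data = list(range(n))        # setup, outside the measured call
            try:
                yield lambda: sum(data)  # action, the only part a driver measures
            finally:
                data.clear()             # teardown
        yield PhaseCase(node_id=f"{phase}[n={n}]", scope=_scope)


def pytest_style_ids(phase: str) -> list[str]:
    """What a pytest driver could pass as parametrize ids."""
    return [case.node_id for case in phase_cases(phase)]


def engine_run(phase: str) -> dict[str, object]:
    """What a memory engine could do: enter the scope, call only the action."""
    results: dict[str, object] = {}
    for case in phase_cases(phase):
        with case.scope() as action:
            results[case.node_id] = action()
    return results
```

Because both consumers read the same generator, their ids cannot drift apart, which is why a separate id-alignment test becomes unnecessary.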
Target shape — two repos, one CLI
benchkit/                       (new repo: generic, no linopy import)
  core:   Case, measure_peak, run_case, snapshot, plot, compare, sweep, cli     ── knows only Cases
  suite:  Subject(build=…), operation(node=…), iter_params, skip_reason, tiers  ── optional convention → Cases
linopy/benchmarks/              (stays in linopy: the "content", thin)
  models/ patterns/       Subject(build=…) via benchkit.suite          ← linopy-specific
  phases.py               @operation(...) wrappers around to_file/…    ← linopy verbs
  test_*.py + conftest    pytest drivers (parametrize benchkit cases)  ← stays for CodSpeed
  __main__.py             wires linopy's registry into benchkit.cli
  depends on benchkit
python -m benchmarks … stays the command; linopy's __main__ registers its content into benchkit, then hands off to benchkit's CLI. Same UX, zero tooling code in linopy.
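A minimal sketch of that hand-off, assuming the generic CLI needs nothing more than a callable that yields cases. `generic_cli`, `content_cases`, the `list`/`run` subcommands and the `build[n=10]` id are all hypothetical; benchkit's real CLI surface is not defined in this description.

```python
import argparse
from typing import Callable, Iterable, Sequence

CasePair = tuple[str, Callable[[], object]]  # (id, measured action)


def generic_cli(cases: Callable[[], Iterable[CasePair]], argv: Sequence[str]) -> list[str]:
    """Stand-in for the tool-side CLI: it sees only (id, run) pairs."""
    parser = argparse.ArgumentParser(prog="python -m benchmarks")
    parser.add_argument("command", choices=["list", "run"])  # hypothetical subcommands
    args = parser.parse_args(argv)
    lines = []
    for case_id, run in cases():
        lines.append(case_id if args.command == "list" else f"{case_id}: {run()}")
    return lines


def content_cases() -> Iterable[CasePair]:
    """Stand-in for the repo-side content (models, phases)."""
    yield "build[n=10]", lambda: sum(range(10))


def main(argv: Sequence[str]) -> list[str]:
    """What a thin __main__.py could reduce to: register content, then hand off."""
    return generic_cli(content_cases, argv)
```

Under this shape the content repo holds no argument parsing or reporting code, only the function that yields its cases.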
Primitives (the design question)
Primitives live in different layers — measure/analyze are generic; generate is the plugin.
The atom is Case, not phase. phase bundles three linopy-specific things (an operation, a pytest node prefix, applicability) and assumes a "pipeline of phases" that many repos don't have. Keep it in linopy's layer.
dims is the primitive that makes analysis generic. Today structure is smuggled into the id and recovered by regex (<spec>-<axis>=<value>). Promote it to an explicit field: plots facet/scale by any dim, no id-parsing; the id stays opaque (CodSpeed's key), dims is separate metadata in the local snapshot.
Required plugin contract is just cases() -> Iterable[Case].
subject / operation / scale are convention, not core — the "subjects × operations × scale-dial" shape (build-once helper, registry, quick/full tiers) is useful but linopy-shaped, so it lives in benchkit.suite (opt-in), not core.
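To make the Case and dims points concrete, here is a small sketch under stated assumptions: the `Case` fields follow the `Case{id, dims, run}` shape from the open questions, while the spec names, the `n` axis, `dims_from_id` and `facet_by` are hypothetical helpers written for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Iterable, Mapping


@dataclass(frozen=True)
class Case:
    id: str                                                    # opaque key, never parsed
    run: Callable[[], object]                                  # the measured action
    dims: Mapping[str, object] = field(default_factory=dict)   # explicit metadata


def cases() -> Iterable[Case]:
    """The whole plugin contract: yield Cases."""
    for spec in ("knapsack", "transport"):  # hypothetical spec names
        for n in (10, 100):                 # hypothetical scale axis
            yield Case(
                id=f"{spec}-n={n}",
                run=lambda n=n: sum(range(n)),
                dims={"spec": spec, "n": n},
            )


def dims_from_id(case_id: str) -> dict[str, str]:
    """The id-parsing approach: recover <spec>-<axis>=<value> with a regex."""
    m = re.fullmatch(r"(?P<spec>[^-]+)-(?P<axis>[^=]+)=(?P<value>.+)", case_id)
    if m is None:
        raise ValueError(f"id does not follow the naming scheme: {case_id!r}")
    return {"spec": m["spec"], m["axis"]: m["value"]}


def facet_by(all_cases: Iterable[Case], dim: str) -> dict[object, list[str]]:
    """Group case ids by any dim without looking inside the id."""
    groups: dict[object, list[str]] = defaultdict(list)
    for case in all_cases:
        groups[case.dims[dim]].append(case.id)
    return dict(groups)
```

Note that the regex route returns every value as a string (`"10"` rather than `10`) and fails outright on any id that departs from the scheme, whereas `dims` keeps types and leaves the id format free to change.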
Injection seams (already isolated by #34)
Phases register into benchkit (@operation(node=…)) instead of the engine importing linopy's phases.py.
The sweep's install spec is a parameter (install_spec=lambda v: f"linopy=={v}") instead of a literal.
Everything else (snapshot, plotting, compare, measure_peak, CLI bodies, the spec dataclasses) is already domain-agnostic — pure move.
CodSpeed unaffected
test_*.py + conftest stay in linopy/benchmarks/, so a linopy PR still runs pytest benchmarks/ --quick --codspeed against the dev linopy — ids and baselines unchanged. benchkit is a dev dependency.
Open questions
Case{id, dims, run} as the atom — promote dims to first-class instead of id-parsing? (lean: yes)
Core-knows-only-Cases with subject/operation as opt-in sugar, or bake the subject/operation registry into core?
Selection (quick/full tiers): core (predicate over a numeric dim) or suite concern? (lean: suite)
Name (TBD): benchkit / linopy-bench / xbench / …?
Implementation sequence
benchkit locally (sibling package): move the domain-agnostic files verbatim.
The injection seams (operation registration; install_spec).
benchmarks/ pointed at it: prove that python -m benchmarks and the node-id diff still hold.