Split local benchmark analytics into a standalone generic package (benchkit) #35

@FBumann

Description

TODO (human): why we're doing this, in your words.

Note

Drafted by AI (Claude) from our design discussion; decisions are the maintainers'.

Goal

Split the local-dev analytics (plotting, cross-version sweep, the memray engine, snapshot/compare, the CLI) out of linopy into a standalone, reusable benchmarking package (benchkit, name TBD), so that:

  • linopy stays lean — only the benchmark content (specs, phases, pytest drivers, conftest) + the CodSpeed workflows. (Maintainer ask: "ship CI first, CLI later".)
  • the local CLI survives as pip install benchkit rather than being dropped.
  • the tool is reusable on other repos (cross-repo interest from review).

Status — foundation done (#34, draft)

Two structural changes have already landed on the foundation branch; neither changes behaviour:

  • Decoupled the CI baseline from analytics — the CodSpeed baseline imports zero of {memory, sweep, plotting, snapshot, bench, cli}; the dependency arrow only points analytics → core.
  • One measurements() contract — phases own phase_cases(phase) -> Iterable[PhaseCase] (each a setup→action→teardown context manager); the pytest drivers and the memray engine consume the same source. Node ids byte-identical; the id-alignment band-aid test is gone.
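A minimal sketch of the single-source contract, for illustration only. The names `phase_cases` and `PhaseCase` come from this issue; the fields of `PhaseCase`, the `_write_case` helper, the `consume` function and the `Action` alias are hypothetical stand-ins and are not the code in #34.

```python
from collections.abc import Callable, Iterable, Iterator
from contextlib import AbstractContextManager, contextmanager
from dataclasses import dataclass

Action = Callable[[], object]  # assumption: the measured step is a zero-arg callable


@dataclass(frozen=True)
class PhaseCase:
    # Hypothetical fields; the issue only states that each case is a
    # setup -> action -> teardown context manager.
    node_id: str
    run: Callable[[], AbstractContextManager[Action]]


log: list[str] = []  # records event order for this demo


@contextmanager
def _write_case(size: int) -> Iterator[Action]:
    log.append(f"setup {size}")         # untimed: build the inputs
    data = list(range(size))
    try:
        yield lambda: sum(data)         # the only part a consumer measures
    finally:
        log.append(f"teardown {size}")  # untimed: clean up


def phase_cases(phase: str) -> Iterable[PhaseCase]:
    # One source of cases; every consumer iterates this same generator.
    for size in (10, 100):
        yield PhaseCase(f"{phase}[n={size}]", lambda s=size: _write_case(s))


def consume(phase: str) -> dict[str, object]:
    # Stand-in for either consumer (a pytest driver or the memory engine).
    results = {}
    for case in phase_cases(phase):
        with case.run() as action:
            results[case.node_id] = action()
    return results
```

Because both consumers would iterate the same generator, the ids cannot drift apart, which is presumably why a separate id-alignment test is no longer needed.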

Target shape — two repos, one CLI

benchkit/                 (new repo — generic, no linopy import)
  core:   Case, measure_peak, run_case, snapshot, plot, compare, sweep, cli   ── knows only Cases
  suite:  Subject(build=…), operation(node=…), iter_params, skip_reason, tiers ── optional convention → Cases

linopy/benchmarks/        (stays in linopy — the "content", thin)
  models/ patterns/   Subject(build=…) via benchkit.suite          ← linopy-specific
  phases.py           @operation(...) wrappers around to_file/…     ← linopy verbs
  test_*.py + conftest pytest drivers (parametrize benchkit cases)  ← stays for CodSpeed
  __main__.py         wires linopy's registry into benchkit.cli
  depends on benchkit

python -m benchmarks … stays the command; linopy's __main__ registers its content into benchkit, then hands off to benchkit's CLI. Same UX, zero tooling code in linopy.
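A rough sketch of that hand-off, under stated assumptions: benchkit does not exist yet, so `benchkit_main` below is a hypothetical stand-in for its CLI entry point, and the `list` subcommand, the `-k` filter and the case ids are invented for the example.

```python
import argparse
from collections.abc import Callable, Iterable


def benchkit_main(argv: list[str], cases: Callable[[], Iterable[str]]) -> list[str]:
    # Hypothetical stand-in for benchkit.cli: generic, receives the content
    # as a parameter and never imports linopy.
    parser = argparse.ArgumentParser(prog="python -m benchmarks")
    parser.add_argument("command", choices=["list"])
    parser.add_argument("-k", dest="keyword", default="")
    args = parser.parse_args(argv)
    return [case_id for case_id in cases() if args.keyword in case_id]


def linopy_cases() -> Iterable[str]:
    # Stand-in for linopy's registry; the ids are made up.
    yield "to_file[basic-n=10]"
    yield "solve[basic-n=10]"


def main(argv: list[str]) -> list[str]:
    # What linopy/benchmarks/__main__.py would reduce to:
    # supply the content, then hand off to the generic CLI.
    return benchkit_main(argv, linopy_cases)
```

With this shape, `main(["list", "-k", "solve"])` returns only the ids containing `solve`, and nothing in `benchkit_main` depends on linopy.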

Primitives (the design question)

Primitives live in different layers — measure/analyze are generic; generate is the plugin.

  • The atom is Case, not phase. phase bundles three linopy-specific things (an operation, a pytest node prefix, applicability) and assumes a "pipeline of phases" many repos don't have. Keep it in linopy's layer.

    class Case:
        id:   str                                   # stable key (== CodSpeed node id)
        dims: Mapping[str, str | int]               # structured axes for analysis
        run:  Callable[[], ContextManager[Action]]  # setup (untimed) → measured action → teardown
  • dims is the primitive that makes analysis generic. Today structure is smuggled into the id and recovered by regex (<spec>-<axis>=<value>). Promote it to an explicit field: plots facet/scale by any dim, no id-parsing; the id stays opaque (CodSpeed's key), dims is separate metadata in the local snapshot.

  • Required plugin contract is just cases() -> Iterable[Case].

  • subject / operation / scale are convention, not core — the "subjects × operations × scale-dial" shape (build-once helper, registry, quick/full tiers) is useful but linopy-shaped, so it lives in benchkit.suite (opt-in), not core.

Injection seams (already isolated by #34)

  1. Phases register into benchkit (@operation(node=…)) instead of the engine importing linopy's phases.py.
  2. The sweep's install spec is a parameter (install_spec=lambda v: f"linopy=={v}") instead of a literal.

Everything else (snapshot, plotting, compare, measure_peak, CLI bodies, the spec dataclasses) is already domain-agnostic — pure move.

CodSpeed unaffected

test_*.py + conftest stay in linopy/benchmarks/, so a linopy PR still runs pytest benchmarks/ --quick --codspeed against the dev linopy — ids and baselines unchanged. benchkit is a dev dependency.

Open questions

  1. Case{id, dims, run} as the atom — promote dims to first-class instead of id-parsing? (lean: yes)
  2. Core-knows-only-Cases with subject/operation as opt-in sugar, or bake the subject/operation registry into core?
  3. Selection (quick/full tiers): core (predicate over a numeric dim) or suite concern? (lean: suite)
  4. Name: benchkit / linopy-bench / xbench / …?
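For question 3, "a predicate over a numeric dim" might look like the sketch below, wherever it ends up living. The `at_most` helper, the `TIERS` mapping, the dim name `n` and the threshold of 100 are all assumptions for illustration; only the quick/full tier names come from this issue.

```python
from collections.abc import Callable, Iterable, Mapping

Dims = Mapping[str, object]
Predicate = Callable[[Dims], bool]


def at_most(dim: str, limit: int) -> Predicate:
    def keep(dims: Dims) -> bool:
        value = dims.get(dim)
        # Cases without this dim (or with a non-numeric value) are kept.
        return not isinstance(value, int) or value <= limit
    return keep


# Hypothetical tier table: "quick" caps the scale dial, "full" keeps everything.
TIERS: dict[str, Predicate] = {
    "quick": at_most("n", 100),
    "full": lambda dims: True,
}


def select(all_dims: Iterable[Dims], tier: str) -> list[Dims]:
    keep = TIERS[tier]
    return [dims for dims in all_dims if keep(dims)]
```

This only works because `dims` carries typed values; with id-parsing, the tier logic would have to know the id format.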

Implementation sequence

  1. Land #34, "refactor(benchmarks): decouple CI baseline from analytics + unify measurements() (split prep)" (foundation).
  2. Scaffold benchkit locally (sibling package); move the domain-agnostic files verbatim.
  3. Add the two seams (operation registration; install_spec).
  4. Repoint linopy's benchmarks/ at it; prove python -m benchmarks + the node-id diff still hold.
  5. Publish the repo; trim PyPSA/linopy#771 ("bench: Add internal performance benchmark suite + CodSpeed CI") to the CI baseline (deletion last, reversible).
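The node-id check in step 4 can be as small as a set difference. In this sketch the two id lists are assumed to have been collected before and after the repoint (for example from `pytest --collect-only -q` output); the `node_id_diff` helper and the sample ids are hypothetical.

```python
from collections.abc import Iterable


def node_id_diff(before: Iterable[str], after: Iterable[str]) -> dict[str, list[str]]:
    # Ids are compared as exact strings, so any renaming shows up
    # as one "missing" entry plus one "added" entry.
    before_set, after_set = set(before), set(after)
    return {
        "missing": sorted(before_set - after_set),
        "added": sorted(after_set - before_set),
    }


def ids_unchanged(before: Iterable[str], after: Iterable[str]) -> bool:
    diff = node_id_diff(before, after)
    return not diff["missing"] and not diff["added"]
```

An empty diff means the CodSpeed keys are untouched, which is the property the repoint has to preserve.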
