Found dogfooding the plotting against linopy's (lean) pytest-benchmark suite — out of the box the dims aren't usable, for two separate reasons.
Context
linopy parametrizes its benchmarks over a spec object and an int, one test function per phase:
@pytest.mark.parametrize(("spec", "n"), params, ids=[spec_param_id(...) for ...])
def test_to_lp(benchmark, spec, n, tmp_path): ...
A real --benchmark-json then contains:
fullname: benchmarks/test_to_lp.py::test_to_lp[basic-n=10]
params: {'spec': "UNSERIALIZABLE[BenchSpec('basic', axis='n', sweep=(10, 250))]", 'n': 10}
extra_info: {}
Since _dims reads only params + scalar extra_info (no id parsing, which is correct by design), the dims come out as {spec: "UNSERIALIZABLE[…]", n: 10}: spec is garbage, and there is no phase dim at all, because the phase exists only as the test function name. A plot_scaling then collapses every phase into one indistinguishable blob. Two concerns:
1. UNSERIALIZABLE[...] param values poison dims
pytest-benchmark stringifies non-JSON params as "UNSERIALIZABLE[<repr>]". As a dim that's worse than useless — it's a long, unreadable, per-object-identical string. Proposal: in _dims, detect values matching UNSERIALIZABLE[...] and drop them (a param that couldn't be serialized isn't a usable analysis axis). Cheap, defensive, clearly right.
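A minimal sketch of the filter, to make the proposal concrete. The helper name drop_unserializable is hypothetical and the real _dims internals may look different; the only thing taken from the issue is the "UNSERIALIZABLE[...]" string format seen in the JSON above.

```python
import re
from typing import Any

# Marker format observed in the --benchmark-json output for non-JSON params.
_UNSERIALIZABLE = re.compile(r"^UNSERIALIZABLE\[.*\]$", re.DOTALL)


def drop_unserializable(params: dict[str, Any]) -> dict[str, Any]:
    """Return params without values that could not be serialized.

    Hypothetical helper: only string values that match the marker as a
    whole are dropped; every other value passes through unchanged.
    """
    return {
        key: value
        for key, value in params.items()
        if not (isinstance(value, str) and _UNSERIALIZABLE.match(value))
    }


params = {
    "spec": "UNSERIALIZABLE[BenchSpec('basic', axis='n', sweep=(10, 250))]",
    "n": 10,
}
print(drop_unserializable(params))  # {'n': 10}
```

Matching the whole string (rather than a prefix check) keeps a legitimate string param that merely starts with "UNSERIALIZABLE" from being dropped by accident.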
2. The test function / module isn't available as a dim
The single most important grouping axis for a multi-operation suite — which operation (build vs to_lp vs to_solver) — lives in the node id's function name, which the dims model deliberately never parses. So there's no way to facet/scale per phase from a single JSON without the producer manually mirroring the function name into extra_info.
Proposal: expose the node's function (and/or module) as a built-in dim — e.g. func / module — derived from fullname. That's not "id parsing for structure"; it's surfacing pytest's own grouping that every suite already has. With it, phase comes for free.
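As an illustration of how little parsing this needs, here is a sketch that splits fullname into module and function. The helper name node_dims and the dim names module / func are assumptions following the proposal, not existing API; it assumes fullname has the node id shape shown in the JSON above.

```python
def node_dims(fullname: str) -> dict[str, str]:
    """Derive module and function dims from a pytest-style node id.

    Hypothetical helper: it never looks inside the parametrization id,
    it only cuts it off.
    """
    # Drop the "[...]" parametrization id first, since it may contain "::".
    base, _, _ = fullname.partition("[")
    module, _, rest = base.partition("::")
    # If the test lives in a class, the function is the last "::" segment.
    func = rest.rsplit("::", 1)[-1]
    return {"module": module, "func": func}


print(node_dims("benchmarks/test_to_lp.py::test_to_lp[basic-n=10]"))
# {'module': 'benchmarks/test_to_lp.py', 'func': 'test_to_lp'}
```

With a func dim like this, the build / to_lp / to_solver split falls out of the function names without the producer writing anything into extra_info.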
Notes
… extra_info (linopy will do this), but these make pytest-benchmem usable out of the box for any pytest-benchmark suite, which is the point.
🤖 Issue drafted by Claude from the linopy integration.