Unified multi-metric memory model: metrics, tagging, selection, comparability #38

## Why this exists

Several open issues independently propose pieces of the *same* design: more than one
memory metric, a way to record which one produced a number, a way to select it, and a guard
against comparing incomparable ones. Drafted at different times (mostly pre-#32), they overlap
and in one case **conflict**: #9 and #34 each add an "rss" number, with different engines
(psutil polling vs kernel `ru_maxrss`), different JSON tags (`peak_backend` vs `mode`), and
different comparability guards. Left separate, they would ship two different "rss" numbers
under one label.

This is the design hub. It owns the shared model; the linked issues become engines/consumers
that plug into it.

## The model

### Metrics (the *what*)

| metric | meaning | unit | precision |
|---|---|---|---|
| `heap` | allocator demand (memray) | bytes | byte-exact |
| `rss` | resident-page high-water | bytes | page/THP-quantized |
| `allocated` | total bytes allocated / count (churn) — see #23 | bytes / count | byte-exact |

These are **different quantities** and must never share a plot axis. `heap` vs `rss` is
allocator-bytes vs resident-pages; see the accuracy caveats in #34.

### Engines (the *how*, platform-dispatched)

A metric can have more than one engine; the engine is a property of the *environment*, not the
test. The result records the **concrete** engine used.

- `heap` → memray (Linux/macOS), in-process or isolated (see #20).
- `rss` → `ru_maxrss` + `fork` on Linux/macOS (#34, default, accurate); **psutil sampling on
  Windows only** (#9, coarse fallback — no `fork`/`ru_maxrss` there). #9's "misses spikes"
  caveat is honest Windows small print, not a rival design.
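
The dispatch described above could be sketched as a small resolver. This is a minimal illustration, not the plugin's code: the function name `resolve_engine` and the engine strings (`"memray"`, `"ru_maxrss"`, `"psutil"`) are assumptions chosen for readability.

```python
import sys


def resolve_engine(mode: str, platform: str = sys.platform) -> str:
    """Return the concrete engine for a metric on a platform (hypothetical names)."""
    is_windows = platform.startswith("win")
    if mode == "heap":
        # memray targets Linux and macOS only.
        if is_windows:
            raise RuntimeError("heap needs memray, which is not available on Windows")
        return "memray"
    if mode == "rss":
        # Windows has no fork/ru_maxrss, so it falls back to psutil sampling.
        return "psutil" if is_windows else "ru_maxrss"
    raise ValueError(f"unknown mode: {mode!r}")
```

Taking `platform` as a parameter keeps the environment decision out of the test itself and makes the mapping easy to check on any machine.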

### One tag (resolves the #9/#34 collision)

A single field in `extra_info["benchmem"]` records what produced the number — `mode` (the
metric) plus the concrete `engine`. Supersedes #9's separate `peak_backend`/`--peakbench-backend`
scheme; everything funnels through one tag.
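
As a rough illustration of the single tag, the record might look like the sketch below. The exact layout inside `extra_info["benchmem"]` is an assumption here (a flat dict with `mode`, `engine`, and `peak_bytes`), and `BenchmemTag` and `tag_result` are hypothetical helpers.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BenchmemTag:
    """What produced the number: the metric plus the concrete engine (hypothetical type)."""
    mode: str    # e.g. "heap" or "rss"
    engine: str  # e.g. "memray", "ru_maxrss", "psutil"


def tag_result(extra_info: dict, tag: BenchmemTag, peak_bytes: int) -> None:
    # Assumed layout: one dict under "benchmem" holding both the value and its tag.
    extra_info["benchmem"] = {"peak_bytes": peak_bytes, **asdict(tag)}


extra_info = {}
tag_result(extra_info, BenchmemTag(mode="rss", engine="ru_maxrss"), peak_bytes=52_428_800)
```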

### One selection surface

- CLI: `--benchmark-memory[=heap|rss]` (optional-value), `--benchmark-memory-repeats=N` (#19).
- Marker: `@pytest.mark.benchmem(mode=..., repeats=...)`.
- Fixture: `benchmark_memory(fn, mode=...)`.
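
An optional-value flag like `--benchmark-memory[=heap|rss]` maps onto argparse's `nargs="?"` with a `const`. The sketch below uses plain `argparse` so it runs standalone; treating a bare `--benchmark-memory` as `heap` and defaulting repeats to 1 are assumptions, not decisions recorded in this issue.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--benchmark-memory",
        nargs="?",               # value is optional
        const="heap",            # assumed: bare flag selects heap
        default=None,            # flag absent: memory measurement off
        choices=["heap", "rss"],
    )
    parser.add_argument("--benchmark-memory-repeats", type=int, default=1)
    return parser


args = build_parser().parse_args(["--benchmark-memory=rss", "--benchmark-memory-repeats=5"])
```

One caveat with `nargs="?"`: written as `--benchmark-memory tests/`, argparse tries to consume `tests/` as the flag's value and rejects it, so the `=` form is the safe spelling when positional arguments follow.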

> **Scope (Step-2 decision):** no OOM handling — no `limit=`/cgroup/`RLIMIT_AS`, no `killed`
> field. If a benchmarked action dies or raises under measurement, it **fails like any other
> test**. OOM-survival testing is out of scope.

### One comparability guard

`memory_from_pytest_benchmark` / `load_long_df` carry `mode`+`engine` as facets;
`compare`/`plot` **refuse to stack rows of differing `mode`** (and warn on differing engine).
Supersedes both #9's mixed-backend guard and #34's co-plot refusal.
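
The guard's behavior (refuse on differing `mode`, warn on differing `engine`) can be sketched over plain dict rows. `check_comparable` and `IncomparableModesError` are hypothetical names; the real `compare`/`plot` code may operate on a DataFrame instead.

```python
import warnings


class IncomparableModesError(ValueError):
    """Raised when rows measuring different metrics would share one comparison."""


def check_comparable(rows: list[dict]) -> None:
    modes = {row["mode"] for row in rows}
    if len(modes) > 1:
        # Different metrics are different quantities: refuse outright.
        raise IncomparableModesError(f"refusing to stack modes: {sorted(modes)}")
    engines = {row["engine"] for row in rows}
    if len(engines) > 1:
        # Same metric, different engine: comparable, but precision may differ.
        warnings.warn(f"mixed engines for one mode: {sorted(engines)}", stacklevel=2)
```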

### Shared infrastructure

- **Subprocess + baseline:** `fork` + `os.wait4` (per-child), `gc.freeze()` before fork.
  Baseline is a **forked no-op child's `ru_maxrss`**, NOT the parent's current RSS: that
  over-subtracts inherited copy-on-write (COW) pages (proven in #34's Step-2 validation).
  Used by `rss` (#34) and by isolated-`heap` (#20).
- **Denoising:** `min`-of-N repeats (#19), with the spread across repeats surfaced alongside
  the minimum.
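
The two bullets above can be sketched together for POSIX systems. This is an illustration under stated assumptions, not #34's implementation: `child_peak_rss` and `measure_rss` are hypothetical names, calling `gc.unfreeze()` in the parent afterwards is an assumption, and taking a fresh baseline on each repeat is one possible choice.

```python
import gc
import os
import sys

# ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
_RU_MAXRSS_UNIT = 1 if sys.platform == "darwin" else 1024


def child_peak_rss(fn) -> int:
    """Run fn in a forked child; return that child's ru_maxrss in bytes."""
    gc.freeze()  # park existing objects so GC in the child leaves them alone
    try:
        pid = os.fork()
        if pid == 0:
            # Child: run the workload, then exit at once, skipping the
            # parent's atexit handlers and buffered-output flushes.
            status = 1
            try:
                fn()
                status = 0
            finally:
                os._exit(status)
        # Parent: wait4 returns resource usage for this specific child only.
        _, wait_status, rusage = os.wait4(pid, 0)
    finally:
        gc.unfreeze()  # assumption: the parent resumes normal GC afterwards
    if os.waitstatus_to_exitcode(wait_status) != 0:
        raise RuntimeError("measured child exited abnormally")
    return rusage.ru_maxrss * _RU_MAXRSS_UNIT


def measure_rss(fn, repeats: int = 3) -> dict:
    """Baseline-corrected peak RSS: min over repeats, with the spread reported."""
    samples = []
    for _ in range(repeats):
        baseline = child_peak_rss(lambda: None)  # forked no-op child
        samples.append(max(0, child_peak_rss(fn) - baseline))
    return {"peak_bytes": min(samples), "spread_bytes": max(samples) - min(samples)}
```

A workload such as `lambda: b"x" * (64 * 1024 * 1024)` writes every page it allocates, so it shows up in `ru_maxrss`; a zero-filled `bytearray` of the same size may not, since untouched pages are not resident.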

## Children

- #34 — `rss` engine (posix `ru_maxrss` + fork + baseline). Reference design for the shared infra.
- #9 — Windows engine for `rss` (psutil fallback). Rescoped: drop rival tagging, adopt this model.
- #20 — isolated `heap` mode, reusing the shared subprocess/baseline infra.
- #19 — `repeats=N` denoising knob (shared across metrics).
- #23 — `allocated`/count metric + `--metric allocated`.
- #26 — `--metric both` (time + memory together) in compare/plot.

## Note on naming

#9/#19/#20/#23/#26 predate #32 and reference stale names (`peakbench`, `peak_mib`,
`measure_peak`, `--peakbench-backend`). Refresh to current (`pytest_benchmem`, `peak_bytes`,
`measure_memory`, `extra_info["benchmem"]`) when each is picked up.

