
Unified multi-metric memory model: metrics, tagging, selection, comparability #38

@FBumann


Why this exists

Several open issues independently propose pieces of the same design: more than one
memory metric, a way to record which one produced a number, a way to select it, and a guard
against comparing incomparable ones. Drafted at different times (mostly pre-#32), they overlap
and in one case conflict: #9 and #34 each add an "rss" number, with different engines
(psutil polling vs kernel ru_maxrss), different JSON tags (peak_backend vs mode), and
different comparability guards. Left separate, they would ship two different "rss" numbers
under one label.

This is the design hub. It owns the shared model; the linked issues become engines/consumers
that plug into it.

The model

Metrics (the what)

| metric | meaning | unit | precision |
|---|---|---|---|
| heap | allocator demand (memray) | bytes | byte-exact |
| rss | resident-page high-water | bytes | page/THP-quantized |
| allocated | total bytes allocated / count (churn), see #23 | bytes / count | byte-exact |

These are different quantities and must never share a plot axis: heap counts allocator
bytes, while rss counts resident pages. See the accuracy caveats in #34.
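To make the table concrete, here is a minimal sketch of how the three metrics might be declared in code, plus a helper showing why rss is quantized: resident memory is counted in whole pages, so a byte count rounds up to the page size. `MetricSpec`, `METRICS`, and `round_up_to_pages` are hypothetical names, not an existing pytest_benchmem API. The page size defaults to `mmap.PAGESIZE`; transparent huge pages (THP) would coarsen the quantum further.

```python
import mmap
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    """One row of the metrics table (hypothetical structure)."""
    name: str
    meaning: str
    unit: str
    precision: str


# Hypothetical registry mirroring the metrics table.
METRICS = {
    spec.name: spec
    for spec in (
        MetricSpec("heap", "allocator demand (memray)", "bytes", "byte-exact"),
        MetricSpec("rss", "resident-page high-water", "bytes", "page/THP-quantized"),
        MetricSpec("allocated", "total bytes allocated / count (churn)", "bytes / count", "byte-exact"),
    )
}


def round_up_to_pages(n_bytes: int, page_size: int = mmap.PAGESIZE) -> int:
    """Round a byte count up to a whole number of pages, the granularity rss moves in."""
    if n_bytes < 0 or page_size <= 0:
        raise ValueError("n_bytes must be >= 0 and page_size must be > 0")
    # Ceiling division via negated floor division.
    return -(-n_bytes // page_size) * page_size
```

For example, with 4 KiB pages `round_up_to_pages(5000, 4096)` gives 8192, and with a 2 MiB THP quantum a single touched byte already rounds to 2097152. This illustrates only the granularity; real rss also includes allocator overhead and everything else resident in the process.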

Engines (the how, platform-dispatched)

A metric can have more than one engine; the engine is a property of the environment, not the
test. The result records the concrete engine used.
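A minimal sketch of platform dispatch for the rss metric, assuming the kernel ru_maxrss engine (as in #34) on Linux and macOS and a psutil-polling engine (as in #9) elsewhere. The dispatch rule and the names `select_rss_engine` and `peak_rss` are hypothetical; only `resource.getrusage` is a real standard-library call. getrusage reports ru_maxrss in kilobytes on Linux but in bytes on macOS, and with `RUSAGE_SELF` the value is the high-water mark over the whole process lifetime, not over a single benchmarked action.

```python
import sys


def select_rss_engine(platform: str = sys.platform) -> str:
    """Choose an rss engine from the environment (hypothetical dispatch rule)."""
    if platform.startswith("linux") or platform == "darwin":
        return "ru_maxrss"  # kernel high-water mark via getrusage(2)
    return "psutil"  # assumed polling fallback, e.g. on Windows


def peak_rss() -> tuple[int, str]:
    """Return (peak RSS in bytes, concrete engine name) for the current process."""
    engine = select_rss_engine()
    if engine != "ru_maxrss":
        raise NotImplementedError(f"engine {engine!r} is not sketched here")
    import resource  # POSIX-only module, so import it only on this path

    raw = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    peak_bytes = raw if sys.platform == "darwin" else raw * 1024
    return peak_bytes, engine
```

Returning the engine alongside the value is what lets the result record the concrete engine rather than only the metric.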

One tag (resolves the #9/#34 collision)

A single field, extra_info["benchmem"], records what produced the number: mode (the
metric) plus the concrete engine. This supersedes #9's separate peak_backend /
--peakbench-backend scheme; everything funnels through one tag.
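A minimal sketch of writing the single tag, assuming the record is a plain dict stored under `extra_info["benchmem"]` (pytest-benchmark's `benchmark` fixture exposes an `extra_info` dict that is saved into its JSON output). The helper name `record_benchmem` and the measurement key `peak_bytes` used below are illustrative assumptions; the point is that mode and engine travel together in one field instead of a separate `peak_backend` key.

```python
KNOWN_MODES = frozenset({"heap", "rss", "allocated"})


def record_benchmem(extra_info: dict, *, mode: str, engine: str, **measurement) -> dict:
    """Store mode, engine, and measured values under one key (hypothetical helper)."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(KNOWN_MODES)}")
    tag = {"mode": mode, "engine": engine, **measurement}
    # The one tag: no separate peak_backend key is written.
    extra_info["benchmem"] = tag
    return tag
```

Inside a test, `extra_info` would be `benchmark.extra_info`, e.g. `record_benchmem(benchmark.extra_info, mode="rss", engine="ru_maxrss", peak_bytes=peak)`.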

One selection surface

Scope (Step-2 decision): no OOM handling, meaning no limit=/cgroup/RLIMIT_AS and no killed
field. If a benchmarked action dies or raises under measurement, it fails like any other
test. OOM-survival testing is out of scope.

One comparability guard

memory_from_pytest_benchmark and load_long_df carry mode and engine as facets;
compare and plot refuse to stack rows whose mode differs, and warn when the engine differs.
This supersedes both #9's mixed-backend guard and #34's co-plot refusal.
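A minimal sketch of the guard over plain row dicts, assuming each row carries `mode` and `engine` keys the way the facets do. The real memory_from_pytest_benchmark / load_long_df output is presumably a long-format table rather than a list of dicts, and the names `check_comparable` and `IncomparableMetricsError` are hypothetical. It raises on mixed modes and only warns on mixed engines.

```python
import warnings
from typing import Optional


class IncomparableMetricsError(ValueError):
    """Raised when rows measuring different metrics would be stacked together."""


def check_comparable(rows: list) -> Optional[str]:
    """Return the shared mode of rows, refusing mixed modes and warning on mixed engines."""
    modes = {row["mode"] for row in rows}
    if len(modes) > 1:
        # Different metrics are different quantities: never stack them.
        raise IncomparableMetricsError(f"refusing to stack rows of differing mode: {sorted(modes)}")
    engines = {row["engine"] for row in rows}
    if len(engines) > 1:
        # Same metric, different engine: allowed, but flagged.
        warnings.warn(f"rows come from differing engines: {sorted(engines)}", stacklevel=2)
    return next(iter(modes), None)
```

Raising rather than warning on mode keeps heap and rss off a shared axis by construction, while an engine mismatch (e.g. ru_maxrss vs psutil for rss) stays visible without blocking the comparison.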

Shared infrastructure

Children

Note on naming

#9/#19/#20/#23/#26 predate #32 and reference stale names (peakbench, peak_mib,
measure_peak, --peakbench-backend). Refresh them to the current names (pytest_benchmem,
peak_bytes, measure_memory, extra_info["benchmem"]) as each is picked up.
