Windows engine for the rss metric (psutil fallback) #9

@FBumann

Note

Rescoped under #38 (unified multi-metric memory model). This is now the Windows engine for the rss metric, not a standalone backend. Specifically:


Summary

Add an opt-in RSS-sampling memory backend so the benchmark_memory fixture can produce a (coarse) peak-memory number on Windows, where memray is unavailable. memray stays the default everywhere and remains the only backend used on Linux/macOS unless rss is requested explicitly.

Not currently needed (our workloads are Linux/macOS, where memray already captures native heap precisely). Filing to stash the design — it's purely additive / backwards-compatible, so it can be picked up cold if a Windows need ever arises.

Motivation

memray intercepts at the malloc layer, so it sees native allocations: NumPy buffers, Rust/Arrow (polars), C-solver heap (gurobipy). tracemalloc only sees allocations reported through Python's allocator hooks. NumPy registers its array buffers there, but Rust/Arrow and C-solver heaps are invisible to it (it would report near-zero for polars/gurobipy), so it is not an acceptable Windows substitute. The only portable way to see native peak on Windows is process RSS sampled in a background thread.

| Library | Where bulk data lives | tracemalloc sees it? | memray sees it? |
| pandas / xarray (NumPy) | NumPy buffers | | |
| polars | Rust/Arrow native heap | | |
| gurobipy | Gurobi C solver heap | | |
| pandas (pyarrow dtypes) | Arrow native heap | | |

RSS (via psutil) catches all of it on Windows — coarsely.

Design (decided)

Selection is run-level, not per-test — the backend is a property of the environment, not the benchmark (the same test should use memray on Linux and, if allowed, RSS on Windows; don't bake a backend into test code).

  • pytest option --peakbench-backend={memray|rss|auto}, backed by ini peakbench_backend.
  • Engine takes it explicitly: measure_peak(action, *, backend="memray"). The fixture maps run config → this arg.
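A minimal sketch of how the option and ini key could be registered, assuming the plugin uses pytest's standard pytest_addoption hook. configured_backend is a hypothetical helper name, and the precedence shown (command line, then ini, then memray) is an assumption rather than something decided above.

```python
BACKEND_CHOICES = ("memray", "rss", "auto")


def pytest_addoption(parser):
    """Register the CLI flag and its ini counterpart."""
    parser.addoption(
        "--peakbench-backend",
        dest="peakbench_backend",
        choices=BACKEND_CHOICES,
        default=None,  # None means "not given on the command line"
        help="Memory backend: memray (default), rss, or auto.",
    )
    parser.addini(
        "peakbench_backend",
        help="Memory backend used when --peakbench-backend is not given.",
        default="memray",
    )


def configured_backend(config):
    """Resolve the run-level backend (hypothetical helper).

    Assumed precedence: CLI flag, then ini value, then the strict default.
    """
    value = (
        config.getoption("peakbench_backend")
        or config.getini("peakbench_backend")
        or "memray"
    )
    if value not in BACKEND_CHOICES:
        raise ValueError(f"unknown peakbench backend: {value!r}")
    return value
```

The fixture would call configured_backend(request.config) once and pass the result to measure_peak, so test code never names a backend.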

Three values, strict default:

  • memray (default): use memray, or raise the existing clear error where memray is unavailable. RSS is never produced unless asked for. This preserves the trust invariant: an unqualified peak_mib is always memray.
  • rss: force the psutil sampler on any platform (to A/B the backends, or for a Windows-only job).
  • auto: memray where available, else RSS. The fallback exists but must be opted into explicitly; it is never silent.

On Windows with the default, the memory pass raises (timing still works):

RuntimeError: memory needs memray (Linux/macOS). Set --peakbench-backend=rss for RSS sampling.
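The three values and the strict default could be resolved to a concrete backend roughly as follows. This is a sketch: resolve_backend and memray_available are hypothetical names, and the has_memray parameter exists only so the logic can be exercised without memray installed.

```python
import importlib.util
import sys
from typing import Optional


def memray_available() -> bool:
    """True when memray can be imported on this platform (never on Windows)."""
    return sys.platform != "win32" and importlib.util.find_spec("memray") is not None


def resolve_backend(requested: str = "memray", *, has_memray: Optional[bool] = None) -> str:
    """Map the run-level setting to the concrete backend: 'memray' or 'rss'."""
    if has_memray is None:
        has_memray = memray_available()
    if requested == "rss":
        return "rss"  # forced on any platform
    if requested == "auto":
        return "memray" if has_memray else "rss"  # explicit, opted-in fallback
    if requested == "memray":
        if not has_memray:
            # Strict default: never fall back to RSS silently.
            raise RuntimeError(
                "memory needs memray (Linux/macOS). "
                "Set --peakbench-backend=rss for RSS sampling."
            )
        return "memray"
    raise ValueError(f"unknown backend: {requested!r}")
```

Returning the concrete name, even under auto, is what lets each result be tagged with the backend that actually ran.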

Tag every result with the backend used (the concrete one, even under auto):

  • extra_info["peak_backend"] = "memray" | "rss".
  • Exclude peak_backend from dims (as peak_mib already is) in snapshot._dims.
  • compare / load_long_df warn (or refuse) on mixed peak_backend — a memray peak and an RSS peak are different measurements wearing the same MiB label; cross-backend diffs are misleading.
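The tagging and the mixed-backend guard might look like the sketch below. RESERVED_KEYS, dims_of and check_peak_backends are hypothetical names, records are assumed to be plain extra_info dicts, and treating an untagged record as memray is an assumption derived from the trust invariant above.

```python
import warnings

# Keys that are measurements or measurement metadata, not benchmark dimensions.
RESERVED_KEYS = frozenset({"peak_mib", "peak_backend"})


def dims_of(extra_info: dict) -> dict:
    """Return only the dimension entries of an extra_info mapping."""
    return {k: v for k, v in extra_info.items() if k not in RESERVED_KEYS}


def check_peak_backends(records, *, strict: bool = False) -> set:
    """Warn (or raise when strict) if records mix memray and RSS peaks."""
    # Assumption: a record without a tag predates this feature, so it is memray.
    backends = {r.get("peak_backend", "memray") for r in records}
    if len(backends) > 1:
        msg = (
            f"mixed peak_backend values {sorted(backends)}: "
            "memray and RSS peaks are different measurements and should not be diffed"
        )
        if strict:
            raise ValueError(msg)
        warnings.warn(msg, UserWarning, stacklevel=2)
    return backends
```

compare and load_long_df could both call check_peak_backends on the rows they load, with strict chosen by whether a mixed comparison should warn or be refused.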

RSS sampler sketch

  • Run action() on the main thread; a background thread polls psutil.Process().memory_info().rss at high frequency, tracking the max.
  • Report peak − baseline (baseline = RSS just before the action).
  • Keep the min-of-N repeats convention (RSS is noisy: allocator caching, OS lazy reclaim, page cache).
  • psutil becomes a dependency of this backend only.

Caveats to document: while action() holds the GIL, sampling granularity is bounded by the interpreter's thread switch interval (5 ms by default, see sys.getswitchinterval()), so short peaks can hide between samples; consider lowering the interval with sys.setswitchinterval() during measurement. RSS includes shared pages and cannot attribute memory to individual allocations; it is a process-level high-water mark, not allocation-precise.
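The sampler described above can be sketched as follows. This is an illustration, not the final _rss_peak: the function names, the read_rss injection point and the 1 ms polling interval are assumptions. psutil is imported lazily so it is only required when this backend runs, and the result is in bytes.

```python
import sys
import threading
from typing import Callable, Optional


def _psutil_rss_reader() -> Callable[[], int]:
    """Build a reader for the current process RSS in bytes."""
    import psutil  # lazy import: only this backend depends on psutil

    proc = psutil.Process()
    return lambda: proc.memory_info().rss


def rss_peak_once(action, read_rss: Optional[Callable[[], int]] = None,
                  interval_s: float = 0.001) -> int:
    """Run action() on the calling thread and return peak minus baseline RSS."""
    read = read_rss if read_rss is not None else _psutil_rss_reader()
    stop = threading.Event()
    baseline = read()  # RSS just before the action
    peak = [baseline]

    def poll():
        while not stop.is_set():
            peak[0] = max(peak[0], read())
            stop.wait(interval_s)

    old_switch = sys.getswitchinterval()
    # Let the sampler preempt GIL-holding code more often while measuring.
    sys.setswitchinterval(min(old_switch, interval_s))
    sampler = threading.Thread(target=poll, daemon=True)
    sampler.start()
    try:
        action()
    finally:
        stop.set()
        sampler.join()
        sys.setswitchinterval(old_switch)
    peak[0] = max(peak[0], read())  # final sample after the action returns
    return peak[0] - baseline


def rss_peak(action, repeats: int = 3, **kwargs) -> int:
    """Min-of-N over repeats, since single RSS readings are noisy."""
    return min(rss_peak_once(action, **kwargs) for _ in range(repeats))
```

Injecting read_rss keeps the sampling loop testable on machines without psutil; in real use the default reader is enough, for example rss_peak(lambda: bytearray(50_000_000)).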

Where it slots in

src/peakbench/memray.py: measure_peak() is the single entry point and _require_memray() the platform gate. Make measure_peak dispatch over backends:

measure_peak(action, repeats, backend="memray")
  ├─ _memray_peak(action)   # today's body
  └─ _rss_peak(action)      # new: psutil sampler thread, max − baseline

The fixture records peak_backend alongside peak_mib.
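The dispatch could be as small as a lookup table. In this sketch _memray_peak and _rss_peak are stand-ins for the real bodies, the backends parameter is a hypothetical injection point for testing, and the returned dict shape is an assumption about what the fixture would record. It also assumes auto has already been resolved to a concrete backend by the caller.

```python
from typing import Callable, Dict, Optional


def _memray_peak(action) -> float:
    raise NotImplementedError("stand-in for today's memray-based body")


def _rss_peak(action) -> float:
    raise NotImplementedError("stand-in for the psutil sampler thread")


_BACKENDS: Dict[str, Callable] = {"memray": _memray_peak, "rss": _rss_peak}


def measure_peak(action, repeats: int = 3, *, backend: str = "memray",
                 backends: Optional[Dict[str, Callable]] = None) -> dict:
    """Measure with the chosen concrete backend and tag the result with it."""
    table = _BACKENDS if backends is None else backends
    if backend not in table:
        raise ValueError(f"unknown backend: {backend!r}")
    measure_once = table[backend]
    # Min-of-N repeats, applied uniformly to whichever backend is selected.
    peak_mib = min(measure_once(action) for _ in range(repeats))
    return {"peak_mib": peak_mib, "peak_backend": backend}
```

Keeping the tag in the return value means the fixture cannot record a peak without also recording which backend produced it.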

Acceptance

  • --peakbench-backend option + peakbench_backend ini; default memray (strict error off-platform).
  • _rss_peak + backend dispatch in measure_peak; psutil as backend dep.
  • peak_backend written to extra_info, reserved from dims, surfaced by the readers.
  • mixed-backend guard in compare / load_long_df.
  • Validated on real Windows CI (the actual target — not just RSS on Linux/macOS).
  • Docs: backend selection + the RSS-vs-memray comparability caveat.

Notes

Purely additive — does not change the memray default or any existing API. Safe to defer indefinitely.
