The "RSS misses spikes" caveat below is therefore Windows small print, not the general rss behavior.
Original design preserved below for the psutil sketch + Windows-CI acceptance.
Summary
Add an opt-in RSS-sampling memory backend so the benchmark_memory fixture can produce a (coarse) peak-memory number on Windows, where memray is unavailable. memray stays the default and only backend on Linux/macOS.
Not currently needed (our workloads are Linux/macOS, where memray already captures native heap precisely). Filing to stash the design — it's purely additive / backwards-compatible, so it can be picked up cold if a Windows need ever arises.
Motivation
memray intercepts at the malloc layer, so it sees native allocations: NumPy buffers, Rust/Arrow (polars), C-solver heap (gurobipy). tracemalloc only sees allocations reported through Python's allocator hooks; that covers NumPy buffers, but it is blind to the Rust/Arrow and C-solver heaps (it would report near-zero for polars/gurobipy), so it is not an acceptable Windows substitute. The only portable way to see native peak on Windows is process RSS sampled in a background thread.
| Library | Where bulk data lives | tracemalloc sees it? | memray sees it? |
| --- | --- | --- | --- |
| pandas / xarray (NumPy) | NumPy buffers | ✅ | ✅ |
| polars | Rust/Arrow native heap | ❌ | ✅ |
| gurobipy | Gurobi C solver heap | ❌ | ✅ |
| pandas (pyarrow dtypes) | Arrow native heap | ❌ | ✅ |
RSS (via psutil) catches all of it on Windows — coarsely.
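The tracemalloc blind spot can be illustrated with the standard library alone. The sketch below is not from the issue: it uses an anonymous `mmap` region as a stand-in for a native heap (polars, Gurobi), on the assumption that both bypass Python's allocator hooks in the same way; `traced_peak` is a hypothetical helper name.

```python
import mmap
import tracemalloc

SIZE = 8 * 1024 * 1024  # 8 MiB test allocation


def traced_peak(allocate):
    """Return the tracemalloc peak (bytes) observed while allocate() runs."""
    tracemalloc.start()  # a fresh start begins with an empty trace and a zero peak
    try:
        held = allocate()  # keep a reference so the buffer is alive when we read the peak
        _, peak = tracemalloc.get_traced_memory()
        del held
        return peak
    finally:
        tracemalloc.stop()


# Allocated through Python's memory allocator: tracemalloc records it.
python_heap_peak = traced_peak(lambda: bytearray(SIZE))

# Anonymous mapping obtained from the OS: a stand-in for a native heap.
# Only the small Python wrapper object is traced, not the 8 MiB mapping.
native_peak = traced_peak(lambda: mmap.mmap(-1, SIZE))

print(f"bytearray: {python_heap_peak / 2**20:.1f} MiB traced")
print(f"mmap:      {native_peak / 2**20:.3f} MiB traced")
```

The second figure should come out far below 8 MiB, which is the same effect that would make a tracemalloc-based number look near-zero for native-heap libraries.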
Design (decided)
Selection is run-level, not per-test — the backend is a property of the environment, not the benchmark (the same test should use memray on Linux and, if allowed, RSS on Windows; don't bake a backend into test code).
pytest option --peakbench-backend={memray|rss|auto}, backed by ini peakbench_backend.
Engine takes it explicitly: measure_peak(action, *, backend="memray"). The fixture maps run config → this arg.
Three values, strict default:
memray (default): memray, or the existing clear error. RSS is never produced unless asked. This preserves the trust invariant: an unqualified peak_mib is always memray.
rss — force the psutil sampler on any platform (A/B the backends, or a Windows-only job).
auto — memray where available, else RSS. Fallback exists but is explicitly opted into, never silent.
On Windows with the default, the memory pass raises (timing still works):
RuntimeError: memory needs memray (Linux/macOS). Set --peakbench-backend=rss for RSS sampling.
Tag every result with the backend used (the concrete one, even under auto):
extra_info["peak_backend"] = "memray" | "rss".
Reserve peak_backend out of dims (like peak_mib) in snapshot._dims.
compare / load_long_df warn (or refuse) on mixed peak_backend: a memray peak and an RSS peak are different measurements wearing the same MiB label, so cross-backend diffs are misleading.
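The selection and tagging rules above can be sketched as two small pure functions. This is an illustration under stated assumptions, not peakbench code: `resolve_backend`, `check_single_backend`, the `memray_available` flag and the `strict` switch are hypothetical names; only the three option values, the error text and the `peak_backend` key come from the design.

```python
import warnings

VALID_BACKENDS = ("memray", "rss", "auto")


def resolve_backend(requested, memray_available):
    """Map the run-level setting to the concrete backend that gets recorded."""
    if requested not in VALID_BACKENDS:
        raise ValueError(f"unknown backend {requested!r}; expected one of {VALID_BACKENDS}")
    if requested == "rss":
        return "rss"  # forced sampler, on any platform
    if memray_available:
        return "memray"  # both "memray" and "auto" prefer memray
    if requested == "auto":
        return "rss"  # fallback happens only because it was opted into
    raise RuntimeError(
        "memory needs memray (Linux/macOS). "
        "Set --peakbench-backend=rss for RSS sampling."
    )


def check_single_backend(records, *, strict=False):
    """Warn (or refuse, when strict) if results mix peak_backend values."""
    backends = {record["peak_backend"] for record in records}
    if len(backends) > 1:
        message = (
            f"mixed peak_backend values {sorted(backends)}: "
            "cross-backend diffs are misleading"
        )
        if strict:
            raise ValueError(message)
        warnings.warn(message, stacklevel=2)
    return backends


# Tagging: record the concrete backend, even when the request was "auto".
extra_info = {"peak_backend": resolve_backend("auto", memray_available=False)}
print(extra_info)
```

Keeping the resolution separate from the measurement means the fixture can tag results with the concrete backend before any memory pass runs.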
RSS sampler sketch
Run action() on the main thread; a background thread polls psutil.Process().memory_info().rss at high frequency, tracking the max.
Report peak − baseline (baseline = RSS just before the action).
Keep the min-of-N repeats convention (RSS is noisy: allocator caching, OS lazy reclaim, page cache).
psutil becomes a dependency of this backend only.
Caveats to document: sampling granularity is bounded by the thread switch interval (~5ms default; short peaks can hide between samples — consider lowering sys.setswitchinterval during measurement); RSS includes shared pages and can't attribute; it is a process-level high-water mark, not allocation-precise.
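A minimal sketch of the sampler described above, under some assumptions: the function names (`make_psutil_reader`, `rss_peak_once`, `rss_peak`) and the `read_rss`, `interval` and `switch_interval` parameters are hypothetical, and the RSS reader is injectable so the loop can be exercised without psutil installed. The only psutil call used is `psutil.Process().memory_info().rss`.

```python
import sys
import threading


def make_psutil_reader():
    """Return a callable giving the current process RSS in bytes (needs psutil)."""
    import psutil  # imported lazily so only this backend depends on it

    process = psutil.Process()
    return lambda: process.memory_info().rss


def rss_peak_once(action, read_rss=None, interval=0.001, switch_interval=None):
    """Run action() on the calling thread; return max sampled RSS minus baseline (bytes)."""
    if read_rss is None:
        read_rss = make_psutil_reader()
    stop = threading.Event()
    baseline = read_rss()  # RSS just before the action
    peak = baseline

    def poll():
        nonlocal peak
        while not stop.is_set():
            peak = max(peak, read_rss())
            stop.wait(interval)  # sleep, but wake immediately once stopped

    old_switch = sys.getswitchinterval()
    if switch_interval is not None:
        sys.setswitchinterval(switch_interval)  # let the sampler run more often
    sampler = threading.Thread(target=poll, daemon=True)
    sampler.start()
    try:
        action()
    finally:
        stop.set()
        sampler.join()
        if switch_interval is not None:
            sys.setswitchinterval(old_switch)
    peak = max(peak, read_rss())  # one last sample after the action returns
    return peak - baseline


def rss_peak(action, repeats=3, **kwargs):
    """Min-of-N convention: keep the smallest delta, since RSS is noisy."""
    return min(rss_peak_once(action, **kwargs) for _ in range(repeats))
```

A call such as `rss_peak(lambda: bytearray(64 * 2**20), repeats=3)` would use the psutil reader by default. Peaks shorter than the polling interval can still be missed, which is the sampling caveat listed above.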
Where it slots in
src/peakbench/memray.py: measure_peak() is the single entry point and _require_memray() the platform gate. Make measure_peak dispatch over backends:
measure_peak(action, repeats, backend="memray")
├─ _memray_peak(action) # today's body
└─ _rss_peak(action) # new: psutil sampler thread, max − baseline
The fixture records peak_backend alongside peak_mib.
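The dispatch could look roughly like the sketch below. The two engine bodies are placeholders, and the dict return shape and the `engines` parameter are assumptions made for illustration (the issue does not say what `measure_peak` returns); only the function names and the `backend="memray"` default come from the text above.

```python
def _memray_peak(action):
    # Placeholder: stands in for today's memray-based body.
    raise NotImplementedError("memray engine not included in this sketch")


def _rss_peak(action):
    # Placeholder: stands in for the psutil sampler (max minus baseline).
    raise NotImplementedError("rss engine not included in this sketch")


_ENGINES = {"memray": _memray_peak, "rss": _rss_peak}


def measure_peak(action, repeats=3, *, backend="memray", engines=_ENGINES):
    """Dispatch to one engine and keep the min over `repeats` runs.

    `backend` must already be concrete ("memray" or "rss"); resolving
    "auto" is assumed to happen in the fixture before this call.
    """
    try:
        engine = engines[backend]
    except KeyError:
        raise ValueError(f"unknown backend {backend!r}") from None
    peak_mib = min(engine(action) for _ in range(repeats))
    # Returning the backend with the number lets the fixture record both.
    return {"peak_mib": peak_mib, "peak_backend": backend}
```

Passing the engine table as a parameter is only a convenience for testing the dispatch with stubs; a real implementation might hard-code the two branches.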
Note
Rescoped under #38 (Unified multi-metric memory model: metrics, tagging, selection, comparability). This is now the Windows engine for the rss metric, not a standalone backend. Specifically:
Instead of peak_backend / --peakbench-backend, use #38's single mode + engine tag and shared comparability guard.
Windows only (no fork/ru_maxrss there). The accurate rss engine on Linux/macOS is ru_maxrss + fork; see #34 (Add an rss memory mode: kernel peak-RSS via subprocess, with baseline subtraction).
The "RSS misses spikes" caveat in the sampler sketch is therefore Windows small print, not the general rss behavior.
The original design is preserved in this issue for the psutil sketch + Windows-CI acceptance.
Acceptance
--peakbench-backend option + peakbench_backend ini; default memray (strict error off-platform).
_rss_peak + backend dispatch in measure_peak; psutil as backend dep.
peak_backend written to extra_info, reserved from dims, surfaced by the readers.
Mixed-peak_backend warning (or refusal) in compare / load_long_df.
Notes
Purely additive — does not change the memray default or any existing API. Safe to defer indefinitely.