Windows engine for the `rss` metric (psutil fallback) #9

> [!NOTE]
> **Rescoped under #38 (unified multi-metric memory model).** This is now the **Windows
> engine for the `rss` metric**, not a standalone backend. Specifically:
> - It does **not** introduce its own tagging scheme. Drop `peak_backend` /
>   `--peakbench-backend`; use #38's single `mode`+`engine` tag and shared comparability guard.
> - psutil sampling is the **Windows-only fallback** (no `fork`/`ru_maxrss` there). The
>   accurate `rss` engine on Linux/macOS is `ru_maxrss`+fork — see #34.
> - The "RSS misses spikes" caveat below is therefore Windows small print, not the general
>   `rss` behavior.
>
> Original design preserved below for the psutil sketch + Windows-CI acceptance.

---

## Summary

Add an **opt-in** RSS-sampling memory backend so the `benchmark_memory` fixture can produce a (coarse) peak-memory number on **Windows**, where memray is unavailable. memray stays the default everywhere, and the only backend used on Linux/macOS unless RSS is explicitly requested.

Not currently needed (our workloads are Linux/macOS, where memray already captures native heap precisely). Filing to stash the design — it's purely additive / backwards-compatible, so it can be picked up cold if a Windows need ever arises.

## Motivation

memray intercepts at the malloc layer, so it sees **native** allocations: NumPy buffers, Rust/Arrow (polars), C-solver heap (gurobipy). `tracemalloc` only sees memory allocated through Python's allocators, plus whatever a library reports to it explicitly. NumPy does report its buffers, but polars and gurobipy do not, so `tracemalloc` would report near-zero for them and is *not* an acceptable Windows substitute. The only portable way to see native peak on Windows is **process RSS sampled in a background thread**.

| Library | Where bulk data lives | tracemalloc sees it? | memray sees it? |
|---|---|---|---|
| pandas / xarray (NumPy) | NumPy buffers | ✅ | ✅ |
| polars | Rust/Arrow native heap | ❌ | ✅ |
| gurobipy | Gurobi C solver heap | ❌ | ✅ |
| pandas (pyarrow dtypes) | Arrow native heap | ❌ | ✅ |

RSS (via psutil) catches all of it on Windows — coarsely.

## Design (decided)

**Selection is run-level, not per-test** — the backend is a property of the *environment*, not the benchmark (the same test should use memray on Linux and, if allowed, RSS on Windows; don't bake a backend into test code).

- pytest option `--peakbench-backend={memray|rss|auto}`, backed by ini `peakbench_backend`.
- Engine takes it explicitly: `measure_peak(action, *, backend="memray")`. The fixture maps run config → this arg.
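
A minimal sketch of how the run-level option could be registered, assuming a standard pytest plugin or `conftest.py`. `pytest_addoption`, `parser.addoption`, `parser.addini`, `config.getoption` and `config.getini` are real pytest hooks and calls; `BACKENDS` and `selected_backend` are hypothetical names used only for illustration, and the "CLI flag wins over ini" precedence is an assumption rather than something this issue specifies.

```python
"""Sketch of a plugin hook registering the backend option (illustrative only)."""

BACKENDS = ("memray", "rss", "auto")  # hypothetical constant


def pytest_addoption(parser):
    # Command-line flag; default None so the ini value applies when it is unset.
    parser.addoption(
        "--peakbench-backend",
        action="store",
        default=None,
        choices=BACKENDS,
        help="memory backend: memray (default), rss, or auto",
    )
    # ini fallback, e.g. `peakbench_backend = auto` in the pytest ini section.
    parser.addini(
        "peakbench_backend",
        help="default memory backend for benchmark_memory",
        default="memray",
    )


def selected_backend(config):
    """Return the run-level backend: CLI flag first, then ini (assumed order)."""
    value = config.getoption("--peakbench-backend") or config.getini(
        "peakbench_backend"
    )
    if value not in BACKENDS:
        raise ValueError(f"unknown peakbench backend: {value!r}")
    return value
```

The fixture would then pass `selected_backend(request.config)` to the engine, so test code never names a backend itself.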

**Three values, strict default:**

- `memray` *(default)* — memray, or the existing clear error. **RSS is never produced unless asked.** This preserves the trust invariant: *an unqualified `peak_mib` is always memray.*
- `rss` — force the psutil sampler on any platform (A/B the backends, or a Windows-only job).
- `auto` — memray where available, else RSS. Fallback exists but is **explicitly opted into**, never silent.

On Windows with the default, the memory pass raises (timing still works):
> `RuntimeError: memory needs memray (Linux/macOS). Set --peakbench-backend=rss for RSS sampling.`
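
The three values above could resolve to a concrete backend roughly as follows. This is a sketch under assumptions: `resolve_backend` and `memray_available` are hypothetical helpers, and "memray is available" is approximated as "not Windows and importable", which may not match the real `_require_memray()` gate.

```python
import importlib.util
import sys


def memray_available():
    """Rough availability check (assumption): non-Windows and importable."""
    if sys.platform == "win32":
        return False
    return importlib.util.find_spec("memray") is not None


def resolve_backend(requested, has_memray=None):
    """Map the requested value to the concrete backend that will actually run."""
    if has_memray is None:
        has_memray = memray_available()
    if requested == "rss":
        return "rss"  # forced, on any platform
    if requested == "auto":
        return "memray" if has_memray else "rss"  # fallback, but opted into
    if requested == "memray":
        if not has_memray:
            # Strict default: never fall back to RSS silently.
            raise RuntimeError(
                "memory needs memray (Linux/macOS). "
                "Set --peakbench-backend=rss for RSS sampling."
            )
        return "memray"
    raise ValueError(f"unknown backend: {requested!r}")
```

Returning the concrete name (never `"auto"`) is what lets every result be tagged with the backend that really produced it.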

**Tag every result with the backend used** (the concrete one, even under `auto`):

- `extra_info["peak_backend"] = "memray" | "rss"`.
- Reserve `peak_backend` out of dims (like `peak_mib`) in `snapshot._dims`.
- `compare` / `load_long_df` **warn (or refuse) on mixed `peak_backend`** — a memray peak and an RSS peak are different measurements wearing the same `MiB` label; cross-backend diffs are misleading.
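
One possible shape for the mixed-backend guard is sketched below. `check_backends` is a hypothetical helper, and the record layout (plain dicts carrying a `peak_backend` key) is an assumption; the real `compare` / `load_long_df` readers may hold this data differently. Untagged records are treated as memray, following the invariant that an unqualified `peak_mib` is always memray.

```python
import warnings


def check_backends(records, strict=False):
    """Warn (or raise, if strict) when records mix peak_backend values.

    `records` is assumed to be an iterable of dicts; a missing
    "peak_backend" key is read as "memray".
    """
    backends = {r.get("peak_backend", "memray") for r in records}
    if len(backends) > 1:
        msg = (
            f"mixed peak_backend values {sorted(backends)}: "
            "memray and RSS peaks are not comparable"
        )
        if strict:
            raise ValueError(msg)  # the "refuse" variant
        warnings.warn(msg, UserWarning, stacklevel=2)  # the "warn" variant
    return backends
```

Whether the readers should warn or refuse is left open above; the `strict` flag only shows that both behaviors can share one check.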

## RSS sampler sketch

- Run `action()` on the main thread; a background thread polls `psutil.Process().memory_info().rss` at high frequency, tracking the max.
- Report **peak − baseline** (baseline = RSS just before the action).
- Keep the min-of-N `repeats` convention (RSS is noisy: allocator caching, OS lazy reclaim, page cache).
- `psutil` becomes a dependency of this backend only.

**Caveats to document:**

- Sampling granularity is bounded by the thread switch interval (5 ms by default) whenever `action()` holds the GIL, so short peaks can hide between samples. Consider lowering it with `sys.setswitchinterval` during measurement.
- RSS includes shared pages and cannot attribute memory to individual allocations.
- The result is a process-level high-water mark, not an allocation-precise figure.
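
The sampler described in this section might look roughly like the sketch below. It is not the project's implementation: `rss_peak_mib` and `_psutil_rss_reader` are hypothetical names, the 1 ms poll interval and 0.5 ms switch interval are arbitrary illustrative values, and the injectable `read_rss` parameter exists only so the logic can be exercised without psutil. `psutil.Process().memory_info().rss` and `sys.setswitchinterval` are real calls.

```python
import sys
import threading


def _psutil_rss_reader():
    """Return a zero-argument callable giving current process RSS in bytes."""
    import psutil  # imported lazily so only this backend depends on it

    proc = psutil.Process()
    return lambda: proc.memory_info().rss


def rss_peak_mib(action, read_rss=None, interval=0.001, switch_interval=0.0005):
    """Run action() on the calling thread; return sampled (peak - baseline) MiB."""
    if read_rss is None:
        read_rss = _psutil_rss_reader()
    baseline = read_rss()  # RSS just before the action
    peak = baseline
    stop = threading.Event()

    def sample():
        nonlocal peak
        while True:
            peak = max(peak, read_rss())
            if stop.wait(interval):  # returns True once stop is set
                break

    old_switch = sys.getswitchinterval()
    sys.setswitchinterval(switch_interval)  # let the sampler preempt more often
    sampler = threading.Thread(target=sample, daemon=True)
    sampler.start()
    try:
        action()
    finally:
        stop.set()
        sampler.join()
        sys.setswitchinterval(old_switch)  # always restore the global setting
    return max(0, peak - baseline) / (1024 * 1024)
```

A caller would wrap this in the usual min-of-N loop, for example `min(rss_peak_mib(action) for _ in range(repeats))`. Peaks shorter than the effective sampling interval can still be missed.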

## Where it slots in

`src/peakbench/memray.py`: `measure_peak()` is the single entry point and `_require_memray()` the platform gate. Make `measure_peak` dispatch over backends:

```
measure_peak(action, repeats, backend="memray")
  ├─ _memray_peak(action)   # today's body
  └─ _rss_peak(action)      # new: psutil sampler thread, max − baseline
```

The fixture records `peak_backend` alongside `peak_mib`.
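
As a rough illustration of the dispatch, the sketch below keeps the two engine bodies as placeholders and only shows the routing plus the min-of-N convention. The `(peak_mib, backend)` return value and the `engines` parameter are assumptions made for this example; the real `measure_peak` signature and return type may differ.

```python
def _memray_peak(action):
    """Placeholder for today's memray-based body (not reproduced here)."""
    raise NotImplementedError


def _rss_peak(action):
    """Placeholder for the psutil sampler (max minus baseline, in MiB)."""
    raise NotImplementedError


_ENGINES = {"memray": _memray_peak, "rss": _rss_peak}


def measure_peak(action, repeats=1, *, backend="memray", engines=None):
    """Run the chosen engine `repeats` times; return (min peak MiB, backend)."""
    engines = _ENGINES if engines is None else engines
    if backend not in engines:
        raise ValueError(f"unknown backend: {backend!r}")
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    engine = engines[backend]
    # min-of-N: the smallest observed peak is the least noise-inflated one.
    peak_mib = min(engine(action) for _ in range(repeats))
    return peak_mib, backend
```

With that shape, the fixture could record both values in one step, e.g. `extra_info.update(peak_mib=peak, peak_backend=used)`.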

## Acceptance

- [ ] `--peakbench-backend` option + `peakbench_backend` ini; default `memray` (strict error off-platform).
- [ ] `_rss_peak` + backend dispatch in `measure_peak`; `psutil` as backend dep.
- [ ] `peak_backend` written to `extra_info`, reserved from dims, surfaced by the readers.
- [ ] mixed-backend guard in `compare` / `load_long_df`.
- [ ] **Validated on real Windows CI** (the actual target — not just RSS on Linux/macOS).
- [ ] Docs: backend selection + the RSS-vs-memray comparability caveat.

## Notes

Purely additive — does not change the memray default or any existing API. Safe to defer indefinitely.

