Skip to content

free output array when single-axis nan reducers raise#572

Open
sahvx655-wq wants to merge 2 commits into
pydata:masterfrom
sahvx655-wq:reduce-one-error-leak
Open

free output array when single-axis nan reducers raise#572
sahvx655-wq wants to merge 2 commits into
pydata:masterfrom
sahvx655-wq:reduce-one-error-leak

Conversation

@sahvx655-wq

Copy link
Copy Markdown
Contributor

Output array leaked when single-axis reducers raise

Reading through the _one reduce functions I noticed the nan-aware ones allocate their output array with PyArray_EMPTY in INIT_ONE and then bail out with a bare return NULL on two error paths: an empty reduction axis (shape[axis] == 0), and for nanargmin/nanargmax an all-NaN slice. Neither path drops the reference, so a call like bn.nanmin(np.zeros((4, 0)), axis=1) or bn.nanargmin over an all-NaN row leaks the array that was just built. The buffer is never written before the raise, so resident memory stays flat and the leak slips past the existing ru_maxrss test, but tracemalloc and the output dtype refcount both climb by one on every call.

Unfixed it accumulates in any loop that probes reductions and swallows the ValueError, which is normal behaviour. The fix drops the reference before each of those returns so the error paths match the normal exit; I also added a refcount regression that fails on master and passes with the change.

@neutrinoceros neutrinoceros left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my surprise, I checked out the test on my machine (without the patch), and found that it succeeded. I'm not saying the patch isn't correct, but maybe the test isn't reliable on all platforms. I'd like to see if CI captures the problem.
Could you split you commit as follows ?

  • add just the test, decorated with @pytest.mark.xfail(strict=True)
  • add the patch and remove the xfail decorator

@sahvx655-wq sahvx655-wq force-pushed the reduce-one-error-leak branch from 9c4595a to 7e36e21 Compare June 9, 2026 03:58
@sahvx655-wq

Copy link
Copy Markdown
Contributor Author

Split as you asked. First commit adds just the test with @pytest.mark.xfail(strict=True), the second adds the patch and drops the decorator, so it goes from xfail to a normal pass once the fix lands.

On the reliability point: with the decorator in place the unpatched test reports XFAIL here (macOS, numpy 2.x), so it does catch the leak on my box. The assertion watches the singleton dtype refcount, which gains one ref per leaked output array, and that refcount signal is the part I'm less sure travels to every platform. If it doesn't climb somewhere, the strict marker turns the run into an xpass and CI fails loudly, which is exactly the behaviour you want to see. When I first tracked this down tracemalloc was the cleaner probe (~624 B/call before the fix, ~0 after), so if the refcount turns out not to be portable I'm happy to reassert on tracemalloc instead.

@neutrinoceros

Copy link
Copy Markdown
Collaborator

Thanks !
The failure on 3.14t is unrelated (#574)

with the decorator in place the unpatched test reports XFAIL here (macOS, numpy 2.x)

my setup is similar, at least at this level of accuracy, so I still don't know why I'm getting different results.
I realise I wasn't explicit enough in my instructions. I was expecting CI to be run on both commits, which requires that the first one be pushed on its own, and the second one only re-added after the first run completes.

@sahvx655-wq sahvx655-wq force-pushed the reduce-one-error-leak branch 2 times, most recently from 941909c to 869f73d Compare June 9, 2026 10:47
@sahvx655-wq

Copy link
Copy Markdown
Contributor Author

Pushed the test commit on its own now, so this run exercises the strict-xfail in isolation. If the dtype refcount climbs the way it does on my box, every job reports XFAIL and stays green; if it doesn't climb on some runner, strict mode turns that into an XPASS and the job goes red, which is exactly the signal we're after for the portability question you raised. One small change from the earlier push: ruff-format wanted single spaces before the inline comments, so I reformatted and the commit SHA moved, but it's still test-only.

I'll re-add the fix on top once this run finishes, so the follow-up run shows the same test flipping from XFAIL to an ordinary pass. If the refcount probe turns out not to travel to every platform, I'm happy to reassert on tracemalloc, which gave the cleaner before/after signal when I first tracked this down.

@sahvx655-wq sahvx655-wq force-pushed the reduce-one-error-leak branch from 869f73d to d8ba6da Compare June 9, 2026 10:56
@sahvx655-wq

Copy link
Copy Markdown
Contributor Author

That run gave us the answer: the refcount probe is reliable on 3.10-3.12 but XPASSes under strict on 3.13/3.14 across every OS, so it isn't portable. The reason is that numpy's builtin dtype singletons are immortal on 3.13+, so a leaked output array no longer bumps their refcount and the assertion can't see the leak. The leak in the C code is real on every version; only that particular probe goes blind. I built the branch unpatched against 3.13 locally to confirm: the float64 dtype refcount moved by 0 over 1000 leaky calls, while tracemalloc reported ~144 B/call on the leaking paths against ~0 on the happy-path controls, and with the Py_DECREF in place the leaking paths drop back to ~0 too.

So I've reworked the test to watch net allocation with tracemalloc rather than the dtype refcount, and marked it thread_unsafe like test_memory_leak since the measurement is process-global. On the unpatched 3.13 build it overshoots the threshold by ~170 KB over the loop and goes green again once the fix lands. Pushed just that test commit, so this run should report XFAIL across the matrix; I'll re-add the fix on top once it's green.

@sahvx655-wq

Copy link
Copy Markdown
Contributor Author

First run is in and green across the whole matrix, including the 3.13/3.14 and free-threaded jobs that tripped strict before, so the tracemalloc probe reliably reports XFAIL everywhere now rather than going blind the way the refcount one did.

Re-added the fix on top, so this run should show the same test flip from XFAIL to a normal pass. The fix commit is just the five Py_DECREF(y) calls on the error returns plus dropping the xfail marker, nothing else moved.

empty_i = np.zeros((4, 0), dtype=np.int64)
all_nan = np.full((4, 2), np.nan, dtype=np.float64)

cases = [

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use @pytest.mark.parametrize instead of this internal loop ? If there's ever a regression it'll make it easier to pin point any leaky function(s).

@neutrinoceros neutrinoceros left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. Thanks a lot for your diligence conducting this investigation.
I have a final (minor) suggestion left, but overall I'm very happy with the current state !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants