free output array when single-axis nan reducers raise#572
Conversation
neutrinoceros
left a comment
There was a problem hiding this comment.
To my surprise, I checked out the test on my machine (without the patch), and found that it succeeded. I'm not saying the patch isn't correct, but maybe the test isn't reliable on all platforms. I'd like to see if CI captures the problem.
Could you split you commit as follows ?
- add just the test, decorated with
@pytest.mark.xfail(strict=True) - add the patch and remove the xfail decorator
9c4595a to
7e36e21
Compare
|
Split as you asked. First commit adds just the test with On the reliability point: with the decorator in place the unpatched test reports XFAIL here (macOS, numpy 2.x), so it does catch the leak on my box. The assertion watches the singleton dtype refcount, which gains one ref per leaked output array, and that refcount signal is the part I'm less sure travels to every platform. If it doesn't climb somewhere, the strict marker turns the run into an xpass and CI fails loudly, which is exactly the behaviour you want to see. When I first tracked this down tracemalloc was the cleaner probe (~624 B/call before the fix, ~0 after), so if the refcount turns out not to be portable I'm happy to reassert on tracemalloc instead. |
|
Thanks !
my setup is similar, at least at this level of accuracy, so I still don't know why I'm getting different results. |
941909c to
869f73d
Compare
|
Pushed the test commit on its own now, so this run exercises the strict-xfail in isolation. If the dtype refcount climbs the way it does on my box, every job reports XFAIL and stays green; if it doesn't climb on some runner, strict mode turns that into an XPASS and the job goes red, which is exactly the signal we're after for the portability question you raised. One small change from the earlier push: ruff-format wanted single spaces before the inline comments, so I reformatted and the commit SHA moved, but it's still test-only. I'll re-add the fix on top once this run finishes, so the follow-up run shows the same test flipping from XFAIL to an ordinary pass. If the refcount probe turns out not to travel to every platform, I'm happy to reassert on tracemalloc, which gave the cleaner before/after signal when I first tracked this down. |
869f73d to
d8ba6da
Compare
|
That run gave us the answer: the refcount probe is reliable on 3.10-3.12 but XPASSes under strict on 3.13/3.14 across every OS, so it isn't portable. The reason is that numpy's builtin dtype singletons are immortal on 3.13+, so a leaked output array no longer bumps their refcount and the assertion can't see the leak. The leak in the C code is real on every version; only that particular probe goes blind. I built the branch unpatched against 3.13 locally to confirm: the float64 dtype refcount moved by 0 over 1000 leaky calls, while tracemalloc reported ~144 B/call on the leaking paths against ~0 on the happy-path controls, and with the Py_DECREF in place the leaking paths drop back to ~0 too. So I've reworked the test to watch net allocation with tracemalloc rather than the dtype refcount, and marked it thread_unsafe like test_memory_leak since the measurement is process-global. On the unpatched 3.13 build it overshoots the threshold by ~170 KB over the loop and goes green again once the fix lands. Pushed just that test commit, so this run should report XFAIL across the matrix; I'll re-add the fix on top once it's green. |
|
First run is in and green across the whole matrix, including the 3.13/3.14 and free-threaded jobs that tripped strict before, so the tracemalloc probe reliably reports XFAIL everywhere now rather than going blind the way the refcount one did. Re-added the fix on top, so this run should show the same test flip from XFAIL to a normal pass. The fix commit is just the five Py_DECREF(y) calls on the error returns plus dropping the xfail marker, nothing else moved. |
| empty_i = np.zeros((4, 0), dtype=np.int64) | ||
| all_nan = np.full((4, 2), np.nan, dtype=np.float64) | ||
|
|
||
| cases = [ |
There was a problem hiding this comment.
Would it make sense to use @pytest.mark.parametrize instead of this internal loop ? If there's ever a regression it'll make it easier to pin point any leaky function(s).
neutrinoceros
left a comment
There was a problem hiding this comment.
Perfect. Thanks a lot for your diligence conducting this investigation.
I have a final (minor) suggestion left, but overall I'm very happy with the current state !
Output array leaked when single-axis reducers raise
Reading through the
_onereduce functions I noticed the nan-aware ones allocate their output array withPyArray_EMPTYinINIT_ONEand then bail out with a barereturn NULLon two error paths: an empty reduction axis (shape[axis] == 0), and fornanargmin/nanargmaxan all-NaN slice. Neither path drops the reference, so a call likebn.nanmin(np.zeros((4, 0)), axis=1)orbn.nanargminover an all-NaN row leaks the array that was just built. The buffer is never written before the raise, so resident memory stays flat and the leak slips past the existingru_maxrsstest, buttracemallocand the output dtype refcount both climb by one on every call.Unfixed it accumulates in any loop that probes reductions and swallows the
ValueError, which is normal behaviour. The fix drops the reference before each of those returns so the error paths match the normal exit; I also added a refcount regression that fails on master and passes with the change.