From /simplify codebase sweep (2026-06-10).
Full generational gc.collect() + CUDA allocator flush run once per mini-batch: transparency/explainers/base_explainer.py:231-233,265-268, robustness/assessors/base_assessor.py:333-336, same pattern in task_families/classification.py:88-99. On CPU-only runs empty_cache is skipped but gc.collect() is not.
Cost: gc.collect() is O(all live objects); with batch sizes 4-8 over hundreds of samples this adds tens of ms per chunk and can dominate cheap explainers (Saliency). Per-chunk empty_cache() also forces allocator round-trips that slow subsequent allocations.
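The cost claim can be checked locally before changing anything. A minimal sketch, assuming a plain CPython process; `time_gc_collect` and the object counts are hypothetical and not taken from the repo, and the timings depend on the machine and on the real heap, so no expected figures are given:

```python
import gc
import time


def time_gc_collect(n_live_objects: int) -> float:
    """Return the seconds one full gc.collect() takes with n extra live objects."""
    # Lists are tracked by the cyclic collector, so each one adds to the
    # work a full (generation 2) collection has to do.
    live = [[] for _ in range(n_live_objects)]
    start = time.perf_counter()
    gc.collect()  # no argument means a full collection
    elapsed = time.perf_counter() - start
    del live
    return elapsed


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        print(f"{n:>9} live objects: {time_gc_collect(n) * 1e3:.2f} ms")
```

Comparing these figures with the per-chunk runtime of a cheap explainer shows whether the collect dominates in a given setup.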
Fix: collect once after the loop, or guard the per-chunk collect behind torch.cuda.is_available() (the leak it mitigates is CUDA-graph/hook retention; CPU runs gain nothing). Keep per-chunk empty_cache only if the memory-leak tests demand it. Verify against src/raitap/tests/test_memory_leaks.py before changing cadence.
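The proposed cadence could look roughly like this. It is a sketch only: `run_chunks`, `release_memory`, `process_chunk` and the `per_chunk_cleanup` flag are hypothetical names, not the code in base_explainer.py or base_assessor.py. The only torch calls used are `torch.cuda.is_available()` and `torch.cuda.empty_cache()`; the import is optional so the sketch also runs without torch installed.

```python
import gc
from typing import Callable, Iterable, List, Optional, TypeVar

try:
    import torch
    CUDA_AVAILABLE = torch.cuda.is_available()
except ImportError:  # torch missing: behave like a CPU-only run
    torch = None
    CUDA_AVAILABLE = False

T = TypeVar("T")
R = TypeVar("R")


def release_memory() -> None:
    """Full collection, plus a CUDA allocator flush when a GPU is present."""
    gc.collect()
    if CUDA_AVAILABLE:
        torch.cuda.empty_cache()


def run_chunks(
    chunks: Iterable[T],
    process_chunk: Callable[[T], R],
    per_chunk_cleanup: Optional[bool] = None,
    cleanup: Callable[[], None] = release_memory,
) -> List[R]:
    """Process chunks, cleaning up per chunk only when asked (default: CUDA only)."""
    if per_chunk_cleanup is None:
        # Assumption from the note above: the leak is CUDA-specific,
        # so CPU-only runs skip the per-chunk collect entirely.
        per_chunk_cleanup = CUDA_AVAILABLE
    results: List[R] = []
    try:
        for chunk in chunks:
            results.append(process_chunk(chunk))
            if per_chunk_cleanup:
                cleanup()
    finally:
        # One cleanup after the loop, even if a chunk raised.
        cleanup()
    return results
```

On a CPU-only run, `run_chunks(batches, explain)` calls `cleanup` exactly once. Passing `per_chunk_cleanup=True` restores the per-chunk cadence if test_memory_leaks.py turns out to need it.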