Per-chunk gc.collect() + cuda.empty_cache() in batch loops (runs on CPU-only too) #327

From /simplify codebase sweep (2026-06-10).

A full (all-generation) `gc.collect()` plus a CUDA allocator flush (`torch.cuda.empty_cache()`) runs once per mini-batch in `transparency/explainers/base_explainer.py:231-233,265-268` and `robustness/assessors/base_assessor.py:333-336`, with the same pattern in `task_families/classification.py:88-99`. On CPU-only runs `empty_cache` is skipped, but `gc.collect()` still runs every chunk.

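Cost: a full `gc.collect()` traverses every GC-tracked container object, so its cost scales with the whole live heap rather than with the chunk. With batch sizes 4-8 over hundreds of samples this adds tens of ms per chunk and can dominate cheap explainers (Saliency). Per-chunk `empty_cache()` also returns cached blocks to the driver, so subsequent allocations have to go back through the device allocator instead of being served from the cache.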
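
To sanity-check the cost claim on a given machine, a rough micro-benchmark like the one below can help. It is only a sketch: `time_full_collect` is a hypothetical helper, not something in the repo, and plain Python lists stand in for whatever objects a real run keeps alive, so the absolute numbers will differ from those seen inside an explainer loop.

```python
import gc
import time


def time_full_collect(n_live_objects: int, repeats: int = 5) -> float:
    """Median wall-clock seconds of a full gc.collect() while
    n_live_objects GC-tracked containers are alive.

    Hypothetical helper for illustration only.
    """
    # Lists are always GC-tracked, so each one adds to the traversal work.
    live = [[i] for i in range(n_live_objects)]
    gc.collect()  # warm-up so the survivors sit in the oldest generation

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        gc.collect()  # no argument: collects all generations
        timings.append(time.perf_counter() - start)

    del live
    return sorted(timings)[len(timings) // 2]


if __name__ == "__main__":
    for n in (10_000, 200_000):
        ms = time_full_collect(n) * 1e3
        print(f"{n:>7} live containers: {ms:.2f} ms per collect")
```

Multiplying the per-collect time by the number of chunks gives a first estimate of the overhead that a per-chunk collect adds to a run.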

Fix: collect once after the loop, or guard the per-chunk collect behind `torch.cuda.is_available()`. The leak it mitigates is CUDA-graph/hook retention, so CPU runs gain nothing from it. Keep per-chunk `empty_cache` only if the memory-leak tests demand it. Verify against `src/raitap/tests/test_memory_leaks.py` before changing the cadence.
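
One possible shape for the fix, combining both options, is sketched below. The names `run_in_chunks`, `process_chunk`, `per_chunk_cleanup` and `use_cuda` are assumptions made for illustration and do not correspond to existing functions in the repo; the only torch calls used are `torch.cuda.is_available()` and `torch.cuda.empty_cache()`.

```python
import gc
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def _cuda_available() -> bool:
    """True only if torch is importable and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def _empty_cuda_cache() -> None:
    import torch

    torch.cuda.empty_cache()


def run_in_chunks(
    chunks: Iterable[T],
    process_chunk: Callable[[T], R],
    *,
    per_chunk_cleanup: bool = False,  # hypothetical opt-in flag
    use_cuda: Optional[bool] = None,  # None means auto-detect
) -> List[R]:
    """Process chunks, cleaning up once after the loop by default.

    Per-chunk cleanup happens only when explicitly requested AND CUDA
    is in use, so CPU-only runs never pay for a collect per chunk.
    """
    if use_cuda is None:
        use_cuda = _cuda_available()

    results: List[R] = []
    try:
        for chunk in chunks:
            results.append(process_chunk(chunk))
            if per_chunk_cleanup and use_cuda:
                gc.collect()
                _empty_cuda_cache()
    finally:
        # Single cleanup after the loop, also on error paths.
        gc.collect()
        if use_cuda:
            _empty_cuda_cache()
    return results
```

Leaving `per_chunk_cleanup` off by default makes the cheaper cadence the normal path, while the flag keeps the old behaviour available if the memory-leak tests turn out to need it.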

