Fix/cpu memory dtype #10

Merged
whtoo merged 6 commits into main from fix/cpu-memory-dtype on Jun 21, 2025

Conversation

@whtoo (Owner) commented Jun 21, 2025

No description provided.

google-labs-jules bot and others added 6 commits June 21, 2025 08:29
This commit introduces objgraph calls at various points in the training
process to help diagnose potential memory leaks or unexpected object
accumulation.

Changes:
- Added `import objgraph` to `train_optimized.py` and `src/agent.py`,
  with a fallback if the library is not installed.
- In `train_optimized.py`:
    - Added an `objgraph` log at the beginning of `main()`.
    - Added a periodic `objgraph` log every 20 episodes in the main
      training loop.
- In `DQNAgent.update_model()` (in `src/agent.py`):
    - Added a periodic `objgraph` log triggered every 2000 environment steps.
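
The instrumentation listed above might look roughly like the sketch below. It is not the code from this PR: the helper names `log_common_types` and `should_log` are hypothetical, and skipping the log when the counter is zero is an assumption. Only `objgraph.most_common_types` is an actual library call.

```python
try:
    import objgraph  # optional diagnostic dependency
except ImportError:
    objgraph = None  # fallback: logging becomes a no-op


def log_common_types(tag, limit=10):
    """Print and return a report of the most common object types.

    Returns None when objgraph is not installed.
    """
    if objgraph is None:
        return None
    # most_common_types returns a list of (type_name, count) tuples.
    rows = objgraph.most_common_types(limit=limit)
    lines = [f"[objgraph] {tag}"]
    lines += [f"  {name}: {count}" for name, count in rows]
    report = "\n".join(lines)
    print(report)
    return report


def should_log(counter, every):
    """True on every `every`-th count (assumption: never at count 0)."""
    return every > 0 and counter > 0 and counter % every == 0
```

Under these assumptions, the training loop would call `log_common_types(f"episode {episode}")` whenever `should_log(episode, 20)` is true, and `update_model()` would do the same with a step counter and an interval of 2000.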

These logs will show the most common object types, providing insights
into memory usage patterns during training.

This commit removes the objgraph logging added previously, because it did
not reveal runaway growth in Python object counts that would explain the
scale of the memory issue.

It introduces an explicit call to `gc.collect()` at the end of each
episode in `train_optimized.py`. This is a diagnostic step to test
whether more aggressive garbage collection mitigates the observed
high memory usage, which is suspected to stem from memory churn caused
by frequent allocations and deallocations during intense training
periods.

Changed the explicit garbage collection call in `train_optimized.py`
to trigger every 5 episodes instead of every episode. This tests
whether less frequent forced GC still prevents the memory explosion
while reducing potential GC overhead.
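
A minimal sketch of the forced-GC schedule described in the last two commits, assuming a zero-based episode counter. The names `GC_EVERY_N_EPISODES`, `maybe_collect`, and `run_episodes` are hypothetical and the training step is elided, so this illustrates the idea rather than the contents of `train_optimized.py`.

```python
import gc

# Hypothetical setting: 5 matches the later commit, 1 the earlier one.
GC_EVERY_N_EPISODES = 5


def maybe_collect(episode, every=GC_EVERY_N_EPISODES):
    """Force a full collection after every `every`-th completed episode.

    Returns the number of unreachable objects found by gc.collect(),
    or None when the collection is skipped for this episode.
    """
    if every > 0 and (episode + 1) % every == 0:
        return gc.collect()
    return None


def run_episodes(num_episodes, every=GC_EVERY_N_EPISODES):
    """Return the zero-based episodes after which a collection was forced."""
    collected_at = []
    for episode in range(num_episodes):
        # ... one training episode would run here ...
        if maybe_collect(episode, every) is not None:
            collected_at.append(episode)
    return collected_at
```

With an interval of 5, a 12-episode run forces a collection after the 5th and 10th episodes; passing `every=1` reproduces the per-episode behaviour of the earlier commit.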
@whtoo merged commit 7103e38 into main on Jun 21, 2025
1 check failed
@whtoo deleted the fix/cpu-memory-dtype branch on Jun 21, 2025 09:13