Fix/replay buffer nan loss by whtoo · Pull Request #6 · whtoo/snake_rl

whtoo · 2025-06-21T06:00:43Z

No description provided.

…ReplayBuffer Added epsilons to prevent division by zero during the calculation of probabilities and importance sampling weights in `PrioritizedReplayBuffer.sample()`. This addresses RuntimeWarnings for division by zero and invalid values, which were causing the average loss to become NaN during training.
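The epsilon guard described in this commit message can be sketched as follows. This is a minimal NumPy illustration, not the code from the PR: the function name `sample_indices_and_weights`, the `alpha`/`beta` defaults, and the value of `EPS` are all assumptions, since the PR text does not show the actual implementation of `PrioritizedReplayBuffer.sample()`.

```python
import numpy as np

EPS = 1e-8  # assumed value; the PR does not state which epsilon was used


def sample_indices_and_weights(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Hypothetical stand-in for the sampling math in a prioritized replay buffer."""
    rng = np.random.default_rng() if rng is None else rng
    priorities = np.asarray(priorities, dtype=np.float64)

    # Adding EPS before exponentiation keeps every scaled priority strictly
    # positive, so the normalizing sum below can never be zero.
    scaled = (priorities + EPS) ** alpha
    probs = scaled / scaled.sum()

    indices = rng.choice(len(priorities), size=batch_size, p=probs)

    # Importance sampling weights: (N * P(i)) ** (-beta), normalized by the max.
    # probs[indices] > 0, so the negative power cannot produce inf or NaN.
    weights = (len(priorities) * probs[indices]) ** (-beta)
    weights = weights / (weights.max() + EPS)
    return indices, weights
```

With all priorities equal to zero, an unguarded version would compute `0 / 0` and propagate NaN into the loss; here the probabilities fall back to a uniform distribution and the weights stay finite.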

- Modified RainbowAgent.__init__ to prevent passing 'n_step' or other Rainbow-specific parameters to DQNAgent, resolving the TypeError.
- Updated tests in test_rainbow_components.py to use 'base_n_step' instead of the legacy 'n_step' parameter when instantiating RainbowAgent.
- Corrected assertion in test_rainbow_agent_initialization_standard to check agent.n_step_buffer.base_n_step instead of a non-existent agent.n_step.

These changes fix the agent initialization error and related test failures. A pre-existing TypeError in test_rainbow_agent_update_model_call_order_noisy (MagicMock issue) remains and is unrelated to these changes.
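One way the initialization fix could look is sketched below. The class bodies, the constructor signatures, the `_RAINBOW_ONLY` set, and the `NStepBuffer` class are hypothetical stand-ins; only the names `RainbowAgent`, `DQNAgent`, `base_n_step`, and `n_step_buffer` come from the commit message, and the sketch assumes `RainbowAgent` subclasses `DQNAgent`.

```python
class DQNAgent:
    """Hypothetical base agent that accepts only its own parameters."""

    def __init__(self, state_dim, action_dim, lr=1e-4, gamma=0.99):
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.lr = lr
        self.gamma = gamma


class NStepBuffer:
    """Hypothetical holder for the n-step setting."""

    def __init__(self, base_n_step):
        self.base_n_step = base_n_step


class RainbowAgent(DQNAgent):
    # Assumed list of keys that the base class does not understand.
    _RAINBOW_ONLY = {"n_step", "base_n_step", "use_noisy", "use_distributional"}

    def __init__(self, state_dim, action_dim, base_n_step=3, **kwargs):
        # Forward only the keys DQNAgent knows about; passing 'n_step' through
        # would raise "TypeError: unexpected keyword argument".
        base_kwargs = {k: v for k, v in kwargs.items() if k not in self._RAINBOW_ONLY}
        super().__init__(state_dim, action_dim, **base_kwargs)
        self.n_step_buffer = NStepBuffer(base_n_step)
```

Under these assumptions, a test would read the setting through `agent.n_step_buffer.base_n_step`, because the agent itself has no `n_step` attribute.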

Changed the script from being run directly by path to being run as a module, which resolves the relative import problem.

…/snake_rl into fix/replay-buffer-nan-loss

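- Added the optimized configuration file optimized_config.py containing the tuned hyperparameters
- Modified agent.py to support Huber Loss and stricter gradient clipping
- Added train_optimized.py for the optimized training workflow
- Added quick_test_optimization.py to quickly verify the effect of the optimizations
- Created optimization_plan.md to document the optimization strategy and plan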
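The Huber loss and gradient clipping mentioned above can be illustrated with a framework-agnostic NumPy sketch. This is not the code in agent.py: the function names, the `delta` default, and the global-norm clipping strategy are assumptions made for illustration. If the project uses PyTorch (also an assumption), the usual counterparts would be `torch.nn.HuberLoss` and `torch.nn.utils.clip_grad_norm_`.

```python
import numpy as np


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    err = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    abs_err = np.abs(err)
    quadratic = 0.5 * err ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return float(np.mean(np.where(abs_err <= delta, quadratic, linear)))


def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm."""
    grads = [np.asarray(g, dtype=np.float64) for g in grads]
    total_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    # The small constant avoids division by zero when all gradients are zero.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm
```

Compared with a squared error, the linear tail limits how much a single large TD error can contribute to the gradient, and a smaller `max_norm` caps the size of each update step.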

google-labs-jules bot and others added 5 commits June 21, 2025 03:03

fix: run the training script as a module to avoid relative import errors

786f342

Changed the script from being run directly by path to being run as a module, which resolves the relative import problem.

Merge branch 'fix/replay-buffer-nan-loss' of https://github.com/whtoo…

556370d

…/snake_rl into fix/replay-buffer-nan-loss

whtoo merged commit e4500ce into main Jun 21, 2025
1 check failed

whtoo deleted the fix/replay-buffer-nan-loss branch June 21, 2025 06:01

whtoo merged 5 commits into main from fix/replay-buffer-nan-loss