Add support for gradient clipping #94
Conversation
/build_and_test

🚀 Build workflow triggered! View run

✅ Build workflow passed! View run
romerojosh left a comment
Thanks for adding this feature!
Left a small comment about SAC grad clipping. There are also some merge conflicts that need to be resolved after merging in #93.
```cpp
if (alpha_lr_scheduler) {
  alpha_lr_scheduler->step();
  // I think we do not need grad clipping here
  // the model is just a scalar and clipping the grad for that might hurt
```
I noted this in the previous version of this MR. What does stable-baselines do in this case (if they support grad clipping)? We should match that as a reference if we can.
Yes, so stable-baselines supports neither gradient accumulation nor gradient clipping, with the exception of clipping in PPO.
However, I think gradient accumulation makes sense in general for torchfort because of its online nature: it lets us use larger effective batch sizes without a replay buffer.
Concerning grad clipping: it is optional, and I do not see how it could hurt to have support for it. We can add it here as well. A sketch of both ideas follows below.
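For illustration, here is a minimal libtorch sketch of that combination, assuming a toy model and optimizer: gradients are accumulated over several online samples to emulate a larger batch, then optionally clipped before a single optimizer step. The names and values (`accumulation_steps`, `max_grad_norm`) are placeholders, not torchfort's actual API.

```cpp
#include <torch/torch.h>

int main() {
  // Placeholder model and optimizer; torchfort would use its own.
  auto model = torch::nn::Linear(4, 1);
  torch::optim::Adam optimizer(model->parameters(), torch::optim::AdamOptions(1e-3));

  const int64_t accumulation_steps = 4;  // illustrative value
  const double max_grad_norm = 1.0;      // a value <= 0 could mean "no clipping"

  for (int64_t step = 0; step < 16; ++step) {
    // Stand-in for one online sample arriving from the simulation.
    auto input = torch::randn({8, 4});
    auto target = torch::randn({8, 1});

    // Scale by 1/accumulation_steps so the summed gradients match the
    // average gradient over one large batch.
    auto loss = torch::mse_loss(model->forward(input), target) / accumulation_steps;
    loss.backward();  // gradients accumulate until zero_grad()

    if ((step + 1) % accumulation_steps == 0) {
      if (max_grad_norm > 0.0) {
        // Optional: clip the global L2 norm of all gradients before stepping.
        torch::nn::utils::clip_grad_norm_(model->parameters(), max_grad_norm);
      }
      optimizer.step();
      optimizer.zero_grad();
    }
  }
  return 0;
}
```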
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Force-pushed from f510a76 to cc498c8
/build_and_test

🚀 Build workflow triggered! View run

✅ Build workflow passed! View run
This MR adds gradient clipping to all training procedures.
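To make the SAC discussion above concrete, here is a minimal sketch of clipping the gradient of a single scalar parameter, the log of the entropy coefficient, before its optimizer step; for a scalar, clipping simply caps the gradient's magnitude. The names `log_alpha` and `target_entropy`, and the temperature loss used here, are illustrative assumptions and not taken from torchfort.

```cpp
#include <torch/torch.h>

int main() {
  // Single scalar parameter, as in SAC's entropy-coefficient update.
  auto log_alpha = torch::zeros({1}, torch::requires_grad());
  torch::optim::Adam alpha_optimizer(std::vector<torch::Tensor>{log_alpha},
                                     torch::optim::AdamOptions(3e-4));

  const float target_entropy = -4.0f;      // e.g. -(action dimension)
  auto log_prob = torch::tensor({-3.0f});  // stand-in for the policy's log-prob

  alpha_optimizer.zero_grad();
  // Standard SAC temperature loss: -log_alpha * (log_prob + target_entropy).
  auto alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean();
  alpha_loss.backward();

  // For a scalar, clipping just caps |d(loss)/d(log_alpha)| at max_norm,
  // so making it optional is cheap and, as discussed above, should not hurt.
  torch::nn::utils::clip_grad_norm_(std::vector<torch::Tensor>{log_alpha},
                                    /*max_norm=*/1.0);

  alpha_optimizer.step();
  return 0;
}
```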