Open
Description
Hi, just a small question about the choice to make the KL term decay in importance during training.
In the paper you explain that high values of the KL divergence coefficient limit the policy, while low values make it forget useful behavior from the BC model, so you set the coefficient to decay gradually.
I'm wondering whether this could lead to the worst of both worlds, where the coefficient only passes through a good range for a brief period. Did you try setting it to a constant in-between value instead of decaying it?
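
To make the concern concrete, here is a toy sketch of what I mean. Everything in it is an assumption for illustration only: the exponential schedule, the decay rate, the "good range" bounds, and the function names are made up and are not taken from the paper or the code. It just measures what fraction of training steps each schedule spends with the coefficient inside a hypothetical good range.

```python
# Toy comparison of a decaying vs. a constant KL coefficient.
# All values below are hypothetical and chosen only to illustrate the question.

def decayed_kl_coef(step, initial=1.0, decay=0.999):
    """Exponentially decayed coefficient (assumed schedule, not the paper's)."""
    return initial * decay ** step


def constant_kl_coef(step, value=0.1):
    """Constant in-between coefficient; ignores the step."""
    return value


def fraction_in_range(schedule, total_steps, low, high):
    """Fraction of steps where the coefficient lies within [low, high]."""
    hits = sum(1 for step in range(total_steps) if low <= schedule(step) <= high)
    return hits / total_steps


total_steps = 10_000
low, high = 0.05, 0.2  # hypothetical "good range" for the coefficient

decay_frac = fraction_in_range(decayed_kl_coef, total_steps, low, high)
const_frac = fraction_in_range(constant_kl_coef, total_steps, low, high)

print(f"decaying schedule: {decay_frac:.1%} of steps in range")
print(f"constant schedule: {const_frac:.1%} of steps in range")
```

With these made-up numbers the decaying schedule spends only a small part of training inside the range, while a well-chosen constant stays in it the whole time. Of course this assumes such a fixed good range exists and is known in advance, which may be exactly what the decay is meant to avoid.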