Rationale for KL decay #44

@Rolv-Arild

Hi, just a small question about the choice of making the KL term decay in importance.

In the paper, you describe how high values of the KL divergence coefficient constrain the policy too much, while low values let it forget useful behavior learned by the BC model. That is why you decay the coefficient gradually.
I'm just wondering: can't this lead to the worst of both worlds, where the coefficient only sits in a good range for a brief period? Did you try holding it constant at an intermediate value instead of decaying it?
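
To make the concern concrete, here is a minimal sketch comparing the two options. It assumes an exponential decay, and every number in it (initial value, decay rate, "good" range, step count) is a made-up placeholder, not a value from the paper or this repo. The function names are hypothetical too.

```python
def decayed_kl_coef(step, initial=1.0, decay=0.999):
    """Exponentially decaying KL coefficient (hypothetical schedule and values)."""
    return initial * decay ** step


def constant_kl_coef(step, value=0.1):
    """Constant in-between KL coefficient (hypothetical value)."""
    return value


def steps_in_range(schedule, low, high, total_steps):
    """Count the training steps where the coefficient lies in [low, high]."""
    return sum(1 for t in range(total_steps) if low <= schedule(t) <= high)


if __name__ == "__main__":
    total = 5000
    low, high = 0.05, 0.2  # assumed "good" range, purely illustrative
    print("decayed: ", steps_in_range(decayed_kl_coef, low, high, total))
    print("constant:", steps_in_range(constant_kl_coef, low, high, total))
```

With these placeholder numbers, the decayed schedule spends only part of training inside the assumed range, while the constant one stays inside it throughout. Whether a fixed "good" range exists at all is of course part of what I'm asking.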
