alversafa/meta-rl

Meta Reinforcement Learning

Simple implementation of meta reinforcement learning experiments in PyTorch.

Dependencies: PyTorch 0.4, NumPy, Matplotlib

Experiments:

  • depbandit_1.ipynb: Dependent bandit experiment I from the paper "Learning to reinforcement learn" by J. Wang et al.

    In this experiment, the agent is trained on a distribution of dependent 2-armed bandits. It is then asked to solve particular 2-armed bandits in which the rewarding arm changes in each test episode.

  • depbandit_2.ipynb: Dependent bandit experiment II from the paper "Learning to reinforcement learn" by J. Wang et al.

    In this experiment, the agent is trained on a distribution of dependent 11-armed bandits with deterministic payouts. The nine non-target arms each give a reward of 1, the single target arm gives a reward of 5, and the eleventh arm gives an informative reward (<1) equal to one tenth of the target arm's index (for example, 0.2 when the target arm is 2). The agent is expected to pay a short-term reward cost to gain information about the target arm and then keep pulling it.
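The two bandit families described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the notebooks: the class names, the `pull` method, and the `informed_episode` helper are hypothetical. The 2-armed bandit assumes the arms are dependent in the sense that their payout probabilities sum to 1, and the 11-armed bandit assumes 1-based arm indices, which matches the example of an informative reward of 0.2 for target arm 2.

```python
import random


class DependentTwoArmedBandit:
    """Hypothetical 2-armed Bernoulli bandit with dependent arms."""

    def __init__(self, p_first_arm, rng=None):
        self.rng = rng if rng is not None else random.Random()
        # Assumption: the arm probabilities sum to 1, so knowing one
        # arm's payout probability determines the other's.
        self.probs = (p_first_arm, 1.0 - p_first_arm)

    def pull(self, arm):
        """Return a reward of 1.0 or 0.0 for arm 0 or arm 1."""
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


class InformativeArmBandit:
    """Hypothetical 11-armed bandit with deterministic payouts.

    Arms 1..10 are candidate arms (assumed 1-based) and arm 11 is the
    informative arm.
    """

    INFORMATIVE_ARM = 11

    def __init__(self, target_arm):
        if not 1 <= target_arm <= 10:
            raise ValueError("target_arm must be in 1..10")
        self.target_arm = target_arm

    def pull(self, arm):
        if arm == self.INFORMATIVE_ARM:
            # One tenth of the target arm's index, e.g. 0.2 for target 2.
            return self.target_arm / 10.0
        if arm == self.target_arm:
            return 5.0
        return 1.0  # any of the nine non-target arms


def informed_episode(env, num_trials):
    """Pull the informative arm once, decode the target, then exploit it."""
    hint = env.pull(InformativeArmBandit.INFORMATIVE_ARM)
    target = round(hint * 10)
    total = hint
    for _ in range(num_trials - 1):
        total += env.pull(target)
    return total
```

The `informed_episode` helper spells out the strategy the agent is expected to discover: it accepts a low reward on the first trial in exchange for knowing which arm pays 5 on every later trial. A policy that never pulls the informative arm has to search the ten candidate arms one by one instead.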
