This repository contains implementations of agents using Nervana Systems' Reinforcement Learning Coach, and Bayesian optimization implementations built on Sheffield's GPyOpt.
These scripts require Linux.
There is also a heavy reliance on pandas to manage the .csv files used for logging data during training. This is implemented in such a way that, while running Bayesian optimization, the log files of a failed run are removed if the environment/agent crashes for any reason.
This repository was put together quite quickly and not everything has been extensively tested. However, the premise behind the code works: Bayesian optimization of RL Coach agents for any arbitrary environment, provided the environment uses either the OpenAI Gym or the RL Coach interface.
Optimal hyperparameters are found by maximizing the following objective:

    f(x) = (1/N) * sum_{i=1..N} R_i

where f(x) is defined as the averaged sum of 'Training rewards' R_i over all N episodes in a training cycle.
Here are some reasons for choosing this metric:
- Very easy to implement
- Quantifies learning rate: faster is better
- Quantifies stable learning: continual progress is better
- Quantifies asymptotic performance: the higher the final performance, the better
Of course there is a possibility that a local maximum is found, but in my experience this metric is sufficient for achieving acceptable results.
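As a concrete sketch of this metric: assuming the training log is a .csv file with one 'Training Reward' value per episode (the column name is an assumption about the RL Coach log format, not verified here), the objective could be computed with pandas like so:

```python
import pandas as pd

def objective_from_log(log_path, reward_col="Training Reward"):
    """Averaged sum of per-episode training rewards: sum(R_i) / N.

    The column name 'Training Reward' is assumed; adjust it to match
    the actual log files produced by the agent.
    """
    rewards = pd.read_csv(log_path)[reward_col].dropna()
    # mean() is exactly the "averaged sum" from the formula above
    return rewards.mean()
```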
Using the GPyOpt library, a Gaussian process Bayesian optimizer is constructed. The two mandatory parameters that must be passed are:
- `domain`: the boundary definitions of the hyper-parameter set, as a list of dicts.
- `f`: the objective function, which can be implemented in Python as an ordinary function, e.g. `def run_ai(x): do_stuff(); return y`.
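A minimal sketch of this construction, assuming GPyOpt's `BayesianOptimization` interface; the hyper-parameter names and ranges in `bounds` are placeholders, and `run_ai` returns a dummy value rather than training a real agent:

```python
import numpy as np

# Hypothetical search space: the names and ranges are illustrative only.
bounds = [
    {"name": "learning_rate", "type": "continuous", "domain": (1e-5, 1e-2)},
    {"name": "discount", "type": "continuous", "domain": (0.9, 0.999)},
]

def run_ai(x):
    # GPyOpt passes a 2-D array of shape (1, n_params). A real
    # implementation would train the agent here and return the objective;
    # GPyOpt minimizes by default, so the reward is negated.
    return -np.sum(x, axis=1, keepdims=True)  # dummy stand-in value

def make_optimizer():
    import GPyOpt  # third-party dependency
    return GPyOpt.methods.BayesianOptimization(
        f=run_ai,
        domain=bounds,
        acquisition_type="EI",  # Expected Improvement (the default choice)
    )
```

A call such as `make_optimizer().run_optimization(max_iter=20)` would then evaluate `run_ai` for 20 further parameter sets.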
The acquisition function used to determine the next choice of hyper-parameters is, by default, Expected Improvement.
The algorithm/function defined here actually performs 3 steps:
- It writes the new parameters to an opt_params.csv file.
- It calls the agent .py script and waits for it to finish running.
- After successful execution of the agent script, it reads the log file and returns the summed total reward over all episodes. If the agent script crashes, it deletes the incomplete training data and exits.
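The three steps above might look roughly like this in Python. The agent script name, the opt_params.csv layout, and the 'Training Reward' column name are all assumptions for illustration, not the repo's exact files:

```python
import os
import subprocess
import sys
import pandas as pd

AGENT_SCRIPT = "agent.py"      # hypothetical path to the RL Coach agent script
PARAMS_FILE = "opt_params.csv"

def write_params(x, names):
    # Step 1: append the new hyper-parameter set for the agent to pick up.
    row = pd.DataFrame([dict(zip(names, x))])
    row.to_csv(PARAMS_FILE, mode="a",
               header=not os.path.exists(PARAMS_FILE), index=False)

def total_reward(log_path, reward_col="Training Reward"):
    # Step 3: sum the per-episode rewards from the agent's log file.
    return pd.read_csv(log_path)[reward_col].sum()

def run_agent(log_path):
    # Step 2: run the agent and wait for it to finish. On a crash, remove
    # the incomplete log and abort, mirroring the behaviour described above.
    result = subprocess.run([sys.executable, AGENT_SCRIPT])
    if result.returncode != 0:
        if os.path.exists(log_path):
            os.remove(log_path)
        sys.exit("Agent crashed; removed incomplete log and stopped.")
    return total_reward(log_path)
```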
In the agent script, the opt_params.csv file is read and the latest hyper-parameter entry is used to construct a new agent.
The new agent is trained for a predefined number of iterations, and upon completion the hyper-parameter optimization process resumes.
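On the agent side, reading the newest hyper-parameter entry could be as simple as the following sketch (the file layout is assumed, with one parameter set per row, newest last):

```python
import pandas as pd

def load_latest_params(path="opt_params.csv"):
    # The last row holds the most recently written hyper-parameter set.
    return pd.read_csv(path).iloc[-1].to_dict()
```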
The output of the bayesopt.py script is the optimization_parameters.csv file. Agents implemented in Reinforcement Learning Coach automatically generate log files that are used both by the Dashboard app that ships with RL Coach and by the optimization script implemented in this repo.
This means that any agent realized in RL Coach can easily be optimized using bayesopt.py; all that is required is having the agent load new parameters from the opt_params.csv file and defining the boundaries of the hyper-parameter search space.
Future goals, time permitting:
- Implement multi-agent optimization techniques
- Implement CMA-ES as an alternate optimization method