Agent Configuration
This documentation describes how to configure the JSON file for a reinforcement learning (RL) agent, using the provided recognition patterns for the various parameters and options.
Configuration File Structure
The configuration JSON file contains the parameters that define the behavior and architecture of the RL agent. Here is an example configuration:
```json
{
  "agent_type": "dqn",
  "activation_layer": "relu",
  "fc_layer_params": [30, 20, 10],
  "learning_rate": 0.0001,
  "eps_greedy_bolz_choose": "eps_greedy",
  "epsilon_greedy": 0.1,
  "boltzmann_temperature": 1.0,
  "gamma": 0.99,
  "error_loss_fn": "squared_loss",
  "optimizer": "adam",
  "rsmprop_momentum": 0.0,
  "rsmprop_rho": 0.9,
  "adam_beta_1": 0.9,
  "adam_beta_2": 0.999,
  "adagrad_initial_accumulator_value": 0.1,
  "rb_max_length": 1000,
  "rb_batch_size": 32,
  "rb_train_freq": 2,
  "rb_sample_batch_size": 4
}
```
agent_type
Description: Specifies the type of RL agent to use.
Acceptable Values:
dqn: Deep Q-Network.
random: Random agent.
Pattern: dqn_agent|dqnagent|dqn|dqn-agent|random_agent|randomagent|random|random-agent.
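As a minimal sketch, the pattern above can be applied with Python's `re` module to recognize the configured agent type (the helper name `is_dqn` is illustrative, not part of the project):

```python
import re

# The recognition pattern for agent_type quoted in this document.
AGENT_TYPE_PATTERN = re.compile(
    r"dqn_agent|dqnagent|dqn|dqn-agent|random_agent|randomagent|random|random-agent"
)

def is_dqn(agent_type: str) -> bool:
    """Return True when the configured string denotes a DQN agent."""
    m = AGENT_TYPE_PATTERN.fullmatch(agent_type.lower())
    return m is not None and m.group(0).startswith("dqn")
```

Using `fullmatch` (rather than `search`) ensures the whole string is one of the listed alternatives, so e.g. "dqn-agent" matches even though the shorter alternative "dqn" appears earlier in the pattern.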
activation_layer
Description: Defines the type of activation function used in the neural network.
Acceptable Values:
relu: Rectified Linear Unit.
tanh: Hyperbolic Tangent.
sigmoid: Sigmoid Function.
linear: Linear Function.
softmax: Softmax Function.
Pattern: relu|tanh|sigmoid|linear|softmax.
fc_layer_params
Description: List of integers defining the size of the fully connected layers in the neural network.
Example: [30, 20, 10].
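To illustrate how the list translates into network structure, the sketch below computes the weight-matrix shapes of the resulting fully connected network. The observation size and action count are illustrative assumptions; they come from the environment, not from this configuration file:

```python
def dense_layer_shapes(obs_dim, fc_layer_params, num_actions):
    """Weight-matrix shapes for a fully connected Q-network (a sketch)."""
    sizes = [obs_dim, *fc_layer_params, num_actions]
    # Each consecutive pair of sizes defines one dense layer's weight matrix.
    return [(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]
```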
learning_rate
Description: Learning rate for optimization.
Type: Number (float).
Example: 0.0001.
eps_greedy_bolz_choose
Description: Method to choose between the epsilon-greedy and Boltzmann exploration policies.
Acceptable Values:
eps_greedy: Epsilon-greedy method.
boltzmann: Boltzmann method.
Pattern: epsilon_greedy|epsilon-greedy|epsilon|greedy|epsilongreedy|eps_greedy|eps-greedy|eps|greedy|epsilon-greedy-bolz-choose|epsilon_greedy_bolz_choose|eps_greedy_bolz_choose|eps-greedy-bolz-choose|eps-greedy-bolz|eps_greedy_bolz|eps-greedy-bolz.
epsilon_greedy
Description: Epsilon value for the epsilon-greedy policy.
Type: Number (float).
Example: 0.1.
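A minimal sketch of how the epsilon value drives action selection (this is the standard epsilon-greedy rule, not necessarily the project's exact implementation):

```python
import random

def epsilon_greedy_action(q_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```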
boltzmann_temperature
Description: Temperature for the Boltzmann policy.
Type: Number (float).
Example: 1.0.
Pattern: boltzmann_temperature|boltzmann-temperature|boltzmann|temperature|boltzmanntemperature.
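The temperature controls how sharply the Boltzmann policy concentrates on high-valued actions; a standard softmax sketch (assumed, not the project's exact code):

```python
import math

def boltzmann_probs(q_values, temperature=1.0):
    """Softmax over Q-values; lower temperature -> greedier distribution."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```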
gamma
Description: Discount factor for future expected value calculation.
Type: Number (float).
Example: 0.99.
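For reference, the discount factor enters the return as G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..., which can be computed backwards over a reward sequence:

```python
def discounted_return(rewards, gamma=0.99):
    """G_t = sum over k of gamma^k * r_{t+k}, accumulated back to front."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```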
error_loss_fn
Description: Loss function used for optimization.
Acceptable Values:
squared_loss: Squared loss.
huber_loss: Huber loss.
Pattern: squared_loss|squaredloss|squared|loss|huber_loss|huberloss|huber|loss.
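The two losses differ in how they treat large TD errors: squared loss grows quadratically, while Huber loss becomes linear beyond a threshold and is therefore less sensitive to outliers. A sketch of the standard definitions (the delta threshold is an assumption; the project may use a different value):

```python
def squared_loss(err):
    """0.5 * err^2 everywhere."""
    return 0.5 * err * err

def huber_loss(err, delta=1.0):
    """Quadratic near zero, linear beyond delta."""
    a = abs(err)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)
```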
optimizer
Description: Optimizer used to update the neural network weights.
Acceptable Values:
adam: Adam optimizer.
sgd: Stochastic Gradient Descent optimizer.
rmsprop: RMSProp optimizer.
adagrad: AdaGrad optimizer.
Pattern: adam|adam_optimizer|adamoptimizer|sgd|sgd_optimizer|sgdoptimizer|rmsprop|rmsprop_optimizer|rmspropoptimizer|adagrad|adagrad_optimizer|adagradoptimizer.
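The optimizer pattern above can be regrouped per optimizer name; a sketch of mapping a configured string to a canonical name (the helper and its error handling are illustrative assumptions):

```python
import re

# Canonical names keyed by a per-optimizer regrouping of the pattern above.
_OPTIMIZER_PATTERNS = {
    "adam": r"adam(_optimizer|optimizer)?",
    "sgd": r"sgd(_optimizer|optimizer)?",
    "rmsprop": r"rmsprop(_optimizer|optimizer)?",
    "adagrad": r"adagrad(_optimizer|optimizer)?",
}

def canonical_optimizer(name):
    """Map a configured optimizer string to its canonical name."""
    for canon, pattern in _OPTIMIZER_PATTERNS.items():
        if re.fullmatch(pattern, name.lower()):
            return canon
    raise ValueError(f"unknown optimizer: {name!r}")
```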
rsmprop_momentum
Description: Momentum used by the RMSProp optimizer.
Type: Number (float).
Example: 0.0.
rsmprop_rho
Description: Discount factor used by the RMSProp optimizer.
Type: Number (float).
Example: 0.9.
adam_beta_1
Description: Beta_1 parameter for the Adam optimizer.
Type: Number (float).
Example: 0.9.
adam_beta_2
Description: Beta_2 parameter for the Adam optimizer.
Type: Number (float).
Example: 0.999.
adagrad_initial_accumulator_value
Description: Initial accumulator value for the AdaGrad optimizer.
Type: Number (float).
Example: 0.1.
rb_max_length
Description: Maximum length of the replay buffer.
Type: Number (int).
Example: 1000.
rb_batch_size
Description: Batch size fetched from the replay buffer for training.
Type: Number (int).
Example: 32.
rb_train_freq
Description: Training frequency of the agent (number of action steps between training updates).
Type: Number (int).
Example: 2.
rb_sample_batch_size
Description: Sample batch size from the replay buffer.
Type: Number (int).
Example: 4.
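Together, the rb_* settings describe a fixed-capacity replay buffer that is sampled during training. A minimal sketch of such a buffer (an illustration, not the project's actual implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer mirroring the rb_* settings above."""

    def __init__(self, rb_max_length=1000):
        self.storage = deque(maxlen=rb_max_length)

    def add(self, transition):
        self.storage.append(transition)  # oldest entries are evicted at capacity

    def sample(self, rb_sample_batch_size=4):
        """Draw a uniform random batch of stored transitions."""
        return random.sample(self.storage, rb_sample_batch_size)
```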
The search patterns used in this configuration recognize parameter values through regular expressions. For example, the pattern for recognizing the DQN agent is dqn_agent|dqnagent|dqn|dqn-agent, meaning any string matching one of these alternatives is interpreted as a DQN agent.
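Finally, a sketch of loading such a configuration file and merging it over defaults. The helper name and the default values shown are assumptions for illustration; defaults should mirror the project's actual fallbacks:

```python
import json

# Illustrative defaults; the project's real fallback values may differ.
DEFAULTS = {"agent_type": "dqn", "learning_rate": 0.0001, "gamma": 0.99}

def load_config(text):
    """Parse a JSON configuration string, filling missing keys from DEFAULTS."""
    cfg = dict(DEFAULTS)
    cfg.update(json.loads(text))
    return cfg
```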