
Agent Configuration

LucaFalasca edited this page May 19, 2024 · 2 revisions

This documentation describes how to configure the JSON file for a reinforcement learning (RL) agent, using the provided patterns for the various parameters and options.

Configuration File Structure

The configuration JSON file contains various parameters to define the behavior and architecture of the RL agent. Here is an example configuration:

{
    "agent_type": "dqn",
    "activation_layer": "relu",
    "fc_layer_params": [30, 20, 10],
    "learning_rate": 0.0001,
    "eps_greedy_bolz_choose": "eps_greedy",
    "epsilon_greedy": 0.1,
    "boltzmann_temperature": 1.0,
    "gamma": 0.99,
    "error_loss_fn": "squared_loss",
    "optimizer": "adam",
    "rsmprop_momentum": 0.0,
    "rsmprop_rho": 0.9,
    "adam_beta_1": 0.9,
    "adam_beta_2": 0.999,
    "adagrad_initial_accumulator_value": 0.1,
    "rb_max_length": 1000,
    "rb_batch_size": 32,
    "rb_train_freq": 2,
    "rb_sample_batch_size": 4
}
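A configuration like the one above can be loaded with the standard `json` module. The sketch below is illustrative: the required-key set and the range check on `gamma` are my own assumptions, not validation rules taken from the project source.

```python
import json

# Hedged sketch: key names follow the example configuration above; the
# validation rules are illustrative, not taken from the project source.
REQUIRED_KEYS = {"agent_type", "learning_rate", "gamma", "optimizer"}

def validate_config(config):
    """Check that a few core keys are present and sensibly valued."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing configuration keys: {sorted(missing)}")
    if not 0.0 <= config["gamma"] <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return config

def load_agent_config(path):
    """Parse the JSON file and validate its contents."""
    with open(path) as f:
        return validate_config(json.load(f))
```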

Configuration Parameters

agent_type

Description: Specifies the type of RL agent to use.
Acceptable Values:
    dqn: Deep Q-Network.
    random: Random agent.
    Pattern: dqn_agent|dqnagent|dqn|dqn-agent|random_agent|randomagent|random|random-agent.

activation_layer

Description: Defines the type of activation function used in the neural network.
Acceptable Values:
    relu: Rectified Linear Unit.
    tanh: Hyperbolic Tangent.
    sigmoid: Sigmoid Function.
    linear: Linear Function.
    softmax: Softmax Function.
    Pattern: relu|tanh|sigmoid|linear|softmax.
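For reference, the activation functions listed above can be written as follows (scalar versions; softmax operates on a whole vector, so it is shown separately). This is a plain-Python illustration, not the project's implementation.

```python
import math

def relu(x):    return max(0.0, x)          # Rectified Linear Unit
def tanh(x):    return math.tanh(x)         # Hyperbolic tangent
def sigmoid(x): return 1.0 / (1.0 + math.exp(-x))
def linear(x):  return x                    # Identity

def softmax(xs):
    """Normalized exponentials over a vector (max-subtracted for stability)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]
```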

fc_layer_params

Description: List of integers defining the size of the fully connected layers in the neural network.
Example: [30, 20, 10].
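To make the meaning of fc_layer_params concrete, the sketch below computes the weight-matrix shapes of the resulting network in a framework-agnostic way. The input dimension and number of actions are hypothetical parameters, not part of the configuration file.

```python
# fc_layer_params = [30, 20, 10] defines three hidden fully connected
# layers; the input and output dimensions come from the environment.
def layer_shapes(input_dim, fc_layer_params, num_actions):
    """Return the (in, out) shape of each weight matrix in the network."""
    dims = [input_dim, *fc_layer_params, num_actions]
    return list(zip(dims[:-1], dims[1:]))
```

For example, an 8-dimensional observation and 4 actions with `[30, 20, 10]` yield four weight matrices: 8×30, 30×20, 20×10, and 10×4.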

learning_rate

Description: Learning rate for optimization.
Type: Number (float).
Example: 0.0001.

eps_greedy_bolz_choose

Description: Method to choose between epsilon-greedy and Boltzmann.
Acceptable Values:
    eps_greedy: Epsilon-greedy method.
    boltzmann: Boltzmann method.
    Pattern: epsilon_greedy|epsilon-greedy|epsilon|greedy|epsilongreedy|eps_greedy|eps-greedy|eps|greedy|epsilon-greedy-bolz-choose|epsilon_greedy_bolz_choose|eps_greedy_bolz_choose|eps-greedy-bolz-choose|eps-greedy-bolz|eps_greedy_bolz|eps-greedy-bolz.
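The two exploration strategies this parameter selects between can be sketched as follows. The function names are my own; q_values is a list of action-value estimates.

```python
import math
import random

def eps_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def boltzmann_probs(q_values, temperature):
    """Softmax over q_values / temperature: higher temperature flattens
    the distribution, lower temperature approaches greedy selection."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```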

epsilon_greedy

Description: Epsilon value for the epsilon-greedy policy.
Type: Number (float).
Example: 0.1.

boltzmann_temperature

Description: Temperature for the Boltzmann policy.
Type: Number (float).
Example: 1.0.
Pattern: boltzmann_temperature|boltzmann-temperature|boltzmann|temperature|boltzmanntemperature.

gamma

Description: Discount factor applied to future rewards when computing expected returns.
Type: Number (float).
Example: 0.99.
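As a worked example of what gamma does: the return from step t is r_t + gamma·r_{t+1} + gamma²·r_{t+2} + …, which can be computed by folding backwards over the reward sequence.

```python
def discounted_return(rewards, gamma):
    """G = r_0 + gamma * r_1 + gamma**2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With rewards `[1, 1, 1]` and gamma 0.5 this gives 1 + 0.5 + 0.25 = 1.75.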

error_loss_fn

Description: Loss function used for optimization.
Acceptable Values:
    squared_loss: Squared loss.
    huber_loss: Huber loss.
    Pattern: squared_loss|squaredloss|squared|loss|huber_loss|huberloss|huber|loss.
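The two loss options differ in how they treat large TD errors. The sketch below uses one common convention (with the 1/2 factor); the exact form used by the project is an assumption.

```python
def squared_loss(error):
    """Quadratic everywhere; large errors dominate the gradient."""
    return 0.5 * error * error

def huber_loss(error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond: less sensitive
    to outlier transitions than the squared loss."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - delta / 2.0)
```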

optimizer

Description: Optimizer used to update the neural network weights.
Acceptable Values:
    adam: Adam optimizer.
    sgd: Stochastic Gradient Descent optimizer.
    rmsprop: RMSProp optimizer.
    adagrad: AdaGrad optimizer.
    Pattern: adam|adam_optimizer|adamoptimizer|sgd|sgd_optimizer|sgdoptimizer|rmsprop|rmsprop_optimizer|rmspropoptimizer|adagrad|adagrad_optimizer|adagradoptimizer.

rsmprop_momentum

Description: Momentum used by the RMSProp optimizer.
Type: Number (float).
Example: 0.0.

rsmprop_rho

Description: Discount factor used by the RMSProp optimizer.
Type: Number (float).
Example: 0.9.

adam_beta_1

Description: Beta_1 parameter for the Adam optimizer.
Type: Number (float).
Example: 0.9.

adam_beta_2

Description: Beta_2 parameter for the Adam optimizer.
Type: Number (float).
Example: 0.999.

adagrad_initial_accumulator_value

Description: Initial accumulator value for the AdaGrad optimizer.
Type: Number (float).
Example: 0.1.
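The optimizer-specific keys above only apply to the optimizer named in the optimizer field. A sketch of how they might be routed (the keyword names mirror common Keras-style constructors, but whether this project builds its optimizers that way is an assumption):

```python
# Select the hyperparameter keys relevant to the configured optimizer.
def optimizer_kwargs(config):
    name = config["optimizer"]
    lr = config["learning_rate"]
    if name == "adam":
        return {"learning_rate": lr,
                "beta_1": config["adam_beta_1"],
                "beta_2": config["adam_beta_2"]}
    if name == "rmsprop":
        # Note: the configuration file spells these keys "rsmprop_*".
        return {"learning_rate": lr,
                "momentum": config["rsmprop_momentum"],
                "rho": config["rsmprop_rho"]}
    if name == "adagrad":
        return {"learning_rate": lr,
                "initial_accumulator_value":
                    config["adagrad_initial_accumulator_value"]}
    if name == "sgd":
        return {"learning_rate": lr}
    raise ValueError(f"unknown optimizer: {name}")
```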

rb_max_length

Description: Maximum length of the replay buffer.
Type: Number (int).
Example: 1000.

rb_batch_size

Description: Batch size fetched from the replay buffer for training.
Type: Number (int).
Example: 32.

rb_train_freq

Description: Training frequency of the agent (number of action steps between training).
Type: Number (int).
Example: 2.

rb_sample_batch_size

Description: Sample batch size from the replay buffer.
Type: Number (int).
Example: 4.
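The rb_* parameters above fit together roughly as follows: rb_max_length caps how many transitions are kept, rb_sample_batch_size is how many are drawn at a time, and rb_train_freq means the agent trains once every rb_train_freq action steps. A minimal buffer sketch (not the project's implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions with uniform sampling."""

    def __init__(self, max_length):
        # Oldest transitions are evicted once max_length is reached.
        self.buffer = deque(maxlen=max_length)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        """Draw batch_size transitions uniformly without replacement."""
        return random.sample(list(self.buffer), batch_size)
```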

Note on Search Patterns

The search patterns in this configuration are regular expressions used to recognize the various parameter values. For example, the pattern for recognizing the DQN agent is dqn_agent|dqnagent|dqn|dqn-agent, meaning any string matching one of these alternatives is interpreted as a DQN agent.
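The pattern matching can be reproduced with Python's `re` module, using `re.fullmatch` so that only whole-string matches count (with `re.match`, an unrelated string such as "dqnx" would match the "dqn" alternative as a prefix). The pattern below is the agent_type pattern from this page.

```python
import re

AGENT_TYPE_PATTERN = (
    r"dqn_agent|dqnagent|dqn|dqn-agent"
    r"|random_agent|randomagent|random|random-agent"
)

def matches_agent_type(value):
    """True if value is one of the accepted agent_type spellings."""
    return re.fullmatch(AGENT_TYPE_PATTERN, value) is not None
```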
