A reinforcement learning environment made using Godot and Godot RL Agents. Both players are controlled using a single RL agent (it receives observations from both players, and sends actions for both players).
Demo video: `2p_coop_box_sort_env.mp4`
- Open the project in Godot Engine (made with 4.5.beta6.mono on Windows; it may work with other versions too).
- Find the test scene (`scenes/test_scene/testing_scene.tscn`) and open it in the Godot Editor.
- Press 'F6' to start the scene.
You should see the trained agent solving the environment using ONNX inference. If you wish to train your own agent, refer to the Godot RL Agents repository; check its tutorial section to learn more.
Two players must cooperate to push boxes into correct goals based on the category within a time limit. The boxes must not reach the limit (red area).
The RL agent receives data from position sensors, raycast sensors, and current fraction of the allowed time.
First robot:
- Box checkpoint (not visible, located at the opening between the two rooms)

Second robot:
- Green goal
- Yellow goal
- Box checkpoint
- Collision shape of the limit (red area)
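As a hedged illustration of how these readings could be combined with the raycast distances and time fraction mentioned above, a flattened observation might be assembled like this (sensor counts, ordering, and names are assumptions, not the project's actual layout):

```python
# Illustrative only: assembles one flat observation vector from the three
# source categories described above. Counts and ordering are assumptions.
def build_observation(positions, ray_distances, elapsed, time_limit):
    # positions: list of (x, y) tuples from the position sensors
    # ray_distances: distances reported by the raycast sensors
    # elapsed / time_limit: current fraction of the allowed episode time
    obs = []
    for x, y in positions:
        obs.extend([x, y])
    obs.extend(ray_distances)
    obs.append(elapsed / time_limit)
    return obs

print(build_observation([(0.1, 0.2)], [1.0, 0.5], 30.0, 120.0))
```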
There are multiple raycast sensors attached to each robot. They provide distances to the walls, as well as to the yellow and green boxes.
Each sensor reacts to only a single physics layer, so e.g. the wall raycast sensor reports only the distances to the walls, ignoring any occluding boxes.
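The per-layer filtering can be sketched in plain Python (a conceptual model, not Godot's raycast API): each sensor scans only objects on its own layer and reports the nearest hit, so closer objects on other layers do not shadow its reading.

```python
# Conceptual sketch of layer-filtered raycasts: each sensor considers only
# one physics layer, so objects on other layers never occlude its reading.
def ray_distance(objects_along_ray, layer, max_range=100.0):
    # objects_along_ray: list of (layer_name, distance) pairs in front of
    # the sensor; layer names here are hypothetical.
    hits = [d for l, d in objects_along_ray if l == layer]
    return min(hits) if hits else max_range

along_ray = [("box_yellow", 2.0), ("wall", 5.0)]
print(ray_distance(along_ray, "wall"))       # wall sensor ignores the box
print(ray_distance(along_ray, "box_green"))  # no green box hit: max range
```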
The env uses a multi-discrete action space. For each player, there is a single discrete action which determines the movement direction.
```gdscript
func get_action_space() -> Dictionary:
	return {
		"p1_move": {"size": move_dirs.size(), "action_type": "discrete"},
		"p2_move": {"size": move_dirs.size(), "action_type": "discrete"},
	}
```
```gdscript
var move_dirs = [
	Vector2.LEFT,
	Vector2.RIGHT,
	Vector2.UP,
	Vector2.DOWN,
	Vector2.LEFT + Vector2.UP,
	Vector2.RIGHT + Vector2.UP,
	Vector2.LEFT + Vector2.DOWN,
	Vector2.RIGHT + Vector2.DOWN,
]
```

- +10 for each box entering the checkpoint
- +10 for each box entering the correct goal
- -1 for each box entering the limit
- -1 for each box entering the wrong goal
- -1 for timeout
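Under the scheme above, the per-event rewards can be sketched as a simple lookup (a hypothetical helper with made-up event names, not the environment's actual code):

```python
# Hypothetical reward table mirroring the list above; event names are
# illustrative, not identifiers from the project.
REWARDS = {
    "box_checkpoint": 10.0,
    "box_correct_goal": 10.0,
    "box_limit": -1.0,
    "box_wrong_goal": -1.0,
    "timeout": -1.0,
}

def episode_reward(events):
    # events: sequence of event names emitted during one episode
    return sum(REWARDS[e] for e in events)

# Two boxes pass the checkpoint, one is sorted correctly, episode times out:
print(episode_reward(["box_checkpoint", "box_checkpoint",
                      "box_correct_goal", "timeout"]))  # 29.0
```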
Training was done with the SB3 example training script, modified to use different hyperparameters.
Modified lines include:
```python
model: PPO = PPO(
    "MultiInputPolicy",
    env,
    ent_coef=0.025,
    n_steps=1024,
    batch_size=1024 * env.num_envs,
    target_kl=0.006,
    n_epochs=40,
    vf_coef=0.1,
    tensorboard_log=args.experiment_dir,
    learning_rate=learning_rate,
)
```

Training was started with the following CL args (excluding the arg used for checkpoint saving). Note that you will need to export the game to fill in the env path; do not run the command before filling in the correct executable path:
```
--env_path=PATH_TO_EXPORTED_EXECUTABLE
--n_parallel=4
--onnx_export_path=model.onnx
--timesteps=100_000_0000
--save_model_path=model.zip
--speedup=20
--experiment_name=2p_coop_box_sorting2
```

Training stats (rewards/success rates are during training, not eval):

- Godot - 4.5.beta6.mono on Win10
- Godot RL Agents - One version behind the newest commit at the time of writing, but it should work with the current one (https://github.com/edbeeching/godot_rl_agents/commit/d32518a7a0725b6a3e68ac6cb4ecec26517e18eb)
- Godot RL Agents Plugin - Local version that should be similar to: edbeeching/godot_rl_agents_plugin#53.