
2 Player Box Sorting Environment

A reinforcement learning environment made using Godot and Godot RL Agents. Both players are controlled using a single RL agent (it receives observations from both players, and sends actions for both players).

Demo video: 2p_coop_box_sort_env.mp4

How to test the env:

  1. Open the project in Godot Engine (made with 4.5.beta6.mono on Windows, may work with other versions too).
  2. Find the test scene (scenes/test_scene/testing_scene.tscn) and open it in the Godot Editor.
  3. Press 'F6' to start the scene.

You should see the trained agent solving the environment using ONNX inference. If you wish to train your own agent, refer to the Godot RL Agents repository and check its tutorial section to learn more.

Goal:

Two players must cooperate to push boxes into the matching goals based on their category, within a time limit. The boxes must not reach the limit (red area).

Observations:

The RL agent receives data from position sensors, raycast sensors, and the current fraction of the allowed time elapsed.

Position sensor data (relative positions to objects):

First robot:

  • Box checkpoint (not visible, located at the opening between the two rooms)

Second robot:

  • Green goal
  • Yellow goal
  • Box checkpoint
  • Collision shape of the limit (red area)
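The position-sensor observations can be pictured as relative vectors flattened together with the time fraction into one array. A minimal Python sketch (the function name `make_obs` and its arguments are illustrative assumptions, not code from the project):

```python
import numpy as np

def make_obs(robot_pos, target_positions, elapsed, time_limit):
    """Illustrative: concatenate the relative position to each tracked
    object with the fraction of the allowed time that has elapsed."""
    parts = []
    for target in target_positions:
        # Position of the target relative to the robot.
        parts.append(np.asarray(target) - np.asarray(robot_pos))
    parts.append([elapsed / time_limit])
    return np.concatenate(parts).astype(np.float32)

# Robot at origin, two tracked objects, a quarter of the time used:
obs = make_obs((0.0, 0.0), [(3.0, 4.0), (-1.0, 2.0)],
               elapsed=5.0, time_limit=20.0)
# obs -> [3.0, 4.0, -1.0, 2.0, 0.25]
```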

Raycast sensors:

There are multiple raycast sensors attached to each robot. They provide distances to the walls, as well as to the yellow and green boxes.

Each sensor reacts to only a single physics layer, so, for example, the wall raycast sensor reports only the distances to the walls, ignoring any occluding boxes.
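In Godot, this per-sensor filtering is done with collision layer bitmasks: a ray only reports a hit when the hit object's layer overlaps the ray's collision mask. A small sketch of the idea (the layer assignments below are hypothetical, not the project's actual layer numbers):

```python
# Hypothetical physics-layer assignments (bit flags, as in Godot).
WALL_LAYER   = 1 << 0
GREEN_LAYER  = 1 << 1
YELLOW_LAYER = 1 << 2

def raycast_reports(hit_layer_bits, sensor_mask):
    """A sensor reports a hit only if the hit object's layer bits
    overlap the sensor's collision mask."""
    return bool(hit_layer_bits & sensor_mask)

# A wall-only sensor ignores a green box, even if the box is closer:
raycast_reports(GREEN_LAYER, WALL_LAYER)   # False
raycast_reports(WALL_LAYER, WALL_LAYER)    # True
# A box sensor can watch both box colors with a combined mask:
raycast_reports(YELLOW_LAYER, GREEN_LAYER | YELLOW_LAYER)  # True
```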

Action space:

The env uses a multi-discrete action space: one discrete action per player, which determines that player's movement direction.

func get_action_space() -> Dictionary:
	return {
		"p1_move": {"size": move_dirs.size(), "action_type": "discrete"},
		"p2_move": {"size": move_dirs.size(), "action_type": "discrete"},
	}


var move_dirs = [
	Vector2.LEFT,
	Vector2.RIGHT,
	Vector2.UP,
	Vector2.DOWN,
	Vector2.LEFT + Vector2.UP,
	Vector2.RIGHT + Vector2.UP,
	Vector2.LEFT + Vector2.DOWN,
	Vector2.RIGHT + Vector2.DOWN,
]
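On the policy side, each discrete action index simply selects one of these eight directions. A Python sketch of the mapping (the normalization step is an assumption for illustration; the env may scale diagonal movement differently):

```python
import math

# Mirrors the GDScript move_dirs table. Godot uses y-down 2D
# coordinates, so Vector2.UP is (0, -1).
MOVE_DIRS = [
    (-1, 0), (1, 0), (0, -1), (0, 1),      # left, right, up, down
    (-1, -1), (1, -1), (-1, 1), (1, 1),    # diagonals
]

def apply_action(action_index):
    """Map a discrete action index to a unit movement direction,
    so diagonal moves are not faster than axis-aligned ones."""
    x, y = MOVE_DIRS[action_index]
    length = math.hypot(x, y)
    return (x / length, y / length)
```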

Rewards:

  • +10 for each box entering the checkpoint
  • +10 for each box entering the correct goal
  • -1 for each box entering the limit
  • -1 for each box entering the wrong goal
  • -1 for timeout
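The reward events above can be tallied as a simple lookup; a hypothetical sketch (event names are made up here, not the project's signal names):

```python
# Reward per event type, as listed above.
REWARDS = {
    "box_checkpoint": 10.0,
    "box_correct_goal": 10.0,
    "box_limit": -1.0,
    "box_wrong_goal": -1.0,
    "timeout": -1.0,
}

def episode_return(events):
    """Sum the reward contributions of a sequence of events."""
    return sum(REWARDS[event] for event in events)

# One box through the checkpoint and into the correct goal,
# plus one box reaching the limit:
episode_return(["box_checkpoint", "box_correct_goal", "box_limit"])  # 19.0
```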

Training settings:

Training was done with the SB3 example training script, modified to use different hyperparameters.

Modified lines include:

    model: PPO = PPO(
        "MultiInputPolicy",
        env,
        ent_coef=0.025,
        n_steps=1024,
        batch_size=1024 * env.num_envs,
        target_kl=0.006,
        n_epochs=40,
        vf_coef=0.1,
        tensorboard_log=args.experiment_dir,
        learning_rate=learning_rate,
    )
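The `learning_rate` variable is defined elsewhere in the training script and is not shown in this README. Besides a constant, Stable-Baselines3 also accepts a callable of the remaining training progress (1.0 at the start, 0.0 at the end); a common choice, shown here purely as an assumption, is a linear decay:

```python
def linear_schedule(initial_value):
    """Return a schedule callable for SB3: SB3 calls it with
    progress_remaining (1.0 -> 0.0), so this decays linearly."""
    def schedule(progress_remaining):
        return initial_value * progress_remaining
    return schedule

# Would be passed as learning_rate=linear_schedule(3e-4) to PPO.
learning_rate = linear_schedule(3e-4)
```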

Training:

Training was started with the following command-line arguments (excluding the argument used for checkpoint saving). You will need to export the game and fill in the correct executable path for --env_path; do not run the command before doing so:

--env_path=PATH_TO_EXPORTED_EXECUTABLE
--n_parallel=4
--onnx_export_path=model.onnx
--timesteps=1_000_000_000
--save_model_path=model.zip
--speedup=20
--experiment_name=2p_coop_box_sorting2

Training stats (rewards/success rates are measured during training, not evaluation): plots of training reward and training success rate.

Versions used:
