StevenRice99/ML-Dungeon

ML-Dungeon

Teaching an agent to navigate randomly generated and randomly-sized dungeons with Unity ML-Agents. See a web demo.

Purpose

The purpose of this project is to serve as a learning resource for Unity ML-Agents, highlighting how different methods can be applied to try to overcome a complex environment.

Game Overview

  • The agent is placed in a randomly-sized, fully-connected square dungeon.
  • Walls are randomly placed throughout the dungeon at a given percentage.
  • The agent always starts in one corner of the dungeon and a weapon chest is placed in a different corner.
  • If the agent reaches the chest, it is given a sword.
  • This sword can be used to eliminate enemies in the dungeon, a given number of which will spawn in the corner opposite the player.
  • Enemies choose a random space in the dungeon to navigate to, and if they get within five units of the player and have line-of-sight on them, they will begin to follow the player.
  • If the player makes contact with an enemy while they do not have the sword, they lose.
  • If they have the sword, the enemy is eliminated.
  • All enemies must be eliminated to win. If there were no enemies in the dungeon to begin with, reaching the weapon chest wins the game.

Agent Design

The agent's actions are simply movement along the horizontal and vertical axes. To allow the agent to play dungeons of all sizes, the environment inputs had to be carefully crafted. The goal was to give as few inputs as possible and to ensure all inputs were normalized to [0, 1], with a few special cases giving readings in [-1, 1], which are noted below. The internal architecture of the agent's brain is two layers of 256 neurons.

  1. Agent position - Both the agent's previous and current positions in the dungeon are given along both the horizontal and vertical axes each in the range of [0, 1].
  2. Chest position - The position of the chest, or [-1, -1] once the agent has reached the chest and obtained the sword. Rather than adding a boolean input alongside the chest coordinates, this sentinel value is used because the chest's position becomes irrelevant once the sword is obtained, which keeps the input size smaller.
  3. Nearest enemy's position - The same as the agent's position, but for the previous and current positions of the nearest enemy if any remain, or [-1, -1] if there are no more enemies in the level. As with the chest, this avoids adding another boolean input to signify whether any enemies exist.
  4. Local area map - A visual encoding of the local area of the world, allowing the agent to navigate around nearby obstacles. This encodes a square consisting of the agent's current dungeon tile plus ten tiles in each direction, creating a 21×21 grid that is passed as a visual tensor to the match3 Convolutional Neural Network (CNN) model based on the work "Human-Like Playtesting with Deep Learning" by Gudmundsson et al. This CNN model was chosen as it "is a smaller CNN that can capture more granular spatial relationships and is optimized for board games", and the encoding used here is very efficient. The world is encoded across three channels denoting whether a cell is walkable, contains an enemy, or contains the weapon pickup.
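
As a concrete illustration, the observation scheme above could be assembled as sketched below. This is hedged Python rather than the project's actual C# code, and every function and parameter name here is hypothetical:

```python
import numpy as np

VIEW = 10  # tiles visible in each direction, giving the 21x21 window


def encode_observations(agent_prev, agent_pos, chest_pos, has_sword,
                        enemy_prev, enemy_pos, size, walkable, enemies, chest):
    """Hypothetical sketch: build the vector and visual observations.

    Positions are (x, y) tile coordinates, `size` is the dungeon side length,
    `walkable` is a size x size boolean grid, `enemies`/`chest` are tile sets.
    """
    def norm(p):
        # Normalize a tile coordinate into [0, 1].
        return (p[0] / (size - 1), p[1] / (size - 1))

    vector = []
    vector += norm(agent_prev) + norm(agent_pos)               # agent, [0, 1]
    vector += (-1.0, -1.0) if has_sword else norm(chest_pos)   # chest or sentinel
    if enemy_pos is None:
        vector += (-1.0, -1.0, -1.0, -1.0)                     # no enemies left
    else:
        vector += norm(enemy_prev) + norm(enemy_pos)

    # 21x21x3 local map centred on the agent: walkable / enemy / chest channels.
    grid = np.zeros((2 * VIEW + 1, 2 * VIEW + 1, 3), dtype=np.float32)
    ax, ay = agent_pos
    for dy in range(-VIEW, VIEW + 1):
        for dx in range(-VIEW, VIEW + 1):
            x, y = ax + dx, ay + dy
            if 0 <= x < size and 0 <= y < size:
                cell = grid[dy + VIEW, dx + VIEW]
                cell[0] = float(walkable[y][x])
                cell[1] = float((x, y) in enemies)
                cell[2] = float((x, y) in chest)
    return np.array(vector, dtype=np.float32), grid
```

The sentinel values mirror the description above: the chest coordinates collapse to [-1, -1] once the sword is held, and the enemy slots do the same when no enemies remain, avoiding extra boolean inputs.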

Agent Rewards

  • A reward of 1 is given for reaching the weapon pickup.
  • A reward of 1 is given for eliminating an enemy.
  • A penalty of -1 is given for being eliminated by an enemy.
  • A shaping reward of up to 0.1 is given for moving towards the current objective (the weapon pickup if not held, otherwise the nearest enemy), or a penalty for moving away.
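
One plausible way to implement that shaping term is sketched below. The exact distance metric and scaling used in the project are not stated, so the Euclidean distance and the `max_step` clamp are assumptions:

```python
def shaping_reward(prev_pos, pos, objective, max_step=1.0, scale=0.1):
    """Hypothetical sketch: reward progress toward the current objective.

    Returns up to +0.1 for closing distance to the objective (the weapon
    pickup if not held, otherwise the nearest enemy) and a symmetric
    penalty of down to -0.1 for moving away from it.
    """
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    progress = dist(prev_pos, objective) - dist(pos, objective)
    # Clamp so a single step can never exceed the +/-0.1 shaping bound.
    return scale * max(-1.0, min(1.0, progress / max_step))
```

Keeping the shaping bound an order of magnitude below the terminal rewards of ±1 ensures it guides exploration without dominating the true objectives.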

Agent Training

The agent was trained with Proximal Policy Optimization (PPO), a training curriculum, a curiosity reward signal to encourage exploration, and imitation learning, in the form of both Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL). The demonstrations for imitation learning were recorded using the heuristic agent. Both the heuristic agent and details on the recorded demonstrations are in their sections below.
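
A Unity ML-Agents trainer configuration combining these signals could look roughly like the fragment below. The behavior name `DungeonAgent`, every hyperparameter value, and the demonstration path are illustrative assumptions, not the project's actual settings:

```yaml
behaviors:
  DungeonAgent:            # hypothetical behavior name
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 256    # matches the two 256-neuron layers above
      num_layers: 2
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99
      curiosity:           # exploration bonus
        strength: 0.02
        gamma: 0.99
      gail:                # learn from the heuristic demonstrations
        strength: 0.01
        demo_path: Assets/Demonstrations
    behavioral_cloning:
      demo_path: Assets/Demonstrations
      strength: 0.5
    max_steps: 5.0e6
```

The `reward_signals` and `behavioral_cloning` sections are how ML-Agents layers curiosity, GAIL, and BC on top of PPO; the curriculum itself is configured separately through environment parameters.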

Heuristic Agent

The heuristic agent evaluates the following criteria to make a decision each step until the level is complete or the agent loses by being eliminated by an enemy.

  1. If the agent does not have the sword, it navigates towards the chest. This does not avoid enemies that may be between the agent and the chest, so during this case of the heuristic decision-making, a human operator may take control of the agent. They can move it manually using the arrow keys or WASD, or navigate by right-clicking with the mouse, to demonstrate how to avoid enemies.
  2. Otherwise, the agent has the sword, so it navigates towards the nearest enemy.

All navigation is done by finding a path using A* on the navigation mesh of the dungeon, and then determining the needed inputs to move the agent towards the first point along the found path.
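
The waypoint-to-input step might be sketched as follows. This is hypothetical Python; the project does this in C# against the Unity navigation mesh, and the `deadzone` parameter is an assumption:

```python
def path_to_action(agent_pos, path, deadzone=0.05):
    """Hypothetical sketch: turn the next A* waypoint into axis inputs.

    Returns (horizontal, vertical) in {-1, 0, 1}, pointing towards the
    first point along the found path; `deadzone` ignores tiny offsets so
    the agent does not jitter when already aligned with the waypoint.
    """
    if not path:
        return 0, 0
    tx, ty = path[0]
    dx, dy = tx - agent_pos[0], ty - agent_pos[1]
    h = 0 if abs(dx) < deadzone else (1 if dx > 0 else -1)
    v = 0 if abs(dy) < deadzone else (1 if dy > 0 else -1)
    return h, v
```

Because the policy's own action space is movement along the two axes, emitting inputs this way lets the heuristic's behaviour be recorded as demonstrations the learning agent can imitate directly.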

Demonstration Recording

The demonstration recording of the heuristic agent is done for a set number of trials across given dungeon parameters. A separate recording is made for each trial, and a recording is discarded if the heuristic agent fails the level by being eliminated by an enemy. Demonstrations were run for a thousand trials, each of which used the following configuration:

  • Size = [2, 20]
  • Walls = [0%, 20%]
  • Enemies = [0, 2]
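
The trial loop might look like the sketch below, where `run_trial` is a hypothetical stand-in for playing one level with the heuristic agent and returning its recording, or `None` on failure:

```python
import random


def record_demonstrations(run_trial, trials=1000, seed=0):
    """Hypothetical sketch: record one demonstration per trial,
    discarding trials where the heuristic agent was eliminated."""
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        size = rng.randint(2, 20)          # Size = [2, 20]
        walls = rng.uniform(0.0, 0.2)      # Walls = [0%, 20%]
        enemies = rng.randint(0, 2)        # Enemies = [0, 2]
        demo = run_trial(size, walls, enemies)
        if demo is not None:               # failed trials are thrown away
            kept.append(demo)
    return kept
```

Discarding failed trials keeps only successful behaviour in the demonstration set, so BC and GAIL never learn from runs that ended in elimination.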

Curriculum Learning

There were three levels to the training, each allowing for more complex levels. To ensure generalization, agents at higher curriculum levels would also be tested on the lower-complexity levels. The curriculum levels were:

  1. Easy
    • Size = [2, 15]
    • Walls = [0%, 10%]
    • Enemies = 0
  2. Medium
    • Size = [10, 20]
    • Walls = [0%, 20%]
    • Enemies = [0, 1]
  3. Hard
    • Size = [10, 20]
    • Walls = [0%, 20%]
    • Enemies = [0, 2]
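
Sampling level parameters for a curriculum stage, with occasional revisits to earlier stages for generalization, could be sketched as below. The 20% revisit chance is an assumption, not a value taken from the project:

```python
import random

# Stage table mirroring the curriculum levels listed above.
STAGES = [
    {"size": (2, 15),  "walls": (0.0, 0.10), "enemies": (0, 0)},  # Easy
    {"size": (10, 20), "walls": (0.0, 0.20), "enemies": (0, 1)},  # Medium
    {"size": (10, 20), "walls": (0.0, 0.20), "enemies": (0, 2)},  # Hard
]


def sample_level(stage, rng, revisit_chance=0.2):
    """Hypothetical sketch: draw dungeon parameters for a curriculum stage,
    sometimes revisiting a lower-complexity stage to preserve generalization."""
    if stage > 0 and rng.random() < revisit_chance:
        stage = rng.randrange(stage)       # test on an earlier stage
    s = STAGES[stage]
    return {
        "size": rng.randint(*s["size"]),
        "walls": rng.uniform(*s["walls"]),
        "enemies": rng.randint(*s["enemies"]),
    }
```

Mixing earlier stages back in prevents the agent from overfitting to large dungeons and forgetting how to handle the small, enemy-free levels it started on.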

Results

  • The agent demonstrates some basic obstacle avoidance, but struggles with more complex levels and navigation.
  • The agent struggles to avoid enemies before it has grabbed its weapon.

Running

If you just wish to see the agent in action, you can run the web demo which allows you to change the size of the dungeon, percentage of walls, and number of enemies.

Unity Editor

To run the project in the Unity editor, there are several scenes:

  • Main - The same scene as the web demo.
  • Recording - The scene to perform the demonstration recording. These are saved to the Demonstrations folder in the Assets folder. If you wish to create new recordings, you will need to delete the existing recordings in the Demonstrations folder, or configure a larger number of trials.
  • Training - A scene to train multiple instances of the agent in parallel. If you wish to train the agents, you will need to follow the run training instructions.

Run Training

To train the agent, you can either read the Unity ML-Agents documentation to learn how to install and run Unity ML-Agents, or use the provided helper functions to train the agent.

Helper Functions

The helper files have been made for Windows, and you must install uv. Once installed, from the top menu of the Unity editor, you can select ML-Dungeon followed by the desired command to run.

  • Train - Run training.
  • TensorBoard - This will open your browser to view the TensorBoard training logs of all models.
  • Install - If you have uv installed for Python, this will set up your environment for running all other commands. Note: This assumes you have NVIDIA CUDA support. You will need to remove the --index-url https://download.pytorch.org/whl/cu121 from the PyTorch installation line if you do not have an NVIDIA GPU with CUDA support.
  • Activate - This will open a terminal in your uv Python virtual environment for this project, allowing you to run other commands.

Resources

Assets are from the Mini Dungeon kit by Kenney under the Creative Commons CC0 license.