Grid World

A simple reinforcement learning example using tabular Q-learning to teach an agent to navigate a 5×5 grid from the top-left corner to the bottom-right corner, avoiding obstacles along the way.

train.mp4

How It Works

The agent (🚘) starts at cell 0 and must reach the goal (🍺) at cell 24. Two obstacles (☠️) block cells 8 and 12. The agent learns by trial and error — exploring randomly at first, then gradually exploiting what it has learned.

  • +10.0 reward for reaching the goal
  • -1.0 penalty for hitting a wall or obstacle
  • -0.01 per-step cost to encourage finding short paths
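The reward scheme above can be sketched as a small function. This is illustrative only: `GOAL` and `OBSTACLES` mirror the grid described above, but the function name and signature are hypothetical, not the actual gridworld.py API.

```python
GOAL = 24          # bottom-right cell of the 5x5 grid
OBSTACLES = {8, 12}

def reward(next_cell, hit_wall):
    """Hypothetical reward function matching the scheme described above."""
    if next_cell == GOAL:
        return 10.0    # reaching the goal
    if hit_wall or next_cell in OBSTACLES:
        return -1.0    # bumping a wall or an obstacle
    return -0.01       # small per-step cost encourages short paths
```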

Each cell displays its current Q-value — the agent's learned estimate of how much future reward it can expect from that position. Higher Q-values (like 9.354 near the goal) indicate the agent has learned those cells are close to the reward.
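Those per-cell estimates come from the standard tabular Q-learning update, which nudges `Q[s, a]` toward the observed reward plus the discounted best value of the next state. A minimal sketch of that update rule (the standard formula, not the literal gridworld.py code):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q[s, a] toward the bootstrapped target."""
    target = r + gamma * Q[s_next].max()   # immediate reward + discounted best future value
    Q[s, a] += lr * (target - Q[s, a])

Q = np.zeros((25, 4))                      # 25 states x 4 actions
q_update(Q, s=23, a=3, r=10.0, s_next=24)  # a goal-reaching step from the cell left of the goal
print(Q[23, 3])                            # 0.1 * 10.0 = 1.0 after one update
```

Repeated updates propagate value backward from the goal, which is why Q-values decay smoothly with distance from cell 24.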

Setup

You need Python 3.9+ with numpy and jupyter. Choose one of the options below.

Option A: conda

conda create -n gridworld python=3.11 numpy jupyter -y
conda activate gridworld

Option B: uv

uv venv --python 3.11
source .venv/bin/activate
uv pip install numpy jupyter

Running the Notebook

Launch Jupyter:

jupyter notebook gridworld.ipynb

The notebook has four cells. Run them in order.

Cell 1 — Enable Auto-Reload

%load_ext autoreload
%autoreload 2

This makes Jupyter automatically pick up any changes you make to gridworld.py without restarting the kernel.

Cell 2 — Import

from gridworld import GridWorld, QLearner, train, test_policy

Imports the environment (GridWorld), the agent (QLearner), and the two main functions.

Cell 3 — Train the Agent

env = GridWorld(size=5)

agent = QLearner(
    n_states=env.n_states,
    n_actions=env.n_actions,
    learning_rate=0.1,
    discount=0.9,
    epsilon=1.0
)

train(env, agent, episodes=50, max_steps=100, render=True)

This creates the 5×5 grid and a Q-learning agent, then trains for 50 episodes. With render=True the grid animates live in the notebook — you'll see the agent stumbling around at first, then gradually finding shorter paths as the Q-values converge.
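The overall training loop can be sketched end to end. Everything below is a hand-rolled stand-in, not the gridworld.py implementation: the `step` environment, the linear epsilon decay from 1.0 to 0.01, and the epsilon-greedy action choice are assumptions based on the description above.

```python
import numpy as np

SIZE, GOAL, OBSTACLES = 5, 24, {8, 12}

def step(s, a):
    """Minimal stand-in environment; blocked moves bounce back with a -1.0 penalty."""
    r, c = divmod(s, SIZE)
    nr = r + (a == 0) - (a == 1)           # 0=down, 1=up
    nc = c + (a == 3) - (a == 2)           # 2=left, 3=right
    nxt = nr * SIZE + nc
    if not (0 <= nr < SIZE and 0 <= nc < SIZE) or nxt in OBSTACLES:
        return s, -1.0, False              # wall or obstacle: stay put
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, -0.01, False

def train_sketch(episodes=50, max_steps=100, lr=0.1, gamma=0.9,
                 eps_start=1.0, eps_end=0.01, seed=0):
    """Epsilon-greedy Q-learning with epsilon decaying linearly across episodes."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((SIZE * SIZE, 4))
    for ep in range(episodes):
        eps = eps_start + (eps_end - eps_start) * ep / max(episodes - 1, 1)
        s = 0
        for _ in range(max_steps):
            # Explore with probability eps, otherwise exploit the best-known action.
            a = int(rng.integers(4)) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = step(s, a)
            Q[s, a] += lr * (r + gamma * Q[s2].max() * (not done) - Q[s, a])
            s = s2
            if done:
                break
    return Q

Q = train_sketch()
```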

Parameters you can tweak:

Parameter        Default   Effect
size             5         Grid dimensions (size × size)
learning_rate    0.1       How fast Q-values update toward new information
discount         0.9       How much the agent values future vs. immediate rewards
epsilon          1.0       Starting exploration rate (decays linearly to 0.01)
episodes         50        Number of training episodes
max_steps        100       Step limit per episode (prevents infinite loops)

After training, the grid should look something like this — Q-values are highest near the goal and decrease with distance:

Training output

Cell 4 — Test the Learned Policy

path, total_reward = test_policy(env, agent, render=True)
print("Policy path: [(state, action)] = ", path)

Runs a single episode with exploration disabled (epsilon=0) so the agent always picks its best-known action. The grid shows directional arrows tracing the greedy path from start to goal:

Test policy output

The output also prints the path as a list of (state, action) pairs, where actions map to: 0=down, 1=up, 2=left, 3=right.
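Those pairs can be decoded into readable moves with a tiny helper. The helper and the example path below are illustrative (not part of gridworld.py); the path shown is one valid greedy route down the left edge and along the bottom row, which avoids obstacle cells 8 and 12.

```python
ACTION_NAMES = {0: "down", 1: "up", 2: "left", 3: "right"}

def describe_path(path):
    """Turn [(state, action), ...] pairs into human-readable steps."""
    return [f"cell {s}: move {ACTION_NAMES[a]}" for s, a in path]

# One possible greedy path: down the left edge, then right along the bottom row.
path = [(0, 0), (5, 0), (10, 0), (15, 0), (20, 3), (21, 3), (22, 3), (23, 3)]
print(describe_path(path)[0])  # cell 0: move down
```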

test_policy.mp4

Project Structure

gridworld.py        # GridWorld environment, QLearner agent, train/test functions
gridworld.ipynb     # Interactive notebook to run everything
images/             # Screenshots for this README
video/              # Screen recordings of training and test runs
