Train an AI agent to land a spacecraft on the moon using Deep Q-Learning (DQN).
- About This Project
- What is Deep Q-Learning?
- Installation
- Training the Agent
- Watching the Agent Play
- Expected Training Time
This project teaches a neural network to land a spacecraft in the LunarLander-v3 environment from Gymnasium (the maintained successor to OpenAI Gym). The agent learns entirely from trial and error!
```
State (8 numbers: position, velocity, angle, etc.)
        |
        v
Neural Network (128 -> 64 neurons)
        |
        v
Q-values for each action: [Nothing: 5.2, Left: 3.1, Main: 8.5, Right: 2.0]
        |
        v
Choose: FIRE MAIN ENGINE (highest Q-value)
        |
        v
Get Reward (+100 for landing, -100 for crash)
        |
        v
Update network using Bellman equation
```
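The network in the pipeline above can be sketched in PyTorch. This is a minimal illustration of the 8-input, 128 -> 64 hidden, 4-output shape from the diagram, not the exact code in `network.py`:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps an 8-dim lander state to one Q-value per action (4 actions)."""
    def __init__(self, state_dim=8, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),  # hidden layer 1: 128 neurons
            nn.ReLU(),
            nn.Linear(128, 64),         # hidden layer 2: 64 neurons
            nn.ReLU(),
            nn.Linear(64, n_actions),   # one Q-value per action
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork()
q_values = q_net(torch.zeros(1, 8))  # dummy state, batch of 1
action = q_values.argmax(dim=1)      # greedy choice: highest Q-value
```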
| Concept | Simple Explanation |
|---|---|
| State | 8 numbers describing the lander (position, velocity, angle) |
| Action | What to do (nothing, fire left/main/right engine) |
| Reward | Points for good landings, penalties for crashes |
| Q-Value | Expected future score from an action |
| Epsilon | Chance of random action (exploration) |
| Gamma | How much to value future rewards (0.99) |
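A minimal sketch of how epsilon, gamma, and Q-values fit together: epsilon-greedy action selection, plus the Bellman target `r + gamma * max_a' Q(s', a')` that the network is trained toward. Function names here are illustrative; the real logic lives in `agent.py`:

```python
import random

GAMMA = 0.99  # discount factor from the table above

def choose_action(q_values, epsilon, n_actions=4):
    """Epsilon-greedy: random action with probability epsilon, else best Q."""
    if random.random() < epsilon:
        return random.randrange(n_actions)  # explore
    return max(range(n_actions), key=lambda a: q_values[a])  # exploit

def bellman_target(reward, next_q_values, done):
    """Value the network should predict: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward  # no future reward after a terminal state
    return reward + GAMMA * max(next_q_values)
```

For example, with the Q-values from the diagram, a +100 landing reward yields a target of `100 + 0.99 * 8.5 = 108.415`.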
---
```bash
cd Desktop/ml/DQL2048  # or wherever you placed the project
```

macOS/Linux:

```bash
python3 -m venv venv
source venv/bin/activate
```

```bash
pip install -r requirements.txt
```

This installs:

- PyTorch: Neural network framework
- Gymnasium[box2d]: LunarLander environment
- NumPy: Numerical computing
- Matplotlib: Training visualization

Note: Box2D can be tricky on Windows. If installation fails:

```bash
pip install swig
pip install gymnasium[box2d]
```

Verify the installation:

```bash
python -c "import torch; import gymnasium; print('Ready to train!')"
```

```bash
# Make sure venv is activated!
python train.py
```

```bash
# Basic training (1000 episodes)
python train.py

# Longer training
python train.py --episodes 2000

# Resume from checkpoint
python train.py --load checkpoints/dqn_lunarlander_latest.pt

# Watch training in real-time (slower but fun!)
python train.py --render

# Custom hyperparameters
python train.py --learning-rate 0.0005 --batch-size 32
```

```bash
python visualize.py --model checkpoints/dqn_lunarlander_final.pt
```

```bash
# Watch 5 episodes
python visualize.py -m checkpoints/dqn_lunarlander_final.pt -e 5

# Slower playback (better for learning)
python visualize.py -m checkpoints/dqn_lunarlander_final.pt -d 0.1

# No model? Watch random agent
python visualize.py --no-render
```

The visualization shows:
- Real-time rendering of the lander
- State values (position, velocity, angle)
- Q-values for each action
- Chosen action highlighted
- Episode statistics (reward, steps)
```bash
python plot_training.py --metrics logs/training_metrics.json
```

---
| Episodes | Time (approx.) | Expected Performance |
|---|---|---|
| 300 | 5-10 min | Starts improving, mostly crashes |
| 500 | 10-20 min | Sometimes lands, inconsistent |
| 800 | 20-35 min | Good landings, may solve! |
| 1000 | 30-45 min | Usually solved (avg reward >= 200) |
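The "solved" bar (average reward >= 200) is conventionally measured as a rolling average over the last 100 episodes. A sketch of that check, assuming a plain list of per-episode rewards (the actual criterion in `train.py` may differ):

```python
def is_solved(episode_rewards, window=100, threshold=200.0):
    """True once the mean reward over the last `window` episodes meets the bar."""
    if len(episode_rewards) < window:
        return False  # not enough episodes to judge yet
    recent = episode_rewards[-window:]
    return sum(recent) / window >= threshold
```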
```
DQL2048/
├── network.py        # Neural network (heavily commented!)
├── agent.py          # DQN agent with replay buffer
├── train.py          # Training script
├── visualize.py      # Watch agent play
├── plot_training.py  # Training visualization
├── config.py         # Hyperparameters
├── requirements.txt  # Dependencies
├── README.md         # This file
├── checkpoints/      # Saved models
├── logs/             # Training logs
└── plots/            # Training plots
```
