This project explores the application of Reinforcement Learning (RL) to the game of Video Poker. Video Poker presents a unique challenge: it is a game of chance, but it also has a mathematically solvable optimal strategy. This makes it an ideal environment to benchmark the performance of an RL agent against a perfect player.
The core of this project is a comparison between two agents:
- An Optimal Agent: This agent uses a pre-computed, brute-force solution that represents the perfect, mathematically optimal strategy for every possible hand. It serves as the "ground truth" for the best possible performance.
- A Deep Q-Network (DQN) Agent: This is a reinforcement learning agent that learns to play the game from scratch, with no prior knowledge of the rules or strategy. Its goal is to learn a policy that maximizes its score through trial and error.
By training the DQN agent and comparing its performance to the optimal agent, this project aims to answer the question: How close can a model-free reinforcement learning agent get to the perfect strategy in a complex, solvable game?
This repository lets you train the DQN agent, evaluate both agents, and play the game yourself to see how your decisions stack up against the DQN agent or the optimal agent.
- Create and activate conda environment:
conda create -n MLSec_v0 python=3.8
conda activate MLSec_v0
- Install dependencies:
pip install torch numpy matplotlib tqdm gym tensorboard
You can play Video Poker interactively:
python video_poker.py
The game will:
- Deal 5 cards
- Let you choose which cards to hold/discard
- Show the optimal play for comparison
- Calculate your final hand value and payout
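The hold/discard step can be pictured as a 5-element hold mask, which gives 2^5 = 32 possible plays per dealt hand. The following is a minimal sketch of that idea, not the code in `video_poker.py`: the card encoding, `new_deck`, `draw`, and `ALL_HOLD_MASKS` are hypothetical names introduced for illustration.

```python
import random
from itertools import product

RANKS = "23456789TJQKA"
SUITS = "CDHS"


def new_deck():
    """Return a 52-card deck as two-character strings such as 'AS' or 'TD'."""
    return [rank + suit for rank in RANKS for suit in SUITS]


def draw(hand, hold_mask, deck, rng):
    """Replace every card whose hold flag is False with an unseen card."""
    remaining = [card for card in deck if card not in hand]
    replacements = iter(rng.sample(remaining, k=hold_mask.count(False)))
    return [card if keep else next(replacements)
            for card, keep in zip(hand, hold_mask)]


# Every way to hold or discard five cards: 32 masks in total.
ALL_HOLD_MASKS = list(product([False, True], repeat=5))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
deck = new_deck()
hand = rng.sample(deck, k=5)

# Hold the first, second, and fifth cards; redraw the other two.
final_hand = draw(hand, (True, True, False, False, True), deck, rng)
print(hand, "->", final_hand)
```

Under this encoding, an agent's decision for a hand is simply a choice of one of the 32 masks.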
Train a Deep Q-Network agent with customizable parameters:
python train_agent.py \
--episodes 2000 \
--max-steps 100 \
--eps-start 1.0 \
--eps-end 0.01 \
--eps-decay 0.995 \
--checkpoint-freq 1000 \
--model-dir models \
--log-dir runs/video_poker \
--decay-type exponential \
--decay-percent 80
Key parameters:
- --episodes: Total training episodes
- --max-steps: Maximum steps per episode
- --eps-start: Starting exploration rate
- --eps-end: Minimum exploration rate
- --decay-type: Choose between 'exponential' or 'linear' epsilon decay
- --decay-percent: For linear decay, percentage of episodes over which to decay epsilon
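The two decay types can be sketched as follows. This is an illustration under assumptions, not the schedule implemented in `train_agent.py`: it assumes epsilon is updated once per episode, that the exponential schedule multiplies by `--eps-decay` each episode, and that the linear schedule interpolates from `--eps-start` to `--eps-end` over the first `--decay-percent` percent of episodes. The function names are hypothetical.

```python
def exponential_epsilon(episode, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Multiply epsilon by eps_decay each episode, never going below eps_end."""
    return max(eps_end, eps_start * eps_decay ** episode)


def linear_epsilon(episode, total_episodes, eps_start=1.0, eps_end=0.01,
                   decay_percent=80):
    """Interpolate from eps_start to eps_end over decay_percent of training."""
    decay_episodes = max(1, int(total_episodes * decay_percent / 100))
    fraction = min(1.0, episode / decay_episodes)
    return eps_start + fraction * (eps_end - eps_start)


for episode in (0, 500, 1000, 1600, 2000):
    print(episode,
          round(exponential_epsilon(episode), 4),
          round(linear_epsilon(episode, total_episodes=2000), 4))
```

With the defaults above, the exponential schedule reaches its floor after roughly 920 episodes, while the linear schedule with `decay_percent=80` and 2000 episodes reaches it at episode 1600, so the linear variant explores for noticeably longer.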
Monitor training with TensorBoard:
tensorboard --logdir=runs/video_poker
The training process automatically saves:
- Regular checkpoints (based on --checkpoint-freq)
- The best model (based on 100-episode moving average score)
- The final model
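Selecting the best model by a 100-episode moving average could look like the sketch below. `BestModelTracker` is a hypothetical helper, and waiting for a full window before comparing is an assumption; the training script may handle the first episodes differently.

```python
from collections import deque


class BestModelTracker:
    """Track a moving average of scores and flag new best averages."""

    def __init__(self, window=100):
        self.scores = deque(maxlen=window)  # keeps only the latest `window` scores
        self.best_average = float("-inf")

    def update(self, score):
        """Record a score; return True when the moving average sets a new best."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # assumption: do not compare until the window is full
        average = sum(self.scores) / len(self.scores)
        if average > self.best_average:
            self.best_average = average
            return True
        return False


tracker = BestModelTracker(window=3)  # small window to keep the demo short
for score in (1, 2, 3, 0, 10):
    if tracker.update(score):
        # In a training loop, this is where the model weights would be saved.
        print("new best moving average:", round(tracker.best_average, 2))
```

Averaging over a window rather than saving on the best single episode matters in a game of chance, where one lucky hand says little about the policy.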
The optimal agent uses a pre-computed solution dictionary to make perfect decisions:
python play_with_optimal_agent.py --num-games 10000
This will:
- Run the optimal agent for the specified number of games
- Generate a histogram of scores
- Calculate mean score and standard deviation
- Cache the solution dictionary for faster loading in subsequent runs
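The summary statistics can be reproduced with the standard library. The sketch below is an illustration, not the evaluation script itself: `summarize` is a hypothetical helper and the sample scores are made up. It also reports the standard error of the mean, which indicates how precisely a given number of games pins down the mean score.

```python
import math
import statistics


def summarize(scores):
    """Return the game count, mean, sample standard deviation, and standard error."""
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)      # sample standard deviation (n - 1)
    sem = std / math.sqrt(len(scores))  # standard error of the mean
    return {"games": len(scores), "mean": mean, "std": std, "sem": sem}


# Made-up per-game scores, only to show the shape of the output.
print(summarize([0, 0, 10, 30]))
```

Because the standard error shrinks with the square root of the number of games, a larger `--num-games` value gives a tighter estimate of each agent's mean.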
Evaluate your trained DQN agent in two modes:
# Interactive mode - play and see agent decisions
python play_with_dqn_agent.py --model best_model.pth --mode interactive
# Compare mode - compare with optimal agent
python play_with_dqn_agent.py --model best_model.pth --mode compare --num-games 10000
The comparison will show:
- Performance statistics for both agents
- Differences in decision-making
- Overall score comparison
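One simple way to quantify differences in decision-making is the fraction of hands on which two policies choose the same hold mask. The sketch below is a hedged illustration: `agreement_rate` and the two toy policies are hypothetical stand-ins, not the DQN or optimal agents from this repository.

```python
from collections import Counter


def hold_pairs(hand):
    """Toy policy: hold every card whose rank appears more than once."""
    counts = Counter(card[0] for card in hand)
    return tuple(counts[card[0]] > 1 for card in hand)


def hold_high_cards(hand):
    """Toy policy: hold jacks, queens, kings, and aces."""
    return tuple(card[0] in "JQKA" for card in hand)


def agreement_rate(hands, policy_a, policy_b):
    """Fraction of hands on which both policies pick the same hold mask."""
    if not hands:
        raise ValueError("need at least one hand")
    matches = sum(1 for hand in hands if policy_a(hand) == policy_b(hand))
    return matches / len(hands)


hands = [
    ("AS", "AD", "3C", "7H", "9D"),  # both hold the two aces
    ("KS", "QD", "3C", "3H", "9D"),  # pair of threes vs. king and queen
    ("2S", "4D", "6C", "8H", "TD"),  # both discard everything
]
print(agreement_rate(hands, hold_pairs, hold_high_cards))
```

An agreement rate complements the score comparison: two agents can disagree on many hands yet score similarly if the disagreements fall on hands where the alternatives have nearly equal expected value.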
You can play interactively while seeing recommendations from both the DQN and optimal agents:
python play_with_dqn_agent.py --model best_model.pth --mode interactive
This allows you to:
- See your cards and make decisions
- View what the DQN agent would do
- View what the optimal agent would do
- poker_classes.py: Core poker game logic (cards, hands, deck)
- video_poker.py: Main game implementation with human play interface
- poker_env.py: Gym environment wrapper for reinforcement learning
- dqn_agent.py: Deep Q-Network implementation with experience replay
- train_agent.py: Training script with TensorBoard logging
- play_with_optimal_agent.py: Evaluation script for the optimal agent
- play_with_dqn_agent.py: Evaluation script for the DQN agent
- solution_loader.py: Optimized loader for the brute-force solutions
- brute_force_solving.py: Script to generate optimal solutions
The DQN agent learns to approximate the optimal strategy through experience. While it may not match the perfect play of the brute-force agent, it demonstrates how reinforcement learning can be applied to complex decision problems.
The optimal agent achieves a mean score of approximately 26 points per game, which represents the theoretical maximum expected value for this variant of Video Poker.
The DQN agent, after training, achieves a mean score of approximately 22 points per game, roughly 85% of the optimal agent's mean. It falls short of optimal play, but it shows that a model-free agent can learn a reasonable approximation of the optimal strategy from experience alone.
The folder media contains the following files:
- OptimalAgent_rewards_distribution.png: Histogram of scores for the optimal agent
- DQN_rewards_distribution.png: Histogram of scores for the DQN agent

