
Added training code for PPO, A2C, DQN and random and greedy baselines #7

Merged
noahkostesku merged 2 commits into main from feat/training on Mar 30, 2026

Conversation

@Thomson-Lam (Owner) commented on Mar 30, 2026

Changes

  • Added src/models/benchmarking.py for core metric tracking used during training across all algorithms
  • Edited src/models/evaluate_agents.py to evaluate the RL agents offline after training
  • Added a no-action baseline policy in src/models/benchmarking.py that never suppresses fires, serving as the baseline to compare against (a sketch follows this list)
  • Edited src/models/fire_env.py to support the metrics required by the proposal
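
A minimal sketch of the no-action baseline, assuming a Gymnasium-style environment whose action space reserves an index for "do nothing". The class name, the no_op_action index, and the predict() signature (mirroring Stable-Baselines3 so one evaluation loop can drive learned agents and baselines alike) are illustrative, not the actual code in src/models/benchmarking.py:

```python
class NoActionBaseline:
    """Baseline policy that never suppresses fires."""

    def __init__(self, no_op_action: int = 0):
        # Assumed: action index 0 means "take no suppression action".
        self.no_op_action = no_op_action

    def predict(self, observation, deterministic: bool = True):
        # Returns (action, state) like Stable-Baselines3's predict(),
        # so evaluation code can treat agents and baselines uniformly.
        return self.no_op_action, None
```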

Main metrics (an aggregation sketch follows this list):

  • mean_return: mean episodic return
  • asset_survival_rate: fraction of episodes with assets_lost == 0
  • containment_success_rate: fraction of episodes where the fire is extinguished before truncation
  • mean_burned_area_fraction: final burned-area fraction per episode, (burned + burning + asset_burned cells) / 625
  • std_across_seeds: standard deviation of the seed-level metric means
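
A sketch of how these metrics could be aggregated from per-episode records. The field names (return, assets_lost, extinguished, burned_cells) are assumptions about the stats benchmarking.py collects; 625 presumably corresponds to a 25 x 25 grid:

```python
import numpy as np

GRID_CELLS = 625  # denominator from mean_burned_area_fraction above

def aggregate(episodes: list[dict]) -> dict:
    """Collapse per-episode records into the summary metrics above."""
    return {
        "mean_return": float(np.mean([ep["return"] for ep in episodes])),
        "asset_survival_rate": float(
            np.mean([ep["assets_lost"] == 0 for ep in episodes])
        ),
        "containment_success_rate": float(
            np.mean([ep["extinguished"] for ep in episodes])
        ),
        "mean_burned_area_fraction": float(
            # burned_cells is assumed to already sum the burned, burning,
            # and asset_burned cells at episode end.
            np.mean([ep["burned_cells"] / GRID_CELLS for ep in episodes])
        ),
    }
```

std_across_seeds then falls out as np.std over the five per-seed means of whichever metric is being reported.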

  • Made a single interface for training all RL algorithms, src/models/train_rl_agent.py, driven by an argparse CLI (a sketch follows this list)
  • Wrapped the training and evaluation CLI commands in PowerShell and Bash scripts
  • Fixed five seeds (11, 22, 33, 44, 55) for model initialization and for the evaluation order of the holdout benchmarking environments
  • Updated the README and docs with usage details
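
A sketch of the single training entry point, assuming Stable-Baselines3 backs the three algorithms. The flag names, defaults, and FireEnv import path are illustrative, not necessarily what src/models/train_rl_agent.py actually exposes:

```python
import argparse

from stable_baselines3 import A2C, DQN, PPO

from src.models.fire_env import FireEnv  # assumed import path

ALGOS = {"ppo": PPO, "a2c": A2C, "dqn": DQN}
SEEDS = (11, 22, 33, 44, 55)  # the fixed seeds introduced in this PR

def main() -> None:
    parser = argparse.ArgumentParser(
        description="Train an RL agent on the fire environment."
    )
    parser.add_argument("--algo", choices=sorted(ALGOS), required=True)
    parser.add_argument("--seed", type=int, choices=SEEDS, default=SEEDS[0])
    parser.add_argument("--timesteps", type=int, default=100_000)
    args = parser.parse_args()

    env = FireEnv()
    model = ALGOS[args.algo]("MlpPolicy", env, seed=args.seed)
    model.learn(total_timesteps=args.timesteps)
    model.save(f"{args.algo}_seed{args.seed}")

if __name__ == "__main__":
    main()
```

An invocation like python src/models/train_rl_agent.py --algo ppo --seed 11 is then roughly what the PowerShell and Bash wrappers loop over the five seeds.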

noahkostesku merged commit 13501ba into main on Mar 30, 2026
2 checks passed
