Merged
1 change: 1 addition & 0 deletions .gitignore
.env

data/
outputs/

#logs
wandb
124 changes: 112 additions & 12 deletions README.md

Empirical RL benchmark for wildfire tactical suppression. Compares DQN, A2C, PPO, and heuristic baselines on a 25x25 grid environment with critical assets and finite suppression budgets.

The physics-informed environment and its built-environment records derive from the [Alberta Historical Wildfires Database](https://open.alberta.ca/opendata/wildfire-data).

## Project Tree

```text
firebot/
├── README.md
├── pyproject.toml
├── uv.lock
├── ruff.toml
├── lefthook.yml
├── fp-historical-wildfire-data-dictionary-2006-2025.pdf # from the dataset download
├── data/
│ └── static/
│ ├── fp-historical-wildfire-data-2006-2025.csv # raw Alberta historical wildfire CSV
│ ├── snapshot_records.json # full normalized snapshot records from raw CSV
│ ├── snapshot_records_train.json # train-year snapshot subset
│ ├── snapshot_records_val.json # validation-year snapshot subset
│ ├── snapshot_records_holdout.json # holdout-year snapshot subset
│ ├── scenario_parameter_records.json # full unseeded environment parameter records
│ ├── scenario_parameter_records_train.json # train split unseeded records
│ ├── scenario_parameter_records_val.json # validation split unseeded records
│ ├── scenario_parameter_records_holdout.json # holdout split unseeded records
│ ├── scenario_parameter_records_seeded.json # full seeded records with ignition/layout seeds
│ ├── scenario_parameter_records_seeded_train.json # train runtime records
│ ├── scenario_parameter_records_seeded_val.json # validation runtime records
│ └── scenario_parameter_records_seeded_holdout.json # temporal holdout runtime records
├── docs/
│ ├── data-pipeline.md
│ ├── envspec.md
│ └── planning/
│ ├── env-checklist.md
│ ├── impl-plan.md
│ └── train-plan.md
├── src/
│ ├── __init__.py
│ ├── ingestion/
│ │ ├── __init__.py
│ │ ├── clean_historical.py # row cleaning and required-field checks
│ │ ├── cffdrs.py # CFFDRS station ingestion, not used
│ │ ├── weather.py # legacy Open-Meteo weather fetch helpers, not used
│ │ └── static_dataset.py # builds snapshot/scenario parameter records in data/static
│ └── models/ # environment, training, evaluation, and shared benchmark utilities
│ ├── __init__.py
│ ├── fire_env.py # WildfireEnv implementation and benchmark env construction helpers
│ ├── benchmarking.py # shared benchmark presets, rollout metrics, and aggregation functions
│ ├── train_rl_agent.py # unified PPO/A2C/DQN trainer with checkpoint and final evaluation artifacts
│ └── evaluate_agents.py # classdef for PPO/A2C/DQN plus greedy/random baselines
├── scripts/
│ ├── run_benchmark_train.sh # bash script for smoke validation then full 5-seed benchmark training
│ ├── run_benchmark_train.ps1 # powershell equivalent
│ ├── run_benchmark_eval.sh # bash script for post-training benchmark evaluation by seed
│ └── run_benchmark_eval.ps1 # powershell equivalent
├── tests/
│ ├── conftest.py
│ └── models/ # environment and benchmark metric contract tests
│ ├── test_fire_env_setup_contract.py # benchmark-mode env loading/split/schema contract tests
│ └── test_benchmarking_metrics.py # benchmark metric/preset/aggregation tests
├── outputs/ # generated training and evaluation artifacts (gitignored)
└── drd-archive/ # archived prototype code from the earlier DRD proposal
```

## Setup

Requirements: [uv](https://docs.astral.sh/uv/getting-started/installation/)

### Training

For controlled and reproducible benchmark training, use the script wrappers in `scripts/`.

Run from project root on macOS/Linux (bash):

```bash
./scripts/run_benchmark_train.sh
```

Run from project root on Windows (PowerShell):

```powershell
./scripts/run_benchmark_train.ps1
```

Script runs:

- Stage 1 (smoke): runs short validation training for `ppo`, `a2c`, `dqn` on one seed
- Stage 2 (smoke eval): loads smoke `best_model.zip` artifacts and runs evaluator sanity checks
- Stage 3 (formal): runs full canonical training for all three algorithms across 5 seeds (`11,22,33,44,55`)
- Uses artifact root `outputs/benchmark/` and keeps default trainer settings for env count, timesteps, and checkpoint cadence on formal runs

Training script environment overrides (optional):

- `ARTIFACT_ROOT` (default `outputs/benchmark`)
- `SMOKE_TIMESTEPS` (default `20000`, one canonical checkpoint interval)
- `SMOKE_SEED` (default `11`)
- `SMOKE_EVAL_EPISODES` (default `5`)
- `FINAL_SEEDS_CSV` (default `11,22,33,44,55`)
- `ALGO_ORDER_CSV` (default `ppo,a2c,dqn`)
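As an illustration of how these overrides resolve to concrete values, here is a minimal Python sketch; the function name is hypothetical, but the variable names and defaults mirror the list above.

```python
import os

# Hypothetical sketch of override resolution; names/defaults follow the
# README list above, not the scripts' actual implementation.
def resolve_training_overrides(env=None):
    env = os.environ if env is None else env
    return {
        "artifact_root": env.get("ARTIFACT_ROOT", "outputs/benchmark"),
        "smoke_timesteps": int(env.get("SMOKE_TIMESTEPS", "20000")),
        "smoke_seed": int(env.get("SMOKE_SEED", "11")),
        "smoke_eval_episodes": int(env.get("SMOKE_EVAL_EPISODES", "5")),
        # CSV-style values are split into lists
        "final_seeds": [int(s) for s in env.get("FINAL_SEEDS_CSV", "11,22,33,44,55").split(",")],
        "algo_order": env.get("ALGO_ORDER_CSV", "ppo,a2c,dqn").split(","),
    }

cfg = resolve_training_overrides({})  # empty mapping -> all defaults
```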

After `run_benchmark_train` completes, run the benchmark evaluation wrappers.

Run from project root on macOS/Linux (bash):

```bash
./scripts/run_benchmark_eval.sh
```

Run from project root on Windows (PowerShell):

```powershell
./scripts/run_benchmark_eval.ps1
```

These defaults can be overridden via environment variables or by editing the `.ps1` and `.sh` scripts:

- `ARTIFACT_ROOT` (default `outputs/benchmark`)
- `RUN_LABEL` (default `final`)
- `EVAL_SEEDS_CSV` (default `11,22,33,44,55`)
- `EVAL_EPISODES` (default `100`)
- `AGENTS` (default `ppo,a2c,dqn,greedy,random`)
- `OUTPUT_DIR` (default `outputs/benchmark/<run_label>/eval`)
- `INCLUDE_FAMILY_HOLDOUT` (`0` or `1`, default `0`)
- `INCLUDE_TEMPORAL_HOLDOUT` (`0` or `1`, default `0`)
- `NO_NORMALIZED_BURN` (`0` or `1`, default `0`)

The seeded scenario parameter files are the benchmark inputs for `FireEnv` training and script-driven evaluation.

The builder also writes year-based split files for the benchmark:

- `train`: `2006-2022`
- `val`: `2023`
- `holdout`: `2024-2025`
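The year-based split rule above can be sketched in a few lines of Python; the `"year"` field name is an assumption for illustration, not the builder's actual record schema.

```python
# Minimal sketch of the year-based benchmark split; the "year" key is
# a hypothetical field name, not the real snapshot-record schema.
def split_by_year(records):
    splits = {"train": [], "val": [], "holdout": []}
    for rec in records:
        year = rec["year"]
        if 2006 <= year <= 2022:
            splits["train"].append(rec)
        elif year == 2023:
            splits["val"].append(rec)
        elif 2024 <= year <= 2025:
            splits["holdout"].append(rec)
        # years outside 2006-2025 fall through (handled upstream by cleaning)
    return splits

splits = split_by_year([{"year": 2010}, {"year": 2023}, {"year": 2025}])
```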

The dataset builder prints cleaning/drop summaries to stdout and uses progress bars when `tqdm` is available.
104 changes: 87 additions & 17 deletions docs/envspec.md
# Fire Environment Specification

This document describes the RL environment in `src/models/fire_env.py` and its training/evaluation usage in `src/models/train_rl_agent.py` and `src/models/evaluate_agents.py`.

It remains code-first for implemented behavior, but it also records the benchmark-alignment requirements that training and evaluation code must satisfy before full canonical runs are launched.

The concrete benchmark execution plan lives in `docs/planning/train-plan.md`.

---


---

## 7) Training Process

Current training script:

- `src/models/train_rl_agent.py`
- currently implemented learned method: `PPO` (Stable-Baselines3)

Current implemented flow:

1. load seeded train split dataset
2. create vectorized benchmark envs (`n_envs`)
3. train PPO for configured timesteps
4. save model to `src/models/tactical_ppo_agent.zip`
5. run quick evaluation on train and optional val/holdout datasets

Benchmark-aligned target flow:

1. use a unified runner with `--algo {ppo,a2c,dqn}`
2. keep the same benchmark-mode dataset path and split validation for all methods
3. use vectorized envs for `PPO` and `A2C`
4. use a single benchmark env for `DQN` by default
5. write checkpoint metrics every fixed training interval
6. save per-run config and per-checkpoint metrics to disk
7. choose the best checkpoint by validation `asset_survival_rate`
8. run final split-wise evaluation on train/val/holdout after training completes
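Step 7 of the target flow can be sketched directly; the checkpoint-record shape shown here is illustrative, not the trainer's actual artifact format.

```python
# Hedged sketch of best-checkpoint selection by validation
# asset_survival_rate; the dict layout is an illustrative assumption.
def select_best_checkpoint(checkpoints):
    # checkpoints: [{"step": int, "val": {"asset_survival_rate": float}}, ...]
    return max(checkpoints, key=lambda c: c["val"]["asset_survival_rate"])

best = select_best_checkpoint([
    {"step": 20000, "val": {"asset_survival_rate": 0.61}},
    {"step": 40000, "val": {"asset_survival_rate": 0.74}},
    {"step": 60000, "val": {"asset_survival_rate": 0.69}},
])
```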

Current benchmark evaluation script:

- `src/models/evaluate_agents.py`
- evaluates agents across splits (`train`, `val`, `holdout`)
- currently implemented evaluated agents: `ppo`, `greedy`, `random`
- can output JSON summary via `--output`

Benchmark-aligned target support:

- `ppo`
- `a2c`
- `dqn`
- `greedy`
- `random`

Transparency outputs from current code:

- training console output: timesteps, env count, dataset path/count, quick split metrics
- model artifact: `tactical_ppo_agent.zip`
- evaluation console summary per split/agent
- optional evaluation JSON with aggregate metrics

Required benchmark transparency outputs:

- serialized run config per seed
- checkpoint metrics on train/val/holdout
- best-checkpoint selection record
- final evaluation JSON aggregated by seed, then across seeds

Recommended transparency plots (from saved eval JSON/logs):

- split-wise mean return (`train` vs `val` vs `holdout`)

---

## 8) Reporting Metrics

Primary optimization target: **Minimize assets damaged/lost**

Frozen benchmark metric definitions:

- mean episodic return
- asset survival rate
- containment success rate
- mean burned-area fraction: `(burned + burning + asset_burned) / 625`
- mean time to containment, conditioned on successful containment only
- mean resource efficiency: `successful_deployments / total_deployments`
- standard deviation across seeds for each reported metric
- wasted deployment rate
- mean normalized burn ratio (optional in evaluator)
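The burned-area fraction definition above is fully determined by the 25x25 grid; a one-function sketch (argument values illustrative):

```python
# Frozen burned-area fraction on the 25x25 grid:
# (burned + burning + asset_burned) / 625
def burned_area_fraction(burned, burning, asset_burned, grid_cells=625):
    return (burned + burning + asset_burned) / grid_cells

frac = burned_area_fraction(burned=100, burning=20, asset_burned=5)  # 125/625
```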

Important alignment notes:

- pooled episode variance is not a substitute for seed-level aggregation
- raw burned-cell count can still be logged, but the benchmark-facing metric should be the normalized burned-area fraction
- holdout performance is for final reporting only and must not be used for tuning
- the current temporal holdout dataset has only one unique seeded record, so it is a final diagnostic only until expanded

Report these diagnostics during training checkpoints:

- train/val gap for each metric
- optional train/family-holdout gap for each metric
- per-seed summary tables
- baseline comparisons (`greedy`, `random`) against learned methods

## 9) Environment-Side Requirements For Benchmark Alignment

The environment and evaluator together must expose enough information to compute the benchmark metrics exactly.

Environment-side counters or `info` fields required for clean evaluation:

- `assets_lost`
- `step`
- `heli_left`
- `crew_left`
- count of successful helicopter deployments
- count of successful crew deployments
- count of wasted deployment attempts
- count of total deployment attempts

Operational metric rules:

- `mean_resource_efficiency = successful_deployments / total_deployments`
- if `total_deployments == 0`, report `0.0`
- `wasted_deployment_rate = wasted_deployments / total_deployment_attempts`
- if `total_deployment_attempts == 0`, report `0.0`
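The two operational rules above, including the zero-denominator conventions, transcribe directly to:

```python
# Direct transcription of the operational metric rules, with the
# specified 0.0 convention for empty denominators.
def mean_resource_efficiency(successful_deployments, total_deployments):
    if total_deployments == 0:
        return 0.0
    return successful_deployments / total_deployments

def wasted_deployment_rate(wasted_deployments, total_deployment_attempts):
    if total_deployment_attempts == 0:
        return 0.0
    return wasted_deployments / total_deployment_attempts
```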

Evaluator-side aggregation requirements:

- aggregate per seed first, then summarize across seeds
- compute `time_to_containment` only on contained episodes
- compute normalized burn ratio against the same scenario record and evaluation seed under the deterministic non-intervention baseline defined in `docs/planning/train-plan.md`
- do not surface temporal holdout metrics during checkpoint evaluation in canonical runs
- pass `scenario_families` explicitly for canonical train, validation, and family-holdout runs rather than relying on the environment default
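The "aggregate per seed first, then summarize across seeds" requirement can be sketched as follows; the input shape is a hypothetical mapping of seed to per-episode metric values.

```python
from statistics import mean, stdev

# Sketch of seed-first aggregation: episode values are averaged within
# each seed before any cross-seed statistics are computed.
def seed_first_summary(per_seed_episode_values):
    # per_seed_episode_values: {seed: [episode-level metric values]}
    seed_means = [mean(vals) for vals in per_seed_episode_values.values()]
    return {
        "mean": mean(seed_means),
        "std_across_seeds": stdev(seed_means) if len(seed_means) > 1 else 0.0,
    }

summary = seed_first_summary({11: [1.0, 3.0], 22: [2.0, 2.0], 33: [4.0, 2.0]})
```

Note this differs from pooling all episodes: the seed means (2.0, 2.0, 3.0) are the units of analysis, which is why pooled episode variance is not a substitute.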

Verification requirement before full runs:

- all of the above metrics must appear in a short smoke-test evaluation artifact before any full 5-seed training sweep is launched