This project is an empirical RL benchmark for wildfire tactical suppression. We compare DQN, A2C, PPO, and random and greedy baselines on a 25x25 grid environment with critical assets and finite suppression budgets.
The physics informed environment and built environment records from the Alberta Historical Wildfires Database.
We ingested data from the Alberta Historical Wildfires Database, then built environment snapshots with a seeded random initialization for starting positions of fires and firefighter agents. We performed our first training run on the dataset directly from the data pipeline with only schema validation, and evaluated results inside notebooks/training_0_analysis.ipynb; we then extensively did a data audit (notebooks/data_audit.ipynb), cleaned the data (notebooks/clean_data.ipynb) and re-ran the full training process again. For both runs, we carried out the same training process:
- Overfitting on a single batch to check model architecture
- Smoke training run + seed behavior checks
- Smoke test evaluation check
- hyperparameter sweep on the validation set
- Full 5-seed training (trained seeds 11, 22, 33, 44, 55) for each model with hyperparameters from 4
We then ran the notebooks for analysis inside notebooks, notebooks/training_<NUMBER>_analysis.ipynb.
We did not include the 24 total models trained for both training runs in this GitHub repo, but we included our analysis of the results and our process in notebooks. To see our process and the results, please refer to the setup below, and open the notebooks and the relevant plots inside notebook to review our results directly. We have also included instructions to run the full data ingestion pipeline and training pipeline for reproducing. For more details, please refer to docs/ and the file tree below.
firebot/
├── README.md
├── pyproject.toml
├── uv.lock
├── ruff.toml
├── lefthook.yml
├── .env.example
├── .python-version
├── fp-historical-wildfire-data-dictionary-2006-2025.pdf # from the dataset download
├── .github/
│ └── workflows/
│ └── ci.yml
├── .ci-smoke/
│ ├── results.json
│ ├── scenario_parameter_records_seeded_train.json
│ ├── scenario_parameter_records_seeded_val.json
│ └── scenario_parameter_records_seeded_holdout.json
├── data/
│ └── static/
│ ├── fp-historical-wildfire-data-2006-2025.csv # raw Alberta historical wildfire CSV
│ ├── snapshot_records.json # full normalized snapshot records from raw CSV
│ ├── snapshot_records_train.json # train-year snapshot subset
│ ├── snapshot_records_val.json # validation-year snapshot subset
│ ├── snapshot_records_holdout.json # holdout-year snapshot subset
│ ├── scenario_parameter_records.json # full unseeded environment parameter records
│ ├── scenario_parameter_records_train.json # train split unseeded records
│ ├── scenario_parameter_records_val.json # validation split unseeded records
│ ├── scenario_parameter_records_holdout.json # holdout split unseeded records
│ ├── scenario_parameter_records_seeded.json # full seeded records with ignition/layout seeds
│ ├── scenario_parameter_records_seeded_train.json # train runtime records
│ ├── scenario_parameter_records_seeded_val.json # validation runtime records
│ └── scenario_parameter_records_seeded_holdout.json # temporal holdout runtime records
├── docs/
│ ├── data-pipeline.md
│ ├── envspec.md
│ └── training.md
├── src/
│ ├── __init__.py
│ ├── ingestion/
│ │ ├── __init__.py
│ │ ├── clean_historical.py # row cleaning and required-field checks
│ │ ├── cffdrs.py # CFFDRS station ingestion, not used
│ │ ├── weather.py # legacy Open-Meteo weather fetch helpers, not used
│ │ └── static_dataset.py # builds snapshot/scenario parameter records in data/static
│ └── models/ # environment, training, evaluation, and shared benchmark utilities
│ ├── __init__.py
│ ├── fire_env.py # WildfireEnv implementation and benchmark env construction helpers
│ ├── benchmarking.py # shared benchmark presets, rollout metrics, and aggregation functions
│ ├── train_rl_agent.py # unified PPO/A2C/DQN trainer with checkpoint and final evaluation outputs
│ └── evaluate_agents.py # classdef for PPO/A2C/DQN plus greedy/random baselines
├── scripts/
│ ├── canary.py # deterministic smoke/repro canary checks for trained artifacts
│ ├── run_benchmark_train.sh # legacy all-in-one bash runner (staged scripts are canonical)
│ ├── run_benchmark_train.ps1 # powershell equivalent
│ ├── run_benchmark_eval.sh # bash script for post-training benchmark evaluation by seed
│ ├── run_benchmark_eval.ps1 # powershell equivalent
│ └── stages/
│ ├── _common.sh # shared defaults, validation, and helper functions for staged runs
│ ├── 01_karpathy_overfit.sh # one-record overfit checks
│ ├── 02_smoke_train_and_repro.sh # smoke training + reproducibility canary
│ ├── 03_smoke_eval.sh # smoke checkpoint artifact loading + sanity eval
│ ├── 04_pilot_sweep.sh # one-seed validation-only pilot hyperparameter sweep
│ └── 05_final_train.sh # canonical full 5-seed training with frozen protocol
├── tests/
│ ├── conftest.py # test environment configuration
│ └── models/ # environment and benchmark metric contract tests
│ ├── test_fire_env_setup_contract.py # benchmark-mode env loading/split/schema contract tests
│ └── test_benchmarking_metrics.py # benchmark metric/preset/aggregation tests
├── notebooks/
│ ├── clean_data.ipynb # used for cleaning data
│ ├── data_audit.ipynb # data analysis and checks on the data
│ ├── training_0_analysis.ipynb # first training run analysis
│ ├── training_1_analysis.ipynb # second training run analysis
│ ├── final_results_table.png
│ ├── final_results_table_compact.png
│ └── training_val_plots.png
├── outputs/ # generated training and evaluation outputs (gitignored)
└── drd-archive/ # archived prototype code from the initial RL proposal before wildfire RL evaluation
├── main.py
└── src/
├── __init__.py
├── config.py
├── env.py
├── evaluate.py
├── networks.py
├── ppo.py
├── train.py
├── utils.py
└── viz.py
This project requires the uv package manager to run. After installing, please do the following:
- clone the repo
- in the project root, run:
uv venv && source .venv/bin/activate && uv sync
To run the notebooks or view them using the uv virtual environment:
source .venv/bin/activate # make sure the venv is active
cd notebooks
jupyter notebook Pre-commit hooks were used for the project for linting and checks. Install lefthook for local lint/format checks on commit:
# pick one
brew install lefthook
npm i -g lefthook
# then wire it up
lefthook installThe data pipeline now uses the Alberta historical wildfire dataset in data/static/ as its primary source.
Data sources:
- Alberta historical wildfire dataset: primary incident, weather, spread-rate, and assessment-time source
- CFFDRS: optional supplementary fire-danger enrichment
- CWFIS and FIRMS: retained in the repo for legacy/live experiments, not part of the canonical build path
Refer to docs/data-pipeline.md for exact fields and data we ingest.
We build the static dataset at src/ingestion/static_dataset.py. The script:
- loads historical incident rows from
data/static/fp-historical-wildfire-data-2006-2025.csv - normalizes them into snapshot records anchored at assessment time
- applies lightweight cleaning (strip blank strings and drop unusable rows with missing required assessment-time fields)
- optionally enriches with CFFDRS fields when
--cffdrs-yearis provided and usable - writes frozen and normalized
snapshot_records.jsonand split snapshot files indata/static - computes offline environment variables and writes
scenario_parameter_records.jsonplus split files indata/static. The environment variables written are:base_spread_probseverity_bucketwind_direction(8-direction string)wind_strengthignition_seedlayout_seed
- writes seeded benchmark variants (
scenario_parameter_records_seeded.jsonandscenario_parameter_records_seeded_{train|val|holdout}.json) for reproducible initialization; holdout seeded export is currently a single unique held-out record. - With the following extra fields stored:
spread_rate_1h_mspread_scoreweather_scorecffdrs_dryness_scorerain_factorsize_factorfire_type_factorfuel_factorobserved_spread_rate_m_minassessment_hectaresfire_typefuel_typerecord_quality_flag
NOTE: the stored extra fields are for checking whether the data pipeline is computing the primary metrics correctly, and checking why a record got a high/low spread setting. Their influence has already been collapsed into
base_spread_prob,severity_bucket, andwind_strengthfor the canonical environment. They are not directly consumed by the currentFireEnvdynamics to keep the initial benchmark simple and reduce overfitting/confounding risk.
For future improvements, consider using cffdrs_dryness_score to influence burnout probability, rain_factor to damp spread for the whole episode, and size_factor if we agree incident size should affect spread dynamics. For now, these remain audit fields rather than direct transition inputs.
Check docs/data-pipeline.md for how these variables are computed.
Alberta historical wildfire CSV -> normalized snapshot records -> scenario parameter records.
Each record is anchored at ASSESSMENT_DATETIME. The builder uses observed spread rate, assessment weather, incident size, fire type, and fuel type to compute benchmark environment variables. If --cffdrs-year is passed and a usable station file exists, the builder also joins supplementary CFFDRS danger indices by both distance and date.
fire record -> snapshot record (`data/static/snapshot_records.json`) -> scenario (environment) parameter record (`data/static/scenario_parameter_records.json`).
- CFFDRS is supplementary. If the requested year is sparse or unavailable, the builder still works without it.
- The raw Alberta CSV already contains the main weather and spread fields used for the benchmark.
For more details, check docs/data-pipeline.md
We run this command run to ingest our dataset (with a large cap to avoid split truncation):
uv run python -m src.ingestion.static_dataset --target-count 50000Or the following command for the full explicit paths:
uv run python -m src.ingestion.static_dataset --target-count 50000 --output-dir data/static/v1 --raw-alberta-csv data/static/raw/fp-historical-wildfire-data-2006-2025.csvWith default paths specified inside the data pipeline code: data/static/raw/ for where the source dataset is; data/static/v1 for where the initially compiled environment parameter records from the ingested data are
This command builds data from the CSV and generates initialization seeds for ignition and asset layout for the corresponding environment. CFFDRS was not used to reduce confounding variables and any bias introduced due to incomplete CFFDRS data ingested for some specific fires.
If CFFDRS for the selected year is sparse, the builder still runs and writes records without supplementary CFFDRS enrichment.
Optionally, test with a smaller target count:
uv run python -m src.ingestion.static_dataset --target-count 100 --cffdrs-year 2025 --raw-alberta-csv data/static/fp-historical-wildfire-data-2006-2025.csvIf you have your own normalized historical fire records JSON:
uv run python -m src.ingestion.static_dataset --fire-records path/to/fire_records.json --target-count 100For controlled and reproducible benchmark training, run the staged bash scripts in order.
Run from project root on macOS/Linux (bash):
./scripts/stages/01_karpathy_overfit.sh
./scripts/stages/02_smoke_train_and_repro.sh
./scripts/stages/03_smoke_eval.sh
./scripts/stages/04_pilot_sweep.sh
./scripts/stages/05_final_train.shRun from project root on Windows (PowerShell):
./scripts/run_benchmark_train.ps1Staged bash flow:
- Stage 1: single-record overfit checks (
ppo,a2c,dqn) - Stage 2: smoke training + reproducibility check
- Stage 3: smoke evaluation sanity check
- Stage 4: validation-only pilot sweeps and winner selection
- Stage 5: full canonical 5-seed training using frozen protocol values
Optional configurable fields that can be overridden for training:
ARTIFACT_ROOT(defaultoutputs/benchmark)SMOKE_TIMESTEPS(default20000, one canonical checkpoint interval)SMOKE_SEED(default11)SMOKE_EVAL_EPISODES(default5)RUN_REPRO_CANARY(default1)REPRO_CANARY_TOL(default1e-9)KARPATHY_TIMESTEPS(default10000)KARPATHY_SEED(default11)KARPATHY_FAMILY(defaultcenter,medium,A)KARPATHY_CHECKPOINT_EVAL_EPISODES(default1)PILOT_TIMESTEPS(default40000)PILOT_SEED(default11)RUN_KARPATHY_CHECK(default1)RUN_PILOT_SWEEP(default1)USE_PILOT_WINNERS(default1)FINAL_SEEDS_CSV(default11,22,33,44,55)ALGO_ORDER_CSV(defaultppo,a2c,dqn)
After Stage 5 completes, run benchmark evaluation wrappers with the commands below.
Run from project root on macOS/Linux (bash):
./scripts/run_benchmark_eval.shRun from project root on Windows (PowerShell):
./scripts/run_benchmark_eval.ps1These are the default values that can be overridden via env ars or editing the ps1 and .sh scripts.
ARTIFACT_ROOT(defaultoutputs/benchmark)RUN_LABEL(defaultfinal)EVAL_SEEDS_CSV(default11,22,33,44,55)EVAL_EPISODES(default100)AGENTS(defaultppo,a2c,dqn,greedy,random)OUTPUT_DIR(defaultoutputs/benchmark/<run_label>/eval)INCLUDE_FAMILY_HOLDOUT(0or1, default0)INCLUDE_TEMPORAL_HOLDOUT(0or1, default0)NO_NORMALIZED_BURN(0or1, default0)
The seeded scenario parameter files are the benchmark inputs for FireEnv training and script-driven evaluation.
The builder also writes year-based split files for the benchmark:
train:2006-2022val:2023holdout:2024-2025
The dataset builder prints cleaning/drop summaries to stdout and uses progress bars when tqdm is available.
The notebook details data analysis. After running, move outputs/ to training-outputs/ in the project root and launch the notebooks via jupyter notebook in notebooks/, to be able to run notebooks/training_*_analysis.ipynb. Run data_audit.ipynb after running the data pipeline once, and run clean_data.ipynb to reproduce data cleaning after ingestion. The training scripts inside stages/ use the cleaned data for training.