Feat: Environment Design & Implementation by Thomson-Lam · Pull Request #6 · Thomson-Lam/firebot-eval

Thomson-Lam · 2026-03-28T04:34:51Z

Changes

1. Implemented the new environment design on top of the new data pipeline schema. This replaces fetching live data from CWFIS and computing an approximate spread rate from the simplified Rothermel fire spread formula.
2. Added type-safety checks for the loaded JSON record data.
3. Replaced selecting cached records by severity_bucket alone with seed-stable record sampling.
4. Hardened the train/test/val split and fixed the sampling-with-replacement risk in the validation holdout, so the validation set stays isolated.
5. Removed all live ingestion code and definitions.
6. Added tests for schema validation, for benchmark mode failing when no records are found, for environment params matching dataset values, and for split isolation.
7. Cleaned up live ingestion variables.
8. IMPORTANT: data ingestion did not account for environment initialization, namely the initial ignitions and the layout of firefighter assets. Added an extra step that randomly generates a fixed seed for the starting configuration of each environment, for every param record from the data pipeline. The process is now ingestion -> data vetting (lightweight cleaning) -> randomly seed and fix ignitions and layout -> environment loads config with fully reproducible data, now stored as *_seed.json.
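For reviewers, the seeding step can be pictured roughly as below. This is a minimal sketch under stated assumptions, not the code in this PR: the grid size, the counts, the field names `env_seed`, `ignitions`, and `asset_layout`, and the helper names are all hypothetical. See docs/data-pipeline.md for the actual behavior.

```python
import json
import random
from pathlib import Path

# All constants below are assumptions for illustration only.
GRID_SIZE = 25      # hypothetical grid side length
NUM_IGNITIONS = 2   # hypothetical number of initial fires
NUM_ASSETS = 3      # hypothetical number of firefighter assets


def seed_records(records, master_seed=0):
    """Attach a fixed seed, ignitions, and an asset layout to each record."""
    master = random.Random(master_seed)
    cells = [(r, c) for r in range(GRID_SIZE) for c in range(GRID_SIZE)]
    seeded = []
    for record in records:
        # Draw one seed per record, then store it so the env can replay it.
        env_seed = master.randrange(2**32)
        rng = random.Random(env_seed)
        # Sample distinct cells so ignitions and assets never overlap.
        chosen = rng.sample(cells, NUM_IGNITIONS + NUM_ASSETS)
        seeded.append({
            **record,
            "env_seed": env_seed,
            "ignitions": [list(c) for c in chosen[:NUM_IGNITIONS]],
            "asset_layout": [list(c) for c in chosen[NUM_IGNITIONS:]],
        })
    return seeded


def write_seeded(records, out_path, master_seed=0):
    """Write the seeded records as JSON (for example to a *_seed.json file)."""
    Path(out_path).write_text(json.dumps(seed_records(records, master_seed), indent=2))
```

Because the starting configuration is stored alongside the record, the environment only has to read it back, so two runs over the same file start from the same state.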

Please re-run the data pipeline according to the command in README.md, which is the same as before.

Other changes:

Updated .github/workflows/ci.yml: replaced the outdated test, which only checked default-constructor boilerplate behavior, with a test that generates a small boilerplate dataset and runs evaluate_agents to check that the environment, splits, and outputs work.

Validation

Ran the tests for all of the changes implemented and all 13 tests behaved as expected.
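To make the split-isolation and sampling changes concrete, here is a minimal sketch of the idea. It is an illustration under assumptions, not the implementation in this PR: the function names and the 70/15/15 fractions are hypothetical.

```python
import random


def split_records(record_ids, seed=0, fractions=(0.7, 0.15, 0.15)):
    """Shuffle once with a fixed seed, then slice into disjoint splits."""
    ids = sorted(set(record_ids))  # canonical order, duplicates dropped
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * fractions[0])
    n_val = int(len(ids) * fractions[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


def sample_record(split_ids, episode_seed):
    """Seed-stable sampling: the same seed always selects the same record."""
    if not split_ids:
        raise ValueError("no records available for this split")
    return random.Random(episode_seed).choice(split_ids)


def holdout_order(val_ids, seed=0):
    """Visit each validation record exactly once (no replacement)."""
    return random.Random(seed).sample(val_ids, k=len(val_ids))
```

A split-isolation test then only needs to assert that the three lists are pairwise disjoint and that their union is the full record set.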

Check:

docs/env-checklist.md for changes made to the environment.
README.md for general instructions for running the data pipeline.
docs/envspec.md for full details on environment design.
docs/data-pipeline.md for full documentation on how the data pipeline currently works with the new seeding step.
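As a quick orientation before reading those docs, the type-safety check on loaded JSON records and the benchmark-mode guard could look roughly like this. It is a sketch under assumptions: only `severity_bucket` is taken from this PR, while the other field names, the `benchmark` flag, and the function names are hypothetical.

```python
import json

# Hypothetical schema: only severity_bucket is named in this PR.
REQUIRED_FIELDS = {
    "record_id": str,
    "severity_bucket": str,
    "wind_speed": (int, float),
    "wind_direction": (int, float),
}


def validate_record(record):
    """Raise if a record is missing a field or has a value of the wrong type."""
    if not isinstance(record, dict):
        raise TypeError(f"record must be an object, got {type(record).__name__}")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"field {name} has type {type(value).__name__}")
    return record


def load_records(text, benchmark=False):
    """Parse a JSON array of records and validate each one."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise TypeError("expected a JSON array of records")
    records = [validate_record(r) for r in data]
    if benchmark and not records:
        # Benchmark mode must fail loudly rather than fall back silently.
        raise RuntimeError("benchmark mode requires at least one record")
    return records
```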

NOTE: Sections 7 and 8 of docs/envspec.md specify the metrics and goals to report, along with diagnostics essential for training, that still need to be implemented in the respective training code. The doc currently documents training only for what has been implemented, which is PPO. Please refer to those sections when implementing training code to ensure consistent metrics reporting and a training procedure aligned with the environment's specification. We should also discuss which plots are needed for the paper.

Thomson-Lam added 10 commits March 27, 2026 22:44

feat: cleaned env checklist, benchmark mode, schema validation for ingested records, sampling record strategy

7dfbd3d

feat: removed fallback; hardened pipeline to environment variables spec contract

2460951

feat: concrete env construction and guardrails for data splits

d650702

fix: PPO training setup on offline dataset

07245ac

feat: schema, splits and data validation tests

ffee687

chore: cleaned all heuristic code; added benchmark helper function for env creation

ea1171e

feat: added random seed in env init for reproducibility + tests

dbd0329

fix: wind direction inconsistency bug, more reproducible env init seeds, holdout validation with no sampling with replacement

1cf9fb0

chore: updated docs and envspec

0c1bde8

chore: better CI tests

038f4b0

noahkostesku merged commit f2d1078 into main Mar 28, 2026
2 checks passed

noahkostesku merged 10 commits into main from feat/environment
