Feat/data pipeline#5

Merged
noahkostesku merged 7 commits into main from feat/data-pipeline on Mar 27, 2026

@Thomson-Lam Thomson-Lam commented Mar 27, 2026

Changes

  • Updated the data pipeline to ingest and clean the Alberta Historical Wildfires Dataset instead of live data from CWFIS.
  • Removed FIRMS and CWFIS ingestion and kept CFFDRS ingestion.
  • Removed the XGBoost fire-spread-rate model, which was trained on synthetic data, and replaced its predictions with the observed spread rate from the dataset.
  • Added a lightweight cleaning step to the data pipeline that vets records and reports missing or invalid essential values for the environment parameters.
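As a rough illustration of what such a lightweight cleaning step might look like, the sketch below checks a few essential fields and counts missing or invalid values. The field names and valid ranges are hypothetical placeholders, not the schema used in this PR.

```python
import math

# Hypothetical field names and valid ranges (assumptions for illustration only;
# the real pipeline's essential environment parameters may differ).
ESSENTIAL_RANGES = {
    "temperature": (-60.0, 60.0),       # degrees Celsius
    "relative_humidity": (0.0, 100.0),  # percent
    "wind_speed": (0.0, 200.0),         # km/h
}


def vet_records(records):
    """Return (clean_records, report) where report counts problems per field."""
    clean = []
    report = {name: {"missing": 0, "invalid": 0} for name in ESSENTIAL_RANGES}
    for record in records:
        ok = True
        for name, (low, high) in ESSENTIAL_RANGES.items():
            raw = record.get(name)
            # Treat absent keys and blank strings as missing.
            if raw is None or str(raw).strip() == "":
                report[name]["missing"] += 1
                ok = False
                continue
            # Treat non-numeric, NaN, or out-of-range values as invalid.
            try:
                value = float(raw)
            except (TypeError, ValueError):
                report[name]["invalid"] += 1
                ok = False
                continue
            if math.isnan(value) or not (low <= value <= high):
                report[name]["invalid"] += 1
                ok = False
        if ok:
            clean.append(record)
    return clean, report
```

Keeping the report as per-field counts makes it easy to print a short summary after ingestion and to see which signal is responsible for most dropped rows.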

Validation:

Ran uv run python -m src.ingestion.static_dataset --target-count 50000 --cffdrs-year 2025 --raw-alberta-csv data/static/fp-historical-wildfire-data-2006-2025.csv. The run ingested ~2,000 samples, some of which were enriched with geographically approximated CFFDRS data.
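The PR does not spell out how the geographic approximation works. One plausible reading is nearest-neighbor matching within a distance cutoff, sketched below. The function names, the lat/lon keys, and the 50 km default are all assumptions for illustration, not the pipeline's actual logic.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_observation(fire, observations, max_km=50.0):
    """Return the closest observation within max_km, or None if none qualifies.

    `fire` and each observation are dicts with hypothetical "lat"/"lon" keys.
    """
    best, best_dist = None, max_km
    for obs in observations:
        dist = haversine_km(fire["lat"], fire["lon"], obs["lat"], obs["lon"])
        if dist <= best_dist:
            best, best_dist = obs, dist
    return best
```

Returning None when nothing falls inside the cutoff leaves the record unenriched rather than attaching a distant, misleading value, which matches the observation that only some samples were enriched.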

Missing CFFDRS data can bias RL agent training and would make the paper harder to defend. The README usage instructions for ingesting the final frozen training data therefore drop the --cffdrs-year flag and use the Alberta dataset alone, which already contains enough core signals. Updated docs/data-pipeline.md and README.md to reflect the new usage.

@noahkostesku noahkostesku merged commit d33f502 into main Mar 27, 2026
2 checks passed