Feat: Environment Design & Implementation #6
Merged
…gested records, sampling record strategy
…ds, holdout validation with no sampling with replacement
Changes
…`severity_bucket` only with seed-stable record sampling, 4) hardened the train/test/val model and fixed the sampling-with-replacement risk for the validation holdout to ensure an isolated validation set.
…ignition, and layout of firefighter assets. Added an extra step that randomly generates and then fixes a seed for the starting configuration of each environment, for every param record from the data pipeline. The process is now: ingestion -> data vetting (lightweight cleaning) -> randomly seed and fix ignitions and layout -> environment loads config with fully reproducible data, now stored as `*_seed.json`.
Please re-run the data pipeline using the command in `README.md`, which is the same as before.
Other changes:
Updated `.github/workflows/ci.yml` to replace the outdated test, which only checked default-constructor boilerplate behavior, with a test that generates small boilerplate data and runs `evaluate_agents` to check that the environment, splits, and outputs work.
Validation
Ran the tests for all of the changes implemented and all 13 tests behaved as expected.
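For reviewers who want to spot-check the isolated validation holdout mentioned under Changes, here is a minimal sketch of a split drawn without replacement, followed by a disjointness check. The function name `split_records`, the fractions, and the record ID format are all hypothetical and are not taken from this PR's code.

```python
import random


def split_records(record_ids, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Partition record IDs into disjoint train/val/test lists.

    Hypothetical helper: each ID is assigned to exactly one split, so the
    validation holdout can never contain a record that was also sampled
    into training (the "sampling with replacement" risk).
    """
    # Deduplicate and sort so the shuffle depends only on the seed.
    ids = sorted(set(record_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)

    n_val = int(len(ids) * val_fraction)
    n_test = int(len(ids) * test_fraction)

    val = ids[:n_val]
    test = ids[n_val:n_val + n_test]
    train = ids[n_val + n_test:]
    return train, val, test


# Illustrative record IDs, not real pipeline output.
train, val, test = split_records([f"rec_{i}" for i in range(50)], seed=42)

# No record may appear in more than one split.
assert not set(train) & set(val)
assert not set(train) & set(test)
assert not set(val) & set(test)
```

Slicing one shuffled list, rather than sampling each split independently, is what makes overlap impossible by construction.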
Check:
`docs/env-checklist.md` for changes made to the environment.
`README.md` for general instructions for running the data pipeline.
`docs/envspec.md` for full details on environment design.
`docs/data-pipeline.md` for full documentation on how the data pipeline currently works with the new seeding step.
NOTE:
`docs/envspec.md` specifies, in sections 7 and 8, the metrics and goals to report, along with the diagnostics essential for training, that still need to be implemented in the respective training code. The doc currently documents training only for what has been implemented, which is PPO. Please refer to these sections when implementing training code to ensure consistent metrics reporting and a training procedure aligned with the environment's specifications, and discuss what plots are needed for the paper.
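As a rough illustration of the seeding step described under Changes (seed and fix ignitions and layout, then store the result as `*_seed.json`), the sketch below shows one way such a step could be made reproducible. It is not the pipeline's actual code: the function names, the record fields (`id`, `ignition`, `assets`), the grid size, and the asset count are all assumptions. It also derives the seed by hashing the record ID, whereas the PR describes generating a random seed once and then fixing it.

```python
import hashlib
import json
import random
from pathlib import Path


def stable_seed(record_id: str, base_seed: int = 0) -> int:
    """Derive a seed that is identical across runs and machines.

    Assumption: hashing stands in for the PR's "randomly generate, then fix"
    step. Python's built-in hash() is avoided because it is salted per process.
    """
    digest = hashlib.sha256(f"{base_seed}:{record_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def seed_record(record: dict, grid_size: int = 16, n_assets: int = 3,
                base_seed: int = 0) -> dict:
    """Attach a fixed seed, ignition cell, and asset layout to a record."""
    seed = stable_seed(record["id"], base_seed)
    rng = random.Random(seed)
    cells = [(r, c) for r in range(grid_size) for c in range(grid_size)]
    # Sample distinct cells so no asset starts on the ignition point.
    picks = rng.sample(cells, n_assets + 1)
    return {
        **record,
        "seed": seed,
        "ignition": list(picks[0]),
        "assets": [list(p) for p in picks[1:]],
    }


def write_seeded(record: dict, out_dir) -> Path:
    """Write the seeded config as <id>_seed.json (hypothetical layout)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{record['id']}_seed.json"
    path.write_text(json.dumps(seed_record(record), indent=2))
    return path
```

Because the stored file already contains the ignition and layout, an environment that loads it does not need to draw any random numbers at reset time to reproduce the same starting configuration.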