
Ran Training Code #8

Merged
Thomson-Lam merged 8 commits into main from training on Mar 31, 2026

@Thomson-Lam
Owner

Changes

Ran the training code (reproduction instructions are in the README) with 5 seeds each for DQN, PPO, and A2C. Added a canary test that checks whether the same seed and config produce the same result, as a reproducibility guard, and sanity-checked the model architecture by overfitting on a single batch of data.
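For readers unfamiliar with the two checks, here is a minimal sketch of the idea, not the tests in this PR. It uses a tiny linear model and the standard library as a stand-in for the DQN/PPO/A2C networks. The names `make_batch`, `train_on_single_batch`, and `fingerprint` are hypothetical, and the synthetic data is an assumption made only for illustration.

```python
import hashlib
import json
import random


def make_batch(rng, n=8):
    """Build one small synthetic batch with y = 3*x - 1 (illustrative data only)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x - 1.0 for x in xs]
    return xs, ys


def train_on_single_batch(seed, steps=5000, lr=0.1):
    """Fit y = w*x + b to one fixed batch with plain gradient descent."""
    rng = random.Random(seed)  # local RNG so global random state cannot interfere
    xs, ys = make_batch(rng)
    w, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # seeded initial weights
    n = len(xs)
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2.0 * sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
    return {"w": w, "b": b, "loss": loss}


def fingerprint(result):
    """Hash a result dict so two runs can be compared with one string."""
    payload = json.dumps(result, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = train_on_single_batch(seed=0)
    second = train_on_single_batch(seed=0)
    # Canary: same seed and config must give a bit-identical result.
    assert fingerprint(first) == fingerprint(second)
    # Overfit check: a sound model should drive the loss on one batch to ~0.
    assert first["loss"] < 1e-6
    print("canary and single-batch overfit checks passed")
```

With real networks on GPU, bit-identical results may additionally require deterministic framework settings, so a tolerance-based comparison can be a reasonable fallback.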

Produced and zipped the initial full-run results from scripts/run_benchmark_train.sh. Then split the full training pipeline script into 5 stages, keeping the original full-run script for easier testing and reproduction of results.

Next steps

See docs/planning/harden-training.md.

  • Harden the training process so that the reported results are not affected by bugs.
  • NOTEBOOK: Do a deep data analysis and audit, and reproduce the experiment against cleaner data if possible. The Alberta historical dataset was assumed to be clean, so no deep cleaning was done beyond checking that the features used have basic valid values.
  • Re-run training and confirm that there is no leakage or overfitting. The current checks give mixed signals across the benchmarks run on the validation set during training (before and after the hyperparameter search) and the benchmark run on the offline eval test set after training.
  • NOTEBOOK: Produce figures and plots of final benchmark performance for each seeded RL agent. Report the metrics as planned in the paper and plot the necessary graphs.
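One possible starting point for the leakage check is sketched below. It assumes each split can be identified by a sortable key per row (ISO date strings here) and that the splits are meant to be chronological. Both are assumptions about the dataset, not something this PR states. The helpers `find_split_overlaps` and `is_chronological` are hypothetical.

```python
from itertools import combinations


def find_split_overlaps(splits):
    """Return {(name_a, name_b): shared_keys} for every pair of splits that overlap."""
    overlaps = {}
    for (name_a, keys_a), (name_b, keys_b) in combinations(splits.items(), 2):
        shared = set(keys_a) & set(keys_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


def is_chronological(splits, order):
    """True if every key in each split sorts strictly before every key in the next."""
    for earlier, later in zip(order, order[1:]):
        if max(splits[earlier]) >= min(splits[later]):
            return False
    return True


if __name__ == "__main__":
    # Illustrative keys only; real keys would come from the dataset's rows.
    splits = {
        "train": ["2019-01-01", "2019-01-02"],
        "val": ["2019-01-03"],
        "test": ["2019-01-04"],
    }
    assert find_split_overlaps(splits) == {}
    assert is_chronological(splits, ["train", "val", "test"])
```

Disjoint, ordered splits rule out only the most direct form of leakage. Features computed over the full dataset (for example, normalization statistics) would need a separate audit.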

@Thomson-Lam Thomson-Lam merged commit cb5cd68 into main Mar 31, 2026
1 of 2 checks passed
