Ran Training Code by Thomson-Lam · Pull Request #8 · Thomson-Lam/firebot-eval

Thomson-Lam · 2026-03-31T03:31:32Z

Changes

Ran the training code (reproduction instructions are in the README) with 5 seeds each for DQN, PPO and A2C. Added a canary test that checks whether the same seed and config produce the same result, as a reproducibility check, and tested model architecture soundness by overfitting on a single batch of data.
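As a rough illustration of the canary idea, here is a minimal sketch, not the code in this PR. It assumes a short training run can be reduced to a list of numbers; `train_stub`, `fingerprint`, and `canary_check` are hypothetical names, and the stub only stands in for a real DQN/PPO/A2C run.

```python
import hashlib
import json
import random


def train_stub(algo: str, seed: int, config: dict) -> list[float]:
    """Hypothetical stand-in for a short training run; returns a fake reward curve."""
    # Build the RNG seed from a stable string so the result does not depend on
    # Python's per-process hash salting.
    key = json.dumps({"algo": algo, "seed": seed, "config": config}, sort_keys=True)
    rng = random.Random(key)
    return [rng.random() for _ in range(config["steps"])]


def fingerprint(values: list[float]) -> str:
    """Hash the run output so two runs can be compared exactly."""
    return hashlib.sha256(json.dumps(values).encode()).hexdigest()


def canary_check(algo: str, seed: int, config: dict, run=train_stub) -> bool:
    """Run the same (algo, seed, config) twice and require identical output."""
    first = fingerprint(run(algo, seed, config))
    second = fingerprint(run(algo, seed, config))
    return first == second


if __name__ == "__main__":
    config = {"steps": 8}  # assumed toy config, not the project's real one
    for algo in ("DQN", "PPO", "A2C"):
        for seed in range(5):  # assumed seed values; the PR only says "5 seeds"
            assert canary_check(algo, seed, config), (algo, seed)
    print("canary passed")
```

A check like this only shows that a run is repeatable on one machine and one library version; it says nothing about whether the results themselves are correct.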

Produced and zipped the initial full-run results from scripts/run_benchmark_train.sh. Then split the full training pipeline into 5 stage scripts, keeping the original full-run script to make testing and reproducing results easier.

Next steps

See docs/planning/harden-training.md.

Harden the training process so that the reported results are not affected by bugs.
NOTEBOOK: Do a deep data analysis and audit, and reproduce the experiment against cleaner data if possible. The Alberta historical dataset was assumed to be clean, so no deep cleaning was done beyond vetting basic valid values for the features used.
Re-run training and confirm that it has no leakage or overfitting. The current checks give mixed signals across the benchmarks run during training on the validation set (before and after hyperparameter search) and the benchmark run after training on the offline eval test set.
NOTEBOOK: Produce figures and plots of final benchmark performance for each seeded RL agent. Report the metrics as planned in the paper and plot the necessary graphs.
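One simple part of the leakage audit could be confirming that the train, validation, and offline eval test splits share no records. The sketch below assumes each record has a unique ID; `find_split_overlaps` and the example IDs are hypothetical and are not taken from the repository.

```python
from itertools import combinations


def find_split_overlaps(splits: dict[str, set]) -> dict[tuple[str, str], set]:
    """Return the record IDs shared by each pair of splits (empty dict if disjoint)."""
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(splits.items(), 2):
        shared = ids_a & ids_b
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


if __name__ == "__main__":
    # Made-up IDs for illustration only.
    splits = {
        "train": {"r1", "r2", "r3"},
        "val": {"r4", "r5"},
        "test": {"r6", "r3"},  # "r3" also appears in train
    }
    print(find_split_overlaps(splits))
```

Disjoint IDs rule out only direct record overlap. Other forms of leakage, such as temporal overlap or features derived from the target, would need separate checks.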

Thomson-Lam added 8 commits March 30, 2026 21:59

feat: added overfit catch, hyperparam validation and usage and hardened benchmark configs

ddf7b89

updated docs on env severity level mismatch

00e741e

checkpoint: added notebooks for data health checks, canary check for non deterministic

2d677ed

checkpoint: training; single case for overfit check; notebook audit

e4f2731

fix: sh/dash run syntax error

271be3b

feat: split sh scripts

7543b8b

zipped initial training outputs and model results

0b8e53a

added benchmark results for initial training run

bde34af

Thomson-Lam merged commit cb5cd68 into main Mar 31, 2026
1 of 2 checks passed

Thomson-Lam merged 8 commits into main from training
