To get the best performance, kick off multiple seeds / training runs in parallel, let the best-performing agents "survive", and keep fine-tuning them in a roughly evolutionary fashion. The ExperimentGrid is a convenient way to launch such runs on a single machine over hand-picked hyperparameters, with the selection of survivors left to you. More automated techniques such as population based training (PBT), which periodically copies the weights and hyperparameters of the strongest runs into the weakest ones and then perturbs them, would most likely achieve even better performance.
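To make the "survive and fine-tune" idea concrete, here is a minimal sketch of the exploit/explore loop that population based training uses, run on a toy one-parameter problem instead of a real RL agent. Everything in it is an illustrative assumption, not part of ExperimentGrid or Spinning Up: `Member`, `train_step`, `exploit_and_explore`, and `pbt` are hypothetical names, the quadratic `objective` stands in for average episode return, and the bottom-25%-copies-top-25% selection with a 0.8x / 1.2x learning-rate perturbation is just one common choice.

```python
import random
from dataclasses import dataclass


@dataclass
class Member:
    """One population member: toy 'weights' plus its own hyperparameter."""
    theta: float                   # stand-in for the network weights (a single scalar)
    lr: float                      # hyperparameter tuned online by PBT
    score: float = float("-inf")   # latest evaluation (stand-in for episode return)


def objective(theta: float) -> float:
    # Toy "return": maximized (= 0) at theta = 3.
    return -(theta - 3.0) ** 2


def train_step(m: Member) -> None:
    # One gradient-ascent step on the toy objective; d/dtheta = -2 * (theta - 3).
    m.theta += m.lr * (-2.0 * (m.theta - 3.0))
    m.score = objective(m.theta)


def exploit_and_explore(pop: list[Member], rng: random.Random, frac: float = 0.25) -> None:
    # Rank by score; each member in the bottom `frac` is replaced by a random top-`frac` member.
    ranked = sorted(pop, key=lambda m: m.score, reverse=True)
    n = max(1, int(len(pop) * frac))
    top, bottom = ranked[:n], ranked[-n:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: copy the winner's weights (and, below, its hyperparameter).
        loser.theta = winner.theta
        # Explore: perturb the copied learning rate; cap at 0.5 so each toy step stays stable.
        loser.lr = min(winner.lr * rng.choice([0.8, 1.2]), 0.5)
        loser.score = objective(loser.theta)


def pbt(pop_size: int = 8, steps: int = 200, ready_every: int = 10, seed: int = 0) -> Member:
    rng = random.Random(seed)
    # Random starting weights and log-uniform learning rates in [1e-4, 1e-1].
    pop = [Member(theta=rng.uniform(-10.0, 10.0), lr=10 ** rng.uniform(-4.0, -1.0))
           for _ in range(pop_size)]
    for step in range(1, steps + 1):
        for m in pop:  # in a real setup these would be parallel training runs
            train_step(m)
        if step % ready_every == 0:
            exploit_and_explore(pop, rng)
    return max(pop, key=lambda m: m.score)


best = pbt()
print(f"best theta={best.theta:.4f} lr={best.lr:.4g} score={best.score:.3g}")
```

Because the top members are never overwritten during exploitation, the population's best score can only stay the same or improve between rounds in this toy. Real RL returns are noisy, so a practical version would rank members on a smoothed return over several episodes rather than a single evaluation.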