Hi,
I want to reuse your MiniGrid experiment as a benchmark for my paper on RL generalisation. It fits nicely, but I am not clear on how to replicate the experiment that generates the orange line in your paper. Can you provide some insight?
Are you running the training on 2 000 000 environments to generate the chart?
Thanks a lot in advance.