Dear authors,
Thanks for sharing your great work!
I have two questions regarding the training details:
- The run script "run_grpo.sh" says that a specific GPU must be set aside for the reward models. However, I cannot find the part of the code that does this. Could you please point me to the code that splits the Janus model across X GPUs and places the reward models on a separate GPU?
- Training time: How long does it take to train the model? From the configuration, it seems you trained for only 1600 steps; is that correct? I wonder whether 1600 steps are enough for the model to learn new skills, especially with the batch size set to 1. This number seems too small to me, so I would appreciate it if you could elaborate on the number of training epochs or steps.
Thanks again for sharing your great work!