Training Efficiency

Dear authors,

Thanks for sharing your great work!

I have two questions regarding the training details:

1. The run script "run_grpo.sh" mentions that a specific GPU has to be designated for the reward models. However, I cannot find the part of the code that does this. Could you please point out where the Janus model is distributed across X GPUs and the reward models are placed on a separate GPU?
2. Training time: How long does it take to train the model? From the configuration, it seems you trained for only 1600 steps; is that correct? I wonder whether 1600 steps are enough for the model to acquire new skills, especially with the batch size set to 1. This number seems too small to me, so I would appreciate it if you could elaborate on the number of training epochs or steps.

Thanks again for sharing your great work!
