First of all, thank you for your work. I noticed that many training datasets are defined for both SFT and RL, but some of them are commented out. Which training data should I use to reproduce the results reported in your paper?
In addition, the code for the cosine decay schedule in SFT appears to have been removed. Do I need to implement it myself to reproduce the reported results?