I saw discussions about training in other issues, and I have run the training and inference code successfully. The training code is mainly based on SFTTrainer, and as far as I can tell it uses only the next-token prediction loss. If I want to add the cross-entropy loss mentioned in the paper, what should I do?
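To make the question concrete, here is a minimal sketch of what I have in mind, written in plain Python with no framework. It is not the repository's code or the paper's exact formulation. The names `cross_entropy`, `next_token_loss`, `combined_loss`, `aux_logits`, `aux_label`, and `aux_weight` are all hypothetical, and the auxiliary term is only a placeholder for whatever the paper's extra loss is computed over.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against a single target class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def next_token_loss(step_logits, token_ids):
    """Mean cross-entropy where the logits at position t predict token t + 1."""
    losses = [
        cross_entropy(step_logits[t], token_ids[t + 1])
        for t in range(len(token_ids) - 1)
    ]
    return sum(losses) / len(losses)


def combined_loss(step_logits, token_ids, aux_logits, aux_label, aux_weight=0.5):
    """Next-token loss plus a weighted auxiliary cross-entropy term.

    aux_logits / aux_label / aux_weight are assumptions for illustration only.
    """
    lm = next_token_loss(step_logits, token_ids)
    aux = cross_entropy(aux_logits, aux_label)
    return lm + aux_weight * aux
```

With SFTTrainer, I assume the equivalent would be a subclass that overrides `compute_loss` (inherited from the `transformers` `Trainer`) and returns the sum of the two terms, but I am not sure whether that matches what the paper does.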