I could not find the reasoning content for any sample on the HF dataset's web page. Although the training process does not seem to use the reasoning data at all, since it relies only on a format reward and an accuracy reward, releasing the reasoning content would still benefit the community, e.g., for SFT on VLMs.