Now, DPO gets ~32% MFU.#1720
Conversation
…d hybrid 7B DPO OLMo-core sweep script Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
faef93b to
ee387d6
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces several updates to the DPO training pipeline, including passing a dynamic max_length to DataCollatorForSeq2SeqDPO when model compilation is enabled, simplifying the initialization of forward_kwargs, and cleaning up script arguments. However, the newly added packing parameter in separate_forward_olmo is currently unused, which could lead to silent failures or confusion. It is recommended to either implement the packing logic or raise a NotImplementedError if packing is enabled.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
I am having trouble creating individual review comments. Click here to see my feedback.
open_instruct/dpo_utils.py (1122)
The packing parameter is introduced but remains unused within the separate_forward_olmo function. This can be misleading, as it suggests that the function supports packing when it does not. To prevent silent failures and make the function's contract clear, please either implement the packing logic or raise a NotImplementedError at the beginning of the function if packing is True.
…k Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
No description provided.