Now, DPO gets ~32% MFU. by finbarrtimbers · Pull Request #1720 · allenai/open-instruct

finbarrtimbers · 2026-06-10T14:11:47Z

No description provided.


gemini-code-assist

Code Review

This pull request introduces several updates to the DPO training pipeline, including passing a dynamic max_length to DataCollatorForSeq2SeqDPO when model compilation is enabled, simplifying the initialization of forward_kwargs, and cleaning up script arguments. However, the newly added packing parameter in separate_forward_olmo is currently unused, which could lead to silent failures or confusion. It is recommended to either implement the packing logic or raise a NotImplementedError if packing is enabled.


open_instruct/dpo_utils.py (1122)

The packing parameter is introduced but remains unused within the separate_forward_olmo function. This can be misleading, as it suggests that the function supports packing when it does not. To prevent silent failures and make the function's contract clear, please either implement the packing logic or raise a NotImplementedError at the beginning of the function if packing is True.
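The second option in the review comment (failing loudly instead of silently ignoring the flag) might look roughly like the sketch below. The real signature and body of separate_forward_olmo are not shown on this page, so `separate_forward_sketch`, its `batch` layout, and its return value are hypothetical stand-ins; only the guard at the top illustrates the suggestion.

```python
from typing import Any, Dict, List


def separate_forward_sketch(batch: Dict[str, List[Any]], packing: bool = False) -> Dict[str, int]:
    """Hypothetical stand-in for separate_forward_olmo (real signature unknown)."""
    # Guard suggested by the review: reject the unsupported mode up front
    # so callers cannot assume packing is being applied when it is not.
    if packing:
        raise NotImplementedError(
            "packing=True is not supported by this forward path"
        )

    # Placeholder for the real chosen/rejected forward passes (assumption):
    # here we only report how many sequences each side contains.
    return {
        "num_chosen": len(batch["chosen"]),
        "num_rejected": len(batch["rejected"]),
    }
```

With a guard like this, a config that enables packing fails at the first forward call rather than training with unpacked batches unnoticed.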


Report sequences-per-step and padding percent in DPO PerfCallback; add hybrid 7B DPO OLMo-core sweep script

ee387d6

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

finbarrtimbers force-pushed the finbarr/dpo-mfu branch from faef93b to ee387d6 Compare June 10, 2026 14:12

gemini-code-assist Bot reviewed Jun 10, 2026


Remove defensive guards in PerfCallback and dedup token-count fallback

4e9c7f6

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


finbarrtimbers wants to merge 2 commits into main from finbarr/dpo-mfu
