Now, we support the hybrid model in our Olmo-core code. by finbarrtimbers · Pull Request #1713 · allenai/open-instruct

finbarrtimbers · 2026-06-02T14:36:28Z

Summary

Adds Olmo-Hybrid (GDN) support to the OLMo-core DPO trainer (dpo.py) and substantially improves its MFU:

Bump olmo-core to a commit with the olmo3_hybrid_7B config preset and HF→olmo-core hybrid weight conversion (convert_hybrid_state_from_hf).
Pack DPO microbatches to the max_seq_length token budget instead of capping at per_device_train_batch_size sequences (the cap was the root cause of ~7% MFU).
Yield rectangular stacked packed-row batches (stack_packed_rows/unstack_packed_rows) so OLMo-core's dict batch contract, pre_train batch-size validation, and token/FLOPs accounting all work natively; gradient accumulation = packed rows per rank per step, rank_microbatch_size = 2 × max_seq_length tokens per packed row.
Move LR/step/epoch metric recording into DPOMetricsCallback (standard OLMo-core callback pattern) with ReduceType.sum numerator/denominator reduction.
Add a selected_modules activation checkpointing mode so torch.compile and AC coexist with GDN.
Make ModelDims FLOPs/memory GDN-aware for correct MFU reporting.
Add an OLMo-core hybrid DPO sweep script.
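The packing change listed above can be illustrated with a minimal sketch. This is not the code in dpo.py: `pack_with_sequence_cap`, `pack_by_token_budget`, and `padding_fraction` are hypothetical helpers, the greedy in-order packing is an assumption, and every sequence is assumed to fit within the token budget. The sketch contrasts closing a packed row at a fixed sequence count with filling it up to the token budget.

```python
from typing import List


def pack_with_sequence_cap(lengths: List[int], max_tokens: int, max_seqs: int) -> List[List[int]]:
    """Greedy in-order packing: close a row when the next sequence would
    exceed the token budget OR the row already holds max_seqs sequences."""
    rows, current, used = [], [], 0
    for n in lengths:
        if current and (used + n > max_tokens or len(current) >= max_seqs):
            rows.append(current)
            current, used = [], 0
        current.append(n)
        used += n
    if current:
        rows.append(current)
    return rows


def pack_by_token_budget(lengths: List[int], max_tokens: int) -> List[List[int]]:
    """Same greedy packing, but only the token budget closes a row."""
    return pack_with_sequence_cap(lengths, max_tokens, max_seqs=max(len(lengths), 1))


def padding_fraction(rows: List[List[int]], max_tokens: int) -> float:
    """Share of the padded (rows x max_tokens) grid that holds no real tokens."""
    return 1.0 - sum(sum(r) for r in rows) / (len(rows) * max_tokens)


# Hypothetical workload: 16 short sequences, 800-token rows, cap of 2 sequences.
lengths = [100] * 16
capped = pack_with_sequence_cap(lengths, max_tokens=800, max_seqs=2)
budgeted = pack_by_token_budget(lengths, max_tokens=800)
capped_padding = padding_fraction(capped, 800)
budgeted_padding = padding_fraction(budgeted, 800)
```

In this toy workload the sequence cap produces eight mostly empty rows, while the token budget fills two rows completely. That is the kind of padding-FLOP waste the summary attributes to the cap.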

MFU on the multi-node debug config (OLMo-2-7B, 16k seq, packing, TP=2) rises from 20.8% with the previous cap-based packing to 30.5% with token-budget packing at an identical config (step time 2.78 s → 1.87 s).
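As a quick consistency check on these figures, the sketch below compares the step-time ratio with the MFU ratio. It assumes the 1.87 s/step figure belongs to the 30.5% run. The `mfu` helper uses the usual definition (achieved model FLOPs per second over aggregate peak FLOPs per second), and the FLOP and peak values passed to it are placeholders, not measurements from this PR.

```python
def mfu(model_flops_per_step: float, step_seconds: float,
        num_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization: achieved FLOPs/s over aggregate peak FLOPs/s."""
    return model_flops_per_step / (step_seconds * num_gpus * peak_flops_per_gpu)


# Figures quoted in the summary.
slow_step, fast_step = 2.78, 1.87   # seconds per step
low_mfu, high_mfu = 0.208, 0.305    # reported MFU

step_speedup = slow_step / fast_step   # roughly 1.49x
mfu_gain = high_mfu / low_mfu          # roughly 1.47x

# With the same FLOPs per step, the MFU ratio equals the step-time ratio.
# The FLOP count and peak throughput below are placeholder values.
placeholder = dict(model_flops_per_step=1e15, num_gpus=16, peak_flops_per_gpu=1e15)
ratio_at_fixed_flops = (mfu(step_seconds=fast_step, **placeholder)
                        / mfu(step_seconds=slow_step, **placeholder))
```

The two ratios agree to within a few percent, which is what one would expect if both runs process a similar number of useful FLOPs per step.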

Runs:

Multi-node packed DPO (2×8 GPU, OLMo-2-7B, 16k, rectangular batches; mfu_avg 30.5%, padding_fraction ~0.16): Beaker
Single-GPU non-packed DPO (OLMo-2-1B, 3 epochs, epoch/LR metrics via callback verified): Beaker
Hybrid 7B DPO with token-budget packing (list-based predecessor, mfu_avg 30.4%): wandb run 45sigjmq

GPU_TESTS=01KTCG94JXFMJQES1DERQR1JRM

🤖 Generated with Claude Code

gemini-code-assist

Code Review

This pull request introduces support for hybrid models featuring linear attention layers (such as Gated Delta Net) within the model dimension and FLOPs calculation utilities, along with corresponding unit tests. It also updates the DPO training sweep scripts to use public SFT models and adds a new sweep script utilizing OLMo-core. Feedback on these changes highlights two issues: first, the removal of the SFT_LR variable in 7b_instruct_dpo_sweep.sh leaves a broken reference in the experiment description; second, direct attribute access on the configuration object in utils.py should be replaced with getattr to prevent potential AttributeErrors when optional attributes are missing.
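The second review point, reading optional attributes with `getattr` and a default, might look like the sketch below. The attribute names `num_linear_attn_layers` and `linear_attn_head_dim` and the helper `linear_attention_dims` are hypothetical stand-ins, not the fields actually used in utils.py.

```python
from types import SimpleNamespace


def linear_attention_dims(config) -> dict:
    """Read optional hybrid-model fields, defaulting to 0 when they are absent.

    Direct access such as config.num_linear_attn_layers would raise
    AttributeError for a dense model whose config lacks these fields.
    """
    return {
        "num_linear_attn_layers": getattr(config, "num_linear_attn_layers", 0),
        "linear_attn_head_dim": getattr(config, "linear_attn_head_dim", 0),
    }


# Hypothetical configs: a dense model without the optional fields, and a hybrid one.
dense_config = SimpleNamespace(hidden_size=4096)
hybrid_config = SimpleNamespace(hidden_size=4096,
                                num_linear_attn_layers=24,
                                linear_attn_head_dim=128)

dense_dims = linear_attention_dims(dense_config)
hybrid_dims = linear_attention_dims(hybrid_config)
```

Defaulting the linear-attention dimensions to zero lets the same FLOPs code path serve both dense and hybrid models.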

github-actions · 2026-06-03T14:15:51Z

Documentation Changes Detected

📄 sitemap.xml

--- site-base/sitemap.xml	2026-06-03 14:15:50.355697873 +0000
+++ site-pr/sitemap.xml	2026-06-03 14:15:43.530894820 +0000
@@ -13,6 +13,10 @@
          <lastmod>2026-06-03</lastmod>
     </url>
     <url>
+         <loc>https://github.com/allenai/open-instruct/dpo-mfu-optimization/</loc>
+         <lastmod>2026-06-03</lastmod>
+    </url>
+    <url>

📄 sitemap.xml.gz

Binary files site-base/sitemap.xml.gz and site-pr/sitemap.xml.gz differ

Showing first 10 lines of diff for each changed file (up to 5 files, excluding search indices).

github-actions · 2026-06-04T16:16:45Z

Documentation Changes Detected

📄 sitemap.xml

--- site-base/sitemap.xml	2026-06-04 16:16:44.513516458 +0000
+++ site-pr/sitemap.xml	2026-06-04 16:16:38.739226163 +0000
@@ -13,6 +13,10 @@
          <lastmod>2026-06-04</lastmod>
     </url>
     <url>
+         <loc>https://github.com/allenai/open-instruct/dpo-mfu-optimization/</loc>
+         <lastmod>2026-06-04</lastmod>
+    </url>
+    <url>

📄 sitemap.xml.gz

Binary files site-base/sitemap.xml.gz and site-pr/sitemap.xml.gz differ

Showing first 10 lines of diff for each changed file (up to 5 files, excluding search indices).

github-actions · 2026-06-04T17:22:45Z

Documentation Changes Detected

📄 sitemap.xml

--- site-base/sitemap.xml	2026-06-04 17:22:44.705540248 +0000
+++ site-pr/sitemap.xml	2026-06-04 17:22:39.036262278 +0000
@@ -13,6 +13,10 @@
          <lastmod>2026-06-04</lastmod>
     </url>
     <url>
+         <loc>https://github.com/allenai/open-instruct/dpo-mfu-optimization/</loc>
+         <lastmod>2026-06-04</lastmod>
+    </url>
+    <url>

📄 sitemap.xml.gz

Binary files site-base/sitemap.xml.gz and site-pr/sitemap.xml.gz differ

Showing first 10 lines of diff for each changed file (up to 5 files, excluding search indices).

finbarrtimbers added 5 commits June 1, 2026 11:54

minor tweaks to script

78f2dd1

using ai2/linear-rnns workspace

ac6a4ad

modified sweep

d0d8ea1

only one lr

86160a6

Add Olmo Hybrid DPO sweep (olmo-core) and GDN-aware ModelDims FLOPs/m…

7b34e23

…emory Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gemini-code-assist Bot reviewed Jun 2, 2026

Comment thread scripts/train/olmo-hybrid/7b_instruct_dpo_sweep.sh Outdated

Comment thread open_instruct/utils.py

finbarrtimbers changed the title ~~Finbarr/oc hybrid dpo~~ Now, we support the hybrid model in our Olmo-core code. Jun 2, 2026

finbarrtimbers added 14 commits June 2, 2026 08:56

Simplify ModelDims GDN handling: zero-default linear-attn dims, dedup…

1fbac3f

… tests Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Bump olmo-core to hybrid-dpo-conversion branch for Olmo-Hybrid DPO su…

2c23960

…pport Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Fully shard (fsdp_shard_degree=32) Olmo-Hybrid DPO sweep to fix OOM, …

600f83e

…matching ZeRO-3 reference Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add full-block activation checkpointing mode for olmo-core DPO to fit…

73ad560

… GDN at 16k seq Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Disable torch.compile in Olmo-Hybrid DPO sweep: compile+full-block ch…

2daaa2a

…eckpoint of GDN op fails recompute metadata check Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Bump flash-linear-attention 0.4.2 -> 0.5.0

e2a385c

Add selected_modules activation checkpointing to enable compile with …

53e2af3

…GDN (checkpoint only compile-safe MLPs, leave opaque GDN mixer activations live)
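One way to picture the selection step described in this commit is a name-pattern filter over submodules. The sketch below makes assumptions throughout: the module names, the glob-style patterns, and the helper `select_modules_for_checkpointing` are hypothetical. It only shows how MLP-like modules could be chosen while the GDN mixer is left out. It does not wrap anything in an actual checkpoint.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Sequence


def select_modules_for_checkpointing(module_names: Iterable[str],
                                     include_patterns: Sequence[str],
                                     exclude_patterns: Sequence[str] = ()) -> List[str]:
    """Return the names matching any include pattern and no exclude pattern."""
    selected = []
    for name in module_names:
        included = any(fnmatch(name, p) for p in include_patterns)
        excluded = any(fnmatch(name, p) for p in exclude_patterns)
        if included and not excluded:
            selected.append(name)
    return selected


# Hypothetical module names for a two-block hybrid model.
names = [
    "blocks.0.attention",
    "blocks.0.feed_forward",
    "blocks.1.gdn_mixer",
    "blocks.1.feed_forward",
]

# Checkpoint only the MLPs; the mixer activations stay live.
mlp_only = select_modules_for_checkpointing(names, ["blocks.*.feed_forward"])

# Alternative: everything in a block except the mixer.
all_but_mixer = select_modules_for_checkpointing(names, ["blocks.*"], ["*.gdn_mixer"])
```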

Checkpoint all Olmo-Hybrid block submodules except the GDN mixer for …

ed6b218

…selected_modules AC Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Use full AC + compile for Olmo-Hybrid DPO by skipping checkpoint dete…

66bbd9c

…rminism check Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

DPO: checkpoint GDN mixer via selected_modules to keep compile outsid…

872ad77

…e checkpoint (avoids full-mode inductor stride guard failure) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add tilelang dep so fla routes GDN chunk_bwd_dqkwg around the broken …

63348c8

…Triton>=3.4 Hopper kernel (fla #640) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

DPO: align dpo.py wandb metric keys with dpo_tune_cache.py (rename tr…

f0c8b07

…ain/* and perf/* keys, add learning_rate/epoch/training_step) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

committed changes

7856a45

Added scripts

9e00c78

finbarrtimbers added 7 commits June 3, 2026 11:07

DPO: bucket-pad packed microbatches to next power-of-two (not max_seq…

0ba1ec8

…_length) and wire HSDP knobs to cut padding-FLOP waste Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
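The bucketing idea in this commit (later reverted, per the next commit message) can be sketched as follows. `next_power_of_two` and `bucket_length` are hypothetical helpers, and clamping the bucket at `max_seq_length` is an assumption.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 << (n - 1).bit_length()


def bucket_length(num_tokens: int, max_seq_length: int) -> int:
    """Pad a packed microbatch to the next power of two, never beyond max_seq_length."""
    return min(next_power_of_two(num_tokens), max_seq_length)


# A 5,000-token microbatch pads to 8,192 instead of a 16,384-token maximum.
short_bucket = bucket_length(5000, max_seq_length=16384)
# A 9,000-token microbatch still pads to the full 16,384.
long_bucket = bucket_length(9000, max_seq_length=16384)
# Exact powers of two are left unchanged.
exact_bucket = bucket_length(4096, max_seq_length=16384)
```

Bucketing keeps the number of distinct shapes small, which is friendlier to compiled kernels, but it can still leave up to half of a bucket as padding.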

DPO: pack microbatches to the max_seq_length token budget instead of …

e1dfe41

…capping at per_device_batch×GAS sequences (fixes padding-FLOP MFU waste); revert bucketing approach Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

DPO: add configurable per-microbatch sample cap + real gradient accum…

5cf1528

…ulation (microbatches_per_step); add train/padding_fraction and train/sequences_per_step metrics Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
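A padding-fraction metric of this kind is a ratio, so reducing it across ranks works best by summing numerators and denominators separately and dividing once. The sketch below is an illustration under assumptions: `reduce_ratio` is a hypothetical helper, the per-rank counts are invented, and a plain Python sum stands in for the distributed sum reduction.

```python
from typing import List, Tuple


def reduce_ratio(per_rank: List[Tuple[int, int]]) -> float:
    """Combine (numerator, denominator) pairs by summing each part, then dividing."""
    numerator = sum(n for n, _ in per_rank)
    denominator = sum(d for _, d in per_rank)
    return numerator / denominator if denominator else 0.0


# Hypothetical (padding_tokens, total_tokens) counts from two ranks.
per_rank = [(0, 1000), (3000, 4000)]

# Sum-reduced ratio: 3000 padding tokens out of 5000 total.
global_padding_fraction = reduce_ratio(per_rank)

# Averaging per-rank ratios instead weights both ranks equally and differs.
mean_of_ratios = sum(n / d for n, d in per_rank) / len(per_rank)
```

Here the sum-reduced value is 0.6 while the mean of per-rank ratios is 0.375, so the two only agree when every rank processes the same number of tokens.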

DPO: bound get_mock_batch rows by token budget so a large microbatch_…

7949208

…sample_cap doesn't load the dataset Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

set flag

ac010f0

disable HF upload

7957881

moved flag

91afdf0

set flags correctly

efd9ad4

finbarrtimbers added 10 commits June 4, 2026 13:03

cleaned up pr

a59d9d2

DPO: restore per-step train/token_count record so PerfCallback can co…

a4ccbfd

…mpute MFU (metric refactor moved it into the deferred callback, breaking get_metric) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Drop leftover TRITON_PRINT_AUTOTUNING debug env from oc DPO sweep scr…

6d807e6

…ipt and extend CHANGELOG entry to cover the MFU work (token-budget packing, grad accumulation, selected_modules AC, GDN-aware ModelDims) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Fix stale SFT_LR reference in DeepSpeed sweep description (PR review)…

066265b

… Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Simplify: delegate global_num_flops_in_batch token count to data_load…

9d99e3e

…er.global_num_tokens_in_batch and unify the collator packing probe behind _collator_max_seq_length Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into finbarr/oc-hybrid-dpo

1cab2fa

# Conflicts: # CHANGELOG.md # pyproject.toml # requirements.txt # uv.lock

Update CHANGELOG entry for rectangular stacked packed-row DPO batches…

de4ff38

… (microbatches_per_step removed) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
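The rectangular-batch idea can be sketched with plain Python lists standing in for tensors. `stack_rows` and `unstack_rows` below are hypothetical illustrations, not the PR's `stack_packed_rows`/`unstack_packed_rows`. Padding to the longest row and carrying a lengths list alongside are both assumptions.

```python
from typing import List, Sequence, Tuple


def stack_rows(rows: Sequence[Sequence[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Pad variable-length packed rows to a common width so they form a rectangle."""
    width = max(len(r) for r in rows)
    lengths = [len(r) for r in rows]
    stacked = [list(r) + [pad_id] * (width - len(r)) for r in rows]
    return stacked, lengths


def unstack_rows(stacked: Sequence[Sequence[int]], lengths: Sequence[int]) -> List[List[int]]:
    """Recover the original rows by trimming each one back to its true length."""
    return [list(row[:n]) for row, n in zip(stacked, lengths)]


# Three hypothetical packed rows of token ids with different lengths.
rows = [[5, 6, 7, 8], [9, 10], [11, 12, 13]]
stacked, lengths = stack_rows(rows)
restored = unstack_rows(stacked, lengths)
```

A rectangular batch has a well-defined row count and token count, which is what lets generic batch-size validation and token accounting run without special cases for list-shaped batches.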

Simplify: get_num_sequences always returns int (counts input_ids rows…

e224447

… for non-packed batches), removing the None fallbacks in train_batch and PerfCallback.pre_step and the now-unused per_device_train_batch_size field Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into finbarr/oc-hybrid-dpo

0f5c4e4

# Conflicts: # CHANGELOG.md # open_instruct/dpo.py # open_instruct/olmo_core_utils.py # open_instruct/utils.py # pyproject.toml # requirements.txt # uv.lock

finbarrtimbers marked this pull request as ready for review June 9, 2026 22:54

finbarrtimbers wants to merge 37 commits into main from finbarr/oc-hybrid-dpo
