Add OLMo-core 32B GRPO launch scripts by mnoukhov · Pull Request #1726 · allenai/open-instruct

mnoukhov · 2026-06-16T21:06:38Z

Summary

Adds OLMo-core (FSDP) launch scripts for 32B GRPO, running open_instruct/grpo.py (OLMo-core Trainer) instead of the DeepSpeed grpo_fast.py:

scripts/train/olmo3/32b_think_rl_olmocore.sh: a faithful OLMo-core port of 32b_think_rl.sh. Arguments are identical except: grpo_fast.py → grpo.py; dropped --deepspeed_stage 3 / --deepspeed_zpg 32 / --gather_whole_model False; added --fsdp_shard_degree 32 --fsdp_num_replicas 3 (32 × 3 = 96 = 8 × 12 learner GPUs) and --activation_memory_budget 0.1.
scripts/train/olmo3/32b_rlzero_math_olmocore.sh: 7b_rlzero_math.sh scaled to 32B and ported to OLMo-core: model Olmo-3-1025-7B → Olmo-3-1025-32B; dropped the DeepSpeed args; added --fsdp_shard_degree 32 --fsdp_num_replicas 1 (32 × 1 = 32 = 8 × 4 learner GPUs) and --activation_memory_budget 0.3. vLLM is scaled for a 32B model (--vllm_num_engines 10 --vllm_tensor_parallel_size 4).

The DeepSpeed → FSDP conversion swaps --deepspeed_stage/--deepspeed_zpg/--gather_whole_model for --fsdp_shard_degree/--fsdp_num_replicas/--activation_memory_budget. Both topologies satisfy the grpo_utils.py validation (shard_degree × num_replicas == sum(num_learners_per_node)).
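
The topology check can be reproduced by hand. The following is a minimal sketch, not the actual grpo_utils.py code: the function name `check_fsdp_topology` is hypothetical, and only the equality it tests (shard_degree × num_replicas == sum(num_learners_per_node)) comes from the description above.

```python
def check_fsdp_topology(shard_degree, num_replicas, num_learners_per_node):
    """Return the total learner GPU count, or raise if the FSDP mesh does not match it.

    Hypothetical helper for illustration; the real validation lives in grpo_utils.py.
    """
    total_learner_gpus = sum(num_learners_per_node)
    mesh_size = shard_degree * num_replicas
    if mesh_size != total_learner_gpus:
        raise ValueError(
            f"shard_degree * num_replicas = {mesh_size}, "
            f"but sum(num_learners_per_node) = {total_learner_gpus}"
        )
    return total_learner_gpus


# 32b_think_rl_olmocore.sh: 12 learner nodes of 8 GPUs, shard degree 32, 3 replicas.
think_gpus = check_fsdp_topology(32, 3, [8] * 12)
# 32b_rlzero_math_olmocore.sh: 4 learner nodes of 8 GPUs, shard degree 32, 1 replica.
rlzero_gpus = check_fsdp_topology(32, 1, [8] * 4)
print(think_gpus, rlzero_gpus)
```

Both calls pass for the values quoted in this PR; a mismatched combination such as shard degree 32 with 3 replicas on 4 learner nodes would raise.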

Decisions to sanity-check before launching

32B base model for the RLZero script: used allenai/Olmo-3-1025-32B (mirroring the 7B allenai/Olmo-3-1025-7B); verify that it exists and is the intended checkpoint.
FSDP/topology settings (shard degree, replicas, vLLM TP, activation budget) are sensible defaults taken from existing OLMo-core scripts, not tuned for these exact models.
RLZero data mix still references Dolci-RLZero-Math-7B; kept since the change targets model + backend.

Testing

bash -n passes on both scripts.
FSDP config validation verified by hand (96 = 32×3, 32 = 32×1).

GPU_TESTS=bypass

🤖 Generated with Claude Code


gemini-code-assist

Code Review

This pull request adds two OLMo-core (FSDP) launch scripts for 32B GRPO training and updates the changelog. The review feedback identifies a dataset-to-split mismatch in the evaluation config, an over-allocation of nodes (requesting 28 instead of 18), a missing uv run prefix for running python, and a placeholder PR number in the changelog.


gemini-code-assist · 2026-06-16T21:08:24Z

+DATASETS="allenai/Dolci-RLZero-Math-7B 1.0"
+
+LOCAL_EVALS="allenai/Dolci-RLZero-Math-7B 16"
+LOCAL_EVAL_SPLITS="train train"


There is a mismatch between the number of datasets in LOCAL_EVALS (which has 1 dataset: allenai/Dolci-RLZero-Math-7B) and the number of splits in LOCAL_EVAL_SPLITS (which has 2 splits: "train train"). This will likely cause a runtime error during dataset mixer parsing. Please update LOCAL_EVAL_SPLITS to "train".

Suggested change

-LOCAL_EVAL_SPLITS="train train"
+LOCAL_EVAL_SPLITS="train"
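
The mismatch flagged in this comment can be caught before launch. This is a rough sketch under one stated assumption: that LOCAL_EVALS is a whitespace-separated list alternating dataset name and sample count, as the value in the diff suggests. The helper `check_eval_splits` is hypothetical and is not part of open-instruct; the real dataset mixer parsing may behave differently.

```python
def check_eval_splits(local_evals: str, local_eval_splits: str) -> int:
    """Return the number of eval datasets, or raise if the split count differs.

    Assumes local_evals alternates "<dataset> <count>" pairs (an assumption
    based on the values shown in the diff, not on the mixer's source code).
    """
    tokens = local_evals.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating dataset/count pairs")
    num_datasets = len(tokens) // 2
    num_splits = len(local_eval_splits.split())
    if num_datasets != num_splits:
        raise ValueError(
            f"{num_datasets} eval dataset(s) but {num_splits} split(s)"
        )
    return num_datasets


# The suggested fix: one dataset, one split.
print(check_eval_splits("allenai/Dolci-RLZero-Math-7B 16", "train"))
```

With the original value "train train" the same call would raise, since one dataset is paired with two splits.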

gemini-code-assist · 2026-06-16T21:08:24Z

+    --priority urgent \
+    --gs_model_name "sft_olmo3_32b_rl_run_testing" \
+    --preemptible \
+    --num_nodes 28 \


There is a node allocation mismatch. The script requests --num_nodes 28, but the actual resources required by the configuration are only 18 nodes: Learners need 12 nodes (12 entries in --num_learners_per_node of 8 GPUs each = 96 GPUs), and vLLM needs 6 nodes (--vllm_num_engines 6 with --vllm_tensor_parallel_size 8 = 48 GPUs). Total nodes needed: 12 + 6 = 18 nodes. Requesting 28 nodes will waste 10 nodes (80 GPUs) on the cluster and unnecessarily increase queue wait times. Please update --num_nodes to 18.

Suggested change

-    --num_nodes 28 \
+    --num_nodes 18 \
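
The node arithmetic in this comment can be written out as a short sketch. It assumes 8 GPUs per node and that learners and vLLM engines never share a node, which is how the comment counts them; the function name `required_nodes` is hypothetical.

```python
import math

GPUS_PER_NODE = 8  # assumption: every node exposes 8 GPUs


def required_nodes(num_learners_per_node, vllm_num_engines,
                   vllm_tensor_parallel_size, gpus_per_node=GPUS_PER_NODE):
    """Count nodes as one per learner entry plus enough nodes for all vLLM GPUs."""
    learner_nodes = len(num_learners_per_node)
    vllm_gpus = vllm_num_engines * vllm_tensor_parallel_size
    vllm_nodes = math.ceil(vllm_gpus / gpus_per_node)
    return learner_nodes + vllm_nodes


# Think RL script: 12 learner nodes, 6 vLLM engines with tensor parallel size 8.
think_nodes = required_nodes([8] * 12, 6, 8)
print(think_nodes)  # 12 learner nodes + 6 vLLM nodes = 18
```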

gemini-code-assist · 2026-06-16T21:08:24Z



 ### Added
+- Add OLMo-core (FSDP) launch scripts `scripts/train/olmo3/32b_think_rl_olmocore.sh` and `scripts/train/olmo3/32b_rlzero_math_olmocore.sh`, which run 32B GRPO via `open_instruct/grpo.py` instead of the DeepSpeed `grpo_fast.py` (https://github.com/allenai/open-instruct/pull/1726).


The pull request URL contains a placeholder XXXX. Please update this with the correct pull request number before merging.

gemini-code-assist · 2026-06-16T21:08:24Z

+    --env LD_LIBRARY_PATH=/var/lib/tcpxo/lib64 \
+    --env NCCL_LIB_DIR=/var/lib/tcpxo/lib64 \
+    --env HOSTED_VLLM_API_BASE=http://ceres-cs-aus-447.reviz.ai2.in:8001/v1 \
+    -- source configs/beaker_configs/ray_node_setup.sh \&\& source configs/beaker_configs/code_api_setup.sh \&\& python open_instruct/grpo.py \


The command inside the container runs python open_instruct/grpo.py directly. Since the environment uses uv for dependency management, running python directly might use the system python instead of the virtual environment, potentially leading to ModuleNotFoundError. It is safer and more consistent with the other launch scripts to use uv run python or uv run.

Suggested change

-    -- source configs/beaker_configs/ray_node_setup.sh \&\& source configs/beaker_configs/code_api_setup.sh \&\& python open_instruct/grpo.py \
+    -- source configs/beaker_configs/ray_node_setup.sh \&\& source configs/beaker_configs/code_api_setup.sh \&\& uv run python open_instruct/grpo.py \
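
One way to tell which interpreter a launch command actually picks up is to compare `sys.prefix` with `sys.base_prefix`, which differ when Python runs inside a virtual environment. This is a generic diagnostic sketch, not something the launch scripts contain; the helper name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points at the environment while sys.base_prefix
    # points at the base installation it was created from.
    return sys.prefix != sys.base_prefix


# Print the interpreter path and whether it is a virtual environment.
print(sys.executable, in_virtualenv())
```

Running this once with plain `python` and once with `uv run python` inside the container would show whether the two commands resolve to different interpreters.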


mnoukhov and others added 2 commits June 16, 2026 17:06

Add OLMo-core 32B GRPO launch scripts

418898c

OLMo-core (FSDP) versions of the 32B Think RL script and a 32B-scaled RLZero math script, running open_instruct/grpo.py instead of the DeepSpeed grpo_fast.py. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Update CHANGELOG with PR number

8ab9cff

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gemini-code-assist Bot reviewed Jun 16, 2026


mnoukhov and others added 4 commits June 16, 2026 17:09

Use jacobm olmo-core 32B checkpoints and skip dataset caching

e8b55fa

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Revert to HuggingFace 32B checkpoints

5846373

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Use aime_2025_openinstruct as RL-Zero math eval dataset

a12d769

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cleaner

c46bb88


mnoukhov wants to merge 6 commits into main from olmo-core-32b-rl-scripts
