Add OLMo-core 32B GRPO launch scripts #1726

Open
mnoukhov wants to merge 6 commits into main from olmo-core-32b-rl-scripts

@mnoukhov

Summary

Adds OLMo-core (FSDP) launch scripts for 32B GRPO, running open_instruct/grpo.py (OLMo-core Trainer) instead of the DeepSpeed grpo_fast.py:

  • scripts/train/olmo3/32b_think_rl_olmocore.sh: faithful OLMo-core port of 32b_think_rl.sh. Identical args except: grpo_fast.py → grpo.py; dropped --deepspeed_stage 3 / --deepspeed_zpg 32 / --gather_whole_model False; added --fsdp_shard_degree 32 --fsdp_num_replicas 3 (32 × 3 = 96 = 8 × 12 learner GPUs) and --activation_memory_budget 0.1.
  • scripts/train/olmo3/32b_rlzero_math_olmocore.sh: 7b_rlzero_math.sh scaled to 32B and ported to OLMo-core. Model Olmo-3-1025-7B → Olmo-3-1025-32B; dropped DeepSpeed args; added --fsdp_shard_degree 32 --fsdp_num_replicas 1 (32 × 1 = 32 = 8 × 4 learner GPUs) and --activation_memory_budget 0.3. vLLM scaled for a 32B model (--vllm_num_engines 10 --vllm_tensor_parallel_size 4).

The DeepSpeed → FSDP conversion swaps --deepspeed_stage/--deepspeed_zpg/--gather_whole_model for --fsdp_shard_degree/--fsdp_num_replicas/--activation_memory_budget. Both topologies satisfy the grpo_utils.py validation (shard_degree × num_replicas == sum(num_learners_per_node)).
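
The topology constraint above can be illustrated with a minimal sketch. This is not the code in grpo_utils.py: the function name `check_fsdp_topology` and its signature are hypothetical, and only the equality being checked is taken from the description.

```python
def check_fsdp_topology(shard_degree, num_replicas, num_learners_per_node):
    """Hypothetical helper: return the learner GPU count, or raise on a mismatch.

    Mirrors the stated rule: shard_degree * num_replicas must equal
    sum(num_learners_per_node).
    """
    learner_gpus = sum(num_learners_per_node)
    if shard_degree * num_replicas != learner_gpus:
        raise ValueError(
            f"shard_degree ({shard_degree}) x num_replicas ({num_replicas}) "
            f"!= learner GPUs ({learner_gpus})"
        )
    return learner_gpus


# 32b_think_rl_olmocore.sh: 12 learner nodes with 8 GPUs each.
think_gpus = check_fsdp_topology(32, 3, [8] * 12)
# 32b_rlzero_math_olmocore.sh: 4 learner nodes with 8 GPUs each.
rlzero_gpus = check_fsdp_topology(32, 1, [8] * 4)
print(think_gpus, rlzero_gpus)  # 96 32
```

Changing either the shard degree or the learner layout without adjusting the other would make the helper raise, which is the failure mode the hand check in the Testing section is meant to rule out.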

Decisions to sanity-check before launching

  • 32B base model for the RLZero script: used allenai/Olmo-3-1025-32B (mirroring the 7B allenai/Olmo-3-1025-7B) — verify it exists / is intended.
  • FSDP/topology (shard degree, replicas, vLLM TP, activation budget) are sensible defaults from existing OLMo-core scripts, not tuned for these exact models.
  • RLZero data mix still references Dolci-RLZero-Math-7B; kept since the change targets model + backend.

Testing

  • bash -n passes on both scripts.
  • FSDP config validation verified by hand (96 = 32×3, 32 = 32×1).

GPU_TESTS=bypass

🤖 Generated with Claude Code

mnoukhov and others added 2 commits June 16, 2026 17:06
OLMo-core (FSDP) versions of the 32B Think RL script and a 32B-scaled
RLZero math script, running open_instruct/grpo.py instead of the
DeepSpeed grpo_fast.py.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds two OLMo-core (FSDP) launch scripts for 32B GRPO training and updates the changelog. The review feedback identifies a dataset-to-split mismatch in the evaluation config, an over-allocation of nodes (requesting 28 instead of 18), a missing uv run prefix for running python, and a placeholder PR number in the changelog.

DATASETS="allenai/Dolci-RLZero-Math-7B 1.0"

LOCAL_EVALS="allenai/Dolci-RLZero-Math-7B 16"
LOCAL_EVAL_SPLITS="train train"

high

There is a mismatch between the number of datasets in LOCAL_EVALS (which has 1 dataset: allenai/Dolci-RLZero-Math-7B) and the number of splits in LOCAL_EVAL_SPLITS (which has 2 splits: "train train"). This will likely cause a runtime error during dataset mixer parsing. Please update LOCAL_EVAL_SPLITS to "train".

Suggested change
- LOCAL_EVAL_SPLITS="train train"
+ LOCAL_EVAL_SPLITS="train"
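
The mismatch can be checked before launch with a small sketch. It assumes the mixer string is a space-separated sequence of "name value" pairs and that one split is expected per dataset; the helper names are hypothetical, and the actual parser in open_instruct may behave differently.

```python
def count_mixer_datasets(mixer):
    """Count datasets in an assumed 'name value name value ...' string."""
    tokens = mixer.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected name/value pairs, got {len(tokens)} tokens")
    return len(tokens) // 2


def splits_match(mixer, splits):
    """Return True when there is exactly one split per dataset."""
    return count_mixer_datasets(mixer) == len(splits.split())


LOCAL_EVALS = "allenai/Dolci-RLZero-Math-7B 16"

print(splits_match(LOCAL_EVALS, "train train"))  # False: 1 dataset, 2 splits
print(splits_match(LOCAL_EVALS, "train"))        # True: 1 dataset, 1 split
```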

--priority urgent \
--gs_model_name "sft_olmo3_32b_rl_run_testing" \
--preemptible \
--num_nodes 28 \

high

There is a node allocation mismatch. The script requests --num_nodes 28, but the actual resources required by the configuration are only 18 nodes: Learners need 12 nodes (12 entries in --num_learners_per_node of 8 GPUs each = 96 GPUs), and vLLM needs 6 nodes (--vllm_num_engines 6 with --vllm_tensor_parallel_size 8 = 48 GPUs). Total nodes needed: 12 + 6 = 18 nodes. Requesting 28 nodes will waste 10 nodes (80 GPUs) on the cluster and unnecessarily increase queue wait times. Please update --num_nodes to 18.

Suggested change
- --num_nodes 28 \
+ --num_nodes 18 \
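
The node arithmetic in this comment can be reproduced with a short sketch. It assumes 8 GPUs per node and that learners and vLLM engines do not share nodes; the function name `nodes_required` is hypothetical and not part of the launch tooling.

```python
def nodes_required(num_learners_per_node, vllm_num_engines,
                   vllm_tensor_parallel_size, gpus_per_node=8):
    """Hypothetical estimate of nodes needed for learners plus vLLM engines."""
    # One node per entry in --num_learners_per_node.
    learner_nodes = len(num_learners_per_node)
    # Each vLLM engine occupies tensor_parallel_size GPUs.
    vllm_gpus = vllm_num_engines * vllm_tensor_parallel_size
    # Ceiling division: a partially used node still has to be requested.
    vllm_nodes = -(-vllm_gpus // gpus_per_node)
    return learner_nodes + vllm_nodes


needed = nodes_required([8] * 12, vllm_num_engines=6, vllm_tensor_parallel_size=8)
print(needed)             # 18
print(28 - needed)        # 10 surplus nodes
print((28 - needed) * 8)  # 80 surplus GPUs
```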

Comment thread CHANGELOG.md


### Added
- Add OLMo-core (FSDP) launch scripts `scripts/train/olmo3/32b_think_rl_olmocore.sh` and `scripts/train/olmo3/32b_rlzero_math_olmocore.sh`, which run 32B GRPO via `open_instruct/grpo.py` instead of the DeepSpeed `grpo_fast.py` (https://github.com/allenai/open-instruct/pull/1726).

medium

The pull request URL contains a placeholder XXXX. Please update this with the correct pull request number before merging.

--env LD_LIBRARY_PATH=/var/lib/tcpxo/lib64 \
--env NCCL_LIB_DIR=/var/lib/tcpxo/lib64 \
--env HOSTED_VLLM_API_BASE=http://ceres-cs-aus-447.reviz.ai2.in:8001/v1 \
-- source configs/beaker_configs/ray_node_setup.sh \&\& source configs/beaker_configs/code_api_setup.sh \&\& python open_instruct/grpo.py \

medium

The command inside the container runs python open_instruct/grpo.py directly. Since the environment uses uv for dependency management, running python directly might use the system python instead of the virtual environment, potentially leading to ModuleNotFoundError. It is safer and more consistent with the other launch scripts to use uv run python or uv run.

Suggested change
- -- source configs/beaker_configs/ray_node_setup.sh \&\& source configs/beaker_configs/code_api_setup.sh \&\& python open_instruct/grpo.py \
+ -- source configs/beaker_configs/ray_node_setup.sh \&\& source configs/beaker_configs/code_api_setup.sh \&\& uv run python open_instruct/grpo.py \

mnoukhov and others added 4 commits June 16, 2026 17:09
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>