Support for using step level advantages from verifiers by eligotts · Pull Request #1838 · PrimeIntellect-ai/prime-rl

eligotts · 2026-02-20T04:16:56Z

This PR adds support for reading step-level advantages defined by verifiers. For prime-rl to use them, all of the following must hold:

- the flag is set in the config
- verifiers signal this through state (companion PR: send step level advantages to prime rl, verifiers#941)
- for each trajectory step, completion_advantages has the same length as completion_ids

If any of these conditions is not met, we fall back to distributing the advantage calculated by prime-rl to every token.
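The three conditions and the fallback can be sketched as a small selection function. This is a minimal illustration, not the code in this PR: the function name `resolve_step_advantages` and its parameters are hypothetical, and it assumes the fallback applies to the whole trajectory whenever any single step fails the length check.

```python
from typing import Optional, Sequence


def resolve_step_advantages(
    use_verifier_step_advantages: bool,
    verifier_signaled: bool,
    steps_completion_ids: Sequence[Sequence[int]],
    steps_completion_advantages: Optional[Sequence[Optional[Sequence[float]]]],
    rollout_advantage: float,
) -> list[list[float]]:
    """Hypothetical helper: return one advantage per completion token, per step.

    Verifier-provided advantages are used only if the config flag is set, the
    verifier signalled them through state, and every step's advantages line up
    one-to-one with its completion_ids. Otherwise the rollout-level advantage
    computed by prime-rl is repeated for every completion token.
    """
    usable = (
        use_verifier_step_advantages  # condition 1: config flag
        and verifier_signaled  # condition 2: verifier signalled via state
        and steps_completion_advantages is not None
        and len(steps_completion_advantages) == len(steps_completion_ids)
        and all(  # condition 3: per-step lengths match completion_ids
            adv is not None and len(adv) == len(ids)
            for ids, adv in zip(steps_completion_ids, steps_completion_advantages)
        )
    )
    if usable:
        return [[float(a) for a in adv] for adv in steps_completion_advantages]
    # Fallback (assumed trajectory-wide here): broadcast the rollout advantage.
    return [[float(rollout_advantage)] * len(ids) for ids in steps_completion_ids]
```

For example, with two steps of 3 and 2 completion tokens, passing advantages of lengths 3 and 2 returns them unchanged, while passing lengths 2 and 2 makes every token receive `rollout_advantage`. Checking all steps before using any of them keeps a rollout from mixing verifier and prime-rl advantages.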

Note

Medium Risk
Touches the RL training data path (transport schema + advantage packing), so mismatched lengths or missing fields could break training; safeguards exist via validation/asserts and fallback behavior.

Overview
Adds optional support for verifier-provided step/per-token advantages via a new orchestrator.use_verifier_step_advantages config option, plus plumbing to request a use_verifiers_advantages state column during rollout generation.
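The config gating can be sketched as follows. Only the names `use_verifier_step_advantages` and `use_verifiers_advantages` come from this PR; the `OrchestratorSketchConfig` class, the `state_columns_to_request` helper, and the `False` default are assumptions made for illustration, not prime-rl's actual config or rollout code.

```python
from dataclasses import dataclass


@dataclass
class OrchestratorSketchConfig:
    """Hypothetical stand-in for the orchestrator config; only the flag name is from the PR."""

    # Assumed opt-in: off unless the user sets it explicitly.
    use_verifier_step_advantages: bool = False


def state_columns_to_request(cfg: OrchestratorSketchConfig, base_columns: list[str]) -> list[str]:
    """Hypothetical helper: add the verifier signal column when the flag is enabled."""
    columns = list(base_columns)  # copy so the caller's list is not mutated
    if cfg.use_verifier_step_advantages and "use_verifiers_advantages" not in columns:
        columns.append("use_verifiers_advantages")
    return columns
```

Requesting the extra state column only when the flag is on means rollouts generated with the flag off carry no extra per-step data.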

When the flag is enabled and verifier data is present and valid, interleave_rollout populates TrainingSample.completion_advantages from the per-step completion_advantages, padding prompt-in-completion segments with zeros. Otherwise, the orchestrator backfills completion_advantages by repeating the computed rollout advantage for each token. The PR also replaces the transport field advantage with completion_advantages, updates batch packing to consume per-token advantages (with prompt tokens forced to 0), and adjusts and extends the unit tests accordingly.
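The interleaving and packing described above can be sketched on a simplified sample type. This is an illustration under stated assumptions, not prime-rl's `interleave_rollout` or its packing code: `SampleSketch`, `interleave_steps`, and `pack_advantages` are hypothetical, and each step is assumed to arrive as a tuple of (prompt-in-completion ids, completion ids, per-token advantages), where the first element holds tokens such as environment responses that are inserted between completions.

```python
from dataclasses import dataclass, field


@dataclass
class SampleSketch:
    """Hypothetical simplified training sample (not prime-rl's TrainingSample)."""

    prompt_ids: list[int]
    completion_ids: list[int] = field(default_factory=list)
    completion_advantages: list[float] = field(default_factory=list)


def interleave_steps(
    prompt_ids: list[int],
    steps: list[tuple[list[int], list[int], list[float]]],
) -> SampleSketch:
    """Merge multi-turn steps into one sample with a per-token advantage list.

    Each step is (prompt_in_completion_ids, completion_ids, advantages); the
    first step's prompt_in_completion_ids is normally empty because the
    initial prompt is passed separately.
    """
    sample = SampleSketch(prompt_ids=list(prompt_ids))
    for extra_prompt_ids, completion_ids, advantages in steps:
        assert len(advantages) == len(completion_ids), "advantages must match completion_ids"
        # Prompt-in-completion tokens are not trained on: pad with zeros.
        sample.completion_ids.extend(extra_prompt_ids)
        sample.completion_advantages.extend([0.0] * len(extra_prompt_ids))
        # Model-generated tokens keep their per-token advantages.
        sample.completion_ids.extend(completion_ids)
        sample.completion_advantages.extend(float(a) for a in advantages)
    return sample


def pack_advantages(sample: SampleSketch) -> tuple[list[int], list[float]]:
    """Flatten a sample for the trainer, forcing prompt-token advantages to 0.0."""
    input_ids = sample.prompt_ids + sample.completion_ids
    advantages = [0.0] * len(sample.prompt_ids) + sample.completion_advantages
    assert len(input_ids) == len(advantages)
    return input_ids, advantages
```

For a prompt `[1, 2]` and steps `([], [10, 11], [0.5, 0.5])` and `([20], [30], [1.0])`, packing yields ids `[1, 2, 10, 11, 20, 30]` with advantages `[0.0, 0.0, 0.5, 0.5, 0.0, 1.0]`. Keeping advantages per token all the way to packing is what lets the same path carry either verifier-provided values or a broadcast rollout advantage.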

Written by Cursor Bugbot for commit 2e94744.


Cursor Bugbot has reviewed your changes and found 1 potential issue.



eligotts added 3 commits February 19, 2026 18:55

prime rl uses verifiers advantages

1eb5a5e

Merge remote-tracking branch 'origin/main' into eli/verifiers-advantages

e80715b

updated tests

42c061f

cursor Bot reviewed Feb 20, 2026


Review comment on src/prime_rl/configs/orchestrator.py

eligotts and others added 2 commits February 19, 2026 20:29

added changelog

c231401

Merge main into eli/verifiers-advantages

2e94744

Resolve CHANGELOG.md conflict by keeping both branch entries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

eligotts closed this Feb 26, 2026

eligotts wants to merge 5 commits into main from eli/verifiers-advantages
