
Support for using step level advantages from verifiers#1838

Closed

eligotts wants to merge 5 commits into main from eli/verifiers-advantages


eligotts (Contributor) commented Feb 20, 2026

This PR adds support for reading step-level advantages defined by verifiers. For prime-rl to read them, we need:

If any of these are not set or fail, we fall back to distributing the rollout advantage calculated by prime-rl to each token.
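A minimal sketch of that fallback, assuming the verifier advantages arrive as a flat per-token list for the completion. The helper resolve_completion_advantages and its parameters are hypothetical, not prime-rl's actual API:

```python
from typing import Optional, Sequence


def resolve_completion_advantages(
    use_verifier_advantages: bool,
    verifier_advantages: Optional[Sequence[float]],
    rollout_advantage: float,
    num_completion_tokens: int,
) -> list[float]:
    """Hypothetical helper: prefer verifier per-token advantages, else broadcast."""
    # Use the verifier values only when the feature is enabled, the values
    # exist, and there is exactly one advantage per completion token.
    if (
        use_verifier_advantages
        and verifier_advantages is not None
        and len(verifier_advantages) == num_completion_tokens
    ):
        return [float(a) for a in verifier_advantages]
    # Fallback: every completion token gets the same rollout-level advantage.
    return [float(rollout_advantage)] * num_completion_tokens


print(resolve_completion_advantages(True, [0.5, -0.5, 1.0], 0.25, 3))   # [0.5, -0.5, 1.0]
print(resolve_completion_advantages(True, None, 0.25, 3))               # [0.25, 0.25, 0.25]
print(resolve_completion_advantages(True, [1.0], 0.25, 3))              # [0.25, 0.25, 0.25]
print(resolve_completion_advantages(False, [0.5, -0.5, 1.0], 0.25, 3))  # [0.25, 0.25, 0.25]
```

Treating a length mismatch as "not valid" rather than raising keeps training running on the broadcast advantage; a stricter implementation could assert instead.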


Note

Medium Risk
Touches the RL training data path (transport schema + advantage packing), so mismatched lengths or missing fields could break training; safeguards exist via validation/asserts and fallback behavior.

Overview
Adds optional support for verifier-provided step-level/per-token advantages via a new orchestrator.use_verifier_step_advantages config option, plus plumbing to request a use_verifiers_advantages state column during rollout generation.
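The config option and state-column plumbing could look roughly like this. Only the names use_verifier_step_advantages and use_verifiers_advantages come from the PR; the dataclass, the state_columns_for helper, the "trajectory" base column, and the default of False are hypothetical stand-ins:

```python
from dataclasses import dataclass


@dataclass
class OrchestratorConfig:
    # Hypothetical stand-in for prime-rl's orchestrator config; assumed to be
    # off by default since the feature is optional.
    use_verifier_step_advantages: bool = False


def state_columns_for(config: OrchestratorConfig, base_columns: list[str]) -> list[str]:
    """Hypothetical: state columns to request during rollout generation."""
    columns = list(base_columns)
    # Only ask the verifiers environment for the extra column when enabled.
    if config.use_verifier_step_advantages and "use_verifiers_advantages" not in columns:
        columns.append("use_verifiers_advantages")
    return columns


print(state_columns_for(OrchestratorConfig(), ["trajectory"]))
# ['trajectory']
print(state_columns_for(OrchestratorConfig(use_verifier_step_advantages=True), ["trajectory"]))
# ['trajectory', 'use_verifiers_advantages']
```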

When enabled and verifier data is present and valid, interleave_rollout populates TrainingSample.completion_advantages from the per-step completion_advantages (padding prompt-in-completion segments with zeros); otherwise, the orchestrator backfills completion_advantages by repeating the computed rollout advantage for every token. The PR also replaces the transport field advantage with completion_advantages, updates batch packing to consume per-token advantages (with prompt tokens forced to 0), and adjusts/extends the unit tests accordingly.
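The alignment rules above (zeros for prompt-in-completion segments, zeros for prompt tokens when packing) can be illustrated with a small sketch. It assumes each step carries the prompt-side tokens added before that turn plus the turn's completion tokens and per-token advantages; Step, interleave_advantages and pack_advantages are hypothetical names, not the actual interleave_rollout or TrainingSample code:

```python
from dataclasses import dataclass


@dataclass
class Step:
    # Hypothetical per-turn record. new_prompt_ids are tokens added before this
    # turn (ignored for the first turn, whose prompt is the sample prompt).
    new_prompt_ids: list[int]
    completion_ids: list[int]
    completion_advantages: list[float]


def interleave_advantages(steps: list[Step]) -> tuple[list[int], list[float]]:
    """Concatenate multi-turn steps into one completion with aligned advantages."""
    completion_ids: list[int] = []
    completion_advantages: list[float] = []
    for i, step in enumerate(steps):
        if len(step.completion_advantages) != len(step.completion_ids):
            raise ValueError(f"step {i}: advantage/token length mismatch")
        if i > 0:
            # Prompt-in-completion segment (env/user turn): advantage 0.0.
            completion_ids.extend(step.new_prompt_ids)
            completion_advantages.extend([0.0] * len(step.new_prompt_ids))
        completion_ids.extend(step.completion_ids)
        completion_advantages.extend(step.completion_advantages)
    return completion_ids, completion_advantages


def pack_advantages(prompt_len: int, completion_advantages: list[float]) -> list[float]:
    """Per-token advantages for the full sequence; prompt tokens forced to 0.0."""
    return [0.0] * prompt_len + list(completion_advantages)


steps = [Step([], [10, 11], [1.0, 0.5]), Step([20, 21], [12], [-1.0])]
ids, adv = interleave_advantages(steps)
print(ids)                      # [10, 11, 20, 21, 12]
print(adv)                      # [1.0, 0.5, 0.0, 0.0, -1.0]
print(pack_advantages(3, adv))  # [0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0, -1.0]
```

Zeroing the prompt and prompt-in-completion positions means those tokens contribute nothing to a policy-gradient term weighted by the advantage, so only model-generated tokens are reinforced.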

Written by Cursor Bugbot for commit 2e94744. This will update automatically on new commits.


cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread on src/prime_rl/configs/orchestrator.py
eligotts and others added 2 commits February 19, 2026 20:29
Resolve CHANGELOG.md conflict by keeping both branch entries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
eligotts closed this Feb 26, 2026