WIP: Support per-agent rewards in multi-agent setups by nph4rd · Pull Request #1910 · PrimeIntellect-ai/prime-rl

nph4rd · 2026-02-27T05:34:29Z

Adds support for per-agent rewards and advantages in multi-agent environments. This is a companion change to PrimeIntellect-ai/verifiers#965, which adds abstractions for multi-agent setups and heterogeneous reward functions.
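For readers skimming the thread, here is a minimal sketch of what "per-agent advantages" could look like. It assumes each rollout in a group reports one scalar reward per agent and uses a plain mean baseline per agent (no standard-deviation normalization). The function name `per_agent_advantages` and the dict-based reward layout are hypothetical and are not taken from the PR diff.

```python
from collections import defaultdict
from statistics import mean


def per_agent_advantages(group_rewards):
    """Compute mean-baseline advantages separately for each agent.

    group_rewards: list of dicts, one per rollout in the same group,
    mapping a (hypothetical) agent id to that agent's scalar reward.
    Returns a list of dicts with the same keys, holding
    reward minus that agent's mean reward over the group.
    """
    # Collect every reward observed for each agent across the group.
    by_agent = defaultdict(list)
    for rewards in group_rewards:
        for agent, reward in rewards.items():
            by_agent[agent].append(reward)

    # One baseline per agent, so agents with different reward scales
    # are not compared against each other.
    baselines = {agent: mean(rs) for agent, rs in by_agent.items()}

    return [
        {agent: reward - baselines[agent] for agent, reward in rewards.items()}
        for rewards in group_rewards
    ]


group = [
    {"solver": 1.0, "critic": 0.0},
    {"solver": 0.0, "critic": 1.0},
]
advantages = per_agent_advantages(group)
```

The point of keeping a separate baseline per agent is that heterogeneous reward functions may live on different scales, so a single shared group mean would bias the advantage of whichever agent tends to score lower.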

nph4rd force-pushed the multiagent-heterogeneous-rewards branch 2 times, most recently from 1c71dea to 7e3fa23 on March 7, 2026 06:02

nph4rd added 11 commits March 21, 2026 17:48

support per-agent rewards from multi-agent environments

1f2b7fe

point verifiers to multiagent-heterogeneous-rewards branch

e0a3a87

remove multi-agent debug log

1bc9bc5

log per-agent rewards to wandb for multi-agent environments

da11cbf

revert agent_rewards logging, now handled via metrics

3f6a975

compute per-agent grpo advantages for multi-agent environments

59ebe1d

respect per-step is_trainable flag in interleave_rollout

d190da2

add multi-agent lora support for per-agent policy training

58b819d

split merged multi-agent samples by agent for per-agent lora training

c6f3990

fix multi-agent lora orch.toml and add pack_full_step

871b947

enforce async level in multi-actor policy updates

ce00876

nph4rd force-pushed the multiagent-heterogeneous-rewards branch from 1af84fb to f875ab3 on March 22, 2026 00:03

auto-set max_concurrent_runs and fix packer timeout for multi-agent lora

9c69f89

nph4rd force-pushed the multiagent-heterogeneous-rewards branch from f875ab3 to 9c69f89 on March 22, 2026 00:07

use upstream dedup pattern for multi-actor policy updates

ef8659c
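The commits above mention respecting a per-step `is_trainable` flag and splitting merged multi-agent samples by agent for per-agent LoRA training. A rough sketch of that idea follows, under stated assumptions: the `Step` dataclass, its `agent_id` and `token_ids` fields, and the `split_by_agent` helper are all hypothetical names for illustration and do not reflect the actual data structures in prime-rl or verifiers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Step:
    """Hypothetical stand-in for one trajectory step in a merged rollout."""
    agent_id: str
    token_ids: list
    is_trainable: bool = True


def split_by_agent(steps):
    """Group trainable steps by agent, preserving their original order.

    Steps with is_trainable=False are dropped, so an agent whose steps
    are all non-trainable does not appear in the result.
    """
    per_agent = defaultdict(list)
    for step in steps:
        if step.is_trainable:
            per_agent[step.agent_id].append(step)
    return dict(per_agent)


merged = [
    Step("agent_a", [1, 2]),
    Step("agent_b", [3], is_trainable=False),
    Step("agent_a", [4]),
]
batches = split_by_agent(merged)
```

Each resulting per-agent list could then be routed to that agent's own adapter, which is one way to read "per-agent policy training" in the commit messages.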

faresobeid closed this Mar 31, 2026

nph4rd wants to merge 13 commits into PrimeIntellect-ai:main from nph4rd:multiagent-heterogeneous-rewards
