Skip to content

multi-agent v2: Agent, TaskSet, Registry abstractions#961

Draft
Bhoy1 wants to merge 67 commits intoPrimeIntellect-ai:mainfrom
Bhoy1:Bhoy1/Multiagent_v2
Draft

multi-agent v2: Agent, TaskSet, Registry abstractions#961
Bhoy1 wants to merge 67 commits intoPrimeIntellect-ai:mainfrom
Bhoy1:Bhoy1/Multiagent_v2

Conversation

@Bhoy1
Copy link

@Bhoy1 Bhoy1 commented Feb 26, 2026

Summary

  • Rebuilt multi-agent support with clean separation of concerns
  • Agent: WHO responds — model, client, system_prompt, sampling config
  • TaskSet: WHAT to solve — game rules, prompts, scoring, reusable across agents
  • Registry: HOW to wire — connects agents to envs, enables Registry.spawn() for parallel child rollouts
  • MultiAgentEnv: engine that runs the game loop, supports TaskSet mode and subclass mode
  • Agent system_prompt auto-prepended to prompts built by TaskSet
  • ActorAgent, ProtocolRegistry

Environments on new pattern:

  • twenty_questions, poker, rock_paper_scissors, proposer_solver (TaskSet)
  • poker_multi (TaskSet, 2-9 players)
  • proposer_code_solver (TaskSet + tool execution, new)
  • arc_multiagent (subclass mode with Registry.spawn(), hierarchical pipeline)

Test plan

  • poker_multi: 6-player games with per-agent models, JSON actions, rewards
  • twenty_questions: alternating turns, win detection, efficiency scoring
  • proposer_code_solver: tool execution with code REPL
  • arc_multiagent: hierarchical spawning pipeline
  • Training loop integration (GRPO with per-actor states)

Note

Medium Risk
Large new environments add complex orchestration (subprocess sandboxing, parallel spawning, multimodal prompting) and game logic, which may introduce runtime/platform issues, but they are additive with minimal impact on existing core behavior.

Overview
Adds a new docs/multiagent.md guide describing multi-agent environment patterns (turn-taking, simultaneous moves, per-actor rewards/GRPO, and hierarchical spawning).

Introduces a full ARC-AGI multi-agent solver environment (environments/arc_multiagent) that orchestrates parallel solver strategies (shallow/deep, codegen with sandboxed execution, vision, objects/hints) via Registry.spawn(), then selects final answers via a duo-pick judge council with logic/consistency judging fallback; includes packaging metadata.

Adds new poker environments: environments/poker (heads-up) and environments/poker_multi (2–9 players) implemented in the new TaskSet composition style with per-player agents, JSON action parsing, and chip-based per-actor rewards, along with their pyproject.toml configs.

Written by Cursor Bugbot for commit 6821879. This will update automatically on new commits. Configure here.

wjh581 and others added 25 commits January 25, 2026 05:37
…lay different models per actor. Should show in logs in CLI display. Example in twenty question/poker.
… Also, can setup currently to have different models with different system prompts play each other. Still validating environments
…up signatures, fix eval_utils KeyError for child states
Replace Actor/Protocol with Agent/TaskSet/Registry pattern.
Agent owns model config + system prompt, TaskSet owns game logic,
Registry wires agents to envs and enables parallel spawn.
@Bhoy1 Bhoy1 marked this pull request as draft February 26, 2026 02:22
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 5 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable autofix in the Cursor dashboard.


for i, (inp, out) in enumerate(grids):
ax_in = axes[i][0] if num_pairs > 1 else axes[0]
ax_out = axes[i][1] if num_pairs > 1 else axes[1]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image generation crashes when only one training pair

High Severity

When num_pairs == 1, plt.subplots(1, 2) returns a 1D array [ax1, ax2]. After wrapping with axes = [axes], the list becomes [[ax1, ax2]]. The else branch then accesses axes[0] (which gives the inner array, not a single axis) and axes[1] (which raises an IndexError since the outer list has only one element). The condition incorrectly bypasses the wrapped indexing axes[i][0] / axes[i][1] which would work correctly after the wrapping.

Fix in Cursor Fix in Web

async def should_stop(self, state: State) -> bool:
extras = state.get("extras", {})
if len(extras.get("folded", [])) > 0:
return True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fold immediately ends multi-hand poker session

High Severity

The should_stop check if len(extras.get("folded", [])) > 0: return True immediately ends the entire game session on any fold, even though num_hands defaults to 5. Since folds are extremely common in heads-up poker, most games will end after a single hand without ever reaching _resolve_showdown, which is the only code path that starts new hands. The four remaining hands are silently skipped.

Fix in Cursor Fix in Web

if e.get("actions_hand", 0) >= self.max_actions:
self._resolve_showdown(state)
return True
return False
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fold path skips new hand in multi-hand poker

Low Severity

When all but one player fold, should_stop awards the pot and sets phase = "complete" without attempting to start a new hand, unlike _resolve_showdown which checks hands_played < num_hands and calls _start_hand. With num_hands > 1, a hand ending via fold terminates the entire session instead of rotating the dealer and starting the next hand.

Fix in Cursor Fix in Web

dealer = extras["dealer"]
non_dealer = "player2" if dealer == "player1" else "player1"
if non_dealer not in extras["folded"]:
extras["current_actor_id"] = non_dealer
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postflop turn order reversed by _advance_phase override

Medium Severity

_advance_phase sets extras["current_actor_id"] = non_dealer intending for the non-dealer to act first postflop. But the framework's get_next_role returns the opposite of current_actor_id, so this causes the dealer to act first postflop instead. In heads-up poker, the non-dealer acts first on flop/turn/river. When the dealer was the last preflop actor (e.g., dealer called a BB raise), the natural turn order would have been correct, but this override actively breaks it.

Fix in Cursor Fix in Web

rollouts_per_example = 1
num_players = 6
num_hands = 3
max_actions_per_hand = 50
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config parameter name mismatches load_environment signature

Medium Severity

The pyproject.toml specifies max_actions_per_hand = 50 under [tool.verifiers.eval], but load_environment expects the parameter max_actions, not max_actions_per_hand. This mismatch means the config value either causes a TypeError (unexpected keyword argument) or is silently ignored, with the function falling back to its default of 50. Users adjusting this config setting would see no effect.

Additional Locations (1)

Fix in Cursor Fix in Web

Override run_group() in MultiAgentEnv to split game states into
per-actor states before scoring. Each trainable actor gets its own
RolloutOutput with correct reward. Non-trainable actors excluded.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants