multi-agent v2: Agent, TaskSet, Registry abstractions #961
Bhoy1 wants to merge 67 commits into PrimeIntellect-ai:main
…lay different models per actor. Should show in logs in CLI display. Example in twenty question/poker.
… Also, can setup currently to have different models with different system prompts play each other. Still validating environments
…into Bhoy1/Multiagent
…up signatures, fix eval_utils KeyError for child states
Replace Actor/Protocol with Agent/TaskSet/Registry pattern. Agent owns model config + system prompt, TaskSet owns game logic, Registry wires agents to envs and enables parallel spawn.
Cursor Bugbot has reviewed your changes and found 5 potential issues.
    for i, (inp, out) in enumerate(grids):
        ax_in = axes[i][0] if num_pairs > 1 else axes[0]
        ax_out = axes[i][1] if num_pairs > 1 else axes[1]
Image generation crashes when only one training pair
High Severity
When num_pairs == 1, plt.subplots(1, 2) returns a 1D array [ax1, ax2]. After wrapping with axes = [axes], the list becomes [[ax1, ax2]]. The else branch then accesses axes[0] (which gives the inner array, not a single axis) and axes[1] (which raises an IndexError since the outer list has only one element). The condition incorrectly bypasses the wrapped indexing axes[i][0] / axes[i][1] which would work correctly after the wrapping.
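A minimal sketch of the indexing this comment recommends, using plain strings as stand-ins for matplotlib Axes objects. The helper names `wrap_axes` and `axis_pair` are hypothetical and do not come from the PR. Once a single row is wrapped, `axes[i][0]` and `axes[i][1]` work for any number of pairs, so the conditional is unnecessary.

```python
def wrap_axes(axes, num_pairs):
    """Mimic the wrapping step described above: one row becomes [[ax_in, ax_out]]."""
    return [axes] if num_pairs == 1 else axes


def axis_pair(axes, i):
    """After wrapping, axes[i] is always a row, so no special case is needed."""
    return axes[i][0], axes[i][1]


# Stand-ins for what plt.subplots(1, 2) and plt.subplots(2, 2) would return.
single = wrap_axes(["in0", "out0"], num_pairs=1)
multi = wrap_axes([["in0", "out0"], ["in1", "out1"]], num_pairs=2)

print(axis_pair(single, 0))
print(axis_pair(multi, 1))
```

In matplotlib itself, `plt.subplots(num_pairs, 2, squeeze=False)` always returns a 2D array, which would remove the need for the wrapping step altogether.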
environments/poker/poker.py
Outdated
    async def should_stop(self, state: State) -> bool:
        extras = state.get("extras", {})
        if len(extras.get("folded", [])) > 0:
            return True
Fold immediately ends multi-hand poker session
High Severity
The should_stop check if len(extras.get("folded", [])) > 0: return True immediately ends the entire game session on any fold, even though num_hands defaults to 5. Since folds are extremely common in heads-up poker, most games will end after a single hand without ever reaching _resolve_showdown, which is the only code path that starts new hands. The four remaining hands are silently skipped.
    if e.get("actions_hand", 0) >= self.max_actions:
        self._resolve_showdown(state)
        return True
    return False
Fold path skips new hand in multi-hand poker
Low Severity
When all but one player fold, should_stop awards the pot and sets phase = "complete" without attempting to start a new hand, unlike _resolve_showdown which checks hands_played < num_hands and calls _start_hand. With num_hands > 1, a hand ending via fold terminates the entire session instead of rotating the dealer and starting the next hand.
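One way to address both fold-related comments is to route folds and showdowns through the same end-of-hand step. The sketch below is a simplified, synchronous illustration over a plain dict. `start_hand` and `finish_hand` are hypothetical stand-ins, the `hands_played` counter is assumed to live in `extras`, and pot handling is omitted.

```python
def start_hand(extras):
    """Hypothetical stand-in for _start_hand: rotate the dealer, reset per-hand state."""
    extras["dealer"] = "player2" if extras["dealer"] == "player1" else "player1"
    extras["folded"] = []
    extras["actions_hand"] = 0
    extras["phase"] = "preflop"


def finish_hand(extras, num_hands):
    """Shared end-of-hand step. Returns True only when the whole session is over."""
    extras["hands_played"] += 1
    if extras["hands_played"] < num_hands:
        start_hand(extras)
        return False
    extras["phase"] = "complete"
    return True


def should_stop(extras, num_hands=5, max_actions=50):
    # A fold ends the current hand, not necessarily the session.
    if extras["folded"]:
        return finish_hand(extras, num_hands)
    # Hitting the action cap also ends only the current hand.
    if extras["actions_hand"] >= max_actions:
        return finish_hand(extras, num_hands)
    return False


extras = {"dealer": "player1", "folded": ["player2"],
          "actions_hand": 3, "hands_played": 0, "phase": "flop"}
first = should_stop(extras, num_hands=2)   # fold in hand 1: the session continues
extras["folded"] = ["player1"]
second = should_stop(extras, num_hands=2)  # fold in hand 2: the session ends
```

With this shape, a fold in the first of two hands rotates the dealer and deals again, and only the final hand sets the phase to "complete".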
environments/poker/poker.py
Outdated
    dealer = extras["dealer"]
    non_dealer = "player2" if dealer == "player1" else "player1"
    if non_dealer not in extras["folded"]:
        extras["current_actor_id"] = non_dealer
Postflop turn order reversed by _advance_phase override
Medium Severity
_advance_phase sets extras["current_actor_id"] = non_dealer intending for the non-dealer to act first postflop. But the framework's get_next_role returns the opposite of current_actor_id, so this causes the dealer to act first postflop instead. In heads-up poker, the non-dealer acts first on flop/turn/river. When the dealer was the last preflop actor (e.g., dealer called a BB raise), the natural turn order would have been correct, but this override actively breaks it.
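The interaction can be reproduced in a few lines. This sketch assumes, as the comment states, that `get_next_role` returns the opposite of `current_actor_id`. The function bodies are illustrative stand-ins, not the framework's or the PR's code. Under that assumption, one possible adjustment is to record the dealer as the previous actor so that the non-dealer is selected next.

```python
def other(player):
    return "player2" if player == "player1" else "player1"


def get_next_role(extras):
    """Assumed framework behavior: the next actor is the opposite of current_actor_id."""
    return other(extras["current_actor_id"])


def advance_phase_as_submitted(extras):
    non_dealer = other(extras["dealer"])
    if non_dealer not in extras["folded"]:
        extras["current_actor_id"] = non_dealer  # next role becomes the dealer


def advance_phase_adjusted(extras):
    non_dealer = other(extras["dealer"])
    if non_dealer not in extras["folded"]:
        # Mark the dealer as the previous actor so the non-dealer is picked next.
        extras["current_actor_id"] = extras["dealer"]


submitted = {"dealer": "player1", "folded": [], "current_actor_id": "player1"}
advance_phase_as_submitted(submitted)
print(get_next_role(submitted))  # the dealer acts first postflop (the reported bug)

adjusted = {"dealer": "player1", "folded": [], "current_actor_id": "player1"}
advance_phase_adjusted(adjusted)
print(get_next_role(adjusted))   # the non-dealer acts first postflop
```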
    rollouts_per_example = 1
    num_players = 6
    num_hands = 3
    max_actions_per_hand = 50
Config parameter name mismatches load_environment signature
Medium Severity
The pyproject.toml specifies max_actions_per_hand = 50 under [tool.verifiers.eval], but load_environment expects the parameter max_actions, not max_actions_per_hand. This mismatch means the config value either causes a TypeError (unexpected keyword argument) or is silently ignored, with the function falling back to its default of 50. Users adjusting this config setting would see no effect.
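The TypeError case is easy to reproduce with a stand-in function. The `load_environment` below is hypothetical: only the `max_actions` parameter name and its default of 50 come from the comment, and the other parameters and the returned dict are placeholders. Whether the real loader raises or silently drops the key depends on how the config is forwarded, which this sketch does not model.

```python
def load_environment(num_players=6, num_hands=3, max_actions=50):
    """Hypothetical stand-in for the environment loader."""
    return {"num_players": num_players, "num_hands": num_hands,
            "max_actions": max_actions}


# Values as they appear in the quoted config (rollouts_per_example omitted).
config = {"num_players": 6, "num_hands": 3, "max_actions_per_hand": 50}

try:
    load_environment(**config)
    error = None
except TypeError as exc:
    error = str(exc)  # unexpected keyword argument 'max_actions_per_hand'

# Renaming the key to match the signature resolves the mismatch.
fixed = dict(config)
fixed["max_actions"] = fixed.pop("max_actions_per_hand")
env = load_environment(**fixed)
```

Renaming the parameter in `load_environment` instead would work equally well; the point is that the config key and the keyword argument must agree.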
Override run_group() in MultiAgentEnv to split game states into per-actor states before scoring. Each trainable actor gets its own RolloutOutput with correct reward. Non-trainable actors excluded.
…game debug fields
…d fallback actor ID lookups
…p and state_to_output
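The per-actor split described in the `run_group()` commit above can be pictured roughly as follows. This is an illustrative sketch only: `ActorOutput`, `split_by_actor`, and the dict layout of the game state are assumptions, not the PR's `RolloutOutput` or state schema.

```python
from dataclasses import dataclass


@dataclass
class ActorOutput:
    """Hypothetical stand-in for a per-actor rollout output."""
    actor_id: str
    turns: list
    reward: float


def split_by_actor(game_state, trainable):
    """Build one output per trainable actor; non-trainable actors are skipped."""
    outputs = []
    for actor_id in game_state["actors"]:
        if actor_id not in trainable:
            continue
        turns = [t for t in game_state["turns"] if t["actor_id"] == actor_id]
        outputs.append(ActorOutput(actor_id, turns, game_state["rewards"][actor_id]))
    return outputs


game_state = {
    "actors": ["player1", "player2"],
    "turns": [
        {"actor_id": "player1", "action": "raise"},
        {"actor_id": "player2", "action": "fold"},
    ],
    "rewards": {"player1": 1.0, "player2": -1.0},
}
outputs = split_by_actor(game_state, trainable={"player1"})
```

Here only `player1` is trainable, so the result holds a single output carrying that actor's turns and reward.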


Summary
- Registry.spawn() for parallel child rollouts
- Actor→Agent, Protocol→Registry

Environments on new pattern:
- twenty_questions, poker, rock_paper_scissors, proposer_solver (TaskSet)
- poker_multi (TaskSet, 2-9 players)
- proposer_code_solver (TaskSet + tool execution, new)
- arc_multiagent (subclass mode with Registry.spawn(), hierarchical pipeline)

Test plan
Note
Medium Risk
Large new environments add complex orchestration (subprocess sandboxing, parallel spawning, multimodal prompting) and game logic, which may introduce runtime/platform issues, but they are additive with minimal impact on existing core behavior.
Overview
Adds a new docs/multiagent.md guide describing multi-agent environment patterns (turn-taking, simultaneous moves, per-actor rewards/GRPO, and hierarchical spawning).
Introduces a full ARC-AGI multi-agent solver environment (environments/arc_multiagent) that orchestrates parallel solver strategies (shallow/deep, codegen with sandboxed execution, vision, objects/hints) via Registry.spawn(), then selects final answers via a duo-pick judge council with logic/consistency judging fallback; includes packaging metadata.
Adds new poker environments: environments/poker (heads-up) and environments/poker_multi (2–9 players) implemented in the new TaskSet composition style with per-player agents, JSON action parsing, and chip-based per-actor rewards, along with their pyproject.toml configs.
Written by Cursor Bugbot for commit 6821879.