fix(supervisor,hub): kill legacy agent spawn + cap orphan replay + cap restarts#86
Merged
Merged
Conversation
…p restarts
Structural fixes from the 2026-05-27 autonomous-loop RCA.
Supervisor (process-manager.ts):
- spawn() no longer invokes the retired CLI agent via npx -y. Every
session.start now finalizes immediately as stopped with
exit_reason='legacy_agent_spawn_disabled'. The cached buggy v0.4.1 of
that retired package was the source of the "Always delegate" loop.
- scheduleRestart() now hard-caps at MAX_RESTART_COUNT=10. After 10
consecutive attempts the run finalizes as max_restarts_exceeded and
the supervisor stops respawning, instead of looping forever at the
30s backoff plateau.
Hub auto-resume (ws/agent.ts on supervisor.hello):
- Filters orphan session_runs to rows where started_at > now() - 24h.
- Sweeps older open rows in one UPDATE, finalizing them as
exit_reason='stale'.
- Skips replay for any orphan whose restart_count >= 10, finalizing as
exit_reason='max_restarts_exceeded'.
Tests:
- supervisor/test/no-legacy-agent-spawn.test.ts — canary that greps
supervisor/src/** for the retired package name and --append-system-prompt
flag. Fails the build if either reappears.
- supervisor/test/process-manager.test.ts — happy-path test rewritten
to assert the new legacy_agent_spawn_disabled stopped state; concurrency
+ audit tests now monkey-patch spawn() to hold the slot so the cap
gates remain testable until the inline runner lands.
- hub/test/auto-resume-caps.test.ts — DB-gated unit test covering the
24h age cap and restart_count >= 10 cap (skipped without REMO_E2E_DB_URL).
Version bump: supervisor 0.5.1 → 0.5.2 across Cargo.toml, tauri.conf.json,
ui/package.json, and the inline VERSION constants in supervisor/src/{index,
hub-client}.ts.
Follow-up not in this PR: the in-process claude-runner that bridges
stream-json over the supervisor WS. Until that lands, session.start
rolls to stopped instead of trapping the supervisor in a respawn loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks
finedesignz
added a commit
that referenced
this pull request
May 28, 2026
… bridge (#96) Phase 09 (PR #86) gutted ProcessManager.spawn() and returned every session.start with `legacy_agent_spawn_disabled`. Result: clicking Launch on a repo in Settings → Supervisor / Sidebar did nothing — the supervisor received `session.start` from the hub and immediately finalized the run as stopped. Reproduced in supervisor.log at 2026-05-27 23:51:54. Root cause: PR #86 killed the legacy external CLI agent spawn path (autonomous-loop bug in cached v0.4.1) without an in-process replacement. Fix: re-wire spawn() to construct a `SessionBridge` per run. Each bridge: - opens its own /ws/agent connection with project_dir = repoPath and role = 'agent' (mirroring what the retired external agent did); - spawns `claude --input-format stream-json --output-format stream-json --verbose` directly via a new `ClaudeRunner`; - parses Claude's stream-json events and translates them onto the hub's agent-protocol (thinking / text_delta / tool_use / tool_result / status / assistant_message / permission_request / user_question); - routes hub-originated user_message / permission_response / question_response / cancel / shutdown back into the runner. Preserved from PR #86: - sandbox-escape, not-git-repo, concurrency, dangerous-skip-permissions gates + audit-log entry per decision; - CIRCUIT_THRESHOLD (5 crashes / 10 min) and MAX_RESTART_COUNT (10) caps so a broken spawn can no longer loop forever. Phase 09 canary `no-legacy-agent-spawn.test.ts` still passes — the new runner never references the retired npm package name or forbidden flags. ProcessManager unit tests rewritten to use `bridgeFactory` instead of the old `spawnImpl` (Bun.spawn) hook. Supervisor bumped 0.5.2 → 0.5.3 in all four source-of-truth locations (supervisor/src/index.ts, supervisor/src/hub-client.ts, supervisor/tauri/src-tauri/{Cargo.toml,tauri.conf.json}, supervisor/tauri/ui/package.json) so the next MSI advertises the fix. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Structural fixes from the 2026-05-27 autonomous-loop RCA.
process-manager.ts:spawn()no longer invokes the retired CLI agent. Everysession.startfinalizes immediately asstoppedwithexit_reason='legacy_agent_spawn_disabled'. The cached buggy v0.4.1 of the retired package was the source of the loop.scheduleRestart()caps atMAX_RESTART_COUNT=10. After 10 consecutive restarts the run finalizes asmax_restarts_exceededinstead of looping forever at the 30s backoff plateau.ws/agent.tsonsupervisor.hello):session_runstostarted_at > now() - 24h.exit_reason='stale'in one UPDATE.restart_count >= 10, finalizing asmax_restarts_exceeded.Tests
supervisor/test/no-legacy-agent-spawn.test.ts— canary that grepssupervisor/src/**for the retired package name +--append-system-prompt. Fails the build if either reappears.supervisor/test/process-manager.test.ts— happy-path tests rewritten to assert the newlegacy_agent_spawn_disabledstopped state; concurrency/audit tests monkey-patchspawn()to hold the slot so cap gates stay exercisable.hub/test/auto-resume-caps.test.ts— DB-gated unit test covering the 24h age cap andrestart_count >= 10cap (skipped withoutREMO_E2E_DB_URL).Version bump
Supervisor 0.5.1 → 0.5.2 across
Cargo.toml,tauri.conf.json,ui/package.json, and the inlineVERSIONconstants insupervisor/src/{index,hub-client}.ts.Follow-up (not in this PR)
The in-process
claude-runnerthat bridgesclaude --input-format stream-json --output-format stream-jsonover the supervisor WS. Until that lands,session.startrolls cleanly to stopped instead of trapping the supervisor in a respawn loop.Test plan
cd supervisor && bun test→ 48 pass, 0 failcd hub && bun test→ 367 pass (same 8 pre-existing test-ordering flakes asmain, unrelated to this PR)agent.tsmodule loads cleanly🤖 Generated with Claude Code