fix(supervisor): restore session spawn via in-process claude-runner + bridge #96

Merged
finedesignz merged 1 commit into main from fix/session-spawn-after-phase09 on May 28, 2026

@finedesignz
Owner

Root cause

Phase 09 (PR #86) gutted `ProcessManager.spawn()` and finalized every `session.start` with reason `legacy_agent_spawn_disabled`, so clicking Launch on a repo in Settings → Supervisor / Sidebar did nothing. The retired external CLI agent had the autonomous-loop bug, so PR #86 removed it — but no in-process replacement was wired up.

Reproduced live in `supervisor.log` at 2026-05-27 23:51:54Z:

```
[security] legacy_agent_spawn_disabled: ... session.start cannot proceed until the
in-process claude-runner lands. Run will be finalized as stopped.
```

Fix

Re-wire `spawn()` to construct a per-run `SessionBridge`. Each bridge:

  1. Opens its own `/ws/agent` connection with `project_dir = repoPath` and `role = 'agent'` — mirroring what the retired external agent did from its own process.
  2. Spawns `claude --input-format stream-json --output-format stream-json --verbose` directly via a new in-process `ClaudeRunner` (NOT the retired external npm CLI; canary still passes).
  3. Translates Claude's stream-json events to the hub's agent-protocol shapes (thinking / text_delta / tool_use / tool_result / status / assistant_message / permission_request / user_question / agent_log).
  4. Routes hub-originated `user_message` / `permission_response` / `question_response` / `cancel` / `shutdown` back into the runner.
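As a rough illustration of step 3, the translation from stream-json lines to hub events could look like the sketch below. Only the event `type` names come from this PR. The field names on `HubEvent`, the function name `translateLine`, the `idle` status value, and the assumed layout of Claude's output (a top-level `type` plus a `message.content` array of typed blocks) are assumptions for the example, not the actual `SessionBridge` code.

```typescript
// Hub-side event shapes. Only the `type` names appear in this PR; every other
// field name here is a hypothetical stand-in.
type HubEvent =
  | { type: "thinking"; text: string }
  | { type: "text_delta"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; toolUseId: string; content: unknown }
  | { type: "status"; state: string }
  | { type: "agent_log"; message: string };

// Translate one line of stream-json output into zero or more hub events.
// ASSUMPTION: each line is a JSON object with a top-level `type` and, for
// assistant/user lines, a `message.content` array of typed blocks.
function translateLine(line: string): HubEvent[] {
  let evt: any;
  try {
    evt = JSON.parse(line);
  } catch {
    // Surface malformed output as a log event instead of crashing the bridge.
    return [{ type: "agent_log", message: `unparseable line: ${line}` }];
  }

  const content = evt?.message?.content;
  const blocks: any[] = Array.isArray(content) ? content : [];
  const out: HubEvent[] = [];

  for (const block of blocks) {
    if (evt.type === "assistant" && block.type === "thinking") {
      out.push({ type: "thinking", text: String(block.thinking ?? "") });
    } else if (evt.type === "assistant" && block.type === "text") {
      out.push({ type: "text_delta", text: String(block.text ?? "") });
    } else if (evt.type === "assistant" && block.type === "tool_use") {
      out.push({
        type: "tool_use",
        id: String(block.id),
        name: String(block.name),
        input: block.input,
      });
    } else if (evt.type === "user" && block.type === "tool_result") {
      out.push({
        type: "tool_result",
        toolUseId: String(block.tool_use_id),
        content: block.content,
      });
    }
  }

  // ASSUMPTION: a `result` line marks the end of a turn; "idle" is illustrative.
  if (evt?.type === "result") {
    out.push({ type: "status", state: "idle" });
  }
  return out;
}
```

Returning an array per line keeps the translator stateless, so a bridge could feed it each stdout line and forward whatever comes back over `/ws/agent`.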

Preserved from PR #86:

  • All five security gates (sandbox-escape, not-git-repo, concurrency, duplicate-run, dangerous-skip cap) + audit-log row per decision.
  • `CIRCUIT_THRESHOLD = 5 crashes / 10 min` circuit-breaker.
  • `MAX_RESTART_COUNT = 10` restart cap — so a broken spawn can never loop forever again.
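To make the two caps concrete, here is a minimal sketch of how a crash handler might combine them. The constant values come from this PR. `CIRCUIT_WINDOW_MS`, `RestartPolicy`, `onCrash`, and the decision strings are hypothetical names, and the exact trip condition (opening on the fifth crash inside the window) is an assumption rather than the confirmed `ProcessManager` behavior.

```typescript
const CIRCUIT_THRESHOLD = 5;                 // crashes allowed per window (from the PR)
const CIRCUIT_WINDOW_MS = 10 * 60 * 1000;    // 10 minutes; the constant name is hypothetical
const MAX_RESTART_COUNT = 10;                // lifetime restart cap (from the PR)

type CrashDecision = "restart" | "circuit_open" | "restart_cap_reached";

class RestartPolicy {
  private crashTimes: number[] = [];
  private restarts = 0;

  // Record a crash at `nowMs` and decide whether the run may be restarted.
  onCrash(nowMs: number): CrashDecision {
    // Keep only crashes that still fall inside the sliding window.
    this.crashTimes = this.crashTimes.filter((t) => nowMs - t < CIRCUIT_WINDOW_MS);
    this.crashTimes.push(nowMs);

    // ASSUMPTION: the breaker opens once the window holds CIRCUIT_THRESHOLD crashes.
    if (this.crashTimes.length >= CIRCUIT_THRESHOLD) return "circuit_open";
    if (this.restarts >= MAX_RESTART_COUNT) return "restart_cap_reached";

    this.restarts += 1;
    return "restart";
  }
}
```

The two limits cover different failure modes: the window catches a tight crash loop quickly, while the lifetime cap stops a run that crashes slowly enough to stay under the window threshold.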

Tests

  • Phase 09 canary `supervisor/test/no-legacy-agent-spawn.test.ts` — still passes (new code uses neither the retired package name nor the forbidden `--append-system-prompt` flag).
  • `supervisor/test/process-manager.test.ts` rewritten to use the new `bridgeFactory` hook in place of the deleted `spawnImpl` (Bun.spawn) hook. Asserts: session.start constructs the bridge with correct opts, gates still reject before bridge construction, dangerous-skip is correctly threaded through, audit log captures allow + reject decisions, crash path schedules restart.
  • Full supervisor suite green: `49 pass / 0 fail / 7 files`.
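The `bridgeFactory` hook is a form of dependency injection: the test passes in a fake factory, then checks what it was called with and whether it was called at all. The sketch below shows the idea on a cut-down manager with a single gate. Apart from the name `bridgeFactory`, every type, field, and method here (`MiniProcessManager`, `BridgeOpts`, `isGitRepo`, and so on) is hypothetical and does not mirror the real `ProcessManager` API.

```typescript
// Hypothetical option and bridge shapes for illustration only.
interface BridgeOpts {
  repoPath: string;
  role: "agent";
  dangerousSkip: boolean;
}
interface Bridge {
  start(): void;
}
type BridgeFactory = (opts: BridgeOpts) => Bridge;
type SpawnResult = { ok: true } | { ok: false; reason: string };

class MiniProcessManager {
  private readonly bridgeFactory: BridgeFactory;
  private readonly isGitRepo: (path: string) => boolean;

  constructor(bridgeFactory: BridgeFactory, isGitRepo: (path: string) => boolean) {
    this.bridgeFactory = bridgeFactory;
    this.isGitRepo = isGitRepo;
  }

  spawn(repoPath: string, dangerousSkip: boolean): SpawnResult {
    // Gates run first, so a rejected request never constructs a bridge.
    if (!this.isGitRepo(repoPath)) return { ok: false, reason: "not-git-repo" };

    const bridge = this.bridgeFactory({ repoPath, role: "agent", dangerousSkip });
    bridge.start();
    return { ok: true };
  }
}

// A fake factory that records the options it was given and starts nothing.
const seenOpts: BridgeOpts[] = [];
const fakeFactory: BridgeFactory = (opts) => {
  seenOpts.push(opts);
  return { start() {} };
};

// Treat only "/repos/good" as a git repository in this example.
const manager = new MiniProcessManager(fakeFactory, (p) => p === "/repos/good");
const rejected = manager.spawn("/tmp/not-a-repo", false);
const allowed = manager.spawn("/repos/good", true);
```

After these two calls, `seenOpts` holds exactly one entry, for `/repos/good` with `dangerousSkip: true`, which is the kind of assertion the rewritten tests are described as making.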

Version bump

Supervisor: `0.5.2 → 0.5.3` in all five sources of truth, in lockstep:

  • `supervisor/src/index.ts` (VERSION const)
  • `supervisor/src/hub-client.ts` (VERSION const — reported in supervisor.hello to the hub)
  • `supervisor/tauri/src-tauri/Cargo.toml`
  • `supervisor/tauri/src-tauri/tauri.conf.json`
  • `supervisor/tauri/ui/package.json`
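Because the version lives in several files, a small drift check can catch a missed bump. The following is a sketch only and is not part of this PR. `extractVersion` and `findVersionDrift` are hypothetical helpers, and the patterns assume a `VERSION = '…'` constant in the TypeScript files, a `version = "…"` line in `Cargo.toml`, and a `version` field (top-level or under `package`) in the JSON files.

```typescript
// Pull a version string out of one file's text, based on its extension.
// ASSUMPTION: the file layouts described in the lead-in above.
function extractVersion(path: string, text: string): string | undefined {
  if (path.endsWith(".json")) {
    const parsed = JSON.parse(text);
    return parsed.version ?? parsed.package?.version;
  }
  if (path.endsWith(".toml")) {
    // First `version = "..."` line at the start of a line.
    return /^version\s*=\s*"([^"]+)"/m.exec(text)?.[1];
  }
  // TypeScript sources: a VERSION constant with a quoted value.
  return /VERSION\s*=\s*['"]([^'"]+)['"]/.exec(text)?.[1];
}

// Return the paths whose version is missing or differs from `expected`.
function findVersionDrift(files: Record<string, string>, expected: string): string[] {
  return Object.entries(files)
    .filter(([path, text]) => extractVersion(path, text) !== expected)
    .map(([path]) => path);
}
```

The helpers take file contents as strings rather than reading from disk, so the same check could run in a unit test or a release script.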

The next MSI release (`supervisor-v0.5.3` tag, built by `.github/workflows/release-supervisor.yml`) will carry the fix.

Test plan

  • Unit tests: `bun test supervisor/test/` — 49 pass / 0 fail.
  • Phase 09 canary still passes.
  • After MSI build + install: click Launch on a repo, confirm session.start spawns claude and the chat surfaces stream activity in the web UI.

🤖 Generated with Claude Code

… bridge

Phase 09 (PR #86) gutted ProcessManager.spawn() and returned every
session.start with `legacy_agent_spawn_disabled`. Result: clicking Launch
on a repo in Settings → Supervisor / Sidebar did nothing — the supervisor
received `session.start` from the hub and immediately finalized the run
as stopped. Reproduced in supervisor.log at 2026-05-27 23:51:54.

Root cause: PR #86 killed the legacy external CLI agent spawn path
(autonomous-loop bug in cached v0.4.1) without an in-process replacement.

Fix: re-wire spawn() to construct a `SessionBridge` per run. Each bridge:
  - opens its own /ws/agent connection with project_dir = repoPath and
    role = 'agent' (mirroring what the retired external agent did);
  - spawns `claude --input-format stream-json --output-format stream-json
    --verbose` directly via a new `ClaudeRunner`;
  - parses Claude's stream-json events and translates them onto the hub's
    agent-protocol (thinking / text_delta / tool_use / tool_result /
    status / assistant_message / permission_request / user_question);
  - routes hub-originated user_message / permission_response /
    question_response / cancel / shutdown back into the runner.

Preserved from PR #86:
  - sandbox-escape, not-git-repo, concurrency, dangerous-skip-permissions
    gates + audit-log entry per decision;
  - CIRCUIT_THRESHOLD (5 crashes / 10 min) and MAX_RESTART_COUNT (10)
    caps so a broken spawn can no longer loop forever.

Phase 09 canary `no-legacy-agent-spawn.test.ts` still passes — the new
runner never references the retired npm package name or forbidden flags.
ProcessManager unit tests rewritten to use `bridgeFactory` instead of the
old `spawnImpl` (Bun.spawn) hook.

Supervisor bumped 0.5.2 → 0.5.3 in all five source-of-truth locations
(supervisor/src/index.ts, supervisor/src/hub-client.ts,
supervisor/tauri/src-tauri/{Cargo.toml,tauri.conf.json},
supervisor/tauri/ui/package.json) so the next MSI advertises the fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@finedesignz finedesignz merged commit 55307b2 into main May 28, 2026