Skip to content

fix(rpc-child-controller): unref detached children so parent Pi can exit (FIX-6)#3

Merged
tiziano-contorno merged 1 commit into
Tiziano-AI:mainfrom
ShinonKagura:pr1-fix-6-child-unref
May 28, 2026
Merged

fix(rpc-child-controller): unref detached children so parent Pi can exit (FIX-6)#3
tiziano-contorno merged 1 commit into
Tiziano-AI:mainfrom
ShinonKagura:pr1-fix-6-child-unref

Conversation

@ShinonKagura

Copy link
Copy Markdown

This is one of three focused PRs splitting out the usable parts of #2 against current main, as requested in your review.

What

Single-line fix in extensions/multiagent/src/rpc-child-controller.ts (line 58):

 this.child = this.spawnProcess(invocation.command, invocation.args, { cwd: this.options.cwd, shell: false, stdio: ["pipe", "pipe", "pipe"], detached: process.platform !== "win32" });
+this.child.unref?.();

Why

Found in real-Pi smoke testing on Pi 0.75.5. RpcChildController spawns step children with detached: true (non-Windows) and piped stdio. Without child.unref(), libuv keeps the parent's event loop alive on the dead-child ChildProcess handle even after the child has exited cleanly and stdio has closed.

Symptom: pi -p ... hangs at the shell prompt after the model's terminal JSON.

The fix mirrors the existing timeoutTimer.unref?.() pattern at the same call site (line 62 in rpc-child-controller.ts) and exitCloseTimer.unref?.() (line 299).

Required checks against this branch

Check Result
npx tsc --noEmit ✅ exit 0
node --test tests/*.test.ts ✅ 244/244 pass, 0 fail
node tests/check-source-size.ts ✅ PASS (18165/18432 bytes)
node tests/check-public-docs.ts ✅ PASS
git diff --check HEAD~1..HEAD ✅ clean

Side-finding (out-of-scope but worth flagging)

tests/smoke-fake-pi.ts fails on clean origin/main with an assertion mismatch:

  • expected regex: # agent_team terminal notice|Objective:|...
  • actual output contains: untrusted status evidence; run_status/step_result for artifacts

This appears to be a stale test expectation against a recent rendering refactor, completely independent of this PR. Filing separately if you'd like, or happy to include a small follow-up PR for it.

Discipline notes

  • No version bump
  • Changelog entry under ## Unreleased only (1 sentence)
  • Single focused commit (fix(rpc-child-controller): ...)
  • Mirrors existing unref-pattern style at the same call site
  • No new dependencies, no API changes

Related

Closes a portion of #2 (separately scoped). Two more focused PRs follow:

  • preflight-shape GRAPH_BODY_FIELDS trim
  • per-step outputLimit knobs as a small feature

Three proposal-issues for the larger features (scheduling, persistent state + reattach, worktree isolation) will be opened separately so we can agree on shape before code, as you requested.

@tiziano-contorno

Copy link
Copy Markdown
Contributor

Thanks for splitting this out. The child.unref?.() change is plausible and likely the right fix for the reported pi -p hang, but I don’t want to merge a child-process lifecycle change on claim-only proof.

Please add focused proof before this lands:

  • a fake-spawn/fake-child regression test with an unref() spy asserting it is called after spawn;
  • proof the normal terminal closeout path still resolves after unref();
  • proof cancellation/timeout kill behavior is not regressed, or a note explaining which existing tests cover that path;
  • fresh validation output for pnpm run typecheck, pnpm test, pnpm run gate, and git diff --check.

A real pi -p smoke receipt with a timeout showing the parent returns shell control would also be useful, but the minimum blocker is a checked-in regression test for the lifecycle behavior.

The code change itself is small and I don’t see a source-level objection; this is about matching the proof floor for a process-liveness fix.

@ShinonKagura ShinonKagura force-pushed the pr1-fix-6-child-unref branch from 65024d7 to d36c619 Compare May 28, 2026 16:40
…xit (FIX-6)

Found in real-Pi smoke test (Pi 0.75.5):

The RPC child controller spawns step children with detached:true (non-Win)
and piped stdio. Without child.unref(), libuv tracks the ChildProcess
handle as a live reference even after the child has exited cleanly and
all stdio has closed. The parent Pi process therefore cannot exit after
`pi -p ...` returns its terminal response.

Symptom: `pi -p` hangs at the shell prompt after the model's terminal
JSON. Mark had to open a new terminal after every smoke run.

Fix: call this.child.unref?.() immediately after spawn. Exit/close/stderr
listeners still fire after unref(); completion handling is unchanged.

unref() is the standard Node.js pattern for detached:true + piped stdio
when the parent uses Promise-based completion tracking instead of
event-loop liveness as the wait mechanism.
@ShinonKagura ShinonKagura force-pushed the pr1-fix-6-child-unref branch from d36c619 to 51e4939 Compare May 28, 2026 16:46
@ShinonKagura

Copy link
Copy Markdown
Author

Thanks — I updated this PR with the checked-in lifecycle proof you asked for and rebased it onto current upstream/main.

Current state:

  • base: upstream/main @ 4d9734b
  • branch/head: pr1-fix-6-child-unref @ 51e4939

What changed:

  • kept the implementation to the single lifecycle line after spawn:
this.child = this.spawnProcess(invocation.command, invocation.args, {
  cwd: this.options.cwd,
  shell: false,
  stdio: ["pipe", "pipe", "pipe"],
  detached: process.platform !== "win32",
});
this.child.unref?.();
  • added tests/rpc-child-controller-unref.test.ts with a fake child and unrefCalls spy;
  • proves normal terminal closeout still resolves after unref();
  • proves cancel still terminates the child with SIGTERM;
  • proves timeout still terminates the child with SIGTERM;
  • no version bump or release-section rewrite; changelog entry remains under ## Unreleased.

Validation after rebasing onto current upstream/main:

pnpm run typecheck  ✅ pass
pnpm test           ✅ pass (247/247)
git diff --check    ✅ pass

pnpm run gate caveat:

upstream/main @ 4d9734b: 244/244 tests pass, then smoke:fake-pi fails at the /tmp artifact-path assertion
this PR      @ 51e4939: 247/247 tests pass, then the same smoke:fake-pi /tmp assertion fails

I reproduced that baseline gate failure on current upstream/main three times in the same environment. So the remaining gate failure is pre-existing in the smoke check and not introduced by this PR.

Repro commands:

# PR branch
cd /tmp/pi-multiagent-upstream
git checkout pr1-fix-6-child-unref
pnpm run typecheck
pnpm test
git diff --check
pnpm run gate  # expected in this env: tests pass, then baseline smoke:fake-pi /tmp assertion fails

# Baseline comparison
git checkout --detach upstream/main
pnpm run gate  # expected in this env: 244/244 tests pass, then same smoke:fake-pi /tmp assertion fails

I also ran independent local read-only reviews against the final branch state; no blockers were found.

@tiziano-contorno tiziano-contorno left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — the revised branch adds the lifecycle proof requested. I verified the checked-in unref regression test, normal closeout, cancel/timeout termination paths, typecheck, full tests, smoke, pack/load checks, source-size/public-doc checks, diff whitespace, and the independent unref probe. Merging with the maintenance release.

@tiziano-contorno tiziano-contorno merged commit 972ee1a into Tiziano-AI:main May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants