fix(core): guard shutdown handler against re-entry and ensure termination via SIGKILL#97

Draft
shibukazu wants to merge 6 commits into vercel-labs:main from shibukazu:fix/shutdown-reentry-guard

@shibukazu shibukazu commented Jun 27, 2026


What changed

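Added a `shutdownStarted` re-entry guard to `installShutdownHandlers` and replaced the conditional `process.exit()` with `process.kill(0, "SIGKILL")`, called after `flushActiveRuns()`. `process.exit()` remains only as a fallback for environments where `kill(0)` is not permitted.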

Why

When an agent SDK (e.g. Codex, Pi) registers its own SIGINT/SIGTERM listener, the handler previously deferred exit because `process.listenerCount(signal) > 1`. If the co-listener never calls `process.exit()`, the CLI hangs indefinitely after Ctrl+C.
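For illustration, the previous deferral decision can be sketched roughly as follows. This is a reconstruction from the description above, not the actual source: `legacyShutdown` and `LegacyDeps` are hypothetical names, and the listener count, flush, and exit functions are injected so the sketch runs without real signals.

```typescript
// Hypothetical reconstruction of the old deferral logic; names are illustrative.
type Sig = "SIGINT" | "SIGTERM";

type LegacyDeps = {
  listenerCount: (signal: Sig) => number; // stands in for process.listenerCount
  flush: () => void;                      // stands in for flushActiveRuns()
  exit: (code: number) => void;           // stands in for process.exit
};

// Conventional shell exit codes: 128 + signal number (SIGINT = 2, SIGTERM = 15).
const EXIT_CODES: Record<Sig, number> = { SIGINT: 130, SIGTERM: 143 };

function legacyShutdown(signal: Sig, deps: LegacyDeps): "exited" | "deferred" {
  deps.flush();
  // More than one listener: assume the other listener will exit for us.
  if (deps.listenerCount(signal) > 1) {
    return "deferred"; // nothing else happens if the co-listener never exits
  }
  deps.exit(EXIT_CODES[signal]);
  return "exited";
}
```

With a single listener the sketch exits with 130 or 143; with a second listener registered it returns `"deferred"` and relies entirely on that listener to end the process.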

`process.kill(0, "SIGKILL")` sends SIGKILL to every process in the CLI's process group, so the CLI terminates even when a co-listener never exits. Because `flushActiveRuns()` has already persisted run state synchronously by that point, no data is lost.

Verification

  • `pnpm test` passes
  • `pnpm lint` passes
  • `pnpm knip` passes
  • If this adds a matcher: ran it against at least one real repo and confirmed the candidate count is sane

Notes for reviewer

  • Tests now spawn child processes with `detached: true` so `kill(0, "SIGKILL")` inside the child targets only the child's process group and does not kill Vitest.
  • Exit expectations changed from numeric codes (`130`/`143`) to `signal === "SIGKILL"`.
  • The `sandbox/shutdown.ts` co-listener and its 10 s timeout are unchanged; SIGKILL simply makes the old deferral path unreachable.

… termination

Problem
-------
When an AI agent SDK (e.g. the Codex or Pi SDK) registers its own
SIGINT/SIGTERM listener, `installShutdownHandlers` correctly flushes
active runs but then defers exit because `process.listenerCount(signal)
> 1`. If the co-listener never calls `process.exit()` — as is the case
with some agent SDKs during long-running inference calls — the CLI
hangs indefinitely after Ctrl+C.

A secondary issue: if SIGINT and SIGTERM arrive in rapid succession
(e.g. a process manager sends SIGTERM right after the user hits Ctrl+C),
the handler fires twice, potentially corrupting run metadata that was
already written on the first invocation.
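The double invocation can be reproduced in isolation. The sketch below is only an illustration: a plain `EventEmitter` stands in for `process`, and `countHandlerRuns` is a hypothetical helper, not part of the codebase.

```typescript
import { EventEmitter } from "node:events";

// Counts how often an unguarded handler runs for a sequence of signals.
// A plain EventEmitter stands in for `process`, so no real signals are sent.
function countHandlerRuns(signals: string[]): number {
  const fakeProcess = new EventEmitter();
  let runs = 0;

  // Hypothetical unguarded handler shared by both signals.
  const onSignal = (): void => {
    runs += 1; // in the real handler, run metadata would be written here
  };

  fakeProcess.on("SIGINT", onSignal);
  fakeProcess.on("SIGTERM", onSignal);

  for (const signal of signals) {
    fakeProcess.emit(signal);
  }
  return runs;
}
```

Calling `countHandlerRuns(["SIGINT", "SIGTERM"])` returns 2, which is the repeated write described above.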

Fix
---
1. Add a `shutdownStarted` boolean guard that causes the handler to
   return immediately on re-entry.

2. After `flushActiveRuns()` completes its synchronous writes, send
   SIGKILL to the calling process's process group (`process.kill(0,
   "SIGKILL")`). SIGKILL cannot be caught, blocked, or ignored, so this
   guarantees the CLI and any agent child processes in the same process
   group are terminated even when co-listeners are present.
   Run metadata has already been persisted at this point, so there is
   no data loss. A `catch` block falls back to `process.exit()` for
   restricted environments where `kill(0)` is not permitted.
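The two steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in this PR: `createShutdownHandler` and `ShutdownDeps` are hypothetical names, and the flush, kill, and exit functions are injected so the example can run without killing the host process.

```typescript
// Hypothetical sketch of the guarded handler; dependency injection is an
// assumption made for illustration and testability.
type ShutdownDeps = {
  flush: () => void;                           // stands in for flushActiveRuns()
  kill: (pid: number, signal: string) => void; // stands in for process.kill
  exit: () => void;                            // stands in for process.exit
};

function createShutdownHandler(deps: ShutdownDeps): () => void {
  let shutdownStarted = false;

  return (): void => {
    // Re-entry guard: a second signal must not flush again.
    if (shutdownStarted) return;
    shutdownStarted = true;

    // Synchronous, so run state is persisted before termination.
    deps.flush();

    try {
      // On POSIX, pid 0 targets every process in the caller's process group.
      deps.kill(0, "SIGKILL");
    } catch {
      // Fallback for environments where signalling the group is not permitted.
      deps.exit();
    }
  };
}

// Assumed wiring in the real CLI (not executed here):
//   const handler = createShutdownHandler({
//     flush: flushActiveRuns,
//     kill: (pid, signal) => process.kill(pid, signal),
//     exit: () => process.exit(),
//   });
//   process.on("SIGINT", handler);
//   process.on("SIGTERM", handler);
```

Keeping the guard in a closure means one handler instance can be registered for both signals and still flushes at most once.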

Tests
-----
- Spawn child processes with `detached: true` so `kill(0, "SIGKILL")`
  inside the child kills only its own process group and does not
  propagate back to the Vitest runner.
- Update exit expectations from numeric codes (130/143) to
  `signal === "SIGKILL"`, reflecting the new termination path.
- Replace the "defers to co-listener" test with a "still kills even
  when co-listener hangs" test that directly covers the regression.
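The detached-spawn pattern used by these tests could look roughly like the sketch below. It is an illustration only: `runDetached` and `wasSigkilled` are hypothetical helpers, the inline script passed to `node -e` stands in for the real CLI entry point, and the process-group behavior described applies to POSIX systems.

```typescript
import { spawn } from "node:child_process";

type ExitResult = { code: number | null; signal: string | null };

// Spawns a Node child in its own process group and reports how it ended.
function runDetached(script: string): Promise<ExitResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", script], {
      detached: true, // on POSIX the child becomes leader of a new process group
      stdio: "ignore",
    });
    child.once("error", reject);
    // For a signal-terminated child, `code` is null and `signal` names the signal.
    child.once("exit", (code, signal) => resolve({ code, signal }));
  });
}

// Mirrors the new expectation: terminated by SIGKILL rather than exiting 130/143.
function wasSigkilled(result: ExitResult): boolean {
  return result.code === null && result.signal === "SIGKILL";
}
```

Inside a Vitest test this might be used as `const result = await runDetached('process.kill(0, "SIGKILL")')` followed by `expect(wasSigkilled(result)).toBe(true)`. Without `detached: true` the child would share the runner's process group, and `kill(0, ...)` would reach the runner as well.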

vercel Bot commented Jun 27, 2026


@shibukazu is attempting to deploy a commit to the Vercel Labs Team on Vercel.

A member of the Team first needs to authorize it.

@shibukazu shibukazu marked this pull request as draft June 27, 2026 05:39