Skip to content

fix(ipc): close uninitialized supervised process#1774

Open
rosetta-livekit-bot[bot] wants to merge 1 commit into
1.5.0from
civies-hopes-remedy
Open

fix(ipc): close uninitialized supervised process#1774
rosetta-livekit-bot[bot] wants to merge 1 commit into
1.5.0from
civies-hopes-remedy

Conversation

@rosetta-livekit-bot

@rosetta-livekit-bot rosetta-livekit-bot Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Ports livekit/agents#6051 to agents-js.\n\nWhen a supervised process is closed before initialization completes, reject the initialization future to unblock the supervisor task, kill the child process directly, and wait for join cleanup instead of sending a graceful shutdown the child cannot handle yet.\n\nTests:\n- pnpm test agents/src/ipc/supervised_proc.test.ts\n- pnpm build:agents\n\nNo tests were added, matching the porting instruction.


Ported from livekit/agents#6051

Original PR description

Summary

test_slow_initialization flakes on CI with a leaked ProcJobExecutor._supervise_task at teardown. The root cause is a real shutdown race in SupervisedProc, not a test issue.

When the owning task (e.g. ProcPool._proc_spawn_task during pool close) is cancelled while awaiting start(), the shielded _start() keeps running and creates the supervise task afterwards. The cleanup path then calls aclose(), which no-ops because the proc isn't marked started yet — leaking the supervise task and the child process. The orphaned child blocks forever waiting for InitializeRequest, and its non-daemon join thread can hang worker shutdown indefinitely.

Changes

  • start() keeps a handle on the shielded _start() task; aclose() waits for it before checking started, so an abandoned start can't race past the check.
  • aclose() kills a never-initialized process instead of attempting a graceful shutdown: the child reads InitializeRequest before servicing any other message, so a ShutdownRequest could never be acked.
  • kill() resolves a pending _initialize_fut so the supervise task (which waits on it before supervising) can observe the process exit.
  • Added test_aclose_after_cancelled_start, which deterministically reproduces the leak (fails the leaked-tasks check and hangs pytest exit without the fix).

nit: Also sets the agent session tests to speed = 1 and raises the default drain_delay — they run under virtual time, so the speed factor no longer buys wall-clock time.

@changeset-bot

changeset-bot Bot commented Jun 12, 2026

Copy link
Copy Markdown

⚠️ No Changeset found

Latest commit: 409b1b6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@rosetta-livekit-bot rosetta-livekit-bot Bot requested a review from longcw June 12, 2026 00:17

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

Open in Devin Review

Comment on lines +232 to +238
if (!this.init.done) {
this.init.reject(new Error('process closed before initialization completed'));
this.proc?.kill();
await this.#join.await.then(() => {
this.clearTimers();
});
return;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 initialize() caller may hang when close() is called during initialization

The new close() path correctly rejects init and resolves #join, fixing the issue where close() itself would hang. However, if a caller (e.g. proc_pool.ts:108-109 in procWatchTask) is concurrently awaiting proc.initialize(), that call hangs forever on once(this.proc!, 'message') at supervised_proc.ts:217 because killing a child process emits 'exit'/'close' but NOT 'error', so events.once() never resolves or rejects. This is a pre-existing issue not introduced by this PR — in fact, the old code was worse because close() itself would also hang. The fix here is a meaningful improvement, but a complete solution would also need initialize() to detect the killed process (e.g., by racing once(proc, 'message') with once(proc, 'exit') or using an AbortSignal).

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants