Skip to content

fix: terminate WebSocket and remove listeners on connectWs timeout#1788

Open
chenghao-mou wants to merge 8 commits into
mainfrom
fix/connectws-timeout-socket-leak
Open

fix: terminate WebSocket and remove listeners on connectWs timeout#1788
chenghao-mou wants to merge 8 commits into
mainfrom
fix/connectws-timeout-socket-leak

Conversation

@chenghao-mou

Copy link
Copy Markdown
Member

Summary

  • When the connection timeout fired in connectWs, the WebSocket was never terminated — if it later connected, resolve(socket) was called on an already-settled promise, leaving an orphaned socket with no owner. This affected all inference callers (STT, TTS, EOT cloud transport).
  • Introduced a shared cleanup() helper that clears the timeout and removes all three listeners (open, error, close). Called by every settlement path (timeout, onOpen, onError, onClose) to prevent cross-path double-firing.
  • The timeout handler now calls socket.terminate() before rejecting, ensuring the socket is always closed when the promise rejects.
  • Changed timeout rejection from APIConnectionError to APITimeoutError for clearer semantics (both are retryable; APITimeoutError extends APIConnectionError).

Test plan

  • Build passes (pnpm build:agents)
  • Manually verify that a deliberate connect timeout (e.g. unreachable host + short timeout) no longer leaks an open socket

🤖 Generated with Claude Code

chenghao-mou and others added 2 commits June 12, 2026 15:13
When the connection timeout fired, the socket was never terminated —
if it later connected, resolve() was called on an already-settled
promise, leaving an orphaned WebSocket with no owner. The fix calls
socket.terminate() and strips the pending listeners in the timeout
handler via a shared cleanup() helper (also used by onOpen/onError/
onClose to prevent cross-path double-firing). Uses APITimeoutError
instead of the generic APIConnectionError for clearer retry semantics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@changeset-bot

changeset-bot Bot commented Jun 12, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: 8ffdee6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 35 packages
Name Type
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-did Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-fishaudio Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-hume Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-liveavatar Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-minimax Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-mistralai Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-perplexity Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugin-soniox Patch
@livekit/agents-plugin-tavus Patch
@livekit/agents-plugins-test Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

Open in Devin Review

Comment thread agents/src/inference/utils.ts Outdated
Comment on lines 126 to 130
const onClose = (code: number) => {
clearTimeout(timeout);
cleanup();
if (code !== 1000) {
reject(
new APIConnectionError({

@devin-ai-integration devin-ai-integration Bot Jun 12, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: Close-code semantic change: previously code 1000 did not reject

The old code had if (code !== 1000) which meant a server-initiated normal close (1000) before open would NOT reject the promise, causing a permanent hang. The new if (!opened) logic correctly rejects on ANY close before the connection opens, regardless of close code. This is a behavioral change — in the rare edge case where a server sends code 1000 during the handshake (before open event), the old code would hang while the new code properly rejects. This is the intended fix per the changeset description.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

chenghao-mou and others added 4 commits June 12, 2026 15:25
ws closes the socket automatically after a connection error (always
emits close after error), so terminate() there was redundant. The only
path that needs it is the timeout handler, where no automatic cleanup
occurs. Removing it from onError also untangles the circular dependency
where cleanup() in onOpen was only needed to prevent the onError
handler from calling terminate() on the caller's open socket.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Calling cleanup() in onOpen removed the error and close once-listeners
right as the socket was handed to the caller, leaving a gap with no
error handler. Since onError no longer calls socket.terminate(), the
stale once-listeners are harmless (they'd only do a no-op reject on an
already-settled promise), so there's no reason to strip them. Just
clear the timeout and let the listeners expire naturally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Error

The only real change from the original: terminate the socket when the
timeout fires (preventing an orphaned WebSocket) and use APITimeoutError
for clearer semantics. The cleanup() helper and listener unregistration
were unnecessary — onError/onClose don't need them since the socket is
closing anyway and the once-listeners expire naturally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

Open in Devin Review

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Pattern alignment with ws_transport.ts

This change aligns connectWs with the established pattern in agents/src/inference/interruption/ws_transport.ts, which already calls ws.terminate() before rejecting with APITimeoutError on connection timeout. The ws_transport.ts version also terminates on error events (not just timeout), which connectWs does not do — the onError and onClose handlers here don't call terminate(). This is a pre-existing inconsistency that could leave sockets lingering after error/close events if the caller doesn't clean up, though in practice the ConnectionPool manages the socket lifecycle.

(Refers to lines 102-129)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the handshake, ws self-cleans on error and close. All the connection-phase error paths (lines 729, 886, 912) funnel through emitErrorAndClose (line 1039):

  function emitErrorAndClose(websocket, err) {
    websocket._readyState = WebSocket.CLOSING;
    websocket._errorEmitted = true;
    websocket.emit('error', err);   // <-- our onError runs here
    websocket.emitClose();
  }

A normal (code 1000) close during the handshake previously left the
ThrowsPromise unsettled forever. Track whether the socket opened and
reject on any close that happens before open, regardless of code. After
open, the lingering close listener is a no-op on the settled promise.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant