fix: terminate WebSocket and remove listeners on connectWs timeout#1788
fix: terminate WebSocket and remove listeners on connectWs timeout#1788chenghao-mou wants to merge 8 commits into
Conversation
When the connection timeout fired, the socket was never terminated — if it later connected, resolve() was called on an already-settled promise, leaving an orphaned WebSocket with no owner. The fix calls socket.terminate() and strips the pending listeners in the timeout handler via a shared cleanup() helper (also used by onOpen/onError/ onClose to prevent cross-path double-firing). Uses APITimeoutError instead of the generic APIConnectionError for clearer retry semantics. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🦋 Changeset detectedLatest commit: 8ffdee6 The changes in this PR will be included in the next version bump. This PR includes changesets to release 35 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| const onClose = (code: number) => { | ||
| clearTimeout(timeout); | ||
| cleanup(); | ||
| if (code !== 1000) { | ||
| reject( | ||
| new APIConnectionError({ |
There was a problem hiding this comment.
📝 Info: Close-code semantic change: previously code 1000 did not reject
The old code had if (code !== 1000) which meant a server-initiated normal close (1000) before open would NOT reject the promise, causing a permanent hang. The new if (!opened) logic correctly rejects on ANY close before the connection opens, regardless of close code. This is a behavioral change — in the rare edge case where a server sends code 1000 during the handshake (before open event), the old code would hang while the new code properly rejects. This is the intended fix per the changeset description.
Was this helpful? React with 👍 or 👎 to provide feedback.
ws closes the socket automatically after a connection error (always emits close after error), so terminate() there was redundant. The only path that needs it is the timeout handler, where no automatic cleanup occurs. Removing it from onError also untangles the circular dependency where cleanup() in onOpen was only needed to prevent the onError handler from calling terminate() on the caller's open socket. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Calling cleanup() in onOpen removed the error and close once-listeners right as the socket was handed to the caller, leaving a gap with no error handler. Since onError no longer calls socket.terminate(), the stale once-listeners are harmless (they'd only do a no-op reject on an already-settled promise), so there's no reason to strip them. Just clear the timeout and let the listeners expire naturally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Error The only real change from the original: terminate the socket when the timeout fires (preventing an orphaned WebSocket) and use APITimeoutError for clearer semantics. The cleanup() helper and listener unregistration were unnecessary — onError/onClose don't need them since the socket is closing anyway and the once-listeners expire naturally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🚩 Pattern alignment with ws_transport.ts
This change aligns connectWs with the established pattern in agents/src/inference/interruption/ws_transport.ts, which already calls ws.terminate() before rejecting with APITimeoutError on connection timeout. The ws_transport.ts version also terminates on error events (not just timeout), which connectWs does not do — the onError and onClose handlers here don't call terminate(). This is a pre-existing inconsistency that could leave sockets lingering after error/close events if the caller doesn't clean up, though in practice the ConnectionPool manages the socket lifecycle.
(Refers to lines 102-129)
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
During the handshake, ws self-cleans on error and close. All the connection-phase error paths (lines 729, 886, 912) funnel through emitErrorAndClose (line 1039):
function emitErrorAndClose(websocket, err) {
websocket._readyState = WebSocket.CLOSING;
websocket._errorEmitted = true;
websocket.emit('error', err); // <-- our onError runs here
websocket.emitClose();
}
A normal (code 1000) close during the handshake previously left the ThrowsPromise unsettled forever. Track whether the socket opened and reject on any close that happens before open, regardless of code. After open, the lingering close listener is a no-op on the settled promise. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
connectWs, theWebSocketwas never terminated — if it later connected,resolve(socket)was called on an already-settled promise, leaving an orphaned socket with no owner. This affected all inference callers (STT, TTS, EOT cloud transport).cleanup()helper that clears the timeout and removes all three listeners (open,error,close). Called by every settlement path (timeout,onOpen,onError,onClose) to prevent cross-path double-firing.socket.terminate()before rejecting, ensuring the socket is always closed when the promise rejects.APIConnectionErrortoAPITimeoutErrorfor clearer semantics (both are retryable;APITimeoutError extends APIConnectionError).Test plan
pnpm build:agents)🤖 Generated with Claude Code