Fix serverless crashes (socket hang up) and 60s idle invocations in MCP function #185
Two serverless-vs-long-lived-connection problems surfaced as Netlify function errors:

1. socket hang up / ECONNRESET -> Unhandled Promise Rejection -> 'Invalid request ID'. The cached upstream MCP clients (Kapa, Bump) hold persistent connections reused across warm invocations. When the container freezes/thaws, the idle socket is dropped and the error fires in the transport's background read loop with no awaiter, so the runtime kills the invocation. Fix: set onerror/onclose on each transport to reset the cached connection at the source, plus a process-level unhandledRejection/uncaughtException safety net that logs and resets instead of crashing.
2. Duration: 60000 ms. Every connected client opens the optional GET SSE stream; this server is request/response only, so on serverless that stream idles open until the function's max duration: a wasted full-length invocation per client. Fix: decline GET with 405 (spec allows this when no SSE stream is offered; clients use POST).

Bumps SERVER_VERSION to 1.1.3.
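The transport-level part of fix 1 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: `getConnection`, `resetConnection`, `createTransport`, and `connect` are hypothetical names, and only the `onerror`/`onclose` hooks and the idea of a cached connection come from the commit message.

```javascript
// Hypothetical sketch: cache a connection promise at module scope and
// drop it whenever the transport reports an error or closes.
let cachedConnection = null; // module-global, reused across warm invocations

function resetConnection(reason) {
  console.warn(`[mcp] resetting cached connection: ${reason}`);
  cachedConnection = null;
}

// `createTransport` and `connect` stand in for the real SDK calls (assumption).
function getConnection(createTransport, connect) {
  if (!cachedConnection) {
    const transport = createTransport();
    // Set the hooks before connecting, so a dropped socket clears the
    // cache and the next request reconnects.
    transport.onerror = (err) => resetConnection(err?.message ?? 'unknown error');
    transport.onclose = () => resetConnection('transport closed');
    cachedConnection = connect(transport).catch((err) => {
      // A failed connect must not stay cached either.
      resetConnection(err?.message ?? 'connect failed');
      throw err;
    });
  }
  return cachedConnection;
}
```

The point of resetting "at the source" is that the stale promise is discarded as soon as the transport notices the drop, rather than on the next failed tool call.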
✅ Deploy Preview for redpanda-documentation ready!
micheleRP
left a comment
Approving — the serverless fixes are well-reasoned and the implementation is sound. I verified the one thing that could have undermined the transport-level handlers: in @modelcontextprotocol/sdk@1.17.0, Protocol.connect() (dist/esm/shared/protocol.js) assigns this._transport = transport and then captures and chains the existing onerror/onclose (_onerror?.(error); this._onerror(error)) rather than overwriting them. So the handlers you set before kapaClient.connect(...) / bumpClient.connect(...) do fire. The reconnect-after-reset path is also already exercised by the existing catch blocks, and the 405 on GET is spec-compliant and trivially revertable.
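For readers who want to see why handlers set before connect() survive, here is a small stand-in for the capture-and-chain pattern described above. It is not the SDK's source: `MiniProtocol` is a hypothetical class that only mimics the behavior quoted in this comment (call the previously assigned handler, then the protocol's own).

```javascript
// Hypothetical stand-in for the pattern described in the review comment;
// this is NOT the @modelcontextprotocol/sdk implementation.
class MiniProtocol {
  constructor() {
    this.errors = [];
  }

  connect(transport) {
    this._transport = transport;
    // Capture whatever handlers the caller set before connect()...
    const priorOnError = transport.onerror;
    const priorOnClose = transport.onclose;
    // ...and chain them instead of overwriting them.
    transport.onerror = (error) => {
      priorOnError?.(error);
      this._onerror(error);
    };
    transport.onclose = () => {
      priorOnClose?.();
      this._onclose();
    };
  }

  _onerror(error) {
    this.errors.push(error.message);
  }

  _onclose() {
    this._transport = undefined;
  }
}

// Usage: the caller's handler still fires after connect().
const seen = [];
const demoTransport = { onerror: (e) => seen.push(`caller saw: ${e.message}`) };
const demoProtocol = new MiniProtocol();
demoProtocol.connect(demoTransport);
demoTransport.onerror(new Error('ECONNRESET'));
```

If connect() overwrote the hooks instead, the caller's reset logic would silently never run, which is the failure mode this check rules out.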
One non-blocking discussion point for your consideration:
- Breadth of the uncaughtException guard. The unhandledRejection handler is well-targeted at exactly the described failure (a background-read-loop rejection with no awaiter that the runtime treats as fatal). The uncaughtException handler is broader: swallowing it and continuing leaves the process in a state Node's own docs flag as unsafe, and it catches all exceptions process-wide, not just upstream socket drops. The mitigation is real: the only recovery action is nulling cached connection promises (cheap and safe), and everything is logged so real bugs stay visible. Might be worth scoping it to known socket errors (ECONNRESET/socket hang up) and rethrowing otherwise, vs. keeping the broad catch for serverless robustness. Your call, not a blocker.
Per review: the broad uncaughtException handler could mask real bugs and leave the process in an unsafe state. Recover only from known upstream socket drops (ECONNRESET / socket hang up / EPIPE / ECONNREFUSED); log and re-throw anything else so it surfaces normally. unhandledRejection (the actual incident path) stays a recover-all, as reviewed.
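A minimal sketch of the scoped guard this commit describes, assuming the error list given above. The names `isKnownSocketError`, `installSafetyNet`, and `resetConnections` are hypothetical and the actual code in mcp.mjs may be structured differently.

```javascript
// Hypothetical sketch of the scoped safety net; names are illustrative.
const KNOWN_SOCKET_CODES = new Set(['ECONNRESET', 'EPIPE', 'ECONNREFUSED']);

function isKnownSocketError(err) {
  if (!err) return false;
  if (KNOWN_SOCKET_CODES.has(err.code)) return true;
  // "socket hang up" errors are matched by message as a fallback.
  return /socket hang up/i.test(String(err.message ?? ''));
}

// `resetConnections` is assumed to null the cached upstream connection
// promises. `proc` defaults to the real process and exists so the wiring
// can be exercised with a fake emitter. Call this once per cold start.
function installSafetyNet(resetConnections, proc = process) {
  proc.on('unhandledRejection', (reason) => {
    // The actual incident path: always log and recover.
    console.error('[mcp] unhandledRejection:', reason);
    resetConnections();
  });

  proc.on('uncaughtException', (err) => {
    console.error('[mcp] uncaughtException:', err);
    if (isKnownSocketError(err)) {
      resetConnections();
      return;
    }
    // Anything else is re-thrown so it surfaces normally.
    throw err;
  });
}
```

Throwing from inside an uncaughtException listener is not caught again by Node; the process exits with a non-zero code, which is the "surfaces normally" behavior wanted for real bugs.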
Thanks @micheleRP — and thanks for verifying the SDK Addressed the |

Problem
The MCP function was logging two distinct serverless-vs-long-lived-connection failures:
- socket hang up / ECONNRESET → Unhandled Promise Rejection → LAMBDA_RUNTIME Failed to post handler success response … Invalid request ID. The upstream MCP clients (kapaClient, bumpClient) are module-global and hold persistent connections reused across warm invocations. When the Lambda container freezes between requests, the idle upstream socket is dropped; on thaw, Node emits the socket error inside the transport's background read loop, as a rejection with no awaiter. The existing isTransientError retry only catches errors during a tool call, so this slips through as an unhandled rejection and the runtime kills the invocation.
- Duration: 60000 ms invocations. Every connected client opens Streamable HTTP's optional GET server→client SSE stream. This server is request/response only (it never pushes server-initiated messages), so on serverless that stream just idles open until the function hits its max duration: a wasted full-length invocation per connected client.

Fix
- onerror/onclose on both Kapa and Bump transports → reset the cached connection at the source, so a dropped socket reconnects on the next request instead of bubbling up.
- Process-level safety net (unhandledRejection/uncaughtException, registered once per cold start) → log and reset the cached connections rather than letting the runtime crash the invocation. Errors are logged clearly, so real bugs stay visible.
- Decline GET with 405: the MCP spec explicitly allows 405 Method Not Allowed on GET when the server doesn't offer an SSE stream there, and clients fall back to POST. Eliminates the 60s invocations.

Bumps SERVER_VERSION → 1.1.3.

Risk / notes
- 405 on GET is spec-compliant and our supported clients (ChatGPT, Claude, Cursor, VS Code) handle it: sessions initialize over POST, which is unchanged. If any client unexpectedly depends on the GET stream, reverting just that hunk restores the old behavior.
- mcp.mjs fix targeting main so it can ship to production quickly.
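For reference, declining the GET stream can be as small as the sketch below. The handler shape is an assumption: `handleRequest` and `handlePost` are hypothetical names, the web-standard `Response` global (Node 18+) is assumed to be available, and the `Allow` value lists only POST for illustration.

```javascript
// Hypothetical routing sketch; the real function's structure is not shown here.
function handleRequest(req, handlePost) {
  if (req.method === 'GET') {
    // This server never pushes server-initiated messages, so it declines
    // the optional SSE stream instead of holding the invocation open.
    return new Response('Method Not Allowed', {
      status: 405,
      headers: { Allow: 'POST' },
    });
  }
  // Everything else follows the existing request/response path.
  return handlePost(req);
}
```

Because the GET branch returns immediately, the invocation ends in milliseconds rather than idling until the function's max duration. Reverting the behavior means removing only that branch.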