fix(websocket): reset pinned auth after unrecoverable upstream auth errors by excelwang · Pull Request #1946 · router-for-me/CLIProxyAPI

excelwang · 2026-03-08T00:51:35Z

Summary

This fixes a severe websocket failover bug where a session could stay pinned to an exhausted or invalid upstream auth even after the upstream had already returned an unrecoverable credential error.

In practice, once a websocket session got pinned to a bad auth, subsequent turns in the same session could continue failing instead of switching to another healthy credential. For multi-auth deployments, that turns a single upstream auth failure into a user-visible session outage.

Root Cause

/v1/responses websocket handling keeps a pinned auth for session continuity, but the pin was not cleared when upstream returned quota exhaustion or token invalidation style errors.

Fix

detect unrecoverable upstream auth / quota errors in websocket forwarding
reset the pinned auth when those errors are observed
close the upstream execution session so the next turn can reselect a healthy auth

Why this matters

This is operationally severe because:

one bad upstream auth can poison the whole websocket session
healthy fallback auths remain unused even though scheduler failover is otherwise available
users see repeated failures in the same interactive session until they reconnect or the bad auth recovers

Tests

go test ./sdk/api/handlers/openai -count=1

Scope

This PR is limited to websocket pinned-auth recovery behavior and does not include any usage aggregation changes.

gemini-code-assist · 2026-03-08T00:51:39Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

excelwang · 2026-03-08T00:54:07Z

Concrete failure mode this fixes:

a /v1/responses websocket session pins auth A
auth A later hits quota exhaustion or token invalidation upstream
the same websocket session keeps the stale pin
subsequent turns keep failing even though another healthy auth is available

So a single bad upstream credential can effectively poison the whole interactive session until reconnect. This PR only resets the pinned auth on unrecoverable upstream auth/quota signals so the next turn can fail over normally.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 875491db34

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-08T00:54:42Z

sdk/api/handlers/openai/openai_responses_websocket.go

+		return false
+	}
+	switch errMsg.StatusCode {
+	case http.StatusUnauthorized, http.StatusPaymentRequired, http.StatusForbidden, http.StatusNotFound, http.StatusTooManyRequests:


Avoid resetting pinned auth on generic 404 responses

shouldResetPinnedAuthForWebsocketError currently treats any 404 as an unrecoverable auth error, but websocket upstreams can return 404 for request/state issues (for example, stale previous_response_id) that are not credential failures. In that case this branch clears pinnedAuthID and closes the execution session, so the next turn can be routed to a different auth and lose session continuity even if the client corrects the request. This makes a recoverable request error degrade into cross-auth context loss.

Useful? React with 👍 / 👎.

xkonjin

Quick review pass:

Main risk area here is auth/session state and stale credential handling.
Good to see test coverage move with the code; I’d still make sure it exercises the unhappy path around auth/session state and stale credential handling rather than only the happy path.
Before merge, I’d smoke-test the behavior touched by openai_responses_websocket.go, openai_responses_websocket_test.go with malformed input / retry / rollback cases, since that’s where this class of change usually breaks.

excelwang · 2026-03-08T01:58:50Z

Follow-up on the stale credential / unhappy-path concern:

I re-ran targeted tests against the touched websocket path.

Re-ran locally:

go test ./sdk/api/handlers/openai -run "TestShouldResetPinnedAuthForWebsocketError|TestNormalizeResponsesWebsocketRequestCreateWithHistory|TestNormalizeResponsesWebsocketRequestWithPreviousResponseIDMergedWhenIncrementalDisabled" -count=1

Coverage split is:

TestShouldResetPinnedAuthForWebsocketError exercises the unrecoverable upstream auth/quota signals that should break the stale pin (token_invalidated, quota/429-style signal, and non-credential control case).
The existing normalization tests in the same file cover the request-shaping path around websocket history / previous-response handling so the pin-reset change is not only validated on the happy path.

This change does not alter retry backoff logic itself; it only decides when a poisoned pinned auth must be dropped so the next turn can fail over normally.

luispater

Summary

This correctly clears the pinned auth after unrecoverable websocket auth/quota errors, but it still leaves websocket-v2 continuation enabled for the next turn.

Key findings

Blocking: after resetPinnedAuth, the handler clears pinnedAuthID, but the next request still falls back to websocketUpstreamSupportsIncrementalInputForModel(...). That check only asks whether any available auth for the model supports websocket incremental input, so normalizeResponseSubsequentRequest(...) can still forward the old previous_response_id unchanged to a newly selected auth. At that point the new auth does not own the old upstream continuation/session, so the next turn can still fail with the same continuation error instead of actually failing over.
Non-blocking: the added test only covers the helper classification. Please add an end-to-end regression that pins auth A, forces an unrecoverable websocket error, then sends a follow-up turn with previous_response_id and verifies the handler no longer forwards that upstream continuation token after the pin reset.

Test plan

go test ./sdk/api/handlers/openai -count=1

chatgpt-codex-connector · 2026-03-09T02:20:13Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

excelwang · 2026-03-09T02:20:32Z

Follow-up pushed in a3caa0bc.

This addresses the continuation-token blocker, not just the helper classification. After an unrecoverable websocket auth/quota error resets the pinned auth, the next turn now forces incremental continuation off once, so the stale previous_response_id is not forwarded to the newly selected auth.

I also added the requested end-to-end regression test:

TestResponsesWebsocket_DropsPreviousResponseIDAfterPinnedAuthReset

That scenario covers:

auth A handles the first turn and establishes the original response id
a follow-up turn on pinned auth A hits an unrecoverable websocket auth error (token_invalidated)
the next turn still arrives with the old previous_response_id
failover switches to auth B
the forwarded upstream payload for auth B no longer contains that stale continuation token

Re-ran:

go test ./sdk/api/handlers/openai -count=1

xkonjin

Code Review

Overall: ✅ Approvable — the core logic is correct, but the continuation-token leak that luispater flagged needs verification.

What's good

resetPinnedAuthDisablesIncrementalInput flag correctly blocks previous_response_id forwarding for exactly one turn after a pin reset — the flag is cleared only after a successful non-resetting turn (!resetPinnedAuth), which is the right state machine.
shouldResetPinnedAuthForWebsocketError is well-scoped: status codes 401/402/403/404/429 plus textual quota/invalidation signals. Limiting 404 to the websocket path (where it indicates the upstream session is gone) is correct — this is a different semantic than 404 on REST.
CloseExecutionSession is called on pin reset, which is important for connection pool hygiene.
The end-to-end test (TestResponsesWebsocket_DropsPreviousResponseIDAfterPinnedAuthReset) exercises the full sequence: success → auth failure → failover with stripped continuation token. This directly addresses luispater's requested regression test.

Issue (luispater's blocker — needs verification)

The test verifies executor.payloads[2] does not contain previous_response_id. But look at the turn-3 request construction:

thirdRequest := fmt.Sprintf(`{..., "previous_response_id":%q,...}`, previousResponseID)

The client sends previous_response_id on turn 3. The test then checks that the upstream payload (executor.payloads[2]) doesn't contain it — which exercises normalizeResponseSubsequentRequest stripping the field when allowIncrementalInputWithPreviousResponseID == false.

Please confirm that normalizeResponseSubsequentRequest (or equivalent) is the codepath that removes previous_response_id when the flag is false, and that this behavior is exercised here. If the test passes by coincidence (e.g. the field is stripped elsewhere unconditionally), the regression coverage is weaker than it appears.

Non-blocking

The resetPinnedAuthDisablesIncrementalInput flag persists until a successful non-resetting turn. If there are multiple consecutive auth failures (second auth also fails), the flag stays true correctly — but consider whether back-to-back resets could put the session in a state where no incremental input is ever re-enabled if new auths keep failing. Probably fine for the edge case, but worth a comment.
strings.TrimSpace(pinnedAuthID) is used in the log comparison but pinnedAuthID is always assigned from auth ID strings that shouldn't have whitespace. The trim is harmless but slightly misleading.

fix(websocket): unpin auth when codex credential is exhausted

875491d

chatgpt-codex-connector bot reviewed Mar 8, 2026

View reviewed changes

xkonjin reviewed Mar 8, 2026

View reviewed changes

luispater requested changes Mar 9, 2026

View reviewed changes

fix(websocket): drop stale continuation after auth reset

a3caa0b

xkonjin reviewed Mar 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(websocket): reset pinned auth after unrecoverable upstream auth errors#1946

fix(websocket): reset pinned auth after unrecoverable upstream auth errors#1946
excelwang wants to merge 2 commits intorouter-for-me:mainfrom
excelwang:pr/upstream-websocket-unpin-auth

excelwang commented Mar 8, 2026

Uh oh!

gemini-code-assist bot commented Mar 8, 2026

Uh oh!

excelwang commented Mar 8, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 8, 2026

Uh oh!

xkonjin left a comment

Uh oh!

excelwang commented Mar 8, 2026

Uh oh!

luispater left a comment

Uh oh!

chatgpt-codex-connector bot commented Mar 9, 2026

Uh oh!

excelwang commented Mar 9, 2026

Uh oh!

xkonjin left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

excelwang commented Mar 8, 2026

Summary

Root Cause

Fix

Why this matters

Tests

Scope

Uh oh!

gemini-code-assist bot commented Mar 8, 2026

Uh oh!

excelwang commented Mar 8, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

xkonjin left a comment

Choose a reason for hiding this comment

Uh oh!

excelwang commented Mar 8, 2026

Uh oh!

luispater left a comment

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot commented Mar 9, 2026

Uh oh!

excelwang commented Mar 9, 2026

Uh oh!

xkonjin left a comment

Choose a reason for hiding this comment

Code Review

What's good

Issue (luispater's blocker — needs verification)

Non-blocking

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants