
fix: human-readable upstream connection errors + auto-retry #62

Merged: lis186 merged 1 commit into main from fix/upstream-error-ux on Jun 10, 2026
lis186 (Owner) commented Jun 10, 2026

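Summary

When the network drops momentarily, ccxray prints two lines of raw Node.js error codes, ❌ PROXY ERROR: ETIMEDOUT and ❌ UPSTREAM SOCKET ERROR: ETIMEDOUT, leaving users with no idea what happened or how to recover.

This PR does three things:

  • Error classification: ETIMEDOUT → connection timed out, ENOTFOUND → DNS lookup failed, and so on, each with a recovery hint (check your network connection) and the agent version ([Claude Code 2.1.170])
  • Auto-retry: transient errors (ETIMEDOUT / ENOTFOUND / EHOSTUNREACH / ECONNREFUSED / EAI_AGAIN) are retried once automatically (1s delay), which absorbs the most common momentary drops
  • Dedup: the same ETIMEDOUT is no longer printed twice (the socket and request error handlers deduplicate using setImmediate + a flag)

Only the server side (forward.js) changes; the frontend needs no changes. On ETIMEDOUT no entry is ever created, so the dashboard never shows the failed request.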


Problem

When the upstream connection (ccxray → api.anthropic.com) fails due to a transient network issue, the terminal output is:

❌ PROXY ERROR: ETIMEDOUT
❌ UPSTREAM SOCKET ERROR: ETIMEDOUT

This violates Nielsen #9 (help users recognize, diagnose, and recover from errors): raw Node.js error codes give no actionable guidance. It also violates Norman's mapping principle: two error lines for one failure make users think two different things broke.

Solution

1. Error classification (describeUpstreamError)

Maps Node.js error codes to human-readable labels + recovery hints:

| Code | Label | Hint | Retryable |
|---|---|---|---|
| ETIMEDOUT | connection timed out | check your network connection | yes |
| ENOTFOUND | DNS lookup failed | check your network or DNS settings | yes |
| EHOSTUNREACH | host unreachable | check your network connection | yes |
| ECONNREFUSED | connection refused | is the upstream API available? | yes |
| EAI_AGAIN | DNS temporarily unavailable | DNS will likely recover on its own | yes |
| ECONNRESET | connection reset by peer | (none) | no |
| EPIPE | broken pipe | (none) | no |

Includes the latest detected agent version from store.versionIndex (e.g. [Claude Code 2.1.170]) for debugging context.
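
The mapping above could be sketched roughly as follows. This is an illustration only, not the code in forward.js: the table name UPSTREAM_ERRORS, the returned object shape, the 'unknown error' fallback text, and passing the agent version in as a plain argument (rather than reading store.versionIndex) are all assumptions.

```javascript
// Hypothetical sketch; the real describeUpstreamError in forward.js may differ.
const UPSTREAM_ERRORS = {
  ETIMEDOUT:    { label: 'connection timed out',        hint: 'check your network connection',      retryable: true },
  ENOTFOUND:    { label: 'DNS lookup failed',           hint: 'check your network or DNS settings', retryable: true },
  EHOSTUNREACH: { label: 'host unreachable',            hint: 'check your network connection',      retryable: true },
  ECONNREFUSED: { label: 'connection refused',          hint: 'is the upstream API available?',     retryable: true },
  EAI_AGAIN:    { label: 'DNS temporarily unavailable', hint: 'DNS will likely recover on its own', retryable: true },
  ECONNRESET:   { label: 'connection reset by peer',    hint: null, retryable: false },
  EPIPE:        { label: 'broken pipe',                 hint: null, retryable: false },
};

// agentVersion is an assumed parameter, e.g. 'Claude Code 2.1.170'.
function describeUpstreamError(err, agentVersion) {
  const code = err && err.code ? err.code : null;
  const known =
    code !== null && Object.prototype.hasOwnProperty.call(UPSTREAM_ERRORS, code)
      ? UPSTREAM_ERRORS[code]
      : null;
  return {
    code,
    // Unknown or missing codes fall back to err.message.
    label: known ? known.label : (err && err.message) || 'unknown error',
    hint: known ? known.hint : null,
    retryable: known ? known.retryable : false,
    agent: agentVersion || null,
  };
}
```

Keeping the table as data means a new code is a one-line addition, and anything not listed is treated as non-retryable by default.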

2. Auto-retry (sendUpstream)

Wraps the outbound transport.request() call in a sendUpstream(attempt) function. Transient errors are retried once after a 1s delay. Two guards apply: a clientRes.destroyed check prevents a retry after the client has disconnected, and activeRequests is decremented only on final failure (not on retry), which prevents a double decrement.
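
A minimal sketch of the retry-once shape described above, under stated assumptions: sendWithRetry, doRequest, isClientGone, and the injectable schedule option are hypothetical names for illustration, and the real code wraps transport.request() and checks clientRes.destroyed directly.

```javascript
// Hypothetical sketch of retry-once; not the actual forward.js code.
const RETRYABLE = new Set([
  'ETIMEDOUT', 'ENOTFOUND', 'EHOSTUNREACH', 'ECONNREFUSED', 'EAI_AGAIN',
]);

// doRequest(cb) starts one upstream attempt and calls cb(err) or cb(null, res).
function sendWithRetry(doRequest, opts, done) {
  const {
    isClientGone = () => false, // stands in for the clientRes.destroyed check
    retryDelayMs = 1000,
    maxRetries = 1,
    schedule = setTimeout,      // injectable so the delay can be stubbed
  } = opts || {};

  function sendUpstream(attempt) {
    doRequest((err, res) => {
      if (!err) return done(null, res);
      const canRetry =
        RETRYABLE.has(err.code) && attempt < maxRetries && !isClientGone();
      if (!canRetry) {
        // Single final-failure path: a counter such as activeRequests
        // would be decremented here and nowhere else.
        return done(err);
      }
      schedule(() => {
        // The client may have disconnected while we were waiting.
        if (isClientGone()) return done(err);
        sendUpstream(attempt + 1);
      }, retryDelayMs);
    });
  }

  sendUpstream(0);
}
```

Routing every failure through one done(err) call is what keeps bookkeeping like the activeRequests decrement from running twice.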

3. Socket error dedup

The socket error event fires before the request error event for the same failure. A reqErrorHandled flag + setImmediate in the socket handler suppresses the duplicate line while preserving the safety-net for late socket errors (EPIPE/ECONNRESET after response received) that don't propagate to the request.
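
The flag-plus-setImmediate idea can be illustrated with plain EventEmitters standing in for the request and its socket. This is a sketch under assumptions: attachErrorHandlers, the log callback, and the injectable defer parameter are hypothetical, and the log strings are simplified.

```javascript
const { EventEmitter } = require('events');

// Hypothetical sketch of the dedup; not the actual forward.js handlers.
function attachErrorHandlers(req, socket, log, defer = setImmediate) {
  let reqErrorHandled = false;

  req.on('error', (err) => {
    reqErrorHandled = true; // the request handler owns this failure
    log(`PROXY ERROR: ${err.code}`);
  });

  socket.on('error', (err) => {
    // The socket event fires first, so wait one turn of the event loop
    // to see whether the request handler picks up the same failure.
    defer(() => {
      if (reqErrorHandled) return; // duplicate of the request error
      log(`UPSTREAM SOCKET ERROR: ${err.code}`); // late socket-only error
    });
  });
}

// Usage: one failure surfaces on both emitters but is logged once.
const req = new EventEmitter();
const socket = new EventEmitter();
attachErrorHandlers(req, socket, (line) => console.log(line));
const failure = Object.assign(new Error('timeout'), { code: 'ETIMEDOUT' });
socket.emit('error', failure);
req.emit('error', failure);
```

A socket error with no matching request error (for example EPIPE after the response has arrived) still gets logged, because the flag is never set in that case.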

Output comparison

Before:

❌ PROXY ERROR: ETIMEDOUT
❌ UPSTREAM SOCKET ERROR: ETIMEDOUT

After (retry succeeds):

⏳ api.anthropic.com: connection timed out (ETIMEDOUT) [Claude Code 2.1.170] — retrying…
📥 [10:24:08]  ✓ 200  2.3s  out=1,234 tok

After (retry fails):

⏳ api.anthropic.com: connection timed out (ETIMEDOUT) [Claude Code 2.1.170] — retrying…
❌ api.anthropic.com: connection timed out (ETIMEDOUT) [Claude Code 2.1.170] — retry failed
   → check your network connection
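
For illustration, the two "After" line shapes shown above could be assembled like this. The function name formatUpstreamLines, the phase values, and the description object shape are assumptions; only the two phases that appear in the sample output are covered.

```javascript
// Hypothetical formatter reproducing the sample lines above.
// desc is assumed to look like { code, label, hint, agent }.
function formatUpstreamLines(host, desc, phase) {
  const agent = desc.agent ? ` [${desc.agent}]` : '';
  const base = `${host}: ${desc.label} (${desc.code})${agent}`;
  if (phase === 'retrying') {
    return [`⏳ ${base} — retrying…`];
  }
  // phase === 'retry-failed': add the recovery hint on its own line, if any.
  const lines = [`❌ ${base} — retry failed`];
  if (desc.hint) lines.push(`   → ${desc.hint}`);
  return lines;
}
```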

Test plan

  • 845 tests pass (838 existing + 7 new describeUpstreamError tests)
  • All 5 retryable codes classified correctly
  • Non-retryable codes (ECONNRESET, EPIPE) return retryable: false with no hint
  • Unknown error codes fall back to err.message
  • Missing error code handled gracefully

🤖 Generated with Claude Code

Replace raw Node.js error codes (ETIMEDOUT, ENOTFOUND, etc.) with
classified messages showing host, plain-language label, agent version,
and recovery hints. Transient errors (ETIMEDOUT, ENOTFOUND, EHOSTUNREACH,
ECONNREFUSED, EAI_AGAIN) auto-retry once after 1s before failing.
Deduplicate socket+request error logging via setImmediate flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lis186 lis186 merged commit 3080ee6 into main Jun 10, 2026
2 checks passed
@lis186 lis186 deleted the fix/upstream-error-ux branch June 10, 2026 03:16