fix: turn upstream connection errors into plain language + auto-retry #62
Merged
Replace raw Node.js error codes (ETIMEDOUT, ENOTFOUND, etc.) with classified messages showing host, plain-language label, agent version, and recovery hints. Transient errors (ETIMEDOUT, ENOTFOUND, EHOSTUNREACH, ECONNREFUSED, EAI_AGAIN) auto-retry once after 1s before failing. Deduplicate socket+request error logging via setImmediate flag.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
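Summary (translated from Traditional Chinese)
When the network drops briefly, ccxray prints two lines of internal Node.js error codes, `❌ PROXY ERROR: ETIMEDOUT` and `❌ UPSTREAM SOCKET ERROR: ETIMEDOUT`. Users have no idea what happened or how to recover. This PR does three things:
1. Error classification: ETIMEDOUT → `connection timed out`, ENOTFOUND → `DNS lookup failed`, and so on, with a recovery hint (`check your network connection`) and the agent version (`[Claude Code 2.1.170]`).
2. Auto-retry: transient errors are retried once after 1s.
3. Log deduplication: the socket and request errors for the same failure are logged once (`setImmediate` + flag).
Only the server side (`forward.js`) changes. The frontend needs no changes: on ETIMEDOUT no entry is ever created, so the dashboard never shows the failed request.
Problem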
When the upstream connection (ccxray → api.anthropic.com) fails due to a transient network issue, the terminal output is:
This violates Nielsen #9 (help users recognize, diagnose, and recover from errors) — raw Node.js error codes give no actionable guidance. It also violates Norman's mapping principle — two error lines for one failure makes users think two different things broke.
Solution
1. Error classification (`describeUpstreamError`)
Maps Node.js error codes to human-readable labels + recovery hints. Codes covered: ETIMEDOUT, ENOTFOUND, EHOSTUNREACH, ECONNREFUSED, EAI_AGAIN, ECONNRESET, EPIPE.
Includes the latest detected agent version from `store.versionIndex` (e.g. `[Claude Code 2.1.170]`) for debugging context.
2. Auto-retry (`sendUpstream`)
Wraps the outbound `transport.request()` in a `sendUpstream(attempt)` function. Transient errors retry once after a 1s delay. Guards: a `clientRes.destroyed` check prevents retry after client disconnect; `activeRequests` is only decremented on final failure (not on retry), preventing double-decrement.
3. Socket error dedup
The socket `error` event fires before the request `error` event for the same failure. A `reqErrorHandled` flag + `setImmediate` in the socket handler suppresses the duplicate line while preserving the safety net for late socket errors (EPIPE/ECONNRESET after response received) that don't propagate to the request.
Output comparison
Before:
After (retry succeeds):
After (retry fails):
Test plan
- … (`describeUpstreamError` tests)
- … `retryable: false` with no hint
- … `err.message`
🤖 Generated with Claude Code
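For readers who want a concrete picture of the classification and retry-once behaviour described under Solution, here is a minimal sketch. It is not the code from `forward.js`. Only the name `describeUpstreamError`, the error codes, the two labels, and the hint text come from the PR description. The returned object shape, the `sendWithRetry` helper, its option names, and the fallback to `err.message` for unknown codes are assumptions made for illustration.

```javascript
// Hypothetical sketch: the object shape, sendWithRetry, and its options are
// assumed for illustration and are not taken from forward.js.
const TRANSIENT_CODES = new Set([
  'ETIMEDOUT', 'ENOTFOUND', 'EHOSTUNREACH', 'ECONNREFUSED', 'EAI_AGAIN',
]);

// Only the two labels quoted in the PR text are filled in here.
const LABELS = {
  ETIMEDOUT: 'connection timed out',
  ENOTFOUND: 'DNS lookup failed',
};

function describeUpstreamError(err, host) {
  const code = err.code;
  const retryable = TRANSIENT_CODES.has(code);
  return {
    host,
    code,
    // Assumed fallback: codes without a label surface err.message unchanged.
    label: LABELS[code] || err.message,
    retryable,
    // Assumed: only transient errors carry the network hint.
    hint: retryable ? 'check your network connection' : null,
  };
}

// Calls send(attempt); on a transient failure, waits delayMs and tries once more.
async function sendWithRetry(send, options = {}) {
  const {
    host = 'api.anthropic.com',
    delayMs = 1000,
    clientGone = () => false, // stand-in for a clientRes.destroyed check
  } = options;
  try {
    return await send(1);
  } catch (err) {
    const info = describeUpstreamError(err, host);
    // Do not retry non-transient errors or after the client has disconnected.
    if (!info.retryable || clientGone()) throw err;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    return send(2); // a second failure propagates to the caller
  }
}
```

A caller would pass a function that performs one upstream attempt and returns a promise, for example `sendWithRetry((attempt) => doRequest(attempt))`, where `doRequest` is a placeholder name. Keeping the retry decision in one place makes it easier to run per-request cleanup only on the final failure.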