fix(#58): use anthropic-beta context-1m header as non-lagging 1M context-window signal #60
Merged
…ow signal

getMaxContext took model identity from the (lagging) system-prompt marker, so after a mid-session model switch the context-window denominator resolved to a stale model for several turns: ctx% showed ~99% then cliff-dropped to ~20%.

- identity now comes from the request `model` field (updates immediately)
- 1M detected from anthropic-beta `context-1m-2025-08-07` (empirically confirmed present on every turn) OR the legacy [1m] marker, gated by SUPPORTS_1M so the client-level beta flag riding haiku title-gen turns can't over-claim 1M
- usage escape-hatch retained as monotonic lower bound
- beta1m threaded from request header through index→forward→wire-parser + HUD
- docs/wire-protocol-reference.md: record context-1m-2025-08-07 + rate-limit≠window

Antifragile: any single signal can vanish (beta token renamed, marker never arrives, model absent from table) and detection degrades to the next source, never below the usage lower bound. New 1M families = one line in SUPPORTS_1M.

Red→green: test/config.test.js adds fail-on-old/pass-on-new cases keyed on the issue's turn-35 data (stale marker + opus-4-8 + beta1m must return 1M).
The test compared the selected project label against the raw cwd basename, but the dashboard renders it through truncateMiddle(name, 20). When the suite runs from a git worktree (long '.claude/worktrees/<branch>' basename) the label is truncated and the assertion failed spuriously. Mirror the UI truncation so the test is independent of the checkout path length.
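To make the truncation-aware assertion concrete, here is a minimal sketch of how a test could compute the expected label. The body of `truncateMiddle` below is a hypothetical stand-in (the dashboard's real helper is not shown in this PR and may split or pad differently); only the name and the `(name, 20)` call come from the commit message.

```javascript
const path = require('node:path');

// Hypothetical stand-in for the dashboard helper: keep the head and tail of
// the string and replace the middle with a single ellipsis character.
function truncateMiddle(name, max) {
  if (name.length <= max) return name;
  const keep = max - 1;               // reserve one slot for the ellipsis
  const head = Math.ceil(keep / 2);   // characters kept from the start
  const tail = Math.floor(keep / 2);  // characters kept from the end
  return name.slice(0, head) + '…' + name.slice(name.length - tail);
}

// Expected label as the UI would render it, regardless of checkout path length.
const expectedLabel = truncateMiddle(path.basename(process.cwd()), 20);
```

Comparing the rendered label against `expectedLabel` rather than the raw basename keeps the assertion stable when the suite runs from a long worktree directory.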
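Summary

After a mid-session model switch in Claude Code, the per-turn ctx% is wrong (for example, 99% → 20%, collapsing in a single turn). The root cause is not the switch itself: when `getMaxContext()` picks the context-window denominator, it prefers the lagging system-prompt model marker over the request `model` field. After switching to `claude-opus-4-8`, the request `model` updates immediately, but the system prompt's `"The exact model ID is ..."` line lags several turns and still reports the old model (with no `[1m]`). Those turns compute ~99% against the wrong 200K denominator, and only when usage breaks through 200K does the escape-hatch jump to 1M, producing the visible cliff.

Fix: use the `anthropic-beta: context-1m-2025-08-07` request header as the 1M signal. A loopback cleartext capture confirmed that this header is present on every request when 1M is enabled and does not lag; it is the actual plan switch. Because it is a client/account-level flag (even haiku title-gen requests carry it), a `SUPPORTS_1M` gate restricts it to 1M-capable models so that haiku is not mislabeled as 1M. Model identity now comes from the request `model` (immediate), and observed usage is retained as a monotonic lower bound.

Antifragile: if any single signal fails (beta token renamed or removed, marker never arrives, model missing from the table), detection only loses one vote and degrades to the next source, never below the usage lower bound and never regressing. Adding a new 1M model family is one line in `SUPPORTS_1M`, not a logic change. The capture also made the originally planned session-sticky state (Option B) redundant, so it was removed (via negativa), because the header is present on every turn.

Verification: `config.test.js` red→green differential (using the issue's turn-35 data); full suite 823/823 green; browser-harness (real Chrome) passes all 9 dashboard ctx% display assertions; the mock-upstream live path shows that the header alone flips maxContext between 200K and 1M.

Problem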
When the model is switched mid-session, per-turn `ctx%` jumps incorrectly (e.g. 99% → 20% in one turn). The model switch is only a temporal coincidence; the real cause is the context-window denominator resolving against a stale model for several turns.

In `server/config.js`, `getMaxContext()` preferred the system-prompt model marker over the request `model` field.

After switching to `claude-opus-4-8`, the request `model` updates immediately, but Claude Code's `"The exact model ID is ..."` line lags several turns, still emitting `claude-opus-4-6` (no `[1m]`). Those turns resolve to 200K → ctx% ~99%. When usage finally crosses 200K, `inferMaxContext`'s usage escape-hatch bumps to 1M → the visible 99.8% → 20.2% cliff.

Fix
Use the `anthropic-beta: context-1m-2025-08-07` request header as the 1M signal.

Empirically confirmed via a loopback cleartext capture of this session's own traffic: the header is present on every request when the account's 1M context is enabled, and it does not lag a mid-session model switch. It is the actual plan gate.

`getMaxContext(model, system, opts)`:

- Identity comes from the request `model` field (updates immediately); the system marker is only an identity fallback.
- 1M is claimed when `opts.beta1m` (header) or the legacy `[1m]` system marker is present...
- ...gated by `SUPPORTS_1M` (`/^claude-(opus|sonnet)-4/`). The header is a client/account-level flag that also rides haiku title-gen requests, so without the capability gate it would over-claim a 1M window for haiku. New 1M families are one line in `SUPPORTS_1M`, not a logic change.

`beta1m` is derived once from `anthropic-beta` at request receipt (`server/index.js`) and threaded through `forward.js` → the anthropic wire-parser and the terminal HUD.

Why this is antifragile, not just robust
Any single signal can disappear and detection degrades to the next source, never below the usage lower bound, never re-claiming below a value already shown:
[Table of lost signals and their fallbacks; only these cells survive: "`[1m]` marker + usage" and "`[1m]` marker never arrives (title-gen)"]

The empirical finding also let us remove the originally planned session-sticky state (Option B in #58): because the header is per-turn, session stickiness is redundant. This is via negativa, not added machinery.
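The layering described in this section can be sketched roughly as follows. This is an illustrative approximation, not the code in `server/config.js`: the names `getMaxContext`, `inferMaxContext`, `SUPPORTS_1M`, `opts.beta1m`, the regex, and the header token come from this PR, while the function bodies, the constants `DEFAULT_WINDOW` and `WINDOW_1M`, the helper `hasBeta1m`, the marker-parsing regex, and the extra `inferMaxContext` parameters are assumptions made for the example.

```javascript
const SUPPORTS_1M = /^claude-(opus|sonnet)-4/;   // capability gate quoted in this PR
const BETA_1M = 'context-1m-2025-08-07';         // beta token quoted in this PR
const DEFAULT_WINDOW = 200_000;                  // assumed constant name
const WINDOW_1M = 1_000_000;                     // assumed constant name

// Assumed helper: derive the flag once from the comma-separated header value.
function hasBeta1m(headerValue) {
  return String(headerValue || '')
    .split(',')
    .some((token) => token.trim() === BETA_1M);
}

// Identity from the request model; the system marker is only a fallback.
// 1M needs a signal (header or legacy [1m] marker) AND a 1M-capable model.
function getMaxContext(model, system, opts = {}) {
  const text = system || '';
  const marker = /The exact model ID is ([a-z0-9.-]+)/i.exec(text);
  const identity = model || (marker ? marker[1] : '');
  const wants1m = Boolean(opts.beta1m) || text.includes('[1m]');
  return wants1m && SUPPORTS_1M.test(identity) ? WINDOW_1M : DEFAULT_WINDOW;
}

// Usage acts as a monotonic lower bound: observed usage above 200K implies 1M,
// and a value already shown (previous) is never undercut.
function inferMaxContext(model, system, opts = {}, usedTokens = 0, previous = 0) {
  const declared = getMaxContext(model, system, opts);
  const usageFloor = usedTokens > DEFAULT_WINDOW ? WINDOW_1M : DEFAULT_WINDOW;
  return Math.max(declared, usageFloor, previous);
}

// Turn-35 shape from the issue: stale marker, new request model, header present.
const staleSystem = 'The exact model ID is claude-opus-4-6';
const beta1m = hasBeta1m('foo,context-1m-2025-08-07,bar');
console.log(getMaxContext('claude-opus-4-8', staleSystem, { beta1m }));  // 1000000
console.log(getMaxContext('claude-haiku-4-5', staleSystem, { beta1m })); // 200000
```

Each source only ever raises the result, so dropping any one of them (header, marker, or table entry) leaves the others in effect, and the `Math.max` keeps the usage floor and the previously shown value intact.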
Files
- `server/config.js`: `getMaxContext` / `inferMaxContext` rewrite + `SUPPORTS_1M`
- `server/index.js`: derive `beta1m` from `anthropic-beta`, carry on `ctx`
- `server/forward.js`, `server/wire-parsers/anthropic.js`, `server/helpers.js`: thread `beta1m` into `inferMaxContext` (entry build + HUD)
- `docs/wire-protocol-reference.md`: record `context-1m-2025-08-07`; note rate-limit headers ≠ context window
- `test/config.test.js`: differential tests
- `test/dashboard-codex-e2e.test.js`: make the project-label assertion truncation-aware (it failed spuriously, and only when run from long worktree paths)

Tests
`config.test.js`: fail-on-old / pass-on-new keyed on the issue's turn-35 data (stale marker + opus-4-8 + `beta1m` must return 1M; haiku + `beta1m` must stay 200K). Red on the unmodified function, green after.

Verification: dashboard ctx% (browser-harness, real Chrome)
Isolated instance seeded with a fixture reproducing the turn 34–40 sequence (a "fixed-1m" session, a "buggy-200k" contrast session, a haiku turn, a Codex session). Confirmed that the dashboard uses the server's `e.maxContext` (it does not re-derive the window client-side), so the server fix drives the display:

- … `[4,0,0,0]`)
- … `196,602 / 1,000,000 (20%)`; minimap `data-max-context=1000000`
- … `ctx-critical` / `risk-critical`)
- … `8,000 / 200,000 (4%)` (a 1M denominator would show ~1%)
- … `81 → 98 → 100%` (critical) `→ 20%`: the reported cliff, proving the dashboard faithfully reflects `maxContext`
- … `Math.max` keeps 1M)

Verification: live header → maxContext (mock upstream)
Two requests through a mock Anthropic SSE upstream, identical except the header, usage fixed at 50,000 (rules out the usage hatch; marker is a stale `opus-4-6` with no `[1m]`):

- without the header: `maxContext` 200K
- with `anthropic-beta: ...,context-1m-2025-08-07,...`: `maxContext` 1M

Exercises the full path: `index.js` header derivation → `ctx.beta1m` → `forward.js` → wire-parser → `config.getMaxContext`.

Closes #58 (Option A + the header-based root fix; the separate dual-model session-card display gap noted in the issue remains out of scope).
🤖 Generated with Claude Code