fix(#58): use anthropic-beta context-1m header as non-lagging 1M context-window signal #60

Merged: lis186 merged 2 commits into main from fix/58-ctx-window-beta on Jun 9, 2026
@lis186 lis186 commented Jun 9, 2026

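Summary (translated from Traditional Chinese)

After Claude Code switches models mid-session, the per-turn ctx% goes wrong (for example, collapsing from 99% to 20% in a single turn). The root cause is not the switch itself: when getMaxContext() picks the context-window denominator, it prefers the system-prompt model marker, which lags, over the request model field. After switching to claude-opus-4-8 the request model updates immediately, but the system prompt's "The exact model ID is ..." line keeps reporting the old model (without [1m]) for several turns. Those turns compute ~99% against the wrong 200K denominator, and only when usage breaks through 200K does the escape hatch jump to 1M, producing the visible cliff.

Fix: use the anthropic-beta: context-1m-2025-08-07 request header as the 1M signal. A loopback cleartext capture confirmed that this header is sent on every request when 1M is enabled and does not lag; it is the actual plan switch. Because it is a client/account-level flag (even haiku title-gen requests carry it), a SUPPORTS_1M gate applies it only to 1M-capable models so that haiku is not mislabeled as 1M. Model identity now comes from the request model (immediate), and observed usage is kept as a monotonic lower bound.

Antifragility: if any single signal fails (the beta token is renamed or removed, the marker never arrives, the model is not in the table), that is just one vote fewer and detection degrades to the next source, never below the usage lower bound and never regressing. Adding a new 1M model family is one line in SUPPORTS_1M, not a logic change. The capture also made the originally planned session-sticky state (Option B) redundant, so it was removed (via negativa), because the header is present on every turn.

Verification: config.test.js red→green differential (using the issue's turn-35 data); full suite 823/823 green; browser-harness (real Chrome) passes all 9 dashboard ctx% display assertions; the mock-upstream live path shows that the header alone flips maxContext between 200K and 1M.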


Problem

When the model is switched mid-session, per-turn ctx% jumps wrong (e.g. 99% → 20% in one turn). The model switch is only a temporal coincidence — the real cause is the context-window denominator resolving against a stale model for several turns.

server/config.js getMaxContext() preferred the system-prompt model marker over the request model field:

const effective = extractModelFromSystem(system) || model; // stale marker wins

After switching to claude-opus-4-8, the request model updates immediately, but Claude Code's "The exact model ID is ..." line lags several turns, still emitting claude-opus-4-6 (no [1m]). Those turns resolve to 200K → ctx% ~99%. When usage finally crosses 200K, inferMaxContext's usage escape-hatch bumps to 1M → the visible 99.8% → 20.2% cliff.

turn  model            system-marker        used      maxCtx     ctx%
34    opus-4-6         opus-4-6             162141    200000     81.1
35    opus-4-8         opus-4-6  STALE      196602    200000     98.3
36    opus-4-8         opus-4-6  STALE      199672    200000     99.8
37    opus-4-8         opus-4-6  STALE      201887   1000000     20.2   <- cliff
40    opus-4-8         opus-4-8[1m] OK      202015   1000000     20.2   <- marker caught up
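
The table above can be reproduced with a small sketch of the pre-fix resolution order. This is a simplified reconstruction, not the project's code: `oldMaxContext`, `ctxPercent`, the flat 200K default, and the exact marker format ("The exact model ID is <id>", optionally ending in `[1m]`) are assumptions made for illustration.

```javascript
// Simplified reconstruction of the pre-fix behaviour (illustrative only).
const DEFAULT_WINDOW = 200_000;
const EXTENDED_WINDOW = 1_000_000;

// Assumed marker format: "The exact model ID is <id>", optionally ending in "[1m]".
function extractModelFromSystem(system) {
  const match = /The exact model ID is (\S+)/.exec(system || '');
  return match ? match[1] : null;
}

// Old order: the (possibly stale) system marker wins over the request model.
function oldMaxContext(model, system, used) {
  const effective = extractModelFromSystem(system) || model;
  const declared = effective.endsWith('[1m]') ? EXTENDED_WINDOW : DEFAULT_WINDOW;
  // Usage escape hatch: usage above the declared window implies the 1M window.
  return used > declared ? EXTENDED_WINDOW : declared;
}

// Percentage with one decimal place, as in the table.
function ctxPercent(used, maxContext) {
  return Math.round((used / maxContext) * 1000) / 10;
}

const turns = [
  { turn: 34, model: 'claude-opus-4-6', marker: 'claude-opus-4-6', used: 162141 },
  { turn: 35, model: 'claude-opus-4-8', marker: 'claude-opus-4-6', used: 196602 },
  { turn: 36, model: 'claude-opus-4-8', marker: 'claude-opus-4-6', used: 199672 },
  { turn: 37, model: 'claude-opus-4-8', marker: 'claude-opus-4-6', used: 201887 },
  { turn: 40, model: 'claude-opus-4-8', marker: 'claude-opus-4-8[1m]', used: 202015 },
];

const rows = turns.map((t) => {
  const system = `The exact model ID is ${t.marker}`;
  const maxContext = oldMaxContext(t.model, system, t.used);
  return { turn: t.turn, maxContext, pct: ctxPercent(t.used, maxContext) };
});

for (const row of rows) {
  console.log(row.turn, row.maxContext, row.pct);
}
```

In this sketch the request model never influences turns 35 to 37, which is why only the usage hatch at turn 37 moves the denominator.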

Fix

Use the anthropic-beta: context-1m-2025-08-07 request header as the 1M signal.

Empirically confirmed via a loopback cleartext capture of this session's own traffic: the header is present on every request when the account's 1M context is enabled — it does not lag a mid-session model switch. It is the actual plan gate.

getMaxContext(model, system, opts):

  • Model identity comes from the request model field (updates immediately); the system marker is only an identity fallback.
  • 1M is active if opts.beta1m (header) or the legacy [1m] system marker is present...
  • ...gated by SUPPORTS_1M (/^claude-(opus|sonnet)-4/). The header is a client/account-level flag that also rides haiku title-gen requests, so without the capability gate it would over-claim a 1M window for haiku. New 1M families are one line in SUPPORTS_1M, not a logic change.
  • The usage escape-hatch is retained unchanged as a monotonic lower bound.
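
The bullets above can be sketched roughly as follows. Only `SUPPORTS_1M`, `opts.beta1m`, and the function names come from the description; the marker format, the flat 200K default standing in for the real model table, the `inferMaxContext` signature, and the haiku model ID in the usage note are assumptions for illustration.

```javascript
// Minimal sketch of the described resolution order (not the merged implementation).
const DEFAULT_WINDOW = 200_000;   // stands in for the real model table
const EXTENDED_WINDOW = 1_000_000;
const SUPPORTS_1M = /^claude-(opus|sonnet)-4/; // capability gate from the description

// Assumed marker format: "The exact model ID is <id>", optionally ending in "[1m]".
function extractModelFromSystem(system) {
  const match = /The exact model ID is (\S+)/.exec(system || '');
  return match ? match[1] : null;
}

function getMaxContext(model, system, opts = {}) {
  const marker = extractModelFromSystem(system);
  // Identity: the request model wins; the marker is only a fallback.
  const identity = model || (marker ? marker.replace(/\[1m\]$/, '') : '');
  // 1M is requested by the header flag or by the legacy [1m] marker...
  const wants1m = Boolean(opts.beta1m) || Boolean(marker && marker.endsWith('[1m]'));
  // ...but only honoured for 1M-capable model families.
  return wants1m && SUPPORTS_1M.test(identity) ? EXTENDED_WINDOW : DEFAULT_WINDOW;
}

// Usage escape hatch kept as a lower bound on the window (hypothetical signature).
function inferMaxContext(model, system, used, opts = {}) {
  const declared = getMaxContext(model, system, opts);
  return used > declared ? EXTENDED_WINDOW : declared;
}
```

With the turn-35 inputs (request model `claude-opus-4-8`, stale `claude-opus-4-6` marker, `beta1m: true`) this sketch resolves to 1,000,000, while an illustrative `claude-haiku-4-5` request carrying the same flag stays at 200,000.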

beta1m is derived once from anthropic-beta at request receipt (server/index.js) and threaded through forward.js → the anthropic wire-parser and the terminal HUD.
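
Deriving the flag from the header might look like the sketch below. The helper name `deriveBeta1m` and the tokens other than `context-1m-2025-08-07` are placeholders; the sketch assumes a Node-style headers object with lower-cased names and a comma-separated `anthropic-beta` value.

```javascript
// Hypothetical helper: true when the 1M beta token is present on the request.
const CONTEXT_1M_TOKEN = 'context-1m-2025-08-07';

function deriveBeta1m(headers) {
  const raw = headers['anthropic-beta'];
  if (!raw) return false;
  // Accept an array defensively, then split the comma-separated token list.
  const value = Array.isArray(raw) ? raw.join(',') : String(raw);
  return value
    .split(',')
    .map((token) => token.trim())
    .includes(CONTEXT_1M_TOKEN);
}

// Placeholder tokens around the real one, mimicking a multi-token header.
const withFlag = deriveBeta1m({
  'anthropic-beta': 'placeholder-beta-a, context-1m-2025-08-07,placeholder-beta-b',
});
const withoutFlag = deriveBeta1m({});
console.log(withFlag, withoutFlag);
```

Matching on the exact token rather than a prefix means a renamed token simply yields `false`, which lets the marker and usage signals take over.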

Why this is antifragile, not just robust

Any single signal can disappear and detection degrades to the next source, never below the usage lower bound, never re-claiming below a value already shown:

Signal lost                                    Behavior
Anthropic renames/removes the beta token       falls back to [1m] marker + usage
[1m] marker never arrives (title-gen)          header + usage take over
neither header nor marker (older client)       model table + usage = today's behavior
usage still small                              header gives 1M immediately, before the cliff
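
One way to read the degradation table is as independent votes, where the largest surviving vote wins. The sketch below takes that view; `resolveWindow` and its boolean inputs are hypothetical, and the model table is flattened to a single 200K default.

```javascript
// Each signal contributes a vote; losing a signal only removes its vote.
const SUPPORTS_1M = /^claude-(opus|sonnet)-4/;

function resolveWindow({ model, header = false, marker = false, used = 0 }) {
  const votes = [200_000];                       // model-table default (flattened)
  const capable = SUPPORTS_1M.test(model);
  if (capable && header) votes.push(1_000_000);  // anthropic-beta header vote
  if (capable && marker) votes.push(1_000_000);  // legacy [1m] marker vote
  if (used > 200_000) votes.push(1_000_000);     // usage lower bound
  return Math.max(...votes);
}

const opus = 'claude-opus-4-8';
const scenarios = {
  tokenRenamed: resolveWindow({ model: opus, marker: true, used: 50_000 }),
  markerNeverArrives: resolveWindow({ model: opus, header: true, used: 50_000 }),
  olderClientSmallUsage: resolveWindow({ model: opus, used: 50_000 }),
  olderClientLargeUsage: resolveWindow({ model: opus, used: 201_887 }),
  usageStillSmall: resolveWindow({ model: opus, header: true, used: 8_000 }),
};
console.log(scenarios);
```

Because the result is a maximum over the votes, removing any one signal can only leave the window unchanged or fall back to what the remaining signals support.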

The empirical finding also let us remove the originally-planned session-sticky state (Option B in #58): because the header is per-turn, session stickiness is redundant — via negativa, not added machinery.

Files

  • server/config.js — getMaxContext/inferMaxContext rewrite + SUPPORTS_1M
  • server/index.js — derive beta1m from anthropic-beta, carry on ctx
  • server/forward.js, server/wire-parsers/anthropic.js, server/helpers.js — thread beta1m into inferMaxContext (entry build + HUD)
  • docs/wire-protocol-reference.md — record context-1m-2025-08-07; note rate-limit headers ≠ context window
  • test/config.test.js — differential tests
  • test/dashboard-codex-e2e.test.js — make the project-label assertion truncation-aware (it previously failed spuriously when the suite ran from a long worktree path)

Tests

  • config.test.js — fail-on-old / pass-on-new keyed on the issue's turn-35 data (stale marker + opus-4-8 + beta1m must return 1M; haiku + beta1m must stay 200K). Red on the unmodified function, green after.
  • Full suite: 823/823 pass.

Verification — dashboard ctx% (browser-harness, real Chrome)

Isolated instance seeded with a fixture reproducing the turn 34–40 sequence (a "fixed-1m" session, a "buggy-200k" contrast session, a haiku turn, a Codex session). Confirmed that the dashboard uses the server's e.maxContext (it does not re-derive the window client-side), so the server fix drives the display:

  1. lag-window opus turns render 16/20/20/20/20% (not ~99%)
  2. no display cliff in the fixed session (deltas [4,0,0,0])
  3. denominator renders 196,602 / 1,000,000 (20%); minimap data-max-context=1000000
  4. fixed turns are healthy (no ctx-critical/risk-critical)
  5. turn label / minimap / session badge all agree on 1M
  6. haiku over-claim guard renders 8,000 / 200,000 (4%) (a 1M denominator would show ~1%)
  7. contrast: buggy-200k renders 81 → 98 → 100% (critical) → 20% — the reported cliff, proving the dashboard faithfully reflects maxContext
  8. Codex session unaffected (30%, its own 400K window)
  9. values survive a server restart (restore's Math.max keeps 1M)
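
The percentages in the list above follow from plain division. The sketch below assumes the fixture reuses the usage values from the turn 34–40 table and that the dashboard rounds to the nearest whole percent; `displayPct` is a hypothetical name.

```javascript
// Whole-percent display value (rounding rule is an assumption).
const displayPct = (used, maxContext) => Math.round((used / maxContext) * 100);

// Usage for turns 34, 35, 36, 37, 40 from the problem table.
const usage = [162141, 196602, 199672, 201887, 202015];

// Fixed session: every turn resolves against the 1M window.
const fixed = usage.map((used) => displayPct(used, 1_000_000));
const deltas = fixed.slice(1).map((pct, i) => pct - fixed[i]);

// Buggy contrast session: 200K until the usage hatch fires at turn 37.
const buggyWindows = [200_000, 200_000, 200_000, 1_000_000];
const buggy = buggyWindows.map((win, i) => displayPct(usage[i], win));

// Haiku guard: correct 200K denominator versus an over-claimed 1M one.
const haikuCorrect = displayPct(8_000, 200_000);
const haikuOverClaimed = displayPct(8_000, 1_000_000);

console.log(fixed, deltas, buggy, haikuCorrect, haikuOverClaimed);
```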

Verification — live header → maxContext (mock upstream)

Two requests through a mock Anthropic SSE upstream, identical except the header, usage fixed at 50,000 (rules out the usage hatch; marker is a stale opus-4-6 with no [1m]):

Request  header                                          used     maxContext
A        anthropic-beta: ...,context-1m-2025-08-07,...   50,000   1,000,000
B        none                                            50,000     200,000

Exercises the full path: index.js header derivation → ctx.beta1m → forward.js → wire-parser → config.getMaxContext.

Closes #58 (Option A + the header-based root fix; the separate dual-model session-card display gap noted in the issue remains out of scope).

🤖 Generated with Claude Code

lis186 added 2 commits June 9, 2026 00:44
…ow signal

getMaxContext took model identity from the (lagging) system-prompt marker, so
after a mid-session model switch the context-window denominator resolved to a
stale model for several turns — ctx% showed ~99% then cliff-dropped to ~20%.

- identity now comes from the request `model` field (updates immediately)
- 1M detected from anthropic-beta `context-1m-2025-08-07` (empirically confirmed
  present on every turn) OR the legacy [1m] marker, gated by SUPPORTS_1M so the
  client-level beta flag riding haiku title-gen turns can't over-claim 1M
- usage escape-hatch retained as monotonic lower bound
- beta1m threaded from request header through index→forward→wire-parser + HUD
- docs/wire-protocol-reference.md: record context-1m-2025-08-07 + rate-limit≠window

Antifragile: any single signal can vanish (beta token renamed, marker never
arrives, model absent from table) and detection degrades to the next source,
never below the usage lower bound. New 1M families = one line in SUPPORTS_1M.

Red→green: test/config.test.js adds fail-on-old/pass-on-new cases keyed on the
issue's turn-35 data (stale marker + opus-4-8 + beta1m must return 1M).

The test compared the selected project label against the raw cwd basename, but
the dashboard renders it through truncateMiddle(name, 20). When the suite runs
from a git worktree (long '.claude/worktrees/<branch>' basename) the label is
truncated and the assertion failed spuriously. Mirror the UI truncation so the
test is independent of the checkout path length.
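
A sketch of what mirroring the UI truncation in a test could look like. The body of `truncateMiddle` here is a hypothetical stand-in (the dashboard's real implementation is not shown in this PR), and the worktree path is made up for illustration.

```javascript
const path = require('node:path');

// Hypothetical stand-in for the dashboard's truncateMiddle(name, max):
// keeps the head and tail of the name and puts a single ellipsis in between.
function truncateMiddle(name, max) {
  if (name.length <= max) return name;
  const keep = max - 1;               // one slot is reserved for the ellipsis
  const head = Math.ceil(keep / 2);
  const tail = Math.floor(keep / 2);
  return name.slice(0, head) + '…' + name.slice(name.length - tail);
}

// The expected label is derived the same way the UI derives it, so the
// assertion no longer depends on how long the checkout path happens to be.
function expectedProjectLabel(cwd) {
  return truncateMiddle(path.basename(cwd), 20);
}

const longLabel = expectedProjectLabel('/repo/.claude/worktrees/fix-58-ctx-window-beta-long');
const shortLabel = expectedProjectLabel('/repo/project');
console.log(longLabel, shortLabel);
```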
@lis186 lis186 merged commit 08e8417 into main Jun 9, 2026
2 checks passed
@lis186 lis186 deleted the fix/58-ctx-window-beta branch June 9, 2026 00:56
Linked issue: bug: ctx% wrong after mid-session model switch (stale system-prompt model marker)