fix(#44): retry grouping — hide startup errors from real turns#61

Merged: lis186 merged 1 commit into main from fix/44-retry-grouping on Jun 9, 2026
lis186 (Owner) commented on Jun 9, 2026

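Summary (translated from Traditional Chinese)

Codex sessions mix failed startup retries (502/429/499, about 0.1 s, no output) into the same turn list as real API conversations, so users see a 66% failure rate when 100% of the real turns actually succeeded.

This PR separates retries from real turns on the client side:

  • The session card shows 2t 2r (2 real turns, 2 retries) instead of 6t
  • The turn list hides retry cards (still viewable via drill-down)
  • All-retry sessions show "No turns — 1 failed request (504)" instead of a silent blank
  • Gap timing, compression detection, keyboard nav, and cost efficiency all skip retries

Only client-side code changes (2 production files + 1 test file); the server is untouched. The heuristic was validated as 99.9% accurate against 27,219 real log entries.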


Problem

Codex sessions mix startup 502/429/499 retries (0.1s, no output) with real API turns (85–249s, has output). Users see inflated failure rates and a cluttered turn list.

Expert consensus (Tufte/Norman/Charity Majors roundtable):

  • Tufte: proportional ink violation — 0.1s noise gets same visual weight as 249s real turn
  • Norman: conceptual model mismatch — user's "turn" ≠ system's "turn"
  • Charity Majors: alert fatigue — red !http badges lose meaning when 4/6 turns show them

Solution

Heuristic: isRetry = !isHttpStatusOk(status) && !(output_tokens > 0)

Validated against 27,219 real log entries with independent adversarial verification (6 auditors + 1 judge, 13 claims cross-checked). 99.9% accuracy; 26 edge cases deferred to Phase 2.
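
As a rough illustration of the heuristic above, here is a minimal sketch. The body of `isHttpStatusOk` is not shown in this PR, so the version below is a hypothetical stand-in that assumes any status from 100 to 399 counts as OK. The entry shape (`status`, `output_tokens` as plain object fields) is also an assumption.

```javascript
// Hypothetical stand-in: the real isHttpStatusOk is not shown in this PR.
// Assumption: any integer status from 100 to 399 counts as OK.
function isHttpStatusOk(status) {
  return Number.isInteger(status) && status >= 100 && status < 400;
}

// Heuristic from the PR: a non-OK status with no output tokens is a retry.
// `!(x > 0)` is also true for undefined, null and NaN, so entries that
// never reported usage are treated as having produced no output.
function isRetry(entry) {
  return !isHttpStatusOk(entry.status) && !(entry.output_tokens > 0);
}

const sampleEntries = [
  { status: 502, output_tokens: 0 },    // startup failure, no output
  { status: 429 },                      // rate limited, usage missing
  { status: 200, output_tokens: 1200 }, // real turn
  { status: 500, output_tokens: 35 },   // failed, but produced output
];
console.log(sampleEntries.map(isRetry)); // [ true, true, false, false ]
```

Writing the token check as `!(output_tokens > 0)` rather than `output_tokens === 0` means a missing usage field is classified the same way as an explicit zero.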

Session card — shows real turn count + retry badge:

BEFORE: gpt-5.5 · 6t          (4 of 6 are retries — misleading)
AFTER:  gpt-5.5 · 2t 2r       (2 real turns, 2 retries)
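
One way the badge text could be derived is sketched below. `formatTurnLabel` is a hypothetical helper, not a function from this PR, and it assumes each entry already carries an `isRetry` flag. Omitting the `r` suffix when there are no retries is also an assumption, chosen to match the `521t` label reported for the Claude regression session.

```javascript
// Hypothetical helper: builds the "2t 2r" style label for a session card.
// Assumes each entry has a boolean `isRetry` flag set during rendering.
function formatTurnLabel(entries) {
  const retryCount = entries.filter((entry) => entry.isRetry).length;
  const turnCount = entries.length - retryCount;
  // Assumption: the retry badge is omitted when there are no retries.
  return retryCount > 0 ? `${turnCount}t ${retryCount}r` : `${turnCount}t`;
}

const mixedSession = [
  { isRetry: true },
  { isRetry: false },
  { isRetry: true },
  { isRetry: false },
];
console.log(formatTurnLabel(mixedSession));        // "2t 2r"
console.log(formatTurnLabel([{ isRetry: true }])); // "0t 1r"
```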

Turn list — retry entries hidden (still in allEntries for drill-down):

BEFORE                        AFTER
#1  ● !http   0.1s   —       #1  ○ 200   85s   $0.42
#2  ● !http   0.1s   —       #2  ○ 200  249s   $1.87
#3  ● !http   0.1s   —
#4  ○ 200     85s   $0.42    (2 retries accessible via session card badge)
#5  ● !http   0.1s   —
#6  ○ 200    249s   $1.87

Empty state — all-retry sessions explain instead of silence:

No turns — 3 failed requests (502 × 2, 429)
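
The message above could be assembled roughly as follows. This is only a sketch: `formatRetryEmptyState` is a hypothetical name (the PR's own function is `updateRetryEmptyState()`, whose body is not shown), and grouping status codes in order of first appearance is an assumption.

```javascript
// Hypothetical formatter for the all-retry empty state.
// Input: the HTTP status codes of the hidden retry entries.
function formatRetryEmptyState(retryStatuses) {
  // A Map keeps insertion order, so codes are listed in the order
  // they were first seen (an assumption about the intended ordering).
  const counts = new Map();
  for (const status of retryStatuses) {
    counts.set(status, (counts.get(status) || 0) + 1);
  }
  const parts = [...counts].map(([status, n]) =>
    n > 1 ? `${status} × ${n}` : `${status}`
  );
  const total = retryStatuses.length;
  const noun = total === 1 ? "failed request" : "failed requests";
  return `No turns — ${total} ${noun} (${parts.join(", ")})`;
}

console.log(formatRetryEmptyState([502, 429, 502]));
// "No turns — 3 failed requests (502 × 2, 429)"
console.log(formatRetryEmptyState([504]));
// "No turns — 1 failed request (504)"
```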

Downstream filters fixed:

  • Gap timing backward scan skips retries (no more 0.1s noise as gap baseline)
  • Compression detection backward scan skips retries (no false compaction from partial billing)
  • getVisibleTurnIndices excludes retries (keyboard nav skips hidden entries)
  • renderCostEfficiencyPanel filter aligned with sparkline (input_tokens > 0)
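
The backward scans and the keyboard-nav filter share one idea: step over entries flagged as retries. The sketch below shows that idea under stated assumptions. `findPreviousRealIndex` and `visibleTurnIndices` are hypothetical names, not the PR's actual functions, and each entry is assumed to carry an `isRetry` flag.

```javascript
// Hypothetical backward scan: index of the nearest earlier entry that is
// not a retry, or -1 if none exists. Gap timing and compression detection
// could use a scan like this to pick their comparison baseline.
function findPreviousRealIndex(allEntries, fromIndex) {
  for (let i = fromIndex - 1; i >= 0; i--) {
    if (!allEntries[i].isRetry) return i;
  }
  return -1;
}

// Hypothetical counterpart to getVisibleTurnIndices: indices that
// keyboard navigation may land on, with retries excluded.
function visibleTurnIndices(allEntries) {
  const indices = [];
  allEntries.forEach((entry, i) => {
    if (!entry.isRetry) indices.push(i);
  });
  return indices;
}

// Mirrors the six-entry BEFORE list: entries #1-#3 and #5 are retries.
const allEntries = [true, true, true, false, true, false].map(
  (isRetry) => ({ isRetry })
);
console.log(findPreviousRealIndex(allEntries, 5)); // 3
console.log(findPreviousRealIndex(allEntries, 3)); // -1
console.log(visibleTurnIndices(allEntries));       // [ 3, 5 ]
```

Returning -1 when no earlier real turn exists lets callers treat the first real turn as having no baseline, rather than measuring against a 0.1s retry.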

Files changed (3, client-side only — server untouched)

| File | Change |
| --- | --- |
| public/entry-rendering.js | isRetry computation, retryCount counter, allEntries flag, gap timing + compression scan exclusion, early return to skip card rendering |
| public/miller-columns.js | Session card retry badge (Nr), getVisibleTurnIndices filter, cost efficiency filter alignment, updateRetryEmptyState() for all-retry sessions |
| test/retry-grouping.test.js | 15 difference tests covering classification, counters, session card, keyboard nav, gap timing, compression, Claude regression, empty state |

Verification

| Layer | Result |
| --- | --- |
| Red→green TDD | 12 tests written before implementation, confirmed red on old code, green after |
| Adversarial audit | 6 independent auditors verified heuristic counts, blind spots, blast radius; 3/13 claims corrected |
| Full test suite | 836 tests pass (0 fail), including all e2e |
| Browser smoke | Real Codex session 019e9225 (1t 1r): retry hidden, badge visible |
| Claude regression | Session ee89450f (521t): zero retries, completely unaffected |
| Keyboard nav | ArrowUp/Down confirmed to skip retry indices |
| Empty state | Session 019e929d (0t 1r): shows "No turns — 1 failed request (504)" |

Known limitations (Phase 2)

  • 15 entries with status=101 (WS upgrade shells + interrupted turns) bypass the heuristic — needs transport-aware classification
  • 21 entries with status=200 + SSE error bypass — needs SSE event type detection
  • Specimen 2 entries (6 total, input_tokens > 0 but output_tokens = 0) leak through sparkline filter

Closes #44

🤖 Generated with Claude Code

Codex sessions mix 502/429/499 retries (0.1s, no output) with real API
turns (85–249s, has output). Users see inflated failure rates and noisy
turn lists. This separates retries from real turns client-side:

- isRetry heuristic: !isHttpStatusOk(status) && !(output_tokens > 0)
- Session card shows retry count badge (e.g. "4t 2r")
- Turn list hides retry cards (still in allEntries for drill-down)
- Gap timing and compression scans skip retries
- Cost efficiency filter aligned with sparkline (input_tokens > 0)
- Claude sessions completely unaffected (zero retry entries)

Verified: 836 tests pass, browser smoke with real Codex sessions,
keyboard nav confirmed to skip hidden retries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

lis186 merged commit 0c26e83 into main on Jun 9, 2026 (4 checks passed)
lis186 deleted the fix/44-retry-grouping branch on June 9, 2026 12:00
Linked issue: UX: Retry grouping — separate startup errors from real turns
