
fix: resume eligibility requires output_tokens > 0 (#44 specimen 2/3)#57

Merged
lis186 merged 2 commits into main from fix/resume-output-tokens
Jun 7, 2026

lis186 commented Jun 7, 2026

Owner

Problem

PR #56's has-usage heuristic still shows a resume button for codex sessions whose only turns billed input but produced zero output: hung WS turns (specimen 2) and cross-session retry shells (specimen 3) from #44. Codex never writes a rollout file for those, so the copied `codex resume <sid>` command fails with "No saved session found".

Verified against ~/.codex/sessions ground truth: 4/19 codex sessions were false positives, including the originally reported 019e929d-999e.

Fix

markSessionUsage condition: entry.usage → entry.usage?.output_tokens > 0.

status (101 false positives) and stopReason (completed on failed turns) are unreliable; output_tokens > 0 separates all 19 real sessions correctly per the analysis in #44.
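
As a rough illustration of the tightened guard, the sketch below rebuilds a minimal `markSessionUsage` around the new condition. Only `markSessionUsage`, `usage`, and `output_tokens` come from this PR; `sessionMeta` as a `Map`, the `sessionId` and `input_tokens` field names, and the sample session ids are assumptions made for the example.

```javascript
// Sketch only. `sessionMeta` as a Map and `entry.sessionId` are assumed
// names for illustration; the real store in server/store.js may differ.
const sessionMeta = new Map();

function markSessionUsage(entry) {
  const sid = entry.sessionId; // assumed field name
  if (!sid) return;
  // Fail closed: a missing `usage`, a missing `output_tokens`, or a zero
  // count all compare false against `> 0`, so the session stays unmarked.
  if (!(entry.usage?.output_tokens > 0)) return;
  const meta = sessionMeta.get(sid) ?? {};
  meta.hasUsage = true;
  sessionMeta.set(sid, meta);
}

// Billed but zero-output turn (the hung specimen): not marked.
markSessionUsage({ sessionId: 'sid-hung', usage: { input_tokens: 9953, output_tokens: 0 } });
// Turn with no usage object at all: not marked.
markSessionUsage({ sessionId: 'sid-empty' });
// Successful turn with 24 output tokens: marked.
markSessionUsage({ sessionId: 'sid-ok', usage: { input_tokens: 13000, output_tokens: 24 } });

console.log(sessionMeta.has('sid-hung'), sessionMeta.has('sid-empty'), sessionMeta.get('sid-ok'));
```

Writing the check as `!(x > 0)` rather than `x <= 0` is what makes the guard fail closed: `undefined > 0` is `false`, so an entry with no usage data is rejected along with the zero-output one.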

Tests

  • store: new case — billed zero-output turn (9,953 in / 0 out / 499) does not mark the session
  • restore: new case — restore from index does not resurrect the resume button for a zero-output-only session
  • existing marking fixtures updated to carry output_tokens > 0
  • full suite: 816/816 pass

Smoke (isolated env, real traffic)

CCXRAY_HOME=/tmp/ccxray-smoke-44a ccxray --port 5604 --no-browser + real codex exec: successful turn (24 output tokens) → resumable: true with correct codex resume command.

Refs #44 (this is the "resume button false positive" slice only; turn-list retry grouping remains open)

🤖 Generated with Claude Code

Justin Lee and others added 2 commits June 8, 2026 02:21
…esence

PR #56's has-usage heuristic shows a resume button for codex sessions
whose only turns billed input but produced zero output (hung WS turns,
cross-session retries). Those sessions have no rollout file on disk, so
`codex resume` fails — 4/19 real sessions were false positives against
~/.codex/sessions ground truth (issue #44, specimen 2/3).

status and stopReason are unreliable discriminators (101 and 'completed'
appear on failed turns); output_tokens > 0 separates all 19 sessions
correctly. Tighten markSessionUsage to that condition.

Refs #44

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lis186 commented Jun 7, 2026

Owner Author

What this PR does / what comes next (visual walkthrough)

1. What changed: a one-line condition fix

markSessionUsage(entry)   server/store.js:264
┌──────────────────────────────────────────────────────────────────────┐
│  if (!sid || NON_RESUMABLE_SESSIONS.has(sid)) return;                │
│  if (entry.isSubagent) return;                                       │
│                                                                      │
│  - if (!entry.usage) return;                       ◀── old: any usage│
│  + if (!(entry.usage?.output_tokens > 0)) return;  ◀── new           │
│                                                                      │
│  meta.hasUsage = true;   // monotonic: once true, never reset        │
└──────────────────────────────────────────────────────────────────────┘

2. Why: three wire specimens, and the old discriminator fails on two of them

                            old: has usage             new: output_tokens > 0
                            ┌──────────────────────┐   ┌──────────────────────┐
 ① normal turn              │ in:13k out:24        │   │ in:13k out:24        │
    rollout exists          │   ✅ resumable       │   │   ✅ resumable       │  ← both correct
                            └──────────────────────┘   └──────────────────────┘

 ② hung zero-output turn    │ in:9953 out:0        │   │ in:9953 out:0        │
    45 min → 499, billed,   │   ❌ false resumable │   │   ✅ hidden          │  ← fixed
    but no rollout file     │  codex resume fails  │   │                      │
                            └──────────────────────┘   └──────────────────────┘

 ③ cross-session retry      │ in:xx out:0          │   │ in:xx out:0          │
    shell: 504, then retry  │   ❌ false resumable │   │   ✅ hidden          │  ← fixed
    under a new sid         │                      │   │                      │
                            └──────────────────────┘   └──────────────────────┘

 Signal reliability:  status (101 has false positives) ✗   stopReason ('completed' also appears on failed turns) ✗
                      output_tokens > 0  ✓  ← the only discriminator that cleanly separates all 19 real sessions
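
The comparison above can be replayed as a small table-driven check. This is a sketch under stated assumptions: the two predicates follow the old and new conditions from this PR, while the entry shape, the `input_tokens` field name, the `rolloutExists` flag, and the specimen-3 input count (elided as "xx" in the diagram) are placeholders invented for the example.

```javascript
// Old and new discriminators as pure predicates.
const hasUsageOld = (entry) => Boolean(entry.usage);
const hasUsageNew = (entry) => entry.usage?.output_tokens > 0;

// Illustrative entries; field names other than `usage.output_tokens` are
// assumptions, and the specimen-3 input count is a placeholder value.
const specimens = [
  { label: 'normal turn', rolloutExists: true, usage: { input_tokens: 13000, output_tokens: 24 } },
  { label: 'hung zero-output turn', rolloutExists: false, usage: { input_tokens: 9953, output_tokens: 0 } },
  { label: 'cross-session retry shell', rolloutExists: false, usage: { input_tokens: 1, output_tokens: 0 } },
];

// A discriminator is "correct" when it predicts exactly whether a rollout
// file exists, because that is what `codex resume` needs.
const results = specimens.map((s) => ({
  label: s.label,
  oldCorrect: hasUsageOld(s) === s.rolloutExists,
  newCorrect: hasUsageNew(s) === s.rolloutExists,
}));

for (const r of results) {
  console.log(r.label, '| old ok:', r.oldCorrect, '| new ok:', r.newCorrect);
}
```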

3. Data flow: the change sits in a single funnel

   wire turn arrives
        │
        ▼
  ┌────────────────┐      ┌────────────────────────────────────────────────┐
  │ summarizeEntry │─────▶│ markSessionUsage(entry)  ◀══ only change point │
  │ (sse-broadcast │      │   writes sessionMeta[sid].hasUsage             │
  │  + restore both│      └────────────────────────────────────────────────┘
  │  pass through) │                              │
  └────────────────┘                              ▼
        │                 ┌────────────────────────────────────────────────┐
        └────────────────▶│ computeSessionResume(sid, provider)            │
                          │   reads hasUsage → single source of truth      │
                          │   for the resume button                        │
                          └────────────────────────────────────────────────┘
                                                  │
                                   ┌──────────────┴──────────────┐
                                   ▼                             ▼
                          live SSE broadcast              restart restore
                        (new turns, real time)          (rebuilt from index,
                                                        no rollout-file probe)
   Both paths share the same write/read pair → one source of truth, so nothing has to change in two places
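
A minimal sketch of that funnel, assuming a hypothetical persisted index that is replayed through `summarizeEntry` on restart. The function names `summarizeEntry` and `markSessionUsage` and the `hasUsage` flag are from this PR; the index shape, the `sessionId` field, the summary object, and the `isMarked` helper are invented here to keep the example self-contained.

```javascript
// Sketch of the single funnel; data shapes are assumptions.
const sessionMeta = new Map();

function markSessionUsage(entry) {
  if (!(entry.usage?.output_tokens > 0)) return;
  const meta = sessionMeta.get(entry.sessionId) ?? {};
  meta.hasUsage = true; // monotonic: nothing ever sets it back to false
  sessionMeta.set(entry.sessionId, meta);
}

// The one place every turn passes through, whether live or restored.
function summarizeEntry(entry) {
  markSessionUsage(entry);
  return { sessionId: entry.sessionId, outputTokens: entry.usage?.output_tokens ?? 0 };
}

// Hypothetical helper standing in for the read side of the flag.
const isMarked = (sid) => sessionMeta.get(sid)?.hasUsage === true;

// Hypothetical persisted index, replayed on restart.
const index = [
  { sessionId: 'a', usage: { output_tokens: 0 } },
  { sessionId: 'b', usage: { output_tokens: 24 } },
  { sessionId: 'b', usage: { output_tokens: 0 } }, // a later empty turn does not unmark
];

// Restart restore: replay the index through the same funnel.
index.forEach((entry) => summarizeEntry(entry));
// Live path: a new turn arrives and goes through the same function.
summarizeEntry({ sessionId: 'c', usage: { output_tokens: 7 } });

console.log(['a', 'b', 'c'].map((sid) => [sid, isMarked(sid)]));
```

Because the restore path reuses the write side instead of probing for rollout files, a zero-output-only session such as `a` stays unmarked after a restart, which matches the restore test case listed in the PR description.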

4. Architectural change: structurally, there is none

  Structure:   no new modules, no new fields, no new APIs, no new dependencies
               the heuristic stays in the same single funnel

  Semantics:   the criterion for "session is resumable" is tightened
               from: "Codex reported usage"
               to:   "Codex actually produced output (= a rollout file must exist)"

  ┌──────────────────────────────────────────────────────────────┐
  │  declarative resume profile (providers.js): untouched        │
  │    openai:    { condition: 'has-usage' }                     │
  │    anthropic: { condition: 'always'    }                     │
  │  ── only the definition behind 'has-usage' got more precise  │
  └──────────────────────────────────────────────────────────────┘
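
To show why the profile can stay untouched, here is one way a reader of the flag could evaluate it. The two profile entries and the `computeSessionResume(sid, provider)` name are taken from this comment; the evaluator body, the `resumeProfiles` and `sessionMeta` containers, the return shape, and the sample session ids are assumptions, not the project's actual code.

```javascript
// Profile entries as quoted from providers.js; the container name is assumed.
const resumeProfiles = {
  openai: { condition: 'has-usage' },
  anthropic: { condition: 'always' },
};

// Hypothetical state: one session marked by a productive turn, one not.
const sessionMeta = new Map([
  ['sid-ok', { hasUsage: true }],
  ['sid-hung', {}],
]);

function computeSessionResume(sid, provider) {
  const profile = resumeProfiles[provider];
  if (!profile) return { resumable: false }; // unknown provider: fail closed
  switch (profile.condition) {
    case 'always':
      return { resumable: true };
    case 'has-usage':
      // The reader is unchanged; only the code that sets `hasUsage`
      // became stricter.
      return { resumable: sessionMeta.get(sid)?.hasUsage === true };
    default:
      return { resumable: false };
  }
}

console.log(computeSessionResume('sid-ok', 'openai'));
console.log(computeSessionResume('sid-hung', 'openai'));
console.log(computeSessionResume('sid-hung', 'anthropic'));
```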

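5. Value delivered

 ┌─ For users ──────────────────────────────────────────────────────┐
 │  • No more resume buttons that fail when clicked                 │
 │    (before, 4/19 codex sessions copied a session id that codex   │
 │     does not have, so codex resume errored immediately)          │
 │  • Button shown = resume guaranteed to work → trust, no traps    │
 └──────────────────────────────────────────────────────────────────┘

 ┌─ For developers ─────────────────────────────────────────────────┐
 │  • The discriminating signal converges from the unreliable       │
 │    status/stopReason onto one reliable discriminator             │
 │    (output_tokens > 0), and it fails closed (a missing field     │
 │    = hidden button, which is safer than a wrong button)          │
 │  • #44 PR-B/PR-C (turn-list retry grouping) will reuse the       │
 │    same signal → fix the foundation first so downstream          │
 │    classification is not built on a wrong signal                 │
 │  • The change is confined to the single funnel, with zero        │
 │    call-site collateral breakage (a second review by codex       │
 │    confirmed only computeSessionResume reads hasUsage)           │
 └──────────────────────────────────────────────────────────────────┘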

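6. What comes next: PR-A is the first of three cuts for #44

  #44 "presentation strategy for error/empty wire events"
  │
  ├─ PR-A ✅ done (this PR)
  │    resume button false positive: 1 backend line + tests
  │    └─ the "button" aspect of specimens 2/3
  │
  ├─ PR-B ⏭️ next (size M, mostly frontend)
  │    turn-list retry grouping: isRetry classifier + retryCount/turnCount
  │    split + "(N retries)" session card
  │    ⚠️ touches addEntry (central bookkeeping), so the blast radius is large
  │    └─ covers specimen 1 only (fast startup retry)
  │
  └─ PR-C 🅿️ deferred (size M/L; needs a reason/ design doc first)
       demoted display for specimen 2 + cross-session linking for specimen 3
       └─ waits until all three specimens share one visual language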

lis186 merged commit f4807e6 into main Jun 7, 2026
2 checks passed
lis186 deleted the fix/resume-output-tokens branch June 7, 2026 23:48