
fix(summarizer): summarize conversations that end on an extended-thinking turn #110

Open
minyek wants to merge 1 commit into obra:main from minyek:fix/summarizer-thinking-block-resume


@minyek minyek commented Jun 7, 2026

Summarizer fails permanently on conversations whose last assistant turn used extended thinking

Description

To summarize a short conversation (≤15 exchanges), the summarizer resumes the session via the Claude Agent SDK and asks for a <summary>. If that session's latest assistant message contains thinking / redacted_thinking blocks (extended thinking was on), resume replays that turn and the API rejects the continuation with a deterministic 400: those blocks "cannot be modified". An error sentinel is written for the conversation, and the identical attempt is retried on every sync forever, re-parsing a ~600 KB transcript and re-spawning a Claude subprocess each cycle.

Log fingerprint:

⚠️  Errors: 1
  <archive>/<uuid>.jsonl:
    Summary generation failed: Summarizer SDK error: success (session <uuid>)

success is misleading: the SDK reports this as an SDKResultSuccess with subtype: 'success' but is_error: true, api_error_status: 400, and the real message in result. The code discarded all of that and logged only the meaningless subtype.
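As an illustration of the point above, the following minimal sketch reads the error details from a result whose subtype is 'success'. The `ResultLike` and `ApiErrorInfo` interfaces and the `extractApiError` helper are hypothetical names introduced here; they model only the fields named in this description and are not the SDK's own types or the code in the diff.

```typescript
// Hypothetical, trimmed-down view of an SDK result message: only the fields
// named in this PR description are modeled.
interface ResultLike {
  subtype: string;
  is_error?: boolean;
  api_error_status?: number | null;
  result?: string;
}

// Hypothetical container for the details worth logging.
interface ApiErrorInfo {
  status: number | undefined;
  text: string;
}

// Returns the API error details when the turn failed, or null when it did not.
// The check keys off is_error, never off subtype, because subtype can be
// 'success' even though the turn hit an API error.
function extractApiError(r: ResultLike): ApiErrorInfo | null {
  if (r.is_error !== true) return null;
  return {
    status: r.api_error_status ?? undefined,
    text: r.result ?? "",
  };
}

// The combination described above (error text abbreviated).
const reproduced: ResultLike = {
  subtype: "success",
  is_error: true,
  api_error_status: 400,
  result: "API Error: 400 ...",
};

console.log(extractApiError(reproduced));
```

Keying the check on is_error rather than subtype is the design choice that matters here: a logger that prints only subtype would report this failure as success.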

Isolated but real: in an archive of 2,129 summaries this was the only errored conversation — the one that both ended on a thinking-block turn and was short enough (6 exchanges) to take the resume path. Size is not the cause; 500 MB+ transcripts summarize fine via the non-resume hierarchical path.

Environment

  • episodic-memory v1.4.2
  • @anthropic-ai/claude-agent-sdk bundled with v1.4.2

Root Cause

The Anthropic API constraint is the trigger; two code defects turn a recoverable condition into a permanent, opaque failure.

1. callClaude throws away the real error. It built SummarizerSdkError from only subtype, discarding message.result ("API Error: 400 … thinking blocks … cannot be modified") and api_error_status (400). subtype: 'success' + is_error: true is a documented SDKResultSuccess combination — the agent loop finished, but the turn hit an API error.

2. The non-resume fallback never fired. isResumeFailure matched only subtype === 'error_during_execution'. This case is 'success', so no fallback ran and the error propagated to a sentinel. The non-resume path would summarize it fine — it never replays the thinking blocks.

Reproduced (resume the offending session, logging the raw result):

{ "subtype": "success", "is_error": true, "api_error_status": 400, "num_turns": 1,
  "result": "API Error: 400 messages.1.content.25: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. ..." }

Fix Summary

Two parts; see the diff for code.

  1. Surface the real error. SummarizerSdkError now carries apiErrorStatus and the API error text and renders them in the message, so the log shows the actual 400 instead of success.
  2. Broaden the non-resume fallback. isResumeFailure now also treats an HTTP 400 on a resume call as a resume failure. Our summary prompt is plain, well-formed text, so a 400 there can only come from the replayed history; the non-resume path (fresh session, transcript as text) sidesteps the immutable thinking blocks. Statuses that resume cannot cause (401/403/429/5xx) still propagate without a futile retry.
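The two parts above can be sketched roughly as follows. This is an illustrative reconstruction, not the code in the diff: the constructor parameters, the `buildMessage` helper, and the `resumed` flag are assumptions, and whether the existing 'error_during_execution' check is gated on resume is not stated here, so the sketch leaves it ungated. The message format follows the diagnostic log line quoted under Fix Verified.

```typescript
// Hypothetical helper that renders the diagnostic message.
function buildMessage(
  subtype: string,
  sessionId: string,
  apiErrorStatus?: number,
  apiErrorText?: string,
): string {
  const status = apiErrorStatus !== undefined ? ` (HTTP ${apiErrorStatus})` : "";
  const detail = apiErrorText ? `: ${apiErrorText}` : "";
  return `Summarizer SDK error: ${subtype}${status} (session ${sessionId})${detail}`;
}

// Sketch of an error that carries the API status and text, not just the subtype.
class SummarizerSdkError extends Error {
  readonly subtype: string;
  readonly apiErrorStatus: number | undefined;
  readonly apiErrorText: string | undefined;

  constructor(
    subtype: string,
    sessionId: string,
    apiErrorStatus?: number,
    apiErrorText?: string,
  ) {
    super(buildMessage(subtype, sessionId, apiErrorStatus, apiErrorText));
    this.name = "SummarizerSdkError";
    this.subtype = subtype;
    this.apiErrorStatus = apiErrorStatus;
    this.apiErrorText = apiErrorText;
  }
}

// Sketch of the broadened predicate. `resumed` (assumed parameter) says
// whether the failing call was a resume call.
function isResumeFailure(err: SummarizerSdkError, resumed: boolean): boolean {
  // Pre-existing case described under Root Cause.
  if (err.subtype === "error_during_execution") return true;
  // New case: an HTTP 400 on a resume call is attributed to the replayed history.
  return resumed && err.apiErrorStatus === 400;
}

const err = new SummarizerSdkError("success", "<uuid>", 400, "API Error: 400 ...");
console.log(err.message);
console.log(isResumeFailure(err, true));
```

Restricting the new case to status 400 on a resume call keeps authentication, rate-limit, and server errors on the propagate path, where a non-resume retry would fail the same way.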

Tests added: test/summarizer-options.test.ts (unit coverage for the broadened isResumeFailure and the diagnostic SummarizerSdkError fields/message) and test/summarizer-resume-fallback.test.ts (integration via a mocked SDK — the thinking-block 400 reproduced exactly, and recovery through the non-resume path).

Fix Verified

Full suite passes (213/213). The integration test reproduces the exact SDK result shape from the real failure (subtype: 'success', is_error: true, api_error_status: 400) and asserts the summary is recovered via the non-resume retry. Validated end-to-end against the original failing session (fd2f1897) on claude-agent-sdk 0.2.141: the resume reproduces the 400 as that exact result shape, SummarizerSdkError surfaces it, isResumeFailure routes it, and the non-resume fallback returns a correct summary. The diagnostic log now reads:

resume failed for <uuid> (Summarizer SDK error: success (HTTP 400) (session <uuid>): API Error: 400 … `thinking` … cannot be modified …); retrying without resume

Also in this PR: test(show) makes a timestamp assertion locale/timezone-independent — a pre-existing failure on a clean checkout under non-US locales (the assertion hardcoded a US/ISO date regex), unrelated to the summarizer but required to get the suite green.
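One common way to make such an assertion portable is to derive the expected text through the same formatting call as the code under test, instead of matching a fixed date pattern. The sketch below is only an illustration under that assumption: `formatTimestamp`, the sample output line, and the US-style regex are hypothetical stand-ins, not the actual show code or the assertion changed in this PR.

```typescript
// Hypothetical stand-in for whatever the show command uses to render a timestamp.
function formatTimestamp(iso: string): string {
  return new Date(iso).toLocaleString();
}

const iso = "2026-06-07T12:34:56Z";
const output = `Started: ${formatTimestamp(iso)}`;

// Brittle: assumes a US-style M/D/YYYY rendering, which depends on the
// locale of the machine running the tests.
const usDatePattern = /\d{1,2}\/\d{1,2}\/\d{4}/;

// Portable: compute the expected text with the same call, so the check
// holds under any locale or timezone.
const expected = new Date(iso).toLocaleString();
const portableCheck = output.includes(expected);

console.log({ brittle: usDatePattern.test(output), portable: portableCheck });
```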

…king turn

Summarizing a short conversation resumes its session and asks for a <summary>.
When the session's last assistant turn used extended thinking, resume replays
the immutable thinking/redacted_thinking blocks and the API rejects the
continuation with a deterministic 400. The summarizer logged this as the opaque
"Summarizer SDK error: success" and never fell back, so that conversation failed
to summarize on every sync, indefinitely.

- SummarizerSdkError now carries api_error_status and the API error text, so the
  log shows the real 400 rather than a bare subtype.
- isResumeFailure treats an HTTP 400 on a resume call as a resume failure (the
  summary prompt is plain text, so a 400 there can only come from the replayed
  turns), routing to the non-resume transcript path that sidesteps the immutable
  blocks. Non-resume statuses (401/403/429/5xx) still propagate without a futile
  retry.

Also: cover the resume fallback and diagnostic errors with tests, make the show
timestamp assertion locale/timezone-independent, and tighten the summarizer
docstrings. dist rebuilt.