fix(summarizer): summarize conversations that end on an extended-thinking turn#110
Open
minyek wants to merge 1 commit into
Open
fix(summarizer): summarize conversations that end on an extended-thinking turn#110minyek wants to merge 1 commit into
minyek wants to merge 1 commit into
Conversation
…king turn Summarizing a short conversation resumes its session and asks for a <summary>. When the session's last assistant turn used extended thinking, resume replays the immutable thinking/redacted_thinking blocks and the API rejects the continuation with a deterministic 400. The summarizer logged this as the opaque "Summarizer SDK error: success" and never fell back, so that conversation failed to summarize on every sync, indefinitely. - SummarizerSdkError now carries api_error_status and the API error text, so the log shows the real 400 rather than a bare subtype. - isResumeFailure treats an HTTP 400 on a resume call as a resume failure (the summary prompt is plain text, so a 400 there can only come from the replayed turns), routing to the non-resume transcript path that sidesteps the immutable blocks. Non-resume statuses (401/403/429/5xx) still propagate without a futile retry. Also: cover the resume fallback and diagnostic errors with tests, make the show timestamp assertion locale/timezone-independent, and tighten the summarizer docstrings. dist rebuilt.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summarizer fails permanently on conversations whose last assistant turn used extended thinking
Description
To summarize a short conversation (≤15 exchanges) the summarizer resumes the session via the Claude Agent SDK and asks for a
<summary>. If that session's latest assistant message containsthinking/redacted_thinkingblocks (extended thinking was on), resume replays that turn and the API rejects the continuation with a deterministic 400 — those blocks "cannot be modified". The conversation is written an error sentinel and retried, identically, on every sync forever — re-parsing a ~600 KB transcript and re-spawning a Claude subprocess each cycle.Log fingerprint:
successis misleading: the SDK reports this as anSDKResultSuccesswithsubtype: 'success'butis_error: true,api_error_status: 400, and the real message inresult. The code discarded all of that and logged only the meaningless subtype.Isolated but real: in an archive of 2,129 summaries this was the only errored conversation — the one that both ended on a thinking-block turn and was short enough (6 exchanges) to take the resume path. Size is not the cause; 500 MB+ transcripts summarize fine via the non-resume hierarchical path.
Environment
@anthropic-ai/claude-agent-sdkbundled with v1.4.2Root Cause
The Anthropic API constraint is the trigger; two code defects turn a recoverable condition into a permanent, opaque failure.
1.
callClaudethrows away the real error. It builtSummarizerSdkErrorfrom onlysubtype, discardingmessage.result("API Error: 400 … thinking blocks … cannot be modified") andapi_error_status(400).subtype: 'success'+is_error: trueis a documentedSDKResultSuccesscombination — the agent loop finished, but the turn hit an API error.2. The non-resume fallback never fired.
isResumeFailurematched onlysubtype === 'error_during_execution'. This case is'success', so no fallback ran and the error propagated to a sentinel. The non-resume path would summarize it fine — it never replays the thinking blocks.Reproduced (resume the offending session, logging the raw result):
{ "subtype": "success", "is_error": true, "api_error_status": 400, "num_turns": 1, "result": "API Error: 400 messages.1.content.25: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. ..." }Fix Summary
Two parts; see the diff for code.
SummarizerSdkErrornow carriesapiErrorStatusand the API error text and renders them in the message, so the log shows the actual 400 instead ofsuccess.isResumeFailurenow also treats an HTTP 400 on a resume call as a resume failure. Our summary prompt is plain, well-formed text, so a 400 there can only come from the replayed history; the non-resume path (fresh session, transcript as text) sidesteps the immutable thinking blocks. Non-resume statuses (401/403/429/5xx) still propagate without a futile retry.Tests added:
test/summarizer-options.test.ts(unit coverage for the broadenedisResumeFailureand the diagnosticSummarizerSdkErrorfields/message) andtest/summarizer-resume-fallback.test.ts(integration via a mocked SDK — the thinking-block 400 reproduced exactly, and recovery through the non-resume path).Fix Verified
Full suite passes (213/213). The integration test reproduces the exact SDK result shape from the real failure (
subtype: 'success',is_error: true,api_error_status: 400) and asserts the summary is recovered via the non-resume retry. Validated end-to-end against the original failing session (fd2f1897) onclaude-agent-sdk0.2.141: the resume reproduces the 400 as that exact result shape,SummarizerSdkErrorsurfaces it,isResumeFailureroutes it, and the non-resume fallback returns a correct summary. The diagnostic log now reads:Also in this PR:
test(show)makes a timestamp assertion locale/timezone-independent — a pre-existing failure on a clean checkout under non-US locales (the assertion hardcoded a US/ISO date regex), unrelated to the summarizer but required to get the suite green.