fix(engine): resume length-truncated turns; unclamp Opus output (32K→128K)#374
Merged
Merged
Conversation
…128K) A long agent turn that hit the model output ceiling was silently lost: the loop treated a `finishReason: "length"` response with no tool call as "model done" and ended the run mid-answer. Two independent causes, two layers: - Magnitude: @ai-sdk/anthropic@3.0.64 predates Opus 4.7/4.8 — both fall into its `opus-4-` catch-all (32K) and the SDK silently clamps the requested 128K down. Bump to 3.0.81, whose model table maps opus-4-7/4-8 → 128K, so an unset maxOutputTokens now resolves to the real catalog ceiling. - Mechanism: the engine now auto-resumes a length-truncated turn from its partial text, bounded by MAX_LENGTH_CONTINUATIONS (4) and emitting a `context.length_continuation` event. Resumes are skipped when a reasoning block lacks its provider signature (a mid-thinking cut) — replaying an unsigned thinking block as the trailing assistant message is rejected by Anthropic; that case surfaces stopReason "length" instead. Also corrects the stale "Default: 16384" doc on RuntimeConfig.maxOutputTokens (unset resolves to the catalog ceiling; 16384 is only the unknown-model fallback). Tests: new test/unit/engine-length-continuation.test.ts covers resume + stitch, the continuation bound, the non-length no-regression, the tool-call path, and both signed/unsigned reasoning cases.
…h-resume Record why the resume isn't gated by provider (engine is provider-agnostic; the path is bounded and Anthropic-default) and what to do if a non-Anthropic model becomes a default. No behavior change.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What & why
RCA of a production session where a long agent turn was silently cut off mid-response. Two independent defects at two layers; both fixed here.
Symptom 1 — turn lost at the output ceiling
@ai-sdk/anthropic@3.0.64'sgetModelCapabilities()predates Opus 4.7/4.8, soclaude-opus-4-7falls into theopus-4-catch-all (32K) and the SDK silently clamps the platform's 128K request down (dist/index.js:3168). A turn truncated at exactlyout=32000, finishReason=length. Bumped to 3.0.81, whose table mapsopus-4-7/opus-4-8→ 128K. WithmaxOutputTokensunset (the correct default), Opus now gets its real ceiling.finishReason: "length"response with no tool call, treating truncation as "done" (engine.ts). It now auto-resumes from the partial text, bounded byMAX_LENGTH_CONTINUATIONS(4) and emitting acontext.length_continuationevent. Past the bound it endsstopReason: "length"(unchanged from before).Safety: reasoning models
A length cut mid-thinking drains an unsigned reasoning block; replaying it as the trailing assistant message is what Anthropic rejects ("thinking blocks in the latest assistant message cannot be modified"). The resume is guarded by
hasUnsignedReasoning()— that case surfacesstopReason: "length"instead of resuming. (Caught by adversarial review; our default model runs with extended thinking.)Doc
Corrected the stale
RuntimeConfig.maxOutputTokensdoc ("Default: 16384") — unset resolves to the model's catalog ceiling; 16384 is only the unknown-model fallback.Tests
test/unit/engine-length-continuation.test.ts(6): resume + seamless stitch, continuation bound →length, non-length no-regression, tool-call path bypasses resume, unsigned-reasoning surfaces, signed-reasoning resumes.finishReasontest to the corrected behavior.bun run verify:static✅,tsc✅, full unit suite green except one pre-existing unrelated failure (dompurifyin an untouched bundle UI, fails identically onmain).Follow-ups (not in this PR)
str; needs a repro with full args before fixing at the right layer."Default: 16384"text also lives in theschemasrepo (fetched artifact) — small follow-up there.