v1.16.3 hotfix: collectStream parts accumulation (Fix #1: empty-text terminator + Fix #2/T35: multi-chunk parts fragmentation) #59
…i empirical fix)
v1.16.2 ask_agentic returned `(model returned empty response)` on every
tool-using prompt against live Gemini Pro/Flash. Symptoms surfaced on the
first end-to-end smoke test of the global install.
Root cause:
Gemini's actual stream protocol for tool-using turns emits two chunks:
1. `{ candidates: [{ content: { parts: [{ functionCall }] }}] }`
2. `{ candidates: [{ content: { parts: [{ text: '' }] }, finishReason: 'STOP' }] }`
v1.16.2's gate required `parts.length > 0`. Chunk 2 has length 1 (one
empty-text Part) so the gate accepted it, overwriting chunk 1's
functionCall. The loop then saw zero functionCallParts and routed to the
final-text path, returning the empty fallback.
Fix:
Strengthen the gate to require AT LEAST ONE part to carry actual content
— any non-text Part (functionCall, executableCode, codeExecutionResult,
fileData, inlineData) OR a text Part with non-empty string. Empty-text
terminator chunks (whether `[]`, `[{}]`, or `[{ text: '' }]`) fail the
gate; the previous chunk's functionCall survives.
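A minimal sketch of the two gates contrasted above, assuming a simplified `Part` shape. The names `lengthGate` and `hasContentBearingPart` are hypothetical and do not come from the repository; the real SDK type carries more fields.

```typescript
// Hypothetical, simplified Part shape for illustration only.
interface Part {
  text?: string;
  functionCall?: { name: string; args?: Record<string, unknown> };
  executableCode?: unknown;
  codeExecutionResult?: unknown;
  fileData?: unknown;
  inlineData?: unknown;
}

// v1.16.2-style gate: any non-empty parts array is accepted.
function lengthGate(parts: Part[] | undefined): boolean {
  return (parts?.length ?? 0) > 0;
}

// Hotfix-style gate: at least one part must carry actual content,
// i.e. any non-text part or a text part with a non-empty string.
function hasContentBearingPart(parts: Part[] | undefined): boolean {
  return (parts ?? []).some(
    (p) =>
      p.functionCall !== undefined ||
      p.executableCode !== undefined ||
      p.codeExecutionResult !== undefined ||
      p.fileData !== undefined ||
      p.inlineData !== undefined ||
      (typeof p.text === 'string' && p.text.length > 0),
  );
}

// The empirical two-chunk pattern: functionCall first, empty-text terminator second.
const functionCallChunk: Part[] = [{ functionCall: { name: 'read_file', args: { path: 'a.ts' } } }];
const terminatorChunk: Part[] = [{ text: '' }];
```

Under these assumptions the terminator passes `lengthGate` (length 1) but fails `hasContentBearingPart`, which is why the earlier functionCall chunk survives with the stronger gate.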
Two regression tests pin the empirical pattern:
- test/unit/stream-collector.test.ts — collector-level
- test/unit/ask-agentic.test.ts — agentic-loop-level
766 tests pass (764 → 766).
Why review missed it: Gemini Round-1 called out this class of bug; the
fold targeted `parts: []` (empty array) and pinned that shape. The
empirical `parts: [{ text: '' }]` (length-1 array with empty content)
was not captured in any pre-merge test because no one captured a raw
stream from live Gemini before merging. Will extend
real-gemini.smoke.test.ts in a follow-up to add a pre-merge live smoke
covering streaming + tool-calling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/6step on the v1.16.3 hotfix surfaced two non-blocking issues that need documentation in docs/FOLLOW-UP-PRS.md, not code changes:
T35: collectStream multi-chunk parts fragmentation
Pre-existing latent bug in code.tool.ts:726-733 (iterates candidates for executableCode + codeExecutionResult). If Gemini ever emits these parts in separate chunks, last-write-wins on lastCandidates drops the earlier one. v1.16.3 doesn't fix this because it's not the empirical regression. True parts accumulation would address it. Fix-on-evidence; zero empirical reports in v1.14.x → v1.16.3 cycle.
T36: toolCall not in v1.16.3 content-bearing predicate
ACCEPTED-DEFERRED for completeness. Server-side tool variant (Part.toolCall vs Part.functionCall). Currently unused in our codebase (we configure functionDeclarations, never server-side tools). Add when enabling Gemini's grounded-search or similar.
No source code changes. Documentation only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same-release fold of T35 (multi-chunk parts fragmentation), escalated from "deferred / no empirical reports" after end-to-end MCP smoke-test on 2026-05-01 surfaced the empirical failure: ask_agentic against live Gemini Pro returned 400 "missing thought_signature, position 2" on multi-file prompts (Test 5).
Hotfix-A's content-bearing-parts gate kept ONE chunk's parts under last-write-wins. Single-FC prompts (Tests 1–4) all passed. Parallel multi-FC prompts dropped FC1's Gemini-3-mandated thoughtSignature (the "FIRST functionCall part" signature contract per https://ai.google.dev/gemini-api/docs/thought-signatures) because chunks 2..N each overwrote the signature-bearing chunk 1.
Fix:
collectStream no longer gates or overwrites; it accumulates parts across every chunk verbatim. Empty-text terminator parts, parallel functionCall parts, and standalone signature-bearing parts are all preserved exactly as Gemini emitted them. Candidate scaffold (finishReason / safetyRatings / groundingMetadata / citationMetadata) stays last-write-wins; on stream exit the final candidates shape is synthesised by overlaying accumulated parts onto the last seen scaffold.
Side-effects verified clean:
- ask_agentic iterResult.signatures still operates on parts array
- content-aware NO_PROGRESS dedupe (v1.16.0) keys on (name, args) pairs in signatures (no change)
- ask + code text extraction reads accumulated response.text (no change)
- code.tool.ts executableCode + codeExecutionResult extraction now correct across chunks (closes T35 secondary class; bonus)
- T36 also resolved by construction: accumulation preserves toolCall parts the same way as functionCall, no predicate gating remains
Tests: 771 pass (766 → 771; +5 regression pins covering parallel-FC across 3 chunks pinning thoughtSignature on FC1, executableCode + codeExecutionResult cross-chunk, standalone signature on empty-text terminator, content.role synthesis, candidates-undefined preservation for naked-text streams).
Authoritative docs fetched verbatim during diagnosis:
- ai.google.dev/gemini-api/docs/thought-signatures
- ai.google.dev/gemini-api/docs/function-calling
Closes T35 (RESOLVED in same release). Closes T36 (RESOLVED-BY-T35).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
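The accumulation contract in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code in stream-collector.ts: the `Part`, `Candidate`, and `Chunk` shapes are simplified, `collectParts` and `demoStream` are hypothetical names, and defaulting `role` to `'model'` is an assumption about how role synthesis might work.

```typescript
// Hypothetical, simplified shapes; the real SDK types carry more fields.
interface Part {
  text?: string;
  thoughtSignature?: string;
  functionCall?: { name: string; args?: Record<string, unknown> };
}
interface Candidate {
  content?: { role?: string; parts?: Part[] };
  finishReason?: string;
}
interface Chunk {
  candidates?: Candidate[];
}

// Accumulate candidates[0].content.parts across every chunk, verbatim.
// The candidate scaffold (finishReason etc.) stays last-write-wins.
async function collectParts(stream: AsyncIterable<Chunk>): Promise<Candidate[] | undefined> {
  const parts: Part[] = [];
  let scaffold: Candidate | undefined;
  for await (const chunk of stream) {
    const cand = chunk.candidates?.[0];
    if (cand === undefined) continue; // nothing to accumulate from this chunk
    scaffold = cand;
    parts.push(...(cand.content?.parts ?? []));
  }
  // A stream that never carried candidates yields undefined, not an empty shell.
  if (scaffold === undefined) return undefined;
  // Overlay the accumulated parts onto the last seen scaffold.
  // Assumption: role falls back to 'model' when the scaffold has none.
  return [{ ...scaffold, content: { role: scaffold.content?.role ?? 'model', parts } }];
}

// Parallel function calls spread over chunks, then an empty-text terminator.
async function* demoStream(): AsyncGenerator<Chunk> {
  yield { candidates: [{ content: { parts: [{ functionCall: { name: 'read_file' }, thoughtSignature: 'sig-1' }] } }] };
  yield { candidates: [{ content: { parts: [{ functionCall: { name: 'read_file' } }] } }] };
  yield { candidates: [{ content: { parts: [{ text: '' }] }, finishReason: 'STOP' }] };
}
```

With this shape, the first functionCall part keeps its `thoughtSignature` because later chunks append to the parts array instead of replacing it, while `finishReason` still comes from the final chunk.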
Pull request overview
Hotfix release v1.16.3 addressing a v1.16.2 Gemini streaming regression by changing the stream collector to preserve all emitted candidates[0].content.parts across chunks (including empty-text terminators and multi-chunk/parallel function calls), with accompanying regression tests and release metadata updates.
Changes:
- Update `collectStream` to accumulate `content.parts` across all chunks and synthesize the final candidate by overlaying accumulated parts onto the last candidate “scaffold” metadata.
- Add/extend unit tests covering empty-text terminators, parallel functionCall fragmentation with thoughtSignature, and cross-chunk codeExecution parts.
- Bump release/version metadata and document T35/T36 status in docs and CHANGELOG.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/tools/shared/stream-collector.ts | Implements parts accumulation across stream chunks and scaffold overlay on exit. |
| test/unit/stream-collector.test.ts | Adds regression coverage for multi-chunk parts accumulation and related edge cases. |
| test/unit/ask-agentic.test.ts | Adds a regression test for the empirical empty-text terminator pattern in tool-using turns. |
| CHANGELOG.md | Documents v1.16.3 hotfix rationale and scope. |
| docs/FOLLOW-UP-PRS.md | Marks T35 resolved in v1.16.3 and documents T36 status. |
| package.json | Bumps package version to 1.16.3. |
| server.json | Bumps server/package version references to 1.16.3. |
| ### Notes | ||
| - Anyone on v1.16.2 with `ask_agentic` calls in production should upgrade IMMEDIATELY. Tool-calling iterations were silently failing on every prompt. | ||
| - T35 (true parts accumulation) — closed in this release. T36 (`toolCall` predicate completeness for future Gemini server-side tools) remains tracked in `docs/FOLLOW-UP-PRS.md` and is unaffected (the accumulation contract handles `toolCall` exactly the same way as `functionCall`). |
CHANGELOG note says T36 “remains tracked” in FOLLOW-UP-PRS, but this PR also updates docs/FOLLOW-UP-PRS.md to mark T36 as “RESOLVED-BY-T35 in v1.16.3”. Please reconcile the release notes with the tracking doc (either mark T36 resolved here too, or adjust FOLLOW-UP-PRS if it’s still intended to be open).
| - T35 (true parts accumulation) — closed in this release. T36 (`toolCall` predicate completeness for future Gemini server-side tools) remains tracked in `docs/FOLLOW-UP-PRS.md` and is unaffected (the accumulation contract handles `toolCall` exactly the same way as `functionCall`). | |
| - T35 (true parts accumulation) — closed in this release. T36 (`toolCall` predicate completeness for future Gemini server-side tools) is also resolved by T35 in v1.16.3 (the accumulation contract handles `toolCall` exactly the same way as `functionCall`). |
| // Live API smoke against v1.16.2 published binary on 2026-05-01 returned | ||
| // `(model returned empty response)` on every tool-using prompt — both | ||
| // gemini-pro-latest and gemini-flash-latest. Raw chunk capture revealed | ||
| // the empirical terminator shape `[{text: ''}]` (length 1) which slipped | ||
| // through v1.16.2's `parts.length > 0` gate. v1.16.3 fix requires AT | ||
| // LEAST ONE part with actual content. | ||
| const root = mkdtempSync(join(tmpdir(), 'gcctx-askagent-empty-text-terminator-')); | ||
| writeFileSync(join(root, 'a.ts'), 'export const answer = 42;'); | ||
| const { ctx, generateContent, generateContentStream } = buildCtx({ script: [] }); | ||
| generateContent.mockReset(); | ||
| generateContentStream.mockReset(); | ||
| // Iter 1: empirical Gemini shape — functionCall in chunk 1, [{text:''}] | ||
| // terminator in chunk 2. Pre-fix: chunk 2 overwrites chunk 1 in | ||
| // lastCandidates → loop sees no functionCall → final-text path → empty. | ||
| // Post-fix: chunk 2 fails the content-bearing gate → chunk 1 wins. |
This test’s commentary still describes the v1.16.3 fix as a “content-bearing parts gate” that rejects the [{text: ''}] terminator chunk. In the current implementation, collectStream accumulates parts across chunks (including empty-text terminators), so the explanation is now misleading. Please update the comments to match the accumulation-based behavior (functionCall is preserved because parts from chunk 1 and chunk 2 are both retained, and downstream extraction filters for functionCall parts).
…comment reconcile
R1 4-way review (GPT + Gemini + Grok + Copilot) on PR #59 surfaced a multi-candidate type-narrowing regression in the initial T35 implementation (GPT F1 High / Gemini F1 Nit / Grok F2 Medium; 3-way consensus, /6step PARTIAL Medium) plus four doc/comment residues.
Folds:
- A. stream-collector.ts: collapse-to-single-element synth replaced with per-index `Map<number, Part[]>` + `Map<number, Candidate>` accumulation. Pre-T35 `lastCandidates = chunk.candidates` semantics restored for any future caller setting `candidateCount > 1` (today none does, but `code.tool.ts:727` iterates ALL candidates so the narrowing was a regression-in-waiting). +1 multi-candidate regression pin (Fold B).
- D. stream-collector.test.ts:101 title retitled to mention hotfix-A history AND T35 strengthening explicitly.
- G. CHANGELOG T36 reconciled with FOLLOW-UP-PRS.md (RESOLVED-BY-T35, not "remains tracked").
- H. ask-agentic.test.ts:2469 comment updated to describe T35 accumulation behaviour instead of stale "content-bearing gate".
ACCEPT-DEFERRED C (Grok F3 Medium): end-to-end multi-FC + thoughtSignature test through askAgenticTool.execute. Collector contract pinned at stream-collector.test.ts:189; live-API integration smoke is the documented next-PR scope.
ACCEPT E (Grok F4 Low) + F (Grok F5 Low): doc-precision items consistent with project conventions (verified empirically against test count + prior release format).
772 tests pass (was 771). Lint + typecheck green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R2 verification (3-way: GPT + Gemini + Grok) on R1 fold commit 1adda5e surfaced a 2-way HIGH consensus (GPT F1 / Gemini F1) on per-index Map keying in stream-collector.ts.
Folded:
- I. stream-collector.ts: keys now `cand.index ?? i` instead of loop position alone. Per @google/genai SDK Candidate.index doc, sparse stream emission like `candidates: [{ index: 1, content: ... }]` (only non-zero index) would cross-wire that part into bucket 0 under the loop-i keying. /6step verdict TP Medium (HIGH inflated by reviewers: today exercises only single-candidate path so practical impact is 0; fix is structural).
- I-comment: synth `.sort()` becomes load-bearing under cand.index keying because Map insertion order can diverge from ordinal index when chunks emit indices in reverse; comment added.
- J. New regression pin `keys multi-candidate buckets by Candidate.index, not array position` emits chunks 1+2+3 with sparse emission (chunk 1: index 1 only, chunk 2: index 0 only, chunk 3: index 1 only; Map insertion order [1, 0]) and asserts ordinal output ordering plus no cross-bucket bleed. Locks both keying invariant and sort load-bearing-ness.
ACCEPT K (Grok R2 F2 Nit): sort was redundant under R1 keying; becomes load-bearing under R2 keying so it stays.
773 tests pass (was 772). Lint + typecheck green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
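A minimal sketch of the per-index keying described in this commit, assuming simplified local types and a synchronous chunk list for brevity. `collectByIndex` and `sparseChunks` are hypothetical names; the sample data mirrors the sparse emission pattern from the regression pin (index 1, then index 0, then index 1 again), plus a trailing chunk with `candidates: []`.

```typescript
// Hypothetical, simplified shapes; the real SDK types carry more fields.
interface Part { text?: string }
interface Candidate {
  index?: number;
  content?: { parts?: Part[] };
  finishReason?: string;
}
interface Chunk { candidates?: Candidate[] }

function collectByIndex(chunks: Iterable<Chunk>): Candidate[] | undefined {
  const partsByIndex = new Map<number, Part[]>();
  const scaffoldByIndex = new Map<number, Candidate>();
  for (const chunk of chunks) {
    // A chunk with `candidates: []` iterates zero times, so it cannot
    // clear a scaffold retained from an earlier chunk.
    (chunk.candidates ?? []).forEach((cand, i) => {
      const key = cand.index ?? i; // declared index wins over array position
      scaffoldByIndex.set(key, cand); // last seen scaffold per index
      const bucket = partsByIndex.get(key) ?? [];
      bucket.push(...(cand.content?.parts ?? []));
      partsByIndex.set(key, bucket);
    });
  }
  if (scaffoldByIndex.size === 0) return undefined;
  // Map insertion order may be [1, 0] here, so the numeric sort is what
  // guarantees ordinal output ordering.
  return Array.from(scaffoldByIndex.keys())
    .sort((a, b) => a - b)
    .map((key) => {
      const scaffold = scaffoldByIndex.get(key)!;
      return { ...scaffold, content: { ...scaffold.content, parts: partsByIndex.get(key) ?? [] } };
    });
}

const sparseChunks: Chunk[] = [
  { candidates: [{ index: 1, content: { parts: [{ text: 'a' }] } }] },
  { candidates: [{ index: 0, content: { parts: [{ text: 'b' }] } }] },
  { candidates: [{ index: 1, content: { parts: [{ text: 'c' }] }, finishReason: 'STOP' }] },
  { candidates: [] },
];
```

Keying by array position alone would have placed every part in bucket 0 for this input, since each chunk carries a single-element `candidates` array.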
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| * the final chunk — earlier chunks may report | ||
| * "STOP_REASON_UNSPECIFIED" mid-stream. We retain the LAST chunk's | ||
| * candidate scaffold and synthesise the final `candidates` array on | ||
| * exit by overlaying the accumulated `parts` onto it. |
Docstring says the collector retains the LAST chunk’s candidate scaffold, but the implementation only updates scaffolds when chunk.candidates includes entries (e.g., a later chunk with candidates: [] leaves the earlier scaffold intact). Consider rewording to “last seen candidate scaffold per index” / “latest candidate-bearing chunk” to match actual behavior and the unit test that expects scaffold preservation when later chunks have no candidates.
| * the final chunk — earlier chunks may report | |
| * "STOP_REASON_UNSPECIFIED" mid-stream. We retain the LAST chunk's | |
| * candidate scaffold and synthesise the final `candidates` array on | |
| * exit by overlaying the accumulated `parts` onto it. | |
| * the final candidate-bearing chunk — earlier chunks may report | |
| * "STOP_REASON_UNSPECIFIED" mid-stream. We retain the last seen | |
| * candidate scaffold per index and synthesise the final `candidates` | |
| * array on exit by overlaying the accumulated `parts` onto it. Chunks | |
| * with `candidates: []` do not clear a previously retained scaffold. |
| // through v1.16.2's `parts.length > 0` gate. v1.16.3 fix requires AT | ||
| // LEAST ONE part with actual content. |
This test comment says “v1.16.3 fix requires AT LEAST ONE part with actual content”, but the current v1.16.3 implementation is true parts accumulation (no content-bearing gate). Please update the comment to reflect the accumulation behavior (and, if needed, refer to the earlier hotfix-A gate as historical context) so future readers don’t infer a predicate that no longer exists.
| // through v1.16.2's `parts.length > 0` gate. v1.16.3 fix requires AT | |
| // LEAST ONE part with actual content. | |
| // through v1.16.2's `parts.length > 0` gate. v1.16.3 instead does true | |
| // parts accumulation: later chunks append their parts verbatim, so an | |
| // empty-text terminator no longer suppresses an earlier functionCall. | |
| // (The earlier hotfix-A "content-bearing part" gate is historical | |
| // context only, not the current predicate.) |
Pure doc/comment drift cleanup left behind by the R1 + R2 fold cycles. No code change.
R3 Copilot review: 4 line-comments, /6step verdict:
- Findings #1, #2 (CHANGELOG T36 status + ask-agentic test inner comment, pinned on commit 4510f9d): FALSE POSITIVE at HEAD, already addressed by Fold G + Fold H in the R1 commit (1adda5e). Copilot pinned on a stale commit; resolve threads in the GH UI.
- Finding #3 (stream-collector.ts:35-40 docstring "LAST chunk's candidate scaffold", pinned on 083c57e): TRUE POSITIVE LOW. Docstring lagged the per-index map keying introduced by R1+R2. Reword to "last CANDIDATE-BEARING chunk's scaffold per `cand.index`" and document that a chunk emitting `candidates: []` does NOT clear earlier scaffolds.
- Finding #4 (ask-agentic.test.ts:2454-2459 outer test header "v1.16.3 fix requires AT LEAST ONE part with actual content", pinned on 083c57e): TRUE POSITIVE LOW. Co-located with the Fold-H-updated inner block (lines 2466-2474); the outer header still narrated hotfix-A semantics. Reword to describe true parts accumulation; add a one-line historical note that hotfix-A's content-bearing gate was superseded in this same release.
Verification: lint, typecheck, 773 tests pass. dist/ unchanged in behaviour; pure comment edits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
v1.16.3 ships TWO related fixes for the v1.16.2 streaming regression, folded into a single release because the second was discovered during end-to-end MCP smoke-test of the first.
Fix #1: empty-text-terminator regression (already in branch)
v1.16.2 ask_agentic returned `(model returned empty response)` on every tool-using prompt against live Gemini Pro / Flash. Gemini's stream protocol for tool-using turns emits a TWO-chunk pattern: `parts: [{ functionCall }]` then a `parts: [{ text: '' }]` terminator with `finishReason: 'STOP'`. v1.16.2's `parts.length > 0` gate accepted the terminator and overwrote the functionCall.
Fix #2: multi-chunk parts fragmentation / T35 (this commit)
End-to-end MCP smoke-test on 2026-05-01: Tests 1–4 passed (single-FC + ask + code paths), Test 5 (`read package.json + server.json + CHANGELOG.md`) failed with Gemini 400: `Function call default_api:read_file ... missing thought_signature, position 2`. Hotfix-A's content-bearing-parts gate kept ONE chunk under last-write-wins; parallel multi-FC dropped FC1's Gemini-3-mandated thoughtSignature.
Vendor docs fetched verbatim during diagnosis:
- ai.google.dev/gemini-api/docs/thought-signatures
- ai.google.dev/gemini-api/docs/function-calling
Resolution:
`collectStream` no longer gates or overwrites; it ACCUMULATES `parts` across every chunk verbatim. Empty-text terminator parts, parallel `functionCall` parts, and standalone signature-bearing parts are all preserved exactly as Gemini emitted them. Candidate scaffold (`finishReason` / safety / grounding / citation) stays last-write-wins; on stream exit the final candidates shape is synthesised by overlaying accumulated parts onto the last seen scaffold.
Side-effects verified clean:
- `ask_agentic` `iterResult.signatures` operates on parts array (no behavioural change)
- content-aware NO_PROGRESS dedupe keys on `(name, args)` pairs (no change)
- `ask` + `code` text extraction reads accumulated `response.text` (no change)
- `code.tool.ts` `executableCode` + `codeExecutionResult` extraction now correct across chunks (closes T35 secondary class; bonus)
- T36 also resolved by construction: accumulation preserves `toolCall` parts the same way as `functionCall`, no predicate gating remains
Closes: T35 (RESOLVED in same release), T36 (RESOLVED-BY-T35).
Test plan
.claude/local-v1.16.3-resume-after-session-reset.md
🤖 Generated with Claude Code