
v1.16.3 hotfix — collectStream parts accumulation (Fix #1: empty-text terminator + Fix #2/T35: multi-chunk parts fragmentation)#59

Merged: qmt merged 6 commits into main from v1.16.3-hotfix-fragmentation-gate on May 1, 2026


@qmt qmt commented May 1, 2026

Summary

v1.16.3 ships TWO related fixes for the v1.16.2 streaming regression, folded into a single release because the second was discovered during end-to-end MCP smoke-test of the first.

Fix #1 — empty-text-terminator regression (already in branch)

v1.16.2 ask_agentic returned (model returned empty response) on every tool-using prompt against live Gemini Pro / Flash. Gemini's stream protocol for tool-using turns emits a TWO-chunk pattern: parts: [{ functionCall }] then parts: [{ text: '' }] terminator with finishReason: 'STOP'. v1.16.2's parts.length > 0 gate accepted the terminator and overwrote the functionCall.
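
The overwrite described above can be reproduced in a few lines. This is a minimal sketch, not the project's actual collector: the `Part` and `Chunk` shapes are simplified stand-ins for the SDK types, and `collectLastNonEmpty` is a hypothetical name for the v1.16.2 behaviour as this PR describes it.

```typescript
// Hypothetical minimal shapes; the real types come from the Gemini SDK.
interface Part {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
}
interface Chunk {
  candidates?: { content?: { parts?: Part[] }; finishReason?: string }[];
}

// Sketch of the v1.16.2 gate: keep the LAST chunk whose parts array is non-empty.
function collectLastNonEmpty(chunks: Chunk[]): Part[] {
  let kept: Part[] = [];
  for (const chunk of chunks) {
    const parts = chunk.candidates?.[0]?.content?.parts ?? [];
    // The terminator [{ text: '' }] has length 1, so it passes this check.
    if (parts.length > 0) kept = parts;
  }
  return kept;
}

// The two-chunk pattern reported for tool-using turns.
const toolTurn: Chunk[] = [
  { candidates: [{ content: { parts: [{ functionCall: { name: "read_file", args: { path: "a.ts" } } }] } }] },
  { candidates: [{ content: { parts: [{ text: "" }] }, finishReason: "STOP" }] },
];

const finalParts = collectLastNonEmpty(toolTurn);
const sawFunctionCall = finalParts.some((p) => p.functionCall !== undefined);
console.log(sawFunctionCall); // false: the terminator replaced the functionCall
```

With no functionCall left in the collected parts, the agentic loop has nothing to dispatch and falls through to the empty-response path.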

Fix #2 — multi-chunk parts fragmentation / T35 (this commit)

End-to-end MCP smoke-test on 2026-05-01 — Tests 1–4 passed (single-FC + ask + code paths), Test 5 (read package.json + server.json + CHANGELOG.md) failed with Gemini 400: Function call default_api:read_file ... missing thought_signature, position 2. Hotfix-A's content-bearing-parts gate kept ONE chunk under last-write-wins; parallel multi-FC dropped FC1's Gemini-3-mandated thoughtSignature.

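Vendor docs fetched verbatim during diagnosis:

  • ai.google.dev/gemini-api/docs/thought-signatures
  • ai.google.dev/gemini-api/docs/function-calling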

Resolution: collectStream no longer gates or overwrites — it ACCUMULATES parts across every chunk verbatim. Empty-text terminator parts, parallel functionCall parts, and standalone signature-bearing parts all preserved exactly as Gemini emitted them. Candidate scaffold (finishReason / safety / grounding / citation) stays last-write-wins; on stream exit the final candidates shape is synthesised by overlaying accumulated parts onto the last seen scaffold.
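
A minimal sketch of the accumulate-then-overlay idea, assuming a single candidate per chunk. The types and the `collectAccumulating` name are illustrative assumptions, not the code in `stream-collector.ts`, and the `"sig-1"` value is a placeholder.

```typescript
// Hypothetical minimal shapes; the real types come from the Gemini SDK.
interface Part {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
  thoughtSignature?: string;
}
interface Candidate {
  content?: { role?: string; parts?: Part[] };
  finishReason?: string;
}
interface Chunk {
  candidates?: Candidate[];
}

function collectAccumulating(chunks: Chunk[]): Candidate | undefined {
  const parts: Part[] = [];          // every part, in emission order
  let scaffold: Candidate | undefined; // last-write-wins metadata
  for (const chunk of chunks) {
    const cand = chunk.candidates?.[0];
    if (cand === undefined) continue;
    scaffold = cand;
    parts.push(...(cand.content?.parts ?? []));
  }
  if (scaffold === undefined) return undefined;
  // Overlay the accumulated parts onto the last seen scaffold.
  return { ...scaffold, content: { ...scaffold.content, parts } };
}

const stream: Chunk[] = [
  { candidates: [{ content: { role: "model", parts: [{ functionCall: { name: "read_file", args: { path: "package.json" } }, thoughtSignature: "sig-1" }] } }] },
  { candidates: [{ content: { role: "model", parts: [{ functionCall: { name: "read_file", args: { path: "server.json" } } }] } }] },
  { candidates: [{ content: { role: "model", parts: [{ text: "" }] }, finishReason: "STOP" }] },
];

const merged = collectAccumulating(stream);
console.log(merged?.content?.parts?.length, merged?.finishReason);
```

In this sketch all three parts survive, including the empty-text terminator, and `finishReason` comes from the final chunk.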

Side-effects verified clean:

  • ask_agentic iterResult.signatures operates on parts array — no behavioural change
  • Content-aware NO_PROGRESS dedupe (v1.16.0) keys on (name, args) pairs — no change
  • ask + code text extraction reads accumulated response.text — no change
  • code.tool.ts executableCode + codeExecutionResult extraction now correct across chunks (closes T35 secondary class — bonus)
  • T36 also resolved by construction: accumulation preserves toolCall parts the same way as functionCall, no predicate gating remains

Closes: T35 (RESOLVED in same release), T36 (RESOLVED-BY-T35).

Test plan

  • Unit tests: 771 pass (766 → 771; +5 regression pins for the multi-chunk cases)
  • Lint: clean
  • Typecheck: clean
  • End-to-end MCP smoke against live Gemini Pro: Tests 1–4 PASS (Fix #1 verified)
  • End-to-end MCP smoke Test 5 (Fix #2 / T35 verified): requires Claude Code session restart to load the rebuilt MCP binary; documented in .claude/local-v1.16.3-resume-after-session-reset.md
  • 3-way Round-1 review (gemini + grok + codex)
  • /6step on every finding (per CLAUDE.md mandate)
  • Round-2 verification + Copilot pass
  • User-driven merge (NEVER auto-merge per CLAUDE.md)
  • npm publish + MCP Registry + GitHub Release after merge

🤖 Generated with Claude Code

qmt and others added 3 commits May 1, 2026 09:07
…i empirical fix)

v1.16.2 ask_agentic returned `(model returned empty response)` on every
tool-using prompt against live Gemini Pro/Flash. Symptoms surfaced on the
first end-to-end smoke test of the global install.

Root cause:
Gemini's actual stream protocol for tool-using turns emits two chunks:
1. `{ candidates: [{ content: { parts: [{ functionCall }] }}] }`
2. `{ candidates: [{ content: { parts: [{ text: '' }] }, finishReason: 'STOP' }] }`

v1.16.2's gate required `parts.length > 0`. Chunk 2 has length 1 (one
empty-text Part) so the gate accepted it, overwriting chunk 1's
functionCall. The loop then saw zero functionCallParts and routed to the
final-text path, returning the empty fallback.

Fix:
Strengthen the gate to require AT LEAST ONE part to carry actual content
— any non-text Part (functionCall, executableCode, codeExecutionResult,
fileData, inlineData) OR a text Part with non-empty string. Empty-text
terminator chunks (whether `[]`, `[{}]`, or `[{ text: '' }]`) fail the
gate; the previous chunk's functionCall survives.
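
The strengthened gate can be sketched as a predicate over a chunk's parts. This is an illustration under assumptions: the field list follows the paragraph above, while the `Part` shape and the function names are hypothetical.

```typescript
// Hypothetical Part shape listing only the fields named in the commit message.
interface Part {
  text?: string;
  functionCall?: unknown;
  executableCode?: unknown;
  codeExecutionResult?: unknown;
  fileData?: unknown;
  inlineData?: unknown;
}

// A part carries content if it is a non-text part or a non-empty text part.
function isContentBearing(part: Part): boolean {
  if (typeof part.text === "string" && part.text.length > 0) return true;
  return (
    part.functionCall !== undefined ||
    part.executableCode !== undefined ||
    part.codeExecutionResult !== undefined ||
    part.fileData !== undefined ||
    part.inlineData !== undefined
  );
}

// The gate accepts a chunk only if at least one part carries content.
function passesGate(parts: Part[] | undefined): boolean {
  return (parts ?? []).some(isContentBearing);
}

console.log(passesGate([]), passesGate([{}]), passesGate([{ text: "" }])); // false false false
console.log(passesGate([{ functionCall: { name: "read_file" } }]));        // true
```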

Two regression tests pin the empirical pattern:
- test/unit/stream-collector.test.ts — collector-level
- test/unit/ask-agentic.test.ts — agentic-loop-level

766 tests pass (764 → 766).

Why review missed it: gemini Round-1 called out this class of bug; the
fold targeted `parts: []` (empty array) and pinned that shape. The
empirical `parts: [{ text: '' }]` (length-1 array with empty content)
was not captured in any pre-merge test because no one captured a raw
stream from live Gemini before merging. Will extend
real-gemini.smoke.test.ts in a follow-up to add a pre-merge live smoke
covering streaming + tool-calling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/6step on the v1.16.3 hotfix surfaced two non-blocking issues that need
documentation in docs/FOLLOW-UP-PRS.md, not code changes:

T35 — collectStream multi-chunk parts fragmentation
  Pre-existing latent bug in code.tool.ts:726-733 (iterates candidates for
  executableCode + codeExecutionResult). If Gemini ever emits these parts
  in separate chunks, last-write-wins on lastCandidates drops the earlier
  one. v1.16.3 doesn't fix this — it's not the empirical regression. True
  parts accumulation would address it. Fix-on-evidence; zero empirical
  reports in v1.14.x → v1.16.3 cycle.

T36 — toolCall not in v1.16.3 content-bearing predicate
  ACCEPTED-DEFERRED for completeness. Server-side tool variant
  (Part.toolCall vs Part.functionCall). Currently unused in our codebase
  (we configure functionDeclarations, never server-side tools). Add when
  enabling Gemini's grounded-search or similar.

No source code changes. Documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same-release fold of T35 (multi-chunk parts fragmentation), escalated
from "deferred / no empirical reports" after end-to-end MCP smoke-test
on 2026-05-01 surfaced the empirical failure: ask_agentic against live
Gemini Pro returned 400 "missing thought_signature, position 2" on
multi-file prompts (Test 5).

Hotfix-A's content-bearing-parts gate kept ONE chunk's parts under
last-write-wins. Single-FC prompts (Tests 1–4) all passed. Parallel
multi-FC prompts dropped FC1's Gemini-3-mandated thoughtSignature —
the "FIRST functionCall part" signature contract per
https://ai.google.dev/gemini-api/docs/thought-signatures — because
chunks 2..N each overwrote the signature-bearing chunk 1.
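
To illustrate why the gate alone was insufficient, the sketch below combines a content-bearing check with last-write-wins. It assumes each parallel functionCall arrives in its own chunk with the signature on the first; the flattened `Chunk` type, the `gateThenOverwrite` name, and the `"sig-1"` value are placeholders for illustration.

```typescript
interface Part {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
  thoughtSignature?: string;
}
// Flattened stand-in for candidates[0].content.parts of one stream chunk.
interface Chunk {
  parts: Part[];
}

const bearsContent = (p: Part): boolean =>
  p.functionCall !== undefined || (p.text ?? "").length > 0;

// Hotfix-A style: gate each chunk, then let the latest passing chunk win.
function gateThenOverwrite(chunks: Chunk[]): Part[] {
  let kept: Part[] = [];
  for (const { parts } of chunks) {
    if (parts.some(bearsContent)) kept = parts;
  }
  return kept;
}

const parallelReads: Chunk[] = [
  { parts: [{ functionCall: { name: "read_file", args: { path: "package.json" } }, thoughtSignature: "sig-1" }] },
  { parts: [{ functionCall: { name: "read_file", args: { path: "server.json" } } }] },
  { parts: [{ functionCall: { name: "read_file", args: { path: "CHANGELOG.md" } } }] },
  { parts: [{ text: "" }] }, // terminator: rejected by the gate
];

const survivors = gateThenOverwrite(parallelReads);
const signatureKept = survivors.some((p) => p.thoughtSignature !== undefined);
console.log(survivors.length, signatureKept); // 1 false
```

Only the third functionCall survives in this sketch, so the signature attached to the first one never reaches the follow-up request.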

Fix: collectStream no longer gates or overwrites — accumulates parts
across every chunk verbatim. Empty-text terminator parts, parallel
functionCall parts, and standalone signature-bearing parts all
preserved exactly as Gemini emitted them. Candidate scaffold
(finishReason / safetyRatings / groundingMetadata / citationMetadata)
stays last-write-wins; on stream exit the final candidates shape is
synthesised by overlaying accumulated parts onto the last seen
scaffold.

Side-effects verified clean:
  - ask_agentic iterResult.signatures still operates on parts array
  - content-aware NO_PROGRESS dedupe (v1.16.0) keys on (name, args)
    pairs in signatures — no change
  - ask + code text extraction reads accumulated response.text — no
    change
  - code.tool.ts executableCode + codeExecutionResult extraction now
    correct across chunks (closes T35 secondary class — bonus)
  - T36 also resolved by construction: accumulation preserves toolCall
    parts the same way as functionCall, no predicate gating remains

Tests: 771 pass (766 → 771; +5 regression pins covering parallel-FC
across 3 chunks pinning thoughtSignature on FC1, executableCode +
codeExecutionResult cross-chunk, standalone signature on empty-text
terminator, content.role synthesis, candidates-undefined preservation
for naked-text streams).

Authoritative docs fetched verbatim during diagnosis:
  - ai.google.dev/gemini-api/docs/thought-signatures
  - ai.google.dev/gemini-api/docs/function-calling

Closes T35 (RESOLVED in same release). Closes T36 (RESOLVED-BY-T35).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qmt changed the title from "v1.16.3 HOTFIX — collectStream content-bearing parts gate (live Gemini empirical fix)" to "v1.16.3 hotfix — collectStream parts accumulation (Fix #1: empty-text terminator + Fix #2/T35: multi-chunk parts fragmentation)" on May 1, 2026
@qmt qmt requested a review from Copilot May 1, 2026 12:16

Copilot AI left a comment


Pull request overview

Hotfix release v1.16.3 addressing a v1.16.2 Gemini streaming regression by changing the stream collector to preserve all emitted candidates[0].content.parts across chunks (including empty-text terminators and multi-chunk/parallel function calls), with accompanying regression tests and release metadata updates.

Changes:

  • Update collectStream to accumulate content.parts across all chunks and synthesize the final candidate by overlaying accumulated parts onto the last candidate “scaffold” metadata.
  • Add/extend unit tests covering empty-text terminators, parallel functionCall fragmentation with thoughtSignature, and cross-chunk codeExecution parts.
  • Bump release/version metadata and document T35/T36 status in docs and CHANGELOG.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Summary per file:

| File | Description |
| --- | --- |
| src/tools/shared/stream-collector.ts | Implements parts accumulation across stream chunks and scaffold overlay on exit. |
| test/unit/stream-collector.test.ts | Adds regression coverage for multi-chunk parts accumulation and related edge cases. |
| test/unit/ask-agentic.test.ts | Adds a regression test for the empirical empty-text terminator pattern in tool-using turns. |
| CHANGELOG.md | Documents v1.16.3 hotfix rationale and scope. |
| docs/FOLLOW-UP-PRS.md | Marks T35 resolved in v1.16.3 and documents T36 status. |
| package.json | Bumps package version to 1.16.3. |
| server.json | Bumps server/package version references to 1.16.3. |


Comment thread CHANGELOG.md Outdated
### Notes

- Anyone on v1.16.2 with `ask_agentic` calls in production should upgrade IMMEDIATELY. Tool-calling iterations were silently failing on every prompt.
- T35 (true parts accumulation) — closed in this release. T36 (`toolCall` predicate completeness for future Gemini server-side tools) remains tracked in `docs/FOLLOW-UP-PRS.md` and is unaffected (the accumulation contract handles `toolCall` exactly the same way as `functionCall`).

Copilot AI May 1, 2026


CHANGELOG note says T36 “remains tracked” in FOLLOW-UP-PRS, but this PR also updates docs/FOLLOW-UP-PRS.md to mark T36 as “RESOLVED-BY-T35 in v1.16.3”. Please reconcile the release notes with the tracking doc (either mark T36 resolved here too, or adjust FOLLOW-UP-PRS if it’s still intended to be open).

Suggested change
- T35 (true parts accumulation) — closed in this release. T36 (`toolCall` predicate completeness for future Gemini server-side tools) remains tracked in `docs/FOLLOW-UP-PRS.md` and is unaffected (the accumulation contract handles `toolCall` exactly the same way as `functionCall`).
- T35 (true parts accumulation) — closed in this release. T36 (`toolCall` predicate completeness for future Gemini server-side tools) is also resolved by T35 in v1.16.3 (the accumulation contract handles `toolCall` exactly the same way as `functionCall`).

Comment thread test/unit/ask-agentic.test.ts Outdated
Comment on lines +2454 to +2469
// Live API smoke against v1.16.2 published binary on 2026-05-01 returned
// `(model returned empty response)` on every tool-using prompt — both
// gemini-pro-latest and gemini-flash-latest. Raw chunk capture revealed
// the empirical terminator shape `[{text: ''}]` (length 1) which slipped
// through v1.16.2's `parts.length > 0` gate. v1.16.3 fix requires AT
// LEAST ONE part with actual content.
const root = mkdtempSync(join(tmpdir(), 'gcctx-askagent-empty-text-terminator-'));
writeFileSync(join(root, 'a.ts'), 'export const answer = 42;');

const { ctx, generateContent, generateContentStream } = buildCtx({ script: [] });
generateContent.mockReset();
generateContentStream.mockReset();
// Iter 1: empirical Gemini shape — functionCall in chunk 1, [{text:''}]
// terminator in chunk 2. Pre-fix: chunk 2 overwrites chunk 1 in
// lastCandidates → loop sees no functionCall → final-text path → empty.
// Post-fix: chunk 2 fails the content-bearing gate → chunk 1 wins.

Copilot AI May 1, 2026


This test’s commentary still describes the v1.16.3 fix as a “content-bearing parts gate” that rejects the [{text: ''}] terminator chunk. In the current implementation, collectStream accumulates parts across chunks (including empty-text terminators), so the explanation is now misleading. Please update the comments to match the accumulation-based behavior (functionCall is preserved because parts from chunk 1 and chunk 2 are both retained, and downstream extraction filters for functionCall parts).
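
The downstream filtering mentioned in this comment might look like the following sketch. The `extractFunctionCalls` helper and the types are hypothetical, not the extraction code used by `ask_agentic`.

```typescript
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}
interface Part {
  text?: string;
  functionCall?: FunctionCall;
}

// Keep only the parts that carry a functionCall; other parts are ignored here.
function extractFunctionCalls(parts: Part[]): FunctionCall[] {
  return parts.flatMap((p) => (p.functionCall !== undefined ? [p.functionCall] : []));
}

// Accumulated parts from chunk 1 (functionCall) and chunk 2 (empty-text terminator).
const accumulated: Part[] = [
  { functionCall: { name: "read_file", args: { path: "a.ts" } } },
  { text: "" },
];

const calls = extractFunctionCalls(accumulated);
console.log(calls.map((c) => c.name)); // [ 'read_file' ]
```

Because the terminator is retained but carries no functionCall, it simply drops out at this step.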

qmt and others added 2 commits May 1, 2026 14:27
…comment reconcile

R1 4-way review (GPT + Gemini + Grok + Copilot) on PR #59 surfaced a
multi-candidate type-narrowing regression in the initial T35 implementation
(GPT F1 High / Gemini F1 Nit / Grok F2 Medium — 3-way consensus, /6step
PARTIAL Medium) plus four doc/comment residues. Folds:

- A. stream-collector.ts: collapse-to-single-element synth replaced with
  per-index `Map<number, Part[]>` + `Map<number, Candidate>` accumulation.
  Pre-T35 `lastCandidates = chunk.candidates` semantics restored for any
  future caller setting `candidateCount > 1` (today none does, but
  `code.tool.ts:727` iterates ALL candidates so the narrowing was a
  regression-in-waiting). +1 multi-candidate regression pin (Fold B).
- D. stream-collector.test.ts:101 title retitled to mention hotfix-A
  history AND T35 strengthening explicitly.
- G. CHANGELOG T36 reconciled with FOLLOW-UP-PRS.md (RESOLVED-BY-T35,
  not "remains tracked").
- H. ask-agentic.test.ts:2469 comment updated to describe T35 accumulation
  behaviour instead of stale "content-bearing gate".
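
Fold A can be illustrated by contrasting the two synthesis strategies. This is a rough sketch with hypothetical names (`collapseToSingle`, `collectPerPosition`) and simplified types; it keys buckets by array position, as this fold describes.

```typescript
interface Part {
  text?: string;
}
interface Candidate {
  content?: { parts?: Part[] };
  finishReason?: string;
}
interface Chunk {
  candidates?: Candidate[];
}

// Narrowed variant: only candidates[0] is accumulated, so the result has at most one element.
function collapseToSingle(chunks: Chunk[]): Candidate[] {
  const parts: Part[] = [];
  let scaffold: Candidate | undefined;
  for (const chunk of chunks) {
    const first = chunk.candidates?.[0];
    if (first === undefined) continue;
    scaffold = first;
    parts.push(...(first.content?.parts ?? []));
  }
  return scaffold === undefined ? [] : [{ ...scaffold, content: { ...scaffold.content, parts } }];
}

// Per-position variant: one parts bucket and one scaffold per array position.
function collectPerPosition(chunks: Chunk[]): Candidate[] {
  const partsAt = new Map<number, Part[]>();
  const scaffoldAt = new Map<number, Candidate>();
  for (const chunk of chunks) {
    (chunk.candidates ?? []).forEach((cand, i) => {
      scaffoldAt.set(i, cand);
      partsAt.set(i, [...(partsAt.get(i) ?? []), ...(cand.content?.parts ?? [])]);
    });
  }
  return Array.from(scaffoldAt.entries()).map(([i, scaffold]) => ({
    ...scaffold,
    content: { ...scaffold.content, parts: partsAt.get(i) ?? [] },
  }));
}

const twoCandidateStream: Chunk[] = [
  { candidates: [{ content: { parts: [{ text: "A1" }] } }, { content: { parts: [{ text: "B1" }] } }] },
  { candidates: [{ content: { parts: [{ text: "A2" }] }, finishReason: "STOP" }, { content: { parts: [{ text: "B2" }] }, finishReason: "STOP" }] },
];

const narrowed = collapseToSingle(twoCandidateStream);
const perPosition = collectPerPosition(twoCandidateStream);
console.log(narrowed.length, perPosition.length); // 1 2
```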

ACCEPT-DEFERRED C (Grok F3 Medium): end-to-end multi-FC + thoughtSignature
test through askAgenticTool.execute. Collector contract pinned at
stream-collector.test.ts:189; live-API integration smoke is the documented
next-PR scope.

ACCEPT E (Grok F4 Low) + F (Grok F5 Low): doc-precision items consistent
with project conventions (verified empirically against test count + prior
release format).

772 tests pass (was 771). Lint + typecheck green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R2 verification (3-way: GPT + Gemini + Grok) on R1 fold commit 1adda5e
surfaced a 2-way HIGH consensus (GPT F1 / Gemini F1) on per-index Map keying
in stream-collector.ts. Folded:

- I. stream-collector.ts: keys now `cand.index ?? i` instead of loop position
  alone. Per @google/genai SDK Candidate.index doc, sparse stream emission
  like `candidates: [{ index: 1, content: ... }]` (only non-zero index)
  would cross-wire that part into bucket 0 under the loop-i keying.
  /6step verdict TP Medium (HIGH inflated by reviewers — today exercises
  only single-candidate path so practical impact is 0; fix is structural).
- I-comment: synth `.sort()` becomes load-bearing under cand.index keying
  because Map insertion order can diverge from ordinal index when chunks
  emit indices in reverse — comment added.
- J. New regression pin `keys multi-candidate buckets by Candidate.index,
  not array position` emits chunks 1+2+3 with sparse emission (chunk 1:
  index 1 only, chunk 2: index 0 only, chunk 3: index 1 only — Map
  insertion order [1, 0]) and asserts ordinal output ordering plus no
  cross-bucket bleed. Locks both keying invariant and sort load-bearing-ness.
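
The keying and sort invariants from folds I and J can be sketched as follows. The `collectByIndex` name and the minimal types are assumptions for illustration; only the `cand.index ?? i` key and the sparse emission pattern come from the commit message.

```typescript
interface Part {
  text?: string;
}
interface Candidate {
  index?: number;
  content?: { parts?: Part[] };
  finishReason?: string;
}
interface Chunk {
  candidates?: Candidate[];
}

function collectByIndex(chunks: Chunk[]): Candidate[] {
  const partsByIndex = new Map<number, Part[]>();
  const scaffoldByIndex = new Map<number, Candidate>();
  for (const chunk of chunks) {
    (chunk.candidates ?? []).forEach((cand, i) => {
      // Prefer the candidate's own index; fall back to array position.
      const key = cand.index ?? i;
      scaffoldByIndex.set(key, cand);
      const bucket = partsByIndex.get(key) ?? [];
      bucket.push(...(cand.content?.parts ?? []));
      partsByIndex.set(key, bucket);
    });
  }
  // Map insertion order may differ from ordinal order, so sort by key.
  return Array.from(scaffoldByIndex.entries())
    .sort(([a], [b]) => a - b)
    .map(([key, scaffold]) => ({
      ...scaffold,
      content: { ...scaffold.content, parts: partsByIndex.get(key) ?? [] },
    }));
}

// Sparse emission: index 1, then index 0, then index 1 again (insertion order [1, 0]).
const sparse: Chunk[] = [
  { candidates: [{ index: 1, content: { parts: [{ text: "B1" }] } }] },
  { candidates: [{ index: 0, content: { parts: [{ text: "A1" }] } }] },
  { candidates: [{ index: 1, content: { parts: [{ text: "B2" }] }, finishReason: "STOP" }] },
];

const ordered = collectByIndex(sparse);
console.log(ordered.map((c) => c.index)); // [ 0, 1 ]
```

Under position-only keying, every candidate in this stream sits at array position 0, so all three parts would land in one bucket.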

ACCEPT K (Grok R2 F2 Nit): sort was redundant under R1 keying; becomes
load-bearing under R2 keying so it stays.

773 tests pass (was 772). Lint + typecheck green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.



Comment thread src/tools/shared/stream-collector.ts Outdated
Comment on lines +37 to +40
* the final chunk — earlier chunks may report
* "STOP_REASON_UNSPECIFIED" mid-stream. We retain the LAST chunk's
* candidate scaffold and synthesise the final `candidates` array on
* exit by overlaying the accumulated `parts` onto it.

Copilot AI May 1, 2026


Docstring says the collector retains the LAST chunk’s candidate scaffold, but the implementation only updates scaffolds when chunk.candidates includes entries (e.g., a later chunk with candidates: [] leaves the earlier scaffold intact). Consider rewording to “last seen candidate scaffold per index” / “latest candidate-bearing chunk” to match actual behavior and the unit test that expects scaffold preservation when later chunks have no candidates.

Suggested change
* the final chunk earlier chunks may report
* "STOP_REASON_UNSPECIFIED" mid-stream. We retain the LAST chunk's
* candidate scaffold and synthesise the final `candidates` array on
* exit by overlaying the accumulated `parts` onto it.
* the final candidate-bearing chunk earlier chunks may report
* "STOP_REASON_UNSPECIFIED" mid-stream. We retain the last seen
* candidate scaffold per index and synthesise the final `candidates`
* array on exit by overlaying the accumulated `parts` onto it. Chunks
* with `candidates: []` do not clear a previously retained scaffold.

Comment thread test/unit/ask-agentic.test.ts Outdated
Comment on lines +2458 to +2459
// through v1.16.2's `parts.length > 0` gate. v1.16.3 fix requires AT
// LEAST ONE part with actual content.

Copilot AI May 1, 2026


This test comment says “v1.16.3 fix requires AT LEAST ONE part with actual content”, but the current v1.16.3 implementation is true parts accumulation (no content-bearing gate). Please update the comment to reflect the accumulation behavior (and, if needed, refer to the earlier hotfix-A gate as historical context) so future readers don’t infer a predicate that no longer exists.

Suggested change
// through v1.16.2's `parts.length > 0` gate. v1.16.3 fix requires AT
// LEAST ONE part with actual content.
// through v1.16.2's `parts.length > 0` gate. v1.16.3 instead does true
// parts accumulation: later chunks append their parts verbatim, so an
// empty-text terminator no longer suppresses an earlier functionCall.
// (The earlier hotfix-A "content-bearing part" gate is historical
// context only, not the current predicate.)

Pure doc/comment drift cleanup left behind by the R1 + R2 fold cycles.
No code change.

R3 Copilot review — 4 line-comments, /6step verdict:
- Findings #1, #2 (CHANGELOG T36 status + ask-agentic test inner comment,
  pinned on commit 4510f9d): FALSE POSITIVE at HEAD — already addressed
  by Fold G + Fold H in the R1 commit (1adda5e). Copilot pinned on a
  stale commit; resolve threads in the GH UI.
- Finding #3 (stream-collector.ts:35-40 docstring "LAST chunk's candidate
  scaffold", pinned on 083c57e): TRUE POSITIVE LOW. Docstring lagged the
  per-index map keying introduced by R1+R2. Reword to "last
  CANDIDATE-BEARING chunk's scaffold per `cand.index`" and document that
  a chunk emitting `candidates: []` does NOT clear earlier scaffolds.
- Finding #4 (ask-agentic.test.ts:2454-2459 outer test header
  "v1.16.3 fix requires AT LEAST ONE part with actual content",
  pinned on 083c57e): TRUE POSITIVE LOW. Co-located with the
  Fold-H-updated inner block (lines 2466-2474) — the outer header
  still narrated hotfix-A semantics. Reword to describe true parts
  accumulation; add a one-line historical note that hotfix-A's
  content-bearing gate was superseded in this same release.

Verification: lint, typecheck, 773 tests pass. dist/ unchanged in
behaviour — pure comment edits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qmt qmt merged commit 8f64110 into main May 1, 2026
4 checks passed
@qmt qmt deleted the v1.16.3-hotfix-fragmentation-gate branch May 1, 2026 14:24