Skip to content

fix: preserve tool messages in history to prevent Gemini ordering errors#251

Open
elpiarthera wants to merge 3 commits into
get-convex:mainfrom
elpiarthera:fix/issue-200-gemini-tool-history
Open

fix: preserve tool messages in history to prevent Gemini ordering errors#251
elpiarthera wants to merge 3 commits into
get-convex:mainfrom
elpiarthera:fix/issue-200-gemini-tool-history

Conversation

@elpiarthera
Copy link
Copy Markdown

@elpiarthera elpiarthera commented Apr 9, 2026

Summary

  • Fixed docsToModelMessages to preserve tool-role messages even when content appears empty — prevents Gemini's strict functionCall → functionResponse ordering from breaking
  • Added console.warn in filterOutOrphanedToolMessages when tool messages are dropped, logging the message ID, its toolCallIds, and available toolCallIds for diagnostics

Root cause

When tool messages have empty content after orphan filtering, docsToModelMessages filtered them out via .filter(m => !!m.content.length). This left gaps in the conversation history (missing functionResponse after functionCall), which Gemini rejects with "Please ensure that function call turn comes immediately after a user turn or after a function response turn."

Test plan

  • npm run build passes
  • npm run lint passes (0 errors)
  • npm test passes (262/262)

Fixes #200

Orchestrator: Zeta — VantageOS Team Dev | 2026-04-09

docsToModelMessages filtered out messages with empty content, which
could drop tool-role messages after orphan filtering. This breaks
Gemini's strict functionCall -> functionResponse ordering requirement.

Also adds console.warn when filterOutOrphanedToolMessages drops tool
messages, making it easier to diagnose missing tool results.

Fixes get-convex#200
Comment thread src/mapping.ts Outdated
.map((m) => m.message)
.filter((m) => !!m)
.filter((m) => !!m.content.length)
.filter((m) => m.role === "tool" || !!m.content.length)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we passing through all tools here just to filter them out elsewhere?

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 9, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

The filterOutOrphanedToolMessages function is refactored from a single-pass to a three-pass filtering approach: (1) collect all tool-call, tool-result, and approval identifiers; (2) filter assistant messages into an intermediate list while computing surviving tool-call ids; (3) filter tool-role messages to retain only tool-result parts whose toolCallId exists in the surviving set. It now logs a console.warn when a non-empty tool message is dropped. New Vitest tests were added to reproduce the cascade regression and assert invariants and regression cases.

Changes

Cohort / File(s) Summary
Tool Message Filtering & Tests
src/client/search.ts, src/client/search.test.ts
Refactored filterOutOrphanedToolMessages to a three-pass algorithm that tracks surviving tool-call ids and filters tool messages accordingly; added warnings when non-empty tool messages are dropped. Added four tests covering cascade regression, invariant property, regression pair preservation, and mixed-case partial-resolution behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • get-convex/agent#230: Also modifies filterOutOrphanedToolMessages in src/client/search.ts to change tool-call/tool-result/approval processing.

Suggested reviewers

  • ianmacartney
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: fixing tool message preservation in history to prevent Gemini ordering errors, which is the central fix implemented in the PR.
Description check ✅ Passed The description clearly explains the fix (preserving tool messages in history), root cause (empty content filtering), and provides test plan verification, all directly related to the changeset.
Linked Issues check ✅ Passed The PR addresses the core issue #200 requirements: prevents tool-result message drops via three-pass filtering, adds console.warn diagnostics, and ensures the Gemini-required message sequence by preserving tool messages rather than silently dropping them.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing issue #200: refactored filterOutOrphanedToolMessages, added diagnostic logging, and included comprehensive test coverage for the cascade behavior—no unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@elpiarthera
Copy link
Copy Markdown
Author

Re: @ianmacartney comment on line 138 — good point.

The intent was defensive: if a tool message ends up with empty content after filterOutOrphanedToolMessages partially strips it, the old filter would drop it entirely, breaking Gemini's strict functionCall → functionResponse ordering.

But you're right that the better fix is upstream — filterOutOrphanedToolMessages should either keep the full tool message or drop it cleanly (not leave an empty shell). The console.warn in this PR helps diagnose when that happens.

If you'd prefer, I can move the fix to filterOutOrphanedToolMessages instead — make it either keep all parts of a tool message or drop the entire message (no partial stripping). That way docsToModelMessages doesn't need the special case. Let me know which approach you prefer.

Orchestrator: Zeta — VantageOS Team Dev | 2026-04-10

@ianmacartney
Copy link
Copy Markdown
Member

ianmacartney commented Apr 10, 2026 via email

Per maintainer feedback on get-convex#251: keep tool messages fully intact or drop
them cleanly — no partial-stripped shells. Adds a two-pass filter so
tool-results are matched against SURVIVING tool-calls (assistant messages
that weren't dropped), preventing cascade orphans that caused Gemini's
functionCall → functionResponse ordering to break.

Reverts the docsToModelMessages workaround — no longer needed since
filterOutOrphanedToolMessages now guarantees consistency.
@elpiarthera
Copy link
Copy Markdown
Author

Thanks for the quick feedback, @ianmacartney — pushed the revision.

Moved the cascade handling fully into filterOutOrphanedToolMessages (no more partial stripping leaking downstream), and reverted the docsToModelMessages workaround. The filter now runs in two relevant passes after ID collection:

  1. Filter assistant messages first — any tool-call whose toolCallId has no matching tool-result and no approval request/response is stripped. If that leaves the assistant message empty, the whole message is dropped (existing behavior).
  2. Track survivingToolCallIds — only the tool-calls in kept assistant messages make it into this set. Tool-calls that were stripped OR lived in a dropped message do NOT survive.
  3. Filter tool messages against survivingToolCallIds (instead of the raw toolCallIds from pass 1) — this closes the cascade: when an assistant message is dropped, its orphaned tool-results in later messages are now correctly filtered out too. Messages become either fully kept or fully dropped — no empty shells.

Approval-workflow logic (approvalRequestsByToolCallId, approvalResponseIds, hasApprovalResponse, hasApprovalRequest) is untouched. The console.warn now logs "surviving toolCallIds" to reflect the new semantic.

All 262 tests pass (including existing filterOutOrphanedToolMessages coverage in src/client/search.test.ts). Diff is tight: 2 files, +32/-7.

Orchestrator: Zeta — VantageOS Team Dev | 2026-04-22

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/client/search.ts (1)

309-330: ⚠️ Potential issue | 🟠 Major

Do not drop pending assistant tool-call turns.

Line 309 filters out assistant tool-call parts without a result/approval, and Lines 316-330 then drop tool-call-only assistant messages. That regresses the existing src/client/index.test.ts expectation where [response1, call2] should keep call2 while dropping only the orphaned tool response.

🐛 Proposed direction
-      const content = doc.message.content.filter(
-        (p) =>
-          p.type !== "tool-call" ||
-          toolResultIds.has(p.toolCallId) ||
-          hasApprovalResponse(p.toolCallId) ||
-          hasApprovalRequest(p.toolCallId),
-      );
+      const content = doc.message.content;

If assistant tool-call pruning is now intentional, the existing tests and behavior contract should be updated explicitly; otherwise, keep assistant tool calls intact and only filter tool-role responses against surviving assistant calls.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/client/search.ts` around lines 309 - 330, The current filter in
src/client/search.ts removes assistant tool-call parts when they lack
results/approvals, which causes assistant tool-call turns to be dropped later;
change the predicate so assistant messages with type "tool-call" are always
retained and only filter out tool-role (p.role === "tool") responses that lack a
surviving call/result/approval. Concretely, modify the content filter (the block
using toolResultIds, hasApprovalResponse, hasApprovalRequest) to apply the
toolResultIds/approval checks only when p.role === "tool" (or equivalently keep
all p.type === "tool-call" where p.role === "assistant"), leaving
survivingToolCallIds and intermediate construction unchanged so assistant
tool-call turns like call2 are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/client/search.ts`:
- Around line 345-351: The current filter on doc.message.content keeps every
non-"tool-result" part, which lets orphaned "tool-approval-response" entries
pass; update the filter so that when c.type === "tool-approval-response" you
only keep it if its c.toolCallId is present in survivingToolCallIds (similar to
the "tool-result" branch), otherwise drop it; preserve the existing behavior for
truly non-tool parts by still returning true for other types. Use the existing
identifiers doc.message.content, survivingToolCallIds, and the c.type checks to
locate and change the logic.

---

Outside diff comments:
In `@src/client/search.ts`:
- Around line 309-330: The current filter in src/client/search.ts removes
assistant tool-call parts when they lack results/approvals, which causes
assistant tool-call turns to be dropped later; change the predicate so assistant
messages with type "tool-call" are always retained and only filter out tool-role
(p.role === "tool") responses that lack a surviving call/result/approval.
Concretely, modify the content filter (the block using toolResultIds,
hasApprovalResponse, hasApprovalRequest) to apply the toolResultIds/approval
checks only when p.role === "tool" (or equivalently keep all p.type ===
"tool-call" where p.role === "assistant"), leaving survivingToolCallIds and
intermediate construction unchanged so assistant tool-call turns like call2 are
preserved.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 78add62d-6c7d-456f-becc-0a1c8797e849

📥 Commits

Reviewing files that changed from the base of the PR and between 06c6795 and 0648e28.

📒 Files selected for processing (1)
  • src/client/search.ts

Comment thread src/client/search.ts
Comment on lines 345 to 351
const content = doc.message.content.filter((c) => {
// tool-result parts have toolCallId
if (c.type === "tool-result") {
return toolCallIds.has(c.toolCallId);
return survivingToolCallIds.has(c.toolCallId);
}
// tool-approval-response parts don't have toolCallId, so include them
return true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Filter approval responses against surviving approval requests.

Line 351 keeps every non-tool-result part, so orphaned tool-approval-response parts survive even when their approval request/tool call was dropped or never fetched.

🐛 Proposed fix
   const survivingToolCallIds = new Set<string>();
+  const survivingApprovalIds = new Set<string>();
   const intermediate: MessageDoc[] = [];
@@
         for (const p of content) {
           if (p.type === "tool-call") {
             survivingToolCallIds.add(p.toolCallId);
+          } else if (p.type === "tool-approval-request") {
+            survivingApprovalIds.add(p.approvalId);
           }
         }
@@
         if (c.type === "tool-result") {
           return survivingToolCallIds.has(c.toolCallId);
         }
-        // tool-approval-response parts don't have toolCallId, so include them
+        if (c.type === "tool-approval-response") {
+          return survivingApprovalIds.has(c.approvalId);
+        }
         return true;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/client/search.ts` around lines 345 - 351, The current filter on
doc.message.content keeps every non-"tool-result" part, which lets orphaned
"tool-approval-response" entries pass; update the filter so that when c.type ===
"tool-approval-response" you only keep it if its c.toolCallId is present in
survivingToolCallIds (similar to the "tool-result" branch), otherwise drop it;
preserve the existing behavior for truly non-tool parts by still returning true
for other types. Use the existing identifiers doc.message.content,
survivingToolCallIds, and the c.type checks to locate and change the logic.

@ianmacartney ianmacartney requested a review from sethconvex April 27, 2026 23:12
@magicseth
Copy link
Copy Markdown

Thanks for the refactor!
Do we have the actual stored thread messages from the failing #200 case, especially the messages immediately before the Gemini 400?

I’m trying to understand whether the corruption was:

  1. an assistant tool-call was preserved but the matching tool-result/function response was dropped,
  2. a tool-result/approval response was preserved after its assistant tool-call was dropped,
  3. an empty role: "tool" message reached docsToModelMessages and was dropped there,
  4. Gemini/provider conversion changed a tool response into a user role in a way this filter doesn’t see, or
    something else in the saved order/stepOrder sequence.

Without the concrete message sequence, it’s hard to tell whether commit 2 fixes the real failure or just a plausible orphan-filtering case. Could you add or share a minimal reproduction using the original bad history?

@elpiarthera
Copy link
Copy Markdown
Author

@magicseth — fair pushback, let me be transparent.

I don't have the original reporter's failing thread either — ianmacartney asked the same question in January (#200 (comment): "can you dump the messages in the thread earlier than that") and it was never answered. So commit 2 was reasoned from the failure mode described in the issue, not from a captured production trace.

Mapping the cascade fix against your enumeration:

  1. assistant tool-call preserved, matching tool-result dropped — not addressed by this PR; would require a different fix (drop the orphan assistant call).
  2. tool-result preserved after its assistant tool-call was droppedthis is what commit 2 targets. The previous single-pass version filtered tool messages against the original toolCallIds set, so if pass 1 of the assistant filter dropped an assistant message whose calls had no matching results, the tool messages still pointing at those (now-stale) toolCallIds would survive into docsToModelMessages. The three-pass version tracks survivingToolCallIds from the post-filter assistant set, closing that cascade.
  3. empty role: "tool" reaching docsToModelMessages — addressed indirectly: if all of a tool message's content items were orphans, the message was already dropped by the original filter; cascade handling doesn't change that.
  4. provider conversion flipping role to user — out of scope; happens after this filter in the AI SDK's Gemini adapter.

So the honest claim is narrower than "fixes #200": this PR closes one specific class of orphan cascade (case 2). Whether that's the actual root cause of the reporter's intermittent 400 is genuinely unknown without their thread dump.

I can put together a synthetic minimal repro that constructs the structural state this fix prevents (assistant message with all-orphan tool-calls + a subsequent tool message referencing one of them) and asserts the post-filter output is Gemini-valid — happy to add it to this PR if useful. Let me know.

Orchestrator: Zeta — VantageOS Team Dev | 2026-05-18

Constructs the structural state described in get-convex#200 (assistant tool-call
with no matching tool-result, followed by a tool message referencing
it) and asserts the three-pass filter drops both — closing the orphan
cascade that previously reached docsToModelMessages.

Adds four cases:
- cascade drop: assistant with orphan tool-call + stray tool message
  both dropped; output respects Gemini invariant.
- valid pair preserved: assistant tool-call + matching tool-result
  both survive untouched.
- mixed valid+orphan: assistant with one valid and one orphan
  tool-call keeps the valid one; tool message preserves the valid
  tool-result only.
- Gemini invariant property check: every surviving tool-result has
  a corresponding surviving assistant tool-call in the output.

Addresses @magicseth's request for a concrete repro on get-convex#251.
@elpiarthera
Copy link
Copy Markdown
Author

Orchestrator: Zeta — VantageOS Team Dev | 2026-05-18

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/client/search.test.ts (1)

529-644: ⚡ Quick win

Consider extracting duplicate invariant validation logic.

The Gemini invariant validation logic (lines 608-637) is nearly identical to the validation in the cascade test (lines 499-526). Extracting this into a helper function would improve maintainability and reduce duplication.

♻️ Proposed refactor to extract helper

Add a helper function in the test file:

+// Helper to validate Gemini invariant: every tool-result has a corresponding assistant tool-call
+function validateGeminiInvariant(result: MessageDoc[]) {
+  const survivingCallIds = new Set<string>();
+  for (const d of result) {
+    if (
+      d.message?.role === "assistant" &&
+      Array.isArray(d.message.content)
+    ) {
+      for (const p of d.message.content) {
+        if (p.type === "tool-call") {
+          survivingCallIds.add(
+            (p as { toolCallId: string }).toolCallId,
+          );
+        }
+      }
+    }
+  }
+
+  for (const d of result) {
+    if (d.message?.role === "tool" && Array.isArray(d.message.content)) {
+      for (const c of d.message.content) {
+        if (c.type === "tool-result") {
+          expect(
+            survivingCallIds.has(
+              (c as { toolCallId: string }).toolCallId,
+            ),
+          ).toBe(true);
+        }
+      }
+    }
+  }
+}

Then replace both validation blocks with:

-      const survivingAssistantCallIds = new Set<string>();
-      for (const d of result) {
-        // ... validation logic ...
-      }
+      validateGeminiInvariant(result);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/client/search.test.ts` around lines 529 - 644, The test duplicates the
Gemini invariant validation logic; extract it into a shared helper (e.g.
assertEveryToolResultHasSurvivingAssistantCall or validateGeminiInvariant) that
takes the filtered messages array and performs: 1) collect surviving assistant
tool-call ids by scanning message.role === "assistant" and items with type ===
"tool-call", 2) assert every tool message item with type === "tool-result"
references a collected id, and 3) optionally assert presence/absence of specific
_id's as needed; then replace the inline loops in both the cascade test and the
current test with calls to this helper to remove the duplicate blocks and
improve readability.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/client/search.test.ts`:
- Around line 529-644: The test duplicates the Gemini invariant validation
logic; extract it into a shared helper (e.g.
assertEveryToolResultHasSurvivingAssistantCall or validateGeminiInvariant) that
takes the filtered messages array and performs: 1) collect surviving assistant
tool-call ids by scanning message.role === "assistant" and items with type ===
"tool-call", 2) assert every tool message item with type === "tool-result"
references a collected id, and 3) optionally assert presence/absence of specific
_id's as needed; then replace the inline loops in both the cascade test and the
current test with calls to this helper to remove the duplicate blocks and
improve readability.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: get-convex/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 07a2eaf7-7990-498e-9c96-d25a086397df

📥 Commits

Reviewing files that changed from the base of the PR and between 0648e28 and d857523.

📒 Files selected for processing (1)
  • src/client/search.test.ts

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented May 27, 2026

Open in StackBlitz

npm i https://pkg.pr.new/@convex-dev/agent@251

commit: d857523

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gemini Tool Call History Corruption

3 participants