Skip to content

fix(mapping): preserve tool messages in multi-step history (#200)#272

Open
robelest wants to merge 2 commits into
get-convex:mainfrom
robelest:fix/issue-200-serialize-new-messages-watermark
Open

fix(mapping): preserve tool messages in multi-step history (#200)#272
robelest wants to merge 2 commits into
get-convex:mainfrom
robelest:fix/issue-200-serialize-new-messages-watermark

Conversation

@robelest
Copy link
Copy Markdown
Collaborator

@robelest robelest commented May 24, 2026

Problem

With Gemini models, multi-step tool conversations intermittently fail on the next request with function call turn must come immediately after a user turn or after a function response turn. Fixes #200.

serializeNewMessagesInStep decided which messages were new in a step by slicing the last 1-2 entries of step.response.messages. In AI SDK v6 that array is cumulative across steps, so the guess silently dropped messages whenever a single step produced 3 or more (text + tool-call + tool-result, for example). Dropped messages are never written to Convex; the next request loads a history with a functionCall missing its functionResponse, which Gemini rejects.

The internal generation loop in startGeneration already tracks the correct watermark, so streamText/generateText are unaffected. The bug only reached callers that hit serializeNewMessagesInStep directly: agent.saveStep and the playground API.

Fix

Replace the heuristic with an explicit watermark:

  • serializeNewMessagesInStep takes a required previousResponseMessageCount. Defaulting it would turn a contract violation into silent duplication, the same class of bug.
  • agent.saveStep takes an optional previousStep and derives the count from it. Self-documenting and harder to pass the wrong number.
  • The playground loop runs sequentially so the watermark accumulates across steps.

Behavior change

agent.saveStep is public. If you call it in a manual multi-step loop, pass the new previousStep argument so only the new messages are saved. Single-step usage needs no change.

Relationship to #251

#251 targets the same Gemini ordering error from the read side: it rewrites filterOutOrphanedToolMessages into a cascade-consistent three-pass filter, so dropping an assistant message also drops its orphaned tool-result instead of leaving a gap. That keeps the history consistent at fetch time.

This PR is the write-side counterpart: it stops the gaps from being written in the first place, so nothing needs dropping later. The two fix different layers and are independent; neither subsumes the other.

Tests

npm test (272), npm run build, npm run typecheck, npm run lint. New coverage exercises the slicing directly and the saveStep path end-to-end (a test asserts that omitting previousStep duplicates, proving the watermark matters). Manually verified against live Gemini in the playground: a multi-step getGeocoding/getWeather flow saves every tool-call paired with its result.

…et-convex#200)

step.response.messages is cumulative across steps in AI SDK v6. The
old heuristic guessed at "what's new" by slicing the last 1-2 entries,
which silently dropped messages in any step that produced 3+ response
messages (a step that interleaves text + tool-call + tool-result, for
instance). Dropped messages are never written to Convex; the next
request loads a malformed history and Gemini rejects it with its
strict functionCall/functionResponse ordering validation.

The internal generation loop in startGeneration already tracks the
correct watermark via previousResponseMessageCount. The bug only
affects callers that hit serializeNewMessagesInStep directly:
agent.saveStep (manual generation loops) and the playground API.

Replace the heuristic with an explicit watermark:

- src/mapping.ts: serializeNewMessagesInStep takes a required
  `previousResponseMessageCount: number` (no default). Defaulting it
  would have turned a contract violation into a silent duplication
  bug — the exact failure mode this PR fixes. The synthetic empty
  assistant fallback for an empty slice is preserved and now
  documented (downstream addMessages needs each step to contribute
  at least one row to anchor an order slot).

- src/client/index.ts: agent.saveStep takes an optional
  `previousStep?: StepResult<TOOLS>` instead of a raw count. Self-
  documenting at the call site, and harder to pass the wrong number.
  The method JSDoc points at the arg's contract.

- src/client/definePlaygroundAPI.ts: rewritten from
  Promise.all((await steps).map(...)) to a sequential for-loop so the
  running watermark can accumulate across steps. Tool-call round-
  trips dominate the cost; serialization is in-memory.

- src/mapping.test.ts: six tests covering first/middle/multi-message
  steps, content-shape assertions (not just role/length), a
  regression case for the actually-broken shape (text + tool-call +
  tool-result in one step, where the old slice(-2) would have
  dropped the leading text), the empty-slice fallback, and pinned
  behavior for caller-drift (watermark > messages.length).

- src/client/index.test.ts: two caller-side tests through the public
  saveStep API using mockModel contentSteps — one asserts that
  passing previousStep saves each step's messages exactly once, the
  other asserts that omitting it duplicates prior messages.

Note: the handoff suggested step.request.messages as the baseline —
that field doesn't exist in AI SDK v6 (step.request is
LanguageModelRequestMetadata which is {body?: unknown}).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 24, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: get-convex/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 24daf1c2-a147-4638-89ba-88624f24fe03

📥 Commits

Reviewing files that changed from the base of the PR and between c730935 and 4ddce8e.

📒 Files selected for processing (2)
  • src/client/index.ts
  • src/mapping.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/client/index.ts

📝 Walkthrough

Walkthrough

This PR replaces a heuristic for selecting "new" step messages with explicit watermarking. serializeNewMessagesInStep now accepts previousResponseMessageCount and slices step.response.messages to serialize only newly produced messages, emitting a synthetic empty assistant message when necessary. Agent.saveStep accepts an optional previousStep to compute and validate the watermark, and generateText was changed to process steps sequentially while threading the watermark. Unit and integration tests cover delta serialization and replay behavior with and without watermarking.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant generateText
  participant serializeNewMessagesInStep
  participant Agent.saveStep
  participant ConvexDB
  Client->>generateText: invoke generateText(steps)
  generateText->>serializeNewMessagesInStep: serialize step with previousResponseMessageCount
  serializeNewMessagesInStep->>generateText: return MessageDoc[]
  generateText->>Agent.saveStep: saveStep(step, previousStep?)
  Agent.saveStep->>serializeNewMessagesInStep: compute & pass previousResponseMessageCount
  Agent.saveStep->>ConvexDB: persist MessageDocs
  ConvexDB-->>Agent.saveStep: ack
  Agent.saveStep-->>generateText: saved messages
  generateText-->>Client: aggregated messages
Loading

Possibly Related PRs

  • get-convex/agent#235: Both PRs modify src/mapping.ts's serializeNewMessagesInStep function and message serialization pipeline; this PR further changes it to use explicit previousResponseMessageCount parameter.

Suggested Reviewers

  • ianmacartney
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: fixing message preservation in multi-step tool histories by addressing the watermark issue in serializeNewMessagesInStep.
Description check ✅ Passed The description comprehensively explains the problem (heuristic message slicing dropping tool messages), the fix (explicit watermark), behavior changes (previousStep parameter), and relationship to other work.
Linked Issues check ✅ Passed The PR fully addresses issue #200 objectives: it replaces heuristic message detection with a deterministic watermark-based approach, ensures all messages are preserved in multi-step scenarios, provides write-side protection, and includes comprehensive tests.
Out of Scope Changes check ✅ Passed All code changes are directly scoped to fixing the watermark-based message serialization: updates to definePlaygroundAPI loop sequencing, serializeNewMessagesInStep function signature, agent.saveStep previousStep parameter, and comprehensive test coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/client/index.ts (1)

1230-1241: ⚡ Quick win

Add a monotonic watermark guard in saveStep.

This path trusts previousStep blindly. A mismatched previousStep can produce an invalid watermark and silently serialize the wrong delta. Fail fast when the previous count exceeds the current step’s cumulative count.

Suggested patch
-    const previousResponseMessageCount =
-      args.previousStep?.response.messages.length ?? 0;
+    const previousResponseMessageCount =
+      args.previousStep?.response.messages.length ?? 0;
+    const currentResponseMessageCount = args.step.response.messages.length;
+    if (previousResponseMessageCount > currentResponseMessageCount) {
+      throw new Error(
+        "Invalid previousStep: previous step has more response messages than current step. " +
+          "Pass the immediate prior step from the same generation loop.",
+      );
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/client/index.ts` around lines 1230 - 1241, The saveStep path trusts
args.previousStep blindly which can yield an invalid watermark; before calling
serializeNewMessagesInStep (or before using previousResponseMessageCount),
compute the current step's cumulative response message count from args.step (or
the same source serializeNewMessagesInStep would use) and if
args.previousStep.response.messages.length > current cumulative count, throw or
return an error to fail fast; update the logic around
previousResponseMessageCount and the call site in
saveStep/serializeNewMessagesInStep to validate and reject mismatched
previousStep values to avoid silently serializing the wrong delta.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/client/index.ts`:
- Around line 1230-1241: The saveStep path trusts args.previousStep blindly
which can yield an invalid watermark; before calling serializeNewMessagesInStep
(or before using previousResponseMessageCount), compute the current step's
cumulative response message count from args.step (or the same source
serializeNewMessagesInStep would use) and if
args.previousStep.response.messages.length > current cumulative count, throw or
return an error to fail fast; update the logic around
previousResponseMessageCount and the call site in
saveStep/serializeNewMessagesInStep to validate and reject mismatched
previousStep values to avoid silently serializing the wrong delta.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: get-convex/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b6dc9a6b-a053-441e-be93-a466cc02a836

📥 Commits

Reviewing files that changed from the base of the PR and between 3cf5064 and c730935.

📒 Files selected for processing (5)
  • src/client/definePlaygroundAPI.ts
  • src/client/index.test.ts
  • src/client/index.ts
  • src/mapping.test.ts
  • src/mapping.ts

Copy link
Copy Markdown
Member

@ianmacartney ianmacartney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you test that this works through the streamText / etc. flows? I don't see where streaming would pass in the previous step

Adds two tests demonstrating the cumulative response.messages behavior
in AI SDK v6: without the watermark, step 2 re-saves all 5 cumulative
messages (the bug); with previousStep.response.messages.length as the
watermark, only the 2 new messages are saved (the fix).

Also adds the CodeRabbit guard: saveStep throws if step.response.messages
is shorter than previousStep.response.messages, which indicates a
mismatched previousStep from a different generation loop.
@robelest
Copy link
Copy Markdown
Collaborator Author

robelest commented May 27, 2026

streamText/generateText don't call the public saveStep; they track previousResponseMessageCount internally in start.ts. previousStep is only for callers managing their own step loop.

Added tests in mapping.test.ts showing both sides: without the watermark, step 2 re-saves all 5 cumulative messages; with previousStep, only the 2 new messages are saved.

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented May 27, 2026

Open in StackBlitz

npm i https://pkg.pr.new/@convex-dev/agent@272

commit: 4ddce8e

Copy link
Copy Markdown
Member

@ianmacartney ianmacartney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had missed that we're already doing this in start.ts - Seth must have made that fix but not applied it to the other places we save steps.
Hopefully v7 doesn't break this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gemini Tool Call History Corruption

2 participants