fix(mapping): preserve tool messages in multi-step history (#200) by robelest · Pull Request #272 · get-convex/agent

robelest · 2026-05-24T17:56:38Z

Problem

With Gemini models, multi-step tool conversations intermittently fail on the next request with function call turn must come immediately after a user turn or after a function response turn. Fixes #200.

serializeNewMessagesInStep decided which messages were new in a step by slicing the last 1-2 entries of step.response.messages. In AI SDK v6 that array is cumulative across steps, so the guess silently dropped messages whenever a single step produced 3 or more (text + tool-call + tool-result, for example). Dropped messages are never written to Convex; the next request loads a history with a functionCall missing its functionResponse, which Gemini rejects.

The internal generation loop in startGeneration already tracks the correct watermark, so streamText/generateText are unaffected. The bug only reached callers that hit serializeNewMessagesInStep directly: agent.saveStep and the playground API.

Fix

Replace the heuristic with an explicit watermark:

serializeNewMessagesInStep takes a required previousResponseMessageCount. Defaulting it would turn a contract violation into silent duplication, the same class of bug.
agent.saveStep takes an optional previousStep and derives the count from it. Self-documenting and harder to pass the wrong number.
The playground loop runs sequentially so the watermark accumulates across steps.

Behavior change

agent.saveStep is public. If you call it in a manual multi-step loop, pass the new previousStep argument so only the new messages are saved. Single-step usage needs no change.

Relationship to #251

#251 targets the same Gemini ordering error from the read side: it rewrites filterOutOrphanedToolMessages into a cascade-consistent three-pass filter, so dropping an assistant message also drops its orphaned tool-result instead of leaving a gap. That keeps the history consistent at fetch time.

This PR is the write-side counterpart: it stops the gaps from being written in the first place, so nothing needs dropping later. The two fix different layers and are independent; neither subsumes the other.

Tests

npm test (272), npm run build, npm run typecheck, npm run lint. New coverage exercises the slicing directly and the saveStep path end-to-end (a test asserts that omitting previousStep duplicates, proving the watermark matters). Manually verified against live Gemini in the playground: a multi-step getGeocoding/getWeather flow saves every tool-call paired with its result.

…et-convex#200) step.response.messages is cumulative across steps in AI SDK v6. The old heuristic guessed at "what's new" by slicing the last 1-2 entries, which silently dropped messages in any step that produced 3+ response messages (a step that interleaves text + tool-call + tool-result, for instance). Dropped messages are never written to Convex; the next request loads a malformed history and Gemini rejects it with its strict functionCall/functionResponse ordering validation. The internal generation loop in startGeneration already tracks the correct watermark via previousResponseMessageCount. The bug only affects callers that hit serializeNewMessagesInStep directly: agent.saveStep (manual generation loops) and the playground API. Replace the heuristic with an explicit watermark: - src/mapping.ts: serializeNewMessagesInStep takes a required `previousResponseMessageCount: number` (no default). Defaulting it would have turned a contract violation into a silent duplication bug — the exact failure mode this PR fixes. The synthetic empty assistant fallback for an empty slice is preserved and now documented (downstream addMessages needs each step to contribute at least one row to anchor an order slot). - src/client/index.ts: agent.saveStep takes an optional `previousStep?: StepResult<TOOLS>` instead of a raw count. Self- documenting at the call site, and harder to pass the wrong number. The method JSDoc points at the arg's contract. - src/client/definePlaygroundAPI.ts: rewritten from Promise.all((await steps).map(...)) to a sequential for-loop so the running watermark can accumulate across steps. Tool-call round- trips dominate the cost; serialization is in-memory. - src/mapping.test.ts: six tests covering first/middle/multi-message steps, content-shape assertions (not just role/length), a regression case for the actually-broken shape (text + tool-call + tool-result in one step, where the old slice(-2) would have dropped the leading text), the empty-slice fallback, and pinned behavior for caller-drift (watermark > messages.length). - src/client/index.test.ts: two caller-side tests through the public saveStep API using mockModel contentSteps — one asserts that passing previousStep saves each step's messages exactly once, the other asserts that omitting it duplicates prior messages. Note: the handoff suggested step.request.messages as the baseline — that field doesn't exist in AI SDK v6 (step.request is LanguageModelRequestMetadata which is {body?: unknown}).

coderabbitai · 2026-05-24T18:00:03Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository: get-convex/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 24daf1c2-a147-4638-89ba-88624f24fe03

📥 Commits

Reviewing files that changed from the base of the PR and between c730935 and 4ddce8e.

📒 Files selected for processing (2)

src/client/index.ts
src/mapping.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/client/index.ts

📝 Walkthrough

Walkthrough

This PR replaces a heuristic for selecting "new" step messages with explicit watermarking. serializeNewMessagesInStep now accepts previousResponseMessageCount and slices step.response.messages to serialize only newly produced messages, emitting a synthetic empty assistant message when necessary. Agent.saveStep accepts an optional previousStep to compute and validate the watermark, and generateText was changed to process steps sequentially while threading the watermark. Unit and integration tests cover delta serialization and replay behavior with and without watermarking.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant generateText
  participant serializeNewMessagesInStep
  participant Agent.saveStep
  participant ConvexDB
  Client->>generateText: invoke generateText(steps)
  generateText->>serializeNewMessagesInStep: serialize step with previousResponseMessageCount
  serializeNewMessagesInStep->>generateText: return MessageDoc[]
  generateText->>Agent.saveStep: saveStep(step, previousStep?)
  Agent.saveStep->>serializeNewMessagesInStep: compute & pass previousResponseMessageCount
  Agent.saveStep->>ConvexDB: persist MessageDocs
  ConvexDB-->>Agent.saveStep: ack
  Agent.saveStep-->>generateText: saved messages
  generateText-->>Client: aggregated messages

Possibly Related PRs

get-convex/agent#235: Both PRs modify src/mapping.ts's serializeNewMessagesInStep function and message serialization pipeline; this PR further changes it to use explicit previousResponseMessageCount parameter.

Suggested Reviewers

ianmacartney

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: fixing message preservation in multi-step tool histories by addressing the watermark issue in serializeNewMessagesInStep.
Description check	✅ Passed	The description comprehensively explains the problem (heuristic message slicing dropping tool messages), the fix (explicit watermark), behavior changes (previousStep parameter), and relationship to other work.
Linked Issues check	✅ Passed	The PR fully addresses issue `#200` objectives: it replaces heuristic message detection with a deterministic watermark-based approach, ensures all messages are preserved in multi-step scenarios, provides write-side protection, and includes comprehensive tests.
Out of Scope Changes check	✅ Passed	All code changes are directly scoped to fixing the watermark-based message serialization: updates to definePlaygroundAPI loop sequencing, serializeNewMessagesInStep function signature, agent.saveStep previousStep parameter, and comprehensive test coverage.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

src/client/index.ts (1)

1230-1241: ⚡ Quick win

Add a monotonic watermark guard in saveStep.

This path trusts previousStep blindly. A mismatched previousStep can produce an invalid watermark and silently serialize the wrong delta. Fail fast when the previous count exceeds the current step’s cumulative count.

Suggested patch

-    const previousResponseMessageCount =
-      args.previousStep?.response.messages.length ?? 0;
+    const previousResponseMessageCount =
+      args.previousStep?.response.messages.length ?? 0;
+    const currentResponseMessageCount = args.step.response.messages.length;
+    if (previousResponseMessageCount > currentResponseMessageCount) {
+      throw new Error(
+        "Invalid previousStep: previous step has more response messages than current step. " +
+          "Pass the immediate prior step from the same generation loop.",
+      );
+    }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/client/index.ts` around lines 1230 - 1241, The saveStep path trusts
args.previousStep blindly which can yield an invalid watermark; before calling
serializeNewMessagesInStep (or before using previousResponseMessageCount),
compute the current step's cumulative response message count from args.step (or
the same source serializeNewMessagesInStep would use) and if
args.previousStep.response.messages.length > current cumulative count, throw or
return an error to fail fast; update the logic around
previousResponseMessageCount and the call site in
saveStep/serializeNewMessagesInStep to validate and reject mismatched
previousStep values to avoid silently serializing the wrong delta.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/client/index.ts`:
- Around line 1230-1241: The saveStep path trusts args.previousStep blindly
which can yield an invalid watermark; before calling serializeNewMessagesInStep
(or before using previousResponseMessageCount), compute the current step's
cumulative response message count from args.step (or the same source
serializeNewMessagesInStep would use) and if
args.previousStep.response.messages.length > current cumulative count, throw or
return an error to fail fast; update the logic around
previousResponseMessageCount and the call site in
saveStep/serializeNewMessagesInStep to validate and reject mismatched
previousStep values to avoid silently serializing the wrong delta.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: get-convex/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b6dc9a6b-a053-441e-be93-a466cc02a836

📥 Commits

Reviewing files that changed from the base of the PR and between 3cf5064 and c730935.

📒 Files selected for processing (5)

src/client/definePlaygroundAPI.ts
src/client/index.test.ts
src/client/index.ts
src/mapping.test.ts
src/mapping.ts

ianmacartney

Can you test that this works through the streamText / etc. flows? I don't see where streaming would pass in the previous step

Adds two tests demonstrating the cumulative response.messages behavior in AI SDK v6: without the watermark, step 2 re-saves all 5 cumulative messages (the bug); with previousStep.response.messages.length as the watermark, only the 2 new messages are saved (the fix). Also adds the CodeRabbit guard: saveStep throws if step.response.messages is shorter than previousStep.response.messages, which indicates a mismatched previousStep from a different generation loop.

robelest · 2026-05-27T18:31:24Z

streamText/generateText don't call the public saveStep; they track previousResponseMessageCount internally in start.ts. previousStep is only for callers managing their own step loop.

Added tests in mapping.test.ts showing both sides: without the watermark, step 2 re-saves all 5 cumulative messages; with previousStep, only the 2 new messages are saved.

pkg-pr-new · 2026-05-27T22:32:05Z

Open in StackBlitz

npm i https://pkg.pr.new/@convex-dev/agent@272

commit: 4ddce8e

ianmacartney

I had missed that we're already doing this in start.ts - Seth must have made that fix but not applied it to the other places we save steps.
Hopefully v7 doesn't break this

coderabbitai Bot reviewed May 24, 2026

View reviewed changes

ianmacartney reviewed May 26, 2026

View reviewed changes

ianmacartney approved these changes May 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(mapping): preserve tool messages in multi-step history (#200)#272

fix(mapping): preserve tool messages in multi-step history (#200)#272
robelest wants to merge 2 commits into
get-convex:mainfrom
robelest:fix/issue-200-serialize-new-messages-watermark

robelest commented May 24, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 24, 2026 •

edited

Loading

Walkthrough

Sequence Diagram(s)

Possibly Related PRs

Suggested Reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

ianmacartney left a comment

Uh oh!

robelest commented May 27, 2026 •

edited

Loading

Uh oh!

pkg-pr-new Bot commented May 27, 2026

Uh oh!

ianmacartney left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

robelest commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Behavior change

Relationship to #251

Tests

Uh oh!

coderabbitai Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Sequence Diagram(s)

Possibly Related PRs

Suggested Reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

ianmacartney left a comment

Choose a reason for hiding this comment

Uh oh!

robelest commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pkg-pr-new Bot commented May 27, 2026

Uh oh!

ianmacartney left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

robelest commented May 24, 2026 •

edited

Loading

coderabbitai Bot commented May 24, 2026 •

edited

Loading

robelest commented May 27, 2026 •

edited

Loading