fix(mapping): preserve tool messages in multi-step history (#200)#272
fix(mapping): preserve tool messages in multi-step history (#200)#272robelest wants to merge 2 commits into
Conversation
…et-convex#200) step.response.messages is cumulative across steps in AI SDK v6. The old heuristic guessed at "what's new" by slicing the last 1-2 entries, which silently dropped messages in any step that produced 3+ response messages (a step that interleaves text + tool-call + tool-result, for instance). Dropped messages are never written to Convex; the next request loads a malformed history and Gemini rejects it with its strict functionCall/functionResponse ordering validation. The internal generation loop in startGeneration already tracks the correct watermark via previousResponseMessageCount. The bug only affects callers that hit serializeNewMessagesInStep directly: agent.saveStep (manual generation loops) and the playground API. Replace the heuristic with an explicit watermark: - src/mapping.ts: serializeNewMessagesInStep takes a required `previousResponseMessageCount: number` (no default). Defaulting it would have turned a contract violation into a silent duplication bug — the exact failure mode this PR fixes. The synthetic empty assistant fallback for an empty slice is preserved and now documented (downstream addMessages needs each step to contribute at least one row to anchor an order slot). - src/client/index.ts: agent.saveStep takes an optional `previousStep?: StepResult<TOOLS>` instead of a raw count. Self- documenting at the call site, and harder to pass the wrong number. The method JSDoc points at the arg's contract. - src/client/definePlaygroundAPI.ts: rewritten from Promise.all((await steps).map(...)) to a sequential for-loop so the running watermark can accumulate across steps. Tool-call round- trips dominate the cost; serialization is in-memory. - src/mapping.test.ts: six tests covering first/middle/multi-message steps, content-shape assertions (not just role/length), a regression case for the actually-broken shape (text + tool-call + tool-result in one step, where the old slice(-2) would have dropped the leading text), the empty-slice fallback, and pinned behavior for caller-drift (watermark > messages.length). - src/client/index.test.ts: two caller-side tests through the public saveStep API using mockModel contentSteps — one asserts that passing previousStep saves each step's messages exactly once, the other asserts that omitting it duplicates prior messages. Note: the handoff suggested step.request.messages as the baseline — that field doesn't exist in AI SDK v6 (step.request is LanguageModelRequestMetadata which is {body?: unknown}).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: get-convex/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR replaces a heuristic for selecting "new" step messages with explicit watermarking. serializeNewMessagesInStep now accepts previousResponseMessageCount and slices step.response.messages to serialize only newly produced messages, emitting a synthetic empty assistant message when necessary. Agent.saveStep accepts an optional previousStep to compute and validate the watermark, and generateText was changed to process steps sequentially while threading the watermark. Unit and integration tests cover delta serialization and replay behavior with and without watermarking. Sequence Diagram(s)sequenceDiagram
participant Client
participant generateText
participant serializeNewMessagesInStep
participant Agent.saveStep
participant ConvexDB
Client->>generateText: invoke generateText(steps)
generateText->>serializeNewMessagesInStep: serialize step with previousResponseMessageCount
serializeNewMessagesInStep->>generateText: return MessageDoc[]
generateText->>Agent.saveStep: saveStep(step, previousStep?)
Agent.saveStep->>serializeNewMessagesInStep: compute & pass previousResponseMessageCount
Agent.saveStep->>ConvexDB: persist MessageDocs
ConvexDB-->>Agent.saveStep: ack
Agent.saveStep-->>generateText: saved messages
generateText-->>Client: aggregated messages
Possibly Related PRs
Suggested Reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/client/index.ts (1)
1230-1241: ⚡ Quick winAdd a monotonic watermark guard in
saveStep.This path trusts
previousStepblindly. A mismatchedpreviousStepcan produce an invalid watermark and silently serialize the wrong delta. Fail fast when the previous count exceeds the current step’s cumulative count.Suggested patch
- const previousResponseMessageCount = - args.previousStep?.response.messages.length ?? 0; + const previousResponseMessageCount = + args.previousStep?.response.messages.length ?? 0; + const currentResponseMessageCount = args.step.response.messages.length; + if (previousResponseMessageCount > currentResponseMessageCount) { + throw new Error( + "Invalid previousStep: previous step has more response messages than current step. " + + "Pass the immediate prior step from the same generation loop.", + ); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/client/index.ts` around lines 1230 - 1241, The saveStep path trusts args.previousStep blindly which can yield an invalid watermark; before calling serializeNewMessagesInStep (or before using previousResponseMessageCount), compute the current step's cumulative response message count from args.step (or the same source serializeNewMessagesInStep would use) and if args.previousStep.response.messages.length > current cumulative count, throw or return an error to fail fast; update the logic around previousResponseMessageCount and the call site in saveStep/serializeNewMessagesInStep to validate and reject mismatched previousStep values to avoid silently serializing the wrong delta.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/client/index.ts`:
- Around line 1230-1241: The saveStep path trusts args.previousStep blindly
which can yield an invalid watermark; before calling serializeNewMessagesInStep
(or before using previousResponseMessageCount), compute the current step's
cumulative response message count from args.step (or the same source
serializeNewMessagesInStep would use) and if
args.previousStep.response.messages.length > current cumulative count, throw or
return an error to fail fast; update the logic around
previousResponseMessageCount and the call site in
saveStep/serializeNewMessagesInStep to validate and reject mismatched
previousStep values to avoid silently serializing the wrong delta.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: get-convex/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b6dc9a6b-a053-441e-be93-a466cc02a836
📒 Files selected for processing (5)
src/client/definePlaygroundAPI.tssrc/client/index.test.tssrc/client/index.tssrc/mapping.test.tssrc/mapping.ts
ianmacartney
left a comment
There was a problem hiding this comment.
Can you test that this works through the streamText / etc. flows? I don't see where streaming would pass in the previous step
Adds two tests demonstrating the cumulative response.messages behavior in AI SDK v6: without the watermark, step 2 re-saves all 5 cumulative messages (the bug); with previousStep.response.messages.length as the watermark, only the 2 new messages are saved (the fix). Also adds the CodeRabbit guard: saveStep throws if step.response.messages is shorter than previousStep.response.messages, which indicates a mismatched previousStep from a different generation loop.
|
Added tests in |
commit: |
ianmacartney
left a comment
There was a problem hiding this comment.
I had missed that we're already doing this in start.ts - Seth must have made that fix but not applied it to the other places we save steps.
Hopefully v7 doesn't break this
Problem
With Gemini models, multi-step tool conversations intermittently fail on the next request with
function call turn must come immediately after a user turn or after a function response turn. Fixes #200.serializeNewMessagesInStepdecided which messages were new in a step by slicing the last 1-2 entries ofstep.response.messages. In AI SDK v6 that array is cumulative across steps, so the guess silently dropped messages whenever a single step produced 3 or more (text + tool-call + tool-result, for example). Dropped messages are never written to Convex; the next request loads a history with afunctionCallmissing itsfunctionResponse, which Gemini rejects.The internal generation loop in
startGenerationalready tracks the correct watermark, sostreamText/generateTextare unaffected. The bug only reached callers that hitserializeNewMessagesInStepdirectly:agent.saveStepand the playground API.Fix
Replace the heuristic with an explicit watermark:
serializeNewMessagesInSteptakes a requiredpreviousResponseMessageCount. Defaulting it would turn a contract violation into silent duplication, the same class of bug.agent.saveSteptakes an optionalpreviousStepand derives the count from it. Self-documenting and harder to pass the wrong number.Behavior change
agent.saveStepis public. If you call it in a manual multi-step loop, pass the newpreviousStepargument so only the new messages are saved. Single-step usage needs no change.Relationship to #251
#251 targets the same Gemini ordering error from the read side: it rewrites
filterOutOrphanedToolMessagesinto a cascade-consistent three-pass filter, so dropping an assistant message also drops its orphaned tool-result instead of leaving a gap. That keeps the history consistent at fetch time.This PR is the write-side counterpart: it stops the gaps from being written in the first place, so nothing needs dropping later. The two fix different layers and are independent; neither subsumes the other.
Tests
npm test(272),npm run build,npm run typecheck,npm run lint. New coverage exercises the slicing directly and thesaveSteppath end-to-end (a test asserts that omittingpreviousStepduplicates, proving the watermark matters). Manually verified against live Gemini in the playground: a multi-stepgetGeocoding/getWeatherflow saves every tool-call paired with its result.