fix: handle id mismatch on multi-message assistant turns #393

Open
dariusz-did wants to merge 4 commits into main from fix/multi-message-turn-id-handling

dariusz-did commented May 8, 2026

Background — how the SDK builds an assistant message

The backend streams an assistant reply word-by-word over two kinds of events:

  • Partial — one chunk of the message in flight. Each chunk has a sequence index so the SDK can re-order them.
  • Answer — the full final string for the message, sent once at the end.

The SDK keeps two structures in processChatEvent (message-queue.ts):

```ts
items.messages = [   // what the UI eventually sees
  { id: "msg-1", role: "assistant", content: "I'll check the weather", parts: [...] }
]

chatEventQueue = {   // buffer used to assemble content from partials
  0: "I'll", 1: " check", 2: " the", 3: " weather"
}
```

getMessageContent(chatEventQueue) concatenates the buffer (0 + 1 + 2 + …). When that string changes (or an Answer arrives) the SDK updates the last message and notifies the UI via onNewMessage.
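As a rough sketch (the real implementation may differ; the type and helper below are assumptions based on the description above), getMessageContent just joins the buffered chunks in sequence order:

```typescript
// Hypothetical sketch of getMessageContent; names and types assumed from the text above.
type ChatEventQueue = Record<number, string>

function getMessageContent(queue: ChatEventQueue): string {
  // Sort by numeric sequence index so out-of-order partials are re-ordered,
  // then concatenate the chunks into the current message string.
  return Object.keys(queue)
    .map(Number)
    .sort((a, b) => a - b)
    .map((seq) => queue[seq])
    .join("")
}
```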

The whole routing decision boils down to one boolean: is this event continuing the same message, or is it a new one? That was decided by:

```ts
const isNewAssistantMessage = data.id && last?.role === 'assistant' && last.id !== data.id
```

i.e. compare the new event's id against the last assistant message in state.

The three bugs we hit

Bug A — backend uses two different ids for the same message

The orchestrator publishes chat/partial keyed by sequence.id (e.g. 588571514) but chat/answer keyed by chat_item.id (e.g. item_b997ef5e). For one logical message the SDK saw:

```
Partial { id: "588571514", "I'll" }
Partial { id: "588571514", " check" }
…
Answer  { id: "item_b997ef5e", "I'll check the weather…" }
                ↑ different id, same message
```

So isNewAssistantMessage flipped to true on the Answer and the SDK pushed a brand-new entry instead of finalising the streaming one. Result: the message rendered twice — and because the duplicate had no streamed parts yet, the UI hid it until the avatar video finished, making the post-tool reply look like it was delayed by 20s.

Bug B — the greeting duplicated

UI calls agent.speak("Hi! I'm My Agent…") so the greeting appears in chat instantly without waiting for the backend round-trip. speak() pushes the message locally with a random id. The backend then also streams the same greeting over chat/partial with its own id. Same id-mismatch logic fires, same duplicate result — two greetings side by side.

Bug C — chatEventQueue reset was broken by closure capture

```ts
let chatEventQueue = {}
const clearQueue = () => (chatEventQueue = {})  // assigns a NEW object to outer scope
```

processChatEvent receives chatEventQueue as a parameter — a captured reference to the original object. When clearQueue() reassigns the outer-scope variable to {}, the parameter inside processChatEvent still points at the old object. "Clearing" did nothing for the function that actually reads/writes the buffer, so finished content from the previous message could leak into the next one.
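A self-contained repro of the capture problem (simplified names, not the SDK's actual code):

```typescript
// Repro of the closure-capture bug: clearQueue reassigns the outer variable,
// but a function that received the queue as a parameter keeps the old object.
let chatEventQueue: Record<number, string> = { 0: "stale content" }

const clearQueue = () => (chatEventQueue = {}) // assigns a NEW object to the outer scope

function processChatEvent(queue: Record<number, string>): string {
  clearQueue() // intended to empty the buffer...
  return Object.values(queue).join("") // ...but `queue` still points at the old object
}

const leaked = processChatEvent(chatEventQueue)
// leaked is still "stale content" — the "cleared" buffer remains readable inside the function
```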

The fix — change by change

1. speak() pushes with id: '' (agent-manager/index.ts)

```diff
- id: getRandom(),
+ id: '',
```

Empty id is a signal: "this assistant entry is local — adopt the backend's id when it starts streaming the same content."

2. Greeting reclaim in processChatEvent

```ts
// Adopt the backend's id onto a locally-pushed greeting (empty id) so the stream merges in.
if (event === Partial && data.id && last?.role === 'assistant' && !last.id) {
  last.id = data.id
  last.content = ''
  last.parts = []
}
```

When the first Partial of the greeting arrives and the last message has an empty id, take the backend id, clear the body, and let the normal streaming path repopulate it. One greeting, not two.

3. isStreamingThisTurn heuristic (Bug A)

```ts
// Mid-stream `Answer` (queue non-empty) finalises the streaming message regardless of id.
const isStreamingThisTurn = Object.keys(chatEventQueue).length > 0
const isNewAssistantMessage =
  !!data.id &&
  last?.role === 'assistant' &&
  last.id !== data.id &&
  (event === Partial || !isStreamingThisTurn)
```

We treat the event as a new message only when:

  • it's a Partial (Partials always open a new message), or
  • it's an Answer and the buffer is empty (no streaming was happening this turn).

chatEventQueue is the proxy for "we're mid-stream":

  • Fluent (V2): every message goes through Partials first, so by the time Answer arrives the buffer is full → isStreamingThisTurn=true → the Answer is treated as the finaliser of the streaming message, even if its id is chat_item.id rather than sequence.id.
  • Clips/talks (V1): some flows emit only consecutive Answers (no Partials). Buffer stays empty → isStreamingThisTurn=false → an Answer with a new id is correctly treated as a new message.
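The two flows can be illustrated with a standalone version of the predicate (the event names and object shapes are simplified for the example, not the SDK's real types):

```typescript
// Simplified stand-in for the routing decision; constants and shapes are assumptions.
const Partial = "chat/partial"
const Answer = "chat/answer"

type Message = { role: string; id: string }

function isNewAssistantMessage(
  event: string,
  data: { id: string },
  last: Message | undefined,
  chatEventQueue: Record<number, string>,
): boolean {
  // Non-empty buffer ⇒ partials were received this turn ⇒ we are mid-stream.
  const isStreamingThisTurn = Object.keys(chatEventQueue).length > 0
  return (
    !!data.id &&
    last !== undefined &&
    last.role === "assistant" &&
    last.id !== data.id &&
    (event === Partial || !isStreamingThisTurn)
  )
}

// Fluent (V2): Answer arrives mid-stream under a different id → finalise, not new.
const v2 = isNewAssistantMessage(
  Answer, { id: "item_b997ef5e" },
  { role: "assistant", id: "588571514" }, { 0: "I'll" },
) // → false

// Clips/talks (V1): consecutive Answers, buffer empty → correctly a new message.
const v1 = isNewAssistantMessage(
  Answer, { id: "item_2" },
  { role: "assistant", id: "item_1" }, {},
) // → true
```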

4. clearQueue() after every Answer

```ts
// Reset the buffer so the next turn's `isStreamingThisTurn` check starts fresh.
if (event === Answer) {
  clearQueue()
}
```

The isStreamingThisTurn heuristic relies on the buffer being empty between turns. We have to clear it explicitly when a turn closes — otherwise the V1 multi-Answer path would see leftover state and misclassify the next Answer.

5. Push messages with parts already populated

```diff
  currentMessage = {
    id: …,
    content: initialContent,
-   parts: [],
+   parts: parseMessagePartsMemo(initialContent),
    …
  }
```

Previously a fresh message was pushed with parts: []. The follow-up update only ran if currentMessage.content !== messageContent, which is false on the very first Partial (both equal data.content). The UI would briefly see an assistant message with empty parts and hide it. Populating parts at push time means the UI never observes a parts-empty assistant entry.

6. clearQueue mutates instead of reassigning (Bug C)

```diff
- let chatEventQueue: ChatEventQueue = {}
- const clearQueue = () => (chatEventQueue = {})
+ const chatEventQueue: ChatEventQueue = {}
+ const clearQueue = () => {
+   for (const key of Object.keys(chatEventQueue)) {
+     delete chatEventQueue[key as keyof ChatEventQueue]
+   }
+ }
```

Mutating the same object means every captured reference (including the parameter inside processChatEvent) observes the reset.
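A quick standalone check (simplified names) that the mutating version behaves as intended, with every alias of the queue observing the reset:

```typescript
// Fixed pattern: mutate the one shared object so every captured reference sees the reset.
const chatEventQueue: Record<number, string> = { 0: "previous", 1: " turn" }

const clearQueue = () => {
  for (const key of Object.keys(chatEventQueue)) {
    delete chatEventQueue[Number(key)]
  }
}

const alias = chatEventQueue // stands in for the parameter inside processChatEvent
clearQueue()
// Object.keys(alias).length is now 0 — no leftover content can leak into the next message
```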

Bug → fix map

| Bug | Fix |
| --- | --- |
| A — Answer id mismatch (`chat_item.id` vs `sequence.id`) | Changes 3 + 4 |
| B — Greeting duplicates | Changes 1 + 2 |
| C — Content leak between messages | Change 6 |
| UI hides parts-empty message | Change 5 |

Test plan

  • yarn test — 414 tests passing, including existing multi-message turn and greeting suites.
  • Live test against dev: greeting renders once; pre-tool ack and post-tool reply both stream live alongside the avatar video; no duplicates.

Backends emit chat/partial keyed by sequence.id but chat/answer keyed by
chat_item.id, so the SDK was treating each final answer as a brand-new
message and rendering duplicates of the message it had just streamed.
Greetings pushed locally via speak() also collided with the backend stream.

Changes:
- speak() pushes assistant messages with an empty id; the message-queue
  adopts the backend's id on the first partial.
- Mid-stream Answer (queue non-empty) always finalises the streaming
  message, ignoring an id mismatch from chat_item.id vs sequence.id.
- Push messages with parts already populated so the UI never receives a
  parts-empty assistant entry.
- clearQueue mutates the queue object so the closure inside processChatEvent
  observes the reset (previously reassigned the outer-scope variable; the
  inner reference still pointed at the old object).
- Clear the buffer after each Answer so the next turn's "is streaming?"
  check starts fresh (keeps clips/talks multi-Answer flow intact).