
fix(voice/room_io): non-delta transcription final-stream races#1765

Open
tsushanth wants to merge 1 commit into livekit:main from tsushanth:fix/non-delta-transcription-final-text



@tsushanth


Summary

ParticipantTranscriptionOutput (non-delta path, default user-transcription forwarding) had two race conditions that surfaced with Deepgram-style mid-utterance final bursts where multiple is_final chunks arrive back-to-back.

Bug 1 — next-segment capture overwrites the in-flight flush text

handleFlush() schedules flushTaskImpl, which reads this.latestText only when it later runs and writes the lk.transcription_final: "true" stream. If the next segment's first captureText() lands before that task runs, it overwrites latestText, so segment A's final stream publishes segment B's text.

Observed in the issue: the learner said "So, you made a big purchase with, a service. You tell me what exactly it was?" Segment A's final stream arrived carrying just "service." (the next chunk's fragment), and the client, which keeps one entry per lk.segment_id with last write wins, replaced the full sentence with that fragment.
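A minimal sketch of the client behaviour described above, assuming a hypothetical store that keeps one transcript entry per lk.segment_id and lets the last write win. TranscriptStore, upsert and get are illustrative names, not a LiveKit client API; the point is only that a final carrying the wrong text silently replaces the correct one.

```typescript
// Hypothetical client-side transcript store; not part of any LiveKit SDK.
// One entry per segment id, and a later write simply replaces the earlier one.
class TranscriptStore {
  private readonly entries = new Map<string, string>();

  upsert(segmentId: string, text: string): void {
    this.entries.set(segmentId, text); // last write wins
  }

  get(segmentId: string): string | undefined {
    return this.entries.get(segmentId);
  }
}

const store = new TranscriptStore();
// Segment A's full sentence is shown first...
store.upsert(
  'segment-A',
  'So, you made a big purchase with, a service. You tell me what exactly it was?',
);
// ...then segment A's final stream arrives carrying the next chunk's fragment.
store.upsert('segment-A', 'service.');
console.log(store.get('segment-A')); // service.
```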

Fix: snapshot latestText when the flush task is scheduled and pass it as a parameter to flushTaskImpl.

 protected handleFlush() {
   const currWriter = this.writer;
   this.writer = null;
-  this.flushTask = Task.from((controller) => this.flushTaskImpl(currWriter, controller.signal));
+  const textToFlush = this.latestText;
+  this.flushTask = Task.from((controller) =>
+    this.flushTaskImpl(currWriter, textToFlush, controller.signal),
+  );
 }
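To make the ordering concrete, here is a self-contained sketch that models the flush task as a deferred callback. DeferredQueue, SketchOutput, LazyReadOutput, SnapshotOutput and replay are hypothetical names for illustration only; the real class uses Task.from and an async flushTaskImpl, which this sketch does not reproduce. It contrasts reading latestText when the task runs with snapshotting it when the task is scheduled.

```typescript
// Hypothetical, simplified model of the non-delta flush path; not the real
// ParticipantTranscriptionOutput. Deferred callbacks stand in for a flush task
// that has been scheduled but has not run yet.
type Thunk = () => void;

class DeferredQueue {
  private pending: Thunk[] = [];
  schedule(fn: Thunk): void {
    this.pending.push(fn);
  }
  runAll(): void {
    const tasks = this.pending;
    this.pending = [];
    for (const task of tasks) task();
  }
}

abstract class SketchOutput {
  latestText = '';
  readonly published: string[] = [];
  protected readonly queue: DeferredQueue;

  constructor(queue: DeferredQueue) {
    this.queue = queue;
  }

  captureText(text: string): void {
    this.latestText = text;
  }

  abstract handleFlush(): void;
}

// Pre-fix shape: the deferred task reads this.latestText only when it runs.
class LazyReadOutput extends SketchOutput {
  handleFlush(): void {
    this.queue.schedule(() => this.published.push(this.latestText));
  }
}

// Post-fix shape: the text is snapshotted when the task is scheduled.
class SnapshotOutput extends SketchOutput {
  handleFlush(): void {
    const textToFlush = this.latestText;
    this.queue.schedule(() => this.published.push(textToFlush));
  }
}

function replay(makeOutput: (q: DeferredQueue) => SketchOutput): string[] {
  const queue = new DeferredQueue();
  const out = makeOutput(queue);
  out.captureText('segment A: full sentence');
  out.handleFlush(); // segment A's final flush is scheduled...
  out.captureText('segment B: fragment'); // ...but segment B's capture lands first
  queue.runAll(); // only now does the flush task run
  return out.published;
}

console.log(JSON.stringify(replay((q) => new LazyReadOutput(q)))); // ["segment B: fragment"]
console.log(JSON.stringify(replay((q) => new SnapshotOutput(q)))); // ["segment A: full sentence"]
```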

Bug 2 — resetState() wipes the captured text on a final-only first event

captureText() sets this.latestText = payload, then handleCaptureText() runs resetState() (which sets latestText = '') when a new segment starts. For any segment whose first event is already is_final — common with multi-final bursts — the final stream publishes an empty string. An empty write produces no chunk, so subscribers keyed on the segment never receive the final text.

Fix: restore latestText from the captured payload immediately after the resetState() call in handleCaptureText. The legacy class (ParticipantLegacyTranscriptionOutput) is unaffected — it tracks pushedText separately and resets that explicitly.

 if (!this.capturing) {
   this.resetState();
   this.capturing = true;
+  this.latestText = text;
 }
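The Bug 2 ordering can be reproduced in isolation with a sketch like the one below. CaptureSketch and its restoreAfterReset switch are hypothetical; the method names follow the PR description, but flush() here is a synchronous stand-in that simply returns the text the final stream would carry, and the real class does considerably more.

```typescript
// Hypothetical model of the capture/reset ordering; not the real
// ParticipantTranscriptionOutput implementation.
class CaptureSketch {
  latestText = '';
  capturing = false;
  currentId = '';
  private nextId = 0;
  private readonly restoreAfterReset: boolean;

  constructor(restoreAfterReset: boolean) {
    this.restoreAfterReset = restoreAfterReset;
  }

  resetState(): void {
    this.currentId = `seg-${this.nextId++}`; // new segment id
    this.latestText = ''; // wipes whatever was just captured
  }

  // Returns the final text for an is_final event, undefined otherwise.
  captureText(text: string, isFinal: boolean): string | undefined {
    this.latestText = text; // 1. capture the payload
    this.handleCaptureText(text); // 2. may start a new segment and reset state
    return isFinal ? this.flush() : undefined;
  }

  private handleCaptureText(text: string): void {
    if (!this.capturing) {
      this.resetState();
      this.capturing = true;
      if (this.restoreAfterReset) this.latestText = text; // the PR's fix
    }
  }

  private flush(): string {
    this.capturing = false;
    return this.latestText; // text the final stream would publish
  }
}

// A segment whose very first event is already final:
console.log(JSON.stringify(new CaptureSketch(false).captureText('hello there', true))); // ""
console.log(JSON.stringify(new CaptureSketch(true).captureText('hello there', true))); // "hello there"
```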

Test plan

  • Two new regression tests in agents/src/voice/room_io/_output.test.ts — both fail on main, both pass after the fix
  • pnpm vitest run agents/src/voice/room_io/_output.test.ts — 5/5 pass (3 existing + 2 new)
  • pnpm build:agents — clean
  • pnpm lint — no new warnings on touched files
  • pnpm format:check — clean
  • Changeset added (patch)
  • Legacy ParticipantLegacyTranscriptionOutput semantics untouched

Closes #1759

@CLAassistant

CLAassistant commented Jun 11, 2026


CLA assistant check
All committers have signed the CLA.

@changeset-bot

changeset-bot Bot commented Jun 11, 2026


🦋 Changeset detected

Latest commit: b310d8b

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 35 packages
Name Type
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-did Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-fishaudio Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-hume Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-liveavatar Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-minimax Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-mistralai Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-perplexity Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugin-soniox Patch
@livekit/agents-plugin-tavus Patch
@livekit/agents-plugins-test Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch


@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.


Comment on lines 219 to 228
 protected handleFlush() {
   const currWriter = this.writer;
   this.writer = null;
-  this.flushTask = Task.from((controller) => this.flushTaskImpl(currWriter, controller.signal));
+  // Snapshot latestText now so a subsequent captureText() for the next
+  // segment doesn't overwrite the text this flush is meant to publish.
+  const textToFlush = this.latestText;
+  this.flushTask = Task.from((controller) =>
+    this.flushTaskImpl(currWriter, textToFlush, controller.signal),
+  );
 }


🚩 currentId is not snapshotted in handleFlush, relying on await ordering for correctness

The PR correctly snapshots latestText in handleFlush() at _output.ts:224, but currentId is still read lazily from this.currentId inside flushTaskImpl → createTextWriter at _output.ts:247. This is safe in the normal capture→flush flow because handleCaptureText (_output.ts:186-188) awaits the pending flush task before calling resetState(), which generates a new currentId. However, setParticipant() (_output.ts:68-69) calls flush() and then resetState() synchronously, without awaiting the flush task. If setParticipant is called while a flush task is in flight, the flush could pick up the new segment ID. This is a pre-existing issue, not one introduced by this PR, but the asymmetry is worth noting: latestText is now robustly snapshotted, while currentId still relies on caller ordering.
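One way the asymmetry flagged above could be closed is to snapshot the segment id alongside the text. The sketch below is hypothetical (SegmentOutputSketch, setParticipantLike and drain are invented names) and models only the ordering the comment describes: flush() followed by a synchronous resetState() with no await. It is not a change made in this PR.

```typescript
// Hypothetical sketch, not the real ParticipantTranscriptionOutput: compares a
// lazy read of currentId with a snapshot taken when the flush is scheduled.
interface FinalRecord {
  segmentId: string;
  text: string;
}

class SegmentOutputSketch {
  currentId = 'seg-0';
  latestText = '';
  readonly published: FinalRecord[] = [];
  private pending: Array<() => void> = [];
  private counter = 0;
  private readonly snapshotId: boolean;

  constructor(snapshotId: boolean) {
    this.snapshotId = snapshotId;
  }

  captureText(text: string): void {
    this.latestText = text;
  }

  resetState(): void {
    this.counter += 1;
    this.currentId = `seg-${this.counter}`; // a new segment id
    this.latestText = '';
  }

  flush(): void {
    const text = this.latestText; // text is snapshotted, as in the PR
    const idAtSchedule = this.currentId;
    this.pending.push(() => {
      // When snapshotId is false, the id is read lazily at run time.
      const segmentId = this.snapshotId ? idAtSchedule : this.currentId;
      this.published.push({ segmentId, text });
    });
  }

  // Mirrors the described setParticipant() ordering: flush, then reset, no await.
  setParticipantLike(): void {
    this.flush();
    this.resetState();
  }

  drain(): void {
    const tasks = this.pending;
    this.pending = [];
    for (const task of tasks) task();
  }
}

for (const snapshotId of [false, true]) {
  const out = new SegmentOutputSketch(snapshotId);
  out.captureText('final words');
  out.setParticipantLike();
  out.drain();
  console.log(`snapshotId=${snapshotId}: ${JSON.stringify(out.published[0])}`);
}
// snapshotId=false: {"segmentId":"seg-1","text":"final words"}
// snapshotId=true: {"segmentId":"seg-0","text":"final words"}
```

With the lazy read, the final for the old segment is published under the freshly generated id; snapshotting both values at schedule time keeps them paired.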


Two races in ParticipantTranscriptionOutput on the non-delta path
(default user-transcription forwarding). Both surface with Deepgram-style
mid-utterance final bursts where multiple is_final chunks arrive
back-to-back.

1. handleFlush() read this.latestText from inside flushTaskImpl, so a
   captureText() for the next segment that landed before the flush task
   executed would overwrite the field and cause segment A's
   lk.transcription_final="true" stream to publish segment B's text
   (observed: a full sentence replaced by a follow-on fragment). Snapshot
   latestText when the task is scheduled and pass it as an argument.

2. When the first event for a fresh segment was already is_final,
   captureText set latestText = payload, then handleCaptureText called
   resetState() which cleared latestText back to "". The subsequent
   final stream then published an empty string (no chunk → subscribers
   keyed on lk.segment_id never received the text). Restore latestText
   from the captured payload immediately after the resetState() call.

Adds two regression tests (building the output instance via Object.create)
that fail without the production change.

Closes livekit#1759
tsushanth force-pushed the fix/non-delta-transcription-final-text branch from f53612b to b310d8b on June 13, 2026, 12:44

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Non-delta transcription output publishes wrong/empty text on the final stream (latestText race in ParticipantTranscriptionOutput)

2 participants