Non-delta transcription output publishes wrong/empty text on the final stream (latestText race in ParticipantTranscriptionOutput) #1759

@jvproduct

Describe the bug

Affected: @livekit/agents 1.4.4, still present on main
(agents/src/voice/room_io/_output.ts).

Setup: voice.AgentSession with Deepgram STT (interimResults: true), user
transcription forwarded to the room (non-delta path, default
transcriptionEnabled). Client renders via useTranscriptions
(@livekit/components-react 2.9.21).

Bug 1: final stream races with the next segment's capture

In ParticipantTranscriptionOutput (non-delta), handleFlush() schedules
flushTaskImpl, which later writes this.latestText to a new stream with
lk.transcription_final: "true". But captureText() assigns
this.latestText = payload before handleCaptureText() awaits the pending
flush task. When the STT emits several is_final chunks in a burst (normal for
Deepgram mid-utterance finals), the next chunk's capture overwrites
latestText before the previous segment's flush has written it, so segment A's
final stream is published carrying segment B's text.
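
To make the ordering concrete, here is a minimal sketch of the race. It is a simplified, hypothetical model and not the real ParticipantTranscriptionOutput: RacyOutput, runPending, the segment ids and the chunk strings are all made up for illustration, and the scheduled flush task is stood in for by a queue of deferred callbacks.

```typescript
// Hypothetical, simplified model of the non-delta path (not the library code).
type Published = { segmentId: string; text: string; final: boolean };

class RacyOutput {
  latestText = '';
  published: Published[] = [];
  // Stands in for the scheduled flush task: callbacks that run later.
  private pending: Array<() => void> = [];
  private segment = 0;

  captureText(payload: string): void {
    // The shared field is overwritten immediately, before any pending flush runs.
    this.latestText = payload;
  }

  flush(): void {
    const segmentId = `SG_${this.segment++}`;
    // The task reads this.latestText when it runs, not when it was scheduled.
    this.pending.push(() => {
      this.published.push({ segmentId, text: this.latestText, final: true });
    });
  }

  runPending(): void {
    for (const task of this.pending.splice(0)) task();
  }
}

const racy = new RacyOutput();
racy.captureText('So, you made a big purchase with, a'); // segment A (illustrative text)
racy.flush(); // segment A's final write is scheduled
racy.captureText('service.'); // segment B arrives before the task has run
racy.runPending();
// Segment A's final entry now carries segment B's text.
console.log(racy.published[0]);
```

In this model the entry published for SG_0 has text "service.", which matches what I saw in the room.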

Observed live: the learner said "So, you made a big purchase with, a service.
You tell me what exactly it was?" The first segment's interims displayed
correctly, then its final stream arrived carrying "service." (the next chunk's
text), and the client, which keys one entry per lk.segment_id and takes the
last write, replaced the full sentence with the fragment.

Suggested fix: snapshot the text when the flush is scheduled and pass it as an
argument instead of reading the shared field inside the task:

```ts
protected handleFlush() {
  const currWriter = this.writer;
  this.writer = null;
  // Snapshot the text now so a later captureText() cannot change what this flush writes.
  const textToFlush = this.latestText;
  this.flushTask = Task.from((controller) =>
    this.flushTaskImpl(currWriter, textToFlush, controller.signal),
  );
}
```
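
As a sanity check on the idea, here is a small self-contained sketch of the snapshot approach. It is a hypothetical toy model, not the library: SnapshotOutput and runPending are invented names, and the deferred-callback queue only approximates how the flush task is scheduled.

```typescript
// Hypothetical toy model: the flush captures the text at schedule time.
class SnapshotOutput {
  latestText = '';
  published: string[] = [];
  // Stands in for the scheduled flush task.
  private pending: Array<() => void> = [];

  captureText(payload: string): void {
    this.latestText = payload;
  }

  flush(): void {
    // Snapshot taken when the flush is scheduled, so later captures cannot change it.
    const textToFlush = this.latestText;
    this.pending.push(() => {
      this.published.push(textToFlush);
    });
  }

  runPending(): void {
    for (const task of this.pending.splice(0)) task();
  }
}

const snap = new SnapshotOutput();
snap.captureText('So, you made a big purchase with, a'); // segment A (illustrative text)
snap.flush();
snap.captureText('service.'); // segment B arrives before A's flush has run
snap.flush();
snap.runPending();
console.log(snap.published);
```

With the snapshot, each scheduled flush publishes the text that was current when it was scheduled, regardless of how many captures land before it runs.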

Bug 2: resetState() wipes the first capture's text

captureText() sets this.latestText = payload, then handleCaptureText()
runs resetState() (which sets latestText = '') when a new segment starts.
So for any segment whose first event is already final (no prior interims —
again common with multi-final bursts), the final stream publishes an empty
string. An empty write produces no chunk, so subscribers keyed on the segment
never receive the final text at all.

Suggested fix: re-assign the captured text after the reset in
handleCaptureText, or stop clearing latestText in resetState().
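
A minimal sketch of the reset problem and of the first suggested fix, under simplifying assumptions: SegmentOutput is a hypothetical stand-in, captureText and handleCaptureText are merged into one method, and the reassignAfterReset flag exists only to show both behaviors side by side.

```typescript
// Hypothetical, simplified model; names mirror the issue but this is not the library code.
class SegmentOutput {
  latestText = '';
  private segmentOpen = false;
  private readonly reassignAfterReset: boolean;

  constructor(reassignAfterReset: boolean) {
    this.reassignAfterReset = reassignAfterReset;
  }

  private resetState(): void {
    this.latestText = '';
    this.segmentOpen = true;
  }

  captureText(payload: string): void {
    this.latestText = payload;
    if (!this.segmentOpen) {
      // A new segment starts: the reset clears the text that was just captured.
      this.resetState();
      // Suggested fix: restore the captured text after the reset.
      if (this.reassignAfterReset) this.latestText = payload;
    }
  }

  flush(): string {
    const finalText = this.latestText;
    this.segmentOpen = false;
    return finalText;
  }
}

// The segment's first event is already final: one capture, then a flush.
const current = new SegmentOutput(false);
current.captureText('service.');
const currentFinal = current.flush(); // empty string in this model

const patched = new SegmentOutput(true);
patched.captureText('service.');
const patchedFinal = patched.flush(); // the captured text survives
console.log({ currentFinal, patchedFinal });
```

Segments that do receive interims before the final are unaffected either way in this model, since the reset only runs on the first capture of a segment.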

Compounding client-side behavior (components-react)

setupTextStream (components-core 0.12.13) keeps one entry per
lk.segment_id, overwrites text with each new stream's payload, and never
updates streamInfo — so the corrupted final-stream text silently replaces
correct interim text, and lk.transcription_final permanently reflects the
first stream received for the segment (i.e. "false"). Happy to file that
separately against components-js if useful.
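
For reference, a rough sketch of the client-side behavior as I understand it. This is a hypothetical model written from the description above, not the components-core source: the Entry shape, onStream, and the SG_a id are assumptions.

```typescript
// Hypothetical model of the described client behavior (not the components-core source).
type StreamAttrs = { 'lk.segment_id': string; 'lk.transcription_final': string };
type Entry = { text: string; streamInfo: StreamAttrs };

const entries = new Map<string, Entry>();

function onStream(attrs: StreamAttrs, text: string): void {
  const id = attrs['lk.segment_id'];
  const existing = entries.get(id);
  if (existing) {
    // Last write wins for the text; streamInfo keeps the first stream's attributes.
    existing.text = text;
  } else {
    entries.set(id, { text, streamInfo: attrs });
  }
}

// Interim stream for the segment, then the (corrupted) final stream for the same id.
onStream(
  { 'lk.segment_id': 'SG_a', 'lk.transcription_final': 'false' },
  'So, you made a big purchase with, a',
);
onStream({ 'lk.segment_id': 'SG_a', 'lk.transcription_final': 'true' }, 'service.');

const entry = entries.get('SG_a');
console.log(entry?.text, entry?.streamInfo['lk.transcription_final']);
```

In this model the entry ends up with the fragment as its text while still reporting lk.transcription_final as "false", so the UI cannot tell that a final arrived.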

Relevant log output

No response

Describe your environment

System:
OS: macOS 15.7.7
CPU: (10) arm64 Apple M1 Max
Memory: 434.92 MB / 32.00 GB
Shell: 5.9 - /bin/zsh
Binaries:
Node: 22.22.3 - /Users/jonas/.nvm/versions/node/v22.22.3/bin/node
Yarn: 1.22.22 - /Users/jonas/.nvm/versions/node/v22.14.0/bin/yarn
npm: 10.9.8 - /Users/jonas/.nvm/versions/node/v22.22.3/bin/npm
pnpm: 11.5.2 - /Users/jonas/.nvm/versions/node/v22.14.0/bin/pnpm

Minimal reproducible example

No response

Additional information

No response
