fix(sarvam): use provider speech timing for eos by rosetta-livekit-bot[bot] · Pull Request #1763 · livekit/agents-js

rosetta-livekit-bot · 2026-06-11T17:00:26Z

Summary

use Sarvam streaming speech_start/speech_end fields for final transcript timing
apply startTimeOffset to provider-relative speech timings
keep END_OF_SPEECH bare and delay it briefly so final transcripts normally arrive first
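
The timing mapping in the first two bullets can be sketched as follows. This is a minimal illustration, not the plugin's code: `ProviderTiming`, `TranscriptTiming`, and `resolveTranscriptTiming` are hypothetical names, the provider values are assumed to be seconds relative to the start of the provider stream, and the fallback to 0 when a field is missing follows the behavior described in the original PR description.

```typescript
// Hypothetical shape of the timing fields on a Sarvam streaming transcript message.
interface ProviderTiming {
  speech_start?: number | null; // assumed: seconds, relative to the provider stream
  speech_end?: number | null;
}

interface TranscriptTiming {
  startTime: number;
  endTime: number;
}

// Shift provider-relative timings onto the session timeline.
// A missing or unusable value falls back to 0 (no offset applied).
function resolveTranscriptTiming(
  timing: ProviderTiming,
  startTimeOffset: number,
): TranscriptTiming {
  const usable = (v: number | null | undefined): v is number =>
    typeof v === 'number' && Number.isFinite(v) && v >= 0;

  const { speech_start: start, speech_end: end } = timing;
  return {
    startTime: usable(start) ? start + startTimeOffset : 0,
    endTime: usable(end) ? end + startTimeOffset : 0,
  };
}
```

With a `startTimeOffset` of 10, provider values of 1.5 and 2.5 map to 11.5 and 12.5; null fields map to 0 so downstream code can fall back to wall-clock timing.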

Tests

pnpm exec prettier --write plugins/sarvam/src/stt.ts
pnpm --filter @livekit/agents-plugin-sarvam lint
pnpm build:agents
pnpm --filter @livekit/agents-plugins-test --filter @livekit/agents-plugin-silero build
pnpm --filter @livekit/agents-plugin-openai build
pnpm --filter @livekit/agents-plugin-sarvam build
pnpm test -- plugins/sarvam/src/stt.test.ts

Original PR description

Problem

The Sarvam streaming STT plugin tried to manufacture an audio-relative speech-end time from two sources that don't actually provide one:

The VAD END_SPEECH event's occured_at field — which logging proved is a wall-clock Unix epoch, not an audio-stream offset, so it was rejected 100% of the time by a range-check heuristic.
A local send-clock counter (_audio_position) that counts audio uploaded, not processed — biased and fabricated.

Sarvam's streaming socket genuinely sends no usable word timing (no timestamps array; speech_start/speech_end come back null), so all this machinery produced misleading timestamps.
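
To illustrate why a range check rejects a wall-clock value every time, here is a small sketch. The function name, the tolerance, and the exact bounds are assumptions for illustration; the removed heuristic may have used different limits.

```typescript
// Hypothetical range check: accept a value as an audio-stream offset only if
// it lies between 0 and the amount of audio sent so far (plus a tolerance).
function isPlausibleAudioOffset(
  valueSeconds: number,
  audioSentSeconds: number,
  toleranceSeconds = 1,
): boolean {
  return (
    Number.isFinite(valueSeconds) &&
    valueSeconds >= 0 &&
    valueSeconds <= audioSentSeconds + toleranceSeconds
  );
}

const audioSentSeconds = 42.0;
// A Unix epoch timestamp in seconds is on the order of 1.7e9.
const epochSeconds = Date.UTC(2026, 5, 11, 17, 0, 26) / 1000;

console.log(isPlausibleAudioOffset(epochSeconds, audioSentSeconds)); // false
console.log(isPlausibleAudioOffset(41.3, audioSentSeconds)); // true
```

An epoch value is many orders of magnitude larger than any realistic stream offset, whether it is expressed in seconds or milliseconds, so the check can never accept it.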

Change

Aligned Sarvam with how every other STT plugin (Deepgram, AssemblyAI, Google, Azure…) handles this:

END_OF_SPEECH is now emitted bare — no alternatives/end_time. Previously it carried an empty-text SpeechData with a fabricated end_time + unused speech_end_wall_time metadata.
FINAL_TRANSCRIPT timing comes only from the provider — start_time/end_time read from speech_start/speech_end, falling back to 0.0. The voice pipeline then uses wall-clock for EOU timing (standard behavior for providers without streaming word timing).
Deleted the dead machinery: _interpret_signal_time, _resolved_speech_end, the _audio_position send-clock (+ its hot-loop increment), _utterance_server_speech_end, _utterance_speech_end_audio_pos, _utterance_speech_end_wall, and the require_end_time param.

changeset-bot · 2026-06-11T17:00:32Z

🦋 Changeset detected

Latest commit: 4baec50

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 34 packages

Name	Type
@livekit/agents-plugin-sarvam	Patch
@livekit/agents	Patch
@livekit/agents-plugin-anam	Patch
@livekit/agents-plugin-assemblyai	Patch
@livekit/agents-plugin-baseten	Patch
@livekit/agents-plugin-bey	Patch
@livekit/agents-plugin-cartesia	Patch
@livekit/agents-plugin-cerebras	Patch
@livekit/agents-plugin-deepgram	Patch
@livekit/agents-plugin-elevenlabs	Patch
@livekit/agents-plugin-fishaudio	Patch
@livekit/agents-plugin-google	Patch
@livekit/agents-plugin-hedra	Patch
@livekit/agents-plugin-hume	Patch
@livekit/agents-plugin-inworld	Patch
@livekit/agents-plugin-lemonslice	Patch
@livekit/agents-plugin-liveavatar	Patch
@livekit/agents-plugin-livekit	Patch
@livekit/agents-plugin-minimax	Patch
@livekit/agents-plugin-mistral	Patch
@livekit/agents-plugin-mistralai	Patch
@livekit/agents-plugin-neuphonic	Patch
@livekit/agents-plugin-openai	Patch
@livekit/agents-plugin-perplexity	Patch
@livekit/agents-plugin-phonic	Patch
@livekit/agents-plugin-resemble	Patch
@livekit/agents-plugin-rime	Patch
@livekit/agents-plugin-runway	Patch
@livekit/agents-plugin-silero	Patch
@livekit/agents-plugin-soniox	Patch
@livekit/agents-plugin-tavus	Patch
@livekit/agents-plugin-trugen	Patch
@livekit/agents-plugin-xai	Patch
@livekit/agents-plugins-test	Patch


devin-ai-integration

Devin Review found 2 potential issues.

devin-ai-integration · 2026-06-11T17:04:13Z

+                } else if (this.#sendFinalTranscript(td, putMessage)) {
+                  this.#finalReceivedForUtterance = true;
+                }


🟡 Missing #eosEmittedForUtterance guard allows FINAL_TRANSCRIPT after END_OF_SPEECH

When the EOS fallback timer fires (because the server sent END_SPEECH but no transcript arrived within 1000ms), #pendingEos is set to false and #eosEmittedForUtterance is set to true. If a late transcript data message subsequently arrives, the code at line 909 takes the else if branch (since #pendingEos is false) and calls #sendFinalTranscript without checking #eosEmittedForUtterance. This emits a FINAL_TRANSCRIPT event after END_OF_SPEECH was already emitted, violating the expected event ordering (START_OF_SPEECH → FINAL_TRANSCRIPT → END_OF_SPEECH). Downstream in audio_recognition.ts:837-897, this late FINAL_TRANSCRIPT updates audioTranscript, triggers preemptive generation, and runs EOU detection again — all after the user turn was already committed at audio_recognition.ts:1047.

Note that #tryCommitUtterance at plugins/sarvam/src/stt.ts:661 correctly guards against this with this.#eosEmittedForUtterance, but the direct #sendFinalTranscript call path at line 909 does not.

Suggested change

-                } else if (this.#sendFinalTranscript(td, putMessage)) {
-                  this.#finalReceivedForUtterance = true;
-                }
+                } else if (!this.#eosEmittedForUtterance && this.#sendFinalTranscript(td, putMessage)) {
+                  this.#finalReceivedForUtterance = true;
+                }
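
The ordering problem and the guard can be modeled with a simplified state machine. This is a sketch under assumptions, not the plugin's implementation: `UtteranceSequencer` and its methods are hypothetical, the fallback timer is represented by an explicit `onFallbackTimeout()` call rather than a real `setTimeout`, and a late transcript is simply dropped.

```typescript
type SttEvent = 'FINAL_TRANSCRIPT' | 'END_OF_SPEECH';

// Simplified model of one utterance: END_OF_SPEECH is held back until the
// final transcript arrives or the fallback timeout fires.
class UtteranceSequencer {
  private pendingEos = false;
  private eosEmitted = false;
  private finalReceived = false;
  readonly emitted: SttEvent[] = [];

  // Server signalled END_SPEECH.
  onEndSpeech(): void {
    if (this.eosEmitted) return;
    if (this.finalReceived) {
      this.emitEos();
    } else {
      this.pendingEos = true; // wait for the transcript or the timeout
    }
  }

  // Transcript data arrived from the server.
  onTranscript(): void {
    // The guard under discussion: never emit FINAL_TRANSCRIPT after END_OF_SPEECH.
    if (this.eosEmitted) return;
    this.emitted.push('FINAL_TRANSCRIPT');
    this.finalReceived = true;
    if (this.pendingEos) this.emitEos();
  }

  // Fallback timer fired without a transcript.
  onFallbackTimeout(): void {
    if (this.pendingEos) this.emitEos();
  }

  private emitEos(): void {
    this.pendingEos = false;
    this.eosEmitted = true;
    this.emitted.push('END_OF_SPEECH');
  }
}

// Normal path: the transcript arrives before the timeout.
const normal = new UtteranceSequencer();
normal.onEndSpeech();
normal.onTranscript();
console.log(normal.emitted); // [ 'FINAL_TRANSCRIPT', 'END_OF_SPEECH' ]

// Late path: the timeout fires first, so the late transcript is suppressed.
const late = new UtteranceSequencer();
late.onEndSpeech();
late.onFallbackTimeout();
late.onTranscript();
console.log(late.emitted); // [ 'END_OF_SPEECH' ]
```

Dropping the late transcript preserves event ordering at the cost of losing its text; whether that trade-off is acceptable depends on how often the fallback fires.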


devin-ai-integration · 2026-06-11T17:04:15Z


 const SAMPLE_RATE = 16000;
 const NUM_CHANNELS = 1;
+const EOS_FALLBACK_TIMEOUT = 1000;


🚩 EOS_FALLBACK_TIMEOUT of 1000ms may need tuning

The EOS_FALLBACK_TIMEOUT constant is set to 1000ms at line 38. This is the maximum time the system waits for a transcript after receiving END_SPEECH before emitting END_OF_SPEECH without one. If Sarvam's server processing latency sometimes exceeds 1000ms (e.g., for longer utterances or under load), the fallback could fire prematurely, causing the transcript to arrive after END_OF_SPEECH (the scenario in BUG-0001). The Sarvam STT metrics logging at lines 894-896 captures processing_latency; monitoring this in production would help determine whether the 1000ms timeout is appropriate.
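
One way to evaluate the timeout against logged latencies is sketched below. The helper name and the sample values are hypothetical, and it assumes processing_latency samples have been collected in milliseconds.

```typescript
// Fraction of observed processing latencies that exceed the fallback timeout.
// Returns 0 for an empty sample set.
function fractionExceeding(latenciesMs: readonly number[], timeoutMs: number): number {
  if (latenciesMs.length === 0) return 0;
  const late = latenciesMs.filter((latency) => latency > timeoutMs).length;
  return late / latenciesMs.length;
}

// Made-up samples for illustration only.
const samplesMs = [200, 400, 1200, 900];
console.log(fractionExceeding(samplesMs, 1000)); // 0.25
```

A non-trivial fraction would suggest either raising the timeout or relying on the late-transcript guard more heavily.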


fix(sarvam): use provider speech timing for eos

4baec50

rosetta-livekit-bot Bot requested a review from tinalenguyen June 11, 2026 17:00

devin-ai-integration Bot reviewed Jun 11, 2026

rosetta-livekit-bot[bot] wants to merge 1 commit into 1.5.0 from vatting-twilled-fifes
