feat(eot): add audio models AGT-2919 by chenghao-mou · Pull Request #1719 · livekit/agents-js

chenghao-mou · 2026-06-05T00:10:34Z

add audio eot model and local inference support, deprecating silero and turn detector plugins

Description

Changes Made

Adds streaming audio end-of-turn (EOT) detection through a single user-facing AudioTurnDetector that selects between two backends:

turn-detector
turn-detector-mini

On a cloud transport error or a predict_end_of_turn timeout, the session switches to the mini/local backend for the rest of the stream (sticky per session, with one warning per failure mode).
Local failures emit the default 1.0 prediction and retry on the next turn.
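The fallback policy above can be sketched as a small per-session state machine. This is a minimal sketch, assuming a promise-based `predict` on each backend; `FallbackTurnSession`, `EotBackend`, the failure-mode names, and warning once on local errors are hypothetical and not the agents-js API.

```typescript
// Sketch only: FallbackTurnSession, EotBackend and FailureMode are hypothetical
// names, not the agents-js API.
type Backend = 'cloud' | 'local';
type FailureMode = 'cloud_transport_error' | 'cloud_timeout' | 'local_error';

interface EotBackend {
  // Resolves to an end-of-turn probability in [0, 1].
  predict(audio: Float32Array): Promise<number>;
}

class TimeoutError extends Error {
  constructor(ms: number) {
    super(`prediction timed out after ${ms} ms`);
    this.name = 'TimeoutError';
  }
}

// Races a promise against a timer; the timer is cleared once the promise settles.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

const DEFAULT_PREDICTION = 1.0;

class FallbackTurnSession {
  private active: Backend = 'cloud';
  private readonly warned = new Set<FailureMode>();
  readonly warnings: string[] = [];

  constructor(
    private readonly cloud: EotBackend,
    private readonly local: EotBackend,
    private readonly timeoutMs: number,
  ) {}

  get backend(): Backend {
    return this.active;
  }

  // Records at most one warning per failure mode for this session.
  private warnOnce(mode: FailureMode): void {
    if (this.warned.has(mode)) return;
    this.warned.add(mode);
    this.warnings.push(`end-of-turn ${mode}`);
  }

  async predictEndOfTurn(audio: Float32Array): Promise<number> {
    if (this.active === 'cloud') {
      try {
        return await withTimeout(this.cloud.predict(audio), this.timeoutMs);
      } catch (err) {
        const timedOut = err instanceof Error && err.name === 'TimeoutError';
        this.warnOnce(timedOut ? 'cloud_timeout' : 'cloud_transport_error');
        // Sticky: this session never goes back to the cloud backend.
        this.active = 'local';
      }
    }
    try {
      return await this.local.predict(audio);
    } catch {
      // Local failure: return the default prediction; the next turn retries local.
      this.warnOnce('local_error');
      return DEFAULT_PREDICTION;
    }
  }
}
```

Keeping the switch one-way avoids flapping between backends mid-conversation, at the cost of staying on the local model even if the cloud recovers.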

A user-set unlikely_threshold is scaled multiplicatively against the cloud default so the operating point survives a fallback.
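As a worked example of the scaling, assume (hypothetically) a cloud default of 0.1 and a fallback default of 0.05: a user threshold of 0.2 is twice the cloud default, so the fallback runs at twice its own default, 0.1. The helper below is an illustrative sketch; its name and the clamp to [0, 1] are assumptions, and the real per-model defaults are not stated here.

```typescript
// Hypothetical helper: re-expresses a user threshold relative to the cloud
// default so the same relative operating point applies on the fallback model.
function scaleUnlikelyThreshold(
  userThreshold: number | undefined,
  cloudDefault: number,
  fallbackDefault: number,
): number {
  // No user override: use the fallback model's own default.
  if (userThreshold === undefined) return fallbackDefault;
  if (!(cloudDefault > 0)) throw new RangeError('cloudDefault must be positive');
  const ratio = userThreshold / cloudDefault;
  // Clamp to [0, 1] because the threshold is compared against a probability (assumption).
  return Math.min(1, Math.max(0, ratio * fallbackDefault));
}

// 0.2 is 2x the (hypothetical) cloud default, so the fallback uses 2 x 0.05 = 0.1.
const effective = scaleUnlikelyThreshold(0.2, 0.1, 0.05);
```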

Pre-Review Checklist

Build passes: All builds (lint, typecheck, tests) pass locally
AI-generated code reviewed: Removed unnecessary comments and ensured code quality
Changes explained: All changes are properly documented and justified above
Scope appropriate: All changes relate to the PR title, or explanations provided for why they're included
Video demo: A short video demo, recorded with Agent Playground, showing that the changes work as expected and do not break any existing functionality (if applicable)

Testing

Automated tests added/updated (if applicable)
All tests pass
Make sure both restaurant_agent.ts and realtime_agent.ts work properly (for major changes)

Additional Notes

Python PR: livekit/agents#4722

Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.

changeset-bot · 2026-06-05T00:10:40Z

🦋 Changeset detected

Latest commit: 03a6b3e

The changes in this PR will be included in the next version bump.

add audio eot model and local inference support, deprecating silero and turn detector plugins

…frame The AudioFrame emitted on START_OF_SPEECH / END_OF_SPEECH sliced off the prefix-padding samples but still reported `samplesPerChannel = speechBufferIndex`, so the frame's metadata claimed more samples than its data contained and downstream consumers (STT, transcription) lost the pre-roll context the buffer machinery is designed to preserve. Slice from 0 instead so data length matches samplesPerChannel and the prefix-padding pre-roll is delivered, matching the Python original. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
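To make the metadata mismatch concrete, here is a hypothetical before/after of the frame construction. `PcmFrame` is a stand-in for the SDK's `AudioFrame`, mono audio is assumed, and the function and parameter names only mirror the commit message.

```typescript
// Hypothetical minimal frame shape (mono PCM); not the real AudioFrame class.
interface PcmFrame {
  data: Int16Array;
  sampleRate: number;
  numChannels: number;
  samplesPerChannel: number;
}

// Before the fix (as described): the prefix padding is sliced off, but the
// metadata still reports speechBufferIndex samples.
function speechFrameBuggy(
  buffer: Int16Array,
  speechBufferIndex: number,
  prefixPaddingSamples: number,
  sampleRate: number,
): PcmFrame {
  return {
    data: buffer.slice(prefixPaddingSamples, speechBufferIndex),
    sampleRate,
    numChannels: 1,
    samplesPerChannel: speechBufferIndex,
  };
}

// After the fix: slice from 0 so the pre-roll is kept and the lengths agree.
function speechFrame(buffer: Int16Array, speechBufferIndex: number, sampleRate: number): PcmFrame {
  return {
    data: buffer.slice(0, speechBufferIndex),
    sampleRate,
    numChannels: 1,
    samplesPerChannel: speechBufferIndex,
  };
}

// Invariant a consumer such as an STT stream would rely on.
function isConsistent(frame: PcmFrame): boolean {
  return frame.data.length === frame.samplesPerChannel * frame.numChannels;
}
```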

… to version Rename the unreleased `inference.AudioTurnDetector` to `inference.TurnDetector` and replace its `model` constructor option with `version` (`'v1' | 'v1-mini'`). The `version` is the constructor knob only; the `model` field/getter is kept and now holds the full model name (`turn-detector-v1` / `turn-detector-v1-mini`), which telemetry/billing read via `detector.model` (metric `modelName` → `EOTModelUsage.model` → remote sessions) unchanged. Mirrors the upstream Python rename. The private base peers are renamed to the modality-agnostic streaming scheme: `BaseStreamingTurnDetector`, `BaseStreamingTurnDetectorStream`, `StreamingTurnDetectionTransport`, `BaseStreamingTurnDetectorCallbacks`, `BaseStreamingTurnDetectorOptions` (resolving the public-opts `TurnDetectorOptions` collision). Adds `TurnDetectorVersion`; keeps `TurnDetectorModel` with updated values. Also folds in in-flight AGT-2520 EOU work: VAD slow-inference guard fix, `turnDetection: null` opt-out preserved distinctly from `undefined`, silero `VAD.load()` delegating to `inference.VAD({ model: 'silero' })` for 16 kHz, and a `LocalTransport` cleanup refactor. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
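The `version` to `model` mapping and the `null`/`undefined` distinction described above could look roughly like this. `TurnDetectorVersion`, `TurnDetectorModel`, and their values come from the commit message; `TurnDetectorSketch`, `resolveTurnDetection`, the `'v1'` default, and treating `undefined` as "use the default detector" are assumptions for illustration.

```typescript
// Type names and values are taken from the commit message; everything else is hypothetical.
type TurnDetectorVersion = 'v1' | 'v1-mini';
type TurnDetectorModel = 'turn-detector-v1' | 'turn-detector-v1-mini';

const MODEL_BY_VERSION: Record<TurnDetectorVersion, TurnDetectorModel> = {
  v1: 'turn-detector-v1',
  'v1-mini': 'turn-detector-v1-mini',
};

class TurnDetectorSketch {
  // Full model name: the value telemetry/billing would read via `detector.model`.
  readonly model: TurnDetectorModel;

  // `version` is only a constructor knob; the 'v1' default is an assumption.
  constructor(opts: { version?: TurnDetectorVersion } = {}) {
    this.model = MODEL_BY_VERSION[opts.version ?? 'v1'];
  }
}

// `null` is an explicit opt-out; `undefined` means "not configured" and falls
// back to a default produced by the (hypothetical) factory.
function resolveTurnDetection<T>(configured: T | null | undefined, makeDefault: () => T): T | null {
  if (configured === null) return null;
  if (configured === undefined) return makeDefault();
  return configured;
}
```

Comparing with `=== null` and `=== undefined` separately, rather than a single `== null` check, is what keeps the opt-out distinct from the unset case.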

Add a copy of the turn detection model license, call it out in the root README alongside the Apache-2.0 license, and annotate it in REUSE.toml to keep the REUSE-3.2 lint green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

main dropped the flat `export *` re-exports (AgentSession, tool, logMetrics) in favor of namespace-only exports, and does not have the 1.5.0 Toolset API (Agent.create / array-style tools). Adapt basic_agent.ts to main's namespace conventions (new voice.Agent, object tools, voice.*/metrics.* prefixes) while preserving the multimodal-EOU session config. Regenerate pnpm-lock.yaml against the rebased package.json set. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

toubatbrian · 2026-06-12T23:50:33Z

-  prewarm: async (proc: JobProcess) => {
-    proc.userData.vad = await silero.VAD.load();
-  },


Why did we remove these?

toubatbrian · 2026-06-12T23:50:38Z

      // to use realtime model, replace the stt, llm, tts and vad with the following
      // llm: new openai.realtime.RealtimeModel(),
      userData: userdata,
-      turnDetection: new livekit.turnDetector.EnglishModel(),


toubatbrian · 2026-06-12T23:52:37Z

+  }
+
+  /**
+   * Speaking-guard wrapper for the bounce-EOU task, mirroring Python's


Should we remove comment phrasing that references "mirroring Python's", etc.?

toubatbrian · 2026-06-12T23:53:39Z

+    // A different stream means a fresh request lifecycle: drop any held
+    // prediction future and re-arm so the adopting recognition starts its own
+    // request on the next VAD event.


Claude tends to add a lot of inline comments; it would be nice to clean them up and keep only the necessary ones.

devin-ai-integration

Devin Review found 1 new potential issue.

devin-ai-integration · 2026-06-13T01:10:19Z

    "LIVEKIT_INFERENCE_URL",
    "LIVEKIT_OUTBOUND_TRUNK_ID",
    "LIVEKIT_URL",
+    "LIVEKIT_WORKER_TOKEN",


🟡 Duplicate LIVEKIT_WORKER_TOKEN entry in turbo.json global env passthrough

The PR adds LIVEKIT_WORKER_TOKEN at line 45 (after LIVEKIT_URL), but the original file already has it at line 49 (after LIVEKIT_AGENT_NAME). This produces a duplicate entry in the globalPassThroughEnv array. While Turbo likely deduplicates or ignores redundant entries at runtime, the duplicate is unnecessary noise and potentially confusing.

Suggested change (remove the duplicate entry):

"LIVEKIT_WORKER_TOKEN",
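A check like the following could catch this kind of duplicate before review. It is a hypothetical helper, not something in the repository; it only assumes that `turbo.json` parses to an object with a `globalPassThroughEnv` string array, and the sample array is abbreviated.

```typescript
// Hypothetical lint helper: reports names that appear more than once in
// turbo.json's globalPassThroughEnv array.
interface TurboConfig {
  globalPassThroughEnv?: string[];
}

function findDuplicateEnv(config: TurboConfig): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const name of config.globalPassThroughEnv ?? []) {
    if (seen.has(name)) duplicates.add(name);
    else seen.add(name);
  }
  return Array.from(duplicates);
}

// Abbreviated, illustrative config.
const turbo: TurboConfig = JSON.parse(`{
  "globalPassThroughEnv": ["LIVEKIT_URL", "LIVEKIT_WORKER_TOKEN", "LIVEKIT_AGENT_NAME", "LIVEKIT_WORKER_TOKEN"]
}`);
const dupes = findDuplicateEnv(turbo); // returns ['LIVEKIT_WORKER_TOKEN']
```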

chenghao-mou mentioned this pull request Jun 5, 2026

feat(eot): add audio eot model support #1613

Closed

8 tasks

This comment was marked as resolved.

chenghao-mou changed the title from "feat(eot): add audio eot model support" to "feat(eot): add audio models AGT-2919" Jun 7, 2026

This comment was marked as resolved.

chenghao-mou requested a review from a team June 10, 2026 08:56

This comment was marked as resolved.

chenghao-mou force-pushed the feat/AGT-2520-multimodal-EOU branch from ed2e02f to 3e2fb33 Compare June 11, 2026 14:06

chenghao-mou changed the base branch from main to 1.5.0 June 11, 2026 14:06

This comment was marked as resolved.

toubatbrian changed the base branch from 1.5.0 to main June 11, 2026 17:51

toubatbrian changed the base branch from main to 1.5.0 June 11, 2026 17:54

chenghao-mou mentioned this pull request Jun 12, 2026

feat(barge-in): add default threshold support and drop http transport #1785

Merged

chenghao-mou and others added 13 commits June 12, 2026 13:23

feat(eot): add audio eot model support

d5c0dc3

add audio eot model and local inference support, deprecating silero and turn detector plugins

Create busy-aliens-wink.md

2198460

more clean up and refactoring

172a666

more refactoring and clean up

7950f00

more refactoring and clean up

5bb5b11

address comment

8d047b5

address comment

36477b9

rename backend to model

7c0929d

address comments

52f1b24

update default parsing and read from cloud

48792bd

reformat

b7e2647

fix port misses

c4f332b

chenghao-mou and others added 11 commits June 12, 2026 13:23

add missing worker token

9875be5

sync new commits: gate eot event + minor refactor

904fc05

port Python FSM drop

529cb5d

remove TurnDetector reference

15a4e59

address comment

b066bba

restore comment

5d86ef8

update examples

545f0e3

handle promise rejection

b7e0388

chenghao-mou force-pushed the feat/AGT-2520-multimodal-EOU branch from d3ee830 to 7e24939 Compare June 12, 2026 12:33

chenghao-mou changed the base branch from 1.5.0 to main June 12, 2026 12:33

chenghao-mou added 2 commits June 12, 2026 14:14

fix multimodal mentions

73a626f

clean up examples

e2553c9

This comment was marked as resolved.

reformat

4140bce

This comment was marked as resolved.

toubatbrian reviewed Jun 12, 2026

Merge remote-tracking branch 'origin/main' into feat/AGT-2520-multimo…

03a6b3e

…dal-EOU

# Conflicts:
# agents/src/inference/utils.ts
# agents/src/voice/agent_activity.ts
# agents/src/voice/audio_recognition.ts

devin-ai-integration Bot reviewed Jun 13, 2026
