feat(eot): add audio models AGT-2520 by chenghao-mou · Pull Request #4722 · livekit/agents

chenghao-mou · 2026-02-05T15:09:40Z

Adds streaming audio end-of-turn (EOT) detection through a single user-facing AudioTurnDetector that selects between two backends:

turn-detector
turn-detector-mini

On cloud transport error or predict_end_of_turn timeout, the session swaps to mini/local for the rest of the stream (sticky per session, one warning per failure mode).
Local failures emit the default 1.0 prediction and retry on the next turn.
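
The fallback rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FallbackSession`, its constructor arguments, and the failure-mode labels are hypothetical, and the cloud and local backends are stand-in coroutines.

```python
from __future__ import annotations

import asyncio
import logging

logger = logging.getLogger("eot-fallback-sketch")

# The description says local failures emit the default 1.0 prediction.
DEFAULT_PREDICTION = 1.0


class FallbackSession:
    """Hypothetical per-session wrapper; not the class used in the PR."""

    def __init__(self, cloud, local, timeout: float = 0.5) -> None:
        self._cloud = cloud              # async callable returning a probability
        self._local = local              # async callable returning a probability
        self._timeout = timeout
        self._use_local = False          # sticky: never reset within the session
        self._warned: set[str] = set()   # failure modes that were already logged

    def _fall_back(self, mode: str) -> None:
        self._use_local = True
        if mode not in self._warned:     # one warning per failure mode
            self._warned.add(mode)
            logger.warning("cloud EOT failed (%s), switching to local model", mode)

    async def predict(self) -> float:
        if not self._use_local:
            try:
                return await asyncio.wait_for(self._cloud(), self._timeout)
            except asyncio.TimeoutError:
                self._fall_back("timeout")
            except ConnectionError:
                self._fall_back("transport")
        try:
            return await self._local()
        except Exception:
            # Local failures are not sticky: emit the default, retry next turn.
            return DEFAULT_PREDICTION
```

Keeping the sticky flag on the session object rather than on the detector means one degraded session does not affect other sessions that share the same detector.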

A user-set unlikely_threshold is scaled multiplicatively against the cloud default so the operating point survives a fallback.
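
One way to read "scaled multiplicatively" is that the ratio between the user's threshold and the cloud default is preserved when moving to the local backend. The sketch below assumes exactly that; the function name, the clamping to [0, 1], and the numeric defaults in the usage note are illustrative assumptions, not values from the PR.

```python
def rescale_threshold(
    user_threshold: float, cloud_default: float, local_default: float
) -> float:
    """Map a threshold tuned against the cloud default onto the local backend.

    Assumed formula: keep the ratio user_threshold / cloud_default constant.
    """
    if cloud_default <= 0:
        raise ValueError("cloud_default must be positive")
    scale = user_threshold / cloud_default
    # Clamp so the result is still a valid probability threshold.
    return min(1.0, max(0.0, scale * local_default))
```

For example, with hypothetical defaults of 0.4 (cloud) and 0.1 (local), a user threshold of 0.2 is half the cloud default, so the rescaled local threshold is half the local default.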

hsjun99 · 2026-02-25T01:00:31Z

@chenghao-mou Excited to see this! A couple of questions:

Will the multimodal EOT model be publicly accessible via model weights or agent-gateway.livekit.cloud, or in some other way?
Any rough timeline for when MultiModalTurnDetector gets fully wired up?

chenghao-mou · 2026-02-25T10:07:07Z

> @chenghao-mou Excited to see this! A couple of questions:
>
> Will the multimodal EOT model be publicly accessible via model weights or agent-gateway.livekit.cloud, or in some other way?
>
> Any rough timeline for when MultiModalTurnDetector gets fully wired up?

Thanks for your patience! We don't have an official decision or timeline yet, but hopefully I can get it ready within a month or two.

… version

Rename the public AudioTurnDetector -> TurnDetector and replace the model= constructor argument with version="v1"|"v1-mini". The version maps to the internal model name (turn-detector-v1 / turn-detector-v1-mini), which the `model` property and EOT telemetry continue to report unchanged.

Drop the audio modality from the private peers so they read generically for the multimodal EOU direction:

- _AudioTurnDetector -> _BaseStreamingTurnDetector
- _AudioTurnDetectorStream -> _BaseStreamingTurnDetectorStream
- _AudioTurnDetectionTransport -> _StreamingTurnDetectionTransport

Updates the inference exports (adds TurnDetectorVersions), framework references, the deprecated turn-detector plugin notice, plugin READMEs, tests, and examples.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
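
The version-to-model mapping described in this commit message could look roughly like the sketch below. Only the version strings, the model names, and the `TurnDetectorVersions` export name come from the commit message; defining it as a `Literal`, the `_MODEL_NAMES` table, and the `TurnDetectorSketch` class are assumptions made for illustration.

```python
from typing import Literal

# Assumed shape of the exported type; the commit only names the export.
TurnDetectorVersions = Literal["v1", "v1-mini"]

# Version -> internal model name, as listed in the commit message.
_MODEL_NAMES = {
    "v1": "turn-detector-v1",
    "v1-mini": "turn-detector-v1-mini",
}


class TurnDetectorSketch:
    """Hypothetical stand-in for the renamed public class."""

    def __init__(self, version: TurnDetectorVersions = "v1") -> None:
        if version not in _MODEL_NAMES:
            raise ValueError(f"unknown turn detector version: {version!r}")
        self._version = version

    @property
    def model(self) -> str:
        # Telemetry keeps reporting the internal model name, not the version.
        return _MODEL_NAMES[self._version]
```

Callers pick a short version string while anything that logs or reports the model still sees the full internal name.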

_make_full_recognition_for_eou builds AudioRecognition via __new__ and hand-sets attributes. The main merge (#5841) turned _speaking into a property backed by _user_silence_ev, so `ar._speaking = False` began raising AttributeError because the helper never created that event. This was a semantic merge conflict git did not flag textually.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
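
The failure mode described here can be reproduced in isolation. In the sketch below, `Recognition` is a hypothetical stand-in for AudioRecognition, and `threading.Event` stands in for whatever event type the real class uses; only the `_speaking` / `_user_silence_ev` names come from the commit message.

```python
import threading


class Recognition:
    """Hypothetical stand-in: _speaking is a property backed by an event."""

    def __init__(self) -> None:
        self._user_silence_ev = threading.Event()
        self._user_silence_ev.set()  # start in the "silent" state

    @property
    def _speaking(self) -> bool:
        return not self._user_silence_ev.is_set()

    @_speaking.setter
    def _speaking(self, value: bool) -> None:
        # Touches self._user_silence_ev, which only __init__ creates.
        if value:
            self._user_silence_ev.clear()
        else:
            self._user_silence_ev.set()


def make_broken() -> Recognition:
    ar = Recognition.__new__(Recognition)  # bypasses __init__
    ar._speaking = False                   # AttributeError: event is missing
    return ar


def make_fixed() -> Recognition:
    ar = Recognition.__new__(Recognition)
    ar._user_silence_ev = threading.Event()  # create the backing event first
    ar._speaking = False
    return ar
```

Because the helper and the property change live in different files, a textual merge succeeds and only a test run surfaces the problem.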

…dal-EOU # Conflicts: # livekit-agents/livekit/agents/version.py # livekit-agents/pyproject.toml # livekit-plugins/livekit-plugins-anam/livekit/plugins/anam/version.py # livekit-plugins/livekit-plugins-anam/pyproject.toml # livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/version.py # livekit-plugins/livekit-plugins-anthropic/pyproject.toml # livekit-plugins/livekit-plugins-assemblyai/livekit/plugins/assemblyai/version.py # livekit-plugins/livekit-plugins-assemblyai/pyproject.toml # livekit-plugins/livekit-plugins-asyncai/livekit/plugins/asyncai/version.py # livekit-plugins/livekit-plugins-asyncai/pyproject.toml # livekit-plugins/livekit-plugins-avatario/livekit/plugins/avatario/version.py # livekit-plugins/livekit-plugins-avatario/pyproject.toml # livekit-plugins/livekit-plugins-avatartalk/livekit/plugins/avatartalk/version.py # livekit-plugins/livekit-plugins-avatartalk/pyproject.toml # livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/version.py # livekit-plugins/livekit-plugins-aws/pyproject.toml # livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/version.py # livekit-plugins/livekit-plugins-azure/pyproject.toml # livekit-plugins/livekit-plugins-baseten/livekit/plugins/baseten/version.py # livekit-plugins/livekit-plugins-baseten/pyproject.toml # livekit-plugins/livekit-plugins-bey/livekit/plugins/bey/version.py # livekit-plugins/livekit-plugins-bey/pyproject.toml # livekit-plugins/livekit-plugins-bithuman/livekit/plugins/bithuman/version.py # livekit-plugins/livekit-plugins-bithuman/pyproject.toml # livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/version.py # livekit-plugins/livekit-plugins-browser/pyproject.toml # livekit-plugins/livekit-plugins-cambai/livekit/plugins/cambai/version.py # livekit-plugins/livekit-plugins-cambai/pyproject.toml # livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/version.py # livekit-plugins/livekit-plugins-cartesia/pyproject.toml # 
livekit-plugins/livekit-plugins-cerebras/livekit/plugins/cerebras/version.py # livekit-plugins/livekit-plugins-cerebras/pyproject.toml # livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/version.py # livekit-plugins/livekit-plugins-clova/pyproject.toml # livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/version.py # livekit-plugins/livekit-plugins-deepgram/pyproject.toml # livekit-plugins/livekit-plugins-did/livekit/plugins/did/version.py # livekit-plugins/livekit-plugins-did/pyproject.toml # livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/version.py # livekit-plugins/livekit-plugins-elevenlabs/pyproject.toml # livekit-plugins/livekit-plugins-fal/livekit/plugins/fal/version.py # livekit-plugins/livekit-plugins-fal/pyproject.toml # livekit-plugins/livekit-plugins-fireworksai/livekit/plugins/fireworksai/version.py # livekit-plugins/livekit-plugins-fireworksai/pyproject.toml # livekit-plugins/livekit-plugins-fishaudio/livekit/plugins/fishaudio/version.py # livekit-plugins/livekit-plugins-fishaudio/pyproject.toml # livekit-plugins/livekit-plugins-gladia/livekit/plugins/gladia/version.py # livekit-plugins/livekit-plugins-gladia/pyproject.toml # livekit-plugins/livekit-plugins-gnani/livekit/plugins/gnani/version.py # livekit-plugins/livekit-plugins-gnani/pyproject.toml # livekit-plugins/livekit-plugins-google/livekit/plugins/google/version.py # livekit-plugins/livekit-plugins-google/pyproject.toml # livekit-plugins/livekit-plugins-gradium/livekit/plugins/gradium/version.py # livekit-plugins/livekit-plugins-gradium/pyproject.toml # livekit-plugins/livekit-plugins-groq/livekit/plugins/groq/version.py # livekit-plugins/livekit-plugins-groq/pyproject.toml # livekit-plugins/livekit-plugins-hamming/livekit/plugins/hamming/version.py # livekit-plugins/livekit-plugins-hamming/pyproject.toml # livekit-plugins/livekit-plugins-hedra/livekit/plugins/hedra/version.py # livekit-plugins/livekit-plugins-hedra/pyproject.toml # 
livekit-plugins/livekit-plugins-hume/livekit/plugins/hume/version.py # livekit-plugins/livekit-plugins-hume/pyproject.toml # livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/version.py # livekit-plugins/livekit-plugins-inworld/pyproject.toml # livekit-plugins/livekit-plugins-keyframe/livekit/plugins/keyframe/version.py # livekit-plugins/livekit-plugins-keyframe/pyproject.toml # livekit-plugins/livekit-plugins-krisp/livekit/plugins/krisp/version.py # livekit-plugins/livekit-plugins-krisp/pyproject.toml # livekit-plugins/livekit-plugins-langchain/livekit/plugins/langchain/version.py # livekit-plugins/livekit-plugins-langchain/pyproject.toml # livekit-plugins/livekit-plugins-lemonslice/livekit/plugins/lemonslice/version.py # livekit-plugins/livekit-plugins-lemonslice/pyproject.toml # livekit-plugins/livekit-plugins-liveavatar/livekit/plugins/liveavatar/version.py # livekit-plugins/livekit-plugins-liveavatar/pyproject.toml # livekit-plugins/livekit-plugins-lmnt/livekit/plugins/lmnt/version.py # livekit-plugins/livekit-plugins-lmnt/pyproject.toml # livekit-plugins/livekit-plugins-minimal/livekit/plugins/minimal/version.py # livekit-plugins/livekit-plugins-minimal/pyproject.toml # livekit-plugins/livekit-plugins-minimax/livekit/plugins/minimax/version.py # livekit-plugins/livekit-plugins-minimax/pyproject.toml # livekit-plugins/livekit-plugins-mistralai/livekit/plugins/mistralai/version.py # livekit-plugins/livekit-plugins-mistralai/pyproject.toml # livekit-plugins/livekit-plugins-murf/livekit/plugins/murf/version.py # livekit-plugins/livekit-plugins-murf/pyproject.toml # livekit-plugins/livekit-plugins-neuphonic/livekit/plugins/neuphonic/version.py # livekit-plugins/livekit-plugins-neuphonic/pyproject.toml # livekit-plugins/livekit-plugins-nltk/livekit/plugins/nltk/version.py # livekit-plugins/livekit-plugins-nltk/pyproject.toml # livekit-plugins/livekit-plugins-nvidia/livekit/plugins/nvidia/version.py # 
livekit-plugins/livekit-plugins-nvidia/pyproject.toml # livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/version.py # livekit-plugins/livekit-plugins-openai/pyproject.toml # livekit-plugins/livekit-plugins-perplexity/livekit/plugins/perplexity/version.py # livekit-plugins/livekit-plugins-perplexity/pyproject.toml # livekit-plugins/livekit-plugins-phonic/livekit/plugins/phonic/version.py # livekit-plugins/livekit-plugins-phonic/pyproject.toml # livekit-plugins/livekit-plugins-resemble/livekit/plugins/resemble/version.py # livekit-plugins/livekit-plugins-resemble/pyproject.toml # livekit-plugins/livekit-plugins-respeecher/livekit/plugins/respeecher/version.py # livekit-plugins/livekit-plugins-respeecher/pyproject.toml # livekit-plugins/livekit-plugins-rime/livekit/plugins/rime/version.py # livekit-plugins/livekit-plugins-rime/pyproject.toml # livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/version.py # livekit-plugins/livekit-plugins-rtzr/pyproject.toml # livekit-plugins/livekit-plugins-runway/livekit/plugins/runway/version.py # livekit-plugins/livekit-plugins-runway/pyproject.toml # livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/version.py # livekit-plugins/livekit-plugins-sarvam/pyproject.toml # livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/version.py # livekit-plugins/livekit-plugins-silero/pyproject.toml # livekit-plugins/livekit-plugins-simli/livekit/plugins/simli/version.py # livekit-plugins/livekit-plugins-simli/pyproject.toml # livekit-plugins/livekit-plugins-simplismart/livekit/plugins/simplismart/version.py # livekit-plugins/livekit-plugins-simplismart/pyproject.toml # livekit-plugins/livekit-plugins-slng/livekit/plugins/slng/version.py # livekit-plugins/livekit-plugins-slng/pyproject.toml # livekit-plugins/livekit-plugins-smallestai/livekit/plugins/smallestai/version.py # livekit-plugins/livekit-plugins-smallestai/pyproject.toml # livekit-plugins/livekit-plugins-soniox/livekit/plugins/soniox/version.py 
# livekit-plugins/livekit-plugins-soniox/pyproject.toml # livekit-plugins/livekit-plugins-speechify/livekit/plugins/speechify/version.py # livekit-plugins/livekit-plugins-speechify/pyproject.toml # livekit-plugins/livekit-plugins-speechmatics/livekit/plugins/speechmatics/version.py # livekit-plugins/livekit-plugins-speechmatics/pyproject.toml # livekit-plugins/livekit-plugins-spitch/livekit/plugins/spitch/version.py # livekit-plugins/livekit-plugins-spitch/pyproject.toml # livekit-plugins/livekit-plugins-tavus/livekit/plugins/tavus/version.py # livekit-plugins/livekit-plugins-tavus/pyproject.toml # livekit-plugins/livekit-plugins-telnyx/livekit/plugins/telnyx/version.py # livekit-plugins/livekit-plugins-telnyx/pyproject.toml # livekit-plugins/livekit-plugins-trugen/livekit/plugins/trugen/version.py # livekit-plugins/livekit-plugins-trugen/pyproject.toml # livekit-plugins/livekit-plugins-turn-detector/livekit/plugins/turn_detector/version.py # livekit-plugins/livekit-plugins-turn-detector/pyproject.toml # livekit-plugins/livekit-plugins-ultravox/livekit/plugins/ultravox/version.py # livekit-plugins/livekit-plugins-ultravox/pyproject.toml # livekit-plugins/livekit-plugins-upliftai/livekit/plugins/upliftai/version.py # livekit-plugins/livekit-plugins-upliftai/pyproject.toml # livekit-plugins/livekit-plugins-xai/livekit/plugins/xai/version.py # livekit-plugins/livekit-plugins-xai/pyproject.toml

theomonnom · 2026-06-10T17:41:54Z

+def resolve_env_var(val: NotGivenOr[str], *env_vars: str, default: str = "") -> str:
+    """
+    Resolve an environment variable from a list of potential sources.
+
+    Args:
+        val: The value to resolve.
+        *env_vars: The environment variables to check. Order matters, the first non-None value will be returned.
+        default: The default value to return if no environment variables are set.
+
+    Returns:
+        The resolved environment variable.
+
+    Examples:
+    >>> resolve_env_var(
+    ...     NOT_GIVEN,
+    ...     "ABC_URL",
+    ...     default="https://agent-gateway.livekit.cloud/v1",
+    ... )
+    "https://agent-gateway.livekit.cloud/v1"
+    """
+    if is_given(val):
+        return val
+    for env_var in env_vars:
+        curr_val = os.getenv(env_var, None)
+        if curr_val is not None and curr_val != "":
+            return curr_val
+    return default


Do we need that?

Isn't it just `curr_val = os.getenv(env_var, None) or "default"`?

This is really just to handle LIVEKIT_INFERENCE_* before LIVEKIT_*
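
To make the precedence point in this thread concrete, here is a self-contained re-implementation of the helper from the diff. It uses a local sentinel in place of the library's `NOT_GIVEN` / `is_given`, and the `SKETCH_*` variable names are hypothetical placeholders for a more specific variable checked before a general one.

```python
import os

_NOT_GIVEN = object()  # local stand-in for the library's NOT_GIVEN sentinel


def resolve_env_var(val=_NOT_GIVEN, *env_vars: str, default: str = "") -> str:
    """Explicit value first, then env vars in order, then the default."""
    if val is not _NOT_GIVEN:
        return val
    for name in env_vars:
        current = os.getenv(name)
        if current:  # skips both unset and empty-string values
            return current
    return default


def resolve_url() -> str:
    # Hypothetical variable names: the specific one wins over the general one.
    return resolve_env_var(
        _NOT_GIVEN, "SKETCH_INFERENCE_URL", "SKETCH_URL", default="fallback"
    )
```

An `or` chain such as `os.getenv(a) or os.getenv(b) or default` gives the same precedence for the environment part; what the helper adds is the explicit-value check and a single place to state the lookup order.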

theomonnom · 2026-06-10T17:44:56Z

+@runtime_checkable
+class _StreamingTurnDetectorStream(_TurnDetector, Protocol):
+    # allow None chat_ctx for the streaming model
+    async def predict_end_of_turn(
+        self, chat_ctx: ChatContext | None = None, *, timeout: float | None = None
+    ) -> float: ...
+
+    @property
+    def is_active(self) -> bool: ...
+    @property
+    def is_inference_running(self) -> bool: ...
+    @property
+    def preemptive_request_id(self) -> str | None: ...
+    @property
+    def last_prediction(self) -> TurnDetectionEvent | None: ...
+
+    def update_language(self, language: LanguageCode | None) -> None: ...
+
+    def warmup(self) -> asyncio.Future[float]: ...
+    def activate(self, trigger: str | None = None) -> None: ...
+    def deactivate(self, trigger: str | None = None) -> None: ...
+    def flush(self, reason: str | None = None) -> None: ...
+    def push_audio(self, frame: rtc.AudioFrame) -> None: ...
+    def end_input(self) -> None: ...
+    async def aclose(self) -> None: ...
+
+
+@runtime_checkable
+class _StreamingTurnDetector(Protocol):
+    """Turn detector that hands out a per-session stream instead of resolving
+    inline. Per-language threshold lookups (``unlikely_threshold`` /
+    ``supports_language``) live on the stream, not the detector — after a
+    cloud→local fallback they need to reflect the active backend's rescaled
+    view, which is per-session state."""
+
+    @property
+    def model(self) -> str: ...
+    @property
+    def provider(self) -> str: ...
+
+    def stream(
+        self,
+        *,
+        conn_options: APIConnectOptions = DEFAULT_API_CONNECT_OPTIONS,
+    ) -> _StreamingTurnDetectorStream: ...


Fwiw, it's very unlikely we will support third-party turn detectors, so we could simplify the abstraction by not having any Protocol or ABC class at all.

I've simplified this a lot in the new commit: d74df8d (this PR)
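
For context on what the `@runtime_checkable` protocols in the diff provide: an `isinstance` check against such a protocol only verifies that the named members exist, not their signatures or return types. The classes below are hypothetical and much smaller than the protocols in the diff.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasStream(Protocol):
    """Hypothetical, trimmed-down version of a streaming detector protocol."""

    @property
    def model(self) -> str: ...

    def stream(self) -> object: ...


class Complete:
    model = "turn-detector"

    def stream(self) -> object:
        return object()


class MissingStream:
    model = "turn-detector"


# Structural check: passes when every protocol member is present.
complete_ok = isinstance(Complete(), HasStream)
missing_ok = isinstance(MissingStream(), HasStream)
```

With only first-party implementations, a concrete class gives the same guarantees with less indirection, which is the simplification the reviewer suggests.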

devin-ai-integration

Devin Review found 2 new potential issues.

devin-ai-integration · 2026-06-10T23:36:44Z

                speaking=self._speaking
-                if self._vad or self._turn_detection_mode == "stt"
+                if (self._vad is not None and not self._using_default_vad)
+                or self._turn_detection_mode == "stt"
                else None,


🚩 using_default_vad changes speaking-state reporting semantics for hooks

When _using_default_vad is True, on_final_transcript, on_interim_transcript, and on_preflight_transcript now pass speaking=None instead of the VAD-derived boolean (audio_recognition.py:1008-1011, audio_recognition.py:1062-1064, audio_recognition.py:1107-1109). This is intentional — the default VAD feeds only the audio turn detector, not the pipeline's speaking-state API. Additionally, _last_speaking_time is always overwritten by STT timestamps (audio_recognition.py:1029-1031) when _using_default_vad is True, even though VAD events continue updating it. This preserves pre-PR behavior for users who never explicitly provided a VAD, but is a semantic change for anyone who relied on the implicit VAD (now auto-provisioned) providing speaking state through hooks. Downstream consumers (e.g., custom on_final_transcript handlers) that checked speaking is not None will see a behavior difference.
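
The gating condition from the diff hunk can be isolated into a small function to show which combinations yield `None`. The function name and parameters are hypothetical; the boolean logic mirrors the changed expression.

```python
from typing import Optional


def reported_speaking(
    speaking: bool,
    *,
    has_vad: bool,
    using_default_vad: bool,
    turn_detection_mode: Optional[str],
) -> Optional[bool]:
    """Return the speaking state passed to transcript hooks, or None."""
    # Mirrors: (self._vad is not None and not self._using_default_vad)
    #          or self._turn_detection_mode == "stt"
    if (has_vad and not using_default_vad) or turn_detection_mode == "stt":
        return speaking
    return None
```

A handler that branches on `speaking is not None` therefore behaves the same for an auto-provisioned default VAD as it did when no VAD was configured at all.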

theomonnom · 2026-06-11T06:24:46Z

Nice work! 🚀

devin-ai-integration

Devin Review found 1 new potential issue.

add interface draft

87068d5

chenghao-mou added 25 commits March 6, 2026 10:47

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

e0d5ec1

draft

8eebccc

fix type issues

f92fbc0

refactor stream to support turn detector protocol

d1086ff

minor fixes

0a02bb1

minor fixes

168d0d7

WIP: use only ws stream

277db6e

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

03c0e2e

fix uv.lock bad merge

56b4796

WIP: more refactoring

be9a550

fix mypy

601229c

remove temp url

c4d92f8

disable turn detection when agent is still speaking

e963d85

minor refactoring

c529d79

fix type issues

09baed8

wip

3830638

clean up encoder

f214aa0

wip

c922f44

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

f94a0dd

update protos

604bfdc

minor fixes

f9ec64a

address comments

ddbf594

add text fallback

d465564

add text fallback

6e7d6bf

fix threshold

200d634

chenghao-mou marked this pull request as ready for review April 22, 2026 07:38

chenghao-mou requested a review from a team April 22, 2026 07:38

chenghao-mou added 2 commits June 5, 2026 00:54

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

3bc119c

# Conflicts: # examples/voice_agents/async_tool_agent.py # examples/voice_agents/basic_agent.py # livekit-agents/livekit/agents/voice/agent_activity.py # tests/test_agent_session.py

fix tests

e4f7fad

chenghao-mou mentioned this pull request Jun 5, 2026

feat(eot): add audio models AGT-2919 livekit/agents-js#1719

Open

8 tasks

chenghao-mou and others added 2 commits June 6, 2026 20:45

chenghao-mou changed the title ~~feat(eot): Audio EOT~~ feat(eot): add audio models AGT-2520 Jun 7, 2026

chenghao-mou added 2 commits June 7, 2026 16:26

skip None threshold or probability events

1a3a270

chenghao-mou and others added 2 commits June 8, 2026 14:04

docs: add LiveKit Model License and reference it in README

b99d4d1

Add a copy of the turn detection model license and call it out in the root README alongside the Apache-2.0 framework license. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

6cbb55e

Update livekit-agents/livekit/agents/inference/_utils.py

f8a6ce7

Drop duplicate worker token header declaration Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>

theomonnom reviewed Jun 10, 2026

Comment thread examples/frontdesk/agent.py Outdated

theomonnom reviewed Jun 10, 2026

chenghao-mou added 2 commits June 10, 2026 23:23

drop FSM in stream and move logic to audio recognition

d74df8d

remove TurnDetector reference

66f0e1c

devin-ai-integration Bot reviewed Jun 10, 2026

chenghao-mou added 2 commits June 11, 2026 01:00

restore deps due to bad merge

157e7b4

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

b73663d

theomonnom approved these changes Jun 11, 2026

toubatbrian and others added 5 commits June 11, 2026 14:06

fix(voice): route and drain AsyncToolset executors correctly on hando…

62cd057

…ff (#6049) Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Long Chen <longch1024@gmail.com>

fix(deepgram): use stored language when validating model in update_op…

1e1332e

…tions (#6041)

simulation: read the dispatch from the simulator participant metadata (…

a528fc9

…#6053)

remove the next-release changeset machinery (#6054)

c8f0e26

fix(bargein): error when no interruption threshold is known (#6034)

13af956

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

devin-ai-integration Bot reviewed Jun 11, 2026

Comment thread livekit-agents/livekit/agents/voice/audio_recognition.py

5 participants
