feat(eot): add audio models AGT-2520#4722

Open
chenghao-mou wants to merge 116 commits into main from feat/AGT-2520-multimodal-EOU

Conversation

@chenghao-mou

@chenghao-mou chenghao-mou commented Feb 5, 2026

Member

Adds streaming audio end-of-turn detection. Single user-facing AudioTurnDetector that selects between two backends:

  • turn-detector
  • turn-detector-mini

On cloud transport error or predict_end_of_turn timeout, the session swaps to mini/local for the rest of the stream (sticky per session, one warning per failure mode).
Local failures emit the default 1.0 prediction and retry on the next turn.
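
The fallback rule described above can be sketched roughly as follows. This is an illustrative model only, not the PR's implementation: `StickyFallbackSession`, `cloud_predict`, and `local_predict` are hypothetical names, the sketch is synchronous while the real code is streaming, and mapping the two failure modes to `TimeoutError` and `ConnectionError` is an assumption.

```python
import logging
from typing import Callable

logger = logging.getLogger("eot-sketch")

# From the PR description: a local failure emits the default 1.0 prediction.
DEFAULT_PREDICTION = 1.0


class StickyFallbackSession:
    """Hypothetical per-session wrapper: cloud first, then local for the rest of the stream."""

    def __init__(
        self,
        cloud_predict: Callable[[], float],
        local_predict: Callable[[], float],
    ) -> None:
        self._cloud = cloud_predict
        self._local = local_predict
        self._use_local = False  # once True, it stays True for this session
        self._warned: set[str] = set()

    def _warn_once(self, mode: str) -> None:
        # One warning per failure mode, however often it would recur.
        if mode not in self._warned:
            self._warned.add(mode)
            logger.warning("cloud turn detection failed (%s); using local model", mode)

    def predict(self) -> float:
        if not self._use_local:
            try:
                return self._cloud()
            except TimeoutError:
                self._warn_once("timeout")
                self._use_local = True
            except ConnectionError:
                self._warn_once("transport")
                self._use_local = True
        try:
            return self._local()
        except Exception:
            # Local failure: emit the default and try again on the next turn.
            return DEFAULT_PREDICTION
```

In this sketch the cloud callable is never invoked again after the first failure, which is what "sticky per session" is taken to mean here.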

A user-set unlikely_threshold is scaled multiplicatively against the cloud default so the operating point survives a fallback.
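
One way to read "scaled multiplicatively" is that the user's threshold keeps the same ratio to the default when the backend changes. The sketch below assumes that reading; `rescale_threshold` is a hypothetical helper and the default values are made up for illustration, not taken from the PR.

```python
def rescale_threshold(
    user_threshold: float, cloud_default: float, local_default: float
) -> float:
    """Keep the user's threshold at the same ratio to the default on the local backend."""
    if cloud_default <= 0:
        raise ValueError("cloud_default must be positive")
    return user_threshold * (local_default / cloud_default)


# Hypothetical defaults, for illustration only.
CLOUD_DEFAULT = 0.5
LOCAL_DEFAULT = 0.25

# A user threshold at half the cloud default lands at half the local default.
local_threshold = rescale_threshold(0.25, CLOUD_DEFAULT, LOCAL_DEFAULT)
```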

@hsjun99

hsjun99 commented Feb 25, 2026

@chenghao-mou Excited to see this! A couple of questions:

  1. Will the multimodal EOT model be publicly accessible via model weights or agent-gateway.livekit.cloud, or in some other way?
  2. Any rough timeline for when MultiModalTurnDetector gets fully wired up?

@chenghao-mou

Member Author

Thanks for your patience! We don't have an official decision or timeline yet, but hopefully I can get it ready within a month or two.

@chenghao-mou chenghao-mou marked this pull request as ready for review April 22, 2026 07:38
@chenghao-mou chenghao-mou requested a review from a team April 22, 2026 07:38
# Conflicts:
#	examples/voice_agents/async_tool_agent.py
#	examples/voice_agents/basic_agent.py
#	livekit-agents/livekit/agents/voice/agent_activity.py
#	tests/test_agent_session.py
chenghao-mou and others added 2 commits June 6, 2026 20:45
… version

Rename the public AudioTurnDetector -> TurnDetector and replace the
model= constructor argument with version="v1"|"v1-mini". The version maps
to the internal model name (turn-detector-v1 / turn-detector-v1-mini),
which the `model` property and EOT telemetry continue to report unchanged.

Drop the audio modality from the private peers so they read generically
for the multimodal EOU direction:
- _AudioTurnDetector            -> _BaseStreamingTurnDetector
- _AudioTurnDetectorStream      -> _BaseStreamingTurnDetectorStream
- _AudioTurnDetectionTransport  -> _StreamingTurnDetectionTransport

Updates the inference exports (adds TurnDetectorVersions), framework
references, the deprecated turn-detector plugin notice, plugin READMEs,
tests, and examples.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
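
The version-to-model mapping described in the commit message above might look like the following sketch. `TurnDetectorVersions` is named in the commit message, but defining it as a `Literal` is an assumption, and `model_name_for` is a hypothetical helper rather than a function from the PR.

```python
from typing import Literal, get_args

# Assumed shape of the exported type; the commit message only names it.
TurnDetectorVersions = Literal["v1", "v1-mini"]


def model_name_for(version: str) -> str:
    """Map a public version string to the internal model name."""
    allowed = get_args(TurnDetectorVersions)
    if version not in allowed:
        raise ValueError(f"unknown version {version!r}; expected one of {allowed}")
    return f"turn-detector-{version}"
```

Under these assumptions, `model_name_for("v1-mini")` yields `turn-detector-v1-mini`, matching the internal names listed in the commit message.
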
_make_full_recognition_for_eou builds AudioRecognition via __new__ and
hand-sets attributes. The main merge (#5841) turned _speaking into a
property backed by _user_silence_ev, so `ar._speaking = False` began
raising AttributeError because the helper never created that event. This
was a semantic merge conflict git did not flag textually.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
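
The failure described in the commit message above can be reproduced in miniature. The class below is a hypothetical stand-in, not `AudioRecognition`: it uses `threading.Event` in place of whatever event type the real code uses, and the polarity (silence event set when not speaking) is an assumption.

```python
import threading


class Recognition:
    """Hypothetical class whose _speaking flag is a property backed by an event."""

    def __init__(self) -> None:
        self._user_silence_ev = threading.Event()
        self._user_silence_ev.set()  # start in the silent state

    @property
    def _speaking(self) -> bool:
        return not self._user_silence_ev.is_set()

    @_speaking.setter
    def _speaking(self, value: bool) -> None:
        if value:
            self._user_silence_ev.clear()
        else:
            self._user_silence_ev.set()


def make_broken() -> Recognition:
    obj = Recognition.__new__(Recognition)  # bypasses __init__
    obj._speaking = False  # setter touches the missing event: AttributeError
    return obj


def make_fixed() -> Recognition:
    obj = Recognition.__new__(Recognition)
    obj._user_silence_ev = threading.Event()  # create the backing event first
    obj._speaking = False
    return obj
```

Because the assignment is syntactically unchanged, a textual merge has nothing to flag; only the setter's new dependency on the event makes the old helper fail.
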
devin-ai-integration[bot]

This comment was marked as resolved.

@chenghao-mou chenghao-mou changed the title feat(eot): Audio EOT feat(eot): add audio models AGT-2520 Jun 7, 2026
…dal-EOU

# Conflicts:
#	livekit-agents/livekit/agents/version.py
#	livekit-agents/pyproject.toml
#	livekit-plugins/livekit-plugins-anam/livekit/plugins/anam/version.py
#	livekit-plugins/livekit-plugins-anam/pyproject.toml
#	livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/version.py
#	livekit-plugins/livekit-plugins-anthropic/pyproject.toml
#	livekit-plugins/livekit-plugins-assemblyai/livekit/plugins/assemblyai/version.py
#	livekit-plugins/livekit-plugins-assemblyai/pyproject.toml
#	livekit-plugins/livekit-plugins-asyncai/livekit/plugins/asyncai/version.py
#	livekit-plugins/livekit-plugins-asyncai/pyproject.toml
#	livekit-plugins/livekit-plugins-avatario/livekit/plugins/avatario/version.py
#	livekit-plugins/livekit-plugins-avatario/pyproject.toml
#	livekit-plugins/livekit-plugins-avatartalk/livekit/plugins/avatartalk/version.py
#	livekit-plugins/livekit-plugins-avatartalk/pyproject.toml
#	livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/version.py
#	livekit-plugins/livekit-plugins-aws/pyproject.toml
#	livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/version.py
#	livekit-plugins/livekit-plugins-azure/pyproject.toml
#	livekit-plugins/livekit-plugins-baseten/livekit/plugins/baseten/version.py
#	livekit-plugins/livekit-plugins-baseten/pyproject.toml
#	livekit-plugins/livekit-plugins-bey/livekit/plugins/bey/version.py
#	livekit-plugins/livekit-plugins-bey/pyproject.toml
#	livekit-plugins/livekit-plugins-bithuman/livekit/plugins/bithuman/version.py
#	livekit-plugins/livekit-plugins-bithuman/pyproject.toml
#	livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/version.py
#	livekit-plugins/livekit-plugins-browser/pyproject.toml
#	livekit-plugins/livekit-plugins-cambai/livekit/plugins/cambai/version.py
#	livekit-plugins/livekit-plugins-cambai/pyproject.toml
#	livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/version.py
#	livekit-plugins/livekit-plugins-cartesia/pyproject.toml
#	livekit-plugins/livekit-plugins-cerebras/livekit/plugins/cerebras/version.py
#	livekit-plugins/livekit-plugins-cerebras/pyproject.toml
#	livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/version.py
#	livekit-plugins/livekit-plugins-clova/pyproject.toml
#	livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/version.py
#	livekit-plugins/livekit-plugins-deepgram/pyproject.toml
#	livekit-plugins/livekit-plugins-did/livekit/plugins/did/version.py
#	livekit-plugins/livekit-plugins-did/pyproject.toml
#	livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/version.py
#	livekit-plugins/livekit-plugins-elevenlabs/pyproject.toml
#	livekit-plugins/livekit-plugins-fal/livekit/plugins/fal/version.py
#	livekit-plugins/livekit-plugins-fal/pyproject.toml
#	livekit-plugins/livekit-plugins-fireworksai/livekit/plugins/fireworksai/version.py
#	livekit-plugins/livekit-plugins-fireworksai/pyproject.toml
#	livekit-plugins/livekit-plugins-fishaudio/livekit/plugins/fishaudio/version.py
#	livekit-plugins/livekit-plugins-fishaudio/pyproject.toml
#	livekit-plugins/livekit-plugins-gladia/livekit/plugins/gladia/version.py
#	livekit-plugins/livekit-plugins-gladia/pyproject.toml
#	livekit-plugins/livekit-plugins-gnani/livekit/plugins/gnani/version.py
#	livekit-plugins/livekit-plugins-gnani/pyproject.toml
#	livekit-plugins/livekit-plugins-google/livekit/plugins/google/version.py
#	livekit-plugins/livekit-plugins-google/pyproject.toml
#	livekit-plugins/livekit-plugins-gradium/livekit/plugins/gradium/version.py
#	livekit-plugins/livekit-plugins-gradium/pyproject.toml
#	livekit-plugins/livekit-plugins-groq/livekit/plugins/groq/version.py
#	livekit-plugins/livekit-plugins-groq/pyproject.toml
#	livekit-plugins/livekit-plugins-hamming/livekit/plugins/hamming/version.py
#	livekit-plugins/livekit-plugins-hamming/pyproject.toml
#	livekit-plugins/livekit-plugins-hedra/livekit/plugins/hedra/version.py
#	livekit-plugins/livekit-plugins-hedra/pyproject.toml
#	livekit-plugins/livekit-plugins-hume/livekit/plugins/hume/version.py
#	livekit-plugins/livekit-plugins-hume/pyproject.toml
#	livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/version.py
#	livekit-plugins/livekit-plugins-inworld/pyproject.toml
#	livekit-plugins/livekit-plugins-keyframe/livekit/plugins/keyframe/version.py
#	livekit-plugins/livekit-plugins-keyframe/pyproject.toml
#	livekit-plugins/livekit-plugins-krisp/livekit/plugins/krisp/version.py
#	livekit-plugins/livekit-plugins-krisp/pyproject.toml
#	livekit-plugins/livekit-plugins-langchain/livekit/plugins/langchain/version.py
#	livekit-plugins/livekit-plugins-langchain/pyproject.toml
#	livekit-plugins/livekit-plugins-lemonslice/livekit/plugins/lemonslice/version.py
#	livekit-plugins/livekit-plugins-lemonslice/pyproject.toml
#	livekit-plugins/livekit-plugins-liveavatar/livekit/plugins/liveavatar/version.py
#	livekit-plugins/livekit-plugins-liveavatar/pyproject.toml
#	livekit-plugins/livekit-plugins-lmnt/livekit/plugins/lmnt/version.py
#	livekit-plugins/livekit-plugins-lmnt/pyproject.toml
#	livekit-plugins/livekit-plugins-minimal/livekit/plugins/minimal/version.py
#	livekit-plugins/livekit-plugins-minimal/pyproject.toml
#	livekit-plugins/livekit-plugins-minimax/livekit/plugins/minimax/version.py
#	livekit-plugins/livekit-plugins-minimax/pyproject.toml
#	livekit-plugins/livekit-plugins-mistralai/livekit/plugins/mistralai/version.py
#	livekit-plugins/livekit-plugins-mistralai/pyproject.toml
#	livekit-plugins/livekit-plugins-murf/livekit/plugins/murf/version.py
#	livekit-plugins/livekit-plugins-murf/pyproject.toml
#	livekit-plugins/livekit-plugins-neuphonic/livekit/plugins/neuphonic/version.py
#	livekit-plugins/livekit-plugins-neuphonic/pyproject.toml
#	livekit-plugins/livekit-plugins-nltk/livekit/plugins/nltk/version.py
#	livekit-plugins/livekit-plugins-nltk/pyproject.toml
#	livekit-plugins/livekit-plugins-nvidia/livekit/plugins/nvidia/version.py
#	livekit-plugins/livekit-plugins-nvidia/pyproject.toml
#	livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/version.py
#	livekit-plugins/livekit-plugins-openai/pyproject.toml
#	livekit-plugins/livekit-plugins-perplexity/livekit/plugins/perplexity/version.py
#	livekit-plugins/livekit-plugins-perplexity/pyproject.toml
#	livekit-plugins/livekit-plugins-phonic/livekit/plugins/phonic/version.py
#	livekit-plugins/livekit-plugins-phonic/pyproject.toml
#	livekit-plugins/livekit-plugins-resemble/livekit/plugins/resemble/version.py
#	livekit-plugins/livekit-plugins-resemble/pyproject.toml
#	livekit-plugins/livekit-plugins-respeecher/livekit/plugins/respeecher/version.py
#	livekit-plugins/livekit-plugins-respeecher/pyproject.toml
#	livekit-plugins/livekit-plugins-rime/livekit/plugins/rime/version.py
#	livekit-plugins/livekit-plugins-rime/pyproject.toml
#	livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/version.py
#	livekit-plugins/livekit-plugins-rtzr/pyproject.toml
#	livekit-plugins/livekit-plugins-runway/livekit/plugins/runway/version.py
#	livekit-plugins/livekit-plugins-runway/pyproject.toml
#	livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/version.py
#	livekit-plugins/livekit-plugins-sarvam/pyproject.toml
#	livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/version.py
#	livekit-plugins/livekit-plugins-silero/pyproject.toml
#	livekit-plugins/livekit-plugins-simli/livekit/plugins/simli/version.py
#	livekit-plugins/livekit-plugins-simli/pyproject.toml
#	livekit-plugins/livekit-plugins-simplismart/livekit/plugins/simplismart/version.py
#	livekit-plugins/livekit-plugins-simplismart/pyproject.toml
#	livekit-plugins/livekit-plugins-slng/livekit/plugins/slng/version.py
#	livekit-plugins/livekit-plugins-slng/pyproject.toml
#	livekit-plugins/livekit-plugins-smallestai/livekit/plugins/smallestai/version.py
#	livekit-plugins/livekit-plugins-smallestai/pyproject.toml
#	livekit-plugins/livekit-plugins-soniox/livekit/plugins/soniox/version.py
#	livekit-plugins/livekit-plugins-soniox/pyproject.toml
#	livekit-plugins/livekit-plugins-speechify/livekit/plugins/speechify/version.py
#	livekit-plugins/livekit-plugins-speechify/pyproject.toml
#	livekit-plugins/livekit-plugins-speechmatics/livekit/plugins/speechmatics/version.py
#	livekit-plugins/livekit-plugins-speechmatics/pyproject.toml
#	livekit-plugins/livekit-plugins-spitch/livekit/plugins/spitch/version.py
#	livekit-plugins/livekit-plugins-spitch/pyproject.toml
#	livekit-plugins/livekit-plugins-tavus/livekit/plugins/tavus/version.py
#	livekit-plugins/livekit-plugins-tavus/pyproject.toml
#	livekit-plugins/livekit-plugins-telnyx/livekit/plugins/telnyx/version.py
#	livekit-plugins/livekit-plugins-telnyx/pyproject.toml
#	livekit-plugins/livekit-plugins-trugen/livekit/plugins/trugen/version.py
#	livekit-plugins/livekit-plugins-trugen/pyproject.toml
#	livekit-plugins/livekit-plugins-turn-detector/livekit/plugins/turn_detector/version.py
#	livekit-plugins/livekit-plugins-turn-detector/pyproject.toml
#	livekit-plugins/livekit-plugins-ultravox/livekit/plugins/ultravox/version.py
#	livekit-plugins/livekit-plugins-ultravox/pyproject.toml
#	livekit-plugins/livekit-plugins-upliftai/livekit/plugins/upliftai/version.py
#	livekit-plugins/livekit-plugins-upliftai/pyproject.toml
#	livekit-plugins/livekit-plugins-xai/livekit/plugins/xai/version.py
#	livekit-plugins/livekit-plugins-xai/pyproject.toml
devin-ai-integration[bot]

This comment was marked as resolved.

chenghao-mou and others added 2 commits June 8, 2026 14:04
Add a copy of the turn detection model license and call it out in the
root README alongside the Apache-2.0 framework license.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

Drop duplicate worker token header declaration

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Comment thread examples/frontdesk/agent.py Outdated
Comment on lines +7 to +33
def resolve_env_var(val: NotGivenOr[str], *env_vars: str, default: str = "") -> str:
    """
    Resolve an environment variable from a list of potential sources.

    Args:
        val: The value to resolve.
        *env_vars: The environment variables to check. Order matters, the first non-None value will be returned.
        default: The default value to return if no environment variables are set.

    Returns:
        The resolved environment variable.

    Examples:
        >>> resolve_env_var(
        ...     NOT_GIVEN,
        ...     "ABC_URL",
        ...     default="https://agent-gateway.livekit.cloud/v1",
        ... )
        "https://agent-gateway.livekit.cloud/v1"
    """
    if is_given(val):
        return val
    for env_var in env_vars:
        curr_val = os.getenv(env_var, None)
        if curr_val is not None and curr_val != "":
            return curr_val
    return default

Member

Do we need that?

Isn't it just

curr_val = os.getenv(env_var, None) or "default"?

Member Author

This is really just to handle LIVEKIT_INFERENCE_* before LIVEKIT_*
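
The precedence the author describes is what a single `os.getenv(...) or default` cannot express: several variables are tried in order, and empty values are skipped. A minimal self-contained sketch, assuming a plain `None` sentinel instead of `NotGivenOr`; `resolve_first_set` is a hypothetical name, and the two variable names in the usage note only illustrate the `LIVEKIT_INFERENCE_*` and `LIVEKIT_*` patterns.

```python
import os
from typing import Optional


def resolve_first_set(explicit: Optional[str], *env_vars: str, default: str = "") -> str:
    """Return the explicit value, else the first non-empty env var, else the default."""
    if explicit is not None:
        return explicit
    for name in env_vars:
        value = os.getenv(name)
        if value:  # skips both unset and empty-string values
            return value
    return default
```

Called as `resolve_first_set(None, "LIVEKIT_INFERENCE_URL", "LIVEKIT_URL", default=...)`, the inference-specific variable wins when it is set, and the general one is used otherwise.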

Comment on lines +51 to +95
@runtime_checkable
class _StreamingTurnDetectorStream(_TurnDetector, Protocol):
    # allow None chat_ctx for the streaming model
    async def predict_end_of_turn(
        self, chat_ctx: ChatContext | None = None, *, timeout: float | None = None
    ) -> float: ...

    @property
    def is_active(self) -> bool: ...
    @property
    def is_inference_running(self) -> bool: ...
    @property
    def preemptive_request_id(self) -> str | None: ...
    @property
    def last_prediction(self) -> TurnDetectionEvent | None: ...

    def update_language(self, language: LanguageCode | None) -> None: ...

    def warmup(self) -> asyncio.Future[float]: ...
    def activate(self, trigger: str | None = None) -> None: ...
    def deactivate(self, trigger: str | None = None) -> None: ...
    def flush(self, reason: str | None = None) -> None: ...
    def push_audio(self, frame: rtc.AudioFrame) -> None: ...
    def end_input(self) -> None: ...
    async def aclose(self) -> None: ...


@runtime_checkable
class _StreamingTurnDetector(Protocol):
    """Turn detector that hands out a per-session stream instead of resolving
    inline. Per-language threshold lookups (``unlikely_threshold`` /
    ``supports_language``) live on the stream, not the detector — after a
    cloud→local fallback they need to reflect the active backend's rescaled
    view, which is per-session state."""

    @property
    def model(self) -> str: ...
    @property
    def provider(self) -> str: ...

    def stream(
        self,
        *,
        conn_options: APIConnectOptions = DEFAULT_API_CONNECT_OPTIONS,
    ) -> _StreamingTurnDetectorStream: ...

Member

Fwiw, it's very unlikely we will support third party turn detectors, we could simplify the abstraction by not having any protocol or abc class

Member Author

I've simplified this a lot in the new commit: d74df8d (this PR)

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

Devin Review found 2 new potential issues.

View 39 additional findings in Devin Review.

Comment thread livekit-agents/pyproject.toml
Comment on lines 1008 to 1011
  speaking=self._speaking
- if self._vad or self._turn_detection_mode == "stt"
+ if (self._vad is not None and not self._using_default_vad)
+ or self._turn_detection_mode == "stt"
  else None,

@devin-ai-integration devin-ai-integration Bot Jun 10, 2026

Contributor

🚩 using_default_vad changes speaking-state reporting semantics for hooks

When _using_default_vad is True, on_final_transcript, on_interim_transcript, and on_preflight_transcript now pass speaking=None instead of the VAD-derived boolean (audio_recognition.py:1008-1011, audio_recognition.py:1062-1064, audio_recognition.py:1107-1109). This is intentional — the default VAD feeds only the audio turn detector, not the pipeline's speaking-state API. Additionally, _last_speaking_time is always overwritten by STT timestamps (audio_recognition.py:1029-1031) when _using_default_vad is True, even though VAD events continue updating it. This preserves pre-PR behavior for users who never explicitly provided a VAD, but is a semantic change for anyone who relied on the implicit VAD (now auto-provisioned) providing speaking state through hooks. Downstream consumers (e.g., custom on_final_transcript handlers) that checked speaking is not None will see a behavior difference.
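
The condition under discussion can be isolated as a small pure function to make the tri-state result explicit. This is a sketch under stated assumptions: `reported_speaking` and `describe` are hypothetical helpers, not code from the PR, and the parameters stand in for the private attributes shown in the diff.

```python
from typing import Optional


def reported_speaking(
    speaking: bool,
    *,
    has_vad: bool,
    using_default_vad: bool,
    turn_detection_mode: Optional[str],
) -> Optional[bool]:
    """Mirror the new condition: report speaking only for an explicit VAD or STT mode."""
    if (has_vad and not using_default_vad) or turn_detection_mode == "stt":
        return speaking
    return None  # speaking state is not reported to hooks


def describe(speaking: Optional[bool]) -> str:
    """Example hook-side handling that treats None as unknown, not as silence."""
    if speaking is None:
        return "unknown"
    return "speaking" if speaking else "silent"
```

A handler written like `describe` behaves the same before and after the change; one that treats `None` as falsy would silently read an auto-provisioned VAD as "not speaking".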

@theomonnom

Member

Nice work! 🚀

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

Devin Review found 1 new potential issue.

Comment thread livekit-agents/livekit/agents/voice/audio_recognition.py

5 participants