feat(eot): add audio models AGT-2520 #4722
@chenghao-mou Excited to see this! A couple of questions:
Thanks for your patience! We don't have an official decision or timeline yet, but hopefully I can get it ready within a month or two.
# Conflicts: # examples/voice_agents/async_tool_agent.py # examples/voice_agents/basic_agent.py # livekit-agents/livekit/agents/voice/agent_activity.py # tests/test_agent_session.py
… version Rename the public AudioTurnDetector -> TurnDetector and replace the model= constructor argument with version="v1"|"v1-mini". The version maps to the internal model name (turn-detector-v1 / turn-detector-v1-mini), which the `model` property and EOT telemetry continue to report unchanged. Drop the audio modality from the private peers so they read generically for the multimodal EOU direction: - _AudioTurnDetector -> _BaseStreamingTurnDetector - _AudioTurnDetectorStream -> _BaseStreamingTurnDetectorStream - _AudioTurnDetectionTransport -> _StreamingTurnDetectionTransport Updates the inference exports (adds TurnDetectorVersions), framework references, the deprecated turn-detector plugin notice, plugin READMEs, tests, and examples. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
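A minimal sketch of the version-to-model mapping this commit message describes. The two version strings and the two model names come from the message; the `Literal` definition of `TurnDetectorVersions`, the dict, and the helper name are assumptions made for illustration, not the PR's actual code.

```python
from typing import Literal

# Assumed shape of the exported type; the PR only names it.
TurnDetectorVersions = Literal["v1", "v1-mini"]

# Mapping stated in the commit message (the dict name is hypothetical).
_VERSION_TO_MODEL = {
    "v1": "turn-detector-v1",
    "v1-mini": "turn-detector-v1-mini",
}


def model_name_for(version: str) -> str:
    """Return the internal model name reported by `model` and telemetry."""
    try:
        return _VERSION_TO_MODEL[version]
    except KeyError:
        known = ", ".join(sorted(_VERSION_TO_MODEL))
        raise ValueError(f"unknown version {version!r}; expected one of: {known}") from None
```

Under this sketch, a caller passes `version="v1-mini"` and the reported model name stays `turn-detector-v1-mini`, so existing telemetry consumers would see no change.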
_make_full_recognition_for_eou builds AudioRecognition via __new__ and hand-sets attributes. The main merge (#5841) turned _speaking into a property backed by _user_silence_ev, so `ar._speaking = False` began raising AttributeError because the helper never created that event. This was a semantic merge conflict git did not flag textually. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
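The failure described in this commit can be reproduced in isolation. The sketch below assumes the property maps "speaking" to "silence event not set" and uses `threading.Event` as a stand-in for the real event type; the class and helper names are hypothetical.

```python
import threading


class Recognition:
    """Hypothetical stand-in for a class whose _speaking is event-backed."""

    def __init__(self) -> None:
        self._user_silence_ev = threading.Event()
        self._user_silence_ev.set()  # start in the "silent" state

    @property
    def _speaking(self) -> bool:
        return not self._user_silence_ev.is_set()

    @_speaking.setter
    def _speaking(self, value: bool) -> None:
        # The setter touches the event, so the event must already exist.
        if value:
            self._user_silence_ev.clear()
        else:
            self._user_silence_ev.set()


def make_without_init() -> Recognition:
    # Bypasses __init__, as a test helper that hand-sets attributes would.
    return Recognition.__new__(Recognition)


broken = make_without_init()
try:
    broken._speaking = False  # setter runs, but the event was never created
    raised = False
except AttributeError:
    raised = True

fixed = make_without_init()
fixed._user_silence_ev = threading.Event()  # create the backing event first
fixed._speaking = False
```

Because the assignment still parses and no file conflicts textually, only running the helper reveals the problem, which matches the "semantic merge conflict" described above.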
…dal-EOU # Conflicts: # livekit-agents/livekit/agents/version.py # livekit-agents/pyproject.toml # livekit-plugins/livekit-plugins-anam/livekit/plugins/anam/version.py # livekit-plugins/livekit-plugins-anam/pyproject.toml # livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/version.py # livekit-plugins/livekit-plugins-anthropic/pyproject.toml # livekit-plugins/livekit-plugins-assemblyai/livekit/plugins/assemblyai/version.py # livekit-plugins/livekit-plugins-assemblyai/pyproject.toml # livekit-plugins/livekit-plugins-asyncai/livekit/plugins/asyncai/version.py # livekit-plugins/livekit-plugins-asyncai/pyproject.toml # livekit-plugins/livekit-plugins-avatario/livekit/plugins/avatario/version.py # livekit-plugins/livekit-plugins-avatario/pyproject.toml # livekit-plugins/livekit-plugins-avatartalk/livekit/plugins/avatartalk/version.py # livekit-plugins/livekit-plugins-avatartalk/pyproject.toml # livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/version.py # livekit-plugins/livekit-plugins-aws/pyproject.toml # livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/version.py # livekit-plugins/livekit-plugins-azure/pyproject.toml # livekit-plugins/livekit-plugins-baseten/livekit/plugins/baseten/version.py # livekit-plugins/livekit-plugins-baseten/pyproject.toml # livekit-plugins/livekit-plugins-bey/livekit/plugins/bey/version.py # livekit-plugins/livekit-plugins-bey/pyproject.toml # livekit-plugins/livekit-plugins-bithuman/livekit/plugins/bithuman/version.py # livekit-plugins/livekit-plugins-bithuman/pyproject.toml # livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/version.py # livekit-plugins/livekit-plugins-browser/pyproject.toml # livekit-plugins/livekit-plugins-cambai/livekit/plugins/cambai/version.py # livekit-plugins/livekit-plugins-cambai/pyproject.toml # livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/version.py # livekit-plugins/livekit-plugins-cartesia/pyproject.toml # 
livekit-plugins/livekit-plugins-cerebras/livekit/plugins/cerebras/version.py # livekit-plugins/livekit-plugins-cerebras/pyproject.toml # livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/version.py # livekit-plugins/livekit-plugins-clova/pyproject.toml # livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/version.py # livekit-plugins/livekit-plugins-deepgram/pyproject.toml # livekit-plugins/livekit-plugins-did/livekit/plugins/did/version.py # livekit-plugins/livekit-plugins-did/pyproject.toml # livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/version.py # livekit-plugins/livekit-plugins-elevenlabs/pyproject.toml # livekit-plugins/livekit-plugins-fal/livekit/plugins/fal/version.py # livekit-plugins/livekit-plugins-fal/pyproject.toml # livekit-plugins/livekit-plugins-fireworksai/livekit/plugins/fireworksai/version.py # livekit-plugins/livekit-plugins-fireworksai/pyproject.toml # livekit-plugins/livekit-plugins-fishaudio/livekit/plugins/fishaudio/version.py # livekit-plugins/livekit-plugins-fishaudio/pyproject.toml # livekit-plugins/livekit-plugins-gladia/livekit/plugins/gladia/version.py # livekit-plugins/livekit-plugins-gladia/pyproject.toml # livekit-plugins/livekit-plugins-gnani/livekit/plugins/gnani/version.py # livekit-plugins/livekit-plugins-gnani/pyproject.toml # livekit-plugins/livekit-plugins-google/livekit/plugins/google/version.py # livekit-plugins/livekit-plugins-google/pyproject.toml # livekit-plugins/livekit-plugins-gradium/livekit/plugins/gradium/version.py # livekit-plugins/livekit-plugins-gradium/pyproject.toml # livekit-plugins/livekit-plugins-groq/livekit/plugins/groq/version.py # livekit-plugins/livekit-plugins-groq/pyproject.toml # livekit-plugins/livekit-plugins-hamming/livekit/plugins/hamming/version.py # livekit-plugins/livekit-plugins-hamming/pyproject.toml # livekit-plugins/livekit-plugins-hedra/livekit/plugins/hedra/version.py # livekit-plugins/livekit-plugins-hedra/pyproject.toml # 
livekit-plugins/livekit-plugins-hume/livekit/plugins/hume/version.py # livekit-plugins/livekit-plugins-hume/pyproject.toml # livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/version.py # livekit-plugins/livekit-plugins-inworld/pyproject.toml # livekit-plugins/livekit-plugins-keyframe/livekit/plugins/keyframe/version.py # livekit-plugins/livekit-plugins-keyframe/pyproject.toml # livekit-plugins/livekit-plugins-krisp/livekit/plugins/krisp/version.py # livekit-plugins/livekit-plugins-krisp/pyproject.toml # livekit-plugins/livekit-plugins-langchain/livekit/plugins/langchain/version.py # livekit-plugins/livekit-plugins-langchain/pyproject.toml # livekit-plugins/livekit-plugins-lemonslice/livekit/plugins/lemonslice/version.py # livekit-plugins/livekit-plugins-lemonslice/pyproject.toml # livekit-plugins/livekit-plugins-liveavatar/livekit/plugins/liveavatar/version.py # livekit-plugins/livekit-plugins-liveavatar/pyproject.toml # livekit-plugins/livekit-plugins-lmnt/livekit/plugins/lmnt/version.py # livekit-plugins/livekit-plugins-lmnt/pyproject.toml # livekit-plugins/livekit-plugins-minimal/livekit/plugins/minimal/version.py # livekit-plugins/livekit-plugins-minimal/pyproject.toml # livekit-plugins/livekit-plugins-minimax/livekit/plugins/minimax/version.py # livekit-plugins/livekit-plugins-minimax/pyproject.toml # livekit-plugins/livekit-plugins-mistralai/livekit/plugins/mistralai/version.py # livekit-plugins/livekit-plugins-mistralai/pyproject.toml # livekit-plugins/livekit-plugins-murf/livekit/plugins/murf/version.py # livekit-plugins/livekit-plugins-murf/pyproject.toml # livekit-plugins/livekit-plugins-neuphonic/livekit/plugins/neuphonic/version.py # livekit-plugins/livekit-plugins-neuphonic/pyproject.toml # livekit-plugins/livekit-plugins-nltk/livekit/plugins/nltk/version.py # livekit-plugins/livekit-plugins-nltk/pyproject.toml # livekit-plugins/livekit-plugins-nvidia/livekit/plugins/nvidia/version.py # 
livekit-plugins/livekit-plugins-nvidia/pyproject.toml # livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/version.py # livekit-plugins/livekit-plugins-openai/pyproject.toml # livekit-plugins/livekit-plugins-perplexity/livekit/plugins/perplexity/version.py # livekit-plugins/livekit-plugins-perplexity/pyproject.toml # livekit-plugins/livekit-plugins-phonic/livekit/plugins/phonic/version.py # livekit-plugins/livekit-plugins-phonic/pyproject.toml # livekit-plugins/livekit-plugins-resemble/livekit/plugins/resemble/version.py # livekit-plugins/livekit-plugins-resemble/pyproject.toml # livekit-plugins/livekit-plugins-respeecher/livekit/plugins/respeecher/version.py # livekit-plugins/livekit-plugins-respeecher/pyproject.toml # livekit-plugins/livekit-plugins-rime/livekit/plugins/rime/version.py # livekit-plugins/livekit-plugins-rime/pyproject.toml # livekit-plugins/livekit-plugins-rtzr/livekit/plugins/rtzr/version.py # livekit-plugins/livekit-plugins-rtzr/pyproject.toml # livekit-plugins/livekit-plugins-runway/livekit/plugins/runway/version.py # livekit-plugins/livekit-plugins-runway/pyproject.toml # livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/version.py # livekit-plugins/livekit-plugins-sarvam/pyproject.toml # livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/version.py # livekit-plugins/livekit-plugins-silero/pyproject.toml # livekit-plugins/livekit-plugins-simli/livekit/plugins/simli/version.py # livekit-plugins/livekit-plugins-simli/pyproject.toml # livekit-plugins/livekit-plugins-simplismart/livekit/plugins/simplismart/version.py # livekit-plugins/livekit-plugins-simplismart/pyproject.toml # livekit-plugins/livekit-plugins-slng/livekit/plugins/slng/version.py # livekit-plugins/livekit-plugins-slng/pyproject.toml # livekit-plugins/livekit-plugins-smallestai/livekit/plugins/smallestai/version.py # livekit-plugins/livekit-plugins-smallestai/pyproject.toml # livekit-plugins/livekit-plugins-soniox/livekit/plugins/soniox/version.py 
# livekit-plugins/livekit-plugins-soniox/pyproject.toml # livekit-plugins/livekit-plugins-speechify/livekit/plugins/speechify/version.py # livekit-plugins/livekit-plugins-speechify/pyproject.toml # livekit-plugins/livekit-plugins-speechmatics/livekit/plugins/speechmatics/version.py # livekit-plugins/livekit-plugins-speechmatics/pyproject.toml # livekit-plugins/livekit-plugins-spitch/livekit/plugins/spitch/version.py # livekit-plugins/livekit-plugins-spitch/pyproject.toml # livekit-plugins/livekit-plugins-tavus/livekit/plugins/tavus/version.py # livekit-plugins/livekit-plugins-tavus/pyproject.toml # livekit-plugins/livekit-plugins-telnyx/livekit/plugins/telnyx/version.py # livekit-plugins/livekit-plugins-telnyx/pyproject.toml # livekit-plugins/livekit-plugins-trugen/livekit/plugins/trugen/version.py # livekit-plugins/livekit-plugins-trugen/pyproject.toml # livekit-plugins/livekit-plugins-turn-detector/livekit/plugins/turn_detector/version.py # livekit-plugins/livekit-plugins-turn-detector/pyproject.toml # livekit-plugins/livekit-plugins-ultravox/livekit/plugins/ultravox/version.py # livekit-plugins/livekit-plugins-ultravox/pyproject.toml # livekit-plugins/livekit-plugins-upliftai/livekit/plugins/upliftai/version.py # livekit-plugins/livekit-plugins-upliftai/pyproject.toml # livekit-plugins/livekit-plugins-xai/livekit/plugins/xai/version.py # livekit-plugins/livekit-plugins-xai/pyproject.toml
Add a copy of the turn detection model license and call it out in the root README alongside the Apache-2.0 framework license. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop duplicate worker token header declaration Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
| def resolve_env_var(val: NotGivenOr[str], *env_vars: str, default: str = "") -> str: | ||
| """ | ||
| Resolve an environment variable from a list of potential sources. | ||
|
|
||
| Args: | ||
| val: The value to resolve. | ||
| *env_vars: The environment variables to check. Order matters: the first value that is set and non-empty is returned. | ||
| default: The default value to return if no environment variables are set. | ||
|
|
||
| Returns: | ||
| The resolved environment variable. | ||
|
|
||
| Examples: | ||
| >>> resolve_env_var( | ||
| ... NOT_GIVEN, | ||
| ... "ABC_URL", | ||
| ... default="https://agent-gateway.livekit.cloud/v1", | ||
| ... ) | ||
| 'https://agent-gateway.livekit.cloud/v1' | ||
| """ | ||
| if is_given(val): | ||
| return val | ||
| for env_var in env_vars: | ||
| curr_val = os.getenv(env_var, None) | ||
| if curr_val is not None and curr_val != "": | ||
| return curr_val | ||
| return default |
Do we need that?
Isn't it just
curr_val = os.getenv(env_var, None) or "default"?
This is really just to handle LIVEKIT_INFERENCE_* before LIVEKIT_*
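To illustrate the precedence the reply refers to, here is a self-contained sketch of the same lookup. The sentinel class and the `EXAMPLE_*` variable names and URLs are hypothetical stand-ins; only the resolution order mirrors the function under review.

```python
import os


class _NotGiven:
    """Stand-in for the framework's NOT_GIVEN sentinel (hypothetical here)."""


NOT_GIVEN = _NotGiven()


def is_given(val: object) -> bool:
    return not isinstance(val, _NotGiven)


def resolve_env_var(val, *env_vars: str, default: str = "") -> str:
    # An explicit argument always wins.
    if is_given(val):
        return val
    # Otherwise take the first variable that is set and non-empty.
    for env_var in env_vars:
        curr_val = os.getenv(env_var)
        if curr_val:
            return curr_val
    return default


NAMES = ("EXAMPLE_INFERENCE_URL", "EXAMPLE_URL")  # specific name first
FALLBACK = "https://fallback.example"

os.environ.pop("EXAMPLE_INFERENCE_URL", None)
os.environ["EXAMPLE_URL"] = "https://general.example"
general = resolve_env_var(NOT_GIVEN, *NAMES, default=FALLBACK)

os.environ["EXAMPLE_INFERENCE_URL"] = "https://inference.example"
specific = resolve_env_var(NOT_GIVEN, *NAMES, default=FALLBACK)

explicit = resolve_env_var("https://explicit.example", *NAMES, default=FALLBACK)
```

A single `os.getenv(name) or default` covers one variable name; the loop is what lets a more specific name override a general one before the default applies.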
| @runtime_checkable | ||
| class _StreamingTurnDetectorStream(_TurnDetector, Protocol): | ||
| # allow None chat_ctx for the streaming model | ||
| async def predict_end_of_turn( | ||
| self, chat_ctx: ChatContext | None = None, *, timeout: float | None = None | ||
| ) -> float: ... | ||
|
|
||
| @property | ||
| def is_active(self) -> bool: ... | ||
| @property | ||
| def is_inference_running(self) -> bool: ... | ||
| @property | ||
| def preemptive_request_id(self) -> str | None: ... | ||
| @property | ||
| def last_prediction(self) -> TurnDetectionEvent | None: ... | ||
|
|
||
| def update_language(self, language: LanguageCode | None) -> None: ... | ||
|
|
||
| def warmup(self) -> asyncio.Future[float]: ... | ||
| def activate(self, trigger: str | None = None) -> None: ... | ||
| def deactivate(self, trigger: str | None = None) -> None: ... | ||
| def flush(self, reason: str | None = None) -> None: ... | ||
| def push_audio(self, frame: rtc.AudioFrame) -> None: ... | ||
| def end_input(self) -> None: ... | ||
| async def aclose(self) -> None: ... | ||
|
|
||
|
|
||
| @runtime_checkable | ||
| class _StreamingTurnDetector(Protocol): | ||
| """Turn detector that hands out a per-session stream instead of resolving | ||
| inline. Per-language threshold lookups (``unlikely_threshold`` / | ||
| ``supports_language``) live on the stream, not the detector — after a | ||
| cloud→local fallback they need to reflect the active backend's rescaled | ||
| view, which is per-session state.""" | ||
|
|
||
| @property | ||
| def model(self) -> str: ... | ||
| @property | ||
| def provider(self) -> str: ... | ||
|
|
||
| def stream( | ||
| self, | ||
| *, | ||
| conn_options: APIConnectOptions = DEFAULT_API_CONNECT_OPTIONS, | ||
| ) -> _StreamingTurnDetectorStream: ... |
Fwiw, it's very unlikely we will support third party turn detectors, we could simplify the abstraction by not having any protocol or abc class
I've simplified this a lot in the new commit: d74df8d (this PR)
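For context on the trade-off, a short sketch of what a `@runtime_checkable` protocol does and does not verify. All class names here are hypothetical; the behavior shown is standard `typing.Protocol` semantics, where `isinstance` checks only that the members exist and ignores their signatures.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsStream(Protocol):
    @property
    def model(self) -> str: ...

    def stream(self) -> object: ...


class Detector:
    model = "turn-detector-v1"

    def stream(self) -> object:
        return object()


class WrongSignature:
    model = "x"

    # Extra required arguments are not detected by isinstance().
    def stream(self, a: int, b: int) -> object:
        return object()


class MissingStream:
    model = "x"


conforms = isinstance(Detector(), SupportsStream)
wrong_sig_passes = isinstance(WrongSignature(), SupportsStream)
missing_fails = isinstance(MissingStream(), SupportsStream)
```

With only first-party implementations, the runtime check adds little beyond what a concrete class and a static type checker already give, which may be why dropping the protocol layer was suggested.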
| speaking=self._speaking | ||
| if self._vad or self._turn_detection_mode == "stt" | ||
| if (self._vad is not None and not self._using_default_vad) | ||
| or self._turn_detection_mode == "stt" | ||
| else None, |
🚩 using_default_vad changes speaking-state reporting semantics for hooks
When _using_default_vad is True, on_final_transcript, on_interim_transcript, and on_preflight_transcript now pass speaking=None instead of the VAD-derived boolean (audio_recognition.py:1008-1011, audio_recognition.py:1062-1064, audio_recognition.py:1107-1109). This is intentional — the default VAD feeds only the audio turn detector, not the pipeline's speaking-state API. Additionally, _last_speaking_time is always overwritten by STT timestamps (audio_recognition.py:1029-1031) when _using_default_vad is True, even though VAD events continue updating it. This preserves pre-PR behavior for users who never explicitly provided a VAD, but is a semantic change for anyone who relied on the implicit VAD (now auto-provisioned) providing speaking state through hooks. Downstream consumers (e.g., custom on_final_transcript handlers) that checked speaking is not None will see a behavior difference.
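The condition in the diff can be restated as a small pure function to make the three outcomes explicit. The function, parameter, and handler names below are hypothetical; only the boolean logic is taken from the changed lines.

```python
from typing import Optional


def reported_speaking(
    *,
    speaking: bool,
    has_vad: bool,
    using_default_vad: bool,
    turn_detection_mode: Optional[str],
) -> Optional[bool]:
    """Value passed to transcript hooks under the new condition."""
    if (has_vad and not using_default_vad) or turn_detection_mode == "stt":
        return speaking
    # An auto-provisioned VAD no longer exposes speaking state to hooks.
    return None


def describe(speaking: Optional[bool]) -> str:
    """Example hook-side handling that treats None as 'unknown'."""
    if speaking is None:
        return "unknown"
    return "speaking" if speaking else "silent"
```

A handler written like `describe` keeps working in both configurations, whereas one that assumes a boolean would treat `None` as "not speaking".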
Nice work! 🚀
…ff (#6049) Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Long Chen <longch1024@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds streaming audio end-of-turn detection. A single user-facing `AudioTurnDetector` selects between two backends: `turn-detector` and `turn-detector-mini`.
On cloud transport error or `predict_end_of_turn` timeout, the session swaps to mini/local for the rest of the stream (sticky per session, one warning per failure mode).
Local failures emit the default `1.0` prediction and retry on the next turn.
A user-set `unlikely_threshold` is scaled multiplicatively against the cloud default so the operating point survives a fallback.
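One way to read the fallback rules above, as a rough sketch. The scaling formula (the user's ratio to the cloud default, applied to the local default), the clamp to 1.0, and every class and function name are assumptions for illustration; the PR description does not spell out the exact arithmetic.

```python
from typing import Callable

DEFAULT_PREDICTION = 1.0  # value emitted when the local model fails


def rescale_threshold(user_threshold: float, cloud_default: float, local_default: float) -> float:
    """Carry the user's relative operating point over to the local backend."""
    ratio = user_threshold / cloud_default
    return min(1.0, ratio * local_default)


class FallbackState:
    """Sticky per-session backend choice with one warning per failure mode."""

    def __init__(self) -> None:
        self.backend = "cloud"
        self.warnings = []
        self._warned = set()

    def on_cloud_failure(self, mode: str) -> None:
        self.backend = "local"  # never switches back within the session
        if mode not in self._warned:
            self._warned.add(mode)
            self.warnings.append(f"cloud turn detection failed ({mode}); using local model")


def local_prediction(run_model: Callable[[], float]) -> float:
    """Return the model output, or the default prediction if it raises."""
    try:
        return run_model()
    except Exception:
        return DEFAULT_PREDICTION
```

For instance, with a hypothetical cloud default of 0.5 and local default of 0.25, a user threshold of 0.6 would rescale to roughly 0.3, keeping the threshold 20% above the backend's default in both cases.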