diff --git a/README.md b/README.md
index 2a09aac241..059dc65bdb 100644
--- a/README.md
+++ b/README.md
@@ -1,375 +1,275 @@
-
+# Context-Aware Interruption Logic for LiveKit Voice Agents
-
-
+## Overview
-
-[](https://pepy.tech/projects/livekit-agents)
-[](https://livekit.io/join-slack)
-[](https://twitter.com/livekit)
-[](https://deepwiki.com/livekit/agents)
-[](https://github.com/livekit/livekit/blob/master/LICENSE)
+LiveKit's Voice Activity Detection (VAD) is highly sensitive, and by default the agent interrupts itself as soon as VAD detects user audio. This causes incorrect behavior when users provide passive acknowledgements such as "yeah", "ok", "aha", or "hmm" (commonly known as backchanneling): instead of recognizing these as signs of engagement, the agent abruptly stops speaking, breaking the conversational flow.
-
+**The Solution**: A context-aware decision system that distinguishes between:
+1. **Passive acknowledgements** (soft inputs) - should not interrupt the agent while it is speaking
+2. **Active interruptions** (hard commands) - should always interrupt
-Looking for the JS/TS library? Check out [AgentsJS](https://github.com/livekit/agents-js)
+## How It Works
-## What is Agents?
+The interruption logic operates **above the VAD layer** and validates the final Speech-to-Text (STT) transcript before committing to an interruption. Because VAD may fire before STT completes (a "false start"), the system:
+
+- Waits for transcript validation before pausing the agent
+- Ignores passive acknowledgements while the agent is speaking
+- Always honors interrupt commands once they appear in the transcript
-The Agent Framework is designed for building realtime, programmable participants
-that run on servers. Use it to create conversational, multi-modal voice
-agents that can see, hear, and understand.
+### Decision Flow
-
-
-## Features
-
-- **Flexible integrations**: A comprehensive ecosystem to mix and match the right STT, LLM, TTS, and Realtime API to suit your use case.
-- **Integrated job scheduling**: Built-in task scheduling and distribution with [dispatch APIs](https://docs.livekit.io/agents/build/dispatch/) to connect end users to agents.
-- **Extensive WebRTC clients**: Build client applications using LiveKit's open-source SDK ecosystem, supporting all major platforms.
-- **Telephony integration**: Works seamlessly with LiveKit's [telephony stack](https://docs.livekit.io/sip/), allowing your agent to make calls to or receive calls from phones.
-- **Exchange data with clients**: Use [RPCs](https://docs.livekit.io/home/client/data/rpc/) and other [Data APIs](https://docs.livekit.io/home/client/data/) to seamlessly exchange data with clients.
-- **Semantic turn detection**: Uses a transformer model to detect when a user is done with their turn, helps to reduce interruptions.
-- **MCP support**: Native support for MCP. Integrate tools provided by MCP servers with one loc.
-- **Builtin test framework**: Write tests and use judges to ensure your agent is performing as expected.
-- **Open-source**: Fully open-source, allowing you to run the entire stack on your own servers, including [LiveKit server](https://github.com/livekit/livekit), one of the most widely used WebRTC media servers.
-
-## Installation
-
-To install the core Agents library, along with plugins for popular model providers:
-
-```bash
-pip install "livekit-agents[openai,silero,deepgram,cartesia,turn-detector]~=1.0"
+```
+User Audio Detected (VAD fires)
+    ↓
+Wait for STT Transcript (if agent is speaking)
+    ↓
+Classify Interruption:
+  ├─ Contains interrupt command? → INTERRUPT
+  ├─ Agent SPEAKING + only passive acknowledgements? → IGNORE
+  ├─ Agent SPEAKING + any other input? → INTERRUPT
+  ├─ Agent SILENT + passive acknowledgement? → RESPOND
+  └─ Agent SILENT + normal input? → RESPOND
```
-## Docs and guides
-
-Documentation on the framework and how to use it can be found [here](https://docs.livekit.io/agents/)
-
-## Core concepts
-
-- Agent: An LLM-based application with defined instructions.
-- AgentSession: A container for agents that manages interactions with end users.
-- entrypoint: The starting point for an interactive session, similar to a request handler in a web server.
-- Worker: The main process that coordinates job scheduling and launches agents for user sessions.
-
-## Usage
-
-### Simple voice agent
-
----
+## Decision Rules
+
+The system outputs exactly **ONE** of three decisions:
+
+### 1. `INTERRUPT` - Stop the agent immediately
+- **Trigger**: The transcript contains ANY interrupt command, even if mixed with passive acknowledgements, or the agent is SPEAKING and the transcript contains a word that is not a passive acknowledgement
+- **Examples**:
+  - "stop"
+  - "wait"
+  - "yeah wait a second" → INTERRUPT (contains "wait")
+  - "ok no stop" → INTERRUPT (contains "no" and "stop")
+
+### 2. `IGNORE` - Continue speaking without interruption
+- **Trigger**: Agent is SPEAKING AND transcript contains ONLY passive acknowledgement words
+- **Examples**:
+  - Agent speaking: "yeah" → IGNORE
+  - Agent speaking: "ok" → IGNORE
+  - Agent speaking: "yeah ok" → IGNORE
+  - Agent speaking: "yeah but" → INTERRUPT, not IGNORE (contains a non-passive word)
+
+### 3. `RESPOND` - Process as new user turn
+- **Trigger**: Agent is SILENT AND the transcript contains any input other than an interrupt command (passive acknowledgements included)
+- **Examples**:
+  - Agent silent: "yeah" → RESPOND
+  - Agent silent: "hello" → RESPOND
+  - Agent silent: "tell me a story" → RESPOND
+
+## Passive Acknowledgements vs Interrupt Commands
+
+### Passive Acknowledgements (Soft Inputs)
+These words indicate engagement but should not interrupt when the agent is speaking:
+- `yeah`
+- `ok`
+- `okay`
+- `hmm`
+- `aha`
+- `uh-huh`
+- `uh huh`
+- `right`
+
+### Interrupt Commands (Hard Inputs)
+These words always trigger an interruption, regardless of context:
+- `stop`
+- `wait`
+- `no`
+- `cancel`
+- `hold on`
+- `hold`
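+
+Two entries ("uh huh", "hold on") are multi-word phrases, and short entries such as "no" must not fire inside longer words like "know". A minimal sketch of matching these lists under those constraints, assuming regex word boundaries on a lowercased transcript; `find_phrases` is a hypothetical helper, not part of LiveKit or of `interruption_logic.py`:
+
+```python
+import re
+
+# Word lists copied from this README.
+PASSIVE_ACKNOWLEDGEMENTS = ["yeah", "ok", "okay", "hmm", "aha", "uh-huh", "uh huh", "right"]
+INTERRUPT_COMMANDS = ["stop", "wait", "no", "cancel", "hold on", "hold"]
+
+
+def find_phrases(transcript: str, phrases: list[str]) -> list[str]:
+    """Return the entries of `phrases` found in `transcript` as whole words or phrases.
+
+    Hypothetical helper (not a LiveKit API); matching is case-insensitive.
+    """
+    text = transcript.lower()
+    found = []
+    for phrase in phrases:
+        # \b stops "no" from matching inside "know" or "note", while
+        # punctuation such as "wait!" still counts as a word boundary.
+        if re.search(rf"\b{re.escape(phrase)}\b", text):
+            found.append(phrase)
+    return found
+```
+
+For example, `find_phrases("Yeah, wait a second", INTERRUPT_COMMANDS)` returns `["wait"]`, while "I know the answer" matches nothing. "Hold on!" matches both "hold on" and "hold"; for the interrupt check only whether the result is non-empty matters.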
+
+## Implementation Details
+
+### Core Function
+
+The logic is implemented in `livekit-agents/livekit/agents/voice/interruption_logic.py`:
```python
-from livekit.agents import (
- Agent,
- AgentSession,
- JobContext,
- RunContext,
- WorkerOptions,
- cli,
- function_tool,
-)
-from livekit.plugins import deepgram, elevenlabs, openai, silero
-
-@function_tool
-async def lookup_weather(
- context: RunContext,
- location: str,
-):
- """Used to look up weather information."""
-
- return {"weather": "sunny", "temperature": 70}
-
-
-async def entrypoint(ctx: JobContext):
- await ctx.connect()
-
- agent = Agent(
- instructions="You are a friendly voice assistant built by LiveKit.",
- tools=[lookup_weather],
- )
- session = AgentSession(
- vad=silero.VAD.load(),
- # any combination of STT, LLM, TTS, or realtime API can be used
- stt=deepgram.STT(model="nova-3"),
- llm=openai.LLM(model="gpt-4o-mini"),
- tts=elevenlabs.TTS(),
- )
-
- await session.start(agent=agent, room=ctx.room)
- await session.generate_reply(instructions="greet the user and ask about their day")
-
-
-if __name__ == "__main__":
- cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
+from typing import Literal
+
+
+def classify_interruption(
+    agent_state: Literal["SPEAKING", "SILENT"],
+    user_transcript: str,
+) -> InterruptionDecision:  # InterruptionDecision: definition not shown in this excerpt
+    """Classify user input as INTERRUPT, IGNORE, or RESPOND."""
+    # Rule 1: Check for interrupt commands (highest priority)
+    # Rule 2: If speaking + passive acknowledgement → IGNORE
+    #         (speaking + any other input → INTERRUPT)
+    # Rule 3: If silent + passive acknowledgement → RESPOND
+    # Rule 4: If silent + normal input → RESPOND
+    ...  # implementation elided
```
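+
+The excerpt above elides the function body. One way to fill in the rules is the self-contained sketch below; it is an illustration under stated assumptions, not the code in `interruption_logic.py`. It assumes a string-valued `InterruptionDecision` enum (a hypothetical stand-in), strips punctuation and treats hyphens as spaces before matching, and treats an empty transcript as passive so that it never interrupts, a case the rules above do not cover.
+
+```python
+import re
+from enum import Enum
+from typing import Literal
+
+# Hypothetical sketch. Word lists copied from this README.
+PASSIVE_ACKNOWLEDGEMENTS = ["yeah", "ok", "okay", "hmm", "aha", "uh-huh", "uh huh", "right"]
+INTERRUPT_COMMANDS = ["stop", "wait", "no", "cancel", "hold on", "hold"]
+
+
+class InterruptionDecision(str, Enum):
+    """Hypothetical stand-in for the module's decision type."""
+    INTERRUPT = "INTERRUPT"
+    IGNORE = "IGNORE"
+    RESPOND = "RESPOND"
+
+
+def normalize(transcript: str) -> str:
+    """Lowercase, treat hyphens as spaces, drop other punctuation, collapse whitespace."""
+    text = transcript.lower().replace("-", " ")
+    text = re.sub(r"[^\w\s']", " ", text)
+    return " ".join(text.split())
+
+
+def _pattern(phrase: str) -> str:
+    # Whole-word / whole-phrase regex for a normalized list entry.
+    return rf"\b{re.escape(normalize(phrase))}\b"
+
+
+def classify_interruption(
+    agent_state: Literal["SPEAKING", "SILENT"],
+    user_transcript: str,
+) -> InterruptionDecision:
+    text = normalize(user_transcript)
+
+    # Rule 1: any interrupt command wins, even when mixed with acknowledgements.
+    if any(re.search(_pattern(cmd), text) for cmd in INTERRUPT_COMMANDS):
+        return InterruptionDecision.INTERRUPT
+
+    if agent_state == "SPEAKING":
+        # Rule 2: remove every passive acknowledgement; if nothing is left
+        # (an empty transcript also lands here), keep speaking.
+        remainder = text
+        for ack in PASSIVE_ACKNOWLEDGEMENTS:
+            remainder = re.sub(_pattern(ack), " ", remainder)
+        if not remainder.strip():
+            return InterruptionDecision.IGNORE
+        # Any other word while speaking (e.g. "yeah but") interrupts.
+        return InterruptionDecision.INTERRUPT
+
+    # Rules 3 and 4: while silent, everything else starts a new user turn.
+    return InterruptionDecision.RESPOND
+```
+
+Checking interrupt commands before looking at `agent_state` is what lets "yeah wait a second" interrupt even though it starts with an acknowledgement.
+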
-You'll need the following environment variables for this example:
-
-- DEEPGRAM_API_KEY
-- OPENAI_API_KEY
-- ELEVEN_API_KEY
+### Integration Points
+
+The logic is integrated into the agent's audio activity handling (`agent_activity.py`) at four key points:
+
+1. **`_interrupt_by_audio_activity()`** - Prevents premature pauses when VAD fires before a transcript is available
+2. **`on_interim_transcript()`** - Checks interim transcripts and resumes speech if a false interruption is detected
+3. **`on_final_transcript()`** - Validates final transcripts and prevents interruption for passive acknowledgements
+4. **`on_end_of_turn()`** - Prevents passive acknowledgements from being processed as new user turns
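+
+These hooks live inside the framework, so their signatures are not reproduced here. To illustrate how they cooperate, the sketch below models the same transcript-first flow without LiveKit. `InterruptionGate`, its methods, and the `pause_speech`/`resume_speech`/`stop_speech` callbacks are hypothetical names; the classifier is injected and may be any function returning "INTERRUPT", "IGNORE", or "RESPOND".
+
+```python
+from typing import Callable
+
+# (agent_state, transcript) -> "INTERRUPT" | "IGNORE" | "RESPOND"
+Classifier = Callable[[str, str], str]
+
+
+class InterruptionGate:
+    """Hypothetical, framework-free model of the hooks above; not agent_activity.py."""
+
+    def __init__(
+        self,
+        classify: Classifier,
+        pause_speech: Callable[[], None],
+        resume_speech: Callable[[], None],
+        stop_speech: Callable[[], None],
+    ) -> None:
+        self._classify = classify
+        self._pause_speech = pause_speech
+        self._resume_speech = resume_speech
+        self._stop_speech = stop_speech
+        self.agent_speaking = False
+        self.paused = False
+
+    def on_vad_speech_start(self, pause_early: bool = False) -> None:
+        # Compare _interrupt_by_audio_activity(): VAD alone never stops the agent.
+        # With pause_early=True the agent pauses now and a transcript decides later.
+        if self.agent_speaking and pause_early and not self.paused:
+            self.paused = True
+            self._pause_speech()
+
+    def on_transcript(self, text: str) -> None:
+        # Compare on_interim_transcript() / on_final_transcript().
+        if not self.agent_speaking:
+            return
+        decision = self._classify("SPEAKING", text)
+        if decision == "INTERRUPT":
+            self.agent_speaking = False
+            self.paused = False
+            self._stop_speech()
+        elif decision == "IGNORE" and self.paused:
+            # False interruption: the early pause was unnecessary, so resume.
+            self.paused = False
+            self._resume_speech()
+
+    def on_end_of_turn(self, text: str) -> bool:
+        # Compare on_end_of_turn(): True if `text` should start a new user turn.
+        if self.agent_speaking and self._classify("SPEAKING", text) == "IGNORE":
+            return False
+        return True
+```
+
+Injecting the classifier keeps the state handling testable on its own, independent of which word lists or matching rules are used.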
+
+### Key Features
+
+- **Transcript-first approach**: When the agent is speaking, waits for a transcript before pausing
+- **False interruption recovery**: Automatically resumes if pause occurred before transcript validation
+- **Turn prevention**: Prevents passive acknowledgements from triggering new LLM responses
+- **Punctuation handling**: Normalizes transcripts to handle variations like "yeah!", "ok?", "hmm."
+
+## How to Run the Agent
+
+### Prerequisites
+
+1. **Python 3.12+** installed
+2. **Environment variables** set (for example, exported in your shell):
+   ```bash
+   # LiveKit credentials
+   export LIVEKIT_URL=wss://your-livekit-server.com
+   export LIVEKIT_API_KEY=your-api-key
+   export LIVEKIT_API_SECRET=your-api-secret
+
+   # Groq API (for LLM and STT)
+   export GROQ_API_KEY=your-groq-api-key
+
+   # Cartesia API (for TTS)
+   export CARTESIA_API_KEY=your-cartesia-api-key
+   ```
+
+3. **Install dependencies**:
+   ```bash
+   # Install workspace packages in editable mode
+   uv pip install -e livekit-agents
+   uv pip install -e livekit-plugins/livekit-plugins-groq
+   uv pip install -e livekit-plugins/livekit-plugins-silero
+   uv pip install -e livekit-plugins/livekit-plugins-turn-detector
+   uv pip install -e livekit-plugins/livekit-plugins-cartesia
+   ```
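+
+Before starting the agent, a short preflight check can fail fast when one of the variables listed above is missing. A minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper, and if you keep the variables in a `.env` file you would need to load it (for example with python-dotenv) before calling it:
+
+```python
+import os
+
+# Variable names taken from the Prerequisites list above.
+REQUIRED_ENV_VARS = (
+    "LIVEKIT_URL",
+    "LIVEKIT_API_KEY",
+    "LIVEKIT_API_SECRET",
+    "GROQ_API_KEY",
+    "CARTESIA_API_KEY",
+)
+
+
+def missing_env_vars(names: tuple[str, ...] = REQUIRED_ENV_VARS) -> list[str]:
+    """Return the names that are unset or empty in the current environment."""
+    return [name for name in names if not os.environ.get(name)]
+```
+
+Calling `missing_env_vars()` at startup and exiting when the list is non-empty turns a confusing connection error into a clear message.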
+
+### Running the Agent
+
+1. **Navigate to the examples directory**:
+   ```bash
+   cd examples/voice_agents
+   ```
+
+2. **Start the agent in development mode**:
+   ```bash
+   python basic_agent.py dev
+   ```
+
+   This will:
+   - Start the agent server with hot reloading
+   - Connect to your LiveKit server
+   - Enable the interruption logic automatically
+
+3. **Connect using LiveKit Playground**:
+   - Visit https://agents-playground.livekit.io/
+   - Enter your LiveKit credentials
+   - Start a conversation with the agent
+
+### Testing the Interruption Logic
+
+#### Test Case 1: Ignore "yeah" while agent is speaking
+1. Start a conversation with the agent
+2. Wait for the agent to start speaking
+3. While the agent is speaking, say "yeah" or "ok"
+4. **Expected**: Agent continues speaking without pause or interruption
+
+#### Test Case 2: Respond to "yeah" when agent is silent
+1. Start a conversation with the agent
+2. Wait for the agent to finish speaking (agent becomes silent)
+3. Say "yeah" or "ok"
+4. **Expected**: Agent processes this as a new user turn and responds
+
+#### Test Case 3: Stop for "stop" command
+1. Start a conversation with the agent
+2. While the agent is speaking, say "stop" or "wait"
+3. **Expected**: Agent immediately stops speaking
+
+#### Test Case 4: Mixed commands
+1. While the agent is speaking, say "yeah wait a second"
+2. **Expected**: Agent interrupts (because "wait" is an interrupt command)
+
+## Code Structure
-### Multi-agent handoff
-
----
-
-This code snippet is abbreviated. For the full example, see [multi_agent.py](examples/voice_agents/multi_agent.py)
-
-```python
-...
-class IntroAgent(Agent):
- def __init__(self) -> None:
- super().__init__(
- instructions=f"You are a story teller. Your goal is to gather a few pieces of information from the user to make the story personalized and engaging."
- "Ask the user for their name and where they are from"
- )
-
- async def on_enter(self):
- self.session.generate_reply(instructions="greet the user and gather information")
-
- @function_tool
- async def information_gathered(
- self,
- context: RunContext,
- name: str,
- location: str,
- ):
- """Called when the user has provided the information needed to make the story personalized and engaging.
-
- Args:
- name: The name of the user
- location: The location of the user
- """
-
- context.userdata.name = name
- context.userdata.location = location
-
- story_agent = StoryAgent(name, location)
- return story_agent, "Let's start the story!"
-
-
-class StoryAgent(Agent):
- def __init__(self, name: str, location: str) -> None:
- super().__init__(
- instructions=f"You are a storyteller. Use the user's information in order to make the story personalized."
- f"The user's name is {name}, from {location}"
- # override the default model, switching to Realtime API from standard LLMs
- llm=openai.realtime.RealtimeModel(voice="echo"),
- chat_ctx=chat_ctx,
- )
-
- async def on_enter(self):
- self.session.generate_reply()
-
-
-async def entrypoint(ctx: JobContext):
- await ctx.connect()
-
- userdata = StoryData()
- session = AgentSession[StoryData](
- vad=silero.VAD.load(),
- stt=deepgram.STT(model="nova-3"),
- llm=openai.LLM(model="gpt-4o-mini"),
- tts=openai.TTS(voice="echo"),
- userdata=userdata,
- )
-
- await session.start(
- agent=IntroAgent(),
- room=ctx.room,
- )
-...
+```
+livekit-agents/
+└── livekit/
+    └── agents/
+        └── voice/
+            ├── interruption_logic.py   # Core classification logic
+            └── agent_activity.py       # Integration with agent lifecycle
```
-### Testing
+## Technical Constraints
-Automated tests are essential for building reliable agents, especially with the non-deterministic behavior of LLMs. LiveKit Agents include native test integration to help you create dependable agents.
+- **Latency**: The classification itself adds no perceptible delay; while the agent is speaking, how quickly an interruption takes effect depends on how quickly STT returns a transcript
+- **VAD Independence**: The logic operates above the VAD layer; the VAD model itself is not modified
+- **Transcript Validation**: The system validates the final STT transcript before committing to an interruption
+- **False Start Handling**: Supports "false start" scenarios where VAD fires but the transcript resolves to a passive acknowledgement
-```python
-@pytest.mark.asyncio
-async def test_no_availability() -> None:
- llm = google.LLM()
- async AgentSession(llm=llm) as sess:
- await sess.start(MyAgent())
- result = await sess.run(
- user_input="Hello, I need to place an order."
- )
- result.expect.skip_next_event_if(type="message", role="assistant")
- result.expect.next_event().is_function_call(name="start_order")
- result.expect.next_event().is_function_call_output()
- await (
- result.expect.next_event()
- .is_message(role="assistant")
- .judge(llm, intent="assistant should be asking the user what they would like")
- )
+## Example Scenarios
+### Scenario 1: Natural Backchanneling
```
-
-## Examples
-
-
-🎙️ Starter Agent-A starter agent optimized for voice conversations. --Code - - |
-
-🔄 Multi-user push to talk-Responds to multiple users in the room via push-to-talk. --Code - - |
-
-🎵 Background audio-Background ambient and thinking audio to improve realism. --Code - - |
-
-🛠️ Dynamic tool creation-Creating function tools dynamically. --Code - - |
-
-☎️ Outbound caller-Agent that makes outbound phone calls --Code - - |
-
-📋 Structured output-Using structured output from LLM to guide TTS tone. --Code - - |
-
-🔌 MCP support-Use tools from MCP servers --Code - - |
-
-💬 Text-only agent-Skip voice altogether and use the same code for text-only integrations --Code - - |
-
-📝 Multi-user transcriber-Produce transcriptions from all users in the room --Code - - |
-
-🎥 Video avatars-Add an AI avatar with Tavus, Beyond Presence, and Bithuman --Code - - |
-
-🍽️ Restaurant ordering and reservations-Full example of an agent that handles calls for a restaurant. --Code - - |
-
-👁️ Gemini Live vision-Full example (including iOS app) of Gemini Live agent that can see. --Code - - |
-
| LiveKit Ecosystem | |
|---|---|
| LiveKit SDKs | Browser · iOS/macOS/visionOS · Android · Flutter · React Native · Rust · Node.js · Python · Unity · Unity (WebGL) · ESP32 |
| Server APIs | Node.js · Golang · Ruby · Java/Kotlin · Python · Rust · PHP (community) · .NET (community) |
| UI Components | React · Android Compose · SwiftUI · Flutter |
| Agents Frameworks | Python · Node.js · Playground |
| Services | LiveKit server · Egress · Ingress · SIP |
| Resources | Docs · Example apps · Cloud · Self-hosting · CLI |