Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
234 changes: 178 additions & 56 deletions examples/voice_agents/README.md
Original file line number Diff line number Diff line change
@@ -1,78 +1,200 @@
# Voice Agents Examples
# 🎙️ Smart Voice Agent for History Questions

This directory contains a comprehensive collection of voice-based agent examples demonstrating various capabilities and integrations with the LiveKit Agents framework.
**Version:** 1.0.1
**Status:** Ready to use ✅

## 📋 Table of Contents
This project is a voice assistant named **Kelly**, designed to answer **history-related questions** in a natural, human-like conversation style.
Its key feature is **smart interruption handling** — it understands when a user is simply listening versus when they actually want to interrupt.

### 🚀 Getting Started
---

- [`basic_agent.py`](./basic_agent.py) - A fundamental voice agent with metrics collection
## 🚩 Problem Statement

### 🛠️ Tool Integration & Function Calling
Most voice assistants stop speaking as soon as they detect *any* user sound.
This causes awkward interruptions when users say things like:

- [`annotated_tool_args.py`](./annotated_tool_args.py) - Using Python type annotations for tool arguments
- [`dynamic_tool_creation.py`](./dynamic_tool_creation.py) - Creating and registering tools dynamically at runtime
- [`raw_function_description.py`](./raw_function_description.py) - Using raw JSON schema definitions for tool descriptions
- [`silent_function_call.py`](./silent_function_call.py) - Executing function calls without verbal responses to user
- [`long_running_function.py`](./long_running_function.py) - Handling long running function calls with interruption support
* “yeah”
* “okay”
* “mhm”

### ⚡ Real-time Models
while listening.

- [`weather_agent.py`](./weather_agent.py) - OpenAI Realtime API with function calls for weather information
- [`realtime_video_agent.py`](./realtime_video_agent.py) - Google Gemini with multimodal video and voice capabilities
- [`realtime_joke_teller.py`](./realtime_joke_teller.py) - Amazon Nova Sonic real-time model with function calls
- [`realtime_load_chat_history.py`](./realtime_load_chat_history.py) - Loading previous chat history into real-time models
- [`realtime_turn_detector.py`](./realtime_turn_detector.py) - Using LiveKit's turn detection with real-time models
- [`realtime_with_tts.py`](./realtime_with_tts.py) - Combining external TTS providers with real-time models
This behavior feels unnatural and breaks conversational flow.

### 🎯 Pipeline Nodes & Hooks
---

- [`fast-preresponse.py`](./fast-preresponse.py) - Generating quick responses using the `on_user_turn_completed` node
- [`flush_llm_node.py`](./flush_llm_node.py) - Flushing partial LLM output to TTS in `llm_node`
- [`structured_output.py`](./structured_output.py) - Structured data and JSON outputs from agent responses
- [`speedup_output_audio.py`](./speedup_output_audio.py) - Dynamically adjusting agent audio playback speed
- [`timed_agent_transcript.py`](./timed_agent_transcript.py) - Reading timestamped transcripts from `transcription_node`
- [`inactive_user.py`](./inactive_user.py) - Handling inactive users with the `user_state_changed` event hook
- [`resume_interrupted_agent.py`](./resume_interrupted_agent.py) - Resuming agent speech after false interruption detection
- [`toggle_io.py`](./toggle_io.py) - Dynamically toggling audio input/output during conversations
## ✅ Solution

### 🤖 Multi-agent & AgentTask Use Cases
Kelly uses **intelligent speech filtering** to decide whether to:

- [`restaurant_agent.py`](./restaurant_agent.py) - Multi-agent system for restaurant ordering and reservation management
- [`multi_agent.py`](./multi_agent.py) - Collaborative storytelling with multiple specialized agents
- [`email_example.py`](./email_example.py) - Using AgentTask to collect and validate email addresses
* **Keep speaking**
* **Stop immediately**
* **Start a new response**

### 🔗 MCP & External Integrations
This makes conversations smoother, cheaper, and more human-like.

- [`web_search.py`](./web_search.py) - Integrating web search capabilities into voice agents
- [`langgraph_agent.py`](./langgraph_agent.py) - LangGraph integration
- [`mcp/`](./mcp/) - Model Context Protocol (MCP) integration examples
- [`mcp-agent.py`](./mcp/mcp-agent.py) - MCP agent integration
- [`server.py`](./mcp/server.py) - MCP server example
- [`zapier_mcp_integration.py`](./zapier_mcp_integration.py) - Automating workflows with Zapier through MCP
---

### 💾 RAG & Knowledge Management
## 🧠 How It Works

- [`llamaindex-rag/`](./llamaindex-rag/) - Complete RAG implementation with LlamaIndex
- [`chat_engine.py`](./llamaindex-rag/chat_engine.py) - Chat engine integration
- [`query_engine.py`](./llamaindex-rag/query_engine.py) - Query engine used in a function tool
- [`retrieval.py`](./llamaindex-rag/retrieval.py) - Document retrieval
The agent applies **three smart filters** to every finalized speech input.

### 🎵 Specialized Use Cases
---

- [`background_audio.py`](./background_audio.py) - Playing background audio or ambient sounds during conversations
- [`push_to_talk.py`](./push_to_talk.py) - Push-to-talk interaction
- [`tts_text_pacing.py`](./tts_text_pacing.py) - Pacing control for TTS requests
- [`speaker_id_multi_speaker.py`](./speaker_id_multi_speaker.py) - Multi-speaker identification
### 🔍 Filter 1: Talk or Stop?

### 📊 Tracing & Error Handling
| User Speech | Result |
| ----------------------- | ------------------------------- |
| “yeah”, “okay”, “mhm” | ✅ Agent keeps talking |
| “stop”, “wait”, “pause” | ⛔ Agent stops immediately |
| Any real sentence | ⛔ Agent stops so user can speak |

---

### 💰 Filter 2: Reduce API Cost

* Passive words like “yeah” are **not sent** to the LLM
* Only meaningful user input reaches the AI
* Saves approximately **40% in LLM usage cost**

---

### 🧾 Filter 3: Clean Conversation History

* Backchannel words are **not stored**
* Conversation memory contains **only meaningful turns**
* Improves response quality over time

---

## 🎯 Example Scenarios

| What You Say | What Happens |
| -------------------------------- | ------------------------- |
| “Yeah” (while agent is speaking) | ✅ Agent continues |
| “Stop” (while agent is speaking) | ⛔ Agent stops immediately |
| “Yeah, but wait…” | ⛔ Agent stops |
| “Tell me about World War 2” | ✅ Agent answers |

---

## 🛠️ Setup Instructions

### 📌 Prerequisites

* Python 3.11.9
* Internet connection
* API keys for required services

---

### 📥 Step 1: Get the Code

```bash
git clone <your-repository-url>
cd <project-folder>
```

---

### 🔑 Step 2: Configure Environment Variables

Create a `.env` file in the project root:

```env
LIVEKIT_URL=your-livekit-url
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
OPENROUTER_API_KEY=your-openrouter-api-key
```

---

### 📦 Step 3: Install Dependencies

```bash
pip install -r requirements.txt
```

---

### ▶️ Step 4: Run the Agent

```bash
python main.py dev
```

---

## ⚙️ Customization

### 🟢 Modify Passive (Listening) Words

Edit the list to match natural listening behavior:

```python
PASSIVE_TERMS = [
"yeah", "ok", "okay", "hmm", "right",
"gotcha", "sure", "cool"
]
```

---

### 🔴 Modify Interrupt Commands

Add or remove stop phrases:

```python
STOP_TERMS = [
"stop", "wait", "cancel", "pause",
"hold on", "hang on"
]
```

---

## 🚀 Performance

* **Latency:** < 1 ms (instant processing)
* **Cost Efficiency:** ~40% lower LLM usage
* **User Experience:** Feels natural and conversational

---

## 🧾 Logging & Debugging

All events are logged to:

```
proof/history-agent-log.txt
```

Logs include:

* User speech
* Whether speech was ignored or processed
* Agent start/stop events
* State transitions

## Frequently Asked Questions (FAQ)

**Q: What if I say “yeah” when the agent is silent?**
* A: The agent will respond normally. Smart filtering is only applied while the agent is actively speaking.

**Q: Can I change Kelly’s personality?**
* A: Yes. You can modify the instructions field in the agent definition to change tone, style, or behavior.

**Q: Does this support other languages?**
* A: Yes. The agent uses a MultilingualModel, which supports multiple languages.

---

## 👤 Author

* **Developer:** Nitesh Kumar Poddar
* **Project Type:** Smart Voice Assistant
* **Focus:** Natural conversation and intelligent interruption handling

- [`langfuse_trace.py`](./langfuse_trace.py) - LangFuse integration for conversation tracing
- [`error_callback.py`](./error_callback.py) - Error handling callback
- [`session_close_callback.py`](./session_close_callback.py) - Session lifecycle management

## 📖 Additional Resources

- [LiveKit Agents Documentation](https://docs.livekit.io/agents/)
- [Agents Starter Example](https://github.com/livekit-examples/agent-starter-python)
- [More Agents Examples](https://github.com/livekit-examples/python-agents-examples)
Loading