
AI Voice Conversation App

Real-time AI voice conversation application using OpenAI Whisper, GPT-4o-mini, and TTS with LiveKit for WebRTC audio streaming.

Features

  • 🎤 Voice Activity Detection - Automatically detects when you start and stop speaking
  • 🗣️ Real-time Transcription - Converts speech to text using OpenAI Whisper
  • 🤖 AI Responses - Generates intelligent responses using GPT-4o-mini
  • 🔊 Text-to-Speech - Plays AI responses with high-quality voice synthesis
  • ⏱️ 2-Second Silence Detection - Automatically sends audio after you finish speaking
  • 🔄 Sequential Flow - Prevents overlapping conversations for natural interaction
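The 2-second silence rule can be sketched as a small RMS-based detector. This is an illustration only; the real frontend does this in the browser with Web Audio, and the threshold, frame size, and class name here are assumptions, not taken from the app's source.

```python
# Illustrative sketch of 2-second silence detection over PCM frames.
# SILENCE_RMS and frame_seconds are assumed values, not the app's settings.
import math

SILENCE_RMS = 500        # amplitude threshold for 16-bit PCM (assumed)
SILENCE_SECONDS = 2.0    # stop recording after this much trailing silence


def rms(frame: list[int]) -> float:
    """Root-mean-square amplitude of one PCM frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class SilenceDetector:
    """Accumulates trailing silence and signals when the utterance is over."""

    def __init__(self, frame_seconds: float = 0.02):
        self.frame_seconds = frame_seconds
        self.silent_for = 0.0

    def feed(self, frame: list[int]) -> bool:
        """Feed one frame; return True once 2 s of silence has accumulated."""
        if rms(frame) < SILENCE_RMS:
            self.silent_for += self.frame_seconds
        else:
            self.silent_for = 0.0  # speech resets the silence timer
        return self.silent_for >= SILENCE_SECONDS
```

When `feed` returns True, the frontend would stop recording and ship the buffered audio to the backend.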

Tech Stack

Backend:

  • FastAPI
  • OpenAI API (Whisper, GPT-4o-mini, TTS-1-HD)
  • LiveKit
  • Python 3.11+

Frontend:

  • Next.js 15
  • TypeScript
  • LiveKit Client
  • Tailwind CSS

Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • OpenAI API key
  • LiveKit credentials (optional, for production)

1. Backend Setup

cd backend
python -m venv venv
venv\Scripts\activate       # Windows
source venv/bin/activate    # macOS/Linux
pip install -r requirements.txt

Create backend/.env:

OPENAI_API_KEY=sk-your-key-here
LIVEKIT_API_KEY=your-key
LIVEKIT_API_SECRET=your-secret
LIVEKIT_URL=ws://localhost:7880
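A minimal sketch of how the backend might read these variables, using only the standard library. The variable names match the `.env` above; the `Settings` class and `load_settings` helper are assumptions for illustration, not the app's actual code.

```python
# Sketch: read the .env-provided variables from the process environment,
# failing fast when the one required key is missing.
import os
from dataclasses import dataclass


@dataclass
class Settings:
    openai_api_key: str
    livekit_api_key: str = ""
    livekit_api_secret: str = ""
    livekit_url: str = "ws://localhost:7880"


def load_settings(env=os.environ) -> Settings:
    """Build Settings from environment variables; OPENAI_API_KEY is required."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=key,
        livekit_api_key=env.get("LIVEKIT_API_KEY", ""),
        livekit_api_secret=env.get("LIVEKIT_API_SECRET", ""),
        livekit_url=env.get("LIVEKIT_URL", "ws://localhost:7880"),
    )
```

The LiveKit values default to empty strings, matching the prerequisite note that LiveKit credentials are optional outside production.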

Run backend:

uvicorn main:app --reload --port 8000

2. Frontend Setup

cd frontend
npm install

Create frontend/.env.local:

NEXT_PUBLIC_BACKEND_URL=http://localhost:8000

Run frontend:

npm run dev

Open http://localhost:3000

Documentation

📖 Backend Documentation - API endpoints, configuration, dependencies

📖 Frontend Documentation - Components, hooks, architecture

📖 AI Prompts - AI tools and prompts used during development

Project Structure

skill.io/
├── backend/
│   ├── main.py              # FastAPI app with endpoints
│   ├── ai_handler.py        # OpenAI integrations
│   ├── requirements.txt     # Python dependencies
│   └── README.md            # Backend documentation
├── frontend/
│   ├── app/                 # Next.js app directory
│   ├── components/          # React components
│   ├── hooks/               # Custom hooks
│   └── README.md            # Frontend documentation
├── README.md                # This file
└── AI_PROMPTS.md            # AI assistance documentation

How It Works

  1. User speaks → Frontend records audio continuously
  2. 2 seconds of silence → Recording stops, audio sent to backend
  3. Backend transcribes → OpenAI Whisper converts speech to text
  4. Text displayed → User sees transcription immediately
  5. AI generates response → GPT-4o-mini creates reply
  6. Response played → TTS converts text to speech and plays audio
  7. Ready for next question → Cycle repeats
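The sequential flow above, which keeps responses from overlapping, can be sketched as a tiny state machine. The state names and `Conversation` class are assumptions for illustration; the app's actual implementation may differ.

```python
# Sketch: one question/answer cycle as a state machine. Only the LISTENING
# state accepts new audio, which is what prevents overlapping conversations.
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()     # recording user audio
    TRANSCRIBING = auto()  # Whisper call in flight
    RESPONDING = auto()    # GPT-4o-mini + TTS in flight
    SPEAKING = auto()      # playing the TTS audio back


class Conversation:
    """Advances through one cycle and rejects input arriving mid-response."""

    ORDER = [State.LISTENING, State.TRANSCRIBING, State.RESPONDING, State.SPEAKING]

    def __init__(self):
        self.state = State.LISTENING

    def advance(self) -> State:
        """Move to the next stage, wrapping back to LISTENING after SPEAKING."""
        i = self.ORDER.index(self.state)
        self.state = self.ORDER[(i + 1) % len(self.ORDER)]
        return self.state

    def can_accept_audio(self) -> bool:
        return self.state is State.LISTENING
```

After the TTS audio finishes playing, `advance()` wraps back to LISTENING and the cycle repeats.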

API Endpoints

  • GET / - Health check
  • POST /token - Generate LiveKit access token
  • POST /transcribe - Transcribe audio to text
  • POST /respond - Generate AI response with audio
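Since JSON cannot carry raw bytes, a `/respond`-style endpoint typically base64-encodes the TTS audio alongside the reply text. The field names and helper functions below are assumptions sketching that shape, not the app's documented contract.

```python
# Sketch: bundle a GPT reply and its TTS audio into one JSON-safe payload,
# plus the frontend-side inverse for playback. Field names are assumed.
import base64


def build_respond_payload(reply_text: str, tts_audio: bytes) -> dict:
    """Pack the AI reply and synthesized audio into a JSON-serializable dict."""
    return {
        "text": reply_text,
        "audio_base64": base64.b64encode(tts_audio).decode("ascii"),
        "audio_format": "mp3",  # assumed container for the TTS output
    }


def decode_audio(payload: dict) -> bytes:
    """Recover the raw audio bytes from a payload for client-side playback."""
    return base64.b64decode(payload["audio_base64"])
```

The frontend would feed the decoded bytes to an audio element or Web Audio buffer while displaying `text` in the transcript.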

🎬 Demo

Live Application Demo:
🔗 Watch Full Demo - Complete walkthrough of the AI voice conversation app

Additional Demo:
🔗 Extended Demo - Additional features and functionality

What Can Be Improved

  • Improve voice activity detection accuracy with a better or custom VAD model
  • Stream partial transcriptions so the user sees live text while still speaking