Real-time AI voice conversation application using OpenAI Whisper, GPT-4o-mini, and TTS with LiveKit for WebRTC audio streaming.
- π€ Voice Activity Detection - Automatically detects when you start and stop speaking
- π£οΈ Real-time Transcription - Converts speech to text using OpenAI Whisper
- π€ AI Responses - Generates intelligent responses using GPT-4o-mini
- π Text-to-Speech - Plays AI responses with high-quality voice synthesis
- β±οΈ 2-Second Silence Detection - Automatically sends audio after you finish speaking
- π Sequential Flow - Prevents overlapping conversations for natural interaction
Backend:
- FastAPI
- OpenAI API (Whisper, GPT-4o-mini, TTS-1-HD)
- LiveKit
- Python 3.11+
Frontend:
- Next.js 15
- TypeScript
- LiveKit Client
- Tailwind CSS
- Python 3.11+
- Node.js 18+
- OpenAI API key
- LiveKit credentials (optional, for production)
cd backend
python -m venv venv
venv\Scripts\activate # Windows
pip install -r requirements.txtCreate backend/.env:
OPENAI_API_KEY=sk-your-key-here
LIVEKIT_API_KEY=your-key
LIVEKIT_API_SECRET=your-secret
LIVEKIT_URL=ws://localhost:7880
Run backend:
uvicorn main:app --reload --port 8000cd frontend
npm installCreate frontend/.env.local:
NEXT_PUBLIC_BACKEND_URL=http://localhost:8000
Run frontend:
npm run devπ Backend Documentation - API endpoints, configuration, dependencies
π Frontend Documentation - Components, hooks, architecture
π AI Prompts - AI tools and prompts used during development
skill.io/
βββ backend/
β βββ main.py # FastAPI app with endpoints
β βββ ai_handler.py # OpenAI integrations
β βββ requirements.txt # Python dependencies
β βββ README.md # Backend documentation
βββ frontend/
β βββ app/ # Next.js app directory
β βββ components/ # React components
β βββ hooks/ # Custom hooks
β βββ README.md # Frontend documentation
βββ README.md # This file
βββ AI_PROMPTS.md # AI assistance documentation
- User speaks β Frontend records audio continuously
- 2 seconds of silence β Recording stops, audio sent to backend
- Backend transcribes β OpenAI Whisper converts speech to text
- Text displayed β User sees transcription immediately
- AI generates response β GPT-4o-mini creates reply
- Response played β TTS converts text to speech and plays audio
- Ready for next question β Cycle repeats
GET /- Health checkPOST /token- Generate LiveKit access tokenPOST /transcribe- Transcribe audio to textPOST /respond- Generate AI response with audio
Live Application Demo:
π Watch Full Demo - Complete walkthrough of the AI voice conversation app
Additional Demo:
π Extended Demo - Additional features and functionality
- Enhance voice activity detection accuracy with best / custom model
- When user speak, live transcribed text using streaming / sophisticated technology