
AI Voice Conversation App

Real-time AI voice conversation application using OpenAI Whisper, GPT-4o-mini, and TTS with LiveKit for WebRTC audio streaming.

Features

  • 🎤 Voice Activity Detection - Automatically detects when you start and stop speaking
  • 🗣️ Real-time Transcription - Converts speech to text using OpenAI Whisper
  • 🤖 AI Responses - Generates intelligent responses using GPT-4o-mini
  • 🔊 Text-to-Speech - Plays AI responses with high-quality voice synthesis
  • ⏱️ 2-Second Silence Detection - Automatically sends audio after you finish speaking
  • 🔄 Sequential Flow - Prevents overlapping conversations for natural interaction
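The 2-second silence rule can be sketched as a small RMS-based detector. This is an illustration only; the real frontend does this in the browser with Web Audio, and the threshold, frame size, and class name here are assumptions, not taken from the app's source.

```python
# Illustrative sketch of 2-second silence detection over PCM frames.
# SILENCE_RMS and frame_seconds are assumed values, not the app's settings.
import math

SILENCE_RMS = 500        # amplitude threshold for 16-bit PCM (assumed)
SILENCE_SECONDS = 2.0    # stop recording after this much trailing silence


def rms(frame: list[int]) -> float:
    """Root-mean-square amplitude of one PCM frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class SilenceDetector:
    """Accumulates trailing silence and signals when the utterance is over."""

    def __init__(self, frame_seconds: float = 0.02):
        self.frame_seconds = frame_seconds
        self.silent_for = 0.0

    def feed(self, frame: list[int]) -> bool:
        """Feed one frame; return True once 2 s of silence has accumulated."""
        if rms(frame) < SILENCE_RMS:
            self.silent_for += self.frame_seconds
        else:
            self.silent_for = 0.0  # speech resets the silence timer
        return self.silent_for >= SILENCE_SECONDS
```

When `feed` returns True, the frontend would stop recording and ship the buffered audio to the backend.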

Tech Stack

Backend:

  • FastAPI
  • OpenAI API (Whisper, GPT-4o-mini, TTS-1-HD)
  • LiveKit
  • Python 3.11+

Frontend:

  • Next.js 15
  • TypeScript
  • LiveKit Client
  • Tailwind CSS

Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • OpenAI API key
  • LiveKit credentials (optional, for production)

1. Backend Setup

cd backend
python -m venv venv
venv\Scripts\activate       # Windows
source venv/bin/activate    # macOS/Linux
pip install -r requirements.txt

Create backend/.env:

OPENAI_API_KEY=sk-your-key-here
LIVEKIT_API_KEY=your-key
LIVEKIT_API_SECRET=your-secret
LIVEKIT_URL=ws://localhost:7880
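A minimal sketch of how the backend might read these variables, using only the standard library. The variable names match the `.env` above; the `Settings` class and `load_settings` helper are assumptions for illustration, not the app's actual code.

```python
# Sketch: read the .env-provided variables from the process environment,
# failing fast when the one required key is missing.
import os
from dataclasses import dataclass


@dataclass
class Settings:
    openai_api_key: str
    livekit_api_key: str = ""
    livekit_api_secret: str = ""
    livekit_url: str = "ws://localhost:7880"


def load_settings(env=os.environ) -> Settings:
    """Build Settings from environment variables; OPENAI_API_KEY is required."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=key,
        livekit_api_key=env.get("LIVEKIT_API_KEY", ""),
        livekit_api_secret=env.get("LIVEKIT_API_SECRET", ""),
        livekit_url=env.get("LIVEKIT_URL", "ws://localhost:7880"),
    )
```

The LiveKit values default to empty strings, matching the prerequisite note that LiveKit credentials are optional outside production.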

Run backend:

uvicorn main:app --reload --port 8000

2. Frontend Setup

cd frontend
npm install

Create frontend/.env.local:

NEXT_PUBLIC_BACKEND_URL=http://localhost:8000

Run frontend:

npm run dev

Open http://localhost:3000

Documentation

📖 Backend Documentation - API endpoints, configuration, dependencies

📖 Frontend Documentation - Components, hooks, architecture

📖 AI Prompts - AI tools and prompts used during development

Project Structure

skill.io/
├── backend/
│   ├── main.py              # FastAPI app with endpoints
│   ├── ai_handler.py        # OpenAI integrations
│   ├── requirements.txt     # Python dependencies
│   └── README.md            # Backend documentation
├── frontend/
│   ├── app/                 # Next.js app directory
│   ├── components/          # React components
│   ├── hooks/               # Custom hooks
│   └── README.md            # Frontend documentation
├── README.md                # This file
└── AI_PROMPTS.md            # AI assistance documentation

How It Works

  1. User speaks → Frontend records audio continuously
  2. 2 seconds of silence → Recording stops, audio sent to backend
  3. Backend transcribes → OpenAI Whisper converts speech to text
  4. Text displayed → User sees transcription immediately
  5. AI generates response → GPT-4o-mini creates reply
  6. Response played → TTS converts text to speech and plays audio
  7. Ready for next question → Cycle repeats
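The sequential flow above, which keeps responses from overlapping, can be sketched as a tiny state machine. The state names and `Conversation` class are assumptions for illustration; the app's actual implementation may differ.

```python
# Sketch: one question/answer cycle as a state machine. Only the LISTENING
# state accepts new audio, which is what prevents overlapping conversations.
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()     # recording user audio
    TRANSCRIBING = auto()  # Whisper call in flight
    RESPONDING = auto()    # GPT-4o-mini + TTS in flight
    SPEAKING = auto()      # playing the TTS audio back


class Conversation:
    """Advances through one cycle and rejects input arriving mid-response."""

    ORDER = [State.LISTENING, State.TRANSCRIBING, State.RESPONDING, State.SPEAKING]

    def __init__(self):
        self.state = State.LISTENING

    def advance(self) -> State:
        """Move to the next stage, wrapping back to LISTENING after SPEAKING."""
        i = self.ORDER.index(self.state)
        self.state = self.ORDER[(i + 1) % len(self.ORDER)]
        return self.state

    def can_accept_audio(self) -> bool:
        return self.state is State.LISTENING
```

After the TTS audio finishes playing, `advance()` wraps back to LISTENING and the cycle repeats.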

API Endpoints

  • GET / - Health check
  • POST /token - Generate LiveKit access token
  • POST /transcribe - Transcribe audio to text
  • POST /respond - Generate AI response with audio
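Since JSON cannot carry raw bytes, a `/respond`-style endpoint typically base64-encodes the TTS audio alongside the reply text. The field names and helper functions below are assumptions sketching that shape, not the app's documented contract.

```python
# Sketch: bundle a GPT reply and its TTS audio into one JSON-safe payload,
# plus the frontend-side inverse for playback. Field names are assumed.
import base64


def build_respond_payload(reply_text: str, tts_audio: bytes) -> dict:
    """Pack the AI reply and synthesized audio into a JSON-serializable dict."""
    return {
        "text": reply_text,
        "audio_base64": base64.b64encode(tts_audio).decode("ascii"),
        "audio_format": "mp3",  # assumed container for the TTS output
    }


def decode_audio(payload: dict) -> bytes:
    """Recover the raw audio bytes from a payload for client-side playback."""
    return base64.b64decode(payload["audio_base64"])
```

The frontend would feed the decoded bytes to an audio element or Web Audio buffer while displaying `text` in the transcript.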

🎬 Demo

Live Application Demo:
🔗 Watch Full Demo - Complete walkthrough of the AI voice conversation app

Additional Demo:
🔗 Extended Demo - Additional features and functionality

What Can Be Improved

  • Improve voice activity detection accuracy with a better or custom VAD model
  • Stream partial transcriptions so the user sees live text while still speaking