FastAPI backend for real-time AI voice conversations using LiveKit and OpenAI.
python -m venv venv
venv\Scripts\activate # Windows
pip install -r requirements.txt

Create a .env file:
LIVEKIT_API_KEY=your_livekit_key
LIVEKIT_API_SECRET=your_livekit_secret
LIVEKIT_URL=ws://localhost:7880
OPENAI_API_KEY=sk-your-openai-key
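
Before starting the server, it can help to confirm that all four settings are present. The following is a minimal sketch, not the project's actual startup code: the helper name `missing_settings` is hypothetical, and it assumes the values are already in the process environment (for example after calling python-dotenv's `load_dotenv()`).

```python
import os
from typing import Mapping

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "OPENAI_API_KEY",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or blank.

    Hypothetical helper for illustration; it only inspects the mapping
    it is given and does not read the .env file itself.
    """
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

Passing the environment as a parameter keeps the check easy to test with a plain dictionary.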
uvicorn main:app --reload --port 8000

The API will be available at http://localhost:8000
Health check endpoint.
Response:
{
"status": "ok",
"ai_enabled": true,
"livekit_configured": true
}

Generate a LiveKit access token for a room connection.
Request:
{
"room_name": "voice-conversation",
"participant_name": "user-123"
}

Response:
{
"token": "...",
"url": "ws://localhost:7880"
}

Transcribe audio to text using OpenAI Whisper.
Request:
- Content-Type: multipart/form-data
- Body: Audio file (WebM format)
Response:
{
"text": "Hello, how are you?"
}

Notes:
- Returns empty text if audio is too small (< 5000 bytes)
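
The size check in the note above could be sketched as follows. This is an illustration under stated assumptions rather than the backend's real handler: `transcribe_or_empty` and the `transcribe` callable are hypothetical names, with the callable standing in for whatever function sends the audio to Whisper.

```python
from typing import Callable

# Threshold from the note above: uploads smaller than this are skipped.
MIN_AUDIO_BYTES = 5000


def transcribe_or_empty(audio: bytes, transcribe: Callable[[bytes], str]) -> dict:
    """Return {"text": ""} for tiny uploads, otherwise the transcription.

    `transcribe` is a placeholder (an assumption) for the real
    speech-to-text call; it is not invoked for undersized audio.
    """
    if len(audio) < MIN_AUDIO_BYTES:
        return {"text": ""}
    return {"text": transcribe(audio)}
```

Skipping the call for very small payloads avoids spending an API request on clips that are most likely silence or an accidental tap.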
Generate AI response with audio using GPT-4o-mini and OpenAI TTS.
Request:
{
"text": "Hello, how are you?"
}

Response:
{
"response_text": "I'm doing great, thank you for asking! How can I help you today?",
"audio": "base64_encoded_audio_data..."
}

Notes:
- Audio is returned as base64-encoded MP3
- Uses GPT-4o-mini for response generation
- Uses TTS-1-HD for high-quality speech synthesis
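
On the client side, the `audio` field has to be base64-decoded before it can be played or saved. A minimal sketch, assuming the response body has already been parsed into a dictionary with the fields shown above; the function name `decode_response_audio` is hypothetical.

```python
import base64


def decode_response_audio(payload: dict) -> bytes:
    """Decode the base64 "audio" field of a response into raw MP3 bytes.

    validate=True makes the decoder raise an error on characters outside
    the base64 alphabet instead of silently discarding them.
    """
    return base64.b64decode(payload["audio"], validate=True)


if __name__ == "__main__":
    # Stand-in payload for illustration; a real response carries MP3 data.
    sample = {
        "response_text": "Hi there!",
        "audio": base64.b64encode(b"not-real-mp3-bytes").decode("ascii"),
    }
    print(len(decode_response_audio(sample)), "bytes decoded")
```

The decoded bytes can then be written to a `.mp3` file or handed to an audio player.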
All endpoints return standard HTTP error codes:
- 400 - Bad request (invalid input)
- 500 - Server error (OpenAI/LiveKit not configured or processing failed)
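
A client might translate these status codes into exceptions along the following lines. This is only a sketch: the exception classes and `raise_for_api_status` are hypothetical names, and the shape of the error body is not specified here, so the detail text is passed in by the caller.

```python
class BadRequestError(Exception):
    """Raised for HTTP 400: the request input was invalid."""


class ServerError(Exception):
    """Raised for HTTP 500: configuration or processing failed on the server."""


def raise_for_api_status(status_code: int, detail: str = "") -> None:
    """Raise an exception for the documented error codes; return None otherwise."""
    if status_code == 400:
        raise BadRequestError(detail or "Bad request (invalid input)")
    if status_code == 500:
        raise ServerError(detail or "Server error")
    if status_code >= 400:
        # Codes not listed above are treated generically (an assumption).
        raise RuntimeError(f"Unexpected HTTP status {status_code}: {detail}")
```

Separate exception types let calling code retry or report server failures differently from input errors.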
- FastAPI - Web framework
- OpenAI - AI services (Whisper, GPT-4o-mini, TTS)
- LiveKit - WebRTC infrastructure
- python-dotenv - Environment variable management