OpenAI-compatible TTS API built with FastAPI and the great (and fast and tiny) KittenTTS.
It uses the kitten-tts-mini (80M) model by default; the model can be configured via an environment variable and is automatically downloaded and loaded on startup.
- OpenAI-compatible /v1/audio/speech endpoint
- Multiple output formats: WAV, MP3, OGG
```sh
# Pull the image
docker pull ghcr.io/ktos/kittentts-api:latest

# Run the container
docker run -d -p 8000:8000 ghcr.io/ktos/kittentts-api:latest
```

Usage with curl:

```sh
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello world!",
    "voice": "Jasper",
    "response_format": "mp3"
  }' \
  -o output.mp3
```

Usage with the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    api_key="dummy",
    base_url="http://localhost:8000/v1"
)

response = client.audio.speech.create(
    model="tts-1",
    input="Hello world!",
    voice="Jasper",
    response_format="mp3"
)

with open("output.mp3", "wb") as f:
    f.write(response.content)
```

| Endpoint | Description |
|---|---|
| POST /v1/audio/speech | Generate speech audio |
| GET /v1/models | List available models (tts-1 for compatibility) |
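If the `openai` package is not available, the speech endpoint can also be called with just the Python standard library. A minimal sketch, assuming the server runs on the default port and reusing the voice name from the examples above:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/audio/speech"  # default host/port assumed

def build_payload(text, voice="Jasper", fmt="mp3"):
    """Assemble the JSON body expected by the OpenAI-compatible endpoint."""
    return {
        "model": "tts-1",
        "input": text,
        "voice": voice,
        "response_format": fmt,
    }

def synthesize(text, voice="Jasper", fmt="mp3"):
    """POST the payload and return the raw audio bytes."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text, voice, fmt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# audio = synthesize("Hello world!")
# open("output.mp3", "wb").write(audio)
```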
You can specify the KittenTTS model to use by setting the `KITTEN_TTS_MODEL` environment variable. Available models:

- `KittenML/kitten-tts-mini-0.8` (default, 80M parameters)
- `KittenML/kitten-tts-micro-0.8`
- `KittenML/kitten-tts-nano-0.8`
- `KittenML/kitten-tts-nano-0.8-int8`
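The selection described above presumably boils down to reading the variable with a fallback to the default model ID; a sketch of that behavior (not the server's actual code):

```python
import os

DEFAULT_MODEL = "KittenML/kitten-tts-mini-0.8"

def resolve_model_id(env=os.environ):
    """Return the configured model ID, falling back to the default."""
    return env.get("KITTEN_TTS_MODEL", DEFAULT_MODEL)
```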
Example:

```sh
docker run -e KITTEN_TTS_MODEL=KittenML/kitten-tts-nano-0.8 -p 8000:8000 kittentts-api
```

Running locally:

```sh
# Install dependencies
pip install -r requirements.txt
```
```sh
# Install ffmpeg (required for MP3/OGG)
# Ubuntu/Debian:
sudo apt install ffmpeg

# macOS:
brew install ffmpeg

# Windows:
winget install ffmpeg
```
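ffmpeg is needed because MP3 and OGG output presumably require transcoding the natively generated WAV audio. A sketch of how such a step can be done by piping through ffmpeg (the server's actual implementation may differ):

```python
import subprocess

def transcode(wav_bytes: bytes, fmt: str) -> bytes:
    """Convert WAV bytes to the requested format via ffmpeg; WAV passes through."""
    if fmt == "wav":
        return wav_bytes
    proc = subprocess.run(
        ["ffmpeg", "-i", "pipe:0", "-f", fmt, "pipe:1"],
        input=wav_bytes,
        capture_output=True,
        check=True,  # raise if ffmpeg fails (e.g. bad input audio)
    )
    return proc.stdout
```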
```sh
# Run the server
uvicorn main:app --reload
```

Building the Docker image yourself:

```sh
docker build -t kittentts-api .
```

License: AGPLv3