Self-host the ultra-lightweight KittenTTS model with this enhanced API server. Now supports all 7 KittenTTS models, from the tiny 15M Nano to the 80M Mini, with hot-swappable model switching, an intuitive Web UI, a flexible API, large text processing for audiobooks, and high-performance GPU acceleration.
This server provides a robust, user-friendly, and powerful interface for the KittenTTS engine family, an open-source, realistic text-to-speech system. This project significantly enhances the original model by adding a full-featured server, an easy-to-use UI, and an optimized inference pipeline for hardware ranging from NVIDIA GPUs to CPUs and even the Raspberry Pi 5.
- Added full support for all 7 KittenTTS models across three model sizes (Nano, Micro, Mini) and two generations (v0.1/v0.2 and v0.8).
- Models range from the ultra-compact 15M-parameter Nano to the high-quality 80M-parameter Mini, all running on ONNX for maximum portability.
- v0.8 models feature improved expressivity, quantized INT8 variants for minimal footprint, and named voices (Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo).
- Added a new model selector dropdown at the top of the Web UI.
- All 7 models are hot-swappable: select from the dropdown, click "Apply & Restart", and the backend automatically downloads (if needed), unloads the current model, and loads your choice. No server restart required.
- A progress modal with real-time status updates shows download and loading progress, so you always know what's happening.
- Models are downloaded automatically from Hugging Face on first use and cached locally in the project's model_cache directory for instant subsequent loads.
- Cancellation support: if you change your mind during a download, select a different model and the current load is cancelled automatically.
- All models now use human-friendly voice names instead of technical identifiers.
- v0.1/v0.2 models: Amber, Felix, Clara, Marcus, Ivy, Oscar, Nora, Reed (4 female, 4 male).
- v0.8 models: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo (4 female, 4 male).
- The voice dropdown automatically updates when you switch models.
You now have access to the entire KittenTTS family:
| Model | Parameters | Size | Voices | Notes |
|---|---|---|---|---|
| Nano 0.1 | 15M | <25MB | Amber, Felix, Clara, Marcus, Ivy, Oscar, Nora, Reed | Original release |
| Nano 0.2 | 15M | <25MB | Amber, Felix, Clara, Marcus, Ivy, Oscar, Nora, Reed | Developer preview |
| Mini 0.1 | 80M | ~170MB | Amber, Felix, Clara, Marcus, Ivy, Oscar, Nora, Reed | Highest quality (v0.1) |
| Nano 0.8 INT8 | 15M | ~25MB | Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo | Quantized, SOTA expressivity |
| Nano 0.8 FP32 | 15M | ~50MB | Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo | Full precision |
| Micro 0.8 | 40M | ~40MB | Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo | Balanced quality/speed |
| Mini 0.8 | 80M | ~79MB | Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo | Highest quality |
Switching models is effortless: select your preferred model from the dropdown at the top of the Web UI. The server handles downloading, loading, and voice list updates automatically.
The KittenTTS model by KittenML provides a foundation for generating high-quality speech from very small models (the smallest are under 25MB). This project elevates that foundation into a production-ready service by providing a robust FastAPI server that makes KittenTTS significantly easier to use, more powerful, and faster on NVIDIA GPUs.
We solve the complexity of setting up and running the model by offering:
- A modern Web UI for easy experimentation, preset loading, and speed adjustment.
- Hot-swappable model switching between all 7 KittenTTS models with automatic download.
- True GPU Acceleration for NVIDIA GPUs, a feature not present in the original implementation.
- Large Text Handling & Audiobook Generation: Intelligently splits long texts into manageable chunks, processes them sequentially, and seamlessly concatenates the audio. Perfect for creating complete audiobooks.
- A flexible, dual-API system including a simple endpoint and an OpenAI-compatible endpoint for easy integration.
- Named Voices: Up to 8 named voices per model for consistent and reliable output.
- Cross-platform support for Windows and Linux, with clear setup instructions.
- Docker support for easy, reproducible containerized deployment.
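To make the "concatenates the audio" step above concrete, here is a minimal sketch using only Python's standard `wave` module. It is an illustration, not this server's implementation: the function name `concat_wav` is hypothetical, and it assumes every chunk is a WAV byte string with the same channel count, sample width, and sample rate.

```python
import io
import wave


def concat_wav(chunks: list[bytes]) -> bytes:
    """Join WAV byte strings that share one audio format (hypothetical helper)."""
    if not chunks:
        raise ValueError("no audio chunks to join")
    out = io.BytesIO()
    writer = None
    expected = None
    for chunk in chunks:
        with wave.open(io.BytesIO(chunk), "rb") as reader:
            fmt = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
            if writer is None:
                # The first chunk decides the output format.
                expected = fmt
                writer = wave.open(out, "wb")
                writer.setnchannels(fmt[0])
                writer.setsampwidth(fmt[1])
                writer.setframerate(fmt[2])
            elif fmt != expected:
                raise ValueError("chunks must share channels, sample width, and rate")
            # Append the raw PCM frames; headers are not copied.
            writer.writeframes(reader.readframes(reader.getnframes()))
    writer.close()  # finalizes the WAV header; the BytesIO buffer stays open
    return out.getvalue()
```

Joining raw frames like this produces no gap between chunks; any pause a listener hears comes from the synthesized audio itself.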
The ultra-lightweight nature of the KittenTTS model and the efficiency of this server make it a perfect candidate for running on single-board computers (SBCs) and other edge devices.
- Raspberry Pi 5 (RP5): Confirmed to run with excellent performance. The server is fast and responsive, easily handling requests from other devices on the same local network (LAN). This makes it ideal for local network services, home automation, and other DIY projects.
To install, simply follow the standard Linux installation guide provided in this README.
A standout feature of this server is the implementation of high-performance GPU acceleration, a capability not available in the original KittenTTS project. While the base model is CPU-only, this server unlocks the full potential of your hardware.
- Optimized ONNX Runtime Pipeline: We leverage onnxruntime-gpu to move the entire inference process to your NVIDIA graphics card.
- Eliminated I/O Bottlenecks: The server uses I/O Binding. This technique pre-allocates memory directly on the GPU for both model inputs and outputs, drastically reducing the latency caused by copying data between system RAM and the GPU's VRAM.
- True Performance Gains: This isn't just running the model on the GPU; it's an optimized pipeline designed to minimize latency and maximize throughput, making real-time generation significantly faster than on CPU.
This enhancement transforms KittenTTS from a lightweight-but-modest engine into a high-speed synthesis powerhouse.
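As a rough illustration of the I/O Binding idea described above, the sketch below uses ONNX Runtime's Python binding API (`io_binding`, `bind_cpu_input`, `bind_output`, `run_with_iobinding`, `copy_outputs_to_cpu`). It is not taken from this server's code: the helper name `run_with_io_binding` is hypothetical, and the input names in `feeds` depend on the model you load.

```python
def run_with_io_binding(session, feeds, device="cuda"):
    """Run one inference pass through ONNX Runtime's I/O binding API.

    `session` is expected to be an onnxruntime.InferenceSession created with
    the CUDA execution provider; `feeds` maps input names to NumPy arrays.
    """
    binding = session.io_binding()
    for name, array in feeds.items():
        # Bind an input that lives in host memory; ONNX Runtime handles the
        # transfer to the device.
        binding.bind_cpu_input(name, array)
    for output in session.get_outputs():
        # Ask ONNX Runtime to allocate each output on the target device.
        binding.bind_output(output.name, device)
    session.run_with_iobinding(binding)
    # Copy the results back to host memory as a list of arrays.
    return binding.copy_outputs_to_cpu()
```

A session for this helper would typically be created with `onnxruntime.InferenceSession(path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])`, where `path` points at whichever ONNX file you want to run.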
The KittenTTS model serves as an excellent alternative to Piper TTS for fast generation on limited compute and edge devices like Raspberry Pi 5.
KittenTTS Model Advantages:
- Extreme Efficiency: 15M to 80M parameters, with the smallest models under 25MB, significantly smaller than most Piper models
- Universal Compatibility: CPU-optimized to run without GPU on any device and "works literally everywhere"
- Real-time Performance: Optimized for real-time speech synthesis even on resource-constrained hardware
- Multiple Model Sizes: Choose the right balance of quality vs speed for your use case
This Server Project's Enhancement: While KittenTTS provides the ultra-lightweight foundation, this server transforms it into a production-ready Piper replacement by adding GPU acceleration (unavailable in the base model), hot-swappable multi-model support, modern REST/OpenAI APIs, audiobook processing capabilities, and an intuitive web interface, all while maintaining the model's edge device compatibility.
Perfect for users seeking Piper's offline capabilities with better performance on limited hardware and modern server infrastructure.
- Multi-Model Support: All 7 KittenTTS models (Nano, Micro, Mini across v0.1/v0.2/v0.8) with hot-swappable switching from the UI.
- Automatic Model Download: Models are downloaded from Hugging Face on first use and cached locally.
- True GPU Acceleration: Full support for NVIDIA (CUDA) via an optimized onnxruntime-gpu pipeline with I/O Binding for maximum performance.
- Large Text & Audiobook Generation:
  - Automatically handles long texts by intelligently splitting them based on sentence boundaries.
  - Processes each chunk individually and seamlessly concatenates the resulting audio.
  - Ideal for audiobooks: paste entire books and get professional-quality audio.
- Modern Web Interface:
  - Intuitive UI for text input, model selection, voice selection, and parameter adjustment.
  - Real-time waveform visualization of generated audio.
  - Progress modal for model downloads with real-time status updates.
- Named Voices:
  - Up to 8 named voices per model (4 male, 4 female).
  - Voice list updates automatically when switching models.
- Dual API Endpoints:
  - A primary /tts endpoint offering full control over all generation parameters.
  - An OpenAI-compatible /v1/audio/speech endpoint for seamless integration into existing workflows.
- Easy Configuration:
  - All settings are managed through a single config.yaml file.
  - The server automatically creates a default config on the first run.
- UI State Persistence: The web interface remembers your last-used text, voice, and settings to streamline your workflow.
- Docker Support: Easy, reproducible deployment for both CPU and GPU via Docker Compose.
- Operating System: Windows 10/11 (64-bit) or Linux (Debian/Ubuntu recommended).
- Python: Version 3.10 or later.
- Git: For cloning the repository.
- eSpeak NG: This is a required dependency for text phonemization.
  - Windows: See installation guide below.
  - Linux: sudo apt install espeak-ng
- Raspberry Pi:
  - Raspberry Pi 5
- (For GPU Acceleration):
  - An NVIDIA GPU with CUDA support.
- (For Linux Only):
  - libsndfile1: Audio library needed by soundfile. Install via sudo apt install libsndfile1.
  - ffmpeg: For robust audio operations. Install via sudo apt install ffmpeg.
This project uses specific dependency files and a clear process to ensure a smooth, one-command installation for your hardware.
1. Clone the Repository
git clone https://github.com/devnen/Kitten-TTS-Server.git
cd Kitten-TTS-Server

2. Create and Activate a Python Virtual Environment

This is crucial to avoid conflicts with other Python projects.

- Windows (PowerShell):
  python -m venv venv
  .\venv\Scripts\activate
  If you see an error about execution policies, run:
  Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
  and try activating again.
- Linux (Bash):
  python3 -m venv venv
  source venv/bin/activate

Your command prompt should now start with (venv).
3. Install eSpeak NG (Required)
- Windows:
  - Download the installer from the eSpeak NG Releases page. Look for the file named espeak-ng-X.XX-x64.msi.
  - Run the installer with default settings.
  - Important: Restart your terminal (PowerShell/CMD) after installation for the changes to take effect.
- Linux (Ubuntu/Debian):
  sudo apt update && sudo apt install -y espeak-ng
4. Install Python Dependencies
Choose one of the following paths based on your hardware.
CPU-only installation: this is the simplest path and works on any machine.
# Make sure your (venv) is active
pip install --upgrade pip
pip install -r requirements.txt

NVIDIA GPU installation: this method ensures all necessary CUDA libraries are correctly installed within your virtual environment.
# Make sure your (venv) is active
pip install --upgrade pip
# Step 1: Install the GPU-enabled ONNX Runtime
pip install onnxruntime-gpu
# Step 2: Install PyTorch with CUDA support. This command also brings the
# necessary CUDA and cuDNN .dll files that onnxruntime-gpu needs.
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
# Step 3: Install the remaining dependencies from the requirements file
pip install -r requirements-nvidia.txt

After installation, verify that PyTorch can see your GPU:
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'Device name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"

If CUDA available: shows True, your setup is correct!
If you initially installed the server for CPU-only usage and now want to enable GPU acceleration, follow these steps to upgrade your environment safely.
# Make sure your (venv) is active
pip install --upgrade pip
# Step 1: Uninstall the CPU-only versions of onnxruntime and torch.
# This is critical to prevent conflicts with the GPU packages.
pip uninstall onnxruntime torch torchaudio -y
# Step 2: Install the GPU-enabled ONNX Runtime.
pip install onnxruntime-gpu
# Step 3: Install PyTorch with CUDA support. This command also brings the
# necessary CUDA and cuDNN .dll files that onnxruntime-gpu needs.
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
# Step 4: Re-install from the nvidia requirements file to ensure all other
# dependencies are correct and up to date.
pip install -r requirements-nvidia.txt

After upgrading, do the following:

- Verify the installation by running the same check used for the NVIDIA GPU installation:
  python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
  The output must be CUDA available: True.
- Update your configuration by editing the config.yaml file:
  tts_engine:
    device: auto # Or "cuda", or "gpu"
- Restart the server for the changes to take effect. It will now use your NVIDIA GPU.
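One way to picture how a device setting such as auto could resolve to a runtime backend is the sketch below. It is an assumption about the general approach, not this server's actual logic: the function `choose_providers` is hypothetical, while `CUDAExecutionProvider` and `CPUExecutionProvider` are the standard ONNX Runtime provider names.

```python
def choose_providers(device: str, available: list[str]) -> list[str]:
    """Map a device setting to an ordered ONNX Runtime provider list (sketch)."""
    device = device.lower()
    if device not in ("auto", "cuda", "gpu", "cpu"):
        raise ValueError(f"unknown device setting: {device!r}")
    wants_gpu = device != "cpu"
    if wants_gpu and "CUDAExecutionProvider" in available:
        # Keep the CPU provider as a fallback for unsupported operators.
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    # Either CPU was requested or no CUDA provider is installed.
    return ["CPUExecutionProvider"]
```

In a real environment the `available` list would come from `onnxruntime.get_available_providers()`. If that list lacks `CUDAExecutionProvider`, the CPU-only onnxruntime package is probably still installed.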
Important: First-Run Model Download. The first time you start the server, it will automatically download the default KittenTTS Nano model (~25MB) from Hugging Face. This is a one-time process; subsequent launches load the model from the local cache. Additional models are downloaded automatically when selected from the Web UI.
- Activate the virtual environment (if not already active).
  - Windows: .\venv\Scripts\activate
  - Linux: source venv/bin/activate
- Run the server:
  python server.py
- The server will start and automatically open the Web UI in your default browser.
  - Web UI: http://localhost:8005
  - API Docs: http://localhost:8005/docs
- To stop the server: Press CTRL+C in the terminal.
KittenTTS runs excellently on Raspberry Pi 5, making it ideal for local network services and DIY projects.
Raspberry Pi 5 works out-of-the-box with the standard Linux installation guide above. No special steps required!
Tested Configuration:
- Hardware: Raspberry Pi 5 Model B Rev 1.0
- OS: Debian GNU/Linux 12 (bookworm) 64-bit
- Architecture: aarch64 (ARM64)
- Python: 3.11
- Memory: 4GB RAM
- Installation: Follow the standard Linux installation guide exactly
Installation Steps:
# Step 1: Install system dependencies
sudo apt update && sudo apt upgrade -y
sudo apt install -y espeak-ng libsndfile1 ffmpeg python3-pip python3-venv git
# Step 2: Set up Python environment
python3 -m venv venv
source venv/bin/activate
# Step 3: Install Python dependencies
pip install -r requirements.txt
# Step 4: Start the server
python server.py

Important: During the pip install -r requirements.txt step, some Python packages (especially audio processing libraries like librosa, praat-parselmouth, and others) may need to be compiled from source on ARM architecture. This process can take 15-30 minutes depending on your SD card speed and system load. This is normal; let it complete without interruption.
Run Kitten-TTS-Server easily using Docker. The recommended method uses Docker Compose, which is pre-configured for both CPU and NVIDIA GPU deployment.
- Docker installed.
- Docker Compose installed (usually included with Docker Desktop).
- (For GPU acceleration)
- An NVIDIA GPU.
- Up-to-date NVIDIA drivers for your host operating system.
- The NVIDIA Container Toolkit installed.
This method uses the provided docker-compose.yml files to automatically build the correct image and manage the container, volumes, and configuration.
1. Clone the Repository
git clone https://github.com/devnen/Kitten-TTS-Server.git
cd Kitten-TTS-Server

2. Start the Container Based on Your Hardware
Choose one of the following commands:
The default docker-compose.yml is configured for NVIDIA GPUs. It will build the image with full CUDA support.
docker compose up -d --build

For CPU-only machines, use the dedicated compose file, which builds the image without GPU dependencies.
docker compose -f docker-compose-cpu.yml up -d --build

Note: The first time you run this, Docker will build the image and the server will download the KittenTTS model, which can take a few minutes. Subsequent starts will be much faster.
- Access the Web UI: Open your browser to http://localhost:8005
- Access the API Docs: http://localhost:8005/docs
- View Logs:
  # For GPU or CPU version
  docker compose logs -f
- Stop the Container:
  # This stops and removes the container but keeps your data volumes
  docker compose down
- Build-time Argument: The Dockerfile uses a RUNTIME argument (nvidia or cpu) to conditionally install the correct Python packages, creating an optimized image for your hardware.
- Persistent Data: The docker-compose files use Docker volumes to persist your important data on your host machine, even if the container is removed:
  - ./config.yaml: Your main server configuration file.
  - ./outputs: All generated audio files are saved here.
  - ./logs: Server log files for troubleshooting.
  - hf_cache (Named Volume): Persists the downloaded Hugging Face models, saving significant time on rebuilds.
After starting the GPU container, you can verify that Docker and the application can see your graphics card.
# Check if the container can see the NVIDIA GPU
docker compose exec kitten-tts-server nvidia-smi
# Check if PyTorch inside the container can access CUDA
docker compose exec kitten-tts-server python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

If CUDA available: prints True, your GPU setup is working correctly.
- Start the server and open the Web UI (http://localhost:8005).
- Select a model from the dropdown at the top (or use the default Nano 0.1).
- Select a voice from the voice dropdown.
- Type or paste your text into the input box.
- Adjust the speech speed if desired.
- Click "Generate Speech".
- The audio will play automatically and be available for download.
- Select a different model from the Active Model dropdown at the top of the page.
- Click "Apply & Restart".
- A progress modal will show the download and loading status.
- Once complete, the voice dropdown updates automatically with the new model's voices.
- Copy the entire plain text of your book or chapter.
- Paste it into the text area.
- Ensure "Split text into chunks" is enabled.
- Set a Chunk Size between 300 and 500 characters for natural pauses.
- Click "Generate Speech". The server will process the entire text and stitch the audio together seamlessly.
- Download your complete audiobook file.
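The chunking step above can be sketched as a greedy sentence packer. This is a minimal illustration under stated assumptions, not the server's splitter: the function `split_into_chunks` is hypothetical, sentences are detected with a simple punctuation rule, and a single sentence longer than the chunk size is kept whole rather than cut mid-sentence.

```python
import re


def split_into_chunks(text: str, chunk_size: int = 300) -> list[str]:
    """Pack whole sentences into chunks of at most `chunk_size` characters."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            # The next sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Because chunks always end on a sentence boundary, each synthesized segment ends where a speaker would naturally pause, which is why a size in the 300 to 500 character range tends to sound natural.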
The server exposes two main endpoints for TTS. See http://localhost:8005/docs for an interactive playground.
The primary /tts endpoint offers the most control.
- Method: POST
- Body: { "text": "Hello from the KittenTTS API!", "voice": "Reed", "speed": 1.0, "output_format": "mp3", "split_text": true, "chunk_size": 300 }
- Response: Streaming audio file (audio/wav, audio/mp3, etc.).
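A client call to the /tts endpoint might look like the following sketch, which uses only the Python standard library. The helper names `build_tts_request` and `synthesize` are hypothetical; the payload fields and the default port are taken from the request body shown above.

```python
import json
import shutil
import urllib.request


def build_tts_request(text, voice="Reed", base_url="http://localhost:8005"):
    """Build a POST request for the /tts endpoint (does not send it)."""
    payload = {
        "text": text,
        "voice": voice,
        "speed": 1.0,
        "output_format": "mp3",
        "split_text": True,
        "chunk_size": 300,
    }
    return urllib.request.Request(
        f"{base_url}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and stream the audio response to a file."""
    request = build_tts_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as fh:
        shutil.copyfileobj(response, fh)
```

With the server running, `synthesize("Hello from the KittenTTS API!", "hello.mp3")` would write the returned audio to `hello.mp3`.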
Use the /v1/audio/speech endpoint for drop-in compatibility with scripts expecting OpenAI's TTS API structure.
- Method: POST
- Body: { "model": "kitten-tts", "input": "This is an OpenAI-compatible request.", "voice": "Ivy", "response_format": "wav", "speed": 0.9 }
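The two request bodies carry the same information under different field names. The sketch below translates a /tts-style payload into the OpenAI-style shape; the function `to_openai_payload` is hypothetical, and it assumes the chunking fields (`split_text`, `chunk_size`) have no counterpart in the OpenAI-style body, since the example above does not show any.

```python
def to_openai_payload(tts_payload: dict, model: str = "kitten-tts") -> dict:
    """Convert a /tts-style body into an OpenAI-style speech request body."""
    payload = {
        "model": model,
        "input": tts_payload["text"],   # "text" becomes "input"
        "voice": tts_payload["voice"],  # voice names are unchanged
    }
    if "output_format" in tts_payload:
        # "output_format" becomes "response_format"
        payload["response_format"] = tts_payload["output_format"]
    if "speed" in tts_payload:
        payload["speed"] = tts_payload["speed"]
    return payload
```

The result can be sent as JSON in a POST to /v1/audio/speech, or passed through an OpenAI client library that has been pointed at this server's base URL.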
- GET /api/model-info: Returns details about the currently loaded model.
- GET /api/model-registry: Returns all available models for the UI dropdown.
- GET /api/model-status: Returns download/loading progress during model switching.
- POST /restart_server: Triggers an async model hot-swap based on current config.
- POST /api/cancel-loading: Cancels an in-progress model download/load.
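A script that triggers a model switch may want to wait until loading finishes. The sketch below polls /api/model-status until a caller-supplied predicate accepts the response. The response schema is not documented here, so the predicate `is_done` is deliberately left to the caller; the names `fetch_json` and `wait_for_model` are hypothetical.

```python
import json
import time
import urllib.request


def fetch_json(url):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_model(is_done, fetch=fetch_json, base_url="http://localhost:8005",
                   interval=1.0, timeout=600.0):
    """Poll /api/model-status until is_done(status) is true or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(f"{base_url}/api/model-status")
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("model did not finish loading in time")
        time.sleep(interval)
```

To write a suitable `is_done`, inspect one real response first (for example through the interactive docs at /docs) and test against whichever field reports completion.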
All server settings are managed in the config.yaml file. It's created automatically on first launch if it doesn't exist.
Key Settings:
- server.host, server.port: Network settings.
- model.repo_id: The active model selector (e.g., kitten-nano-0.8-int8 or a HuggingFace repo ID).
- tts_engine.device: Set to auto, cuda, or cpu. The server will use your GPU if set to auto or cuda and a compatible environment is found.
- generation_defaults.speed: Default speech speed (1.0 is normal).
- audio_output.format: Default audio format (wav, mp3, opus).
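The dotted names above suggest nested sections in config.yaml. The sketch below mirrors that layout as a Python dictionary and looks settings up by dotted key. The values are illustrative only (the host value in particular is an assumption, and the generated default file may differ), and `get_setting` is a hypothetical helper.

```python
# Illustrative values only; the file the server generates may differ.
EXAMPLE_CONFIG = {
    "server": {"host": "0.0.0.0", "port": 8005},  # host value is an assumption
    "model": {"repo_id": "kitten-nano-0.8-int8"},
    "tts_engine": {"device": "auto"},
    "generation_defaults": {"speed": 1.0},
    "audio_output": {"format": "wav"},
}


def get_setting(config: dict, dotted_key: str, default=None):
    """Look up a value such as 'server.port' in a nested config dictionary."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```

For example, `get_setting(EXAMPLE_CONFIG, "tts_engine.device")` returns the device string, and a missing key returns the supplied default instead of raising.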
- Phonemizer / eSpeak Errors:
  - This is the most common issue. Ensure you have installed eSpeak NG correctly for your OS and restarted your terminal afterward. The server includes auto-detection logic for common install paths.
- GPU Not Used / Falls Back to CPU:
  - Follow the NVIDIA GPU Installation steps exactly. The most common cause is torch being installed without CUDA support.
  - Run the verification command from the installation guide to confirm torch.cuda.is_available() is True.
- "No module named 'soundfile'" or Audio Errors on Linux:
  - The underlying system library is likely missing. Run sudo apt install libsndfile1.
- "Port already in use" Error:
  - Another application is using port 8005. Stop that application or change the port in config.yaml (e.g., port: 8006) and restart the server.
- Model download hangs or fails:
  - Check your internet connection. Models are downloaded from Hugging Face Hub.
  - Try clearing the model_cache directory and restarting.
  - Large models (Mini 0.1 at ~170MB) may take several minutes on slower connections.
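For the "Port already in use" case, a quick way to check whether something is already listening on the server's port is a short socket probe. This is a generic diagnostic sketch, not part of the server; the function name `port_in_use` is hypothetical.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Running `port_in_use(8005)` before starting the server tells you whether another process has already claimed the default port.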
- Core Model: This project is powered by the KittenTTS model created by KittenML. Our work adds a high-performance server and UI layer on top of their excellent lightweight model.
- Core Libraries: FastAPI, Uvicorn, ONNX Runtime, PyTorch, Hugging Face Hub, Phonemizer.
- UI Inspiration: The UI/server architecture is inspired by our previous work on the Chatterbox-TTS-Server.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions, issues, and feature requests are welcome! Please feel free to open an issue or submit a pull request.

