Fast, local, multi-protocol Text-to-Speech server powered by Kyutai's Pocket TTS.
- Runs at 1.5x real-time on older CPUs (tested on Haswell)
- Voice Cloning support - use your own `.wav` files
- Optimized Loading - converts voices to `.safetensors` for instant startup
- Audio Caching - instant response for repeated phrases
- Streaming Support - real-time audio generation
- Stuttering Protection - runs with High Priority to prevent choppiness under load
- Multi-protocol: OpenAI standard, XTTS-compatible, WebSocket, and GET streaming
- SillyTavern ready - works with XTTSv2 and OpenAI Compatible TTS providers
```shell
git clone https://github.com/IceFog72/pocket-tts-openapi
cd pocket-tts-openapi
```

**Windows**
- Run `install.bat` - sets up Python venv and installs dependencies
- Run `start.bat` - starts the server (automatically sets High Priority)

**Linux**
- Run `chmod +x install.sh start.sh update.sh` (first time only)
- Run `./install.sh` - sets up Python venv and installs dependencies
- Run `./start.sh` - starts the server
To get the latest version of the project:
- Windows: Run `update.bat`
- Linux: Run `./update.sh`
Server: http://localhost:8005
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Server status |
| POST | `/v1/audio/speech` | Generate audio (OpenAI standard) |
| GET | `/tts_stream?text=...&voice=...` | Stream audio via GET |
| POST | `/tts_to_audio/` | Generate audio (XTTS format) |
| GET | `/v1/voices` | Voice list (OpenAI format) |
| GET | `/speakers` | Voice list (XTTS format) |
| WS | `/v1/audio/stream` | WebSocket streaming |
```shell
curl http://localhost:8005/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello!", "voice": "nova", "response_format": "mp3"}' \
  --output hello.mp3
```

```shell
curl "http://localhost:8005/tts_stream?text=Hello&voice=nova&format=mp3" \
  --output hello.mp3
```

```python
import asyncio, json, websockets

async def stream_tts():
    async with websockets.connect("ws://localhost:8005/v1/audio/stream") as ws:
        await ws.send(json.dumps({"text": "Hello", "voice": "nova", "format": "mp3"}))
        while True:
            msg = await ws.recv()
            if isinstance(msg, bytes):
                # Binary frames carry audio chunks; append them as they arrive
                with open("out.mp3", "ab") as f:
                    f.write(msg)
            elif json.loads(msg).get("status") == "done":
                break

asyncio.run(stream_tts())
```

The server works with SillyTavern in three ways:
The SillyTavern-PocketTTS-WebSocket extension provides the best experience:
- Persistent WebSocket connection: no reconnect overhead per sentence
- Sentence-split generation: each sentence gets exact audio duration, no gaps
- Built-in TTS playback bar with seek, volume, speed controls
- Model selection (CPU/GPU, fast/quality)
- Voice auto-discovery
- Streaming response via async generator: audio plays while the server generates
Use SillyTavern's built-in XTTSv2 provider and set the endpoint to http://host:8005.

Use SillyTavern's built-in OpenAI Compatible provider and set the endpoint to http://host:8005/v1/audio/speech.
| Provider | Set endpoint to | Voices auto-discovered | Sentence streaming |
|---|---|---|---|
| PocketTTS Extension | `http://host:8005` | Yes | Yes |
| XTTSv2 | `http://host:8005` | Yes | No |
| OpenAI Compatible | `http://host:8005/v1/audio/speech` | Yes | No |
For a desktop experience and AI integration, use the Ice Open TTS Proxy.
- Desktop GUI: Text input, voice selection, playback controls.
- Live Mode: Speaks as you type with real-time setting sync.
- AI Agent Bridge: OpenAI-compatible API server on port 8181.

To launch it:
- Ensure the main TTS server is running (Step 2 above).
- Go to the `ice-open-tts-test-proxy/` directory.
- Windows: Run `start_ice_gui.bat`
- Linux: Run `./start_ice_gui.sh`
See AGENTS.md for detailed AI Agent integration.
- Pocket TTS: `alba`, `marius`, `javert`, `jean`, `fantine`, `cosette`, `eponine`, `azelma`
- OpenAI aliases: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`
- Custom: place `.wav` files in `voices/` (auto-converted to `.safetensors`)
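The server presumably resolves the OpenAI alias names to its native Pocket TTS voices internally. A minimal sketch of such a lookup follows; the specific alias-to-voice pairing below is illustrative only, not the server's actual table:

```python
# Hypothetical alias table; the real server's mapping may differ.
ALIAS_TO_VOICE = {
    "alloy": "alba",
    "echo": "marius",
    "fable": "javert",
    "onyx": "jean",
    "nova": "fantine",
    "shimmer": "cosette",
}

def resolve_voice(name: str) -> str:
    """Return the underlying voice for an OpenAI alias,
    or the name unchanged if it is already a native voice."""
    return ALIAS_TO_VOICE.get(name, name)

print(resolve_voice("nova"))  # resolves via the illustrative table
print(resolve_voice("alba"))  # native names pass through unchanged
```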
- `voices/`: Place your source `.wav` files here (~10 seconds for best results).
- `embeddings/`: Optimized `.safetensors` files are stored here for instant loading.
- Accept the license at https://huggingface.co/kyutai/pocket-tts
- Login: `huggingface-cli login`
- Restart the server
- High Priority Mode: Auto-runs as High Priority on Windows.
- Quality Parameters: `temperature` (0.0-2.0), `lsd_decode_steps` (1-50).
- Large Block Handling: Auto-splits long text into sentences.
- Model Tiers: `tts-1` (fast), `tts-1-hd` (quality), `tts-1-cuda`, `tts-1-hd-cuda`.
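The large-block handling above can be sketched as a simple sentence splitter. This is an illustrative regex approach, not necessarily the server's implementation:

```python
import re

def split_sentences(text: str) -> list[str]:
    # Split after ., !, or ? followed by whitespace; the punctuation
    # stays attached to its sentence via the lookbehind.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

chunks = split_sentences("Hello there! How are you? I am fine.")
print(chunks)  # ['Hello there!', 'How are you?', 'I am fine.']
```

Each chunk can then be synthesized and streamed independently, which is what keeps long inputs responsive.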
- Auto-caches generated files (default: 10).
- Cache includes voice, text, and quality parameters.
- Cache hit = instant response.
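Since a cache entry depends on voice, text, and quality parameters, a lookup key can be derived by hashing those fields together. A hedged sketch follows; the field names and hashing scheme are assumptions, not the server's exact implementation:

```python
import hashlib
import json

def cache_key(text: str, voice: str, params: dict) -> str:
    # Serialize deterministically (sorted keys) so identical requests
    # always hash to the same key.
    payload = json.dumps(
        {"text": text, "voice": voice, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

k1 = cache_key("Hello!", "nova", {"temperature": 0.7})
k2 = cache_key("Hello!", "nova", {"temperature": 0.7})
k3 = cache_key("Hello!", "nova", {"temperature": 0.9})
print(k1 == k2, k1 == k3)  # True False
```

This is why changing any quality parameter produces a cache miss even for identical text.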
- 401 Unauthorized: run `huggingface-cli login`
- Port conflict: the server auto-selects the next free port
- Slow first run: downloads a ~236MB model
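The port-conflict behaviour can be reproduced with a small scan: try binding each port in turn and take the first one that succeeds. A sketch under that assumption, not the server's exact code:

```python
import socket

def find_free_port(start: int = 8005, limit: int = 50) -> int:
    """Return the first TCP port >= start that can be bound locally."""
    for port in range(start, start + limit):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port  # bind succeeded, so the port is free
            except OSError:
                continue  # port in use; try the next one
    raise RuntimeError("no free port found")

print(find_free_port())
```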
- Platform: Windows and Linux
- Dependencies: Python 3.10+, FFmpeg (for MP3/AAC/etc.)
- Cache: `./audio_cache/`
- Model cache: `~/.cache/huggingface`
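To reclaim disk space, the audio cache directory can simply be deleted; this sketch assumes the default `./audio_cache/` path above and that the server recreates the directory on demand (not confirmed by the source):

```python
import shutil
from pathlib import Path

def clear_cache(cache_dir: str = "./audio_cache") -> int:
    """Delete the audio cache; return how many files were removed."""
    path = Path(cache_dir)
    if not path.exists():
        return 0
    count = sum(1 for p in path.rglob("*") if p.is_file())
    shutil.rmtree(path)
    return count

# Usage: clear_cache() removes all cached audio for the default path.
```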
Discord: https://discord.gg/2tJcWeMjFQ • SillyTavern Discord
Inspired by kyutai-tts-openai-api