Native C++ inference engine for Kitten TTS + Python GGUF model conversion tools.
├── cpp/                      C++ inference engine (ggml backend, CMake build)
│   ├── src/                  kitten-tts.cpp (library) + main.cpp (CLI)
│   ├── include/              kitten-tts.h (public API)
│   └── extern/               ggml + espeak-ng (git submodules)
├── python/                   GGUF conversion scripts
│   ├── convert_gguf.py       ONNX → GGUF converter
│   ├── convert_onnx.py       ONNX weight extraction
│   └── model.py              PyTorch model definition
└── requirements-gguf.txt     Python dependencies for GGUF conversion
# Clone with submodules
git clone --recursive https://github.com/YOUR_USER/kitten-tts-native.git
# or if already cloned:
git submodule update --init --recursive
# Build
cd cpp
cmake -B build
cmake --build build -j$(nproc)
# Run
./build/kitten-tts-cli --model ../path/to/model.gguf --text "Hello world" --output hello.wav
--model PATH GGUF model path (required)
--text STRING Text to synthesize (required, or --input-ids)
--input-ids LIST Comma-separated token IDs (alternative to --text)
--voice NAME Voice: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo (default: Bella)
--ref-id INT Reference style ID 0-399 (default: 0)
--speed FLOAT Speed 0.5-2.0 (default: 1.0)
--output PATH Output WAV path (default: output.wav)
--threads INT Thread count (default: 4)
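The options above can also be assembled from a script. The following is a minimal sketch, not part of this repository: `build_cli_args` and `synthesize` are hypothetical helper names, the binary path assumes the build steps above were run from `cpp/`, and the range checks simply mirror the documented limits (the CLI may validate differently, and voice names are assumed to be case-sensitive).

```python
import subprocess
from typing import List

# Voice names as listed in the option table above.
VOICES = ("Bella", "Jasper", "Luna", "Bruno", "Rosie", "Hugo", "Kiki", "Leo")


def build_cli_args(
    model: str,
    text: str,
    *,
    cli: str = "./build/kitten-tts-cli",  # assumed location after the CMake build
    voice: str = "Bella",
    ref_id: int = 0,
    speed: float = 1.0,
    output: str = "output.wav",
    threads: int = 4,
) -> List[str]:
    """Return an argv list for kitten-tts-cli, checking the documented ranges."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {VOICES}")
    if not 0 <= ref_id <= 399:
        raise ValueError("ref_id must be in 0-399")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be in 0.5-2.0")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        cli,
        "--model", model,
        "--text", text,
        "--voice", voice,
        "--ref-id", str(ref_id),
        "--speed", str(speed),
        "--output", output,
        "--threads", str(threads),
    ]


def synthesize(model: str, text: str, **options) -> None:
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_cli_args(model, text, **options), check=True)


if __name__ == "__main__":
    # Only prints the command; call synthesize() once the binary and model exist.
    print(" ".join(build_cli_args("../path/to/model.gguf", "Hello world",
                                  voice="Luna", speed=1.2, output="hello.wav")))
```

Passing the arguments as a list avoids shell quoting problems when the text contains spaces or quotes.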
pip install -r python/requirements-gguf.txt
python python/convert_gguf.py \
--onnx path/to/model.onnx \
--voices path/to/voices.npz \
--output model.gguf
- C++: CMake 3.14+, C++17 compiler, ggml (submodule), espeak-ng (submodule)
- Python: PyTorch 2.0+, NumPy, ONNX
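After running the conversion step, a quick sanity check on the resulting file can catch a truncated or mislabeled output before it reaches the C++ engine. The sketch below is an illustration only: `read_gguf_header` is a hypothetical helper, and it assumes the little-endian GGUF header layout used by version 2 and later (4-byte magic `GGUF`, `uint32` version, `uint64` tensor count, `uint64` metadata key/value count). It uses only the Python standard library.

```python
import struct


def read_gguf_header(path: str) -> dict:
    """Read the fixed-size GGUF header (assumes GGUF v2+ little-endian layout)."""
    with open(path, "rb") as f:
        raw = f.read(24)  # 4 magic + 4 version + 8 tensor count + 8 KV count
    if len(raw) < 24:
        raise ValueError("file too short to contain a GGUF header")
    if raw[:4] != b"GGUF":
        raise ValueError(f"bad magic {raw[:4]!r}; not a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_gguf.py model.gguf
    print(read_gguf_header(sys.argv[1]))
```

A tensor count of zero or an unexpected version number suggests the conversion did not complete as intended.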
Apache 2.0