Local text-to-speech that never leaves your machine. Generate natural speech, clone voices, and design new ones — powered by Qwen3-TTS models running entirely on your hardware.
No cloud APIs. No subscriptions. No data sent anywhere. Have ideas or requests? Open an issue or PR, or support the project.
| Custom Voice | Voice Clone | Voice Design |
|---|---|---|
| Pick from preset speakers, choose a language, and type your text. Optionally add style instructions to control tone and delivery. | Record or import a reference audio clip, and PrivateVoice will generate new speech in that voice. Includes optional Whisper auto-transcription. | Describe the voice you want in plain text — "warm baritone, slight British accent, nature documentary narrator" — and the model creates it. |
- Runs 100% locally — inference on Apple Silicon (MPS), NVIDIA CUDA, or CPU. Nothing leaves localhost.
- Lightweight installer — ships a small desktop app; downloads Python, dependencies, and models on first launch.
- 5 model variants — from fast 0.6B to high-quality 1.7B, with one-click switching between compatible models.
- Deterministic seed control — optional seed input for reproducible generations across Custom Voice, Voice Clone, and Voice Design.
- Voice library — save generated audio, organize with tabs (Recent / Saved Voices / Audio), search and replay.
- Export to WAV or MP3 — configurable MP3 bitrate, plus WAV sample rate and bit depth controls.
- Batch processing — queue multiple `.txt` files and generate a ZIP of per-file outputs with progress tracking and cancellation.
- Optional Whisper transcription — auto-fill Voice Clone transcripts from reference audio.
- Optional translation — translate input text locally before generating speech (NLLB 600M).
- Keyboard shortcuts — `Cmd/Ctrl+Enter` to generate, `Cmd/Ctrl+S` to save, `Space` to play/pause, and more.
- Debug console — live logs and system info for troubleshooting.
| Platform | Status |
|---|---|
| Windows 11 x64 | Working — NSIS installer, CUDA auto-detection |
| macOS (Apple Silicon) | Working — MPS acceleration |
| Linux x64 | Builds available — CUDA/ROCm/CPU detection implemented |
System requirements: 16 GB+ RAM recommended (8 GB minimum for 0.6B models). First model download is ~1.2–3.4 GB.
Download the latest release for your platform from Releases, then launch the app.
On first run, PrivateVoice will automatically:
- Detect your hardware (GPU/CPU)
- Install a standalone Python 3.11 environment
- Download dependencies (with GPU-appropriate PyTorch)
- Start the local TTS server
Subsequent launches skip setup and start in seconds.
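The hardware-detection step can be pictured with a small sketch. This is an illustration only, not the app's actual installer code: the function name is hypothetical, and the real setup also probes for CUDA and ROCm before falling back to CPU.

```python
import platform

def detect_gpu_target() -> str:
    """Pick an inference backend from the host platform (simplified sketch)."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"  # Apple Silicon -> Metal Performance Shaders
    # The real installer would probe for CUDA/ROCm here; we fall back to CPU.
    return "cpu"

print(detect_gpu_target())
```

The detected target then decides which PyTorch build gets downloaded during dependency installation.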
```bash
pnpm install
pnpm tauri dev   # development with hot reload
```

For a release build:

```bash
# macOS / Linux
./scripts/build-release.sh

# Windows (PowerShell)
.\scripts\build-release.ps1
```

| Model | Size | Custom Voice | Voice Clone | Voice Design |
|---|---|---|---|---|
| `0.6b` | ~1.2 GB | Yes | — | — |
| `0.6b-base` | ~1.2 GB | — | Yes | — |
| `1.7b` | ~3.4 GB | Yes | — | — |
| `1.7b-base` | ~3.4 GB | — | Yes | — |
| `1.7b-design` | ~3.4 GB | — | — | Yes |
The app shows compatibility indicators on mode tabs and offers one-click model loading when you switch to an incompatible mode.
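The mode-to-model compatibility in the table above amounts to a simple lookup. A hypothetical sketch of that check (names and structure are illustrative, not the app's internals):

```python
# Which model variants support which generation mode, per the table above.
COMPATIBLE = {
    "custom_voice": {"0.6b", "1.7b"},
    "voice_clone": {"0.6b-base", "1.7b-base"},
    "voice_design": {"1.7b-design"},
}

def is_compatible(mode: str, model: str) -> bool:
    """True if the given model variant can serve the given mode."""
    return model in COMPATIBLE.get(mode, set())

print(is_compatible("voice_clone", "1.7b-base"))  # True
print(is_compatible("voice_design", "0.6b"))      # False
```

A lookup like this is what lets the UI gray out incompatible tabs and suggest the right model to load.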
```
┌─────────────────────────────────────────────────────┐
│ Svelte 5 Frontend (TypeScript + Tailwind CSS 4)     │
│              ↕ HTTP localhost:8765                  │
│ Python FastAPI Server (Qwen3-TTS inference)         │
│              ↕ managed by                           │
│ Tauri 2 Rust Shell (sidecar lifecycle, native OS)   │
└─────────────────────────────────────────────────────┘
```
The Tauri desktop shell manages a Python sidecar process that runs the TTS models. The Svelte frontend communicates with it over HTTP on localhost. All model weights and runtime files are stored in app-scoped directories — nothing pollutes your global Python or system cache.
- All inference runs locally on your machine
- The server binds to `127.0.0.1:8765` — not accessible from the network
- Internet is used only during first-run setup (Python, dependencies) and model downloads from HuggingFace
- No telemetry, no analytics, no cloud calls during normal use
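The loopback-only binding can be demonstrated with a throwaway socket. This is a generic illustration of the principle, not the app's server code (which uses FastAPI):

```python
import socket

# Bind a throwaway listener to the loopback interface only, mirroring how
# the TTS server listens on 127.0.0.1. Port 0 lets the OS pick a free port.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
host, port = srv.getsockname()
print(host)  # 127.0.0.1 — the socket is unreachable from other machines
srv.close()
```

Because the address is loopback rather than `0.0.0.0`, the OS never routes packets from other hosts to the socket.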
Configure theme, default model/speaker, export format, auto-load behavior, and optional features (Whisper, translation). The Environment section shows GPU target, setup state, and disk usage — with Repair and Full Rebuild buttons if anything goes wrong.
- Seed (optional): available in each generation mode under `Advanced`. Use the same seed + same setup for reproducible outputs.
- WAV tuning: when export format is WAV, choose `Native / 8k / 16k / 22.05k / 24k / 44.1k / 48k` sample rates and `16/24/32-bit` depth.
- Batch mode: toggle `Batch mode`, upload multiple `.txt` files, and generate all outputs using the current voice configuration. Results download as a ZIP.
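The seed guarantee is the usual PRNG contract: the same seed with the same inputs and setup replays the same random draws. A generic illustration with Python's `random` module (the app seeds its model sampler, not this module, and `fake_generate` is a made-up stand-in):

```python
import random

def fake_generate(text: str, seed: int) -> list[float]:
    """Stand-in for a sampler: same seed + same input -> same output."""
    rng = random.Random(seed)  # isolated PRNG, fully determined by the seed
    return [round(rng.random(), 4) for _ in text]

a = fake_generate("hello", seed=42)
b = fake_generate("hello", seed=42)
c = fake_generate("hello", seed=7)
print(a == b)  # True — identical seed reproduces the output
print(a == c)  # False — a different seed diverges
```

The "same setup" caveat matters: a different model, device, or library version can change the sampling path even with an identical seed.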
App data is stored in platform-standard locations:
| Platform | Path |
|---|---|
| macOS | ~/Library/Application Support/com.privatevoice.desktop/ |
| Windows | %APPDATA%\com.privatevoice.desktop\ |
| Linux | ~/.local/share/com.privatevoice.desktop/ |
The Windows uninstaller offers granular cleanup — keep your voice library while removing models and runtime, or remove everything.
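The table above maps to a small path helper like this sketch (illustrative only; the real app resolves its data directory through Tauri's app-data API, and `app_data_dir` is a hypothetical name):

```python
import os
import platform

APP_ID = "com.privatevoice.desktop"

def app_data_dir() -> str:
    """Platform-standard app-data location, per the table above."""
    home = os.path.expanduser("~")
    system = platform.system()
    if system == "Darwin":
        return os.path.join(home, "Library", "Application Support", APP_ID)
    if system == "Windows":
        return os.path.join(os.environ.get("APPDATA", home), APP_ID)
    # Linux and other Unix-likes follow the XDG data-home convention.
    return os.path.join(home, ".local", "share", APP_ID)

print(app_data_dir())
```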
```bash
pnpm install     # frontend dependencies
pnpm tauri dev   # full app with hot reload
pnpm dev         # frontend only (no TTS backend)

# Python backend standalone
cd python && source .venv/bin/activate && python -m tts_server.main

# Tests
pnpm test:run    # unit tests
pnpm test:e2e    # Playwright E2E (90 tests)
pnpm check       # TypeScript/Svelte type check
cd python && pytest tests/   # Python backend tests
```

343 automated tests: 216 unit tests (Vitest) + 90 E2E tests (Playwright) + Python backend tests.
If PrivateVoice is useful to you, you can support ongoing development. Issues, feature requests, and PRs are always welcome.
MIT



