
jmoore2333/PrivateVoice


PrivateVoice

Local text-to-speech that never leaves your machine. Generate natural speech, clone voices, and design new ones — powered by Qwen3-TTS models running entirely on your hardware.

No cloud APIs. No subscriptions. No data sent anywhere. Have ideas or requests? Open an issue or PR, or support the project.



Three Ways to Generate Speech

Custom Voice

Pick from preset speakers, choose a language, and type your text. Optionally add style instructions to control tone and delivery.

Voice Clone

Record or import a reference audio clip, and PrivateVoice will generate new speech in that voice. Includes optional Whisper auto-transcription.

Voice Design

Describe the voice you want in plain text — "warm baritone, slight British accent, nature documentary narrator" — and the model creates it.
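As a rough sketch of what each mode needs as input, the Python dataclasses below model one request per mode, plus a small validation helper. The class and field names (`CustomVoiceRequest`, `speaker`, `instruct`, `ref_audio_path`, `voice_description`, `validate`) are hypothetical illustrations, not PrivateVoice's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class CustomVoiceRequest:
    """Hypothetical: preset speaker + language, with optional style instructions."""
    text: str
    speaker: str
    language: str
    instruct: Optional[str] = None  # e.g. "calm, slow delivery"


@dataclass
class VoiceCloneRequest:
    """Hypothetical: reference clip plus its transcript (Whisper could auto-fill it)."""
    text: str
    ref_audio_path: str
    ref_transcript: Optional[str] = None


@dataclass
class VoiceDesignRequest:
    """Hypothetical: a plain-text description of the desired voice."""
    text: str
    voice_description: str


Request = Union[CustomVoiceRequest, VoiceCloneRequest, VoiceDesignRequest]


def validate(req: Request) -> list[str]:
    """Return human-readable problems; an empty list means the request looks usable."""
    problems = []
    if not req.text.strip():
        problems.append("text is empty")
    if isinstance(req, CustomVoiceRequest) and not req.speaker:
        problems.append("no preset speaker selected")
    if isinstance(req, VoiceCloneRequest) and not req.ref_audio_path:
        problems.append("no reference audio provided")
    if isinstance(req, VoiceDesignRequest) and not req.voice_description.strip():
        problems.append("voice description is empty")
    return problems
```

For example, `validate(VoiceDesignRequest(text="Hello", voice_description="warm baritone"))` returns an empty list, while a Custom Voice request with blank text and no speaker reports both problems.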


Key Features

  • Runs 100% locally — inference on Apple Silicon (MPS), NVIDIA CUDA, or CPU. Nothing leaves localhost.
  • Lightweight installer — ships a small desktop app; downloads Python, dependencies, and models on first launch.
  • 5 model variants — from fast 0.6B to high-quality 1.7B, with one-click switching between compatible models.
  • Deterministic seed control — optional seed input for reproducible generations across Custom Voice, Voice Clone, and Voice Design.
  • Voice library — save generated audio, organize with tabs (Recent / Saved Voices / Audio), search and replay.
  • Export to WAV or MP3 — configurable MP3 bitrate, plus WAV sample rate and bit depth controls.
  • Batch processing — queue multiple .txt files and generate a ZIP of per-file outputs with progress tracking and cancellation.
  • Optional Whisper transcription — auto-fill Voice Clone transcripts from reference audio.
  • Optional translation — translate input text locally before generating speech (NLLB 600M).
  • Keyboard shortcuts: Cmd/Ctrl+Enter to generate, Cmd/Ctrl+S to save, Space to play/pause, and more.
  • Debug console — live logs and system info for troubleshooting.
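The batch feature described above (one output per .txt file, collected into a ZIP, with progress tracking and cancellation) can be sketched with Python's standard library. `synthesize` is a hypothetical stand-in for the real TTS call, the `<name>.wav` output naming is an assumption, and this sketch keeps already-finished files when cancelled, which may differ from what PrivateVoice does.

```python
import io
import threading
import zipfile
from pathlib import PurePath
from typing import Callable, Optional


def run_batch(
    files: dict[str, str],               # input filename -> text contents
    synthesize: Callable[[str], bytes],  # hypothetical TTS call: text -> audio bytes
    on_progress: Optional[Callable[[int, int], None]] = None,
    cancel: Optional[threading.Event] = None,
) -> bytes:
    """Generate one audio file per input and return a ZIP archive as bytes.

    If `cancel` is set, processing stops before the next file; files
    finished so far stay in the archive.
    """
    buf = io.BytesIO()
    total = len(files)
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for done, (name, text) in enumerate(files.items(), start=1):
            if cancel is not None and cancel.is_set():
                break
            audio = synthesize(text)
            # "chapter1.txt" -> "chapter1.wav" (assumed naming scheme)
            zf.writestr(PurePath(name).stem + ".wav", audio)
            if on_progress is not None:
                on_progress(done, total)  # report after each completed file
    return buf.getvalue()
```

Running the loop on a worker thread and setting the `threading.Event` from the UI thread is one simple way to make cancellation responsive without killing the worker mid-file.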

Platform Support

| Platform | Status |
|---|---|
| Windows 11 x64 | Working: NSIS installer, CUDA auto-detection |
| macOS (Apple Silicon) | Working: MPS acceleration |
| Linux x64 | Builds available; CUDA/ROCm/CPU detection implemented |

System requirements: 16 GB+ RAM recommended (8 GB minimum for 0.6B models). First model download is ~1.2–3.4 GB.
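The download sizes line up with model weights stored at 16 bits (2 bytes) per parameter. That storage format is an assumption inferred from the numbers, not something this README states:

```latex
\begin{aligned}
\text{size} &\approx N_{\text{params}} \times 2\ \text{bytes} \\
0.6 \times 10^{9} \times 2\ \text{B} &= 1.2 \times 10^{9}\ \text{B} \approx 1.2\ \text{GB} \\
1.7 \times 10^{9} \times 2\ \text{B} &= 3.4 \times 10^{9}\ \text{B} \approx 3.4\ \text{GB}
\end{aligned}
```

Tokenizer and config files add a little on top, and RAM needs are generally higher than the file size because the Python runtime, intermediate activations, and the OS also need memory.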

Quick Start

Install and run

Download the latest release for your platform from Releases, then launch the app.

On first run, PrivateVoice will automatically:

  1. Detect your hardware (GPU/CPU)
  2. Install a standalone Python 3.11 environment
  3. Download dependencies (with GPU-appropriate PyTorch)
  4. Start the local TTS server

Subsequent launches skip setup and start in seconds.
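Steps 1 and 3 above are linked: the detected hardware decides which PyTorch build gets installed. A minimal sketch, assuming detection by looking for `nvidia-smi` or `rocm-smi` on PATH and using the official PyTorch wheel indexes; the CUDA/ROCm versions in the URLs are placeholders, and PrivateVoice's real detection logic may differ.

```python
import platform
import shutil
from typing import Optional

# Placeholder wheel indexes (assumed versions); pick ones matching your driver stack.
TORCH_INDEX = {
    "cuda": "https://download.pytorch.org/whl/cu121",
    "rocm": "https://download.pytorch.org/whl/rocm6.0",
    "cpu": "https://download.pytorch.org/whl/cpu",
}


def detect_target(system: Optional[str] = None, machine: Optional[str] = None,
                  which=shutil.which) -> str:
    """Return "mps", "cuda", "rocm", or "cpu" for the current (or given) machine."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "mps"  # Apple Silicon: the default macOS wheels include MPS support
    if which("nvidia-smi"):
        return "cuda"  # NVIDIA driver tools present
    if system == "Linux" and which("rocm-smi"):
        return "rocm"  # AMD ROCm tools present
    return "cpu"


def torch_index_url(target: str) -> Optional[str]:
    """Extra pip index for the target, or None to use the default PyPI wheels."""
    return TORCH_INDEX.get(target)  # "mps" -> None
```

Injecting `which` keeps the function testable without real GPUs; a caller would pass the returned URL to pip's `--index-url` when installing torch.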

Build from source

pnpm install
pnpm tauri dev          # development with hot reload

For a release build:

# macOS / Linux
./scripts/build-release.sh

# Windows (PowerShell)
.\scripts\build-release.ps1

Model Reference

| Model | Size | Custom Voice | Voice Clone | Voice Design |
|---|---|---|---|---|
| 0.6b | ~1.2 GB | Yes | | |
| 0.6b-base | ~1.2 GB | | Yes | |
| 1.7b | ~3.4 GB | Yes | | |
| 1.7b-base | ~3.4 GB | | Yes | |
| 1.7b-design | ~3.4 GB | | | Yes |

The app shows compatibility indicators on mode tabs and offers one-click model loading when you switch to an incompatible mode.
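The compatibility check behind those indicators can be sketched as a lookup from model variant to supported mode, following the Model Reference table. The function names and the "keep the loaded model if it fits, otherwise suggest the smallest compatible one" rule are assumptions for illustration, not the app's documented behavior.

```python
from typing import Optional

# Mode supported by each model variant (per the Model Reference table).
MODEL_MODE = {
    "0.6b": "custom_voice",
    "0.6b-base": "voice_clone",
    "1.7b": "custom_voice",
    "1.7b-base": "voice_clone",
    "1.7b-design": "voice_design",
}

# Approximate download size in GB, used here to rank candidates.
MODEL_SIZE_GB = {"0.6b": 1.2, "0.6b-base": 1.2, "1.7b": 3.4,
                 "1.7b-base": 3.4, "1.7b-design": 3.4}


def is_compatible(model: str, mode: str) -> bool:
    return MODEL_MODE.get(model) == mode


def suggest_model(loaded: Optional[str], mode: str) -> str:
    """Hypothetical rule: keep the loaded model if it supports `mode`,
    otherwise suggest the smallest compatible one."""
    if loaded is not None and is_compatible(loaded, mode):
        return loaded
    candidates = [m for m, md in MODEL_MODE.items() if md == mode]
    if not candidates:
        raise ValueError(f"no model supports mode {mode!r}")
    return min(candidates, key=lambda m: MODEL_SIZE_GB[m])
```

With `1.7b` loaded, switching to Voice Clone would suggest `0.6b-base` under this rule; a real UI might instead prefer staying in the same size class.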

How It Works

┌─────────────────────────────────────────────────────┐
│  Svelte 5 Frontend  (TypeScript + Tailwind CSS 4)   │
│  ↕ HTTP localhost:8765                              │
│  Python FastAPI Server  (Qwen3-TTS inference)       │
│  ↕ managed by                                       │
│  Tauri 2 Rust Shell  (sidecar lifecycle, native OS) │
└─────────────────────────────────────────────────────┘

The Tauri desktop shell manages a Python sidecar process that runs the TTS models. The Svelte frontend communicates with it over HTTP on localhost. All model weights and runtime files are stored in app-scoped directories — nothing pollutes your global Python or system cache.
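Because a Python server needs time to start, one common way to handle the readiness part of sidecar lifecycle management is to poll the port until it accepts connections before sending requests. Whether PrivateVoice's Rust shell does exactly this is not stated here; the Python sketch below only illustrates the idea, and `wait_for_port`, its timeout, and its interval are assumptions.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 8765,
                  timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening; the server process is up
        except OSError:
            time.sleep(interval)  # not ready yet (e.g. connection refused)
    return False
```

An open port only proves the process is listening; since models may still be loading, a real client would likely follow this with an HTTP health request before enabling generation.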

Privacy

  • All inference runs locally on your machine
  • The server binds to 127.0.0.1:8765 — not accessible from the network
  • Internet is used only during first-run setup (Python, dependencies) and for model downloads from Hugging Face
  • No telemetry, no analytics, no cloud calls during normal use
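Binding to 127.0.0.1 is what keeps the server off the network: the OS only routes connections to that address from the same machine, whereas binding to 0.0.0.0 would listen on every interface. A small sketch of a guard that refuses non-loopback bind addresses; both functions are hypothetical, not part of PrivateVoice.

```python
import ipaddress
import socket


def check_loopback(host: str) -> None:
    """Raise ValueError unless `host` is a loopback IP literal (127.0.0.0/8 or ::1)."""
    # ip_address() itself raises ValueError for non-IP strings such as hostnames.
    if not ipaddress.ip_address(host).is_loopback:
        raise ValueError(f"refusing to bind to non-loopback address {host!r}")


def open_local_listener(host: str = "127.0.0.1", port: int = 8765) -> socket.socket:
    """Open a TCP listener that is reachable only from this machine."""
    check_loopback(host)
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen()
    return sock
```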

Settings & Environment


Configure theme, default model/speaker, export format, auto-load behavior, and optional features (Whisper, translation). The Environment section shows GPU target, setup state, and disk usage — with Repair and Full Rebuild buttons if anything goes wrong.
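The disk-usage figure in the Environment section amounts to summing file sizes under the app data directory. A minimal standard-library sketch; the function names and the subfolder names used in the test (`models`, `runtime`) are hypothetical, not PrivateVoice's actual layout.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Total size of regular files under `root`; symlinks are not followed."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):  # followlinks=False by default
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def usage_by_subdir(root: Path) -> dict[str, int]:
    """Size of each top-level subdirectory, e.g. models vs. runtime."""
    return {p.name: dir_size_bytes(p) for p in sorted(root.iterdir()) if p.is_dir()}
```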

Advanced Generation Controls

  • Seed (optional): available in each generation mode under Advanced. Reusing the same seed with the same model, inputs, settings, and hardware reproduces the same output.
  • WAV tuning: when the export format is WAV, choose a sample rate (Native, 8, 16, 22.05, 24, 44.1, or 48 kHz) and a bit depth (16, 24, or 32 bits).
  • Batch mode: toggle Batch mode, upload multiple .txt files, and generate all outputs using the current voice configuration. Results download as a ZIP.
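Sample rate and bit depth map directly onto the WAV header and the PCM sample encoding. The sketch below uses Python's standard `wave` module to write mono float samples (assumed to lie in -1.0 to 1.0) as 16-, 24-, or 32-bit signed PCM; the function names are hypothetical. It does not resample: actually changing the sample rate requires resampling the audio first, which is omitted here.

```python
import io
import struct
import wave

# Largest positive value for each signed PCM bit depth.
_MAX_INT = {16: 2**15 - 1, 24: 2**23 - 1, 32: 2**31 - 1}


def float_to_pcm(samples: list[float], bits: int) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian signed PCM samples."""
    peak = _MAX_INT[bits]
    out = bytearray()
    for s in samples:
        value = int(round(max(-1.0, min(1.0, s)) * peak))  # clip, then scale
        if bits == 24:
            # struct has no 24-bit code: keep the low 3 bytes of a little-endian int32.
            out += struct.pack("<i", value)[:3]
        else:
            out += struct.pack("<h" if bits == 16 else "<i", value)
    return bytes(out)


def write_wav(samples: list[float], sample_rate: int, bits: int) -> bytes:
    """Encode mono samples as an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(bits // 8)    # bytes per sample: 2, 3, or 4
        w.setframerate(sample_rate)  # header value only; no resampling happens here
        w.writeframes(float_to_pcm(samples, bits))
    return buf.getvalue()
```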

Storage & Uninstall

App data is stored in platform-standard locations:

| Platform | Path |
|---|---|
| macOS | ~/Library/Application Support/com.privatevoice.desktop/ |
| Windows | %APPDATA%\com.privatevoice.desktop\ |
| Linux | ~/.local/share/com.privatevoice.desktop/ |

Windows uninstaller offers granular cleanup — keep your voice library while removing models and runtime, or remove everything.
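The storage table maps to a small per-platform lookup. A Python sketch mirroring those three paths; `app_data_dir` is hypothetical, the app itself may resolve the directory differently, and honoring `XDG_DATA_HOME` on Linux is an added assumption beyond the table's default.

```python
import os
import sys
from typing import Mapping, Optional

APP_ID = "com.privatevoice.desktop"


def app_data_dir(platform: str = sys.platform,
                 env: Optional[Mapping[str, str]] = None,
                 home: Optional[str] = None) -> str:
    """Return the data directory for `platform` ("darwin", "win32", or Linux)."""
    env = os.environ if env is None else env
    home = os.path.expanduser("~") if home is None else home
    if platform == "darwin":
        return f"{home}/Library/Application Support/{APP_ID}"
    if platform == "win32":
        # %APPDATA% is the roaming profile folder, e.g. C:\Users\me\AppData\Roaming
        return env["APPDATA"] + "\\" + APP_ID
    # Linux and other Unix-likes: ~/.local/share, or XDG_DATA_HOME if set (assumption)
    base = env.get("XDG_DATA_HOME") or f"{home}/.local/share"
    return f"{base}/{APP_ID}"
```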

Development

pnpm install                    # frontend dependencies
pnpm tauri dev                  # full app with hot reload
pnpm dev                        # frontend only (no TTS backend)

# Python backend standalone
cd python && source .venv/bin/activate && python -m tts_server.main

# Tests
pnpm test:run                   # unit tests
pnpm test:e2e                   # Playwright E2E (90 tests)
pnpm check                      # TypeScript/Svelte type check
cd python && pytest tests/      # Python backend tests

343 automated tests: 216 unit tests (Vitest), 90 E2E tests (Playwright), and 37 Python backend tests (pytest).


Support

If PrivateVoice is useful to you, you can support ongoing development. Issues, feature requests, and PRs are always welcome.


License

MIT

About

Personal voice studio - create, clone, and generate voices locally with complete privacy
