Local AI dictation layer. Speak naturally. Get polished text.
Voxzilla is a privacy-first, fully local alternative to Wispr Flow. Hold a hotkey, speak naturally with all your "ums," "uhs," false starts, and self-corrections โ Voxzilla captures your speech, transcribes it, and cleans it up into publication-ready text. All running on your machine. No cloud. No subscriptions. No data leaving your device.
You say: "um yeah I think we should like schedule the meeting for
Thursday, no wait, Friday actually, and uh make sure to
invite Sarah from, you know, the design team"
You get: "I think we should schedule the meeting for Friday and
make sure to invite Sarah from the design team."
- ๐ค Push-to-Talk Dictation โ Hold a hotkey, speak, release. Text appears wherever your cursor is.
- ๐ง AI Text Correction โ Strips filler words, resolves self-corrections, fixes homophones, adds proper punctuation. Your speech reads like writing.
- ๐ 100% Local โ ASR runs via mlx-whisper on Apple Silicon. Correction runs through LM Studio. No internet required.
- ๐จ Multiple Correction Styles โ Auto, Professional, Casual, or Verbatim (punctuation only).
- โจ๏ธ Works Anywhere โ Injects text into any app: VS Code, Slack, Gmail, Notion, iMessage, terminal, browsers.
- โก Blazing Fast โ ~2 seconds for ASR + ~0.5โ3 seconds for correction on Apple Silicon.
- ๐ง Configurable โ Swap ASR engines, correction models, hotkeys, and styles via a simple YAML config.
- ๐ Multilingual โ 100+ languages supported through Whisper. Auto-detection or manual selection.
Microphone โโโบ Audio Capture โโโบ Voice Activity Detection
โ
โผ
โโโโโโโโโโโโโโโโโ
โ ASR Engine โ
โ mlx-whisper โ
โ large-v3-turbo โ
โโโโโโโโโฌโโโโโโโโ
โ Raw transcript
โผ
โโโโโโโโโโโโโโโโโ
โ Correction โ
โ LM Studio โ
โ (FlowScribe / โ
โ Qwen2.5-7B) โ
โโโโโโโโโฌโโโโโโโโ
โ Cleaned text
โผ
โโโโโโโโโโโโโโโโโ
โ Text Injection โ
โ CGEvent / โ
โ AppleScript โ
โโโโโโโโโฌโโโโโโโโ
โ
Active Text Field
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.12+
- LM Studio (free download from lmstudio.ai)
# Clone and run the setup script
git clone https://github.com/voxzilla/voxzilla.git
cd voxzilla
chmod +x scripts/setup_models.sh
./scripts/setup_models.sh# 1. Install voxzilla
pip install -e .
# 2. Run the setup wizard
voxzilla setup
# 3. Start dictating!
voxzilla startHold Ctrl and speak. Release to see your polished text appear.
| Model | Engine | Speed (M2 Pro) | Accuracy |
|---|---|---|---|
| Whisper large-v3-turbo โ | mlx-whisper | ~2s/min audio | 7.75% WER |
| Whisper large-v3 | mlx-whisper | ~6s/min audio | 7.44% WER |
| Whisper small | mlx-whisper | ~0.5s/min audio | 10%+ WER |
โ = Recommended default
| Model | Size | Latency | Best For |
|---|---|---|---|
| FlowScribe 0.5B โ | ~400 MB | ~0.5s | Speed. Purpose-built for dictation cleanup. |
| Qwen2.5-7B-Instruct | ~4.7 GB | ~2.6s | Quality. Best instruction following. |
| Llama 3.2-3B-Instruct | ~2 GB | ~1.4s | Balance. Great for English. |
| Phi-4-mini 3.8B | ~2.5 GB | ~1.5s | Good reasoning, handles messy input well. |
โ = Recommended default
# Start the dictation daemon
voxzilla start
# Check status and configuration
voxzilla status
# Run the setup wizard
voxzilla setup
# List available models
voxzilla models
# Show current configuration
voxzilla config show
# Edit configuration file
voxzilla config edit
# Benchmark your setup
python scripts/benchmark.py
# Start with raw transcription only (no AI correction)
voxzilla start --no-correction
# Enable debug logging
voxzilla start --verbose- Push-to-Talk (default): Hold the configured key (Ctrl by default) while speaking, release to process.
- Toggle: Press once to start recording, press again to stop and process.
- Auto: Context-aware โ detects app and formality level automatically.
- Professional: Formal grammar, full sentences, business-appropriate tone.
- Casual: Relaxed, conversational. Keeps some filler for natural feel.
- Verbatim: Adds punctuation only. Keeps every word including fillers.
Configuration lives at ~/.config/voxzilla/config.yaml. Run voxzilla setup for interactive configuration, or edit it directly:
asr:
engine: mlx_whisper # mlx_whisper | faster_whisper
model: large-v3-turbo
language: auto
correction:
engine: lm_studio # lm_studio | ollama | none
base_url: http://localhost:1234/v1
model: flowscribe-0.5b
temperature: 0.0
style: auto # auto | professional | casual | verbatim
hotkey:
key: ctrl # ctrl | cmd | alt | shift | fn
mode: push_to_talk # push_to_talk | toggle
audio:
sample_rate: 16000
channels: 1# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Type checking
mypy src/
# Linting
ruff check src/
# Formatting
ruff format src/voxzilla/
โโโ src/voxzilla/
โ โโโ audio/ # Audio capture & VAD
โ โโโ asr/ # ASR engines (mlx-whisper, faster-whisper)
โ โโโ correction/ # Text correction engines (LM Studio, Ollama)
โ โโโ injection/ # Text injection (macOS CGEvent/AppleScript)
โ โโโ hotkey/ # Global hotkey listener
โ โโโ pipeline/ # Orchestration pipeline
โ โโโ ui/ # CLI, system tray, overlay
โ โโโ models/ # Model catalog and management
โ โโโ daemon.py # Main application daemon
โ โโโ config.py # Pydantic configuration system
โโโ config/default.yaml # Bundled default configuration
โโโ scripts/ # Setup and benchmarking tools
โโโ tests/ # Test suite
โโโ pyproject.toml # Project metadata and dependencies
- Strategy Pattern: Every major component (ASR, Correction, Injection, Hotkey) follows an abstract base class โ pluggable implementations pattern. Swap engines by changing one config line.
- Async-First: Correction is async (LLM API calls). The daemon runs its own event loop.
- Fail Gracefully: If correction fails, raw transcription is still pasted. Every component handles its own errors.
- Type Hints Everywhere: Strict mypy compliance. Pydantic for config validation.
Voxzilla builds on incredible open-source work:
- OpenAI Whisper โ Speech recognition foundation
- mlx-whisper โ Apple Silicon optimized Whisper
- faster-whisper โ CTranslate2 Whisper backend
- LM Studio โ Local LLM runtime
- Silero VAD โ Voice activity detection
- rumps โ macOS menu bar apps in Python
Inspired by Wispr Flow, FreeFlow, OpenWhispr, Sussurro, and many other open-source dictation projects.
MIT โ see LICENSE for details.
๐ฆ Voxzilla โ The Voice Godzilla
Speak naturally. Get polished text. All local. All private.