🦖 Voxzilla — The Voice Godzilla

Local AI dictation layer. Speak naturally. Get polished text.

Voxzilla is a privacy-first, fully local alternative to Wispr Flow. Hold a hotkey, speak naturally with all your "ums," "uhs," false starts, and self-corrections — Voxzilla captures your speech, transcribes it, and cleans it up into publication-ready text. All running on your machine. No cloud. No subscriptions. No data leaving your device.

You say:  "um yeah I think we should like schedule the meeting for
           Thursday, no wait, Friday actually, and uh make sure to
           invite Sarah from, you know, the design team"

You get:  "I think we should schedule the meeting for Friday and
           make sure to invite Sarah from the design team."

✨ Features

🎤 Push-to-Talk Dictation — Hold a hotkey, speak, release. Text appears wherever your cursor is.
🧠 AI Text Correction — Strips filler words, resolves self-corrections, fixes homophones, adds proper punctuation. Your speech reads like writing.
🔒 100% Local — ASR runs via mlx-whisper on Apple Silicon. Correction runs through LM Studio. No internet required.
🎨 Multiple Correction Styles — Auto, Professional, Casual, or Verbatim (punctuation only).
⌨️ Works Anywhere — Injects text into any app: VS Code, Slack, Gmail, Notion, iMessage, terminal, browsers.
⚡ Blazing Fast — ~2 seconds for ASR + ~0.5–3 seconds for correction on Apple Silicon.
🔧 Configurable — Swap ASR engines, correction models, hotkeys, and styles via a simple YAML config.
🌍 Multilingual — 100+ languages supported through Whisper. Auto-detection or manual selection.

🏗️ Architecture

Microphone ──► Audio Capture ──► Voice Activity Detection
                                     │
                                     ▼
                             ┌───────────────┐
                             │   ASR Engine   │
                             │  mlx-whisper   │
                             │ large-v3-turbo │
                             └───────┬───────┘
                                     │ Raw transcript
                                     ▼
                             ┌───────────────┐
                             │  Correction    │
                             │  LM Studio     │
                             │ (FlowScribe /  │
                             │  Qwen2.5-7B)   │
                             └───────┬───────┘
                                     │ Cleaned text
                                     ▼
                             ┌───────────────┐
                             │ Text Injection │
                             │  CGEvent /     │
                             │  AppleScript   │
                             └───────┬───────┘
                                     │
                             Active Text Field

🚀 Quick Start

Prerequisites

macOS with Apple Silicon (M1/M2/M3/M4)
Python 3.12+
LM Studio (free download from lmstudio.ai)

One-Command Setup

# Clone and run the setup script
git clone https://github.com/voxzilla/voxzilla.git
cd voxzilla
chmod +x scripts/setup_models.sh
./scripts/setup_models.sh

Manual Setup

# 1. Install voxzilla
pip install -e .

# 2. Run the setup wizard
voxzilla setup

# 3. Start dictating!
voxzilla start

Hold Ctrl and speak. Release to see your polished text appear.

📦 Recommended Models

ASR (automatic — no setup needed)

Model	Engine	Speed (M2 Pro)	Accuracy
Whisper large-v3-turbo ★	mlx-whisper	~2s/min audio	7.75% WER
Whisper large-v3	mlx-whisper	~6s/min audio	7.44% WER
Whisper small	mlx-whisper	~0.5s/min audio	10%+ WER

★ = Recommended default

Correction (load in LM Studio)

Model	Size	Latency	Best For
FlowScribe 0.5B ★	~400 MB	~0.5s	Speed. Purpose-built for dictation cleanup.
Qwen2.5-7B-Instruct	~4.7 GB	~2.6s	Quality. Best instruction following.
Llama 3.2-3B-Instruct	~2 GB	~1.4s	Balance. Great for English.
Phi-4-mini 3.8B	~2.5 GB	~1.5s	Good reasoning, handles messy input well.

★ = Recommended default

🎮 Usage

# Start the dictation daemon
voxzilla start

# Check status and configuration
voxzilla status

# Run the setup wizard
voxzilla setup

# List available models
voxzilla models

# Show current configuration
voxzilla config show

# Edit configuration file
voxzilla config edit

# Benchmark your setup
python scripts/benchmark.py

# Start with raw transcription only (no AI correction)
voxzilla start --no-correction

# Enable debug logging
voxzilla start --verbose

Hotkey Modes

Push-to-Talk (default): Hold the configured key (Ctrl by default) while speaking, release to process.
Toggle: Press once to start recording, press again to stop and process.

Correction Styles

Auto: Context-aware — detects app and formality level automatically.
Professional: Formal grammar, full sentences, business-appropriate tone.
Casual: Relaxed, conversational. Keeps some filler for natural feel.
Verbatim: Adds punctuation only. Keeps every word including fillers.

⚙️ Configuration

Configuration lives at ~/.config/voxzilla/config.yaml. Run voxzilla setup for interactive configuration, or edit it directly:

asr:
  engine: mlx_whisper           # mlx_whisper | faster_whisper
  model: large-v3-turbo
  language: auto

correction:
  engine: lm_studio             # lm_studio | ollama | none
  base_url: http://localhost:1234/v1
  model: flowscribe-0.5b
  temperature: 0.0
  style: auto                   # auto | professional | casual | verbatim

hotkey:
  key: ctrl                     # ctrl | cmd | alt | shift | fn
  mode: push_to_talk            # push_to_talk | toggle

audio:
  sample_rate: 16000
  channels: 1

🔧 Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Type checking
mypy src/

# Linting
ruff check src/

# Formatting
ruff format src/

Project Structure

voxzilla/
├── src/voxzilla/
│   ├── audio/          # Audio capture & VAD
│   ├── asr/            # ASR engines (mlx-whisper, faster-whisper)
│   ├── correction/     # Text correction engines (LM Studio, Ollama)
│   ├── injection/      # Text injection (macOS CGEvent/AppleScript)
│   ├── hotkey/         # Global hotkey listener
│   ├── pipeline/       # Orchestration pipeline
│   ├── ui/             # CLI, system tray, overlay
│   ├── models/         # Model catalog and management
│   ├── daemon.py       # Main application daemon
│   └── config.py       # Pydantic configuration system
├── config/default.yaml # Bundled default configuration
├── scripts/            # Setup and benchmarking tools
├── tests/              # Test suite
└── pyproject.toml      # Project metadata and dependencies

Design Principles

Strategy Pattern: Every major component (ASR, Correction, Injection, Hotkey) follows an abstract base class → pluggable implementations pattern. Swap engines by changing one config line.
Async-First: Correction is async (LLM API calls). The daemon runs its own event loop.
Fail Gracefully: If correction fails, raw transcription is still pasted. Every component handles its own errors.
Type Hints Everywhere: Strict mypy compliance. Pydantic for config validation.

🙏 Acknowledgments

Voxzilla builds on incredible open-source work:

OpenAI Whisper — Speech recognition foundation
mlx-whisper — Apple Silicon optimized Whisper
faster-whisper — CTranslate2 Whisper backend
LM Studio — Local LLM runtime
Silero VAD — Voice activity detection
rumps — macOS menu bar apps in Python

Inspired by Wispr Flow, FreeFlow, OpenWhispr, Sussurro, and many other open-source dictation projects.

📄 License

MIT — see LICENSE for details.

🦖 Voxzilla — The Voice Godzilla
_{Speak naturally. Get polished text. All local. All private.}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🦖 Voxzilla — The Voice Godzilla

✨ Features

🏗️ Architecture

🚀 Quick Start

Prerequisites

One-Command Setup

Manual Setup

📦 Recommended Models

ASR (automatic — no setup needed)

Correction (load in LM Studio)

🎮 Usage

Hotkey Modes

Correction Styles

⚙️ Configuration

🔧 Development

Project Structure

Design Principles

🙏 Acknowledgments

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
config		config
scripts		scripts
src/voxzilla		src/voxzilla
tests		tests
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

🦖 Voxzilla — The Voice Godzilla

✨ Features

🏗️ Architecture

🚀 Quick Start

Prerequisites

One-Command Setup

Manual Setup

📦 Recommended Models

ASR (automatic — no setup needed)

Correction (load in LM Studio)

🎮 Usage

Hotkey Modes

Correction Styles

⚙️ Configuration

🔧 Development

Project Structure

Design Principles

🙏 Acknowledgments

📄 License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages