Skip to content

bigr00/voxzilla

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

5 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐Ÿฆ– Voxzilla โ€” The Voice Godzilla

Local AI dictation layer. Speak naturally. Get polished text.

Voxzilla is a privacy-first, fully local alternative to Wispr Flow. Hold a hotkey, speak naturally with all your "ums," "uhs," false starts, and self-corrections โ€” Voxzilla captures your speech, transcribes it, and cleans it up into publication-ready text. All running on your machine. No cloud. No subscriptions. No data leaving your device.

You say:  "um yeah I think we should like schedule the meeting for
           Thursday, no wait, Friday actually, and uh make sure to
           invite Sarah from, you know, the design team"

You get:  "I think we should schedule the meeting for Friday and
           make sure to invite Sarah from the design team."

โœจ Features

  • ๐ŸŽค Push-to-Talk Dictation โ€” Hold a hotkey, speak, release. Text appears wherever your cursor is.
  • ๐Ÿง  AI Text Correction โ€” Strips filler words, resolves self-corrections, fixes homophones, adds proper punctuation. Your speech reads like writing.
  • ๐Ÿ”’ 100% Local โ€” ASR runs via mlx-whisper on Apple Silicon. Correction runs through LM Studio. No internet required.
  • ๐ŸŽจ Multiple Correction Styles โ€” Auto, Professional, Casual, or Verbatim (punctuation only).
  • โŒจ๏ธ Works Anywhere โ€” Injects text into any app: VS Code, Slack, Gmail, Notion, iMessage, terminal, browsers.
  • โšก Blazing Fast โ€” ~2 seconds for ASR + ~0.5โ€“3 seconds for correction on Apple Silicon.
  • ๐Ÿ”ง Configurable โ€” Swap ASR engines, correction models, hotkeys, and styles via a simple YAML config.
  • ๐ŸŒ Multilingual โ€” 100+ languages supported through Whisper. Auto-detection or manual selection.

๐Ÿ—๏ธ Architecture

Microphone โ”€โ”€โ–บ Audio Capture โ”€โ”€โ–บ Voice Activity Detection
                                     โ”‚
                                     โ–ผ
                             โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                             โ”‚   ASR Engine   โ”‚
                             โ”‚  mlx-whisper   โ”‚
                             โ”‚ large-v3-turbo โ”‚
                             โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                     โ”‚ Raw transcript
                                     โ–ผ
                             โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                             โ”‚  Correction    โ”‚
                             โ”‚  LM Studio     โ”‚
                             โ”‚ (FlowScribe /  โ”‚
                             โ”‚  Qwen2.5-7B)   โ”‚
                             โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                     โ”‚ Cleaned text
                                     โ–ผ
                             โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                             โ”‚ Text Injection โ”‚
                             โ”‚  CGEvent /     โ”‚
                             โ”‚  AppleScript   โ”‚
                             โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                     โ”‚
                             Active Text Field

๐Ÿš€ Quick Start

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.12+
  • LM Studio (free download from lmstudio.ai)

One-Command Setup

# Clone and run the setup script
git clone https://github.com/voxzilla/voxzilla.git
cd voxzilla
chmod +x scripts/setup_models.sh
./scripts/setup_models.sh

Manual Setup

# 1. Install voxzilla
pip install -e .

# 2. Run the setup wizard
voxzilla setup

# 3. Start dictating!
voxzilla start

Hold Ctrl and speak. Release to see your polished text appear.

๐Ÿ“ฆ Recommended Models

ASR (automatic โ€” no setup needed)

Model Engine Speed (M2 Pro) Accuracy
Whisper large-v3-turbo โ˜… mlx-whisper ~2s/min audio 7.75% WER
Whisper large-v3 mlx-whisper ~6s/min audio 7.44% WER
Whisper small mlx-whisper ~0.5s/min audio 10%+ WER

โ˜… = Recommended default

Correction (load in LM Studio)

Model Size Latency Best For
FlowScribe 0.5B โ˜… ~400 MB ~0.5s Speed. Purpose-built for dictation cleanup.
Qwen2.5-7B-Instruct ~4.7 GB ~2.6s Quality. Best instruction following.
Llama 3.2-3B-Instruct ~2 GB ~1.4s Balance. Great for English.
Phi-4-mini 3.8B ~2.5 GB ~1.5s Good reasoning, handles messy input well.

โ˜… = Recommended default

๐ŸŽฎ Usage

# Start the dictation daemon
voxzilla start

# Check status and configuration
voxzilla status

# Run the setup wizard
voxzilla setup

# List available models
voxzilla models

# Show current configuration
voxzilla config show

# Edit configuration file
voxzilla config edit

# Benchmark your setup
python scripts/benchmark.py

# Start with raw transcription only (no AI correction)
voxzilla start --no-correction

# Enable debug logging
voxzilla start --verbose

Hotkey Modes

  • Push-to-Talk (default): Hold the configured key (Ctrl by default) while speaking, release to process.
  • Toggle: Press once to start recording, press again to stop and process.

Correction Styles

  • Auto: Context-aware โ€” detects app and formality level automatically.
  • Professional: Formal grammar, full sentences, business-appropriate tone.
  • Casual: Relaxed, conversational. Keeps some filler for natural feel.
  • Verbatim: Adds punctuation only. Keeps every word including fillers.

โš™๏ธ Configuration

Configuration lives at ~/.config/voxzilla/config.yaml. Run voxzilla setup for interactive configuration, or edit it directly:

asr:
  engine: mlx_whisper           # mlx_whisper | faster_whisper
  model: large-v3-turbo
  language: auto

correction:
  engine: lm_studio             # lm_studio | ollama | none
  base_url: http://localhost:1234/v1
  model: flowscribe-0.5b
  temperature: 0.0
  style: auto                   # auto | professional | casual | verbatim

hotkey:
  key: ctrl                     # ctrl | cmd | alt | shift | fn
  mode: push_to_talk            # push_to_talk | toggle

audio:
  sample_rate: 16000
  channels: 1

๐Ÿ”ง Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Type checking
mypy src/

# Linting
ruff check src/

# Formatting
ruff format src/

Project Structure

voxzilla/
โ”œโ”€โ”€ src/voxzilla/
โ”‚   โ”œโ”€โ”€ audio/          # Audio capture & VAD
โ”‚   โ”œโ”€โ”€ asr/            # ASR engines (mlx-whisper, faster-whisper)
โ”‚   โ”œโ”€โ”€ correction/     # Text correction engines (LM Studio, Ollama)
โ”‚   โ”œโ”€โ”€ injection/      # Text injection (macOS CGEvent/AppleScript)
โ”‚   โ”œโ”€โ”€ hotkey/         # Global hotkey listener
โ”‚   โ”œโ”€โ”€ pipeline/       # Orchestration pipeline
โ”‚   โ”œโ”€โ”€ ui/             # CLI, system tray, overlay
โ”‚   โ”œโ”€โ”€ models/         # Model catalog and management
โ”‚   โ”œโ”€โ”€ daemon.py       # Main application daemon
โ”‚   โ””โ”€โ”€ config.py       # Pydantic configuration system
โ”œโ”€โ”€ config/default.yaml # Bundled default configuration
โ”œโ”€โ”€ scripts/            # Setup and benchmarking tools
โ”œโ”€โ”€ tests/              # Test suite
โ””โ”€โ”€ pyproject.toml      # Project metadata and dependencies

Design Principles

  • Strategy Pattern: Every major component (ASR, Correction, Injection, Hotkey) follows an abstract base class โ†’ pluggable implementations pattern. Swap engines by changing one config line.
  • Async-First: Correction is async (LLM API calls). The daemon runs its own event loop.
  • Fail Gracefully: If correction fails, raw transcription is still pasted. Every component handles its own errors.
  • Type Hints Everywhere: Strict mypy compliance. Pydantic for config validation.

๐Ÿ™ Acknowledgments

Voxzilla builds on incredible open-source work:

Inspired by Wispr Flow, FreeFlow, OpenWhispr, Sussurro, and many other open-source dictation projects.

๐Ÿ“„ License

MIT โ€” see LICENSE for details.


๐Ÿฆ– Voxzilla โ€” The Voice Godzilla
Speak naturally. Get polished text. All local. All private.

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors