Press a key. Speak. Text appears.
Download for macOS · Download for Linux
Voice input on macOS and Linux is either cloud-dependent or requires complex setup. Thoth runs speech-to-text locally using whisper.cpp with GPU acceleration (Metal on macOS, CUDA/Vulkan on Linux). Nothing leaves the machine. No subscription. No cloud. No internet required.
After installing, Thoth checks for updates automatically and installs them in-app.
macOS will block the app the first time you open it because it isn't from the App Store. This is normal and only happens once.
- Open the `.dmg` and drag Thoth to Applications
- Right-click (or Control-click) the app and choose Open
- Click Open in the dialogue that appears

Alternative: remove the block from Terminal:

    xattr -dr com.apple.quarantine /Applications/Thoth.app

On Linux:
- Download the `.AppImage` from the latest release
- Make it executable: `chmod +x Thoth_*.AppImage`
- Run it: `./Thoth_*.AppImage`

NVIDIA users: Ensure CUDA drivers are installed for GPU-accelerated transcription. Without them, Thoth falls back to CPU.
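Before launching, it can help to confirm that the driver is actually visible. The following is a minimal sketch, assuming `nvidia-smi` (which ships with the NVIDIA driver) is on `PATH`. The helper name `nvidia_driver_status` is hypothetical and not part of Thoth, and the check only approximates whatever Thoth does when it initialises the GPU.

```shell
# Hypothetical helper (not part of Thoth): report whether an NVIDIA driver
# looks usable. Assumes nvidia-smi is installed alongside the driver.
nvidia_driver_status() {
    nds_tool="${1:-nvidia-smi}"   # command to probe; overridable for testing
    # `nvidia-smi -L` lists detected GPUs and fails when it cannot reach the driver.
    if command -v "$nds_tool" >/dev/null 2>&1 && "$nds_tool" -L >/dev/null 2>&1; then
        echo "gpu"   # the driver responded
    else
        echo "cpu"   # expect the CPU fallback
    fi
}
```

Calling `nvidia_driver_status` with no arguments prints `gpu` when the driver answers and `cpu` otherwise.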
The app walks you through four quick steps:
- Download a speech model. Click "Download Recommended Model" on the Overview tab (~1.5 GB, runs locally).
- Grant microphone access. Click "Allow" when prompted so Thoth can hear you.
- Grant global shortcut access. Click "Allow" so the recording hotkey works from any app (on macOS: System Settings › Privacy & Security › Accessibility).
- Start dictating. Press F13 (the default shortcut), speak, and text appears at your cursor.
Tip: You can change the shortcut in Settings › Recording.
- Offline Transcription
- AI Enhancement
- Personal Dictionary
- Recording Options
- History & Export
- Menu Bar App

Building from source:

    pnpm install
    pnpm tauri dev     # Development build
    pnpm tauri build   # Production build

Requirements
- macOS 14.0+ or Linux
- Rust 1.75+
- Node.js 20+
- pnpm
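The minimum versions above can be checked before building. This is a minimal sketch, assuming a `sort` that supports version ordering with `-V` (GNU coreutils does; recent BSD and macOS versions should as well). `version_ge` is a hypothetical helper, not something shipped with Thoth.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# Relies on `sort -V` (version sort) so that 1.9 orders before 1.75.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example checks against the requirements listed above (uncomment to run):
# version_ge "$(rustc --version | awk '{print $2}')" 1.75 || echo "Rust 1.75+ required"
# version_ge "$(node --version | sed 's/^v//')" 20 || echo "Node.js 20+ required"
```

The function only compares strings, so it works for any tool that prints a dotted version number.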
Linux GPU Acceleration
whisper.cpp supports GPU acceleration on Linux for significantly faster transcription. Choose the backend that matches your hardware:
| GPU | Feature Flag | Requirements |
|---|---|---|
| NVIDIA | `--features cuda` | CUDA Toolkit 12.x, NVIDIA drivers |
| AMD | `--features hipblas` | ROCm 6.x |
| Any (experimental) | `--features vulkan` | Vulkan drivers |
Build with GPU acceleration:

    # NVIDIA (recommended for RTX/GTX cards)
    pnpm tauri build -- --features cuda

    # AMD (RX series)
    pnpm tauri build -- --features hipblas

    # Vulkan (cross-platform, experimental)
    pnpm tauri build -- --features vulkan

NixOS users: Use `nix develop` to enter the development environment with CUDA support pre-configured. The flake includes all necessary libraries and sets up `LD_LIBRARY_PATH` and `RUSTFLAGS` for CUDA linking.
If GPU initialisation fails, Thoth automatically falls back to CPU transcription.
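For scripted builds, the backend choice can be wrapped in a small helper. This is a sketch only: `thoth_build_cmd` is a hypothetical function name, and it prints the documented build commands without running them.

```shell
# Hypothetical helper: print the build command for a given GPU backend,
# using only the feature flags documented above.
thoth_build_cmd() {
    case "$1" in
        nvidia) echo "pnpm tauri build -- --features cuda" ;;
        amd)    echo "pnpm tauri build -- --features hipblas" ;;
        vulkan) echo "pnpm tauri build -- --features vulkan" ;;
        cpu|"") echo "pnpm tauri build" ;;            # no GPU feature: CPU-only build
        *)      echo "unknown backend: $1" >&2; return 1 ;;
    esac
}
```

For example, `thoth_build_cmd amd` prints the hipblas build line, which can then be reviewed or passed to `sh -c`.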
GNOME Wayland users: On first transcription, GNOME may show a "Remote Desktop" dialogue requesting "Allow Remote Interaction". This is because GNOME doesn't support the Wayland virtual-keyboard protocol. Enable the permission once and GNOME will remember it for future transcriptions. Alternatively, use a different Wayland compositor (Sway, Hyprland) or X11 session.
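To find out in advance whether this dialogue applies, the session can be inspected from a terminal. This is a minimal sketch, assuming the desktop exports the standard `XDG_SESSION_TYPE` and `XDG_CURRENT_DESKTOP` variables. `is_gnome_wayland` is a hypothetical helper and not necessarily how Thoth detects the session.

```shell
# Hypothetical helper: succeed when the session looks like GNOME on Wayland.
# Arguments default to the XDG environment variables; pass values to override.
is_gnome_wayland() {
    igw_session="${1-${XDG_SESSION_TYPE:-}}"      # e.g. "wayland" or "x11"
    igw_desktop="${2-${XDG_CURRENT_DESKTOP:-}}"   # e.g. "GNOME" or "ubuntu:GNOME"
    [ "$igw_session" = "wayland" ] || return 1
    case "$igw_desktop" in
        *GNOME*) return 0 ;;   # GNOME, including distro-prefixed values
        *)       return 1 ;;   # Sway, Hyprland, KDE, ...
    esac
}
```

Run `is_gnome_wayland && echo "Expect the Remote Desktop prompt"` in the session where Thoth will be used.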
| Layer | Choice | Why |
|---|---|---|
| Framework | Tauri 2.0 | Native performance, cross-platform |
| Backend | Rust | Memory safety, audio performance |
| Frontend | Svelte 5 | Reactive UI with runes |
| Audio | cpal | Cross-platform audio capture |
| Transcription | whisper-rs | whisper.cpp with GPU acceleration (sherpa-rs fallback) |
| Database | SQLite | Local persistence with migrations |
| AI | Ollama | Local LLM enhancement |
- Product docs: `docs/product/` (intent, workflows, design principles)
- Development: `docs/development/` (audio pipeline, data model, architecture)
Your voice. Your machine. Nothing else.
Named after the Egyptian god of writing and wisdom, the scribe who faithfully records all that is spoken.
Built on whisper.cpp, Tauri, cpal, and Sherpa-ONNX. Inspired by MacWhisper, VoiceInk, and Spokenly.
