A voice transcription daemon for Linux/Wayland that enables system-wide voice-to-text input using OpenAI-compatible transcription APIs.
- Remote Transcription: Uses OpenAI-compatible APIs (OpenAI, Groq, local Whisper servers)
- System-wide Hotkey: Global keyboard shortcut via XDG Desktop Portal to start/stop recording
- Flexible Text Injection: Multiple paste modes including clipboard-only, Ctrl+V, Ctrl+Shift+V, and Super+V
- Audio Feedback: Optional sound effects for recording start/stop
- Word Overrides: Custom case-insensitive replacements for commonly misheard words
- Voice Commands: Extensive punctuation and symbol commands (period, comma, new line, etc.)
- High-Performance Audio: Async Rust implementation with lock-free ring buffers for real-time capture
- Configurable Retries: Built-in timeout and retry logic for API requests
- Linux with Wayland compositor supporting XDG Desktop Portal (GNOME, KDE Plasma, COSMIC, etc.)
- xdg-desktop-portal and a compositor-specific backend (e.g., xdg-desktop-portal-gnome, xdg-desktop-portal-kde)
- Rust toolchain (for building)
- wl-copy (for clipboard operations)
- ydotool (for auto-paste modes; not needed if using paste_mode: "none")
- Audio input device (microphone)
- OpenAI-compatible transcription API (local or remote)
git clone https://github.com/sizeak/dictator
cd dictator
cargo build --release
sudo cp target/release/dictator /usr/local/bin/
Create the configuration directory and copy the example config:
mkdir -p ~/.config/dictator
cp assets/config.example.json ~/.config/dictator/config.json
Edit ~/.config/dictator/config.json with your settings:
{
"api_url": "http://localhost:8000/v1",
"api_key": "your-api-key-here",
"model": "Systran/faster-distil-whisper-large-v3",
"paste_mode": "ctrl_shift",
"audio_feedback": true,
"language": "en"
}
For automatic startup:
mkdir -p ~/.config/systemd/user
cp systemd/dictator.service ~/.config/systemd/user/
systemctl --user enable dictator.service
systemctl --user start dictator.service
dictator
The daemon will start and register a global shortcut (default: Logo+Alt+D) via XDG Desktop Portal. You can reconfigure the binding in your desktop's System Settings > Shortcuts.
- Press the shortcut to start recording (you'll hear a beep if audio feedback is enabled)
- Speak your text
- Press the shortcut again to stop recording
- The text is transcribed, processed, and then either auto-pasted or copied to the clipboard, depending on your paste_mode setting
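The press-to-start, press-to-stop flow above can be pictured as a small state machine. The following is a minimal sketch, not Dictator's actual code: the `State` and `Event` names are hypothetical, and ignoring a shortcut press while a transcription is in flight is an assumption.

```rust
/// Hypothetical states for the recording workflow (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Recording,
    Transcribing,
}

/// Hypothetical events that drive the workflow.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    ShortcutPressed,
    TranscriptionDone,
}

/// Compute the next state for a given event.
fn next_state(state: State, event: Event) -> State {
    match (state, event) {
        // First press starts recording.
        (State::Idle, Event::ShortcutPressed) => State::Recording,
        // Second press stops recording and hands the audio to the API.
        (State::Recording, Event::ShortcutPressed) => State::Transcribing,
        // Once text has been pasted or copied, return to idle.
        (State::Transcribing, Event::TranscriptionDone) => State::Idle,
        // Assumption: any other combination leaves the state unchanged.
        (s, _) => s,
    }
}
```

Keeping the transition logic in a pure function like this makes it easy to check without a microphone or a portal session.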
All configuration is stored in ~/.config/dictator/config.json.
- api_url: Base URL for the OpenAI-compatible API (e.g., "http://localhost:8000/v1")
- api_key: API authentication key
- model: Model name for transcription
  - For local servers: model path (e.g., "Systran/faster-distil-whisper-large-v3")
  - For OpenAI: "whisper-1"
- paste_mode: How to handle transcribed text (default: "ctrl_shift")
  - "none": Copy to clipboard only, no auto-paste
  - "ctrl": Auto-paste using Ctrl+V
  - "ctrl_shift": Auto-paste using Ctrl+Shift+V
  - "super": Auto-paste using Super+V
- audio_feedback: Enable/disable sound effects (default: true)
- start_sound_path: Path to recording start sound (default: "ping-up.ogg")
  - Relative paths are resolved from the executable's location; absolute paths can also be used
- stop_sound_path: Path to recording stop sound (default: "ping-down.ogg")
- complete_sound_path: Path to completion notification sound (default: "ping-complete.ogg")
  - Plays when transcription completes and text is injected/copied to clipboard
- language: Two-letter language code for transcription (e.g., "en", "es", "fr")
  - If not specified, the API will auto-detect the language
- whisper_prompt: Optional prompt to guide transcription style/context
  - Can improve accuracy for domain-specific vocabulary
- word_overrides: Dictionary of case-insensitive word/phrase replacements, e.g. "word_overrides": { "open ai": "OpenAI", "rust": "Rust", "dictator": "Dictator" }
- timeout: API request timeout in seconds (default: 30)
- max_retries: Number of retry attempts for failed API requests (default: 2)
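To illustrate what a case-insensitive `word_overrides` replacement might look like, here is a minimal sketch using only the standard library. It is not taken from Dictator's source: the function names are hypothetical, matching is ASCII-case-insensitive only, and the whole-word rule (so that "rust" does not rewrite "trust") is an assumption about the intended behaviour.

```rust
use std::collections::BTreeMap;

/// Apply every override in turn (hypothetical helper, not Dictator's API).
/// Overrides are applied sequentially, so one replacement's output can be
/// matched by a later override.
fn apply_overrides(text: &str, overrides: &BTreeMap<String, String>) -> String {
    let mut out = text.to_string();
    for (from, to) in overrides {
        out = replace_word_ci(&out, from, to);
    }
    out
}

/// Replace whole-word, ASCII-case-insensitive matches of `from` with `to`.
fn replace_word_ci(text: &str, from: &str, to: &str) -> String {
    if from.is_empty() {
        return text.to_string();
    }
    // ASCII lowercasing keeps byte offsets identical between `text` and `hay`.
    let hay = text.to_ascii_lowercase();
    let needle = from.to_ascii_lowercase();
    let mut out = String::with_capacity(text.len());
    let mut pos = 0;
    while let Some(rel) = hay[pos..].find(needle.as_str()) {
        let start = pos + rel;
        let end = start + needle.len();
        // Assumption: a match only counts when it is not glued to other letters or digits.
        let before_ok = hay[..start].chars().next_back().map_or(true, |c| !c.is_alphanumeric());
        let after_ok = hay[end..].chars().next().map_or(true, |c| !c.is_alphanumeric());
        if before_ok && after_ok {
            out.push_str(&text[pos..start]);
            out.push_str(to);
            pos = end;
        } else {
            // Not a whole word: copy through the first matched character and keep searching.
            let step = hay[start..].chars().next().map_or(1, |c| c.len_utf8());
            out.push_str(&text[pos..start + step]);
            pos = start + step;
        }
    }
    out.push_str(&text[pos..]);
    out
}
```

With the overrides from the example above, calling `apply_overrides` on "I use open ai with RUST" would yield "I use OpenAI with Rust".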
Dictator supports voice commands for punctuation and symbols. Say the command word to insert the corresponding character:
Punctuation:
- period → .
- comma → ,
- question mark → ?
- exclamation mark → !
- colon → :
- semicolon → ;
Whitespace:
- new line → \n
- tab → \t
Symbols:
- dash → -
- underscore → _
- slash → /
- backslash → \
- pipe → |
- at symbol → @
- hash → #
- dollar sign → $
- percent → %
- caret → ^
- ampersand → &
- asterisk → *
- plus → +
- equals → =
- tilde → ~
Brackets:
- open paren / close paren → ( / )
- open bracket / close bracket → [ / ]
- open brace / close brace → { / }
- less than / greater than → < / >
Quotes:
- quote → "
- single quote → '
- backtick → `
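The command-to-character mapping above could be sketched roughly as follows. This is an illustration under stated assumptions rather than Dictator's implementation: only a subset of commands is included, the `COMMANDS` table and `apply_voice_commands` name are hypothetical, and the spacing rules (symbols attach to the preceding word, no space after a newline) are guesses at sensible behaviour.

```rust
/// (spoken phrase, replacement) for a small subset of the commands listed above.
const COMMANDS: &[(&str, &str)] = &[
    ("question mark", "?"),
    ("exclamation mark", "!"),
    ("new line", "\n"),
    ("period", "."),
    ("comma", ","),
    ("colon", ":"),
    ("semicolon", ";"),
];

/// Replace spoken commands with their characters (hypothetical helper).
fn apply_voice_commands(text: &str) -> String {
    let words: Vec<&str> = text.split_whitespace().collect();
    let mut out = String::new();
    // No leading space at the start of the text or right after a newline.
    let mut suppress_space = true;
    let mut i = 0;
    while i < words.len() {
        // Try two-word commands before one-word commands.
        let mut matched = None;
        for len in [2usize, 1] {
            if i + len <= words.len() {
                let phrase = words[i..i + len].join(" ").to_lowercase();
                if let Some((_, sym)) = COMMANDS.iter().find(|(p, _)| *p == phrase) {
                    matched = Some((*sym, len));
                    break;
                }
            }
        }
        match matched {
            Some((sym, len)) => {
                // Assumption: the symbol attaches directly to the preceding word.
                out.push_str(sym);
                suppress_space = sym == "\n";
                i += len;
            }
            None => {
                if !suppress_space {
                    out.push(' ');
                }
                out.push_str(words[i]);
                suppress_space = false;
                i += 1;
            }
        }
    }
    out
}
```

A real implementation would also need to cope with punctuation and capitalisation that the transcription API may already have added (for example a transcript containing "Period."), which this sketch does not attempt.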
Dictator uses a modular service-based architecture:
- App: Main application loop handling state transitions
- Recorder: Audio capture service using cpal with lock-free ring buffers
- Transcriber: Handles OpenAI-compatible API communication
- TextProcessor: Applies word overrides and voice command transformations
- AudioFeedback: Plays sound effects using rodio
- TextInjector: Manages clipboard and keyboard simulation via wl-copy and ydotool
Audio is captured as 16-bit signed PCM at 16 kHz mono, streamed to temporary WAV files while recording is in progress, and then sent to the transcription API.
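For readers unfamiliar with the format, the sketch below shows what a canonical 44-byte WAV header for 16-bit, 16 kHz, mono PCM looks like. It is an illustration only: the function names are hypothetical, and it encodes a complete in-memory buffer, whereas a streaming writer like the one described above would not know the data length until recording stops and would typically fill in the size fields afterwards.

```rust
const SAMPLE_RATE: u32 = 16_000;
const CHANNELS: u16 = 1;
const BITS_PER_SAMPLE: u16 = 16;

/// Build a canonical 44-byte RIFF/WAVE header for `data_len` bytes of PCM data.
fn wav_header(data_len: u32) -> [u8; 44] {
    let block_align = CHANNELS * (BITS_PER_SAMPLE / 8); // bytes per sample frame
    let byte_rate = SAMPLE_RATE * u32::from(block_align); // bytes per second
    let mut h = [0u8; 44];
    h[0..4].copy_from_slice(b"RIFF");
    h[4..8].copy_from_slice(&(36 + data_len).to_le_bytes()); // file size minus 8
    h[8..12].copy_from_slice(b"WAVE");
    h[12..16].copy_from_slice(b"fmt ");
    h[16..20].copy_from_slice(&16u32.to_le_bytes()); // size of the fmt chunk
    h[20..22].copy_from_slice(&1u16.to_le_bytes()); // format tag 1 = integer PCM
    h[22..24].copy_from_slice(&CHANNELS.to_le_bytes());
    h[24..28].copy_from_slice(&SAMPLE_RATE.to_le_bytes());
    h[28..32].copy_from_slice(&byte_rate.to_le_bytes());
    h[32..34].copy_from_slice(&block_align.to_le_bytes());
    h[34..36].copy_from_slice(&BITS_PER_SAMPLE.to_le_bytes());
    h[36..40].copy_from_slice(b"data");
    h[40..44].copy_from_slice(&data_len.to_le_bytes());
    h
}

/// Encode signed 16-bit samples as a complete WAV byte buffer.
fn encode_wav(samples: &[i16]) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32;
    let mut out = Vec::with_capacity(44 + samples.len() * 2);
    out.extend_from_slice(&wav_header(data_len));
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}
```

At these settings one second of audio occupies 32,000 bytes of sample data, so even a minute-long recording stays under 2 MB before upload.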
The application uses Tokio's async runtime with a LocalSet to handle !Send futures from the audio capture library.
The shortcut is registered via XDG Desktop Portal's GlobalShortcuts interface. Ensure:
- xdg-desktop-portal is installed and running
- Your compositor's portal backend is installed (e.g., xdg-desktop-portal-gnome, xdg-desktop-portal-kde)
- Check if the shortcut was registered: look for "Global shortcut registered" in the logs
- You can reconfigure the binding in System Settings > Shortcuts
Check that your microphone is working and is the default input device:
arecord -l
View logs for more details:
journalctl --user -u dictator.service -f
Or run manually with logging:
RUST_LOG=info dictator
Ensure ydotool is installed (not needed if using paste_mode: "none"):
# Arch Linux
sudo pacman -S ydotool
# Debian/Ubuntu
sudo apt install ydotool
Ensure wl-copy is installed (required for all paste modes):
# Arch Linux
sudo pacman -S wl-clipboard
# Debian/Ubuntu
sudo apt install wl-clipboard
- Check that your API server is running and accessible
- Verify the api_url in your config matches your server's address
- Check network connectivity and firewall rules
- Review logs for specific error messages
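When reading the logs for API failures, it can help to know how many requests a single failed transcription represents. The sketch below shows one plausible reading of the max_retries setting, assuming it counts retries after the initial attempt (so the default of 2 would mean up to three requests in total). The `with_retries` helper and its fixed delay are hypothetical and not taken from Dictator's source; the per-request timeout is not modelled here.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op` until it succeeds or `max_retries` retries have been used.
/// `op` receives the zero-based attempt number.
fn with_retries<T, E>(
    max_retries: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of retries: surface the last error to the caller.
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => {
                attempt += 1;
                // Assumption: a fixed pause between attempts.
                sleep(delay);
            }
        }
    }
}
```

Under this reading, a request that fails every time would appear in the logs as `max_retries + 1` failed attempts before the error is reported.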
MIT License - Copyright (c) 2025 Simon Jackson
See LICENSE for full details.
Inspired by hyprwhspr, reimplemented in Rust for better performance and reliability.