Dictator

A voice transcription daemon for Linux/Wayland that enables system-wide voice-to-text input using OpenAI-compatible transcription APIs.

Features

Remote Transcription: Uses OpenAI-compatible APIs (OpenAI, Groq, local Whisper servers)
System-wide Hotkey: Global keyboard shortcut via XDG Desktop Portal to start/stop recording
Flexible Text Injection: Multiple paste modes including clipboard-only, Ctrl+V, Ctrl+Shift+V, and Super+V
Audio Feedback: Optional sound effects for recording start/stop
Word Overrides: Custom case-insensitive replacements for commonly misheard words
Voice Commands: Extensive punctuation and symbol commands (period, comma, new line, etc.)
High-Performance Audio: Async Rust implementation with lock-free ring buffers for real-time capture
Configurable Retries: Built-in timeout and retry logic for API requests

Requirements

Linux with a Wayland compositor that supports XDG Desktop Portal (GNOME, KDE Plasma, COSMIC, etc.)
xdg-desktop-portal and a compositor-specific backend (e.g., xdg-desktop-portal-gnome, xdg-desktop-portal-kde)
Rust toolchain (for building)
wl-copy, provided by the wl-clipboard package (for clipboard operations)
ydotool (for auto-paste modes; not needed if using paste_mode: "none")
Audio input device (microphone)
OpenAI-compatible transcription API (local or remote)

Installation

Build from source

git clone https://github.com/sizeak/dictator
cd dictator
cargo build --release
sudo cp target/release/dictator /usr/local/bin/

Configure

Create the configuration directory and copy the example config:

mkdir -p ~/.config/dictator
cp assets/config.example.json ~/.config/dictator/config.json

Edit ~/.config/dictator/config.json with your settings:

{
  "api_url": "http://localhost:8000/v1",
  "api_key": "your-api-key-here",
  "model": "Systran/faster-distil-whisper-large-v3",
  "paste_mode": "ctrl_shift",
  "audio_feedback": true,
  "language": "en"
}

Install systemd service (optional)

For automatic startup:

mkdir -p ~/.config/systemd/user
cp systemd/dictator.service ~/.config/systemd/user/
systemctl --user enable dictator.service
systemctl --user start dictator.service

Usage

Running manually

dictator

The daemon will start and register a global shortcut (default: Logo+Alt+D) via XDG Desktop Portal. You can reconfigure the binding in your desktop's System Settings > Shortcuts.

Using the daemon

Press the shortcut to start recording (you'll hear a beep if audio feedback is enabled)
Speak your text
Press the shortcut again to stop recording
The text is transcribed, processed, and then either auto-pasted or copied to the clipboard, depending on your paste_mode setting
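
The press-to-start, press-to-stop cycle above can be pictured as a small state machine. The following is only a sketch: the State and Event names are hypothetical and not taken from the Dictator source, and it assumes that shortcut presses during transcription are ignored.

```rust
/// Hypothetical states for the record/transcribe cycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Recording,
    Transcribing,
}

/// Hypothetical events that drive the cycle.
#[derive(Debug, Clone, Copy)]
enum Event {
    ShortcutPressed,
    TranscriptionDone,
}

/// Compute the next state; any unlisted combination leaves the state unchanged.
fn next(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::ShortcutPressed) => State::Recording,
        (State::Recording, Event::ShortcutPressed) => State::Transcribing,
        (State::Transcribing, Event::TranscriptionDone) => State::Idle,
        (other, _) => other,
    }
}

fn main() {
    let events = [
        Event::ShortcutPressed,   // start recording
        Event::ShortcutPressed,   // stop recording, begin transcription
        Event::TranscriptionDone, // text pasted or copied
    ];
    let mut state = State::Idle;
    for event in events {
        state = next(state, event);
        println!("{:?} -> {:?}", event, state);
    }
}
```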

Configuration Options

All configuration is stored in ~/.config/dictator/config.json.

Required Settings

api_url: Base URL for the OpenAI-compatible API (e.g., "http://localhost:8000/v1")
api_key: API authentication key
model: Model name for transcription
- For local servers: model path (e.g., "Systran/faster-distil-whisper-large-v3")
- For OpenAI: "whisper-1"

Optional Settings

paste_mode: How to handle transcribed text (default: "ctrl_shift")
- "none": Copy to clipboard only, no auto-paste
- "ctrl": Auto-paste using Ctrl+V
- "ctrl_shift": Auto-paste using Ctrl+Shift+V
- "super": Auto-paste using Super+V
audio_feedback: Enable/disable sound effects (default: true)
start_sound_path: Path to recording start sound (default: "ping-up.ogg")
- Relative paths are resolved from the executable's location; absolute paths are also accepted
stop_sound_path: Path to recording stop sound (default: "ping-down.ogg")
complete_sound_path: Path to completion notification sound (default: "ping-complete.ogg")
- Plays when transcription completes and text is injected/copied to clipboard
language: Two-letter language code for transcription (e.g., "en", "es", "fr")
- If not specified, the API auto-detects the language
whisper_prompt: Optional prompt to guide transcription style/context
- Can improve accuracy for domain-specific vocabulary
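
To make the four paste_mode values concrete, here is a minimal sketch of how they could be modeled. The PasteMode type and its methods are hypothetical illustrations, not Dictator's actual code; only the string values and key combinations come from the list above.

```rust
/// Hypothetical model of the documented `paste_mode` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PasteMode {
    ClipboardOnly, // "none"
    Ctrl,          // "ctrl"
    CtrlShift,     // "ctrl_shift"
    Super,         // "super"
}

impl PasteMode {
    /// Parse the string used in config.json; unknown strings give `None`.
    fn parse(value: &str) -> Option<PasteMode> {
        match value {
            "none" => Some(PasteMode::ClipboardOnly),
            "ctrl" => Some(PasteMode::Ctrl),
            "ctrl_shift" => Some(PasteMode::CtrlShift),
            "super" => Some(PasteMode::Super),
            _ => None,
        }
    }

    /// Keys of the paste shortcut; an empty slice means "copy only".
    fn keys(self) -> &'static [&'static str] {
        match self {
            PasteMode::ClipboardOnly => &[],
            PasteMode::Ctrl => &["Ctrl", "V"],
            PasteMode::CtrlShift => &["Ctrl", "Shift", "V"],
            PasteMode::Super => &["Super", "V"],
        }
    }
}

fn main() {
    // Fall back to the documented default when the value is not recognized.
    let mode = PasteMode::parse("ctrl_shift").unwrap_or(PasteMode::CtrlShift);
    println!("{:?} -> {}", mode, mode.keys().join("+"));
}
```

Terminals commonly reserve Ctrl+V for other purposes and paste with Ctrl+Shift+V instead, which is one plausible reason for offering several modes.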

word_overrides: Dictionary of case-insensitive word/phrase replacements

"word_overrides": {
  "open ai": "OpenAI",
  "rust": "Rust",
  "dictator": "Dictator"
}
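
As an illustration of what case-insensitive replacement involves, the sketch below applies the overrides from the example. It is not Dictator's implementation: the function names are hypothetical, and it assumes ASCII-only case folding and whole-word matching (so "rust" inside "trust" is left alone), neither of which is confirmed by this README.

```rust
/// Replace each case-insensitive, whole-word occurrence of `from` with `to`.
/// ASCII lowercasing keeps byte offsets identical between `text` and `haystack`.
fn apply_override(text: &str, from: &str, to: &str) -> String {
    if from.is_empty() {
        return text.to_string();
    }
    let haystack = text.to_ascii_lowercase();
    let needle = from.to_ascii_lowercase();
    let mut out = String::with_capacity(text.len());
    let mut copied = 0; // everything before this offset is already in `out`
    let mut search = 0; // where the next search starts
    while let Some(found) = haystack[search..].find(&needle) {
        let start = search + found;
        let end = start + needle.len();
        let boundary_before = text[..start]
            .chars()
            .next_back()
            .map_or(true, |c| !c.is_alphanumeric());
        let boundary_after = text[end..]
            .chars()
            .next()
            .map_or(true, |c| !c.is_alphanumeric());
        if boundary_before && boundary_after {
            out.push_str(&text[copied..start]);
            out.push_str(to);
            copied = end;
            search = end;
        } else {
            // Match is inside a longer word: step past one character and retry.
            search = start + haystack[start..].chars().next().map_or(1, |c| c.len_utf8());
        }
    }
    out.push_str(&text[copied..]);
    out
}

/// Apply overrides in the order given (a slice keeps the order deterministic).
fn apply_overrides(text: &str, overrides: &[(&str, &str)]) -> String {
    overrides
        .iter()
        .fold(text.to_string(), |acc, (from, to)| apply_override(&acc, from, to))
}

fn main() {
    let overrides = [("open ai", "OpenAI"), ("rust", "Rust"), ("dictator", "Dictator")];
    let text = "I trust that open AI tools and rust work with dictator";
    println!("{}", apply_overrides(text, &overrides));
}
```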

timeout: API request timeout in seconds (default: 30)
max_retries: Number of retry attempts for failed API requests (default: 2)
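
The interaction between timeout and max_retries can be sketched as follows. This is a simplified, synchronous illustration with a hypothetical with_retries helper and a closure standing in for the HTTP request; it assumes max_retries counts retries after the first attempt, so the default of 2 allows up to three requests in total. The actual daemon is asynchronous and may behave differently.

```rust
use std::time::Duration;

/// Call `request` once, then up to `max_retries` more times while it fails.
/// Returns the first success, or the error from the final attempt.
fn with_retries<T, E>(
    max_retries: u32,
    mut request: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match request(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}

fn main() {
    let timeout = Duration::from_secs(30); // documented default
    let mut calls = 0;
    // Stand-in for an API request: the first two attempts "time out".
    let result: Result<&str, String> = with_retries(2, |attempt| {
        calls += 1;
        if attempt < 2 {
            Err(format!("attempt {attempt} timed out after {timeout:?}"))
        } else {
            Ok("hello world")
        }
    });
    println!("{result:?} after {calls} calls");
}
```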

Voice Commands

Dictator supports voice commands for punctuation and symbols. Say the command word to insert the corresponding character:

Punctuation:

period → .
comma → ,
question mark → ?
exclamation mark → !
colon → :
semicolon → ;

Whitespace:

new line → \n
tab → \t

Symbols:

dash → -
underscore → _
slash → /
backslash → \
pipe → |
at symbol → @
hash → #
dollar sign → $
percent → %
caret → ^
ampersand → &
asterisk → *
plus → +
equals → =
tilde → ~

Brackets:

open paren / close paren → ( / )
open bracket / close bracket → [ / ]
open brace / close brace → { / }
less than / greater than → < / >

Quotes:

quote → "
single quote → '
backtick → `
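
One way such a command table could be applied is sketched below for a handful of the commands listed above. The function name, the subset of commands, and the spacing rule (a symbol attaches directly to the preceding word) are assumptions for illustration, not a description of Dictator's TextProcessor.

```rust
/// A few commands from the lists above: (spoken phrase, replacement).
/// Multi-word phrases come first so they are tried before single words.
const COMMANDS: &[(&str, &str)] = &[
    ("question mark", "?"),
    ("exclamation mark", "!"),
    ("new line", "\n"),
    ("period", "."),
    ("comma", ","),
];

/// Replace spoken command phrases with their symbols, word by word.
fn apply_voice_commands(text: &str) -> String {
    let words: Vec<&str> = text.split_whitespace().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < words.len() {
        // Find the first command whose words match at position `i`, ignoring ASCII case.
        let hit = COMMANDS.iter().find_map(|(phrase, symbol)| {
            let parts: Vec<&str> = phrase.split(' ').collect();
            let end = i + parts.len();
            let matches = end <= words.len()
                && words[i..end]
                    .iter()
                    .zip(&parts)
                    .all(|(word, part)| word.eq_ignore_ascii_case(part));
            matches.then_some((parts.len(), *symbol))
        });
        match hit {
            Some((len, symbol)) => {
                // Assumption: symbols attach to the previous word with no space.
                out.push_str(symbol);
                i += len;
            }
            None => {
                if !out.is_empty() && !out.ends_with('\n') {
                    out.push(' ');
                }
                out.push_str(words[i]);
                i += 1;
            }
        }
    }
    out
}

fn main() {
    let spoken = "hello world comma how are you question mark new line thanks period";
    println!("{}", apply_voice_commands(spoken));
}
```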

Architecture

Dictator uses a modular service-based architecture:

App: Main application loop handling state transitions
Recorder: Audio capture service using cpal with lock-free ring buffers
Transcriber: Handles OpenAI-compatible API communication
TextProcessor: Applies word overrides and voice command transformations
AudioFeedback: Plays sound effects using rodio
TextInjector: Manages clipboard and keyboard simulation via wl-copy and ydotool

Audio is captured as 16-bit signed PCM at 16 kHz mono, streamed to a temporary WAV file while recording is in progress, and then sent to the transcription API.
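
For reference, the audio format described above corresponds to a standard 44-byte WAV header followed by little-endian samples, at 32,000 bytes per second of audio. The sketch below writes a complete buffer in one go using only the standard library; write_wav is a hypothetical helper, and the daemon itself streams to a file during recording rather than writing everything at the end.

```rust
use std::io::{self, Write};

const SAMPLE_RATE: u32 = 16_000; // 16 kHz
const CHANNELS: u16 = 1; // mono
const BITS_PER_SAMPLE: u16 = 16; // signed 16-bit PCM

/// Write a canonical PCM WAV header followed by the samples.
fn write_wav<W: Write>(mut out: W, samples: &[i16]) -> io::Result<()> {
    let block_align = CHANNELS * BITS_PER_SAMPLE / 8; // bytes per frame
    let byte_rate = SAMPLE_RATE * u32::from(block_align); // bytes per second
    let data_len = (samples.len() * 2) as u32; // two bytes per sample

    out.write_all(b"RIFF")?;
    out.write_all(&(36 + data_len).to_le_bytes())?; // file size minus 8 bytes
    out.write_all(b"WAVE")?;
    out.write_all(b"fmt ")?;
    out.write_all(&16u32.to_le_bytes())?; // size of the fmt chunk
    out.write_all(&1u16.to_le_bytes())?; // format tag 1 = PCM
    out.write_all(&CHANNELS.to_le_bytes())?;
    out.write_all(&SAMPLE_RATE.to_le_bytes())?;
    out.write_all(&byte_rate.to_le_bytes())?;
    out.write_all(&block_align.to_le_bytes())?;
    out.write_all(&BITS_PER_SAMPLE.to_le_bytes())?;
    out.write_all(b"data")?;
    out.write_all(&data_len.to_le_bytes())?;
    for sample in samples {
        out.write_all(&sample.to_le_bytes())?;
    }
    Ok(())
}

fn main() {
    // One second of silence: 16,000 samples.
    let one_second = vec![0i16; SAMPLE_RATE as usize];
    let mut buf = Vec::new();
    write_wav(&mut buf, &one_second).expect("writing to a Vec should not fail");
    println!("{} bytes", buf.len()); // 44-byte header + 32,000 bytes of audio
}
```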

The application uses Tokio's async runtime with a LocalSet to handle !Send futures from the audio capture library.

Troubleshooting

Keyboard shortcut not working

The shortcut is registered via XDG Desktop Portal's GlobalShortcuts interface. Check the following:

xdg-desktop-portal is installed and running
Your compositor's portal backend is installed (e.g., xdg-desktop-portal-gnome, xdg-desktop-portal-kde)
The logs contain "Global shortcut registered", which confirms that registration succeeded
The binding shown in System Settings > Shortcuts is the one you expect (it can be reconfigured there)

Audio not recording

Check that your microphone is working and is the default input device:

arecord -l

View logs for more details:

journalctl --user -u dictator.service -f

Or run manually with logging:

RUST_LOG=info dictator

Text not injecting

Ensure ydotool is installed (not needed if using paste_mode: "none"):

# Arch Linux
sudo pacman -S ydotool

# Debian/Ubuntu
sudo apt install ydotool

Ensure wl-copy is installed (required for all paste modes); it is provided by the wl-clipboard package:

# Arch Linux
sudo pacman -S wl-clipboard

# Debian/Ubuntu
sudo apt install wl-clipboard

API connection issues

Check that your API server is running and accessible
Verify the api_url in your config matches your server's address
Check network connectivity and firewall rules
Review logs for specific error messages

License

See LICENSE for full details.

Credits

Inspired by hyprwhspr, reimplemented in Rust for better performance and reliability.
