A Windows system-tray application for real-time speech-to-text dictation, powered by OpenAI's Whisper running entirely on the GPU via DirectCompute.
Press a hotkey, speak, and the transcribed text is typed into whatever window has focus — as if you had typed it with the keyboard.
The GPU inference engine is based on Const-me/Whisper by Konstantin, a Windows-native port of whisper.cpp that replaces CUDA/GGML with DirectCompute (D3D 11.0). All credit for the core inference engine, the DirectCompute shader pipeline, and the COM-lite interface layer goes to Konstantin's excellent work.
WhisperTray, the build scripts, and various cleanups are original additions.
- Global hotkey (`Ctrl+Shift+Space` by default) to start/stop capture
- System tray icon with color-coded status (gray = idle, gold = loading, green = listening)
- Automatic voice activity detection — recognizes speech segments and pauses
- Hallucination filtering: strips phantom tokens like `[BLANK_AUDIO]` and `(music)`
- Direct text injection via `SendInput`: works in any application, full Unicode support
- GPU-accelerated inference via DirectCompute: no CUDA required, works on any D3D 11.0 GPU (NVIDIA, AMD, Intel)
- Configurable microphone, language, model, hotkey, and timing via INI settings
- No .NET runtime required — pure C++ with statically linked CRT
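
To make the hallucination-filtering feature concrete, here is a minimal sketch of how bracketed or parenthesized phantom tokens could be stripped from a transcript. The function name `stripPhantomTokens` is hypothetical, and the rule it applies (drop every `[...]` or `(...)` span, then tidy whitespace) is an assumption chosen for illustration; the filter in WhisperTray may instead match a specific list of tokens.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not taken from the WhisperTray sources): removes
// annotations such as "[BLANK_AUDIO]" or "(music)" and collapses the
// whitespace they leave behind.
std::string stripPhantomTokens(const std::string& text) {
    std::string kept;
    std::size_t i = 0;
    while (i < text.size()) {
        const char c = text[i];
        if (c == '[' || c == '(') {
            const char closer = (c == '[') ? ']' : ')';
            const std::size_t end = text.find(closer, i + 1);
            if (end != std::string::npos) {
                i = end + 1;  // skip the whole annotation, delimiters included
                continue;
            }
        }
        kept.push_back(c);  // ordinary character, or an unclosed bracket
        ++i;
    }

    // Collapse runs of whitespace into one space and trim both ends.
    std::string cleaned;
    bool pendingSpace = false;
    for (const char ch : kept) {
        if (std::isspace(static_cast<unsigned char>(ch))) {
            pendingSpace = !cleaned.empty();
        } else {
            if (pendingSpace) cleaned.push_back(' ');
            pendingSpace = false;
            cleaned.push_back(ch);
        }
    }
    return cleaned;
}
```

A blanket rule like this also removes parentheses that the speaker actually dictated, so a real filter would likely be more selective.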
- 64-bit Windows 8.1 or later
- DirectX 11.0 capable GPU
- CPU with AVX1 and F16C support
- A Whisper GGML model file (see Models)
Prerequisites: Visual Studio 2026 Build Tools (or the full IDE) with the following components:
- Desktop development with C++ workload
- MSVC v143+ C++ compiler toolset
- Windows 10/11 SDK
The full Visual Studio IDE is not required — the free Build Tools package is sufficient. No .NET SDK needed.
Run `build.bat` from the repository root. The build has four steps:
- ComputeShaders — compiles 46 HLSL compute shaders to DXBC bytecode
- CompressShaders — LZ4-compresses the shader binaries into a blob for static linking
- Whisper.lib — builds the DirectCompute inference engine as a static library
- WhisperTray.exe — builds the tray application (statically links the inference engine)
Alternatively, open WhisperTray.sln in Visual Studio and build Release|x64.
Building directly from a WSL mount (e.g. `/mnt/c/...`) can cause issues with MSBuild due to path handling and file locking across the Windows/Linux filesystem boundary. To work around this, use `wsl-build.bat`, which copies the source tree to a native Windows directory (`%SystemDrive%\Whisper-Compiling`), runs the build there, and copies the results back to `dist/`.
- Download a GGML model file (see below).
- Run `dist\WhisperTray.exe`. On first launch, it will prompt you to select a model file.
- Press `Ctrl+Shift+Space` (or click the tray icon) to start listening. The tray icon turns green.
- Speak. Transcribed text is typed into the focused window.
- Press the hotkey again (or click the tray icon) to stop.
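
As an illustration of how text can be typed into the focused window with full Unicode support, the sketch below uses the Win32 `SendInput` call with `KEYEVENTF_UNICODE`, which takes UTF-16 code units in `wScan`. Code points above U+FFFF therefore have to be split into surrogate pairs first. The helper names `toUtf16Units` and `typeText` are hypothetical, and this is not necessarily how WhisperTray structures its injection code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: encodes Unicode code points as UTF-16 code units.
// Invalid code points (lone surrogates, values above U+10FFFF) are skipped.
std::vector<std::uint16_t> toUtf16Units(const std::u32string& text) {
    std::vector<std::uint16_t> units;
    for (const char32_t cp : text) {
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) continue;
        if (cp < 0x10000) {
            units.push_back(static_cast<std::uint16_t>(cp));
        } else {
            const char32_t v = cp - 0x10000;  // 20 bits split into 10 + 10
            units.push_back(static_cast<std::uint16_t>(0xD800 + (v >> 10)));
            units.push_back(static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return units;
}

#ifdef _WIN32
// Hypothetical Windows-only wrapper: sends a key-down and key-up event for
// each UTF-16 unit. Returns true if every event was accepted.
bool typeText(const std::u32string& text) {
    std::vector<INPUT> events;
    for (const std::uint16_t unit : toUtf16Units(text)) {
        INPUT down = {};
        down.type = INPUT_KEYBOARD;
        down.ki.wVk = 0;  // must be 0 when KEYEVENTF_UNICODE is set
        down.ki.wScan = unit;
        down.ki.dwFlags = KEYEVENTF_UNICODE;
        INPUT up = down;
        up.ki.dwFlags = KEYEVENTF_UNICODE | KEYEVENTF_KEYUP;
        events.push_back(down);
        events.push_back(up);
    }
    if (events.empty()) return true;
    const UINT sent = SendInput(static_cast<UINT>(events.size()),
                                events.data(), sizeof(INPUT));
    return sent == events.size();
}
#endif
```

Batching all events into a single `SendInput` call keeps them contiguous in the input stream, so other keyboard input is not interleaved in the middle of a word.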
Right-click the tray icon to:
- Select a different microphone
- Change the model file
Settings are stored in %APPDATA%\WhisperTray\settings.ini:
| Setting | Default | Description |
|---|---|---|
| `ModelPath` | (prompts on first run) | Path to the GGML `.bin` model file |
| `LanguageCode` | `en` | Whisper language code |
| `HotkeyModifiers` | `Ctrl+Shift` | Hotkey modifier keys |
| `HotkeyVirtualKey` | `Space` | Hotkey key |
| `MinDuration` | 3.0 s | Minimum audio chunk duration |
| `MaxDuration` | 11.0 s | Maximum audio chunk duration |
| `PauseDuration` | 0.6 s | Silence gap that ends a speech segment |
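
The three timing settings interact, so a small sketch may help. It assumes a chunk is handed to the model once it is at least `MinDuration` long and a silence of `PauseDuration` has been observed, or unconditionally once it reaches `MaxDuration`. The `ChunkSegmenter` class, the millisecond units, and the per-frame speech flag are all hypothetical; the actual voice activity detection logic in the engine may differ.

```cpp
#include <cassert>

// Defaults mirror the settings table, expressed in milliseconds so the
// arithmetic stays exact.
struct ChunkTiming {
    int minDurationMs = 3000;    // MinDuration
    int maxDurationMs = 11000;   // MaxDuration
    int pauseDurationMs = 600;   // PauseDuration
};

// Hypothetical segmenter: decides when the buffered audio forms a chunk.
class ChunkSegmenter {
public:
    explicit ChunkSegmenter(ChunkTiming timing) : timing_(timing) {}

    // Feeds one audio frame. Returns true when the buffered chunk should be
    // sent for transcription; the internal counters are then reset.
    bool pushFrame(int frameMs, bool isSpeech) {
        if (!isSpeech && !sawSpeech_) return false;  // ignore leading silence
        sawSpeech_ = true;
        chunkMs_ += frameMs;
        silenceMs_ = isSpeech ? 0 : silenceMs_ + frameMs;

        const bool pauseReached = chunkMs_ >= timing_.minDurationMs &&
                                  silenceMs_ >= timing_.pauseDurationMs;
        const bool tooLong = chunkMs_ >= timing_.maxDurationMs;
        if (pauseReached || tooLong) {
            chunkMs_ = 0;
            silenceMs_ = 0;
            sawSpeech_ = false;
            return true;
        }
        return false;
    }

private:
    ChunkTiming timing_;
    int chunkMs_ = 0;     // audio buffered in the current chunk
    int silenceMs_ = 0;   // trailing silence at the end of the chunk
    bool sawSpeech_ = false;
};
```

Under these assumptions, a lower `PauseDuration` makes text appear sooner after each phrase, while a higher `MinDuration` gives the model more context per chunk at the cost of latency.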
Download GGML model files from Hugging Face. Supported sizes:
| Model | Parameters | Disk Size | Notes |
|---|---|---|---|
| `tiny` | 39M | ~75 MB | Fastest, lowest accuracy |
| `base` | 74M | ~142 MB | Good balance for quick dictation |
| `small` | 244M | ~466 MB | Better accuracy |
| `medium` | 769M | ~1.5 GB | High accuracy |
| `large` | 1550M | ~2.9 GB | Best accuracy, requires significant VRAM |
For real-time dictation, base or small models are recommended.
This project is licensed under the Mozilla Public License 2.0.
Based on:
- Const-me/Whisper by Konstantin — DirectCompute inference engine (MPL 2.0)
- whisper.cpp by Georgi Gerganov — original C/C++ Whisper port (MIT)
- LZ4 by Yann Collet — compression library (BSD 2-Clause)