WhisperTray

A Windows system-tray application for real-time speech-to-text dictation, powered by OpenAI's Whisper running entirely on the GPU via DirectCompute.

Press a hotkey, speak, and the transcribed text is typed into whatever window has focus — as if you had typed it with the keyboard.
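As a rough illustration of how typing text into the focused window can work, here is a minimal sketch that sends each character as a Unicode keyboard event through the Win32 `SendInput` API. The helper names `toUtf16Units` and `typeText` are hypothetical and are not taken from the WhisperTray sources. The sketch assumes the input contains valid Unicode code points and sends a key-down/key-up pair for every UTF-16 code unit, which may not match what the application actually does.

```cpp
#include <cstdint>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Code points above U+FFFF become a surrogate pair (two units).
// Assumes `cp` is a valid code point; no validation is performed.
std::vector<char16_t> toUtf16Units(char32_t cp) {
    if (cp < 0x10000) {
        return { static_cast<char16_t>(cp) };
    }
    cp -= 0x10000;
    return { static_cast<char16_t>(0xD800 + (cp >> 10)),
             static_cast<char16_t>(0xDC00 + (cp & 0x3FF)) };
}

#ifdef _WIN32
// Hypothetical helper: type `text` into whichever window has keyboard focus.
// Returns true if every event was accepted by SendInput.
bool typeText(const std::u32string& text) {
    std::vector<INPUT> events;
    for (char32_t cp : text) {
        for (char16_t unit : toUtf16Units(cp)) {
            INPUT down{};
            down.type = INPUT_KEYBOARD;
            down.ki.wVk = 0;                     // must be 0 with KEYEVENTF_UNICODE
            down.ki.wScan = unit;                // the UTF-16 code unit to deliver
            down.ki.dwFlags = KEYEVENTF_UNICODE;

            INPUT up = down;
            up.ki.dwFlags |= KEYEVENTF_KEYUP;

            events.push_back(down);
            events.push_back(up);
        }
    }
    if (events.empty()) {
        return true;
    }
    const UINT sent = SendInput(static_cast<UINT>(events.size()),
                                events.data(), sizeof(INPUT));
    return sent == events.size();
}
#endif
```

On Windows, calling `typeText(U"hello")` would queue ten keyboard events (a press and a release per character). `SendInput` can be blocked by UIPI when the focused window belongs to a process at a higher integrity level, so a real implementation would likely want to check the return value as shown.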

Acknowledgments

The GPU inference engine is based on Const-me/Whisper by Konstantin, a Windows-native port of whisper.cpp that replaces CUDA/GGML with DirectCompute (D3D 11.0). All credit for the core inference engine, the DirectCompute shader pipeline, and the COM-lite interface layer goes to Konstantin's excellent work.

WhisperTray, the build scripts, and various cleanups are original additions.

Features

Global hotkey (Ctrl+Shift+Space by default) to start/stop capture
System tray icon with color-coded status (gray = idle, gold = loading, green = listening)
Automatic voice activity detection — recognizes speech segments and pauses
Hallucination filtering — strips phantom tokens like [BLANK_AUDIO] and (music)
Direct text injection via SendInput — works in any application, full Unicode support
GPU-accelerated inference via DirectCompute — no CUDA required, works on any D3D 11.0 GPU (NVIDIA, AMD, Intel)
Configurable microphone, language, model, hotkey, and timing via INI settings
No .NET runtime required — pure C++ with statically linked CRT
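To make the hallucination filtering idea concrete, the sketch below removes any span enclosed in square brackets or parentheses and collapses the leftover whitespace. This is an assumption about one simple way to do it, not the filter WhisperTray ships; the function name `stripNonSpeechTokens` is hypothetical, and a real filter would probably match a known list of phantom tokens so that legitimate parenthesized speech survives.

```cpp
#include <string>

// Hypothetical helper: drop "[...]" and "(...)" spans such as
// "[BLANK_AUDIO]" or "(music)", then collapse runs of whitespace.
// Nested brackets are not handled, and an unmatched opener drops
// the rest of the string.
std::string stripNonSpeechTokens(const std::string& text) {
    std::string out;
    char closer = 0;  // non-zero while we are inside a bracketed span

    for (char c : text) {
        if (closer != 0) {
            if (c == closer) {
                closer = 0;  // end of the span; the span itself is discarded
            }
            continue;
        }
        if (c == '[') { closer = ']'; continue; }
        if (c == '(') { closer = ')'; continue; }

        const bool isSpace = (c == ' ' || c == '\t' || c == '\n');
        if (isSpace) {
            // Emit at most one space, and never a leading one.
            if (!out.empty() && out.back() != ' ') {
                out.push_back(' ');
            }
        } else {
            out.push_back(c);
        }
    }

    // Trim a single trailing space left behind by a removed token.
    if (!out.empty() && out.back() == ' ') {
        out.pop_back();
    }
    return out;
}
```

With this sketch, a segment that consists only of `[BLANK_AUDIO]` becomes an empty string, which the caller could simply skip instead of typing.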

Requirements

64-bit Windows 8.1 or later
DirectX 11.0 capable GPU
CPU with AVX1 and F16C support
A Whisper GGML model file (see Models)

Building

Prerequisites: Visual Studio 2026 Build Tools (or the full IDE) with the following components:

Desktop development with C++ workload
MSVC v143+ C++ compiler toolset
Windows 10/11 SDK

The full Visual Studio IDE is not required; the free Build Tools package is sufficient, and no .NET SDK is needed.

Run from the repository root:

build.bat

The build has four steps:

ComputeShaders — compiles 46 HLSL compute shaders to DXBC bytecode
CompressShaders — LZ4-compresses the shader binaries into a blob for static linking
Whisper.lib — builds the DirectCompute inference engine as a static library
WhisperTray.exe — builds the tray application (statically links the inference engine)

Alternatively, open WhisperTray.sln in Visual Studio and build Release|x64.

Building from WSL

Building directly from a WSL mount (e.g. /mnt/c/...) can cause MSBuild failures because of path handling and file locking across the Windows/Linux filesystem boundary. To work around this, use wsl-build.bat, which copies the source tree to a native Windows directory (%SystemDrive%\Whisper-Compiling), runs the build there, and copies the results back to dist/.

Usage

Download a GGML model file (see Models).
Run dist\WhisperTray.exe. On first launch, it will prompt you to select a model file.
Press Ctrl+Shift+Space (or click the tray icon) to start listening. The tray icon turns green.
Speak. Transcribed text is typed into the focused window.
Press the hotkey again (or click the tray icon) to stop.

Right-click the tray icon to:

Select a different microphone
Change the model file

Settings

Settings are stored in %APPDATA%\WhisperTray\settings.ini:

| Setting | Default | Description |
| --- | --- | --- |
| `ModelPath` | (prompts on first run) | Path to the GGML `.bin` model file |
| `LanguageCode` | `en` | Whisper language code |
| `HotkeyModifiers` | `Ctrl+Shift` | Hotkey modifier keys |
| `HotkeyVirtualKey` | `Space` | Hotkey key |
| `MinDuration` | `3.0` s | Minimum audio chunk duration |
| `MaxDuration` | `11.0` s | Maximum audio chunk duration |
| `PauseDuration` | `0.6` s | Silence gap that ends a speech segment |
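One plausible reading of how the three timing settings interact is sketched below: a chunk is handed to the model once it is at least `MinDuration` long and has been followed by `PauseDuration` of silence, or as soon as it reaches `MaxDuration` regardless of silence. This interpretation is an assumption based on the setting descriptions only; the `ChunkTiming` struct and `shouldFlushChunk` function are hypothetical names, not part of the WhisperTray code.

```cpp
// Hypothetical container mirroring the documented defaults (seconds).
struct ChunkTiming {
    double minDuration = 3.0;    // MinDuration
    double maxDuration = 11.0;   // MaxDuration
    double pauseDuration = 0.6;  // PauseDuration
};

// Hypothetical decision rule: should the buffered audio be transcribed now?
//   bufferedSeconds        - length of audio captured for the current chunk
//   trailingSilenceSeconds - length of the silence at the end of that audio
bool shouldFlushChunk(const ChunkTiming& timing,
                      double bufferedSeconds,
                      double trailingSilenceSeconds) {
    // Hard cap: never let a chunk grow past the maximum duration.
    if (bufferedSeconds >= timing.maxDuration) {
        return true;
    }
    // Otherwise wait for enough audio and a long enough pause.
    return bufferedSeconds >= timing.minDuration &&
           trailingSilenceSeconds >= timing.pauseDuration;
}
```

Under this reading, lowering `PauseDuration` makes text appear sooner after you stop speaking, at the cost of splitting sentences at short hesitations, while raising `MaxDuration` gives the model more context per chunk but delays output during continuous speech.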

Models

Download GGML model files from Hugging Face. Supported sizes:

| Model | Parameters | Disk Size | Notes |
| --- | --- | --- | --- |
| `tiny` | 39M | ~75 MB | Fastest, lowest accuracy |
| `base` | 74M | ~142 MB | Good balance for quick dictation |
| `small` | 244M | ~466 MB | Better accuracy |
| `medium` | 769M | ~1.5 GB | High accuracy |
| `large` | 1550M | ~2.9 GB | Best accuracy, requires significant VRAM |

For real-time dictation, base or small models are recommended.

License

This project is licensed under the Mozilla Public License 2.0.

Based on:

Const-me/Whisper by Konstantin — DirectCompute inference engine (MPL 2.0)
whisper.cpp by Georgi Gerganov — original C/C++ Whisper port (MIT)
LZ4 by Yann Collet — compression library (BSD 2-Clause)
