Hybrid voice dictation for Windows — types in any active text field using local offline faster-whisper or optional Azure/Sarvam AI cloud backends.
| Feature | Details |
|---|---|
| Hybrid STT | Offline faster-whisper (CPU/GPU) + Azure Speech / Sarvam AI cloud fallback |
| GPU Accelerated | High-performance transcription via CUDA/cuBLAS for NVIDIA RTX GPUs |
| Real-Time Previews | Live text appears in the overlay as you speak (sub-second latency) |
| Sarvam AI Support | High-performance STT for Indian languages via WebSocket streaming |
| LLM Text Polishing | Integrated with local Ollama to automatically fix grammar, rewrite, or act as a conversational assistant |
| Always available | System tray icon — lives quietly in your taskbar |
| Floating mic button | Draggable, always-on-top toggle with modern, crisp icons for all states |
| Countdown ring | Sweep arc on the mic button shows remaining recording time (green → amber → red) |
| Global hotkey | Configurable (default Ctrl+Alt+D), toggle or push-to-talk |
| Types anywhere | Injects text at your cursor in any Windows app |
| Transcription preview | Floating dark overlay shows live tentative text and last 3 dictated lines |
| Live level meter | Smooth live waveform visualizer in the preview overlay shows mic amplitude |
| Spoken punctuation | "period" → . "comma" → , "new line" → ↵ etc. |
| Auto-capitalisation | Capitalises after sentence endings automatically |
| Word corrections | Persistent find-and-replace rules applied after every transcription |
| Session history | Searchable log of every dictated utterance — copy, export, or clear |
| Auto-update checker | Silently checks GitHub Releases at startup; notifies when a new version is available |
| VAD filtering | WebRTC Voice Activity Detection — CPU only active while you speak |
| Secure key storage | API keys stored in Windows Credential Manager (DPAPI) |
| Model selector | tiny / base / small (recommended) / medium / large |
| Multi-language | 20+ languages via Whisper; specialized Indian support via Sarvam |
| Start with Windows | Optional registry key entry |
| App Launcher | Bind custom voice triggers (e.g. "open notepad") to run local applications |
| Profile Sync | Export and import settings to or from a JSON file |
| Premium Branding | Glowing neon mic window and tray icons with live status indicator dots |
| Standalone .exe | Build with PyInstaller — no Python install needed for end-users |
- Windows 10 / 11 (x64)
- Python 3.11, 3.12, or 3.13 — python.org/downloads
- A microphone
git clone https://github.com/RhythmicDias/DictateAnywhere.git
cd DictateAnywhere
scripts\create_venv.bat
scripts\install.bat
scripts\run.bat
DictateAnywhere starts silently and appears in the system tray (bottom-right of your taskbar).
| Action | How |
|---|---|
| Start / stop dictation | Press Ctrl+Alt+D (configurable) |
| Start / stop dictation | Click the floating mic button |
| Start / stop dictation | Right-click tray icon → Start / Stop |
| Open settings | Right-click tray icon → Settings… |
| View session history | Right-click tray icon → Session History |
| Toggle preview overlay | Right-click tray icon → Toggle Preview |
| Move floating button | Click and drag it anywhere on screen |
| Quit | Right-click tray icon → Quit |
While recording, a dark floating bar appears at the bottom of your screen showing:
- ● Listening… status with a smooth live waveform visualizer to show your microphone activity
- The last three dictated lines (newest is white; older lines dim)
The overlay auto-hides a few seconds after dictation ends. You can drag it anywhere and close it with ✕. The hide delay is configurable in Settings → Advanced.
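The level shown by the waveform visualizer can be derived from the raw microphone buffer. The following is a minimal sketch, not the project's actual code: it assumes 16-bit little-endian mono PCM input, and the function name `rms_level` is hypothetical.

```python
import math
import struct


def rms_level(pcm_bytes: bytes) -> float:
    """Return the RMS amplitude of 16-bit little-endian PCM, scaled to 0.0-1.0.

    Hypothetical helper: the real capture module may compute its level differently.
    """
    sample_count = len(pcm_bytes) // 2
    if sample_count == 0:
        return 0.0
    # Unpack whole samples only; a trailing odd byte is ignored.
    samples = struct.unpack(f"<{sample_count}h", pcm_bytes[: sample_count * 2])
    mean_square = sum(s * s for s in samples) / sample_count
    # 32768 is the largest magnitude a signed 16-bit sample can take.
    return min(1.0, math.sqrt(mean_square) / 32768.0)
```

A UI would typically call this once per audio chunk and smooth the result before drawing it.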
When recording is active, a sweep arc appears inside the mic button tracking how much of the maximum recording time has been used:
| Ring colour | Meaning |
|---|---|
| Green | > 50% of time remaining |
| Amber | 20–50% of time remaining |
| Red | < 20% remaining — wrapping up soon |
A small seconds counter appears at the top of the button. The ring and counter disappear instantly when recording ends.
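The colour thresholds in the table can be expressed as a small function. This is a sketch under stated assumptions: the name `ring_colour` is hypothetical, and the boundary values (exactly 50% and exactly 20% remaining) are treated as amber, which is one reading of the table.

```python
def ring_colour(elapsed_s: float, max_s: float) -> str:
    """Map elapsed recording time to the ring colour described in the table."""
    if max_s <= 0:
        raise ValueError("max_s must be positive")
    # Fraction of the maximum recording time still available, clamped to [0, 1].
    remaining = min(1.0, max(0.0, (max_s - elapsed_s) / max_s))
    if remaining > 0.5:
        return "green"
    if remaining >= 0.2:
        return "amber"
    return "red"
```

For a 60-second limit this gives green for the first half minute, amber until 48 seconds have elapsed, and red after that.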
| Say | Gets typed |
|---|---|
| "period" / "full stop" | . |
| "comma" | , |
| "question mark" | ? |
| "exclamation mark" | ! |
| "new line" | line break |
| "new paragraph" | blank line |
| "semicolon" | ; |
| "colon" | : |
| "open quote" / "close quote" | " |
| "dash" | — |
| "ellipsis" | … |
| "delete that" / "scratch that" | removes the spoken word |
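To illustrate how the table could be applied, here is a minimal regex-based sketch. It is not the project's `punctuation.py`; the names are hypothetical, the "delete that" / "scratch that" commands are left out, and it makes no attempt to tell a command from ordinary speech (for example "a period of time").

```python
import re

# Spoken phrase -> symbol, following the table above.
LEFT_ATTACHED = {  # symbol joins the preceding word
    "period": ".", "full stop": ".", "comma": ",", "question mark": "?",
    "exclamation mark": "!", "semicolon": ";", "colon": ":",
    "ellipsis": "\u2026", "close quote": '"',
}
RIGHT_ATTACHED = {"open quote": '"'}  # symbol joins the following word
LINE_BREAKS = {"new line": "\n", "new paragraph": "\n\n"}
STANDALONE = {"dash": "\u2014"}  # surrounding spaces are kept


def apply_spoken_punctuation(text: str) -> str:
    """Replace spoken punctuation phrases with their symbols."""
    for phrase, symbol in LEFT_ATTACHED.items():
        # Swallow the whitespace before the phrase so "hello comma" -> "hello,".
        text = re.sub(rf"\s*\b{re.escape(phrase)}\b", symbol, text, flags=re.IGNORECASE)
    for phrase, symbol in RIGHT_ATTACHED.items():
        # Swallow the whitespace after the phrase so 'open quote hi' -> '"hi'.
        text = re.sub(rf"\b{re.escape(phrase)}\b\s*", symbol, text, flags=re.IGNORECASE)
    for phrase, symbol in LINE_BREAKS.items():
        text = re.sub(rf"\s*\b{re.escape(phrase)}\b\s*", symbol, text, flags=re.IGNORECASE)
    for phrase, symbol in STANDALONE.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", symbol, text, flags=re.IGNORECASE)
    return text
```

The word-boundary anchors keep "semicolon" from being matched as "colon", so the order of the dictionary entries does not matter here.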
Open Settings → Corrections to define find-and-replace rules applied after every transcription.
| Column | Description |
|---|---|
| Find | The word or phrase Whisper tends to get wrong |
| Replace | What you actually want typed |
Rules are case-insensitive by default. They are stored in %APPDATA%\DictateAnywhere\corrections.json and apply on top of Whisper's output after spoken punctuation processing.
Example rules:
| Find | Replace |
|---|---|
| colour | color |
| Starbucks | Starburst |
| gonna | going to |
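A sketch of how such rules might be applied, case-insensitively and in the order listed. The JSON layout (a list of objects with `find` and `replace` keys) and the whole-word matching are assumptions for illustration; the actual schema of `corrections.json` is not documented here.

```python
import json
import re


def apply_corrections(text: str, rules: list[dict[str, str]]) -> str:
    """Apply find-and-replace rules in order, ignoring case."""
    for rule in rules:
        # Assumption: rules match whole words or phrases only.
        pattern = re.compile(rf"\b{re.escape(rule['find'])}\b", re.IGNORECASE)
        # A callable replacement avoids treating backslashes in the rule as escapes.
        text = pattern.sub(lambda _match, r=rule["replace"]: r, text)
    return text


# Hypothetical file content, mirroring two of the example rules above.
rules = json.loads(
    '[{"find": "colour", "replace": "color"},'
    ' {"find": "gonna", "replace": "going to"}]'
)
```

With these rules, `apply_corrections("I'm Gonna pick a colour", rules)` produces `"I'm going to pick a color"`. Note that the replacement text is inserted as written, so the original capitalisation is not preserved.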
Open Settings from the tray icon menu. Changes take effect immediately.
- Engine mode: `hybrid` (recommended), `local`, or `cloud`
- Whisper model size: `tiny` is the fastest; `small` is the best CPU/accuracy tradeoff
- Compute type: `int8` (efficient) or `float16` (fastest for GPU)
- Local device: `cuda` (recommended for NVIDIA) or `cpu`
- Language: BCP-47 code (`en`, `fr`, `de`, `auto`, …)
- Choose microphone device
- VAD aggressiveness (0–3)
- Silence timeout before auto-stop
- Max recording length
- Set any key combination (e.g. `ctrl+alt+d`, `f9`, `ctrl+shift+space`)
- Choose between toggle and push-to-talk modes
- Show / hide the floating button
- Size and opacity
- Always-on-top toggle
- Enable or disable the transcription preview overlay
- Configure the auto-hide delay (ms)
- Opacity & Color — customize the overlay's transparency and text color for better visibility
- Add, edit, and remove word correction rules
- Rules are applied after every transcription in the order listed
- Map spoken commands (e.g. "open notepad") to specific local application executable files to launch them instantly.
- Azure Speech API key (stored securely in Windows Credential Manager)
- Azure region
- Test connection button
- Toggle spoken punctuation
- Toggle auto-capitalisation
- Text injection method (clipboard or SendInput)
- Start with Windows
- Backup & Sync — "Export Settings" and "Import Settings" buttons at the bottom of the dialog box to save or load JSON profiles.
- Log level
- Updates section — enable/disable automatic update checks; "Check now" button for an on-demand check
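Hotkey strings such as `ctrl+alt+d` or `f9` from the settings above have to be split into modifiers and a main key before they can be registered. The sketch below shows one way to validate them; `parse_hotkey` and the set of recognised modifiers are assumptions, not the project's `hotkey_manager.py`.

```python
# Assumed modifier names; the real application may accept a different set.
MODIFIERS = {"ctrl", "alt", "shift", "win"}


def parse_hotkey(combo: str) -> tuple[frozenset[str], str]:
    """Split a combination like 'ctrl+alt+d' into (modifiers, key)."""
    parts = [part.strip().lower() for part in combo.split("+")]
    if not all(parts):
        raise ValueError(f"empty key in hotkey: {combo!r}")
    modifiers = frozenset(part for part in parts if part in MODIFIERS)
    keys = [part for part in parts if part not in MODIFIERS]
    if len(keys) != 1:
        raise ValueError(f"hotkey needs exactly one non-modifier key: {combo!r}")
    return modifiers, keys[0]
```

Rejecting combinations without a main key (for example `ctrl+alt`) early gives the settings dialog a clear error to show instead of a silent registration failure.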
Transform your speech in real-time using Large Language Models.
- Install Ollama
- Download a model (e.g., `ollama run llama3`)
- In DictateAnywhere Settings → Polish, select `ollama` and pick your model.
- Get an API key from Google AI Studio
- In DictateAnywhere Settings → Cloud STT, paste your Gemini API key.
- In Settings → Polish, select `gemini` and choose a model (e.g., `gemini-flash-lite-latest`).
Use the Custom Prompt action to turn DictateAnywhere into a real-time translator or specialized assistant.
- Example Prompt: "Translate the following text to Spanish and output ONLY the translation."
- Example Prompt: "Format the following text as a SQL query."
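As a rough sketch of what a custom prompt request to a local Ollama server could look like, the example below joins the prompt and the dictated text and posts them to Ollama's `/api/generate` endpoint on its default port. How DictateAnywhere itself builds the request is not documented here, so the function names and the prompt layout are assumptions.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(model: str, custom_prompt: str, dictated_text: str) -> dict:
    """Combine the custom prompt and the dictated text into one request body."""
    return {
        "model": model,
        # Assumed layout: instruction first, then the text to transform.
        "prompt": f"{custom_prompt}\n\n{dictated_text}",
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def polish(model: str, custom_prompt: str, dictated_text: str, timeout: float = 30.0) -> str:
    """Send the text to a running Ollama server and return the rewritten text."""
    body = json.dumps(build_payload(model, custom_prompt, dictated_text)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"].strip()
```

Calling `polish("llama3", "Translate the following text to Spanish and output ONLY the translation.", "good morning")` requires Ollama to be running with that model already downloaded.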
DictateAnywhere supports Google's Gemini models for highly accurate, multi-language cloud transcription.
- Get a free-tier API key from Google AI Studio.
- Paste the key in the Cloud STT tab.
- Switch your Engine Mode to `gemini` in the Engine tab.
The free tier of Azure Speech gives you 5 hours of transcription per month at no cost.
- Create a free Azure account at azure.microsoft.com/free
- Create a Speech resource (Free tier F0)
- Copy your Key 1 and Region
- In DictateAnywhere Settings → Azure Cloud tab, paste the key and set the region
- The key is encrypted by Windows and never touches disk in plain text
For the best experience with Hindi, Malayalam, Tamil, etc., DictateAnywhere integrates with Sarvam AI.
- Get an API key from the Sarvam AI Dashboard.
- In DictateAnywhere Settings → Sarvam AI tab:
- Paste your API Key (stored securely in Windows Credential Manager).
- Select your preferred model (e.g., `saaras:v3`).
- Enable WebSocket Streaming for ultra-low latency real-time transcription.
- Switch the Engine Mode to `sarvam` in the Engine tab.
DictateAnywhere silently checks GitHub Releases at startup (after a 15-second delay) and shows a notification if a newer version is available.
- At most once per day — subsequent launches that day skip the check
- Three choices when a new version is found:
- Download — opens your browser to the GitHub release page
- Skip this version — suppresses notifications for that specific release (persisted to config)
- Remind me later — dismisses; will show again on the next daily check
- Manual check — Settings → Advanced → Updates → Check now
- Disable entirely — uncheck "Check for updates automatically"
Right-click the tray icon → Session History to open a log of every utterance dictated in the current run.
| Control | Action |
|---|---|
| Search box | Filter utterances by text |
| Copy | Copies selected entry to clipboard |
| Export | Saves full history as a .txt file |
| Clear | Wipes the history list |
History is in-memory only and resets on restart. Use Export to save anything you need to keep.
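The controls in the table map naturally onto a small in-memory structure. The class below is a hypothetical sketch, not the project's `history_window.py`; it assumes search is a case-insensitive substring match and that the export holds one utterance per line.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SessionHistory:
    """In-memory log of dictated utterances; nothing is persisted automatically."""

    entries: list[str] = field(default_factory=list)

    def add(self, utterance: str) -> None:
        self.entries.append(utterance)

    def search(self, query: str) -> list[str]:
        """Return the entries containing the query, ignoring case."""
        needle = query.casefold()
        return [entry for entry in self.entries if needle in entry.casefold()]

    def to_text(self) -> str:
        """Render the full history, one utterance per line."""
        return "".join(f"{entry}\n" for entry in self.entries)

    def export(self, path: Path) -> None:
        """Write the full history to a .txt file."""
        path.write_text(self.to_text(), encoding="utf-8")

    def clear(self) -> None:
        self.entries.clear()
```

Because the list lives only in the object, it disappears when the process exits, which matches the reset-on-restart behaviour described above.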
No Python required on the target machine after building.
scripts\build.bat
Output: `dist\DictateAnywhere\DictateAnywhere.exe`
# Activate the venv
.venv\Scripts\activate
# Run tests
scripts\test.bat
# Run in dev mode (console visible)
scripts\run_dev.bat
# Lint
pip install ruff
ruff check src\
DictateAnywhere/
├── src/dictateanywhere/
│ ├── main.py ← entry point & orchestrator
│ ├── __init__.py ← version string
│ ├── audio/
│ │ ├── capture.py ← mic input, RMS level callback, TimedCapture
│ │ └── vad.py ← WebRTC voice activity detection
│ ├── transcription/
│ │ ├── engine.py ← abstract STT base class
│ │ ├── local_engine.py ← faster-whisper (offline)
│ │ └── cloud_engine.py ← Azure Speech SDK (cloud)
│ ├── core/
│ │ ├── hotkey_manager.py ← global hotkey registration
│ │ ├── text_injector.py ← types text at cursor (clipboard / SendInput)
│ │ ├── punctuation.py ← spoken → symbol conversion
│ │ ├── corrections.py ← word correction rules (corrections.json)
│ │ └── updater.py ← GitHub Releases update checker
│ ├── ui/
│ │ ├── tray.py ← system tray icon (pystray)
│ │ ├── floating_widget.py ← draggable mic button + countdown ring
│ │ ├── preview_window.py ← transcription overlay + level meter
│ │ ├── history_window.py ← session history viewer
│ │ └── settings_window.py ← tabbed settings dialog (tkinter)
│ └── utils/
│ ├── config.py ← JSON config in %APPDATA%\DictateAnywhere\
│ └── secure_storage.py ← API key → Windows Credential Manager
├── tests/ ← pytest test suite
├── scripts/ ← install, run, build, test .bat files
├── assets/icons/ ← .ico / .png for tray and .exe
├── .github/workflows/ ← CI + release workflow
├── requirements.txt
├── pyproject.toml
└── README.md
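The layout above lists an abstract STT base class in `engine.py` with local and cloud implementations. The sketch below shows how a hybrid mode might combine two such engines. All class and method names are hypothetical, and falling back on an error or an empty result is an assumed policy, not a description of the real `main.py`.

```python
import logging
from abc import ABC, abstractmethod

logger = logging.getLogger(__name__)


class SpeechEngine(ABC):
    """Minimal interface a transcription backend would implement."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        """Return the recognised text for one chunk of recorded audio."""


class HybridEngine(SpeechEngine):
    """Try the primary (local) engine first, then fall back to the other one."""

    def __init__(self, primary: SpeechEngine, fallback: SpeechEngine) -> None:
        self.primary = primary
        self.fallback = fallback

    def transcribe(self, audio: bytes) -> str:
        try:
            text = self.primary.transcribe(audio)
        except Exception:  # broad on purpose: any local failure triggers the fallback
            logger.exception("primary engine failed, using fallback")
            text = ""
        # Assumption: an empty result also counts as a failure.
        return text if text.strip() else self.fallback.transcribe(audio)
```

Keeping the fallback decision in a wrapper means the individual engines do not need to know about each other.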
| File | Contents |
|---|---|
| config.json | All settings (hotkey, engine, UI prefs, update state) |
| corrections.json | Word correction rules |
| dictateanywhere.log | Rolling application log |
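For scripts that need to locate these files, the paths can be built from the `APPDATA` directory. This sketch assumes all three files sit together in `%APPDATA%\DictateAnywhere\`; the document states that location for the config and corrections files, while the log location is an assumption. The helper name `data_paths` is hypothetical.

```python
from pathlib import PureWindowsPath

APP_DIR_NAME = "DictateAnywhere"
FILE_NAMES = ("config.json", "corrections.json", "dictateanywhere.log")


def data_paths(appdata: str) -> dict[str, PureWindowsPath]:
    """Map each file name to its full path under the given APPDATA directory."""
    base = PureWindowsPath(appdata) / APP_DIR_NAME
    return {name: base / name for name in FILE_NAMES}
```

On Windows the argument would normally come from `os.environ["APPDATA"]`. `PureWindowsPath` is used so the example behaves the same on any platform.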
- Per-app language profiles (e.g. English for Word, French for LibreOffice)
- Multi-monitor floating widget awareness
- macOS / Linux support (pynput instead of pywin32)
- Real-time streaming transcription (Local & Sarvam WebSocket)
- GPU acceleration support (CUDA via CTranslate2)
- Noise floor auto-calibration
- Custom wake word to start recording hands-free
Pull requests are welcome! Please:
- Fork the repo and create a feature branch
- Run `scripts\test.bat` and make sure all tests pass
- Follow the existing code style (type hints, docstrings, no `print()`, use the `logger`)
- Open a pull request with a clear description
MIT — see LICENSE.
- faster-whisper — Optimised Whisper inference by SYSTRAN
- OpenAI Whisper — The original speech recognition model
- Azure Cognitive Services Speech
- pystray — System tray icon library
- webrtcvad — WebRTC voice activity detection
- sounddevice — PortAudio bindings for Python