Live Instant Subtitles & Real-time Multi-language Translation
DilMesh is a powerful desktop application that provides real-time speech-to-text and instant translation, capable of broadcasting subtitles to multiple windows simultaneously. It's designed for streamers, presenters, conference organizers, and anyone needing accessible, multilingual communication on the fly.
Choose from 5 built-in STT providers; switch between them live from Settings:
| Provider | Type | Description |
|---|---|---|
| Google Cloud (GCP) | Cloud | Industry-leading accuracy. 60 min free/month. Supports interim results, auto-punctuation, enhanced models, confidence threshold, and profanity filtering. |
| Deepgram | Cloud | Ultra-low latency with the Nova-3 model. Built-in multi-language auto-detection, speaker diarization, smart formatting, filler word removal, and keyword boosting. |
| NVIDIA Riva | GPU Server | Self-hosted GPU inference via gRPC. For organizations with on-premise AI infrastructure. |
| Sherpa-ONNX | Offline | Fully offline ASR. Includes the Omnilingual 1B model (1600+ languages), English Zipformer, Chinese/English bilingual, and Chinese Paraformer models. |
| Local Whisper | Offline | OpenAI Whisper running locally via HuggingFace Transformers. Supports Tiny, Base, Small, Medium, Large v3 Turbo, and quantized variants. No internet required. |
Translate subtitles into any target language in real time:
| Provider | Type | Description |
|---|---|---|
| Google Cloud Translation | Cloud | V2 Basic API; 500K characters free/month. |
| NLLB-200 | Offline | Meta's 200-language model, ONNX quantized (~800MB). Runs 100% offline after download. |
| Riva NMT | GPU Server | Neural machine translation on a self-hosted Riva server. |
| Disabled | None | Use DilMesh as a captions-only display without translation. |
- Open multiple independent subtitle windows simultaneously: one per audience group, language, or OBS scene.
- Each window is configured via a Preset and supports multiple Language Layers:
- Live Layer: Shows raw real-time captions as they are transcribed (no translation delay).
- Translation Layer: Receives finalized sentences, runs them through the translation pipeline, and displays them with CPS (Characters Per Second) pacing.
- Each layer has independent per-layer controls:
- Language selection (30+ languages)
- Font family, font size, text color
- X/Y position (percentage-based, fully flexible)
- Max lines limit
- Max width (in px), enabling side-by-side multi-language layouts on a single screen
- Text shadow toggle
- Create, edit, duplicate, and delete named presets for different subtitle configurations.
- Each preset stores its own language layers, background/chroma key color, vertical alignment, and display target.
- Presets are persisted on disk and survive app restarts.
- Live-editing: save a preset and instantly push the changes to the open window without restarting.
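As a rough sketch, a preset and its language layers could be modeled like this. The TypeScript types and field names below are illustrative assumptions based on the feature list above, not DilMesh's actual schema:

```typescript
// Illustrative types for a preset and its language layers.
// All field names here are assumptions, not DilMesh's real persisted format.
interface LanguageLayer {
  kind: "live" | "translation"; // live = raw captions, translation = CPS-paced output
  language: string;             // e.g. "en", "de", "tr"
  fontFamily: string;
  fontSize: number;             // px
  textColor: string;            // CSS color
  x: number;                    // horizontal position, percent of window width
  y: number;                    // vertical position, percent of window height
  maxLines: number;
  maxWidth?: number;            // px; enables side-by-side layouts
  textShadow: boolean;
}

interface Preset {
  name: string;
  layers: LanguageLayer[];
  backgroundColor: string;      // doubles as the chroma key color for OBS
  verticalAlign: "top" | "center" | "bottom";
  displayId?: number;           // target monitor; windowed mode if unset
}

// Example: English live captions beside a German translation layer.
const preset: Preset = {
  name: "Stage Left",
  layers: [
    { kind: "live", language: "en", fontFamily: "Inter", fontSize: 42,
      textColor: "#ffffff", x: 25, y: 85, maxLines: 2, maxWidth: 800, textShadow: true },
    { kind: "translation", language: "de", fontFamily: "Inter", fontSize: 42,
      textColor: "#ffd700", x: 75, y: 85, maxLines: 2, maxWidth: 800, textShadow: true },
  ],
  backgroundColor: "#00ff00",
  verticalAlign: "bottom",
};
```

In this shape, a side-by-side bilingual layout is simply two layers in one preset with different `x` positions and a shared `maxWidth`.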
- Detect all connected monitors automatically.
- Assign a preset to a specific display; opening it goes fullscreen on that display instantly.
- Windowed mode is also supported for flexible layouts.
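In an Electron app, the display descriptors would presumably come from `screen.getAllDisplays()`; the helper below is a pure sketch of the fallback logic only, with an assumed `DisplayInfo` shape:

```typescript
// Minimal shape of a detected monitor (assumed, modeled after Electron's Display).
interface DisplayInfo {
  id: number;
  bounds: { x: number; y: number; width: number; height: number };
}

// Pick the monitor a preset is pinned to, falling back to the primary display
// (assumed here to be the first entry) when the saved id is no longer connected.
function resolveTargetDisplay(
  displays: DisplayInfo[],
  presetDisplayId?: number,
): DisplayInfo {
  if (displays.length === 0) throw new Error("no displays detected");
  return displays.find((d) => d.id === presetDisplayId) ?? displays[0];
}
```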
- Integrated Silero VAD runs locally, only forwarding audio to the STT provider when speech is detected.
- Saves API costs, reduces noise, and improves transcription quality.
- Configurable sensitivity threshold and minimum silence duration sliders.
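A minimal sketch of the gating behavior, assuming per-frame speech probabilities as input (this is not Silero's actual API, just the forwarding logic): frames are forwarded while speech is detected and through short pauses, and the gate closes only after silence exceeds the configured minimum duration.

```typescript
// VAD gating sketch: decide, per audio frame, whether to forward it to the
// STT provider. speechProbs holds one speech probability per frame.
function gateFrames(
  speechProbs: number[],
  threshold = 0.5,       // sensitivity threshold
  minSilenceFrames = 10, // minimum silence duration, in frames
): boolean[] {
  const forward: boolean[] = [];
  let silentRun = Infinity; // frames since last speech; start with the gate closed
  for (const p of speechProbs) {
    if (p >= threshold) silentRun = 0;
    else silentRun++;
    // Keep forwarding through short pauses so word endings are not clipped.
    forward.push(silentRun <= minSilenceFrames);
  }
  return forward;
}
```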
- Subtitle sentences are detected via punctuation-based clause detection; no waiting for silence.
- Configurable sentence-split characters (`.`, `!`, `?`, `…`, `,`, `;`, `:`).
- Detected clauses are sent for translation immediately, giving near-zero perceptible delay.
- CPS Queue Player: displays translated subtitles at a configurable reading speed (default: 17 CPS, the Netflix standard). Queue depth is also configurable.
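The clause detection and CPS pacing described above can be sketched as two small functions. The default split set mirrors the configurable characters listed; the function names are illustrative, not DilMesh's internals:

```typescript
// Punctuation-based clause detection: emit a clause as soon as a split
// character appears, rather than waiting for a pause in the audio.
const SPLIT_CHARS = new Set([".", "!", "?", "…", ",", ";", ":"]);

function splitClauses(text: string, splitChars = SPLIT_CHARS): string[] {
  const clauses: string[] = [];
  let current = "";
  for (const ch of text) {
    current += ch;
    if (splitChars.has(ch)) {
      const clause = current.trim();
      if (clause) clauses.push(clause);
      current = "";
    }
  }
  const rest = current.trim();
  if (rest) clauses.push(rest); // keep any trailing, unterminated fragment
  return clauses;
}

// CPS pacing: at 17 characters per second, a clause's on-screen time
// is proportional to its length.
function displayMs(clause: string, cps = 17): number {
  return Math.round((clause.length / cps) * 1000);
}
```

A queue player would then pop clauses and hold each one on screen for `displayMs(clause)` before showing the next.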
- When using Deepgram with multiple recognition languages selected, DilMesh opens one parallel WebSocket stream per language.
- Best result across all streams is selected per utterance based on confidence scores and dominant language bias.
- Supports `is_final` and `speech_final` events for accurate sentence boundaries.
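A hedged sketch of the per-utterance arbitration: each parallel stream returns a transcript with a confidence score, and the currently dominant language gets a small bias so selection does not flip-flop mid-sentence. The types and bias value here are assumptions, not DilMesh's actual scoring:

```typescript
// One finalized result from one parallel recognition stream.
interface StreamResult {
  language: string;
  transcript: string;
  confidence: number; // 0..1 from the provider
}

// Pick the best result across streams, biasing toward the dominant language.
function pickBestResult(
  results: StreamResult[],
  dominantLanguage?: string,
  bias = 0.1, // illustrative bias magnitude
): StreamResult {
  if (results.length === 0) throw new Error("no results");
  const score = (r: StreamResult) =>
    r.confidence + (r.language === dominantLanguage ? bias : 0);
  return results.reduce((best, r) => (score(r) > score(best) ? r : best));
}
```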
- Available for both GCP and Deepgram providers.
- Masked words are fully removed from captions; they are not passed to translation APIs.
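The removal behavior can be sketched as a simple filter applied before the caption reaches the translation step. The function name and word list are illustrative only:

```typescript
// Drop masked words from a caption entirely (rather than bleeping them),
// so they never reach the translation API. Trailing punctuation on a word
// is ignored when matching.
function stripMaskedWords(caption: string, masked: Set<string>): string {
  return caption
    .split(/\s+/)
    .filter((w) => !masked.has(w.toLowerCase().replace(/[.,!?;:]+$/, "")))
    .join(" ");
}
```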
- Real-time audio level visualization with peak indicators, displayed in the dashboard header.
- Automatically uses the configured microphone device.
- Enumerate all available audio input devices.
- Select any specific microphone from the Settings panel.
- Minimize DilMesh to the system tray; subtitle windows and transcription continue running in the background.
- Click the tray icon to show/hide the dashboard.
50+ languages supported across providers, including English (US/UK/AU), Turkish, German, French, Spanish, Italian, Portuguese, Russian, Arabic, Japanese, Korean, Chinese, Hindi, and many more. Deepgram also supports Multi (auto-detect) mode.
30+ languages available for subtitle translation layers, grouped by region: Western Europe, Eastern Europe, Nordic, Baltic, Asia, Middle East, Africa. Includes Chinese Simplified, Traditional, and Cantonese variants.
Control presets, transcription, and open/close windows from a single place.
Broadcast subtitles to multiple windows or project them fullscreen on a specific display.
- Node.js v18 or higher
- pnpm (this project uses pnpm for package management)
- A supported STT provider API key (Google Cloud, Deepgram, etc.), or use offline providers with no key needed.
- Clone the repository:
  ```shell
  git clone https://github.com/antlionguard/dilmesh.git
  cd dilmesh
  ```
- Install dependencies:
  ```shell
  pnpm install
  ```
- Run in Development Mode:
  ```shell
  pnpm dev
  ```
To create a distributable application for your OS:
- macOS (DMG/App):
  ```shell
  pnpm build:mac
  ```
- Windows (NSIS/Portable):
  ```shell
  pnpm build:win
  ```
- Both platforms:
  ```shell
  pnpm build:all
  ```
- Create a project in Google Cloud Console.
- Enable Cloud Speech-to-Text API and Cloud Translation API.
- Create a Service Account and download the JSON Key File.
- In DilMesh, go to Settings → API Integrations → Google Cloud.
- Paste the contents of your JSON key file.
Free Tier: Speech-to-Text is free for the first 60 min/month (Standard model). Translation V2 is free for the first 500K characters/month.
- Sign up at console.deepgram.com; you get $200 free credit on signup.
- Create an API key.
- In DilMesh, go to Settings → API Integrations → Deepgram and paste your key.
- Configure the model (Nova-3 recommended), language, and options in Settings → Speech-to-Text.
- Go to Settings → Speech-to-Text → Whisper.
- Download a model (Tiny to Large). Recommended: `Large V3 Turbo Q8` (~834MB) for the best accuracy/speed balance.
- Select the downloaded model and start transcription.
- Go to Settings → Speech-to-Text → Sherpa-ONNX.
- Download a model (e.g., Omnilingual 1B for 1600+ language support, or language-specific models).
- Select the downloaded model as active.
- Go to Settings → Translation → NLLB-200.
- Click Download (~800MB, one time).
- After download, NLLB is used automatically for all translation layers when selected.
- Set up a Riva Server (requires an NVIDIA GPU on the server side).
- In DilMesh, go to Settings → API Integrations → NVIDIA Riva.
- Enter your server URL (e.g., `localhost:50051`) and SSL settings if needed.
```
Microphone → Silero VAD → Active STT Provider
                    │
           Punctuation Detector
        (real-time clause detection)
                    │
        ┌───────────┴────────────────┐
        │                            │
Live Caption Layers         Translation Layers
(instant, no delay)    (clause → Translation API)
        │                            │
 Projection Window           CPS Queue Player
(per-layer display)      (timed display pacing)
```
Each projection window can have any number of language layers stacked on top of each other or placed side-by-side, all fed from the same single audio input.
We love contributions, whether that's fixing bugs, adding new languages, implementing new STT/translation providers, or improving the UI.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
If you find this project useful, you can support its development!