feat(stt): add NVIDIA Canary STT engine support #360

coleleavitt · 2026-01-07T07:44:32Z

Summary

Add support for NVIDIA's Canary speech-to-text models via the NeMo toolkit.

Models Added

Model	ID	WER	Speed	Notes
Canary 1B v2	`multilang_canary_1b_v2`	4.89%	630x RTF	Default, 5x faster than Whisper
Canary Qwen 2.5B	`multilang_canary_qwen`	Better	Slower	Higher accuracy variant

Features

GPU acceleration (CUDA/ROCm) via NeMo
Automatic model download from HuggingFace
Translation support (s2t_translation task)
Punctuation restoration
Follows existing fasterwhisper_engine patterns
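
For reference, a minimal Python sketch of what transcription and translation through NeMo can look like. This is not the code in src/canary_engine.cpp, which is not shown here. The Hugging Face model id `nvidia/canary-1b-v2` and the `source_lang`/`target_lang` keyword arguments are assumptions based on the public model card and may differ between NeMo versions; the helper names are hypothetical.

```python
def is_translation(source_lang: str, target_lang: str) -> bool:
    """Differing source and target languages mean a translation request."""
    return source_lang != target_lang


def transcribe_files(wav_paths, source_lang="en", target_lang="en",
                     model_name="nvidia/canary-1b-v2"):
    """Transcribe (or translate) audio files with a Canary model.

    Assumption: model id and keyword arguments follow the model card.
    """
    # Imported lazily so this module still loads when NeMo is not installed.
    from nemo.collections.asr.models import ASRModel

    # Downloads the checkpoint on first use, then loads it from the local cache.
    model = ASRModel.from_pretrained(model_name=model_name)
    return model.transcribe(wav_paths,
                            source_lang=source_lang,
                            target_lang=target_lang)


if __name__ == "__main__":
    # Same language in and out is plain speech recognition.
    print(is_translation("en", "en"))
    # English audio with German output would be a translation request.
    print(is_translation("en", "de"))
```

Calling `transcribe_files(["sample.wav"])` requires NeMo and a downloaded model; `sample.wav` is a placeholder path.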

Files Changed

New: src/canary_engine.hpp, src/canary_engine.cpp
Modified: models_manager.h/cpp, speech_service.cpp, CMakeLists.txt, config/models.json

Requirements

pip install "nemo_toolkit[asr]"

Why Canary?

Per the Open ASR Leaderboard:

Canary 1B v2 achieves 4.89% WER (better than Whisper Large V3's 4.91%)
5x faster inference than Whisper (630x vs 126x real-time factor)
Native NVIDIA optimization for modern GPUs
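
The comparison above is simple arithmetic on the quoted figures. A small sketch, using only the numbers stated in this description (variable names are illustrative):

```python
# Figures quoted in this PR description, attributed to the Open ASR Leaderboard.
canary_speed = 630.0    # Canary 1B v2, "630x" real-time factor
whisper_speed = 126.0   # Whisper figure quoted above, "126x"
canary_wer = 4.89       # percent
whisper_wer = 4.91      # percent

# Ratio of the two speed figures.
speedup = canary_speed / whisper_speed
# Absolute WER difference in percentage points, rounded to avoid float noise.
wer_gap = round(whisper_wer - canary_wer, 2)

print(f"speedup: {speedup:.1f}x")         # speedup: 5.0x
print(f"WER gap: {wer_gap:.2f} points")   # WER gap: 0.02 points
```

The speed advantage is large, while the accuracy gap is 0.02 percentage points.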

Testing

Build tested on Linux with Qt dev tools
Runtime tested with NeMo toolkit installed
GPU acceleration verified

Add support for NVIDIA's Canary speech-to-text models via NeMo toolkit:
- Canary 1B v2: 4.89% WER, 630x RTF (5x faster than Whisper)
- Canary Qwen 2.5B: Higher accuracy variant for demanding use cases

Both models use NeMo's EncDecMultiTaskModel architecture with automatic model download via HuggingFace. Supports GPU acceleration (CUDA/ROCm), translation (s2t_translation), and punctuation restoration.

New files:
- src/canary_engine.hpp: Engine class definition
- src/canary_engine.cpp: NeMo Python integration via py_executor

Modified:
- models_manager.h/cpp: Add stt_canary engine type and feature flags
- speech_service.cpp: Engine instantiation and type checking
- CMakeLists.txt: Add canary_engine source files
- config/models.json: Add both Canary model entries

Requires: pip install nemo_toolkit[asr]

Check for nemo.collections.asr module availability at startup. This enables dsnote to automatically detect if NeMo is installed and show/hide Canary models accordingly in the UI.
- py_tools.hpp: Add nemo_asr to libs_availability_t
- py_tools.cpp: Add nemo.collections.asr import check
- speech_service.cpp: Map nemo_asr availability to stt_canary
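
The availability check described in this commit can be illustrated on the Python side. This is a sketch, not the code in py_tools.cpp: it uses `importlib.util.find_spec` rather than a full import, and the `libs_availability` dictionary is a hypothetical stand-in for the C++ `libs_availability_t` structure.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if the module can be located on the current Python path."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first; a missing parent raises
        # ModuleNotFoundError, which is a subclass of ImportError.
        return False


# Hypothetical availability table: the flag name mirrors the commit message.
libs_availability = {"nemo_asr": module_available("nemo.collections.asr")}

# The Canary engine would be offered only when the flag is set.
canary_enabled = libs_availability["nemo_asr"]
print("Canary models visible:", canary_enabled)
```

Unlike a full import, `find_spec` does not execute the target module itself, though it does import its parent packages.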

- Update CMakeLists.txt to use Qt6 instead of Qt5
- Update cmake/*.cmake files for Qt6 compatibility
- Replace deprecated Qt5 APIs with Qt6 equivalents:
  - QRegExp -> QRegularExpression
  - QX11Info -> QNativeInterface::QX11Application
  - QMediaPlayer::State -> QMediaPlayer::PlaybackState
  - QMediaPlayer::stateChanged -> playbackStateChanged
  - setMedia(QMediaContent) -> setSource(QUrl)
  - QAudioInput (recording) -> QAudioSource
  - QAudioDeviceInfo -> QAudioDevice + QMediaDevices
  - QAudioFormat::setSampleSize/setCodec -> setSampleFormat
  - QNetworkRequest::FollowRedirectsAttribute -> RedirectPolicyAttribute
- Remove Qt::AA_EnableHighDpiScaling (default in Qt6)
- Remove QTextCodec usage
- Remove QQuickStyle::availableStyles() (not in Qt6)
- Fix GCC 15 type strictness (std::clamp/max int vs qsizetype)
- Update qhotkey external project to build with Qt6

coleleavitt added 3 commits January 7, 2026 00:43
