Skip to content

feat: Windows support — WASAPI audio, Parakeet, mouse tracker, GPU model#40

Open
sk8ersquare wants to merge 9 commits intopoodle64:mainfrom
sk8ersquare:pr/windows-support
Open

feat: Windows support — WASAPI audio, Parakeet, mouse tracker, GPU model#40
sk8ersquare wants to merge 9 commits intopoodle64:mainfrom
sk8ersquare:pr/windows-support

Conversation

@sk8ersquare
Copy link
Copy Markdown

Full Windows x86_64 support. Tested on Windows 11 with NVIDIA GPU.

What's included

CI: Windows build target

  • Added windows-latest to the release matrix
  • Windows MSI 255-per-component version limit: CI patches 2026.x.y26.x.y for the Windows build only (Tauri's NSIS installer doesn't have this limit; only MSI does)
  • Removes invalid tauri.conf.json fields that caused Windows builds to fail

Audio capture: WASAPI sample format handling

Windows WASAPI devices commonly report i16 or i32 as their native sample format. The recorder was always requesting f32 from cpal, causing the audio stream to fail or produce silence on many Windows devices.

Fixed: detects the native SampleFormat and converts i16/i32/u16f32 in the callback. Same pattern used by Handy.

Parakeet (Sherpa-ONNX) on Windows

Parakeet's Sherpa-ONNX static archive works on Windows (unlike Linux where the archive lacks .a files). Enabled via --features parakeet in Windows CI.

Fixes required:

  • default-features = false on sherpa-rs (the tts default feature conflicts with static + download-binaries)
  • .cargo/config.toml: /DEFAULTLIB:advapi32.lib — ONNX Runtime (embedded in sherpa-onnx) requires advapi32 for ETW telemetry symbols; espeak-ng needs it for registry access. sherpa-rs-sys doesn't declare this dependency.

Mouse tracker: Windows cursor following

The recording indicator (cursor-dot style) was always showing at the fallback position on Windows because get_mouse_position() had no Windows implementation (only macOS/Linux). After 90 failed polls (~1.4s) it moved off-screen.

Fixed: added Win32 GetCursorPos + GetDeviceCaps for DPI-aware logical pixel coordinates.

Platform-aware permissions UI

Overview pane previously showed macOS-only Accessibility and Input Monitoring rows on all platforms. Windows users saw permission steps they couldn't complete.

Fixed:

  • Accessibility (step 3) hidden behind {#if isMacos}
  • allRequiredDone only requires Model + Microphone on non-macOS
  • check_model_downloaded passes undefined instead of null (Tauri v2 serialization)

GPU model

Manifest includes parakeet-tdt-0.6b-v2-fp16 (FP16 ONNX). On Windows, users with NVIDIA GPUs and CUDA installed can download this model for ~3-4x faster transcription. Falls back to CPU automatically.

Tested

  • ✅ Windows 11, NVIDIA RTX, Whisper Small (CPU)
  • ✅ Overview pane loads correctly (2-step setup: model + mic)
  • ✅ Parakeet downloadable and functional
  • ✅ Recording indicator follows cursor

Not included (future work)

  • DirectML GPU acceleration (works on AMD/Intel/NVIDIA without CUDA — Handy's approach with ort-directml)
  • Windows tray icon improvements

Eve added 9 commits March 23, 2026 16:42
- windows-latest runner with --no-default-features (Whisper only)
- Install LLVM via choco (required for whisper-rs clang bindings)
- LIBCLANG_PATH set for Windows runner
- Config file permissions already correctly guarded with #[cfg(unix)]
Windows MSI/NSIS requires each version component to be <= 255.
Year-based version 2026.x.x breaks this constraint.
Set bundle.windows.version to 26.2.11.0 (century-stripped) for
Windows packaging while keeping the full version everywhere else.
…ields

- CI: strip century from version before Windows build (2026.x -> 26.x)
- Remove invalid bundle.windows.version and nsis.displayVersion from config
- Tauri 2.0 doesn't support those fields, version comes from Cargo.toml
Windows GPU:
- Add parakeet-tdt-0.6b-v2-fp16 to manifest.json (FP16 CUDA-optimised model)
- parakeet.rs: auto-detect fp16 vs int8 model files; try CUDA provider
  first on Windows when parakeet-cuda feature enabled, fall back to CPU
- Add nemo_transducer_fp16 model type to manifest.rs is_backend_available()
- Add parakeet-cuda Cargo feature for NVIDIA GPU Windows builds
- Add tauri-plugin-os for runtime platform detection

Platform-aware permissions UI (OverviewPane):
- Accessibility permission card: macOS only
- Input Monitoring permission card: macOS only
- Show in Dock toggle: macOS only
- Troubleshooting / TCC reset section: macOS only
- allPermissionsGranted derived: only requires Accessibility + Input
  Monitoring on macOS; Windows/Linux only require Microphone

macOS and Linux builds are 100% unchanged — all guards are additive
cfg/runtime checks that default to the existing behaviour.
… (v2026.3.2)

Three Windows bugs fixed:

1. Audio capture — native sample format handling (WASAPI)
   Windows WASAPI devices commonly report i16 or i32 as their native format.
   The recorder was always requesting f32 from cpal, which either failed or
   produced silent recordings when the device's native format didn't match.
   Now detects the native SampleFormat and converts i16/i32/u16 → f32 in
   the callback, so the rest of the pipeline always sees correct f32 audio.
   This fixes 'records for 2 seconds then stops' on Whisper Small.

2. Parakeet backend enabled on Windows
   CI was building Windows with --no-default-features, stripping the parakeet
   feature. Sherpa-ONNX static linking works on Windows (unlike Linux where
   the archive lacks .a files). Updated CI to --features parakeet and updated
   Cargo.toml to enable whisper-rs CPU backend (default features) on Windows.
   Parakeet models now show as downloadable and functional on Windows.

3. Overview setup flow — Windows/Linux
   - Accessibility (step 3) is hidden on non-macOS; not required there
   - allRequiredDone only requires model + mic on non-macOS
   - check_model_downloaded called with undefined instead of null to avoid
     Tauri v2 Option<String> deserialization edge case
The recording indicator was not following the cursor on Windows because
get_mouse_position() always returned None (only macOS/Linux were implemented).

Result: indicator appeared at bottom-centre fallback, then after ~1.4s
(90 failed polls × 16ms) the failure threshold hit and moved it off-screen.

Fix: add Windows implementation using GetCursorPos + GetDeviceCaps for DPI-
aware logical coordinate conversion. Matches how Handy's rdev dependency
handles Windows cursor input.
…wnload-binaries)

sherpa-rs-sys 0.6.8 build script errors when tts+static+download-binaries
are all enabled. The tts feature is enabled by default; adding
default-features = false resolves the conflict.
ONNX Runtime (embedded in sherpa-onnx static archive) requires advapi32.lib
for ETW telemetry symbols (EventWriteTransfer, EventRegister, etc.) and
espeak-ng needs it for registry access (RegOpenKeyExA, RegQueryValueExA).
sherpa-rs-sys doesn't add this automatically; adding it via build.rs.
… propagate to deps)

build.rs rustc-link-lib only affects the top-level crate's link step, not
transitive deps like sherpa-rs. Using .cargo/config.toml rustflags with
/DEFAULTLIB:advapi32.lib ensures the lib is available when link.exe
resolves sherpa-onnx's ETW and registry symbols.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant