feat(voice): local voice dictation via Foundry Local + Nemotron streaming by HashwanthVen · Pull Request #385 · ianphil/chamber

HashwanthVen · 2026-06-10T23:45:14Z

Voice dictation: speak into Chamber locally (Nemotron + Foundry Local)

Summary

Adds local-first voice dictation to Chamber. Users hold a renderer-only shortcut (Alt+Shift+V) or click a mic button in the chat input to speak; transcripts insert at the caret of the active single-agent chat input. Never auto-sends — user always reviews before pressing Send.
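The caret insertion described above can be illustrated with a small pure function. This is only a sketch: `insertAtCaret` and `CaretEdit` are hypothetical names, and replacing a non-empty selection with the transcript is an assumption, not something the PR text states.

```typescript
// Hypothetical helper for illustration; not the code in ChatInput.tsx.
interface CaretEdit {
  value: string; // textarea value after the transcript is inserted
  caret: number; // caret position immediately after the inserted text
}

function insertAtCaret(
  value: string,
  selectionStart: number,
  selectionEnd: number,
  transcript: string,
): CaretEdit {
  // Clamp so a stale selection cannot index past the current value.
  const start = Math.max(0, Math.min(selectionStart, value.length));
  const end = Math.max(start, Math.min(selectionEnd, value.length));
  // Assumption: a selected range is replaced by the transcript.
  const next = value.slice(0, start) + transcript + value.slice(end);
  return { value: next, caret: start + transcript.length };
}
```

Because the function only computes the next value, the caller still decides when to submit, which matches the never-auto-send behavior.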

Built on the same foundry-local-sdk@1.1.0 + Nemotron streaming model that the official GitHub Copilot CLI /voice uses, so the engineering pattern is upstream-validated.

Closes the gap in the PRD at VOICE_DICTATION_PRD.md (planning doc lives outside the repo).

Highlights

Local-only ASR. nemotron-speech-streaming-en-0.6b runs in a worker_threads.Worker via Foundry Local SDK. Audio never leaves the device. Model auto-downloads on first use (~696 MB CPU variant).
Push-to-talk keyboard shortcut, renderer-only (no globalShortcut — explicitly out of scope for Phase 1). Default Alt+Shift+V. Configurable hold-vs-toggle behavior.
Mic button in ChatInput with visible "Listening…" indicator. Hidden when feature flag is off; disabled with tooltip until the model is Ready.
Settings → Voice dictation section with six rows: device select, permission state + "Open preferences", Test mic readiness check, shortcut display, keyboard-shortcut-behavior toggle, model row with live download progress.
Feature-flagged behind voiceDictation (default off on stable, on for insiders + dev). Cleanly invisible when off.
Service-boundary clean. packages/services/** never imports electron or foundry-local-sdk; depcruise enforces the rule (new foundry-local-sdk-only-in-voice-workers rule added).
Pinned runtime. chamber-voice-runtime/ mirrors the existing chamber-copilot-runtime/ pattern. foundry-local-sdk@1.1.0 pinned exactly. forge.config.ts adds the SDK to asar.unpack for native prebuilds.
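
The hold-vs-toggle shortcut behavior can be sketched as a pure decision function. This is a minimal illustration under stated assumptions, not the logic in useVoiceDictation: `KeyLike`, `PttMode`, and `pttAction` are hypothetical names, and matching the V key through `code === "KeyV"` is assumed.

```typescript
// Hypothetical shapes; KeyLike mirrors the KeyboardEvent fields the sketch needs.
interface KeyLike {
  type: "keydown" | "keyup";
  code: string;
  altKey: boolean;
  shiftKey: boolean;
  repeat: boolean;
  isComposing: boolean;
}
type PttMode = "hold" | "toggle";
type PttAction = "start" | "stop" | "none";

function pttAction(mode: PttMode, listening: boolean, e: KeyLike): PttAction {
  // Ignore everything during IME composition and for keys other than V.
  if (e.isComposing || e.code !== "KeyV") return "none";
  if (e.type === "keydown") {
    // Auto-repeat while the key is held must not retrigger the shortcut.
    if (e.repeat || !e.altKey || !e.shiftKey) return "none";
    if (mode === "hold") return listening ? "none" : "start";
    return listening ? "stop" : "start";
  }
  // keyup: in hold mode, releasing V stops dictation even if the
  // modifiers were released first; toggle mode ignores keyup.
  return mode === "hold" && listening ? "stop" : "none";
}
```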

Architecture

Renderer (apps/web)
  ChatInput mic button + push-to-talk key handler
        ↓
  useVoiceDictation hook
        ↓  getUserMedia + AudioWorklet → PCM16 16 kHz mono frames
  window.electronAPI.voice.*  (preload contextBridge)
        ↓  IPC (IPC.VOICE.*)
Main (apps/desktop)
  setupVoiceIPC adapter  →  VoiceDictationService  →  VoiceWorkerPool
        ↓  worker_threads boundary
apps/desktop/src/main/voiceWorker/voiceWorker.ts
  FoundryLocalManager + LiveAudioTranscriptionSession (single worker; Foundry Core is a process-singleton)
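
The renderer step that produces PCM16 frames can be sketched as follows. This is an illustrative encoder only: `encodePcm16` is a hypothetical name, the scaling convention is a common choice rather than the one confirmed by the PR, and resampling to 16 kHz mono is assumed to happen before this step.

```typescript
// Hypothetical encoder; converts Web Audio float samples in [-1, 1]
// to signed 16-bit integers.
function encodePcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input saturates instead of wrapping.
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // Negative samples scale by 32768, positive samples by 32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}
```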

Files changed

New:

packages/shared/src/voice-types.ts + test — config, IPC contracts, RPC verbs, payload size guard
packages/services/src/voice/ — VoiceDictationStore, VoiceDictationService, VoiceWorkerPool, FakeTranscriptionProvider, FoundryTranscriptionProvider, permissions/provider interfaces, all with colocated tests
apps/desktop/src/main/voice/PermissionInspector.ts + test — wraps systemPreferences.getMediaAccessStatus cross-platform
apps/desktop/src/main/voiceWorker/voiceWorker.ts + test — single worker handling both engine + installer verbs
apps/desktop/src/main/voiceWorker/workerProtocol.ts — shared RPC envelope
apps/desktop/src/main/ipc/voice.ts + test — thin IPC adapter with sender ownership + chunk size validation
apps/web/src/renderer/hooks/useVoiceDictation.ts + test — session lifecycle, sessionId routing, PTT, IME suppression, cleanup
apps/web/src/renderer/lib/audio/ — PCM16 encoder, AudioWorklet processor, captureMic helper
apps/web/src/renderer/components/settings/VoiceDictationSettingsSection.tsx + test
chamber-voice-runtime/ — pinned foundry-local-sdk@1.1.0
scripts/prepare-voice-runtime.js, scripts/check-voice-sdk-version.js, scripts/voice-runtime-smoke.js
apps/desktop/vite.voiceWorker.config.ts
tests/e2e/electron/voice-dictation-smoke.spec.ts

Modified (surgical):

packages/shared/src/ipc-channels.ts — IPC.VOICE.* + IPC.E2E.VOICE_*
packages/shared/src/electron-types.ts (+ test) — voice namespace
packages/shared/src/feature-flags.ts (+ test) — voiceDictation flag
apps/desktop/src/main.ts — wire VoiceDictationService when feature flag is on; honor CHAMBER_E2E_VOICE_FAKE=1 for deterministic tests
apps/desktop/src/preload.ts — expose voice namespace
apps/desktop/src/main/devFeatureFlags.ts — enable in dev
apps/desktop/src/main/security/sessionSecurity.ts (+ test) — allow media permission for Chamber renderer origins only
apps/web/src/renderer/components/chat/ChatInput.tsx (+ test) — mic button + PTT, never auto-submits
apps/web/src/renderer/components/settings/SettingsView.tsx (+ test) — gate VoiceDictationSettingsSection behind flag
forge.config.ts — asar.unpack for foundry-local-sdk native prebuilds; new build entry for voiceWorker.ts
config/dependency-cruiser.cjs — enforces foundry-local-sdk import is allowed only from the worker file
package.json / package-lock.json — foundry-local-sdk@1.1.0 pinned in dependencies + overrides + new smoke:voice-runtime script

Validation

npm run typecheck ✅
npm run lint ✅ (tsc + eslint + deps:check 547 modules / 0 boundary violations + yaml + md)
65+ voice unit + RTL tests ✅
npm run smoke:voice-runtime ✅ (new — exercises the compiled worker bundle against a mocked SDK; would have caught the singleton bug)
npm run smoke:desktop -- tests/e2e/electron/voice-dictation-smoke.spec.ts: 8 passed / 3 skipped / 0 failed. The three skipped scenarios are marked test.fixme() with an explanation. Two (review-before-send and enter-still-submits) need to spy on chat.send, which is not possible because contextBridge freezes electronAPI; that path is covered by chat-mic-inserts-sentinel, which verifies the visible textarea. The third is blocked by the same contextBridge limitation.
chamber-ui-tester driven against the real Foundry path on Windows: Test mic readiness ✅, model download with live progress ✅, chat mic listening state ✅. Edge cases (re-entrant start during stop) found two bugs that are now fixed in this branch — see 94dda24 and the voice-rt-fix commit chain.

Privacy + security

Audio never persisted, never logged, never sent over the network. keytar is not used (no secrets in the voice path).
Microphone permission gated by sessionSecurity allowlist — only Chamber's own renderer origin can request media; foreign origins denied.
worker_threads isolation between Foundry SDK and main process.
webPreferences (contextIsolation: true, nodeIntegration: false) unchanged.
All renderer ↔ main calls go through IPC.VOICE.* constants with zod schema validation and payload size limits (VOICE_MAX_APPEND_CHUNK_BYTES).
IPC sessions are owner-scoped: appendAudio/endSession reject from a non-owner WebContents.
gpt-5.5 security review: APPROVE, no findings.
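
The owner-scoping and payload-size checks listed above can be sketched like this. It is a simplified illustration, not the adapter in apps/desktop/src/main/ipc/voice.ts: `SessionOwners` and `validateAppend` are hypothetical names, the 64 KiB limit is an assumed stand-in for VOICE_MAX_APPEND_CHUNK_BYTES (whose real value is not given here), and zod validation is omitted.

```typescript
// Assumed limit for illustration only.
const MAX_APPEND_CHUNK_BYTES = 64 * 1024;

// Records which WebContents id started each session.
class SessionOwners {
  private readonly owners = new Map<string, number>();

  register(sessionId: string, webContentsId: number): void {
    if (this.owners.has(sessionId)) {
      throw new Error(`session ${sessionId} is already registered`);
    }
    this.owners.set(sessionId, webContentsId);
  }

  assertOwner(sessionId: string, senderId: number): void {
    // Unknown sessions fail the same way as foreign senders.
    if (this.owners.get(sessionId) !== senderId) {
      throw new Error(`sender ${senderId} does not own session ${sessionId}`);
    }
  }

  release(sessionId: string): void {
    this.owners.delete(sessionId);
  }
}

// Rejects chunks from non-owners and chunks that are empty or oversized.
function validateAppend(
  owners: SessionOwners,
  senderId: number,
  sessionId: string,
  chunk: Uint8Array,
): void {
  owners.assertOwner(sessionId, senderId);
  if (chunk.byteLength === 0 || chunk.byteLength > MAX_APPEND_CHUNK_BYTES) {
    throw new Error(`invalid chunk size: ${chunk.byteLength} bytes`);
  }
}
```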

Not in scope (Phase 2 candidates)

Streaming partial transcripts inserted live (currently only finals; partials shown as transient "Listening…")
Chatroom voice support (only single-agent chat)
Multiple model picker / cloud transcription
Voice activity detection / auto-stop on silence
System-global keyboard shortcut (Electron.globalShortcut)
Always-on wake word
Text-to-speech for agent responses
Click-to-rebind shortcut UI

Drift / mergeability

Branch is 31 commits ahead, 0 commits behind ianphil/chamber:master. Base point is 8834f0b (verified via gh api repos/ianphil/chamber/branches/master). Clean fast-forward / no-op rebase.

How to try

Pull this branch, npm install, npm start.
Settings → Voice dictation → click Download (~696 MB first time via Foundry Local).
Activate a mind, click the mic button in the chat input, speak, click to stop, edit transcript, press Enter or click Send.
Push-to-talk: hold Alt+Shift+V with the textarea focused.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- engineWorker maps internal end verb to session.stop()+dispose()
- installerWorker handles model download/runtime install; cancel via worker terminate
- FoundryTranscriptionProvider wraps VoiceWorkerPool with sessionId filtering
- service skeleton cleanup

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- mic button hidden unless voiceDictation flag on; disabled until model ready
- final transcripts inserted at caret via existing controlled value path; never auto-submits
- preserves Enter-to-send, Escape-cancel, draft persistence, IME, shortcode popover behaviors
- guards Promise.resolve() around mock-returned voice.start/stop to satisfy strict event handler

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- PermissionInspector concrete impl in app layer (preserves no-electron-in-services boundary)
- cross-platform table covers darwin/win32/linux
- sessionSecurity allows media permission only for Chamber renderer origins

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…sh-to-talk

- mints UUID per session; drops transcript events for unknown sessionId
- push-to-talk and toggle modes; suppressed during IME composition
- cleanup on unmount/session-stop/flag-off (tracks, AudioContext, IPC listeners)
- browserApi/test helpers extended for renderer fakes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
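
Dropping transcript events for an unknown sessionId might look like the sketch below. The names `TranscriptEvent` and `createTranscriptRouter` are hypothetical, and the event shape is assumed rather than taken from voice-types.ts.

```typescript
// Assumed event shape for illustration.
interface TranscriptEvent {
  sessionId: string;
  text: string;
  isFinal: boolean;
}

function createTranscriptRouter(onFinal: (text: string) => void) {
  let active: string | null = null;
  return {
    begin(id: string): void {
      active = id;
    },
    end(): void {
      active = null;
    },
    // Returns false when the event was dropped.
    handle(event: TranscriptEvent): boolean {
      // Events from a previous or foreign session never reach the input.
      if (event.sessionId !== active) return false;
      // Only finals are forwarded; partials are accepted but not inserted.
      if (event.isFinal) onFinal(event.text);
      return true;
    },
  };
}
```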

- VoiceDictationStore + Foundry/Fake provider + VoiceWorkerPool constructed only when voiceDictation flag is enabled
- CHAMBER_E2E_VOICE_FAKE=1 selects FakeTranscriptionProvider
- preload exposes window.electronAPI.voice surface with sessionId-keyed channels

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ooks

- voice-dictation-smoke.spec.ts drives Settings + chat mic flows
- e2e:voice:setPermissionState / setModelStatus / getSessionState IPC hooks
- VoiceDictationService.startSession now accepts {sessionId,deviceId,modelId} object (legacy positional still works)
- Adjust voice IPC adapter to forward request object; update mock service port
- Fix FeatureFlagService.test expectation for new voiceDictation flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Blocking:

- VoiceWorkerPool: serialize per-role RPCs via FIFO queue (fast-path for first send keeps postMessage synchronous)
- useVoiceDictation: tear down session when options.enabled flips false mid-listen
- appendFrame: await each chunk so backpressure propagates from worklet; stop session if append throws
- stop(): always call endSession even if mic cleanup rejects (no leaked worker sessions)

Non-blocking:

- voice IPC: prune transcript targets on WebContents destroyed; isDestroyed guard before send; auto-cleanup empty target sets
- voice IPC: assert sender ownership on appendAudio/endSession; record sessionId -> WebContents.id at startSession
- depcheck: enforce foundry-local-sdk import only from apps/desktop/src/main/voiceWorker/{engineWorker,installerWorker}.ts
- voice.test.ts: extend MockWebContents with id/once/isDestroyed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
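
The FIFO serialization with a synchronous fast path can be sketched as a promise chain. This is a generic illustration, not the VoiceWorkerPool implementation: `SerialQueue` and `run` are hypothetical names, and the `send` callback stands in for whatever posts a message to the worker and awaits its reply.

```typescript
// Hypothetical queue: runs async sends one at a time, in call order.
class SerialQueue {
  private pending = 0;
  private tail: Promise<void> = Promise.resolve();

  run<T>(send: () => Promise<T>): Promise<T> {
    // Convert a synchronous throw into a rejection so the queue keeps moving.
    const invoke = (): Promise<T> => {
      try {
        return send();
      } catch (err) {
        return Promise.reject(err);
      }
    };
    // Fast path: nothing in flight, so send() is invoked synchronously.
    // Otherwise wait for every earlier call to settle first.
    const result = this.pending === 0 ? invoke() : this.tail.then(invoke);
    this.pending++;
    const settle = (): void => {
      this.pending--;
    };
    // tail always fulfills, so one failed RPC does not block later ones.
    this.tail = result.then(settle, settle);
    return result;
  }
}
```

Keeping the first send synchronous means an idle queue adds no extra microtask hop before the message is posted.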

- review-before-send + enter-still-submits fixme'd: contextBridge freezes electronAPI so chat.send cannot be spied from inside the page. Covered by chat-mic-inserts-sentinel (visible textarea verification) and existing chatroom/byo-llm Playwright specs.
- model-not-downloaded-cta: pass expectMicEnabled:false through activateMind

Final: 8 passed, 3 skipped, 0 failed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Forge Vite plugin builds main-target entries flat into .vite/build/, so the workers must be listed alongside main.ts/preload.ts entries to be compiled. Without this, runtime crashes with "Voice engine/installer worker is not started" because main.ts tries to spawn engineWorker.js / installerWorker.js that were never emitted by Vite.

- new apps/desktop/vite.voiceWorker.config.ts (foundry-local-sdk external)
- forge.config.ts: add engineWorker.ts + installerWorker.ts to build entries
- main.ts: resolve workers from __dirname (flat layout), not voiceWorker/

Missed by validation: unit tests mock worker_threads + foundry-local-sdk; Playwright UAT uses CHAMBER_E2E_VOICE_FAKE=1 fake provider. Neither exercised the real worker bundle path. Manual real-mic QA surfaced the gap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Foundry Local Core is process-singleton across worker_threads, so route engine and installer RPCs through one Vite-built voice worker and one physical VoiceWorkerPool worker. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Subscribe to worker modelProgress events during downloads, forward typed percent updates through the existing IPC progress channel, and render percent-based progress in Settings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The previous stop() cleared sessionIdRef immediately and only then awaited capture.stop() + endSession(). A second mic click during that await chain slipped past start()'s only gate (sessionIdRef.current) and fired a new startSession while the FoundryTranscriptionProvider on main was still ending, producing 'Foundry transcription provider is already started' and leaving the chat mic button stuck in listening state.

Fix:

- Add 'starting' and 'stopping' to VoiceDictationState (idle/starting/listening/stopping/error).
- stop() now sets state='stopping' first and keeps sessionIdRef populated until cleanup completes.
- start() rejects when stateRef is not idle/error, not just when sessionIdRef is null.
- New stateRef mirrors state for synchronous gating without callback rebinding.

Caught by chamber-ui-tester driving the real Foundry path; previous Playwright UAT used the fake provider and missed it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
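
The re-entrancy fix can be sketched outside React as a small state machine. This is an illustration of the gating idea only: `DictationGate` and `SessionBackend` are hypothetical names, a counter replaces the per-session UUID, and a refused start returns false here, which may differ from how the hook reports it.

```typescript
type VoiceDictationState = "idle" | "starting" | "listening" | "stopping" | "error";

// Hypothetical stand-in for the IPC calls into main.
interface SessionBackend {
  startSession(id: string): Promise<void>;
  endSession(id: string): Promise<void>;
}

class DictationGate {
  private state: VoiceDictationState = "idle";
  private sessionId: string | null = null;
  private nextId = 0;
  private readonly backend: SessionBackend;

  constructor(backend: SessionBackend) {
    this.backend = backend;
  }

  current(): VoiceDictationState {
    return this.state;
  }

  async start(): Promise<boolean> {
    // Gate on state, not on sessionId: an in-flight stop() must block new starts.
    if (this.state !== "idle" && this.state !== "error") return false;
    this.state = "starting";
    const id = `session-${++this.nextId}`;
    try {
      await this.backend.startSession(id);
      this.sessionId = id;
      this.state = "listening";
      return true;
    } catch {
      this.state = "error";
      return false;
    }
  }

  async stop(): Promise<void> {
    if (this.state !== "listening" || this.sessionId === null) return;
    // Set the state first and keep sessionId until cleanup has finished.
    this.state = "stopping";
    try {
      await this.backend.endSession(this.sessionId);
    } finally {
      this.sessionId = null;
      this.state = "idle";
    }
  }
}
```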

HashwanthVen and others added 30 commits June 9, 2026 15:42

feat(voice): add shared dictation contracts

8fcaf8c

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat(voice): add dictation config store

91b8657

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat(voice): add transcription provider contracts

32ce4e8

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat(voice): add fake transcription provider

d5878de

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat(voice): add worker pool supervisor

2199ce1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat(voice): add dictation service skeleton

a12279a

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat(voice): add packaged voice runtime

d6b00fb

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Wire voice dictation settings view

8dcdc06

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add renderer audio capture utilities

444d88b

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add voice dictation IPC adapter

4098e81

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat(voice): add Voice dictation settings section (six rows)

eb0c27e

Input device, mic permission, Test mic, shortcut, push-to-talk, model row with download progress. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

chore(voice): tighten IPC adapter type assertion and arg normalization

89b1a5b

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test(voice): add runtime worker smoke

1b04878

Build the Vite voice worker bundle, run it in a real worker_thread with a deterministic Foundry stub, and assert download progress plus selectModel succeed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(voice): recover stale Foundry audio streams

2c47530

Keep worker session state clear until start succeeds, dispose failed starts, and retry once after stopping a stale native audio-stream handle reported by Foundry. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
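
The retry-once recovery described in this commit might be structured like the sketch below. It does not use the real Foundry Local SDK API: `StreamSession`, `startWithStaleRetry`, and the `isStale` predicate are hypothetical, and how a stale handle is actually detected is not stated in the commit message.

```typescript
// Hypothetical session shape; not the SDK's LiveAudioTranscriptionSession type.
interface StreamSession {
  start(): Promise<void>;
  stop(): Promise<void>;
  dispose(): Promise<void>;
}

async function startWithStaleRetry(
  create: () => StreamSession,
  isStale: (err: unknown) => boolean,
): Promise<StreamSession> {
  for (let attempt = 0; attempt < 2; attempt++) {
    const session = create();
    try {
      await session.start();
      // Only a successfully started session is returned to the caller,
      // so the caller's session state stays clear until start succeeds.
      return session;
    } catch (err) {
      const retry = attempt === 0 && isStale(err);
      // Stop the stale native handle before retrying; ignore stop errors.
      if (retry) await session.stop().catch(() => undefined);
      // Failed starts are always disposed.
      await session.dispose().catch(() => undefined);
      if (!retry) throw err;
    }
  }
  throw new Error("unreachable: loop returns or throws within two attempts");
}
```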

fix(voice): make Test mic a readiness check

64605bd

Stop opening a Foundry streaming session for Test mic; check permission and model readiness instead, with E2E status overrides preserving deterministic smoke coverage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(voice): clarify keyboard shortcut setting

68f4c0d

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(voice): show listening indicator in chat input

4230877

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(voice): clarify mic button toggle tooltip

a4ce268

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(voice): force redownload cached models

bf8396a

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

patschmittdev mentioned this pull request Jun 22, 2026

feat(voice): Azure Speech voice subsystem (dictation + hands-free, STT/TTS) #387

Open

HashwanthVen wants to merge 31 commits into ianphil:master from HashwanthVen:feat/voice-dictation
