feat(voice): local voice dictation via Foundry Local + Nemotron streaming #385
Open
HashwanthVen wants to merge 31 commits into
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- engineWorker maps internal end verb to session.stop()+dispose()
- installerWorker handles model download/runtime install; cancel via worker terminate
- FoundryTranscriptionProvider wraps VoiceWorkerPool with sessionId filtering
- service skeleton cleanup
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
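The end-verb mapping in the first bullet can be sketched as below. StreamingSession and handleEnd are hypothetical names (the Foundry Local SDK's real session type is not shown in this PR). Running dispose() in a finally block, so it executes even when stop() rejects, is an assumption about the intended ordering rather than something the commit states.

```typescript
// Hypothetical session handle; the real Foundry Local SDK type may differ.
interface StreamingSession {
  stop(): Promise<void>;
  dispose(): void;
}

// Maps the worker's internal "end" verb onto the session lifecycle.
// Assumption: dispose() must run even if stop() rejects, so native
// resources are released either way; the stop() error still propagates.
async function handleEnd(session: StreamingSession | undefined): Promise<void> {
  if (session === undefined) return; // no active session: ending is a no-op
  try {
    await session.stop();
  } finally {
    session.dispose();
  }
}
```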
- mic button hidden unless voiceDictation flag on; disabled until model ready
- final transcripts inserted at caret via existing controlled value path; never auto-submits
- preserves Enter-to-send, Escape-cancel, draft persistence, IME, shortcode popover behaviors
- guards Promise.resolve() around mock-returned voice.start/stop to satisfy strict event handler
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
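Inserting a final transcript at the caret through a controlled value is a pure string update. A minimal sketch, assuming the caller reads selectionStart/selectionEnd from the textarea and then sets state and restores the caret; insertAtCaret is a hypothetical helper, and the separating-space rule is an illustrative choice, not something the PR specifies. Nothing here submits the message.

```typescript
interface CaretInsertResult {
  value: string; // new controlled value for the input
  caret: number; // where the caret should land after insertion
}

// Replaces the current selection (or inserts at the caret when the selection
// is empty) with the transcript. Only returns new state; never submits.
function insertAtCaret(
  value: string,
  selectionStart: number,
  selectionEnd: number,
  transcript: string,
): CaretInsertResult {
  // Clamp in case the selection is stale relative to the current value.
  const start = Math.max(0, Math.min(selectionStart, value.length));
  const end = Math.max(start, Math.min(selectionEnd, value.length));
  const before = value.slice(0, start);
  const after = value.slice(end);
  // Illustrative rule: add a space when inserting right after a non-space char.
  const needsSpace = before.length > 0 && !/\s$/.test(before) && transcript.length > 0;
  const text = (needsSpace ? " " : "") + transcript;
  return { value: before + text + after, caret: before.length + text.length };
}
```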
- PermissionInspector concrete impl in app layer (preserves no-electron-in-services boundary)
- cross-platform table covers darwin/win32/linux
- sessionSecurity allows media permission only for Chamber renderer origins
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
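The origin check behind the last bullet can be sketched as a pure predicate. This is an assumption-laden sketch: isAllowedMediaOrigin and the example allowlist are hypothetical, and in Electron such a check would typically run inside session.setPermissionRequestHandler when the requested permission is "media". How other permissions are treated is outside this helper.

```typescript
// Returns true only when the requesting URL's origin is on the allowlist.
// Unparseable URLs are denied.
function isAllowedMediaOrigin(
  requestingUrl: string,
  allowedOrigins: ReadonlySet<string>,
): boolean {
  let origin: string;
  try {
    origin = new URL(requestingUrl).origin;
  } catch {
    return false; // not a valid URL: deny
  }
  return allowedOrigins.has(origin);
}
```

Comparing URL.origin works for http(s) origins. For custom schemes such as app://, the WHATWG URL parser reports an opaque origin ("null"), so a real check for a custom protocol would need to compare protocol and host instead.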
Input device, mic permission, Test mic, shortcut, push-to-talk, model row with download progress. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…sh-to-talk
- mints UUID per session; drops transcript events for unknown sessionId
- push-to-talk and toggle modes; suppressed during IME composition
- cleanup on unmount/session-stop/flag-off (tracks, AudioContext, IPC listeners)
- browserApi/test helpers extended for renderer fakes
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
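The push-to-talk vs. toggle handling, suppressed during IME composition, reduces to a small decision function. A sketch under assumptions: ShortcutKeyEvent mirrors only the KeyboardEvent fields used, shortcutAction is a hypothetical name, and both ignoring auto-repeat keydowns and matching on event.code (physical key) rather than event.key (which is layout-dependent and can yield a different character with Alt held on macOS) are design choices of this sketch.

```typescript
// Subset of KeyboardEvent used here, so the sketch runs outside a browser.
interface ShortcutKeyEvent {
  code: string;
  altKey: boolean;
  shiftKey: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
  isComposing: boolean;
  repeat: boolean;
}

type DictationMode = "push-to-talk" | "toggle";
type DictationAction = "start" | "stop" | "none";

// Alt+Shift+V matched by physical key code.
function isDictationShortcut(e: ShortcutKeyEvent): boolean {
  return e.code === "KeyV" && e.altKey && e.shiftKey && !e.ctrlKey && !e.metaKey;
}

function shortcutAction(
  phase: "keydown" | "keyup",
  e: ShortcutKeyEvent,
  mode: DictationMode,
  listening: boolean,
): DictationAction {
  if (e.isComposing) return "none"; // never react while an IME is composing
  if (phase === "keyup") {
    // Modifiers may already be released, so only the V key is checked.
    return mode === "push-to-talk" && listening && e.code === "KeyV" ? "stop" : "none";
  }
  if (!isDictationShortcut(e) || e.repeat) return "none"; // ignore auto-repeat
  if (mode === "push-to-talk") return listening ? "none" : "start";
  return listening ? "stop" : "start"; // toggle flips on each press
}
```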
- VoiceDictationStore + Foundry/Fake provider + VoiceWorkerPool constructed only when voiceDictation flag is enabled
- CHAMBER_E2E_VOICE_FAKE=1 selects FakeTranscriptionProvider
- preload exposes window.electronAPI.voice surface with sessionId-keyed channels
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
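The flag-gated construction and fake-provider switch can be sketched with lazy factories, so that only the selected provider is ever built. selectVoiceProvider and ProviderFactories are hypothetical names; in main the env argument would presumably be process.env, and the constructor calls in the comments are illustrative only.

```typescript
interface ProviderFactories<P> {
  fake: () => P;    // illustrative: () => new FakeTranscriptionProvider()
  foundry: () => P; // illustrative: () => new FoundryTranscriptionProvider(...)
}

// Returns null when the flag is off, so no voice objects are constructed.
// Otherwise the fake provider is chosen only when the env var is exactly "1".
function selectVoiceProvider<P>(
  flagEnabled: boolean,
  env: Readonly<Record<string, string | undefined>>,
  factories: ProviderFactories<P>,
): P | null {
  if (!flagEnabled) return null;
  return env["CHAMBER_E2E_VOICE_FAKE"] === "1" ? factories.fake() : factories.foundry();
}
```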
…ooks
- voice-dictation-smoke.spec.ts drives Settings + chat mic flows
- e2e:voice:setPermissionState / setModelStatus / getSessionState IPC hooks
- VoiceDictationService.startSession now accepts {sessionId,deviceId,modelId} object (legacy positional still works)
- Adjust voice IPC adapter to forward request object; update mock service port
- Fix FeatureFlagService.test expectation for new voiceDictation flag
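Accepting both the new request object and the legacy positional form can be sketched as a normalizer that runs before the service logic. The legacy argument order (sessionId, deviceId?, modelId?) is an assumption; the commit only says positional calls still work. normalizeStartArgs is a hypothetical name.

```typescript
interface StartSessionRequest {
  sessionId: string;
  deviceId?: string;
  modelId?: string;
}

// Normalizes either call shape into one request object.
// Assumption: legacy order is (sessionId, deviceId?, modelId?).
function normalizeStartArgs(
  first: StartSessionRequest | string,
  deviceId?: string,
  modelId?: string,
): StartSessionRequest {
  const request: StartSessionRequest =
    typeof first === "string" ? { sessionId: first, deviceId, modelId } : { ...first };
  if (request.sessionId.length === 0) {
    throw new Error("startSession requires a non-empty sessionId");
  }
  return request;
}
```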
Blocking:
- VoiceWorkerPool: serialize per-role RPCs via FIFO queue (fast-path for first send keeps postMessage synchronous)
- useVoiceDictation: tear down session when options.enabled flips false mid-listen
- appendFrame: await each chunk so backpressure propagates from worklet; stop session if append throws
- stop(): always call endSession even if mic cleanup rejects (no leaked worker sessions)
Non-blocking:
- voice IPC: prune transcript targets on WebContents destroyed; isDestroyed guard before send; auto-cleanup empty target sets
- voice IPC: assert sender ownership on appendAudio/endSession; record sessionId -> WebContents.id at startSession
- depcheck: enforce foundry-local-sdk import only from apps/desktop/src/main/voiceWorker/{engineWorker,installerWorker}.ts
- voice.test.ts: extend MockWebContents with id/once/isDestroyed
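The first blocking fix, FIFO serialization with a synchronous fast path, can be sketched as a small queue. FifoSerializer is a hypothetical name; in the pool one instance would presumably be kept per role (for example a Map from role to serializer), with each op wrapping postMessage plus waiting for the matching reply. Because a Promise executor runs synchronously, an idle queue starts the op before run() returns.

```typescript
// Runs async operations one at a time in FIFO order. When idle, the op starts
// synchronously inside run(), so the first postMessage is not deferred.
class FifoSerializer {
  private busy = false;
  private readonly waiting: Array<() => void> = [];

  run<T>(op: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.busy = true;
        let p: Promise<T>;
        try {
          p = op();
        } catch (err) {
          p = Promise.reject(err); // a synchronous throw becomes a rejection
        }
        // Settle the caller, then hand the queue to the next op either way.
        p.then(resolve, reject).finally(() => this.next());
      };
      if (this.busy) this.waiting.push(start);
      else start(); // fast path
    });
  }

  private next(): void {
    const start = this.waiting.shift();
    if (start) start();
    else this.busy = false;
  }
}
```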
- review-before-send + enter-still-submits fixme'd: contextBridge freezes electronAPI, so chat.send cannot be spied from inside the page. Covered by chat-mic-inserts-sentinel (visible textarea verification) and the existing chatroom/byo-llm Playwright specs.
- model-not-downloaded-cta: pass expectMicEnabled:false through activateMind
Final: 8 passed, 3 skipped, 0 failed.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Forge Vite plugin builds main-target entries flat into .vite/build/, so the workers must be listed alongside the main.ts/preload.ts entries to be compiled. Without this, the app crashes at runtime with "Voice engine/installer worker is not started", because main.ts tries to spawn engineWorker.js / installerWorker.js, which Vite never emitted.
- new apps/desktop/vite.voiceWorker.config.ts (foundry-local-sdk external)
- forge.config.ts: add engineWorker.ts + installerWorker.ts to build entries
- main.ts: resolve workers from __dirname (flat layout), not voiceWorker/
Missed by validation: unit tests mock worker_threads + foundry-local-sdk, and the Playwright UAT uses the CHAMBER_E2E_VOICE_FAKE=1 fake provider. Neither exercised the real worker bundle path. Manual real-mic QA surfaced the gap.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Foundry Local Core is a process-wide singleton shared across worker_threads, so route engine and installer RPCs through one Vite-built voice worker and one physical VoiceWorkerPool worker.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
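With both roles in one worker, the worker needs a single dispatcher that routes on a role field in the RPC envelope. A sketch under assumptions: RpcRequest/RpcResponse are hypothetical shapes (the real workerProtocol.ts envelope is not shown), and createDispatcher is an illustrative name. Inside the worker it would presumably be wired to parentPort.on("message", ...) from node:worker_threads, posting the response back.

```typescript
type Role = "engine" | "installer";

// Hypothetical RPC envelope.
interface RpcRequest {
  id: number;
  role: Role;
  verb: string;
  payload?: unknown;
}

interface RpcResponse {
  id: number;
  ok: boolean;
  result?: unknown;
  error?: string;
}

type Handler = (payload: unknown) => Promise<unknown>;

// One dispatcher for both roles: they share a thread (and so the single
// process-wide Foundry Local instance) and differ only by the role field.
function createDispatcher(
  handlers: Record<Role, Record<string, Handler>>,
): (req: RpcRequest) => Promise<RpcResponse> {
  return async (req) => {
    const handler = handlers[req.role]?.[req.verb];
    if (!handler) {
      return { id: req.id, ok: false, error: `unknown verb ${req.role}.${req.verb}` };
    }
    try {
      return { id: req.id, ok: true, result: await handler(req.payload) };
    } catch (err) {
      return { id: req.id, ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  };
}
```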
Subscribe to worker modelProgress events during downloads, forward typed percent updates through the existing IPC progress channel, and render percent-based progress in Settings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
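Forwarding typed percent updates can be sketched as a wrapper that turns noisy progress callbacks into monotonic integer percentages before they reach the IPC channel. Assumptions: the worker reports a possibly fractional percentage, dropping regressions is acceptable, and createProgressForwarder and ModelProgress are hypothetical names.

```typescript
interface ModelProgress {
  modelId: string;
  percent: number; // integer in 0..100
}

// Clamps to 0..100, rounds down, and drops duplicates and regressions so the
// renderer sees at most ~101 updates per download.
function createProgressForwarder(
  modelId: string,
  send: (update: ModelProgress) => void,
): (rawPercent: number) => void {
  let last = -1;
  return (rawPercent) => {
    if (!Number.isFinite(rawPercent)) return; // ignore NaN / Infinity
    const percent = Math.floor(Math.min(100, Math.max(0, rawPercent)));
    if (percent <= last) return;
    last = percent;
    send({ modelId, percent });
  };
}
```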
Build the Vite voice worker bundle, run it in a real worker_thread with a deterministic Foundry stub, and assert download progress plus selectModel succeed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep worker session state clear until start succeeds, dispose failed starts, and retry once after stopping a stale native audio-stream handle reported by Foundry. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
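The start-retry behavior can be sketched as follows. StreamSession, startWithStaleRetry, and the isStaleStreamError predicate are hypothetical; how Foundry actually reports a stale native audio-stream handle is not described in the PR, so recognizing that error is left to the caller.

```typescript
interface StreamSession {
  start(): Promise<void>;
  dispose(): void;
}

// Creates and starts a session. A failed start is always disposed. If the
// failure is the stale-handle error, the stale stream is stopped and the
// start is retried exactly once; any other error propagates immediately.
async function startWithStaleRetry(
  create: () => StreamSession,
  stopStaleStream: () => Promise<void>,
  isStaleStreamError: (err: unknown) => boolean,
): Promise<StreamSession> {
  for (let attempt = 0; ; attempt++) {
    const session = create();
    try {
      await session.start();
      return session; // only now should the caller record it as current
    } catch (err) {
      session.dispose(); // never leave a half-started session behind
      if (attempt > 0 || !isStaleStreamError(err)) throw err;
      await stopStaleStream(); // clear the stale native handle, then retry
    }
  }
}
```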
Stop opening a Foundry streaming session for Test mic; check permission and model readiness instead, with E2E status overrides preserving deterministic smoke coverage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
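A readiness-only Test mic check can be sketched as a pure function. MicPermission uses the values Electron's systemPreferences.getMediaAccessStatus() returns; ModelStatus, TestMicResult, and evaluateTestMic are hypothetical. Treating "not-determined" and "unknown" as non-blocking (the OS prompt has not been shown yet) is an assumption of this sketch.

```typescript
// Values returned by Electron's systemPreferences.getMediaAccessStatus().
type MicPermission = "not-determined" | "granted" | "denied" | "restricted" | "unknown";
// Hypothetical model states; the PR's actual status type is not shown.
type ModelStatus = "not-downloaded" | "downloading" | "ready" | "error";

type TestMicResult =
  | { ready: true }
  | { ready: false; reason: "permission-blocked" | "model-not-ready" };

// Readiness check only: nothing here opens a Foundry streaming session.
function evaluateTestMic(permission: MicPermission, model: ModelStatus): TestMicResult {
  if (permission === "denied" || permission === "restricted") {
    return { ready: false, reason: "permission-blocked" };
  }
  if (model !== "ready") return { ready: false, reason: "model-not-ready" };
  return { ready: true };
}
```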
The previous stop() cleared sessionIdRef immediately and only then awaited capture.stop() + endSession(). A second mic click during that await chain slipped past start()'s only gate (sessionIdRef.current) and fired a new startSession while the FoundryTranscriptionProvider on main was still ending, which failed with 'Foundry transcription provider is already started' and left the chat mic button stuck in the listening state.
Fix:
- Add 'starting' and 'stopping' to VoiceDictationState (idle/starting/listening/stopping/error).
- stop() now sets state='stopping' first and keeps sessionIdRef populated until cleanup completes.
- start() rejects whenever stateRef is not idle/error, not only when sessionIdRef is already set.
- New stateRef mirrors state for synchronous gating without callback rebinding.
Caught by chamber-ui-tester driving the real Foundry path; the previous Playwright UAT used the fake provider and missed it.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
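The fix above can be sketched as a framework-free controller, where state stands in for stateRef.current and sessionId for sessionIdRef.current. DictationController and DictationBackend are hypothetical names, and the counter-based session id stands in for the UUID the hook mints. Setting "error" when cleanup fails is a choice of this sketch.

```typescript
type VoiceDictationState = "idle" | "starting" | "listening" | "stopping" | "error";

interface DictationBackend {
  startSession(sessionId: string): Promise<void>;
  stopCapture(): Promise<void>; // releases mic tracks / AudioContext
  endSession(sessionId: string): Promise<void>;
}

class DictationController {
  state: VoiceDictationState = "idle";
  sessionId: string | null = null;
  private nextId = 0;
  private readonly backend: DictationBackend;

  constructor(backend: DictationBackend) {
    this.backend = backend;
  }

  async start(): Promise<boolean> {
    // Synchronous gate on state, not on sessionId.
    if (this.state !== "idle" && this.state !== "error") return false;
    this.state = "starting";
    const id = `session-${++this.nextId}`; // stand-in for a minted UUID
    this.sessionId = id;
    try {
      await this.backend.startSession(id);
      this.state = "listening";
      return true;
    } catch {
      this.sessionId = null;
      this.state = "error";
      return false;
    }
  }

  async stop(): Promise<void> {
    if (this.state !== "listening" || this.sessionId === null) return;
    this.state = "stopping"; // set first: a concurrent start() is now refused
    const id = this.sessionId; // stays populated until cleanup completes
    let failed = false;
    try {
      await this.backend.stopCapture();
    } catch {
      failed = true;
    }
    try {
      await this.backend.endSession(id); // runs even if mic cleanup failed
    } catch {
      failed = true;
    }
    this.sessionId = null;
    this.state = failed ? "error" : "idle";
  }
}
```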
Voice dictation: speak into Chamber locally (Nemotron + Foundry Local)
Summary
Adds local-first voice dictation to Chamber. Users hold a renderer-only shortcut (Alt+Shift+V) or click a mic button in the chat input to speak; transcripts are inserted at the caret of the active single-agent chat input. It never auto-sends: the user always reviews before pressing Send.
Built on the same foundry-local-sdk@1.1.0 + Nemotron streaming model that the official GitHub Copilot CLI /voice command uses, so the engineering pattern is upstream-validated.
Closes the gap described in the PRD, VOICE_DICTATION_PRD.md (the planning doc lives outside the repo).
Highlights
- nemotron-speech-streaming-en-0.6b runs in a worker_threads.Worker via the Foundry Local SDK. Audio never leaves the device. The model auto-downloads on first use (~696 MB CPU variant).
- … globalShortcut, explicitly out of scope for Phase 1). Default Alt+Shift+V. Configurable hold-vs-toggle behavior.
- voiceDictation feature flag (default off on stable, on for insiders + dev). Cleanly invisible when off.
- packages/services/** never imports electron or foundry-local-sdk; depcruise enforces the rule (new foundry-local-sdk-only-in-voice-workers rule added).
- chamber-voice-runtime/ mirrors the existing chamber-copilot-runtime/ pattern.
- foundry-local-sdk@1.1.0 pinned exactly.
- forge.config.ts adds the SDK to asar.unpack for native prebuilds.
Architecture
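The audio path starts in the renderer (AudioWorklet, PCM16 encoder) and crosses IPC as size-checked appendAudio chunks. A minimal sketch of that first hop, assuming the worklet delivers Float32Array frames in [-1, 1] and that VOICE_MAX_APPEND_CHUNK_BYTES is a byte limit whose value is not stated here; floatToPcm16 and chunkPcm16 are hypothetical names, not the PR's encoder API.

```typescript
// Converts Web Audio float samples to 16-bit signed PCM. Negative values scale
// by 0x8000 and positive by 0x7fff so both ends map onto the int16 range.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  samples.forEach((v, i) => {
    const s = Math.max(-1, Math.min(1, v)); // clamp out-of-range input
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff; // typed-array store truncates
  });
  return out;
}

// Splits PCM16 bytes into chunks of at most maxBytes so each IPC append stays
// under the payload limit. The step is kept even so no sample is split.
function chunkPcm16(pcm: Int16Array, maxBytes: number): Uint8Array[] {
  const step = maxBytes - (maxBytes % 2);
  if (step < 2) throw new RangeError("maxBytes must be at least 2");
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  const chunks: Uint8Array[] = [];
  for (let off = 0; off < bytes.length; off += step) {
    chunks.push(bytes.slice(off, off + step)); // slice copies the bytes
  }
  return chunks;
}
```

Int16Array stores samples in the platform's byte order (little-endian on common x86 and ARM targets); a consumer that requires a fixed endianness would need a DataView-based encoder instead.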
Files changed
New:
- packages/shared/src/voice-types.ts + test: config, IPC contracts, RPC verbs, payload size guard
- packages/services/src/voice/: VoiceDictationStore, VoiceDictationService, VoiceWorkerPool, FakeTranscriptionProvider, FoundryTranscriptionProvider, permissions/provider interfaces, all with colocated tests
- apps/desktop/src/main/voice/PermissionInspector.ts + test: wraps systemPreferences.getMediaAccessStatus cross-platform
- apps/desktop/src/main/voiceWorker/voiceWorker.ts + test: single worker handling both engine + installer verbs
- apps/desktop/src/main/voiceWorker/workerProtocol.ts: shared RPC envelope
- apps/desktop/src/main/ipc/voice.ts + test: thin IPC adapter with sender ownership + chunk size validation
- apps/web/src/renderer/hooks/useVoiceDictation.ts + test: session lifecycle, sessionId routing, PTT, IME suppression, cleanup
- apps/web/src/renderer/lib/audio/: PCM16 encoder, AudioWorklet processor, captureMic helper
- apps/web/src/renderer/components/settings/VoiceDictationSettingsSection.tsx + test
- chamber-voice-runtime/: pinned foundry-local-sdk@1.1.0
- scripts/prepare-voice-runtime.js, scripts/check-voice-sdk-version.js, scripts/voice-runtime-smoke.js
- apps/desktop/vite.voiceWorker.config.ts
- tests/e2e/electron/voice-dictation-smoke.spec.ts
Modified (surgical):
- packages/shared/src/ipc-channels.ts: IPC.VOICE.* + IPC.E2E.VOICE_*
- packages/shared/src/electron-types.ts (+ test): voice namespace
- packages/shared/src/feature-flags.ts (+ test): voiceDictation flag
- apps/desktop/src/main.ts: wire VoiceDictationService when the feature flag is on; honor CHAMBER_E2E_VOICE_FAKE=1 for deterministic tests
- apps/desktop/src/preload.ts: expose voice namespace
- apps/desktop/src/main/devFeatureFlags.ts: enable in dev
- apps/desktop/src/main/security/sessionSecurity.ts (+ test): allow media permission for Chamber renderer origins only
- apps/web/src/renderer/components/chat/ChatInput.tsx (+ test): mic button + PTT, never auto-submits
- apps/web/src/renderer/components/settings/SettingsView.tsx (+ test): gate VoiceDictationSettingsSection behind the flag
- forge.config.ts: asar.unpack for foundry-local-sdk native prebuilds; new build entry for voiceWorker.ts
- config/dependency-cruiser.cjs: enforces that foundry-local-sdk may be imported only from the worker file
- package.json / package-lock.json: foundry-local-sdk@1.1.0 pinned in dependencies + overrides + new smoke:voice-runtime script
Validation
- npm run typecheck ✅
- npm run lint ✅ (tsc + eslint + deps:check 547 modules / 0 boundary violations + yaml + md)
- npm run smoke:voice-runtime ✅ (new: exercises the compiled worker bundle against a mocked SDK; would have caught the singleton bug)
- npm run smoke:desktop -- tests/e2e/electron/voice-dictation-smoke.spec.ts: 8 passed / 3 skipped / 0 failed. Three scenarios are test.fixme() with an explanation: two need to spy on chat.send, but contextBridge freezes electronAPI (covered by chat-mic-inserts-sentinel, which proves the visible textarea path); one is the second contextBridge case.
- chamber-ui-tester driven against the real Foundry path on Windows: Test mic readiness ✅, model download with live progress ✅, chat mic listening state ✅. Edge-case testing (re-entrant start during stop) found two bugs that are now fixed in this branch; see 94dda24 and the voice-rt-fix commit chain.
Privacy + security
- keytar is not used (no secrets in the voice path).
- sessionSecurity allowlist: only Chamber's own renderer origin can request media; foreign origins are denied.
- worker_threads isolation between the Foundry SDK and the main process.
- webPreferences (contextIsolation: true, nodeIntegration: false) unchanged.
- IPC.VOICE.* constants with zod schema validation and payload size limits (VOICE_MAX_APPEND_CHUNK_BYTES).
- appendAudio/endSession calls from a non-owner WebContents are rejected.
- gpt-5.5 security review: APPROVE, no findings.
Not in scope (Phase 2 candidates)
- … Electron.globalShortcut)
Drift / mergeability
Branch is 31 commits ahead, 0 commits behind ianphil/chamber:master. Base point is 8834f0b (verified via gh api repos/ianphil/chamber/branches/master). Clean fast-forward / no-op rebase.
How to try
npm install, npm start.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>