feat(voice): Azure Speech voice subsystem (dictation + hands-free, STT/TTS) by patschmittdev · Pull Request #387 · ianphil/chamber

patschmittdev · 2026-06-22T15:39:58Z

Summary

Adds an optional Azure Speech voice subsystem, behind a feature flag (off by default):

Microphone dictation into the composer (speech-to-text).
Hands-free conversation mode (STT in, TTS out).

This is a standalone cloud voice alternative to the local Foundry dictation in #385. The two use different architectures (cloud Azure versus on-device), and shipping this does not preclude later converging Azure STT under #385's TranscriptionProvider contract.

Security posture (please review)

The subscription key is stored only via the keytar CredentialStore port. AzureSpeechStore throws rather than falling back if no OS keychain is available.
The JSON config file holds non-secret metadata only: writeConfig runs stripKey, and coerce never emits apiKey. The key never reaches disk in plaintext.
The renderer never receives the key. It authenticates with short-lived (~9 min) tokens minted in the main process via mintToken / issueToken.
SSRF defense: the region is validated against ^[a-z0-9-]+$ before building the issueToken URL host.
The Electron session security boundary (sessionSecurity.ts) is extended to allow the Azure Speech STT/TTS endpoints in connect-src and to grant microphone permission only when the voice flag is on (camera is always denied).
AzureSpeechStore.ts is registered in the credential-write security invariant allowlist (security-boundaries.invariant.test.ts) with the review documented inline, under the same boundary contract as ByoLlmStore.
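The region check and the key stripping are the two main-process guards that are easiest to get subtly wrong. Below is a minimal TypeScript sketch of both, assuming a settings object with `region` and `voiceName` fields; those names, `buildIssueTokenUrl`, and `toPersistedConfig` are hypothetical and not the PR's actual `AzureSpeechStore` API. The token URL follows Azure's public `https://<region>.api.cognitive.microsoft.com/sts/v1.0/issueToken` pattern; the exact URL this PR builds is not shown here.

```typescript
// Hypothetical sketch: names and field shapes are illustrative, not the PR's code.

// Lowercase letters, digits and hyphens only: no '.', '/', '@', ':' or '#',
// so the value cannot change the host or smuggle in a path, port or userinfo.
const REGION_PATTERN = /^[a-z0-9-]+$/;

function buildIssueTokenUrl(region: string): URL {
  if (!REGION_PATTERN.test(region)) {
    throw new Error(`Invalid Azure Speech region: ${JSON.stringify(region)}`);
  }
  // Public Azure token endpoint pattern (assumed; the PR's exact URL is not shown).
  return new URL(`https://${region}.api.cognitive.microsoft.com/sts/v1.0/issueToken`);
}

// Assumed settings shape: apiKey is only ever held in memory, never persisted.
interface AzureSpeechSettings {
  region: string;
  voiceName: string;
  apiKey?: string;
}

type PersistedSettings = Omit<AzureSpeechSettings, 'apiKey'>;

// Copy an explicit allowlist of non-secret fields instead of deleting the key,
// so a secret field added to the settings later is not written by accident.
function toPersistedConfig(settings: AzureSpeechSettings): PersistedSettings {
  return { region: settings.region, voiceName: settings.voiceName };
}
```

Copying known-safe fields (rather than removing `apiKey` from a spread) means the JSON file's schema is whatever the function names, which matches the stated goal that only non-secret metadata ever reaches disk.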

Branch shape (off the current master tip, 0 behind)

bb8b482 refactor(ui): extract shared UI foundation off master
6b40ed8 feat(voice): Azure Speech voice subsystem (STT/TTS) + security boundary
080bfa3 test(security): register AzureSpeechStore in credential-write allowlist + changelog

54 files changed.

Test evidence

npm run lint: green (tsc + eslint + dependency-cruiser 537 modules / 0 violations + yaml + markdown).
Security invariant security-boundaries.invariant.test.ts: 12/12 pass after the reviewed allowlist registration.
Before the fix, the full npm test run was 2078 pass / 2 fail. The two failures were exactly (a) this credential-boundary invariant and (b) MindProfileService > rejects symlinked profile files. The invariant now passes. The only remaining failure is the symlink test, which hits a Windows fs.symlinkSync EPERM limitation (symlink creation requires Developer Mode); it is unrelated and green in CI on Linux.

Notes

Feature-flagged off by default.
No linked issue.

Split the Azure Speech voice FEATURE out of feat/webgl-ambient-background onto the ui-foundation base. Delivers the full voice subsystem with its trust boundary intact:

Main / security:
- AzureSpeechStore: subscription key lives in the OS keychain (injected CredentialStore); only non-secret metadata persists to disk; region is regex-validated (SSRF guard); renderer gets short-lived minted tokens, never the key
- azureSpeech IPC adapter (get/save/disable/test/mintToken) gated on the flag
- sessionSecurity: connect-src allows the Azure Speech STT/TTS endpoints; the permission handler grants microphone only when the voice flag is on and always denies camera (video). Theme-hash CSP changes are intentionally NOT here (they belong to shell-theming)
- azureSpeech feature flag across feature-flags / devFeatureFlags / docs

Renderer:
- components/voice: VoiceModeController + VoiceModeOverlay
- hooks: useVoiceInput (dictation) + useVoiceConversation (hands-free)
- lib: azureSpeechRecognizer / azureSpeechSynthesizer / sentenceChunker
- settings: AzureSpeechSettingsSection

Folded-in audit fixes: microsoft-cognitiveservices-speech-sdk pinned EXACT (1.50.0, was a caret range) since it is a security-relevant SDK.

DIVERGENCE (intentional, owned): this branch is the voice *capability layer*. The integration points into ChatInput/ChatPanel (dictation + voice-mode buttons) and SettingsView (the Voice section) are NOT wired here, because those files are owned by feat/chat-polish and feat/settings-ia. Re-wiring them is the explicit reconciliation step when voice merges with those branches. The voice components/hooks are covered directly by their own tests.

Verification: tsc clean; lint clean (537 modules, 0 violations); 93 voice + security tests pass, including AzureSpeechStore key-isolation / SSRF-guard / token-mint and the sessionSecurity mic/camera + speech-CSP tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
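The permission rule in the commit message (microphone only when the voice flag is on, camera always denied) can be expressed as a pure function. The sketch below assumes Electron's `setPermissionRequestHandler` shape, in which `media` requests carry a `mediaTypes` array of `'audio'` and `'video'`. The function name `shouldGrantPermission` is hypothetical, and denying every non-media permission is a simplification for this sketch; the real sessionSecurity.ts handler may treat other permissions differently.

```typescript
// Hypothetical sketch of the mic/camera policy described for sessionSecurity.ts.
// Kept pure (no Electron import) so it can be unit-tested in isolation.
type MediaType = 'audio' | 'video';

function shouldGrantPermission(
  permission: string,
  mediaTypes: readonly MediaType[] | undefined,
  voiceEnabled: boolean,
): boolean {
  // Simplification for this sketch: anything other than media is denied here.
  if (permission !== 'media') return false;

  const requested = mediaTypes ?? [];
  // A media request that does not say what it wants: fail closed.
  if (requested.length === 0) return false;
  // Camera is always denied, even when bundled with an audio request.
  if (requested.includes('video')) return false;

  // Microphone only, and only while the voice feature flag is on.
  return voiceEnabled && requested.every((t) => t === 'audio');
}
```

In the main process this would be called from inside the handler passed to `session.setPermissionRequestHandler`, forwarding the request's `mediaTypes` (after narrowing `details` to a media request) and the current azureSpeech flag value, then passing the boolean to `callback`. Keeping the decision separate from the Electron wiring is what makes mic/camera tests like the ones listed under Verification cheap to write.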

The Azure Speech voice subsystem stores the subscription key only through the keytar-backed CredentialStore port; its JSON config is key-stripped (stripKey and coerce never persist apiKey) and the renderer authenticates with short-lived issued tokens, never the key. Add AzureSpeechStore.ts to the security-boundary invariant's approved credential-writer allowlist (same contract as ByoLlmStore), with the review documented inline, and record the voice subsystem under the Unreleased changelog.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
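Short-lived tokens imply some caching in the main process so that each recognizer or synthesizer start does not hit the token endpoint. Below is a minimal sketch, assuming a hypothetical `mint` callback that performs the keyed issueToken request (the PR's `mintToken` / `issueToken` internals are not shown). It reuses one token for about nine minutes, matching the ~9 min figure above (Azure Speech access tokens are valid for 10 minutes), and coalesces concurrent callers onto a single request. `TokenCache` and its parameters are illustrative names.

```typescript
// Hypothetical sketch: caches a minted Azure Speech token and re-mints near expiry.
type Mint = () => Promise<string>;

class TokenCache {
  private readonly mint: Mint;
  private readonly ttlMs: number;
  private readonly now: () => number;
  private token: string | null = null;
  private expiresAt = 0;
  private inflight: Promise<string> | null = null;

  // ttlMs defaults to 9 minutes, one minute short of the token's 10-minute validity.
  constructor(mint: Mint, ttlMs: number = 9 * 60 * 1000, now: () => number = Date.now) {
    this.mint = mint;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(): Promise<string> {
    // Serve the cached token while it is still inside its reuse window.
    if (this.token !== null && this.now() < this.expiresAt) return this.token;

    // Concurrent callers share one in-flight mint instead of each hitting the endpoint.
    if (this.inflight === null) {
      // Measure the lifetime from when the request started, which is conservative.
      const startedAt = this.now();
      this.inflight = this.mint()
        .then((t) => {
          this.token = t;
          this.expiresAt = startedAt + this.ttlMs;
          return t;
        })
        .finally(() => {
          this.inflight = null;
        });
    }
    return this.inflight;
  }
}
```

Only the token string would cross IPC to the renderer; the key stays behind `mint`, which is the property the allowlist review relies on. A failed mint leaves the cache empty and rejects every waiting caller, so the next `get()` retries.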

Patrick Schmitt and others added 2 commits June 7, 2026 10:54

patschmittdev mentioned this pull request Jun 22, 2026

refactor(ui): shared UI foundation (tokens, theming, Inter, primitives) #389

Open

patschmittdev changed the base branch from master to refactor/ui-foundation June 22, 2026 16:32

patschmittdev wants to merge 2 commits into refactor/ui-foundation from feat/azure-speech-voice

patschmittdev commented Jun 22, 2026


1 participant