feat(voice): Azure Speech voice subsystem (dictation + hands-free, STT/TTS) #387
Open
patschmittdev wants to merge 2 commits into
Split the Azure Speech voice FEATURE out of feat/webgl-ambient-background onto the ui-foundation base. Delivers the full voice subsystem with its trust boundary intact:

Main / security:
- AzureSpeechStore: subscription key lives in the OS keychain (injected CredentialStore); only non-secret metadata persists to disk; region is regex-validated (SSRF guard); renderer gets short-lived minted tokens, never the key
- azureSpeech IPC adapter (get/save/disable/test/mintToken) gated on the flag
- sessionSecurity: connect-src allows the Azure Speech STT/TTS endpoints; the permission handler grants microphone only when the voice flag is on and always denies camera (video). Theme-hash CSP changes are intentionally NOT here (they belong to shell-theming)
- azureSpeech feature flag across feature-flags / devFeatureFlags / docs

Renderer:
- components/voice: VoiceModeController + VoiceModeOverlay
- hooks: useVoiceInput (dictation) + useVoiceConversation (hands-free)
- lib: azureSpeechRecognizer / azureSpeechSynthesizer / sentenceChunker
- settings: AzureSpeechSettingsSection

Folded-in audit fixes: microsoft-cognitiveservices-speech-sdk pinned EXACT (1.50.0, was a caret range) since it is a security-relevant SDK.

DIVERGENCE (intentional, owned): this branch is the voice *capability layer*. The integration points into ChatInput/ChatPanel (dictation + voice-mode buttons) and SettingsView (the Voice section) are NOT wired here, because those files are owned by feat/chat-polish and feat/settings-ia. Re-wiring them is the explicit reconciliation step when voice merges with those branches. The voice components/hooks are covered directly by their own tests.

Verification: tsc clean; lint clean (537 modules, 0 violations); 93 voice + security tests pass, including AzureSpeechStore key-isolation / SSRF-guard / token-mint and the sessionSecurity mic/camera + speech-CSP tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
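The permission rule in the commit message above (microphone only when the voice flag is on, camera always denied) can be illustrated with a small pure function. This is a minimal sketch, not the PR's actual sessionSecurity code: `PermissionQuery`, `shouldGrantPermission`, and the `voiceFlagOn` parameter are hypothetical names chosen for the example.

```typescript
// Hypothetical shapes for illustration; the real handler in sessionSecurity.ts may differ.
type MediaType = "audio" | "video";

interface PermissionQuery {
  permission: string;      // permission name, e.g. "media"
  mediaTypes: MediaType[]; // which media kinds the page requested
}

// Decide whether a permission request should be granted.
// Deny by default; grant only a pure microphone request while the voice flag is on.
function shouldGrantPermission(query: PermissionQuery, voiceFlagOn: boolean): boolean {
  // This sketch only considers media requests; everything else is denied.
  if (query.permission !== "media") return false;
  // Camera (video) is always denied, even when bundled with audio.
  if (query.mediaTypes.includes("video")) return false;
  // Nothing to grant if audio was not requested.
  if (!query.mediaTypes.includes("audio")) return false;
  // Microphone is allowed only while the voice feature flag is enabled.
  return voiceFlagOn;
}
```

In an Electron app, a function like this would typically be called from a handler registered with `session.setPermissionRequestHandler`. Keeping the decision pure makes the mic/camera cases easy to unit test without a running session.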
The Azure Speech voice subsystem stores the subscription key only through the keytar-backed CredentialStore port; its JSON config is key-stripped (stripKey and coerce never persist apiKey) and the renderer authenticates with short-lived issued tokens, never the key.

Add AzureSpeechStore.ts to the security-boundary invariant's approved credential-writer allowlist (same contract as ByoLlmStore), with the review documented inline, and record the voice subsystem under the Unreleased changelog.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
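The key-stripping contract described above can be sketched as an explicit allowlist of persisted fields. The names `stripKey`, `coerce`, and `apiKey` come from the commit message; the `region` and `enabled` fields and the type names are assumptions made for this example, and the PR's real implementation may look different.

```typescript
// Hypothetical config shape; only `apiKey` is a field name taken from the PR text.
interface AzureSpeechConfigInput {
  region: string;
  enabled: boolean;
  apiKey?: string; // secret: belongs in the OS keychain, never on disk
}

// What is allowed to reach disk: everything except the key.
type PersistedConfig = Omit<AzureSpeechConfigInput, "apiKey">;

// Copy only the known non-secret fields, so the key (or any unexpected
// extra property) cannot leak into the JSON file.
function stripKey(input: AzureSpeechConfigInput): PersistedConfig {
  return { region: input.region, enabled: input.enabled };
}

// Normalize untrusted JSON read back from disk. Even if a file was tampered
// with to contain `apiKey`, the result never carries it.
function coerce(raw: unknown): PersistedConfig {
  const obj = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  return {
    region: typeof obj.region === "string" ? obj.region : "",
    enabled: obj.enabled === true,
  };
}
```

Building the output from named fields, rather than deleting `apiKey` from a copy, means a newly added secret field stays off disk unless someone deliberately adds it to the allowlist.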
This was referenced Jun 22, 2026
Summary
Adds an optional Azure Speech voice subsystem, behind a feature flag (off by default):
This is a standalone cloud voice alternative to the local Foundry dictation in #385. They are different architectures (cloud Azure vs on-device); shipping this does not preclude later converging Azure STT under #385's `TranscriptionProvider` contract.
Security posture (please review)
- The subscription key is stored only through the keytar-backed `CredentialStore` port. `AzureSpeechStore` throws rather than falling back if no OS keychain is available.
- `writeConfig` runs `stripKey`, and `coerce` never emits `apiKey`. The key never reaches disk in plaintext.
- The renderer authenticates with short-lived tokens, never the key, via `mintToken` / `issueToken`.
- The region is validated against `^[a-z0-9-]+$` before building the `issueToken` URL host.
- Session security (`sessionSecurity.ts`) is extended to scope the Azure Speech endpoints and mic permission.
- `AzureSpeechStore.ts` is registered in the credential-write security invariant allowlist (`security-boundaries.invariant.test.ts`) with the review documented inline, same boundary contract as `ByoLlmStore`.
Branch shape (off the current master tip, 0 behind)
- bb8b482 refactor(ui): extract shared UI foundation off master
- 6b40ed8 feat(voice): Azure Speech voice subsystem (STT/TTS) + security boundary
- 080bfa3 test(security): register AzureSpeechStore in credential-write allowlist + changelog
54 files.
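For reviewers of the security posture, the region check used as an SSRF guard can be sketched as below. The pattern `^[a-z0-9-]+$` is the one quoted in this PR. The function name `buildIssueTokenUrl` is hypothetical, and the URL template is the publicly documented Azure token endpoint form, assumed here rather than taken from the PR's code.

```typescript
// Pattern quoted in the PR description: lowercase letters, digits, and hyphens only.
const REGION_PATTERN = /^[a-z0-9-]+$/;

// Hypothetical helper: validate the region BEFORE it is interpolated into a host name.
// Rejecting dots, slashes, '@', ':' and whitespace means the caller cannot redirect
// the token request to an arbitrary host or path.
function buildIssueTokenUrl(region: string): string {
  if (!REGION_PATTERN.test(region)) {
    throw new Error(`Invalid Azure Speech region: ${JSON.stringify(region)}`);
  }
  // Assumed endpoint shape (documented Azure form); the PR's exact URL may differ.
  return `https://${region}.api.cognitive.microsoft.com/sts/v1.0/issueToken`;
}
```

Because the validated value can only contain a single DNS label's worth of characters, the request host always stays under the fixed Azure suffix regardless of what the settings UI or a tampered config file supplies.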
Test evidence
- `npm run lint`: green (tsc + eslint + dependency-cruiser 537 modules / 0 violations + yaml + markdown).
- `security-boundaries.invariant.test.ts`: 12/12 pass after the reviewed allowlist registration.
- `npm test` before the fix was 2078 pass / 2 fail, where the two failures were exactly (a) this credential-boundary invariant and (b) `MindProfileService > rejects symlinked profile files`. The invariant now passes; the only remaining failure is the symlink test, a Windows `fs.symlinkSync` EPERM (Developer Mode) limitation that is unrelated and green in CI on Linux.
Notes