Add AssemblyAI background STT with desktop cloud batch and speaker identity #7446
Git-on-my-level wants to merge 45 commits into
Update desktop PTT tests to patch stt_provider_service after chat.py refactor, remove the unused duration parameter from postprocess_words, and skip self-voice review candidates when no identity assignment is available.
BYOK users can supply a fifth AssemblyAI key for sync/background/postprocess workloads; when Assembly routing is enabled but no Assembly key is present, Deepgram BYOK is used instead of Omi's server Assembly key. Co-authored-by: Cursor <cursoragent@cursor.com>
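The fallback rule in this commit can be sketched roughly as follows. This is only an illustration of the described behavior, not the PR's code: `ByokKeys` and `pick_background_provider` are hypothetical names, and the sketch assumes every BYOK user has supplied a Deepgram key.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ByokKeys:
    """Hypothetical container for user-supplied keys (names are assumptions)."""
    deepgram: str                      # assumed to be present for every BYOK user
    assemblyai: Optional[str] = None   # the optional fifth key


def pick_background_provider(
    assembly_routing_enabled: bool,
    byok: Optional[ByokKeys],
) -> Tuple[str, str]:
    """Return (provider, key_source) for a sync/background/postprocess request."""
    if byok is None:
        # Not a BYOK user: server keys apply; AssemblyAI only when routing is on.
        provider = "assemblyai" if assembly_routing_enabled else "deepgram"
        return provider, "server"
    if assembly_routing_enabled and byok.assemblyai:
        return "assemblyai", "byok"
    # BYOK user without an AssemblyAI key: stay on their Deepgram key instead
    # of falling back to the server's AssemblyAI key.
    return "deepgram", "byok"
```

The point of the last branch is that a BYOK request never silently consumes the server's AssemblyAI key.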
Wire desktop Audio Recording to POST /v2/desktop/background-transcribe via chunker/session queue, add backend endpoints and e2e script, and include an agent prompt for multi-chunk non-desktop E2E verification. Co-authored-by: Cursor <cursoragent@cursor.com>
Stabilize desktop cloud batch recording by using 15s cloud chunks, nonfatal backpressure, speech activity gating, explicit batch language, and resilient ASR queue draining. Add explicit desktop background conversation finish routing, AssemblyAI background fail-closed behavior, route/rate-limit coverage, and developer docs for the batch path. Validation: backend focused pytest suite 41 passed, 2 skipped; Swift BackgroundTranscription 15 passed; APIClient finish route 1 passed; ListenProtocol 25 passed; git diff --check clean; live Omi Dev session uploaded 15s AssemblyAI chunks, suppressed quiet-room chunks, and reconciled conversation ba94a0a9-1af8-4d51-b98b-0a0f269bef65.
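As a rough illustration of 15s chunking combined with speech activity gating, here is a minimal Python sketch. The desktop client itself is Swift, and the commit does not say how speech activity is detected; the RMS energy gate, the sample rate, the threshold, and the function names below are all assumptions made for the example.

```python
import math
from typing import Iterator, List, Sequence, Tuple

CHUNK_SECONDS = 15       # chunk length named in the commit message
SAMPLE_RATE = 16_000     # assumption: 16 kHz mono PCM
RMS_THRESHOLD = 500.0    # assumption: arbitrary energy gate for int16 samples


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square energy of a chunk; 0.0 for an empty chunk."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_chunks(
    samples: Sequence[int],
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: int = CHUNK_SECONDS,
    threshold: float = RMS_THRESHOLD,
) -> Iterator[Tuple[int, float, List[int]]]:
    """Yield (chunk_index, start_seconds, chunk) for chunks that pass the gate.

    Quiet chunks are skipped, but the index and start offset of later chunks
    are unaffected, so uploaded chunks can still be placed on the timeline.
    """
    size = sample_rate * chunk_seconds
    for index, start in enumerate(range(0, len(samples), size)):
        chunk = list(samples[start:start + size])
        if rms(chunk) >= threshold:
            yield index, start / sample_rate, chunk
```

Keeping the original chunk index for skipped chunks is a design choice of this sketch: it lets the receiver apply global offsets without knowing which chunks were suppressed.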
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e1ebf27a4d
```swift
if isStartingTranscription {
    cloudBackgroundStartTask?.cancel()
    cloudBackgroundStartTask = nil
    isStartingTranscription = false
    isCloudBackgroundTranscription = false
    cloudBackgroundSession = nil
    cloudBackgroundConversationId = nil
    AssistantSettings.shared.transcriptionEnabled = false
    return
```
Stop capture when canceling startup
When stopTranscription() is called during startup, this branch cancels cloudBackgroundStartTask and returns without stopping active capture or resetting isTranscribing. In startCloudBackgroundTranscription, isTranscribing is set to true before startup completes (around line 1627), so if the user taps stop during that window, the app can keep recording until a second stop (or until other cleanup happens), which is a privacy/UX regression for the cloud background path.
Run Omi speaker identity matching on AssemblyAI desktop background chunks before applying global chunk offsets, update provider run identity metrics, and cover the Omi user match path in desktop background transcription tests. Validation: backend focused pytest suite 42 passed, 2 skipped; speaker identity focused suite 23 passed; pre-commit Python formatting clean; git diff --check clean.
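The ordering described in this commit (identity matching first, global offsets second) can be sketched as below. The `Segment` type, `process_chunk`, and the `match_user` callback are hypothetical; the sketch assumes the matcher needs chunk-relative timestamps because it works against the chunk's own audio, which the commit message implies but does not state.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    """Hypothetical transcript segment; times are in seconds."""
    start: float
    end: float
    speaker_id: int
    is_user: bool = False


def process_chunk(
    segments: List[Segment],
    chunk_offset_seconds: float,
    match_user: Callable[[Segment], bool],
) -> List[Segment]:
    """Run identity matching on chunk-relative times, then shift to global time."""
    # Step 1: match while timestamps still line up with the chunk's audio.
    matched = [replace(s, is_user=match_user(s)) for s in segments]
    # Step 2: only now move the segments onto the conversation timeline.
    return [
        replace(
            s,
            start=s.start + chunk_offset_seconds,
            end=s.end + chunk_offset_seconds,
        )
        for s in matched
    ]
```

If the two steps were swapped, a matcher that slices the chunk audio by `start` and `end` would read the wrong region for every chunk after the first.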
Summary
This PR adds the AssemblyAI-backed prerecorded/background STT path and wires it through desktop Audio Recording as a gated cloud batch flow.
At a high level it adds:
- `/v2/desktop/background-conversation/*` and `/v2/desktop/background-transcribe`

Intended Behavior
- […] `sync`, `background`, and `postprocess`.
- […] `/v4/listen`.
- […] `multi` to avoid AssemblyAI language-detection failures on short or quiet chunks.

Rollout / Config
AssemblyAI remains off by default.
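A minimal sketch of what off-by-default gating could look like, using the `ASSEMBLYAI_BACKGROUND_STT_ENABLED`, `ASSEMBLYAI_API_KEY`, and `ASSEMBLYAI_BACKGROUND_STT_WORKLOADS` knobs from this section. The function name, the set of accepted truthy strings, and the requirement that a server key be present are assumptions for illustration; the PR's actual parsing may differ.

```python
import os
from typing import Mapping

DEFAULT_WORKLOADS = "sync,background,postprocess"  # default stated in the PR


def _truthy(value: str) -> bool:
    # Assumption: which strings count as "enabled" is not specified in the PR.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def assemblyai_enabled_for(workload: str, env: Mapping[str, str] = os.environ) -> bool:
    """Return True only if AssemblyAI is switched on for this workload."""
    # Off by default: a missing flag behaves like "false".
    if not _truthy(env.get("ASSEMBLYAI_BACKGROUND_STT_ENABLED", "false")):
        return False
    # Assumption: without a server key there is nothing to route to.
    if not env.get("ASSEMBLYAI_API_KEY"):
        return False
    raw = env.get("ASSEMBLYAI_BACKGROUND_STT_WORKLOADS", DEFAULT_WORKLOADS)
    enabled = {w.strip() for w in raw.split(",") if w.strip()}
    return workload in enabled
```

Under this shape, rollback stays config-only: flipping the flag to `false` or dropping a workload from the list changes the result without a deploy.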
Important knobs:
- `ASSEMBLYAI_API_KEY`
- `ASSEMBLYAI_BACKGROUND_STT_ENABLED`
- `ASSEMBLYAI_BACKGROUND_STT_WORKLOADS` (`sync,background,postprocess` by default)
- `ASSEMBLYAI_STT_MODEL`
- `ASSEMBLYAI_BASE_URL`
- `ASSEMBLYAI_POLL_INTERVAL_SECONDS`
- `ASSEMBLYAI_MAX_POLL_SECONDS`
- `ASSEMBLYAI_SMOKE_AUDIO_URL`

Rollback is config-only: set `ASSEMBLYAI_BACKGROUND_STT_ENABLED=false` or remove a workload from `ASSEMBLYAI_BACKGROUND_STT_WORKLOADS`.

BYOK adds optional `X-BYOK-AssemblyAI`; existing Deepgram BYOK behavior is preserved when users have only a Deepgram key.

Notable Code Areas
- `backend/utils/stt/provider_service.py`, `backend/utils/stt/providers.py`, `backend/utils/stt/assemblyai_adapter.py`, `backend/utils/stt/deepgram_adapter.py`
- `backend/routers/desktop_background.py`, `backend/utils/conversations/desktop_background.py`
- `backend/utils/stt/background_speaker_identity.py`, `backend/utils/stt/conversation_reconstructor.py`, `backend/models/transcript_segment.py`
- `backend/database/transcription_provider_usage.py`, `backend/utils/stt/provider_evaluation.py`, `backend/scripts/stt/provider_comparison_gate.py`
- `desktop/Desktop/Sources/AppState.swift`, `desktop/Desktop/Sources/TranscriptionService.swift`, `desktop/Desktop/Sources/BackgroundTranscription/*`
- `desktop/Desktop/Sources/MainWindow/Pages/SettingsPage.swift`, `desktop/Desktop/Sources/OnboardingBYOKStepView.swift`
- `docs/doc/developer/backend/assemblyai_background_rollout.mdx`, `docs/doc/developer/backend/listen_pusher_pipeline.mdx`, `scripts/desktop_assemblyai_e2e.py`

Validation
Automated checks run on this branch:
- `backend/venv/bin/python -m pytest tests/unit/test_assemblyai_adapter.py tests/unit/test_desktop_background_transcribe.py tests/unit/test_background_provider_service.py tests/unit/test_byok_assemblyai_routing.py tests/unit/test_rate_limiting.py::TestRouterPolicyMapping::test_all_router_policies_exist -v` -> 41 passed, 2 skipped
- `xcrun swift test -c debug --package-path Desktop --filter BackgroundTranscription` -> 15 passed
- `xcrun swift test -c debug --package-path Desktop --filter APIClientRoutingTests/testFinishBackgroundConversationRoutesToExplicitPythonConversation` -> 1 passed
- `xcrun swift test -c debug --package-path Desktop --filter ListenProtocolTests` -> 25 passed
- `.git/hooks/pre-commit` -> Python formatting clean
- `git diff --check` / `git diff --cached --check` -> clean

Live desktop evidence with Omi Dev (`com.omi.desktop-dev`) against local backend with AssemblyAI enabled:
- […] `ba94a0a9-1af8-4d51-b98b-0a0f269bef65`, local session `56`
- […] `language=en`
- […] `provider=assemblyai`, `segments=1`, run id `5849a05c-1c65-4072-915c-4e7959fad97a`

Manual Test Checklist
Before broad rollout, verify:
- `ASSEMBLYAI_BACKGROUND_STT_ENABLED=true`, `ASSEMBLYAI_API_KEY` set, and `ASSEMBLYAI_BACKGROUND_STT_WORKLOADS=sync,background,postprocess`
- `scripts/desktop_assemblyai_e2e.py --background-batch --api http://127.0.0.1:8080 --language en` produces non-empty AssemblyAI segments
- `/v4/listen`, PTT, and voice-message flows still route to Deepgram

Caveats / Reviewer Notes
- […] `speaker_id=0`.