Detect spontaneous calls via mic activity and offer to record #1119
Groundwork for offering to record when a call starts, including a spontaneous Google Meet with no calendar invite (see docs/auto-call-detection-spec.md). The one real unknown was whether the modern Core Audio process-object API (macOS 14.2+) can read which process holds the mic input, attributed by bundle ID, without the NSAudioCaptureUsageDescription TCC permission. A throwaway spike (scripts/dev/mic-activity-spike.swift) confirms it can: reading the process-object list plus per-process IsRunningInput, PID, and BundleID returns clean with no prompt. The spike also surfaced the key design constraint, later confirmed against a live Meet call: the browser holds the mic via a helper process (com.google.Chrome.helper), so attribution must match the browser family by prefix, not exact bundle ID.
Pure, tested attribution layer for call detection, with no behavior change yet. Native conferencing apps map to their own provider; any browser-family process maps to googleMeet, the representative browser call, which is what closes the spontaneous-Meet gap. Unknown apps map to nil so they never prompt. Browser matching is by family prefix (String.matchesBundleFamily) so a helper or service process attributes to its parent. This is required, not cosmetic: a live Meet call is held by com.google.Chrome.helper, and Safari audio runs in com.apple.WebKit.GPU. Also adds a distinct mic-input prompt reason for analytics and a score above the frontmost-browser hint.
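A minimal sketch of the family-prefix attribution described above, not the code in this PR. Only `matchesBundleFamily` and `googleMeet` are names from the description. The `CallProvider` enum, the `provider(forBundleID:)` function, the lookup tables, and the bundle IDs listed in them are assumptions for illustration.

```swift
// Hypothetical provider type; only `googleMeet` is named in the PR description.
enum CallProvider: Equatable {
    case zoom
    case googleMeet
}

extension String {
    /// True when the receiver is the family ID itself or a dot-separated
    /// child of it (for example a helper or service process).
    func matchesBundleFamily(_ family: String) -> Bool {
        return self == family || self.hasPrefix(family + ".")
    }
}

// Assumed tables: the real lists live in the PR and may differ.
let nativeApps: [String: CallProvider] = ["us.zoom.xos": .zoom]
let browserFamilies = ["com.google.Chrome", "com.apple.Safari", "com.apple.WebKit"]

/// Maps a mic-holding bundle ID to a provider, or nil for unknown apps
/// so they never prompt.
func provider(forBundleID id: String) -> CallProvider? {
    if let native = nativeApps[id] { return native }
    if browserFamilies.contains(where: { id.matchesBundleFamily($0) }) {
        return .googleMeet
    }
    return nil
}
```

Matching on `family + "."` rather than a bare prefix keeps an unrelated ID such as `com.google.ChromeFoo` from being attributed to Chrome.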
Thin watcher over the Core Audio process-object API that emits the set of non-self bundle IDs currently holding the mic input. Metadata only, no audio tap, no TCC permission. Not wired into the detector yet. A call starting does not change the process-object list (the browser helper already exists; only its IsRunningInput flips), so a pure list listener would miss it. The monitor listens on the input device's running-somewhere edge for low latency and runs a small periodic backstop scan for correctness. All CoreAudio reads, listeners, and state stay on one serial queue and results hop to the main actor; attribution and self-exclusion are pure helpers covered by fast tests.
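The pure part of that design can be sketched as below, assuming the Core Audio reads have already been turned into plain values. `ProcessSnapshot`, `micInputUsers`, and `MicUserTracker` are hypothetical names, and so is the self bundle ID used in the usage note. The idea is that the device-edge listener and the backstop scan both call the same scan function, and a small tracker emits only when the set actually changes.

```swift
/// Hypothetical value copy of one Core Audio process object.
struct ProcessSnapshot {
    let bundleID: String
    let isRunningInput: Bool
}

/// Returns the bundle IDs currently holding mic input, excluding the
/// app itself and its own helper processes.
func micInputUsers(in snapshots: [ProcessSnapshot], selfBundleID: String) -> Set<String> {
    var users = Set<String>()
    for snapshot in snapshots where snapshot.isRunningInput && !snapshot.bundleID.isEmpty {
        let id = snapshot.bundleID
        if id == selfBundleID || id.hasPrefix(selfBundleID + ".") { continue }
        users.insert(id)
    }
    return users
}

/// Deduplicates scans so the edge listener and the periodic backstop
/// can both feed it without emitting the same set twice.
struct MicUserTracker {
    private(set) var current = Set<String>()

    /// Returns the new set only when it differs from the last emitted one.
    mutating func update(with scan: Set<String>) -> Set<String>? {
        guard scan != current else { return nil }
        current = scan
        return scan
    }
}
```

In this sketch a browser helper that already exists but is not yet running input produces an empty set, and the same helper with `isRunningInput` flipped to true produces one change event, no matter how many scans observe it.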
Wire the mic signal into the existing prompt pipeline. updateMicInputUsers stores the non-self mic-using bundle IDs and re-evaluates; on the inactive edge it clears that provider's mic backoff so the next call can prompt again, while snooze and dismiss still suppress within one call. Mic candidates reuse the runtime-app source so the existing snooze/dismiss/backoff is untouched, score above the frontmost hint, and carry the mic-input reason. Browser calls map to googleMeet with a neutral "Call detected in your browser" title so a Zoom-web or Teams-web call is not mislabeled. An isOwnCaptureActive gate keeps the whole path off while Transcripted itself is recording or dictating.
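A simplified sketch of the inactive-edge rule, under the assumption that backoff can be modeled as a set of providers that have already been prompted. `MicPromptState`, `markPrompted`, and the `CallProvider` enum are hypothetical. The detector in this PR reuses the existing snooze, dismiss, and backoff machinery instead of a standalone type like this one.

```swift
// Hypothetical provider type for illustration.
enum CallProvider: Hashable {
    case zoom
    case googleMeet
}

struct MicPromptState {
    private(set) var activeProviders = Set<CallProvider>()
    private(set) var backedOff = Set<CallProvider>()
    /// While true, the whole mic path stays off (own recording or dictation).
    var isOwnCaptureActive = false

    /// Records that a prompt was shown, so the same call does not prompt again.
    mutating func markPrompted(_ provider: CallProvider) {
        backedOff.insert(provider)
    }

    /// Stores the providers currently holding the mic and returns the ones
    /// that may prompt now. A provider that just went inactive has its
    /// backoff cleared so its next call can prompt again.
    mutating func update(activeProviders new: Set<CallProvider>) -> Set<CallProvider> {
        let ended = activeProviders.subtracting(new)
        backedOff.subtract(ended)
        activeProviders = new
        guard !isOwnCaptureActive else { return [] }
        return new.subtracting(backedOff)
    }
}
```

The backoff is cleared before the capture gate is checked, so a call that ends while Transcripted is recording still resets its provider for the next call.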
Turn the feature on end to end. AutoCallDetectionPreferences persists the on/off state, defaulting on to match Notion and Plaud, behind a clear "Auto-detect calls" toggle on the General settings page. The app constructs MicActivityMonitor, points its output at the detector, sets isOwnCaptureActive from meeting-recording and dictation state, and starts or stops the monitor based on the preference (live-toggleable via a change notification). Everything stays on device: it reads which app holds the mic, never audio.
Capture how the implementation landed in the spec (device-edge plus backstop scan instead of a process-list listener; browser family-prefix matching) and resolve its open questions: default on, browsers plus known conferencing apps only, and a distinct mic-input analytics reason. Note the new files in the Meeting and Support CLAUDE.md guides.
Both attribution branches fire during real calls: a spontaneous Google Meet with no calendar invite shows "Call detected in your browser", and a native Zoom call shows "Zoom call detected".
Thanks for digging into this, Alfonso. The spontaneous-call angle is useful, and I’m going to review it carefully against latest
Thanks again. I reviewed this against latest I would not land this branch as-is yet, though. Current Suggested next step: rebase on current
Thanks again, Alfonso. I see this has been updated/rebased cleanly now, so I’m going to bring it onto
Thanks again, Alfonso. This is merged into I reviewed the updated branch against the newest
Really appreciate the contribution.
Why
Transcripted prompts for scheduled or visible meeting apps, but spontaneous browser calls can be invisible. This adds a local mic-activity signal so Transcripted can offer to record ad-hoc calls before users forget. Closes #1118.
Current status
main in commit facd5bc5.
repo-hygiene pass, build-and-test pass, hardware-smokes skipped.
What changed
MicActivityMonitor, a Core Audio process-object watcher that emits non-Transcripted bundle IDs using the mic. It observes metadata only; no audio tap and no new TCC permission.
MeetingPromptDetector as a .micInput reason while reusing the existing prompt, snooze, dismiss, and backoff machinery.
Proof from current head
Automation run by Codex after the final hardening commits:
bash scripts/dev/agent-preflight.sh
bash -n run-tests.sh
bash -n scripts/entrypoints/run-tests.sh
bash build-deps.sh --force
bash build.sh --no-open
bash run-tests.sh: 5342/5342 passed
bash run-integration-smoke.sh
bash scripts/ops/transcripted-qa-bench.sh --mode full: PASS, 16/16 checks
repo-hygiene
build-and-test
Review proof:
codex-review was attempted but the local Codex CLI was usage-limited, so it did not produce a verdict.
no actionable findings.
Manual proof boundary:
Risk notes