Detect spontaneous calls via mic activity and offer to record #1119

Merged
r3dbars merged 11 commits into r3dbars:main from acorretti:feat/auto-call-detection on Jun 14, 2026

acorretti commented Jun 14, 2026

Contributor

Why

Transcripted prompts for scheduled or visible meeting apps, but spontaneous browser calls can be invisible. This adds a local mic-activity signal so Transcripted can offer to record ad-hoc calls before users forget. Closes #1118.

Current status

  • Rebased/merged onto current main in commit facd5bc5.
  • GitHub reports the PR conflict-free / mergeable.
  • GitHub checks are green: repo-hygiene pass, build-and-test pass, hardware-smokes skipped.

What changed

  • Added MicActivityMonitor, a Core Audio process-object watcher that emits non-Transcripted bundle IDs using the mic. It observes metadata only; no audio tap and no new TCC permission.
  • Added tested mic bundle/provider heuristics and a Settings toggle for Auto-detect calls.
  • Fed mic activity into MeetingPromptDetector as a .micInput reason while reusing the existing prompt, snooze, dismiss, and backoff machinery.
  • Hardened the prompt path after review:
    • suppresses mic prompts while Transcripted is already recording
    • suppresses stale callbacks after the toggle is disabled
    • preserves pending/dismiss/snooze cooldowns across transient mic drops
    • avoids repeat prompts for the same active mic session
    • lets native Zoom/Teams mic candidates keep a matching calendar title
    • keeps generic browser mic prompts neutral to avoid mislabeling Meet vs Zoom-web vs Teams-web
    • backs off CoreAudio fallback polling to 60s
  • Updated docs/spec and fast-test coverage.

Proof from current head

Automation run by Codex after the final hardening commits:

  • bash scripts/dev/agent-preflight.sh
  • bash -n run-tests.sh
  • bash -n scripts/entrypoints/run-tests.sh
  • bash build-deps.sh --force
  • bash build.sh --no-open
  • bash run-tests.sh — 5342/5342 passed
  • bash run-integration-smoke.sh
  • bash scripts/ops/transcripted-qa-bench.sh --mode full — PASS, 16/16 checks
  • GitHub repo-hygiene
  • GitHub build-and-test

Review proof:

  • codex-review was attempted but the local Codex CLI was usage-limited, so it did not produce a verdict.
  • Fallback independent Claude Code review found prompt-spam and labeling issues; the branch was patched.
  • Final focused Claude Code review of the last hardening commit: no actionable findings.

Manual proof boundary:

  • Original author proof says a live spontaneous Google Meet showed browser-call copy and a native Zoom call showed Zoom copy.
  • This Codex worker did not rerun manual live-call hardware proof after the rebase; the new proof above is automated/local/CI plus independent review.

Risk notes

  • Browser mic activity is intentionally neutral. A real Google Meet in the browser may show generic browser copy, because the mic process alone cannot distinguish Meet from Zoom-web or Teams-web safely.
  • Phase 2 per-call network/WebRTC confirmation remains deferred until real-world false positives justify it.

Groundwork for offering to record when a call starts, including a
spontaneous Google Meet with no calendar invite (see
docs/auto-call-detection-spec.md).

The one real unknown was whether the modern Core Audio process-object API
(macOS 14.2+) can read which process holds the mic input, attributed by
bundle ID, without the NSAudioCaptureUsageDescription TCC permission. A
throwaway spike (scripts/dev/mic-activity-spike.swift) confirms it can:
reading the process-object list plus per-process IsRunningInput, PID, and
BundleID returns clean with no prompt. The spike also surfaced the key
design constraint, later confirmed against a live Meet call: the browser
holds the mic via a helper process (com.google.Chrome.helper), so
attribution must match the browser family by prefix, not exact bundle ID.

Pure, tested attribution layer for call detection, with no behavior change
yet. Native conferencing apps map to their own provider; any browser-family
process maps to googleMeet, the representative browser call, which is what
closes the spontaneous-Meet gap. Unknown apps map to nil so they never
prompt.
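
The attribution rule described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the branch: only String.matchesBundleFamily, googleMeet, and the bundle IDs quoted in this PR come from the source. The CallProvider enum, the provider(forMicBundleID:) function, both lookup tables, the native-app bundle IDs, and the dotted-boundary check are hypothetical.

```swift
// Hypothetical provider type; only `googleMeet` is named in the PR.
enum CallProvider: Equatable {
    case zoom, teams, googleMeet
}

extension String {
    /// True when the bundle ID equals `family` or is a dotted child of it.
    /// The dotted boundary is an assumption: it lets
    /// "com.google.Chrome.helper" match "com.google.Chrome" while
    /// "com.google.ChromeExtras" does not.
    func matchesBundleFamily(_ family: String) -> Bool {
        self == family || self.hasPrefix(family + ".")
    }
}

// Assumed tables for illustration; the real lists live in the branch.
let browserFamilies = ["com.google.Chrome", "com.apple.Safari", "com.apple.WebKit"]
let nativeApps: [String: CallProvider] = [
    "us.zoom.xos": .zoom,
    "com.microsoft.teams2": .teams,
]

/// Maps a mic-holding bundle ID to a provider, or nil for unknown apps
/// so that they never prompt.
func provider(forMicBundleID bundleID: String) -> CallProvider? {
    // Native conferencing apps map to their own provider.
    if let native = nativeApps[bundleID] { return native }
    // Any browser-family process (including helpers) maps to googleMeet.
    if browserFamilies.contains(where: { bundleID.matchesBundleFamily($0) }) {
        return .googleMeet
    }
    return nil
}
```

Matching on a family prefix rather than an exact ID is what lets a helper such as com.google.Chrome.helper attribute to its parent browser.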

Browser matching is by family prefix (String.matchesBundleFamily) so a
helper or service process attributes to its parent. This is required, not
cosmetic: a live Meet call is held by com.google.Chrome.helper, and Safari
audio runs in com.apple.WebKit.GPU. Also adds a distinct mic-input prompt
reason for analytics and a score above the frontmost-browser hint.

Thin watcher over the Core Audio process-object API that emits the set of
non-self bundle IDs currently holding the mic input. Metadata only, no
audio tap, no TCC permission. Not wired into the detector yet.
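
A minimal sketch of the self-exclusion step, kept free of Core Audio so it can run anywhere. The AudioProcessSnapshot struct, the micInputUsers function, and its parameters are hypothetical names for illustration; the PR only states that the watcher reads IsRunningInput, PID, and BundleID per process and emits non-self bundle IDs.

```swift
// Hypothetical value type standing in for one Core Audio process object.
struct AudioProcessSnapshot {
    let pid: Int32
    let bundleID: String?
    let isRunningInput: Bool
}

/// Returns the bundle IDs currently holding the mic input, excluding the
/// app's own process (matched by PID or by bundle ID).
func micInputUsers(
    in snapshots: [AudioProcessSnapshot],
    selfPID: Int32,
    selfBundleID: String
) -> Set<String> {
    var users = Set<String>()
    for process in snapshots where process.isRunningInput {
        // Skip our own process and anything without a usable bundle ID.
        guard process.pid != selfPID,
              let bundleID = process.bundleID,
              !bundleID.isEmpty,
              bundleID != selfBundleID else { continue }
        users.insert(bundleID)
    }
    return users
}
```

Keeping this as a pure function over snapshots is one way to cover the filtering in fast tests without touching real audio hardware.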

A call starting does not change the process-object list (the browser helper
already exists; only its IsRunningInput flips), so a pure list listener
would miss it. The monitor listens on the input device's running-somewhere
edge for low latency and runs a small periodic backstop scan for
correctness. All CoreAudio reads, listeners, and state stay on one serial
queue and results hop to the main actor; attribution and self-exclusion are
pure helpers covered by fast tests.

Wire the mic signal into the existing prompt pipeline. updateMicInputUsers
stores the non-self mic-using bundle IDs and re-evaluates; on the inactive
edge it clears that provider's mic backoff so the next call can prompt
again, while snooze and dismiss still suppress within one call.
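
The edge behavior in this commit message could look roughly like the sketch below. It models only what the message states (store, re-evaluate, clear backoff on the inactive edge) and not the later hardening. MicPromptTracker and its stored sets are hypothetical, and providers are plain strings here for brevity; only the updateMicInputUsers name comes from the PR.

```swift
// Hypothetical tracker; the real detector reuses the existing
// snooze/dismiss/backoff machinery instead of its own sets.
struct MicPromptTracker {
    private(set) var activeProviders = Set<String>()
    private(set) var backedOff = Set<String>()

    /// Stores the latest mic-using providers and returns the ones that
    /// should prompt now.
    mutating func updateMicInputUsers(_ providers: Set<String>) -> Set<String> {
        // Inactive edge: providers that were active and have now dropped.
        let ended = activeProviders.subtracting(providers)
        // Clear their backoff so the next call can prompt again.
        backedOff.subtract(ended)
        activeProviders = providers

        // Prompt once per active session, then back off until it ends.
        let toPrompt = providers.subtracting(backedOff)
        backedOff.formUnion(toPrompt)
        return toPrompt
    }
}
```

In this sketch a provider prompts once while its mic session stays active and becomes eligible again only after the session ends.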

Mic candidates reuse the runtime-app source so the existing
snooze/dismiss/backoff is untouched, score above the frontmost hint, and
carry the mic-input reason. Browser calls map to googleMeet with a neutral
"Call detected in your browser" title so a Zoom-web or Teams-web call is not
mislabeled. An isOwnCaptureActive gate keeps the whole path off while
Transcripted itself is recording or dictating.

Turn the feature on end to end. AutoCallDetectionPreferences persists the
on/off state, defaulting on to match Notion and Plaud, behind a clear
"Auto-detect calls" toggle on the General settings page.
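
A default-on preference can be sketched as below, assuming UserDefaults-backed storage. The storage key, the injected defaults instance, and the isEnabled property are assumptions for illustration; the PR names the AutoCallDetectionPreferences type but does not show its storage.

```swift
import Foundation

struct AutoCallDetectionPreferences {
    // Hypothetical key; the real key name is not shown in the PR.
    static let key = "autoCallDetectionEnabled"
    let defaults: UserDefaults

    var isEnabled: Bool {
        get {
            // bool(forKey:) returns false for a missing key, so check for
            // absence first in order to default to on.
            guard defaults.object(forKey: Self.key) != nil else { return true }
            return defaults.bool(forKey: Self.key)
        }
        nonmutating set { defaults.set(newValue, forKey: Self.key) }
    }
}
```

Injecting the UserDefaults instance keeps the default-on behavior testable against a scratch suite instead of the app's real settings.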

The app constructs MicActivityMonitor, points its output at the detector,
sets isOwnCaptureActive from meeting-recording and dictation state, and
starts or stops the monitor based on the preference (live-toggleable via a
change notification). Everything stays on device: it reads which app holds
the mic, never audio.

Capture how the implementation landed in the spec (device-edge plus
backstop scan instead of a process-list listener; browser family-prefix
matching) and resolve its open questions: default on, browsers plus known
conferencing apps only, and a distinct mic-input analytics reason. Note the
new files in the Meeting and Support CLAUDE.md guides.

Both attribution branches fire during real calls: a spontaneous Google Meet
with no calendar invite shows "Call detected in your browser", and a native
Zoom call shows "Zoom call detected".

r3dbars commented Jun 14, 2026

Owner

Thanks for digging into this, Alfonso. The spontaneous-call angle is useful, and I’m going to review it carefully against latest main before deciding whether this should land as-is or needs reshaping.

r3dbars commented Jun 14, 2026

Owner

Thanks again. I reviewed this against latest main; the underlying idea is strong and the CoreAudio process-object approach looks like a real way to close the spontaneous-call gap.

I would not land this branch as-is yet, though. Current main has moved more prompt policy into the synthetic prompt evaluator, and this branch predates that work. The auto-merge only hard-conflicts in Tests/FastTests.manifest, but the mic-input path also needs to be rebased/adapted so it fits the newer prompt architecture and keeps the synthetic coverage story clean.

Suggested next step: rebase on current main, keep both SyntheticMeetingPromptTests and the new mic tests in the manifest, remove the small duplicate @available typo in TranscriptedApp.swift, then rerun the mapped checks: bash build-deps.sh --force, bash build.sh --no-open, bash run-tests.sh, and bash run-integration-smoke.sh.

r3dbars mentioned this pull request Jun 14, 2026
12 tasks

r3dbars commented Jun 14, 2026

Owner

Thanks again, Alfonso. I see this has been updated/rebased cleanly now, so I’m going to bring it onto main after the Home row-actions PR and run the repo checks on the combined result.

r3dbars merged commit ee79880 into r3dbars:main Jun 14, 2026
3 checks passed

r3dbars commented Jun 14, 2026

Owner

Thanks again, Alfonso. This is merged into main now at ee798805.

I reviewed the updated branch against the newest main; the mic-activity approach looks solid, and the backoff/self-capture guards are covered by tests. I also ran the repo gates on the merged result:

  • bash build-deps.sh --force
  • bash build.sh --no-open
  • bash run-tests.sh (5347 passed)
  • bash run-integration-smoke.sh
  • bash -n run-tests.sh && bash -n scripts/entrypoints/run-tests.sh
  • codex-review --mode branch --base origin/main (clean, no actionable findings)

Really appreciate the contribution.

acorretti pushed a commit to acorretti/transcripted that referenced this pull request Jun 14, 2026

Development

Successfully merging this pull request may close these issues.

Detect spontaneous calls and offer to record, even with no calendar invite

2 participants