Skip to content

feat: add cross-platform audio probe#880

Open
thymikee wants to merge 10 commits into
mainfrom
feat/web-audio-probe
Open

feat: add cross-platform audio probe#880
thymikee wants to merge 10 commits into
mainfrom
feat/web-audio-probe

Conversation

@thymikee

@thymikee thymikee commented Jun 25, 2026

Copy link
Copy Markdown
Member

Summary

Adds audio probe start|status|stop for agent-device sessions. Agents can poll compact RMS/peak dBFS buckets while continuing other actions.

Supported backends now include:

  • Web browser sessions via the managed agent-browser provider, sampling HTML <audio>/<video> media elements through Web Audio.
  • macOS sessions via the macOS helper, sampling host system audio through ScreenCaptureKit.
  • iOS simulators and Android emulators on macOS hosts, using the same host system audio capture path because simulator/emulator sound is rendered by the Mac.

The Expo test app Audio tab now dogfoods both web and native playback with a generated 6-second 440 Hz sample. Docs/help are updated across CLI help, README, website command/debugging/client API/setup/security pages with platform support, timing units, polling flow, and Screen Recording requirements.

Tradeoffs: web does not capture whole-tab/system audio; host capture requires macOS Screen Recording permission and can include other audible host apps. Physical iOS and Android devices remain unsupported because their audio is not rendered through the Mac host.

Scope: 36 files overall, covering the audio command family, web provider, macOS helper/backend, iOS simulator and Android emulator host-routing, observability tests, docs/help, and the Expo test app.

Validation

Passed locally: focused unit tests for session-observability, capabilities, observability command metadata, and CLI args/help; CI=true pnpm typecheck; CI=true pnpm test-app:typecheck; CI=true pnpm format; CI=true pnpm build; CI=true pnpm check:fallow --base origin/main; swift build -c release --package-path macos-helper; iOS native test app build/install on iPhone 17 Pro and iPhone 17 Pro Max simulators; Android native test app build/install on Pixel 9 Pro XL emulator.

Focused test command:

CI=true pnpm exec vitest run --project unit src/daemon/handlers/__tests__/session-observability.test.ts src/core/__tests__/capabilities.test.ts src/commands/observability/index.test.ts src/utils/__tests__/args.test.ts

Result: 4 files passed, 174 tests passed.

Latest GitHub checks after the simulator support fix: Unit Tests, Coverage, Typecheck, Lint & Format, Fallow Code Quality, Integration Tests, Web Platform Smoke, Layering Guard, No test-only DI seams, iOS Runner Swift Compatibility, Bundle Size, CodeQL, and analysis jobs are passing. One duplicate Smoke Tests job was still pending when this description was updated.

Manual web dogfood used a live 20s probe while clicking the sample button:

agent-device audio probe start 20 1000 --platform web --session audio-review-followup --json
agent-device click 'label="Start sample"' --platform web --session audio-review-followup --json
agent-device audio probe status --platform web --session audio-review-followup --json

Real web status output excerpt:

{
  "audio": "probe",
  "state": "running",
  "active": true,
  "heard": true,
  "source": "media-elements",
  "backend": "agent-browser",
  "durationMs": 20000,
  "elapsedMs": 9085,
  "bucketMs": 1000,
  "sampleCount": 10,
  "mediaElementCount": 1,
  "sourceCount": 1,
  "rmsDbfs": [-90, -90, -11, -11, -11, -11, -11, -90, -90, -90],
  "peakDbfs": [-90, -90, -9, -9, -9, -9, -9, -90, -90, -90]
}

Simulator/emulator verification:

  • iOS: loaded the current Expo test app from Metro 8082 on iPhone 17 Pro Max, opened the Audio tab, and confirmed the native audio sample UI rendered.
  • Android: booted Pixel 9 Pro XL emulator, built/installed the current Expo app with expo-audio, loaded the app from Metro 8082, opened the Audio tab, and tapped Start sample.
  • Both audio probe start 10 1000 --platform ios|android reached the new host-system audio helper path, then failed at the expected local macOS TCC gate because Screen Recording permission is denied on this machine:
{
  "code": "COMMAND_FAILED",
  "message": "failed to start host audio probe: {\"error\":{\"details\":{\"error\":\"The user declined TCCs for application, window, display capture\",\"permission\":\"screen-recording\"},\"message\":\"audio probe requires Screen Recording permission on macOS\"},\"ok\":false}"
}

All manual agent-device sessions were closed, Metro was stopped, and the Android emulator booted for verification was shut down.

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown

Size Report

Metric Base Current Diff
JS raw 1.3 MB 1.4 MB +15.7 kB
JS gzip 437.6 kB 442.3 kB +4.7 kB
npm tarball 575.8 kB 583.7 kB +7.9 kB
npm unpacked 1.9 MB 2.0 MB +26.7 kB

Startup median (7 runs, lower is better):

Scenario Base Current Diff
CLI --version 26.9 ms 26.2 ms -0.7 ms
CLI --help 46.4 ms 46.4 ms -0.0 ms

Top changed chunks:

Chunk Raw diff Gzip diff
dist/src/9722.js +9.6 kB +3.1 kB
dist/src/session.js +2.3 kB +578 B
dist/src/2948.js +1.7 kB +490 B
dist/src/cli-help.js +2.1 kB +489 B
dist/src/9919.js +38 B +11 B

@thymikee

Copy link
Copy Markdown
Member Author

Fallow Code Quality is failing on new complexity in the audio path. Please split/flatten these before review:

  • src/daemon/handlers/session-observability.ts:643 resolveAudioCommandRequest — cyclomatic 16, cognitive 10, 60 lines, CRAP 71.3
  • src/platforms/web/agent-browser-provider.ts:298 normalizeAgentBrowserAudioProbeResult — cyclomatic 10, cognitive 9, 23 lines, CRAP 31.6

After updating, please rerun pnpm check:fallow --base origin/main or confirm the CI check is green.

@thymikee thymikee changed the title feat: add web audio probe feat: add audio probe Jun 25, 2026
@thymikee

Copy link
Copy Markdown
Member Author

Review pass complete: I did not find actionable blockers.

What I checked:

  • Public command wiring: audio is in the command catalog, typed client, batch policy, command metadata, CLI reader/writer, and output formatter.
  • Daemon route: audio is routed through the observability handler in daemon-command-registry, with an active-session requirement and web-only capability check.
  • Production backend path: web sessions resolve the request-scoped agent-browser provider, call agent-browser eval, unwrap the eval result envelope, and return bounded rmsDbfs/peakDbfs arrays rather than raw audio data.
  • Tests cover command parsing/projection, daemon registry ownership, provider normalization, and the provider-scenario path through the request router.
  • Validation includes green static checks/Fallow/build plus a live web dogfood run where audio probe status reported heard: true with compact dBFS buckets.

Residual risk is the stated product scope: this samples capturable HTML media elements exposed to Web Audio, not whole-tab/OS/device audio. That tradeoff is documented in the PR body and CLI guidance, so I would call this merge-ready once maintainers are comfortable with that first slice.

@thymikee thymikee force-pushed the feat/web-audio-probe branch from 422287d to f4f4d59 Compare June 26, 2026 10:55
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://callstack.github.io/agent-device/pr-preview/pr-880/

Built to branch gh-pages at 2026-06-26 15:03 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@thymikee thymikee changed the title feat: add audio probe feat: add cross-platform audio probe Jun 26, 2026
@thymikee thymikee force-pushed the feat/web-audio-probe branch from cbe14d0 to 6f81454 Compare June 26, 2026 13:37
@thymikee

Copy link
Copy Markdown
Member Author

Review finding on the latest green #880 head:

Blocking: normal close does not stop an active host audio probe. session-close.ts defines stopSessionAudioProbe at lines 89-94 and teardownSessionResources calls it, but handleCloseCommand starts cleanup at app logs/perf/snapshot helper and never calls stopSessionAudioProbe before deleting the session. For macOS/iOS simulator/Android emulator audio probes, start stores a long-lived helper child in session.audioProbe; closing the session while a probe is running drops the only handle and lets the ScreenCaptureKit helper continue until its duration expires. Please call the audio cleanup helper from handleCloseCommand and add a close-with-active-audio-probe regression test.

@thymikee thymikee force-pushed the feat/web-audio-probe branch from 6f81454 to f910aaa Compare June 26, 2026 14:46

Copy link
Copy Markdown
Member Author

Thermo-Nuclear Code Quality Review

Feature works and the command/capability/CLI wiring is clean and consistent with the existing observability family. But there's one structural regression worth fixing before merge, plus several duplication/indirection smells that the codebase's own conventions argue against.

No file crosses the 1000-line threshold (largest is session-observability.ts at 774). But agent-browser-provider.ts nearly doubles (373 → 675), and essentially all of that growth is one function. That's the headline.

1. Blocker — the ~190-line browser probe is a magic string, not code (agent-browser-provider.ts)

buildAudioProbeEvalScript returns a template literal containing a complete state machine: source discovery, AudioContext plumbing, dBFS math, ring-buffered sampling, gesture-resume listeners, start/sample/stop lifecycle, and two full result-object literals. It's ~300 of the +303 lines in this file.

The issue isn't that it's eval'd in the browser (unavoidable for Web Audio) — it's that it's authored as an opaque string:

  • Not type-checked, not linted, not formatted. typecheck/format/lint all skip it. It's the one place in the diff where window[key], entry.audible, probe.timer, etc. have zero compiler coverage, in a file where everything around it is strict TS.
  • Third independent re-implementation of the same probe semantics — alongside the Swift AudioProbe.swift and the TS normalizeAudioProbeRecord. Three encodings of one JSON contract, none sharing a definition. The silenceDb = -90 / dBFS clamp logic is copy-pasted from Swift's audioProbeSilenceDb.
  • The result object is hand-built twice inside the string (no-AudioContext fallback and getContext()-failed fallback are near-identical 16-field literals), and both differ subtly from the canonical AudioProbeResult type — e.g. they hardcode durationMs: Number(options.durationMs) || 10000, re-encoding the daemon default in a second place.

There's a test that page.evaluate(script)s it against fakes, which is better than nothing — but testing-a-string-via-eval is not a substitute for the script being a real module.

Remedy: lift the script into its own first-class source file (web/audio-probe-script.ts or a .js asset) that is type-checked/linted, inlined at build time or read at runtime. That alone collapses the two duplicated fallback literals into one builder and lets the dBFS/shape logic be shared rather than re-typed.

2. Duplicated device-matrix predicate that must be kept in sync by hand

Two predicates encode "macOS ∨ iOS-simulator ∨ Android-emulator":

  • usesHostSystemAudioProbe in daemon/audio-probe.ts:22
  • isHostSystemAudioProbeDevice in core/capabilities.ts:25

They look identical but diverge deliberately — capabilities also returns true for web and gates the host arm on process.platform === 'darwin'. That's the dangerous kind of divergence: two routing predicates, subtly different, that together decide web-provider vs. host-helper dispatch. Change the matrix once (e.g. a new sim target) and one copy gets missed.

Remedy: one canonical predicate in capabilities (it already owns the capability matrix), with the host-only subset derived from it.

3. src/platforms/host-system-audio.ts is an identity wrapper

The entire file is:

export async function startHostSystemAudioProbeProcess(options) {
  return await startMacOsAudioProbeProcess(options);
}

…plus a re-declared options type duplicating the macos-helper param type. It adds a platform-neutral name over something that only ever calls the macOS helper, and buys no clarity — audio-probe.ts can import startMacOsAudioProbeProcess directly. If it's meant as a future seam for non-Mac hosts, that's speculative; delete the indirection until a second implementation exists.

4. The 14-field result shape is hand-built in four places

AudioProbeResult has a canonical definition and a normalizeAudioProbeRecord factory in audio-probe-result.ts, yet full literals of the shape are re-typed in:

These already drift: the two TS copies disagree on sourceCount (0 vs 1). Give the shape one construction site — e.g. emptyAudioProbeResult(partial) — and spread the fallbacks over it.

5. Cast papering over a type narrowing (normalizeAgentBrowserAudioProbeResult)

return normalizeAudioProbeRecord(record, {...}) as WebAudioProbeResult;

normalizeAudioProbeRecord returns AudioProbeResult (union source, optional mediaElementCount), but the cast asserts the web-narrowed { source: 'media-elements'; mediaElementCount: number } — which the normalizer doesn't actually guarantee (mediaElementCount can come back undefined). Either make the boundary real (a web-specific normalizer that enforces the narrowing) or widen the return type honestly.

Minor / confirm-only

  • Web probes aren't torn down on session close. stopSessionAudioProbe only touches session.audioProbe, which is host-only; web probe state lives in the browser window, so session-close.ts leaves a running web probe alive in the page. It's self-limiting (in-browser setTimeout + bounded buffers) so probably fine — worth a one-line note acknowledging the asymmetry, since the host path goes to lengths to stop cleanly.
  • CLI/capability/catalog wiring is good. audioCommandFacet / reader / writer / output follow the logs/network pattern exactly, and the audio validation helpers in session-observability.ts mirror the perf/network resolvers cleanly. No notes there.
  • Swift helper is well-structured (lock-guarded bucket accumulator, Codable response). Its only cross-cutting issue is being the third copy of the contract (fix: skill should work, even if the npm package is not installed #1/bug: fill on search does not always work #4).

Before approving

  1. Extract the eval script out of the string literal into a checked source file (fix: skill should work, even if the npm package is not installed #1) — the real blocker.
  2. Collapse the two device predicates to one canonical home (Update README with correct GitHub link #2).
  3. Delete the identity wrapper or justify the seam (bug: --session flag does not take effect #3).

#4 and #5 are strongly preferred but reasonable as fast-follows. The wiring/CLI/docs work is clean and needs no changes.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant