feat(audio): native cpal microphone capture + native cue #20

Closed
leo-fengchao wants to merge 6 commits into that-yolanda:master from leo-fengchao:codex/native-cpal-capture

@leo-fengchao leo-fengchao commented Jun 23, 2026

Contributor

💡 Overview

On macOS, recording now captures audio natively via cpal (CoreAudio) instead of the WebView getUserMedia path, and the start/end cues play natively too. This removes the renderer from the audio hot path on macOS and makes capture/cue timing more predictable. Windows keeps the existing getUserMedia path.

This is the first of two PRs; the transcription failure-recovery / retry feature builds on top of this one.

🛠️ Key Changes

  • Native capture (macOS): cpal-based microphone capture with streaming resampling to 16 kHz mono, replacing the renderer getUserMedia route. Includes a unit-tested streaming resampler.
  • Native cue: start/end cues play via paste::play_sound instead of a base64 event to a renderer AudioContext.
  • Settle delay: macOS uses 0 ms (no browser AEC/AGC to converge under native capture); other platforms keep 350 ms.
  • Non-blocking warmup: the capture-thread ready signal is awaited on a tokio::sync::oneshot channel instead of a blocking std::sync::mpsc recv, so it no longer ties up a tokio worker thread.
  • Docs: native overlay + retry design note.
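
The PR page does not show the resampler itself. As a rough illustration of what "streaming resampling to 16 kHz mono" can involve, here is a minimal sketch that downmixes interleaved frames and resamples by linear interpolation, carrying state across chunks so the output does not depend on how the capture callback slices the audio. The names `downmix_to_mono` and `StreamingResampler` are hypothetical, and linear interpolation (with no anti-aliasing filter) is a stand-in, not necessarily what native_audio.rs does.

```rust
/// Average interleaved frames down to one channel.
/// Hypothetical helper; incomplete trailing frames are ignored.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0);
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Hypothetical streaming resampler using linear interpolation.
/// State survives between calls, so chunk boundaries do not change the output.
pub struct StreamingResampler {
    step: f64,     // input samples advanced per output sample
    produced: u64, // output samples emitted so far
    base: i64,     // global index of the first sample of the next chunk
    last: f32,     // sample at global index `base - 1`
}

impl StreamingResampler {
    pub fn new(in_rate: u32, out_rate: u32) -> Self {
        assert!(in_rate > 0 && out_rate > 0);
        Self {
            step: in_rate as f64 / out_rate as f64,
            produced: 0,
            base: 0,
            last: 0.0,
        }
    }

    /// Consume one mono chunk and append resampled output to `out`.
    /// An output sample is emitted only once both of its neighbouring input
    /// samples have arrived, so the final input sample of a stream may not
    /// produce output.
    pub fn process(&mut self, chunk: &[f32], out: &mut Vec<f32>) {
        if chunk.is_empty() {
            return;
        }
        let (base, last) = (self.base, self.last);
        let end = base + chunk.len() as i64; // one past the last available index
        // Look up a sample by global index; `base - 1` is the carried-over sample.
        let at = |i: i64| if i < base { last } else { chunk[(i - base) as usize] };
        loop {
            // Derive the position from the output count to avoid accumulating error.
            let pos = self.produced as f64 * self.step;
            let i = pos.floor() as i64;
            if i + 1 >= end {
                break; // the right-hand neighbour has not arrived yet
            }
            let frac = (pos - i as f64) as f32;
            out.push(at(i) + (at(i + 1) - at(i)) * frac);
            self.produced += 1;
        }
        self.last = chunk[chunk.len() - 1];
        self.base = end;
    }
}
```

Under these assumptions, a capture callback would call `downmix_to_mono` on each device buffer, pass the result to `process`, and forward the appended samples to the ASR buffer. A production resampler would normally low-pass filter before decimating to limit aliasing.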

🧪 Testing

  • cargo test (236 unit tests) and npx vitest run (123 tests) pass; cargo clippy clean.
  • Manually verified on macOS: native capture, native start/end cues, recording start/stop.

📸 Screenshots

No user-visible UI change (audio path only).

🔗 Related

Follow-up PR (failure recovery / retry) builds on this branch.

leochenfc and others added 5 commits June 24, 2026 00:25
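To avoid the quiet-then-loud recording volume caused by WebView/WebRTC audio processing, macOS now captures audio with Rust/cpal and reuses the existing ASR buffering pipeline.

Also save the 16 kHz mono PCM that ASR actually receives as a WAV file, to help with further investigation of audio quality.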
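Previously, a blocking recv on std::sync::mpsc was used in an async context to wait
for the capture thread to become ready, which tied up a tokio worker thread. Switch to
awaiting a oneshot instead: it returns the moment the capture stream is built and no longer blocks the executor.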

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
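Lock down the WebView-free main path, the late-result fix, recording assets, and the retry strategy first, so that later implementation proceeds through the agreed phases.
Debugging the native overlay often requires a way to run without triggering hot reload.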

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
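The original WebView getUserMedia path played cues by sending a base64 event to a
frontend AudioContext; under native capture, cues are played directly with paste::play_sound, which is more stable and better timed.
Native capture has no browser AEC/AGC that needs to converge, so the macOS settle delay is set to 0,
which makes the key press to "start" feel more responsive.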

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
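
One way to express the per-platform settle delay described in this commit is a small helper that branches on the target OS. This is only a sketch: the function name `settle_delay` is hypothetical, and the values (0 ms on macOS, 350 ms elsewhere) are taken from the PR description.

```rust
use std::time::Duration;

/// Hypothetical helper: how long to wait after capture starts before the
/// recording is treated as "live".
pub fn settle_delay() -> Duration {
    if cfg!(target_os = "macos") {
        // Native capture: no browser AEC/AGC to converge, so no wait.
        Duration::from_millis(0)
    } else {
        // getUserMedia path: give the browser's audio processing time to settle.
        Duration::from_millis(350)
    }
}
```

Using `cfg!` rather than `#[cfg]` keeps both branches type-checked on every platform; the untaken branch is trivially removed by the compiler.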
@leo-fengchao leo-fengchao force-pushed the codex/native-cpal-capture branch from aa4832b to f24a146 Compare June 23, 2026 16:31
@leo-fengchao leo-fengchao changed the title feat(audio): native cpal capture with transcription failure recovery feat(audio): native cpal microphone capture + native cue Jun 23, 2026
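- native_audio is compiled only on macOS, and cpal is used only on macOS; keeping cpal in the
  general dependencies makes Linux CI pull in cpal→alsa-sys, which needs libasound2-dev
  (alsa.pc) and fails the build. Moving it to the macOS target dependencies avoids this.
- The app argument of append_audio_samples is used only for the macOS native waveform; on non-macOS
  targets, let _ = app silences the unused warning (clippy -D warnings).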

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
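
The unused-argument fix in the second bullet can be sketched as follows. Everything here except the name `append_audio_samples` and the `let _ = app` idiom is an assumption made for illustration: `AppHandle` is a stand-in type, and `emit_native_waveform` is a hypothetical macOS-only function. On the Cargo side, the first bullet presumably corresponds to declaring cpal under a `[target.'cfg(target_os = "macos")'.dependencies]` table, though the manifest is not shown in this PR page.

```rust
/// Stand-in for the real application handle type (assumption).
pub struct AppHandle;

/// Append captured samples to the shared buffer. `app` is only needed on
/// macOS, where the (hypothetical) native waveform is updated.
pub fn append_audio_samples(app: &AppHandle, buffer: &mut Vec<f32>, samples: &[f32]) {
    buffer.extend_from_slice(samples);

    // macOS: forward the samples to the native waveform.
    #[cfg(target_os = "macos")]
    emit_native_waveform(app, samples);

    // Other targets: mark `app` as used so that builds with warnings denied
    // (for example clippy -D warnings) do not fail on an unused parameter.
    #[cfg(not(target_os = "macos"))]
    let _ = app;
}

/// Hypothetical macOS-only sink; a real one would update the overlay UI.
#[cfg(target_os = "macos")]
fn emit_native_waveform(_app: &AppHandle, _samples: &[f32]) {}
```

Keeping the parameter in the signature on every platform means callers do not need their own `cfg` branches.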
@leo-fengchao leo-fengchao force-pushed the codex/native-cpal-capture branch from 7cc192f to fa7d4eb Compare June 24, 2026 01:38
@leo-fengchao

Contributor Author

Superseded by #21, which was squash-merged into master as 4f73ee4. Since #21 was stacked on this branch, that squash already includes all of this PR's commits (native cpal capture, native cue, settle=0, the cpal macOS-gating and the Linux-CI fixes). Verified master now contains native_audio.rs, the macOS-gated cpal dependency, the native emit_cue, and the unused-arg fix — so there is nothing left to merge here. Closing as redundant.

@leo-fengchao leo-fengchao deleted the codex/native-cpal-capture branch June 24, 2026 10:59
2 participants