feat(overlay): transcription failure recovery with one-tap retry #21

Merged
that-yolanda merged 8 commits into that-yolanda:master from leo-fengchao:codex/transcription-retry on Jun 24, 2026
@leo-fengchao

Contributor

💡 Overview

Gives transcription failures a full recovery path so a single network blip / ASR timeout no longer loses an entire utterance. The audio is retained, the failure is surfaced on the overlay with a retry affordance, and retry replays the saved audio and streams the result back live.

⚠️ Built on top of #20 (native cpal capture). Until #20 merges, this PR's diff also shows #20's commits. Please review/merge #20 first.

🛠️ Key Changes

  • Recording retention: the same 16 kHz mono PCM sent to ASR is buffered and written to a WAV on stop. A new keep_recordings setting keeps successful recordings ~31 days; failed ones are kept for retry and reclaimed by the same 31-day sweep, or deleted once a retry succeeds. History gains play / retry entries.
  • One-tap retry: failures show a retryable hint + button on the overlay; retry replays the WAV, streams the transcript live, and restores focus to the user's previous app before pasting (clicking the button activates the overlay, so the foreground app is captured and reactivated).
  • Keyboard & ESC: while a retryable failure is shown, pressing the main hotkey triggers the retry; ESC terminates both the error state and the retrying state. The retry button shows the triggering hotkey, e.g. 重试 (R ⌥) ("Retry (R ⌥)"), matching the settings-page symbols.
  • No-speech quick stop: stopping without speaking ends immediately (ESC-like) instead of waiting on a commit timeout, skipping the leading start-cue window when judging silence.
  • ASR robustness: commit_and_await_final returns an error on timeout instead of silently falling back to partial text, so the failure/retry path can engage.
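The recording-retention step (16 kHz mono PCM written to a WAV on stop) can be sketched as follows. This is a minimal illustration using only the standard library, not the PR's actual code: the function name `write_wav_mono16` is hypothetical, and 16-bit samples are assumed since the PR does not state the sample width.

```rust
use std::io::{self, Write};

/// Serialize 16-bit mono PCM as a WAV stream with the canonical 44-byte header.
/// Hypothetical helper; the PR does not show its real writer.
fn write_wav_mono16<W: Write>(mut out: W, sample_rate: u32, samples: &[i16]) -> io::Result<()> {
    let data_len = (samples.len() * 2) as u32; // 2 bytes per sample
    let byte_rate = sample_rate * 2; // mono, 16-bit

    out.write_all(b"RIFF")?;
    out.write_all(&(36 + data_len).to_le_bytes())?; // RIFF chunk size
    out.write_all(b"WAVE")?;

    out.write_all(b"fmt ")?;
    out.write_all(&16u32.to_le_bytes())?; // fmt chunk size
    out.write_all(&1u16.to_le_bytes())?; // audio format: PCM
    out.write_all(&1u16.to_le_bytes())?; // channels: mono
    out.write_all(&sample_rate.to_le_bytes())?;
    out.write_all(&byte_rate.to_le_bytes())?;
    out.write_all(&2u16.to_le_bytes())?; // block align
    out.write_all(&16u16.to_le_bytes())?; // bits per sample

    out.write_all(b"data")?;
    out.write_all(&data_len.to_le_bytes())?;
    for s in samples {
        out.write_all(&s.to_le_bytes())?;
    }
    Ok(())
}
```

Calling `write_wav_mono16(&mut buf, 16_000, &samples)` with a `Vec<u8>` or a `std::fs::File` produces a file that a retry can later read back and replay to ASR.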

🧪 Testing

  • cargo test (246 unit tests) and npx vitest run (123 tests) pass; cargo clippy clean; biome clean.
  • New unit tests cover hotkey-label formatting, no-speech audio-signal detection, and history failure/replace records.
  • Manually verified on macOS: WAV save/retain/cleanup, overlay/keyboard/history retry, ESC termination, focus restore, and the retry button label.

📸 Screenshots

UI changes (overlay retry button + label, settings "保留录音" ("Keep recordings") toggle, history play/retry entries): screenshots to be added.

🔗 Related

Depends on #20.

leochenfc and others added 5 commits June 24, 2026 00:25
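To keep WebView/WebRTC audio processing from making recordings start quiet and grow louder, macOS now captures audio with Rust/cpal and reuses the existing ASR buffering pipeline.

Also saves the 16 kHz mono PCM that ASR actually receives as a WAV, to help further investigation of audio quality.
Previously a blocking recv on std::sync::mpsc was used in an async context to wait
for the capture thread to become ready, which tied up a tokio worker thread. Switched
to an async oneshot wait that returns the moment the capture stream is built and no longer blocks the executor.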

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
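Lock down the WebView-free main path, the late-result fix, recording assets, and the retry strategy first, so that later implementation proceeds through the agreed phases.
Debugging the native overlay often needs a way to run without triggering hot reload.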

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
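The original WebView getUserMedia path relied on a base64 event so the frontend AudioContext
could play the cue sound; with native capture it is played directly via paste::play_sound, which is more reliable and better timed.
Native capture has no browser AEC/AGC that needs to settle, so the macOS settle delay is set to 0,
making the path from key press to "start" more responsive.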

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@leo-fengchao leo-fengchao force-pushed the codex/transcription-retry branch from 2bc9cca to 377e280 Compare June 24, 2026 01:31
leochenfc and others added 3 commits June 24, 2026 09:38
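- native_audio is compiled only on macOS, and cpal is used only on macOS; keeping it in the
  general dependencies makes Linux CI pull in cpal→alsa-sys, which needs libasound2-dev
  (alsa.pc), so the build fails. Moving it to the macOS target dependencies avoids this.
- The app argument of append_audio_samples is used only for the macOS native waveform; on
  non-macOS it is discarded with let _ = app to silence the unused warning (clippy -D warnings).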

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
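The `let _ = app` pattern from the commit above can be sketched like this. It is an illustration only: `AppHandle` and `emit_waveform` are hypothetical stand-ins, and the real signature of `append_audio_samples` is not shown in the PR.

```rust
/// Hypothetical stand-in for the application handle.
struct AppHandle;

/// Only exists on macOS, mirroring a module that is compiled for one target.
#[cfg(target_os = "macos")]
fn emit_waveform(_app: &AppHandle, _samples: &[i16]) {
    // The native waveform update would go here.
}

fn append_audio_samples(app: &AppHandle, buffer: &mut Vec<i16>, samples: &[i16]) {
    buffer.extend_from_slice(samples);

    // macOS: the handle is genuinely used.
    #[cfg(target_os = "macos")]
    emit_waveform(app, samples);

    // Elsewhere: explicitly discard it so `-D warnings` does not fail the build.
    #[cfg(not(target_os = "macos"))]
    let _ = app;
}
```

Keeping the parameter on every platform keeps call sites identical, at the cost of one explicit discard on the platforms that do not use it.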
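Falling back to partial/final text on timeout treats an unfinished recognition as a successful result
and masks network problems. Return an error instead and relax the timeout to 15s, so the upper layer
can take the failure fallback (hint + retry); the outcome is more predictable.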

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
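The timeout behavior described in the commit above can be sketched as follows. This is a simplified illustration, not the real `commit_and_await_final`: it uses a blocking `std::sync::mpsc` channel to stay self-contained (the actual function presumably runs in an async context), and `CommitError` and `await_final` are hypothetical names.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical error type; the PR does not show the real one.
#[derive(Debug, PartialEq)]
enum CommitError {
    /// No final transcript arrived within the deadline.
    Timeout,
    /// The ASR side went away before sending a final transcript.
    Disconnected,
}

/// Wait for the final transcript. A timeout is reported as an error and never
/// replaced by whatever partial text happens to be available.
fn await_final(rx: &Receiver<String>, timeout: Duration) -> Result<String, CommitError> {
    match rx.recv_timeout(timeout) {
        Ok(text) => Ok(text),
        Err(RecvTimeoutError::Timeout) => Err(CommitError::Timeout),
        Err(RecvTimeoutError::Disconnected) => Err(CommitError::Disconnected),
    }
}
```

With the 15s value from the commit, a caller would use `await_final(&rx, Duration::from_secs(15))` and route any `Err` to the overlay's failure hint and retry button.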
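Provide a full recovery path for transcription failures, so a single network blip no longer loses an entire utterance.

Recording retention: the same 16 kHz mono PCM sent to ASR is buffered in full and written
to a WAV on stop. After success it is kept or deleted according to the keep_recordings setting; failed recordings are kept
for retry, and recordings that are never retried are reclaimed after 31 days by the retention sweep. Adds the "保留录音" ("Keep recordings")
setting and play/retry entries in the history.

One-tap retry: a failure shows a retryable hint and button on the overlay; retry replays the saved WAV and
transcribes it again, the result is streamed back, and focus is returned to the original window before pasting (clicking the button
activates the overlay, so the foreground app is recorded and restored).

Keyboard: while the failure hint is shown, pressing the main hotkey again triggers the retry; both the error state and the retrying state
can be terminated with ESC. The retry button shows the triggering hotkey (e.g. "重试 (R ⌥)", i.e. "Retry (R ⌥)"), with symbols matching
the settings page.

Other: stopping without having spoken ends immediately and does not enter retry (the energy check skips the
leading start-cue window); macOS native capture has no AEC/AGC, so the settle delay is set to 0.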

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
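The no-speech check mentioned in the commit above (an energy judgment that skips the leading start-cue window) might look roughly like this. It is a sketch under assumptions: the PR does not say which energy measure or threshold it uses, so RMS over 16-bit samples is assumed, and `is_silent`, `skip_ms`, and `rms_threshold` are hypothetical names.

```rust
/// Returns true when the recording carries no speech-level signal, ignoring
/// the first `skip_ms` milliseconds where the start cue may leak into the mic.
/// RMS is an assumed energy measure; the PR does not specify the real one.
fn is_silent(samples: &[i16], sample_rate: u32, skip_ms: u32, rms_threshold: f64) -> bool {
    // Number of leading samples covered by the start-cue window.
    let skip = (sample_rate as usize * skip_ms as usize) / 1000;

    // Nothing left after the cue window counts as "no speech".
    let body = match samples.get(skip..) {
        Some(b) if !b.is_empty() => b,
        _ => return true,
    };

    let sum_sq: f64 = body
        .iter()
        .map(|&s| {
            let v = s as f64;
            v * v
        })
        .sum();
    let rms = (sum_sq / body.len() as f64).sqrt();
    rms < rms_threshold
}
```

A recording that is loud only inside the cue window is still treated as silent, which is what lets a stop without speech end immediately instead of going through commit and retry.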
@leo-fengchao leo-fengchao force-pushed the codex/transcription-retry branch from 377e280 to 2154c6e Compare June 24, 2026 01:39
@that-yolanda that-yolanda merged commit 4f73ee4 into that-yolanda:master Jun 24, 2026
2 checks passed
@leo-fengchao leo-fengchao deleted the codex/transcription-retry branch June 24, 2026 10:59

3 participants