feat(asr): stabilize start cue, fix dropped leading words, add auto-reconnect #19

Merged: that-yolanda merged 1 commit into that-yolanda:master from leo-fengchao:feat/cue-and-reconnect on Jun 16, 2026
@leo-fengchao (Contributor)

What & why

Three reliability fixes for the recording flow. They share one finishing pipeline (finalize_and_paste) and the cue mechanism, so they are submitted together, though they can be reviewed as three independent concerns.

1. Start cue plays reliably

Cues now play through a dedicated, kept-warm renderer AudioContext with pre-decoded buffers, instead of spawning afplay on every cue. A freshly spawned afplay competes with an output device that is still settling, which attenuates the cue (low volume) or cuts it short. The backend resolves the configured sound file and emits its bytes as base64 (cue:play); afplay remains a fallback if the file can't be read.

2. Leading words no longer dropped

  • A 350 ms settle delay after the mic stream is ready (before entering Recording and playing the cue) gives the browser's AEC/AGC time to converge: getUserMedia resolving does not mean the DSP has converged. The value was tuned on-device.
  • WebSocket writes are serialized through a FIFO writer task, so the commit's last packet is always sent after every audio frame (an out-of-order last packet makes the server reject the tail).
  • The renderer awaits the final audio flush before signaling audio_stopped.
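
The FIFO writer idea can be sketched as follows. This is a minimal illustration, not the code from this PR: it uses a std thread and `std::sync::mpsc` channel in place of the async writer task, and `Outbound`, `spawn_writer`, and the `sink` callback (standing in for the WebSocket write half) are hypothetical names. The point it shows is that once every producer enqueues onto one channel and a single consumer does all the writing, the wire order equals the enqueue order, so a last packet enqueued after the final audio frame cannot overtake it.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical outbound message; the real frame types are not shown in this PR.
#[derive(Debug, Clone, PartialEq)]
enum Outbound {
    /// One chunk of captured audio.
    Audio(Vec<u8>),
    /// The commit's final ("last") packet.
    Last,
}

/// Spawns the single writer. `sink` is an assumption standing in for the
/// WebSocket write half; it is only ever called from the writer thread,
/// so writes can never interleave or reorder.
fn spawn_writer<F>(mut sink: F) -> (mpsc::Sender<Outbound>, thread::JoinHandle<()>)
where
    F: FnMut(&Outbound) + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<Outbound>();
    let handle = thread::spawn(move || {
        // Messages are drained strictly in the order they were enqueued.
        for msg in rx {
            sink(&msg);
            if msg == Outbound::Last {
                // Nothing may follow the last packet; stop writing.
                break;
            }
        }
    });
    (tx, handle)
}
```

Producers clone the returned `Sender` and call `send`; the audio path enqueues `Outbound::Audio` frames and the commit path enqueues `Outbound::Last` only after the final flush has been enqueued. Sends issued after the writer has stopped return an error rather than reaching the socket.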

3. Auto-reconnect + text salvage

  • The ASR session connects in the background, so the user can speak immediately; audio captured before the session is ready is buffered and flushed on attach.
  • On a recoverable error/close mid-recording, a fresh session is reconnected (up to 3 attempts), carrying already-recognized text across so nothing is lost; the user gets audible/visual cues for the interruption and resume.
  • On a fatal error or exhausted retries, the recording is finalized with whatever was recognized instead of being discarded.
  • A session_epoch guards background connect/reconnect against cancel/restart races.
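
The reconnect, salvage, and epoch rules above can be sketched together as one loop. This is an illustrative model under stated assumptions, not the PR's implementation: `SessionEnd`, `Outcome`, `record_with_reconnect`, and the `run_session` callback are hypothetical names, "up to 3 attempts" is read as three reconnects after the initial session, and the epoch is modeled as an `AtomicU64` that a cancel or restart increments.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumed reading of "up to 3 attempts": three reconnects after the first session.
const MAX_RECONNECT_ATTEMPTS: u32 = 3;

/// How one ASR session ended (hypothetical classification result).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SessionEnd {
    Completed,
    Recoverable,
    Fatal,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Hand this (possibly partial) text to the finalize path.
    Finalize(String),
    /// A cancel/restart bumped the epoch; this recording must do nothing.
    Stale,
}

/// Runs sessions until one completes, fails fatally, or retries run out.
/// `run_session` receives the text recognized so far and returns the text
/// recognized by this session plus how the session ended.
fn record_with_reconnect<F>(epoch: &AtomicU64, mut run_session: F) -> Outcome
where
    F: FnMut(&str) -> (String, SessionEnd),
{
    let my_epoch = epoch.load(Ordering::SeqCst);
    let mut salvaged = String::new();
    let mut reconnects = 0;
    loop {
        let (text, end) = run_session(&salvaged);
        // Epoch guard: if the user cancelled or restarted while this session
        // was running, drop the result instead of racing the newer recording.
        if epoch.load(Ordering::SeqCst) != my_epoch {
            return Outcome::Stale;
        }
        // Carry already-recognized text across sessions so nothing is lost.
        salvaged.push_str(&text);
        match end {
            // Fatal errors still finalize with whatever was recognized.
            SessionEnd::Completed | SessionEnd::Fatal => return Outcome::Finalize(salvaged),
            SessionEnd::Recoverable if reconnects < MAX_RECONNECT_ATTEMPTS => reconnects += 1,
            // Retries exhausted: salvage instead of discarding.
            SessionEnd::Recoverable => return Outcome::Finalize(salvaged),
        }
    }
}
```

In this model, every exit path except `Stale` returns the accumulated text, which mirrors the description's rule that a fatal error or exhausted retries finalizes the recording rather than dropping it.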

Notes

  • ASR error classification (fatal vs transient) lives in doubao.rs; the local sherpa-onnx engines are unaffected (they don't emit reconnectable errors).
  • keep_clipboard restore and the sherpa-onnx hotword→LLM-prompt hint are preserved in the shared finalize path.

Testing

  • cargo clippy -- -D warnings, cargo fmt --check, cargo test (163 passing)
  • vitest (122 passing), pnpm build:web, tsc --noEmit
  • Manual: start cue stable, no dropped leading words, network-drop reconnect + salvage verified on macOS.

🤖 Generated with Claude Code

…econnect

Three reliability fixes for the recording flow, sharing one finishing pipeline:

- Start cue stability: play cues through a dedicated, kept-warm renderer
  AudioContext with pre-decoded buffers instead of spawning afplay each time,
  so the cue is full-volume and never truncated. The backend resolves the sound
  file and emits it as base64 (cue:play); afplay stays as a fallback. Why: a
  freshly spawned afplay competes with an output device that is still settling,
  attenuating or clipping the cue.

- Dropped leading words: add a 350ms settle delay after the mic stream is ready
  (lets the browser AEC/AGC converge) before entering Recording and playing the
  cue; serialize WebSocket writes through a FIFO task so the last packet is
  always sent after every audio frame; await the final audio flush before
  signaling stop. Why: getUserMedia resolving does not mean the DSP has
  converged, and an out-of-order last packet makes the server reject the tail.

- Auto-reconnect + text salvage: connect the ASR session in the background so
  the user can speak immediately, buffering audio until the session attaches;
  on a recoverable error/close, reconnect a fresh session carrying the
  already-recognized text; on a fatal error or exhausted retries, finalize with
  whatever was recognized instead of discarding it. session_epoch guards against
  cancel/restart races. Why: transient network drops should not lose a recording.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@leo-fengchao force-pushed the feat/cue-and-reconnect branch from 6412cca to 2e67186 on June 16, 2026 15:23
leo-fengchao commented Jun 16, 2026 (Contributor, Author)

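Heads-up: this PR has been rebased onto the latest master and the conflicts are resolved. The code itself passes cargo check, clippy -D warnings, cargo fmt --check, vitest (except for the one item below), and tsc.

If one vitest failure shows up in CI, it is unrelated to this PR's changes; it is a pre-existing problem on master:

  • web/src/bridge/settings.ts: getHistory(daysBack = 1) (the default is 1)
  • web/tests/bridge/settings.test.ts still asserts a default of 3:
    expect(invoke).toHaveBeenCalledWith("get_history", { daysBack: 3 });

This PR touches neither file, and the test also fails on a clean master (presumably a stats-related change adjusted the bridge default without updating the test). I did not change it on my own: whether the default should be 1 or 3 depends on design intent, and changing either the test or the bridge could overwrite the real expectation. I am leaving that decision to you.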

@that-yolanda merged commit 2e67186 into that-yolanda:master on Jun 16, 2026
2 checks passed