feat(asr): stabilize start cue, fix dropped leading words, add auto-reconnect #19
Merged
that-yolanda merged 1 commit intoJun 16, 2026
…econnect

Three reliability fixes for the recording flow, sharing one finishing pipeline:

- Start cue stability: play cues through a dedicated, kept-warm renderer AudioContext with pre-decoded buffers instead of spawning afplay each time, so the cue is full-volume and never truncated. The backend resolves the sound file and emits it as base64 (cue:play); afplay stays as a fallback.
  Why: a freshly spawned afplay competes with an output device that is still settling, attenuating or clipping the cue.
- Dropped leading words: add a 350ms settle delay after the mic stream is ready (lets the browser AEC/AGC converge) before entering Recording and playing the cue; serialize WebSocket writes through a FIFO task so the last packet is always sent after every audio frame; await the final audio flush before signaling stop.
  Why: getUserMedia resolving does not mean the DSP has converged, and an out-of-order last packet makes the server reject the tail.
- Auto-reconnect + text salvage: connect the ASR session in the background so the user can speak immediately, buffering audio until the session attaches; on a recoverable error/close, reconnect a fresh session carrying the already-recognized text; on a fatal error or exhausted retries, finalize with whatever was recognized instead of discarding it. session_epoch guards against cancel/restart races.
  Why: transient network drops should not lose a recording.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
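The write-ordering fix in the second bullet can be illustrated with a minimal, std-only sketch. It is not the PR's code: `Outgoing`, `spawn_writer`, and `send_recording` are hypothetical names, a thread with an `mpsc` channel stands in for the FIFO task, and a `Vec<String>` stands in for the WebSocket. The point it shows is that when a single consumer drains one queue, a last packet enqueued after the audio frames cannot overtake them.

```rust
use std::sync::mpsc;
use std::thread;

/// Messages queued for the socket (hypothetical names, not the PR's types).
enum Outgoing {
    Audio(Vec<u8>),
    Last,
}

/// Spawns the single writer that owns the "socket".
/// Returns the queue handle and a join handle yielding what was written, in order.
fn spawn_writer() -> (mpsc::Sender<Outgoing>, thread::JoinHandle<Vec<String>>) {
    let (tx, rx) = mpsc::channel::<Outgoing>();
    let handle = thread::spawn(move || {
        // Stand-in for the real WebSocket: record each write in order.
        let mut wire = Vec::new();
        // The channel is FIFO, so messages are written in enqueue order.
        for msg in rx {
            match msg {
                Outgoing::Audio(frame) => wire.push(format!("audio:{}", frame.len())),
                Outgoing::Last => {
                    wire.push("last".to_string());
                    break; // nothing may follow the last packet
                }
            }
        }
        wire
    });
    (tx, handle)
}

/// Queues every audio frame, then the last packet, and waits for the writer to drain.
fn send_recording(frames: Vec<Vec<u8>>) -> Vec<String> {
    let (tx, writer) = spawn_writer();
    for frame in frames {
        tx.send(Outgoing::Audio(frame)).expect("writer should be alive");
    }
    tx.send(Outgoing::Last).expect("writer should be alive");
    writer.join().expect("writer thread panicked")
}
```

Calling `send_recording` with three frames yields three audio writes followed by the last packet. In an async codebase the same shape would presumably use an async channel and a spawned task rather than a thread; that choice is an assumption here.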
6412cca to 2e67186
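heads-up: this PR has been rebased onto the latest … If one vitest failure shows up in CI, it is unrelated to this PR's changes; it is …
Neither of these two files is touched by this PR; on a clean …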
What & why
Three reliability fixes for the recording flow. They share one finishing pipeline (finalize_and_paste) and the cue mechanism, so they are sent together; they can be reviewed as three independent concerns.

1. Start cue plays reliably
Cues now play through a dedicated, kept-warm renderer AudioContext with pre-decoded buffers, instead of spawning afplay on every cue. A freshly spawned afplay competes with an output device that is still settling, which attenuated the cue (low volume) or cut it short. The backend resolves the configured sound file and emits its bytes as base64 (cue:play); afplay remains a fallback if the file can't be read.

2. Leading words no longer dropped
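- A 350ms settle delay after the mic stream is ready lets the browser AEC/AGC converge before entering Recording and playing the cue; getUserMedia resolving does not mean the DSP has converged. Value tuned on-device.
- WebSocket writes are serialized through a FIFO task, so the last packet is always sent after every audio frame; an out-of-order last packet makes the server reject the tail.
- The final audio flush is awaited before signaling stop (audio_stopped).

3. Auto-reconnect + text salvage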
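- The ASR session connects in the background so the user can speak immediately; audio is buffered until the session attaches.
- On a recoverable error/close, a fresh session reconnects carrying the already-recognized text.
- On a fatal error or exhausted retries, the recording is finalized with whatever was recognized instead of being discarded.
- session_epoch guards background connect/reconnect against cancel/restart races.

Notes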
- Error classification (fatal vs transient) lives in doubao.rs; the local sherpa-onnx engines are unaffected (they don't emit reconnectable errors).
- keep_clipboard restore and the sherpa-onnx hotword→LLM-prompt hint are preserved in the shared finalize path.

Testing
- cargo clippy -- -D warnings, cargo fmt --check, cargo test (163 passing)
- vitest (122 passing), pnpm build:web, tsc --noEmit

🤖 Generated with Claude Code
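For reviewers, the reconnect, salvage, and epoch-guard behaviour described under auto-reconnect can be sketched in a few lines of std-only Rust. This is an illustration under assumptions, not the code in this PR: `Recorder`, `SessionError`, `Attempt`, `run`, and `max_retries` are hypothetical names, the `connect` closure stands in for opening a fresh ASR session, and returning `None` for a cancelled run is one possible reading of how session_epoch is used.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical classification; the PR states the real one lives in doubao.rs.
enum SessionError {
    Transient,
    Fatal,
}

/// One session attempt: the final text, or an error plus the text recognized so far.
type Attempt = Result<String, (SessionError, String)>;

struct Recorder {
    session_epoch: AtomicU64,
}

impl Recorder {
    fn new() -> Self {
        Recorder { session_epoch: AtomicU64::new(0) }
    }

    /// Cancel/restart bumps the epoch, which invalidates in-flight background work.
    fn cancel(&self) {
        self.session_epoch.fetch_add(1, Ordering::SeqCst);
    }

    /// Runs sessions until one succeeds, an error is fatal, or retries run out.
    /// `connect` receives the already-recognized text to carry into the fresh session.
    /// Returns None if the run was cancelled while a session was in flight.
    fn run<F>(&self, max_retries: u32, mut connect: F) -> Option<String>
    where
        F: FnMut(&str) -> Attempt,
    {
        let epoch = self.session_epoch.load(Ordering::SeqCst);
        let mut recognized = String::new();
        let mut retries = 0;
        loop {
            let attempt = connect(&recognized);
            // Epoch guard: a stale result must not be finalized after cancel/restart.
            if self.session_epoch.load(Ordering::SeqCst) != epoch {
                return None;
            }
            match attempt {
                Ok(text) => return Some(text),
                Err((kind, partial)) => {
                    recognized = partial; // salvage what was recognized so far
                    let fatal = matches!(kind, SessionError::Fatal);
                    if fatal || retries >= max_retries {
                        // Finalize with the salvaged text instead of discarding it.
                        return Some(recognized);
                    }
                    retries += 1;
                }
            }
        }
    }
}
```

A transient failure that recognized "hello", followed by a successful session that appends " world" to the carried text, finalizes as "hello world". A fatal error that recognized "partial" finalizes as "partial" without retrying.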