Pr/sherpa streaming backend #536

Merged: H-Chris233 merged 2 commits into Open-Less:beta from weikeyi:pr/sherpa-streaming-backend on May 27, 2026

Conversation

@weikeyi
Contributor

@weikeyi weikeyi commented May 27, 2026

User description

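Summary

  • Advance the Windows sherpa-onnx-local backend from a skeleton to a working runtime/provider implementation that supports two model modes: offline batch and experimental online streaming.
  • Add the Zipformer Chinese-English bilingual streaming model to the catalog, together with its required files, download mapping, and mode detection.
  • The backend runtime caches OfflineRecognizer and OnlineRecognizer separately and records diagnostics for the most recent prepare/transcribe/audio/error.
  • The provider gains an online worker: it consumes PCM chunks while recording, emits partial tokens, and returns the final RawTranscript once recording stops.
  • Coordinator/commands are wired to sherpa streaming: online partials are emitted through the local-asr-token event, and the final transcript keeps reusing the existing polish / insert / history completion path.
  • Add unit test coverage for sherpa model aliases, language hints, download verification, release archives, and runtime status.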
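
The online worker described in the summary can be pictured as a thread that drains PCM chunks from a channel, forwards changed partial text, and returns the final text once it is told to finish. The following is a minimal std-only sketch under stated assumptions: `StreamingRecognizer`, `WorkerMsg`, `run_online_worker`, and `CountingRecognizer` are hypothetical names for illustration, not the PR's types and not the sherpa-onnx API.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Hypothetical stand-in for a streaming recognizer; not the sherpa-onnx API.
trait StreamingRecognizer {
    /// Feed one chunk of PCM samples and return the current partial text.
    fn accept(&mut self, pcm: &[f32]) -> String;
    /// Flush any pending state and return the final text.
    fn finalize(&mut self) -> String;
}

/// Messages the recorder side sends to the worker (hypothetical names).
enum WorkerMsg {
    Chunk(Vec<f32>),
    Finish,
}

/// Drain chunks until Finish arrives (or the sender is dropped),
/// forwarding a partial only when its text changed.
fn run_online_worker<R: StreamingRecognizer>(
    mut recognizer: R,
    rx: Receiver<WorkerMsg>,
    partial_tx: Sender<String>,
) -> String {
    let mut last_partial = String::new();
    while let Ok(msg) = rx.recv() {
        match msg {
            WorkerMsg::Chunk(pcm) => {
                let partial = recognizer.accept(&pcm);
                if partial != last_partial {
                    let _ = partial_tx.send(partial.clone());
                    last_partial = partial;
                }
            }
            WorkerMsg::Finish => break,
        }
    }
    recognizer.finalize()
}

/// Toy recognizer that only counts samples; for demonstration purposes.
struct CountingRecognizer {
    samples: usize,
}

impl StreamingRecognizer for CountingRecognizer {
    fn accept(&mut self, pcm: &[f32]) -> String {
        self.samples += pcm.len();
        format!("partial:{}", self.samples)
    }
    fn finalize(&mut self) -> String {
        format!("final:{}", self.samples)
    }
}

/// Run a short session: two chunks, then Finish. Returns (partials, final).
fn demo_session() -> (Vec<String>, String) {
    let (tx, rx) = mpsc::channel();
    let (partial_tx, partial_rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        run_online_worker(CountingRecognizer { samples: 0 }, rx, partial_tx)
    });
    tx.send(WorkerMsg::Chunk(vec![0.0; 160])).unwrap();
    tx.send(WorkerMsg::Chunk(vec![0.0; 160])).unwrap();
    tx.send(WorkerMsg::Finish).unwrap();
    let final_text = worker.join().unwrap();
    // The worker dropped its sender on exit, so this iterator terminates.
    let partials: Vec<String> = partial_rx.iter().collect();
    (partials, final_text)
}
```

In the PR the partials are emitted as local-asr-token events; the plain `Sender<String>` here only stands in for that event path.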

Verification

  • cargo check --manifest-path src-tauri/Cargo.toml
  • cargo test --manifest-path src-tauri/Cargo.toml --lib sherpa

Result: sherpa-related tests, 42 passed; 0 failed

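Notes

  • This PR contains only the backend runtime/provider/Coordinator/commands wiring. It does not include the frontend UI, i18n, the spike example, or planning documents.
  • The online streaming code path is wired in, but partial/final output, RTF, CPU usage, and long-recording stability still need to be verified on a real Windows machine after downloading the model.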
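
For the RTF check mentioned above, the real-time factor is conventionally the processing time divided by the duration of the audio processed. A small sketch of that calculation follows; the function name and parameters are hypothetical and are not taken from the PR.

```rust
use std::time::Duration;

/// Real-time factor: processing time divided by audio duration.
/// Values below 1.0 mean recognition runs faster than real time.
/// Returns None when there is no audio to measure against.
fn real_time_factor(
    processing: Duration,
    sample_count: usize,
    sample_rate_hz: u32,
) -> Option<f64> {
    if sample_count == 0 || sample_rate_hz == 0 {
        return None;
    }
    let audio_secs = sample_count as f64 / sample_rate_hz as f64;
    Some(processing.as_secs_f64() / audio_secs)
}
```

For example, 32000 samples at 16 kHz are 2 seconds of audio, so 500 ms of processing gives an RTF of 0.25.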

PR Type

Enhancement, Tests


Description

  • Add Zipformer streaming model catalog

  • Implement offline and online recognizers

  • Emit partial tokens during dictation

  • Harden downloads, archives, and status tests


Diagram Walkthrough

flowchart LR
  A["Sherpa catalog and metadata"] --> B["Sherpa runtime"]
  C["Download and archive checks"] --> B
  B --> D["Sherpa provider"]
  D --> E["Coordinator dictation"]
  D -- "partial tokens" --> F["local-asr-token events"]
  E -- "final transcript" --> G["History / polish flow"]

File Walkthrough

Relevant files

Documentation (2 files)
  • mod.rs: Document sherpa offline and streaming modes (+1/-1)
  • types.rs: Refresh sherpa preference documentation (+1/-2)

Enhancement (5 files)
  • sherpa.rs: Add streaming model metadata and helpers (+88/-9)
  • sherpa_provider.rs: Split provider into offline and online workers (+273/-20)
  • sherpa_runtime.rs: Implement offline and streaming recognizer lifecycle (+590/-51)
  • coordinator.rs: Update coordinator for sherpa runtime lifecycle (+7/-9)
  • dictation.rs: Wire streaming partial tokens into dictation (+35/-8)

Tests (2 files)
  • sherpa_download.rs: Verify archives and download integrity (+186/-0)
  • commands.rs: Add sherpa alias and hint validation tests (+53/-10)

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Timeout Leak

If the online recognizer exceeds audio_timeout, finish() returns an error after sending Cancel but drops the worker JoinHandle without joining it. A slow or stuck streaming session can therefore leave the background thread running until process exit, leaking resources across repeated timeouts.

async fn finish(self, audio_timeout: Duration) -> Result<String> {
    let _ = self.tx.send(OnlineWorkerMessage::Finish);
    let result_rx = self
        .result_rx
        .lock()
        .take()
        .ok_or_else(|| anyhow::anyhow!("sherpa-onnx streaming result already taken"))?;
    let join_handle = self.join_handle.lock().take();
    let result = tokio::time::timeout(audio_timeout, async move {
        tokio::task::spawn_blocking(move || {
            result_rx.recv().map_err(|error| {
                anyhow::anyhow!("sherpa-onnx streaming worker closed: {error}")
            })?
        })
        .await
        .map_err(|error| anyhow::anyhow!("sherpa-onnx streaming join failed: {error:#}"))?
    })
    .await;
    let result = match result {
        Ok(result) => result,
        Err(_) => {
            self.cancelled.store(true, Ordering::SeqCst);
            let _ = self.tx.send(OnlineWorkerMessage::Cancel);
            anyhow::bail!("sherpa-onnx streaming transcribe timeout");
        }
    };
    if let Some(join_handle) = join_handle {
        let _ = join_handle.join();
    }
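
One way to address the concern above is to join the worker on every exit path, including the timeout path. The sketch below is a simplified std-only illustration, not the PR's tokio-based code: `WorkerHandle`, `finish`, and `spawn_worker` are hypothetical, and it assumes the worker polls the cancel flag so that joining after a timeout returns promptly. A worker stuck inside a native call would still block the join, so a real fix may need a different strategy for that case.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle to a streaming worker; names are illustrative only.
struct WorkerHandle {
    cancelled: Arc<AtomicBool>,
    result_rx: mpsc::Receiver<String>,
    join_handle: JoinHandle<()>,
}

/// Wait for the result; on timeout, signal cancellation and still join.
fn finish(handle: WorkerHandle, timeout: Duration) -> Result<String, String> {
    let outcome = match handle.result_rx.recv_timeout(timeout) {
        Ok(text) => Ok(text),
        Err(RecvTimeoutError::Timeout) => {
            handle.cancelled.store(true, Ordering::SeqCst);
            Err("streaming transcribe timeout".to_string())
        }
        Err(RecvTimeoutError::Disconnected) => {
            Err("streaming worker closed".to_string())
        }
    };
    // Join on every path so the thread is reaped rather than leaked.
    // Assumption: the worker observes `cancelled` and exits promptly.
    let _ = handle.join_handle.join();
    outcome
}

/// Spawn a demo worker that needs `work` time to produce a result and
/// checks the cancel flag every few milliseconds.
fn spawn_worker(work: Duration) -> WorkerHandle {
    let cancelled = Arc::new(AtomicBool::new(false));
    let (result_tx, result_rx) = mpsc::channel();
    let flag = Arc::clone(&cancelled);
    let join_handle = thread::spawn(move || {
        let step = Duration::from_millis(5);
        let mut elapsed = Duration::ZERO;
        while elapsed < work {
            if flag.load(Ordering::SeqCst) {
                return; // cancelled: exit without sending a result
            }
            thread::sleep(step);
            elapsed += step;
        }
        let _ = result_tx.send("done".to_string());
    });
    WorkerHandle { cancelled, result_rx, join_handle }
}
```

With this shape, a timed-out session returns an error only after the worker thread has actually exited, so repeated timeouts do not accumulate background threads.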

Unbounded Queue

The online path uses an unbounded mpsc::channel for PCM chunks. If the recognizer falls behind the recorder on a slow CPU or during a long session, audio buffers can accumulate without limit and grow memory usage linearly with recording length.

let (tx, rx) = mpsc::channel::<OnlineWorkerMessage>();
let (result_tx, result_rx) = mpsc::channel::<Result<String>>();
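
A bounded alternative to the unbounded channel is `std::sync::mpsc::sync_channel`, which caps how many chunks can be queued. The sketch below shows one possible policy, dropping a chunk when the queue is full; `PcmChunk`, `Enqueue`, `bounded_pcm_queue`, and `enqueue_chunk` are hypothetical names, and the drop policy is an assumption rather than what the PR does.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical chunk type: one buffer of PCM samples.
type PcmChunk = Vec<f32>;

/// Outcome of trying to enqueue a chunk without blocking the recorder.
#[derive(Debug, PartialEq)]
enum Enqueue {
    Queued,
    DroppedFull,
    WorkerGone,
}

/// Create a queue that holds at most `capacity` chunks.
fn bounded_pcm_queue(capacity: usize) -> (SyncSender<PcmChunk>, Receiver<PcmChunk>) {
    sync_channel(capacity)
}

/// Try to enqueue a chunk; when the queue is full, drop the chunk
/// instead of letting memory grow with recording length.
fn enqueue_chunk(tx: &SyncSender<PcmChunk>, chunk: PcmChunk) -> Enqueue {
    match tx.try_send(chunk) {
        Ok(()) => Enqueue::Queued,
        Err(TrySendError::Full(_)) => Enqueue::DroppedFull,
        Err(TrySendError::Disconnected(_)) => Enqueue::WorkerGone,
    }
}
```

Dropping audio costs recognition accuracy, so a blocking `send` (back-pressure on the recorder) or a larger capacity may be preferable depending on how the recorder thread tolerates stalls.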

@H-Chris233
Collaborator

If real-device testing turns up no problems, ping me and I'll merge it.

@weikeyi
Contributor Author

weikeyi commented May 27, 2026

@H-Chris233 Tested on a real device; streaming output works without problems.

@H-Chris233 H-Chris233 merged commit 38e9fb2 into Open-Less:beta May 27, 2026
4 checks passed
2 participants