`i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/asr.md`

If you want to try other speech recognition models:

3. Modify the relevant configuration of `sherpa_onnx_asr` according to the instructions in `conf.yaml`
### Using Fire Red ASR Model

[Fire Red ASR](https://github.com/FireRedTeam/FireRedASR) is a high-quality Chinese-English speech recognition model that is also supported in sherpa-onnx. Compared to the default SenseVoiceSmall model, Fire Red ASR performs better in Chinese-English mixed scenarios.

#### Recommended Users

- Users who need high-quality Chinese-English mixed recognition
- Users with high requirements for recognition accuracy
- Configuration difficulty: Simple

#### Download Model

First, make sure `huggingface_hub` is installed so that you can download models from the command line:

```sh
uv add huggingface_hub
```
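
If you prefer scripting to the CLI, the same `huggingface_hub` package exposes `snapshot_download`, which fetches a whole model repo into a local directory. A minimal sketch; the helper names are illustrative, and the import is deferred so only `download_model` itself needs the package installed:

```python
def model_local_dir(repo_id, models_root="models"):
    """Map a Hugging Face repo id to the local directory layout used in this guide."""
    return f"{models_root}/{repo_id.split('/')[-1]}"

def download_model(repo_id, models_root="models"):
    """Download a model repo into models/<repo-name> and return the local path."""
    from huggingface_hub import snapshot_download  # deferred: requires `uv add huggingface_hub`
    local_dir = model_local_dir(repo_id, models_root)
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return local_dir
```

For example, `download_model("csukuangfj/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16")` mirrors the CLI download shown below.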

Then use the Hugging Face CLI to download the model:

```sh
uv run hf download csukuangfj/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16 --local-dir models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16
```

:::note
If you're using CUDA inference, it's recommended to download the fp16 version of the model for better results. Replace `encoder.int8.onnx` and `decoder.int8.onnx` with their fp16 counterparts in the configuration above.
:::
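
After downloading, it is worth verifying that the files the configuration references actually exist on disk, since a startup failure from a missing file can be cryptic. A minimal sketch; the helper name is illustrative, and the default file list assumes the int8 encoder/decoder names mentioned above plus the `tokens.txt` that sherpa-onnx model repos typically include:

```python
from pathlib import Path

def missing_model_files(model_dir, files=("encoder.int8.onnx", "decoder.int8.onnx", "tokens.txt")):
    """Return the expected model files that are not present under model_dir."""
    root = Path(model_dir)
    return [name for name in files if not (root / name).is_file()]
```

An empty return value means the download is complete; anything else lists what still needs to be fetched.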
## `fun_asr` (Local)
[FunASR](https://github.com/modelscope/FunASR?tab=readme-ov-file) is a fundamental end-to-end speech recognition toolkit from ModelScope that supports various ASR models. Among them, Alibaba's [FunAudioLLM](https://github.com/FunAudioLLM/SenseVoice) SenseVoiceSmall model performs well in both performance and speed.
`i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/tts.md`

sherpa-onnx is a powerful inference engine that supports multiple TTS models.
:::note
For GPU inference (CUDA only), please refer to [CUDA Inference](/docs/user-guide/backend/asr#cuda-inference).
:::
## Piper TTS (Local, Lightweight and Fast)

Piper is a fast, local neural text-to-speech system that supports multiple languages and voices. It uses pre-trained ONNX models and can achieve real-time speech synthesis on CPU.

### Installation Steps

1. Install piper-tts:

   ```sh
   uv pip install piper-tts
   ```

2. Download model files:

   - Piper requires trained ONNX model files for speech generation
   - **Recommended models**:
     - `zh_CN-huayan-medium` - Chinese (Mandarin)
     - `en_US-lessac-medium` - English
     - `ja_JP-natsuya-medium` - Japanese
   - **Download methods**:
     - Method: Manual download
       - Chinese model: [https://huggingface.co/csukuangfj/vits-piper-zh_CN-huayan-medium/tree/main](https://huggingface.co/csukuangfj/vits-piper-zh_CN-huayan-medium/tree/main)
       - Other models: Search "piper" on [Hugging Face](https://huggingface.co/models) or train your own
   - **File placement**:
     - Download both `.onnx` and `.onnx.json` files to the `models/piper/` directory

3. Configure in `conf.yaml`:

   ```yaml
   piper_tts:
     model_path: "models/piper/zh_CN-huayan-medium.onnx"  # ONNX model file path
     speaker_id: 0  # Speaker ID for multi-speaker models (use 0 for single-speaker models)
     normalize_audio: true  # Whether to normalize audio
     use_cuda: false  # Whether to use GPU acceleration (requires CUDA support)
   ```

4. Set `tts_model: piper_tts` in `conf.yaml`
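
Because Piper expects the `.onnx` and `.onnx.json` files to sit side by side, a quick pre-flight check before enabling the engine can save debugging time. A minimal sketch; the function name is illustrative:

```python
from pathlib import Path

def piper_model_ready(onnx_path):
    """True when both the .onnx model and its .onnx.json config exist side by side."""
    model = Path(onnx_path)
    config = Path(str(model) + ".json")
    return model.is_file() and config.is_file()
```

`piper_model_ready("models/piper/zh_CN-huayan-medium.onnx")` should return `True` before you switch `tts_model` to `piper_tts`.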
### Features

- ✅ Completely local, no internet connection required
- ✅ Real-time CPU inference, fast speed
- ✅ Supports multiple languages and voices
- ✅ Supports GPU acceleration (optional)
- ✅ Small model files, easy to deploy

:::tip
For more model options, visit the [Piper Voice Samples page](https://rhasspy.github.io/piper-samples/) to listen and download models for different languages and voices.
:::
## pyttsx3 (Lightweight and Fast)
A simple and easy-to-use local TTS engine that uses the system's default speech synthesizer. We use `py3-tts` instead of the more famous `pyttsx3` because `pyttsx3` seems unmaintained and failed to run on the test computer.