`i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/asr.md`

If you want to try other speech recognition models:

3. Modify the relevant configuration of `sherpa_onnx_asr` according to the instructions in `conf.yaml`
### Using Fire Red ASR Model

[Fire Red ASR](https://github.com/FireRedTeam/FireRedASR) is a high-quality Chinese-English speech recognition model that is also supported in sherpa-onnx. Compared to the default SenseVoiceSmall model, Fire Red ASR performs better in Chinese-English mixed scenarios.

#### Recommended Users

- Users who need high-quality Chinese-English mixed recognition
- Users with high requirements for recognition accuracy
- Configuration difficulty: Simple

#### Download Model

First, make sure `huggingface_hub` is installed so that you can download models from the command line:

```sh
uv add huggingface_hub
```
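
If you prefer scripting to the CLI, the same `huggingface_hub` package exposes `snapshot_download`, which fetches a whole model repo into a local directory. A minimal sketch; the helper names are illustrative, and the import is deferred so only `download_model` itself needs the package installed:

```python
def model_local_dir(repo_id, models_root="models"):
    """Map a Hugging Face repo id to the local directory layout used in this guide."""
    return f"{models_root}/{repo_id.split('/')[-1]}"

def download_model(repo_id, models_root="models"):
    """Download a model repo into models/<repo-name> and return the local path."""
    from huggingface_hub import snapshot_download  # deferred: requires `uv add huggingface_hub`
    local_dir = model_local_dir(repo_id, models_root)
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return local_dir
```

For example, `download_model("csukuangfj/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16")` mirrors the CLI download shown below.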

Then use the Hugging Face CLI to download the model:

```sh
uv run hf download csukuangfj/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16 --local-dir models/sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16
```

:::note
If you're using CUDA inference, it's recommended to download the fp16 version of the model for better results. Replace `encoder.int8.onnx` and `decoder.int8.onnx` with their fp16 counterparts in the configuration above.
:::
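
After downloading, it is worth verifying that the files the configuration references actually exist on disk, since a startup failure from a missing file can be cryptic. A minimal sketch; the helper name is illustrative, and the default file list assumes the int8 encoder/decoder names mentioned above plus the `tokens.txt` that sherpa-onnx model repos typically include:

```python
from pathlib import Path

def missing_model_files(model_dir, files=("encoder.int8.onnx", "decoder.int8.onnx", "tokens.txt")):
    """Return the expected model files that are not present under model_dir."""
    root = Path(model_dir)
    return [name for name in files if not (root / name).is_file()]
```

An empty return value means the download is complete; anything else lists what still needs to be fetched.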
## `fun_asr` (Local)
[FunASR](https://github.com/modelscope/FunASR?tab=readme-ov-file) is a fundamental end-to-end speech recognition toolkit from ModelScope that supports various ASR models. Among them, Alibaba's [FunAudioLLM](https://github.com/FunAudioLLM/SenseVoice) SenseVoiceSmall model performs well in both performance and speed.
`i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/tts.md`

sherpa-onnx is a powerful inference engine that supports multiple TTS models.
:::note
For GPU inference (CUDA only), please refer to [CUDA Inference](/docs/user-guide/backend/asr#cuda-inference).
:::
## Piper TTS (Local, Lightweight and Fast)

Piper is a fast, local neural text-to-speech system that supports multiple languages and voices. It uses pre-trained ONNX models and can achieve real-time speech synthesis on CPU.

### Installation Steps

1. Install piper-tts:

   ```sh
   uv pip install piper-tts
   ```

2. Download model files:

   - Piper requires trained ONNX model files for speech generation
   - **Recommended models**:
     - `zh_CN-huayan-medium` - Chinese (Mandarin)
     - `en_US-lessac-medium` - English
     - `ja_JP-natsuya-medium` - Japanese
   - **Download methods**:
     - Method: Manual download
       - Chinese model: [https://huggingface.co/csukuangfj/vits-piper-zh_CN-huayan-medium/tree/main](https://huggingface.co/csukuangfj/vits-piper-zh_CN-huayan-medium/tree/main)
       - Other models: Search "piper" on [Hugging Face](https://huggingface.co/models) or train your own
   - **File placement**:
     - Download both `.onnx` and `.onnx.json` files to the `models/piper/` directory

3. Configure in `conf.yaml`:

   ```yaml
   piper_tts:
     model_path: "models/piper/zh_CN-huayan-medium.onnx"  # ONNX model file path
     speaker_id: 0  # Speaker ID for multi-speaker models (use 0 for single-speaker models)
     normalize_audio: true  # Whether to normalize audio
     use_cuda: false  # Whether to use GPU acceleration (requires CUDA support)
   ```

4. Set `tts_model: piper_tts` in `conf.yaml`
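
Because Piper expects the `.onnx` and `.onnx.json` files to sit side by side, a quick pre-flight check before enabling the engine can save debugging time. A minimal sketch; the function name is illustrative:

```python
from pathlib import Path

def piper_model_ready(onnx_path):
    """True when both the .onnx model and its .onnx.json config exist side by side."""
    model = Path(onnx_path)
    config = Path(str(model) + ".json")
    return model.is_file() and config.is_file()
```

`piper_model_ready("models/piper/zh_CN-huayan-medium.onnx")` should return `True` before you switch `tts_model` to `piper_tts`.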
### Features

- ✅ Completely local, no internet connection required
- ✅ Real-time CPU inference, fast speed
- ✅ Supports multiple languages and voices
- ✅ Supports GPU acceleration (optional)
- ✅ Small model files, easy to deploy

:::tip
For more model options, visit the [Piper Voice Samples page](https://rhasspy.github.io/piper-samples/) to listen and download models for different languages and voices.
:::
## pyttsx3 (Lightweight and Fast)
A simple and easy-to-use local TTS engine that uses the system's default speech synthesizer. We use `py3-tts` instead of the more famous `pyttsx3` because `pyttsx3` seems unmaintained and failed to run on the test computer.