Lourdle/CosyVoiceForOnnx

CosyVoice for ONNX

English (current) | 简体中文 (Simplified Chinese)

This project contains modified versions of CosyVoice's Flow and HiFT modules to make them exportable to ONNX format. It also includes necessary source code from the original CosyVoice and Matcha-TTS repositories to ensure compatibility and usability.

Modified Files

Modules simplified to minimize dependencies

  • matcha/hifigan/xutils.py: Removed dependency on matplotlib.
  • matcha/models/components/flow_matching.py: Removed logger.
  • cosyvoice/utils/class_utils.py: Removed unused imports and functions.

Modules adapted for TorchScript and ONNX

  • cosyvoice/flow/flow_matching.py: Adjusted forward to meet TorchScript requirements.
  • cosyvoice/flow/flow.py: Adjusted forward to meet TorchScript requirements; removed test code.
  • cosyvoice/flow/DiT/modules.py: Modified AttnProcessor to implement scaled dot-product attention manually, casting Q and K to float32 before computing attention scores to avoid NaN values.
  • cosyvoice/hifigan/generator.py: Added an ISTFT class to replace torch.istft, avoiding complex tensors for ONNX compatibility; removed test code.
  • cosyvoice/transformer/attention.py: Removed cache support.
  • cosyvoice/transformer/encoder_layer.py: Removed cache support.
  • cosyvoice/transformer/upsample_encoder.py: Removed streaming support; adjusted forward implementation to meet TorchScript requirements.

Usage

Install Dependencies

Please install dependencies first:

pip install -r requirements.txt

Note: The export scripts themselves do not depend on ONNX Runtime. Install it separately only if you need to verify inference locally.

Export Flow Module to ONNX

Use the script convert_flow_to_onnx.py to export the model to ONNX.

Arguments

  • --model_path: Path to the CosyVoice model checkpoint directory.
  • --flow_name: Filename of the Flow module (default: flow.pt).
  • --half: Convert module parameters to half precision.
  • --output_path: Path to save the exported ONNX file.
  • --int32_token: Use int32 for input tokens (int64 is used if this flag is omitted).
  • --add_speed_control: Add speed control input for the Flow module.
  • --device: PyTorch device used for export (default: default). Must be a valid device string (e.g., cuda:0, cpu). When default, the script uses cuda if available, otherwise cpu.
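The default rule for --device can be sketched as follows. This is an illustration only, not code taken from the export scripts: resolve_device is a hypothetical helper, and the CUDA check is passed in as a plain boolean so that the sketch runs without PyTorch installed.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Map a --device value to a concrete PyTorch device string.

    Hypothetical helper mirroring the documented behaviour:
    "default" picks cuda when available, otherwise cpu; any other
    value is passed through unchanged as the device string.
    """
    if requested == "default":
        return "cuda" if cuda_available else "cpu"
    return requested


if __name__ == "__main__":
    # Show how the documented default resolves on both kinds of machine.
    print(resolve_device("default", cuda_available=True))
    print(resolve_device("default", cuda_available=False))
    print(resolve_device("cuda:0", cuda_available=True))
```

In a real script the boolean would typically come from torch.cuda.is_available().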

Example

python convert_flow_to_onnx.py --device cpu --half --model_path path/to/model --output_path path/to/output.onnx --int32_token --add_speed_control

Export HiFT Module to ONNX

Use the script convert_hift_to_onnx.py to export the model to ONNX.

Arguments

  • --model_path: Path to the CosyVoice model checkpoint directory.
  • --hift_name: Filename of the HiFT module (default: hift.pt).
  • --output_path: Path to save the exported ONNX file.
  • --device: PyTorch device used for export (default: default). Behavior is the same as the Flow export script.

Example

python convert_hift_to_onnx.py --model_path path/to/model --output_path path/to/output.onnx

Compose Flow and HiFT ONNX Models

Use the script compose_flow_hift.py to combine the exported Flow and HiFT modules into a single ONNX model.

Arguments

  • --flow_path: Path to the exported Flow ONNX file.
  • --hift_path: Path to the exported HiFT ONNX file.
  • --output_path: Path to save the composed ONNX file.

Example

python compose_flow_hift.py --flow_path path/to/flow.onnx --hift_path path/to/hift.onnx --output_path path/to/composed.onnx
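The three steps above (export Flow, export HiFT, compose) can be chained from a small driver. The sketch below only assembles the command lines from the documented arguments. The helper names build_commands and run_pipeline and the output file names flow.onnx, hift.onnx, and composed.onnx are assumptions made for illustration, and the scripts are assumed to be in the current working directory.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(model_dir: str, out_dir: str, device: str = "default") -> list[list[str]]:
    """Return the argv lists for the Flow export, HiFT export, and compose steps."""
    out = Path(out_dir)
    flow = str(out / "flow.onnx")          # assumed output name
    hift = str(out / "hift.onnx")          # assumed output name
    composed = str(out / "composed.onnx")  # assumed output name
    py = sys.executable  # run the scripts with the current interpreter
    return [
        [py, "convert_flow_to_onnx.py", "--model_path", model_dir,
         "--output_path", flow, "--device", device],
        [py, "convert_hift_to_onnx.py", "--model_path", model_dir,
         "--output_path", hift, "--device", device],
        [py, "compose_flow_hift.py", "--flow_path", flow,
         "--hift_path", hift, "--output_path", composed],
    ]


def run_pipeline(commands: list[list[str]]) -> None:
    """Run each step in order; check=True stops at the first failing step."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for cmd in build_commands("path/to/model", "path/to/out"):
        print(" ".join(cmd))
```

Calling run_pipeline(build_commands(...)) would execute the steps. Optional flags such as --half or --int32_token can be appended to the first command as needed.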

Known Limitations

  • Streaming support has been removed.
  • The ONNX conversion tools support only the CosyVoice2-0.5B and Fun-CosyVoice3-0.5B-2512 models.

FAQ

Composition error: "Bad node spec for node. Name: flow_/decoder/Loop OpType: Loop". This error has been observed with onnx 1.19.0. If it occurs, run the composition step with onnx 1.16.0, which has been tested to work.
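A quick way to catch this before composing is to check the installed onnx version first. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the two version strings are simply the ones reported in this FAQ entry; other versions are untested here and are not flagged.

```python
from importlib import metadata

KNOWN_GOOD = "1.16.0"  # reported above as working for composition
KNOWN_BAD = "1.19.0"   # reported above to raise the Loop node error


def installed_onnx_version():
    """Return the installed onnx version string, or None if onnx is missing."""
    try:
        return metadata.version("onnx")
    except metadata.PackageNotFoundError:
        return None


def compose_warning(version):
    """Return a warning message for a problematic version, else None."""
    if version is None:
        return "onnx is not installed"
    if version == KNOWN_BAD:
        return (f"onnx {version} is reported to fail with a Loop node error; "
                f"try onnx {KNOWN_GOOD}")
    return None


if __name__ == "__main__":
    message = compose_warning(installed_onnx_version())
    if message:
        print("Warning:", message)
```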

Error when running the Flow ONNX model with speed control: ensure the speed input is a scalar (0-dimensional) tensor of type float32. For example, in Python:

speed = np.array(1.25, dtype=np.float32)

Or in C++:

float speed_value = 1.25f;
Ort::Value speed = Ort::Value::CreateTensor(memory_info, &speed_value, sizeof(float), nullptr, 0, ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT);

Composition error: "google.protobuf.message.EncodeError: Failed to serialize proto". This error is usually caused by insufficient system memory. Try closing other memory-intensive programs, or run the composition script on a machine with more memory.

Test Environment

The project has been tested in the following environment:

  • Python 3.12.9/3.10.18
  • PyTorch 2.8.0+cu129/2.3.1+cu121
  • ONNX 1.16.0
  • ONNX Runtime DirectML 1.22.0 (Python) / 1.23.0 (C++) (for runtime verification)
  • Windows 11 25H2 x64

Acknowledgements

License

This repository redistributes code covered by the Apache-2.0 (CosyVoice) and MIT (Matcha-TTS) licenses. The custom conversion scripts in the repository root and in utils are released under the MIT license. See the LICENSE and NOTICE files for details, and retain upstream license headers when modifying the source code.
