Prerequisites:
- Python 3.12+
- UV package manager (recommended)
- CUDA-capable GPU (recommended for performance)

Install Dependencies: run `uv sync` in the project directory.

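
To confirm the prerequisites after installing, a small check like the one below can be saved as a script (for example `check_env.py`, a hypothetical filename) and run with `uv run check_env.py`. It is a sketch, not part of this repository: the helper name `check_environment` is made up, and it assumes PyTorch is the GPU backend (plausible given SpeechBrain and Chatterbox, but not stated here), so it treats `torch` as optional.

```python
import importlib.util
import sys


def check_environment() -> dict:
    """Report whether the listed prerequisites appear to be satisfied.

    Hypothetical helper: assumes PyTorch is the GPU backend. If torch is not
    installed, CUDA availability is reported as None (unknown).
    """
    report = {"python_ok": sys.version_info >= (3, 12), "cuda": None}
    # Import torch only if it is installed, so the check never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    result = check_environment()
    print(f"Python 3.12+: {result['python_ok']}")
    print(f"CUDA available: {result['cuda']}")
```
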
This pipeline requires a pretrained HiFi-GAN model for waveform synthesis.

- Create directory: `mkdir -p pretrained_models/hifigan`
- Download Model: Download `generator.ckpt` from the SpeechBrain HiFi-GAN HuggingFace repo (or your preferred source).
- Place File: Save the `generator.ckpt` file inside `pretrained_models/hifigan/`.

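
Before running the pipeline, it can help to fail fast if the checkpoint is missing. The helper below is a hypothetical sketch (the name `hifigan_checkpoint_ready` is not part of the repository) that only checks the path from the steps above using `pathlib`; it does not validate the checkpoint's contents.

```python
from pathlib import Path

# Expected location of the HiFi-GAN generator, per the setup steps.
HIFIGAN_CKPT = Path("pretrained_models") / "hifigan" / "generator.ckpt"


def hifigan_checkpoint_ready(root: Path = Path(".")) -> bool:
    """Return True if generator.ckpt exists under root and is non-empty."""
    ckpt = root / HIFIGAN_CKPT
    return ckpt.is_file() and ckpt.stat().st_size > 0


if __name__ == "__main__":
    if hifigan_checkpoint_ready():
        print(f"Found {HIFIGAN_CKPT}")
    else:
        print(f"Missing or empty: {HIFIGAN_CKPT}")
```
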

Run the main pipeline on an audio file: `uv run main.py --source input_audio.wav`

Command-line arguments:

| Argument | Description | Default |
|---|---|---|
| `--source` | Path to the source (accented) audio file. | Required |
| `--pitch` | Pitch transfer strength (0.0 to 1.0). | 0.7 |
| `--timbre` | Timbre transfer strength (0.0 to 1.0). | 0.4 |
| `--energy` | Energy transfer strength (0.0 to 1.0). | 0.3 |
| `--gender` | Force the TTS voice gender (`male` or `female`). | Auto-detect |
| `--text` | Use the provided text instead of the automatic transcription. | Auto-transcribe |
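
For reference, the argument table corresponds to a command-line interface along the lines of the sketch below. It is a hypothetical reconstruction using the standard-library `argparse`, not the actual contents of `main.py`; the helper names `strength` and `parse_args` and the explicit 0.0 to 1.0 range check are assumptions.

```python
import argparse


def strength(value: str) -> float:
    """Parse a transfer strength and require it to lie in [0.0, 1.0] (assumed check)."""
    x = float(value)
    if not 0.0 <= x <= 1.0:
        raise argparse.ArgumentTypeError(f"{value} is not in the range 0.0 to 1.0")
    return x


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="Sketch of the main.py command-line interface")
    parser.add_argument("--source", required=True, help="Path to the source (accented) audio file")
    parser.add_argument("--pitch", type=strength, default=0.7, help="Pitch transfer strength")
    parser.add_argument("--timbre", type=strength, default=0.4, help="Timbre transfer strength")
    parser.add_argument("--energy", type=strength, default=0.3, help="Energy transfer strength")
    # None stands for "auto-detect" in this sketch.
    parser.add_argument("--gender", choices=["male", "female"], default=None,
                        help="Force the TTS voice gender")
    # None stands for "auto-transcribe" in this sketch.
    parser.add_argument("--text", default=None, help="Override the transcription")
    return parser.parse_args(argv)
```

For example, `parse_args(["--source", "in.wav", "--pitch", "0.8"])` yields `pitch=0.8` with `timbre` and `energy` at their table defaults, while `--pitch 1.5` is rejected with a usage error.
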

Example with custom pitch and timbre strengths: `uv run main.py --source ishika-before-clean.wav --pitch 0.8 --timbre 0.5`

Project Structure:
- `main.py`: Entry point for the pipeline.
- `pipeline.py`: Core logic for audio processing (DTW, PSOLA, spectral transfer).
- `stt.py`: Speech-to-Text module using faster-whisper.
- `tts.py`: Text-to-Speech module using Chatterbox.
- `voices/`: Directory containing reference voice samples.
- `pretrained_models/`: Directory for local model weights.

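
`pipeline.py` lists DTW (dynamic time warping) among its techniques, presumably to align the source recording with the TTS rendering in time before transferring pitch, timbre, and energy. The sketch below shows classic DTW on 1-D sequences in plain Python to illustrate the idea; it is not the project's implementation, which this README does not describe, and a real pipeline would more likely align frame-level feature vectors (such as MFCCs) with a vectorized library.

```python
from typing import List, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic dynamic time warping with an absolute-difference local cost.

    Returns the total alignment cost and the warping path as (i, j) index pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: match (diagonal), insertion, deletion.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

For instance, `dtw([0, 1, 2], [0, 1, 1, 2])` returns `(0.0, [(0, 0), (1, 1), (1, 2), (2, 3)])`: the repeated `1` in the second sequence is absorbed by mapping `a[1]` to both `b[1]` and `b[2]`.
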
[License Information Here]