StreamlineDevelopers/AccentConversion

Local Accent Correction

Installation

  1. Prerequisites:

    • Python 3.12+
    • UV package manager (recommended)
    • CUDA-capable GPU (recommended for performance)
  2. Install Dependencies:

    uv sync

Model Setup

This pipeline requires a pretrained HiFi-GAN model for waveform synthesis.

  1. Create directory:

    mkdir -p pretrained_models/hifigan
  2. Download Model: Download generator.ckpt from the SpeechBrain HiFi-GAN HuggingFace repo (or your preferred source).

  3. Place File: Save the generator.ckpt file inside pretrained_models/hifigan/.
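Before running the pipeline, it can help to confirm that the checkpoint landed where the steps above put it. The following is a minimal sketch using only the standard library; the helper name `check_hifigan_checkpoint` is hypothetical and not part of this repository, and the path is simply the one given in the steps above.

```python
from pathlib import Path

# Location described in the Model Setup steps above.
HIFIGAN_CKPT = Path("pretrained_models") / "hifigan" / "generator.ckpt"


def check_hifigan_checkpoint(ckpt: Path = HIFIGAN_CKPT) -> Path:
    """Return the checkpoint path, or raise if it is missing or empty.

    Hypothetical helper for illustration; the pipeline itself may
    load or validate the file differently.
    """
    if not ckpt.is_file():
        raise FileNotFoundError(
            f"Expected HiFi-GAN weights at {ckpt}; see Model Setup."
        )
    if ckpt.stat().st_size == 0:
        raise ValueError(f"{ckpt} is empty; the download may have failed.")
    return ckpt
```

Calling `check_hifigan_checkpoint()` from the repository root raises a descriptive error if the file is absent, which is easier to read than a failure deep inside model loading.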

Usage

Run the main pipeline on an audio file:

uv run main.py --source input_audio.wav

Options

Argument    Description                                   Default
--source    Path to the source audio file (accented).     Required
--pitch     Pitch transfer strength (0.0 - 1.0).          0.7
--timbre    Timbre transfer strength (0.0 - 1.0).         0.4
--energy    Energy transfer strength (0.0 - 1.0).         0.3
--gender    Force 'male' or 'female' gender for TTS.      Auto-detect
--text      Override transcription with provided text.    Auto-transcribe
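The options above map naturally onto an `argparse` definition. The sketch below shows one way such a parser could look; it is an assumption about how `main.py` might be structured, not a copy of it. The `strength` validator and `build_parser` are hypothetical names, and using `None` to represent "Auto-detect" and "Auto-transcribe" is an illustrative choice.

```python
import argparse


def strength(value: str) -> float:
    """Parse a transfer strength and enforce the documented 0.0 - 1.0 range."""
    x = float(value)
    if not 0.0 <= x <= 1.0:
        raise argparse.ArgumentTypeError(f"{value} is outside 0.0 - 1.0")
    return x


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Local Accent Correction")
    parser.add_argument("--source", required=True,
                        help="Path to the source audio file (accented).")
    parser.add_argument("--pitch", type=strength, default=0.7,
                        help="Pitch transfer strength (0.0 - 1.0).")
    parser.add_argument("--timbre", type=strength, default=0.4,
                        help="Timbre transfer strength (0.0 - 1.0).")
    parser.add_argument("--energy", type=strength, default=0.3,
                        help="Energy transfer strength (0.0 - 1.0).")
    # None stands in for "auto-detect" / "auto-transcribe" in this sketch.
    parser.add_argument("--gender", choices=["male", "female"], default=None,
                        help="Force 'male' or 'female' gender for TTS.")
    parser.add_argument("--text", default=None,
                        help="Override transcription with provided text.")
    return parser


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = build_parser().parse_args(["--source", "input_audio.wav", "--pitch", "0.8"])
print(args.source, args.pitch, args.timbre, args.energy, args.gender, args.text)
```

Validating the range in the `type` callable means an out-of-range value such as `--pitch 1.5` is rejected at parse time with a usage message rather than propagating into the audio processing.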

Example

uv run main.py --source ishika-before-clean.wav --pitch 0.8 --timbre 0.5

Project Structure

  • main.py: Entry point for the pipeline.
  • pipeline.py: Core logic for audio processing (DTW, PSOLA, spectral transfer).
  • stt.py: Speech-to-Text module using faster-whisper.
  • tts.py: Text-to-Speech module using Chatterbox.
  • voices/: Directory containing reference voice samples.
  • pretrained_models/: Directory for local model weights.
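For readers unfamiliar with DTW (dynamic time warping), which `pipeline.py` is listed as using, the following is a generic textbook sketch on short scalar sequences. It is not the repository's implementation: a real pipeline would most likely align frames of acoustic features rather than single numbers, and the function name `dtw_path` is hypothetical.

```python
def dtw_path(a, b):
    """Align two 1-D sequences with classic DTW.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs matching a[i] to b[j]. Illustrative only.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


total, path = dtw_path([1, 2, 3], [1, 2, 2, 3])
print(total, path)
```

In this toy input the second sequence repeats the value 2, so the path maps one element of the first sequence to two elements of the second. The same idea lets an accented utterance and a synthesized reference be aligned in time even when they are spoken at different speeds.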

License

[License Information Here]

About

This is a proof of concept (POC) that ships without a model. The pipeline is based on https://www.isca-archive.org/interspeech_2025/nguyen25c_interspeech.pdf
