A TTS (Text-To-Speech) model for study and research. This repository is mainly based on
ming024/FastSpeech2, with code modified and added. Before training, the sentences were converted into ARPAbet TextGrid files with the Montreal Forced Aligner (MFA). These files are available from the
ming024/FastSpeech2 repo. You can download them from here: Google Drive Folder Link. This is why this repo is named fastspeech2_a.
Additionally, I added some code from:
- 🤗 accelerate: multi-gpu
  - Trained on 2 x NVIDIA GeForce RTX 4090 GPUs
- ✍🏻️ wandb: wandb instead of Tensorboard. wandb is compatible with 🤗 accelerate and with 🔥 pytorch.
- torchmalloc.py and 🌈 colorama can show your resource usage in real time (during training).
- noisereduce is available when you run preprocessor.py.
  - Non-Stationary Noise Reduction
  - prop_decrease can avoid data distortion. (0.0 ~ 1.0)
  - Actually, NOT USED.
- 🔥 [Pytorch-Hub] NVIDIA/HiFi-GAN: used as a vocoder.
- LJSpeech
  - Language: English 🇺🇸
  - Speaker: Single Speaker
  - sample_rate: 22.05kHz
The code was run and the example speeches were synthesized in my VS Code environment. I moved the Jupyter Notebook files to Colab to share the synthesized example speeches below:
- (EXAMPLE_Jupyternotebook) Synthesis.ipynb
- (EXAMPLE_CLI) Synthesis.ipynb
- More_Examples_Synthesized.ipynb
preprocess.py extracts the pitch, energy, duration, and phones from the TextGrid files.
python preprocess.py config/LJSpeech/preprocess.yaml
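As a rough illustration of the duration part of this step, the sketch below converts phone intervals (start and end times in seconds, as they would appear in a TextGrid phones tier) into per-phone frame counts. The sampling rate of 22050 Hz comes from the dataset description above. The hop length of 256, the example intervals, and the function name are assumptions made for illustration and are not taken from this repository's code.

```python
def intervals_to_durations(intervals, sampling_rate=22050, hop_length=256):
    """Convert (start_sec, end_sec, phone) intervals into frame counts.

    hop_length=256 is an assumed value; use the one from your preprocess config.
    """
    phones, durations = [], []
    for start, end, phone in intervals:
        # Round each boundary to the nearest frame index, then take the difference.
        start_frame = round(start * sampling_rate / hop_length)
        end_frame = round(end * sampling_rate / hop_length)
        phones.append(phone)
        durations.append(end_frame - start_frame)
    return phones, durations


# Hypothetical alignment for the word "hello" (values made up for the example).
example_intervals = [
    (0.00, 0.12, "HH"),
    (0.12, 0.25, "AH0"),
    (0.25, 0.40, "L"),
    (0.40, 0.62, "OW1"),
]

phones, durations = intervals_to_durations(example_intervals)
print(phones)
print(durations)
```

Rounding the boundaries, rather than each interval length, keeps the per-phone durations summing to the frame index of the last boundary, so they line up with the mel-spectrogram length.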
First, log in to wandb with your API token from the CLI.
wandb login --relogin '<your-wandb-api-token>'
Next, set up your training environment with the following command.
accelerate config
With this command, you can start training.
accelerate launch train.py --n_epochs 800 --save_start_step 12000 --save_epochs 20 --synthesis_logging_epochs 20 --try_name T_01_LJSpeech
You can also train your TTS model on specific GPUs by setting CUDA_VISIBLE_DEVICES, as in this command.
CUDA_VISIBLE_DEVICES=2,3 accelerate launch train.py --n_epochs 800 --save_start_step 12000 --save_epochs 20 --synthesis_logging_epochs 20 --try_name T_01_LJSpeech
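For context on the CUDA_VISIBLE_DEVICES prefix: CUDA renumbers the listed devices from zero inside the process, so with 2,3 the training processes see physical GPUs 2 and 3 as cuda:0 and cuda:1. The sketch below only illustrates that renumbering with the standard library. The helper name is hypothetical, and it assumes the variable holds plain comma-separated integer indices (not GPU UUIDs).

```python
import os


def logical_to_physical(env=None):
    """Map logical CUDA indices to physical GPU indices (illustrative helper)."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # Variable unset: all GPUs are visible and numbering is unchanged.
        return None
    physical_ids = [int(part) for part in value.split(",") if part.strip()]
    # Logical index i inside the process corresponds to the i-th listed GPU.
    return dict(enumerate(physical_ids))


mapping = logical_to_physical({"CUDA_VISIBLE_DEVICES": "2,3"})
for logical, physical in mapping.items():
    print(f"cuda:{logical} -> physical GPU {physical}")
```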
You can synthesize speech from the CLI with this command:
python synthesize.py --raw_texts <Text to synthesize to speech> --restore_step 100000
You can refer to the Colab notebooks (Examples) above if you want to synthesize speech.
Also, you can check these jupyter-notebooks:


