🍵 matcha_tts_e

This repo is mainly based on the 🍵 Matcha-TTS Official Github, with some of the code modified. The purpose of this repository is to study 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching.

🔥Pytorch, ⚡Lightning, 🐉🐲🐲 hydra-core
🤗 wandb Click 👉

Trying to code simpler

While studying the 🍵 Matcha-TTS Official Github, I modified some of the code to make it simpler.

Logger: 🤗 wandb (more convenient and easier to access)
Vocoder: 🔥 [Pytorch-Hub]NVIDIA/HiFi-GAN
Alignment: resemble-ai/monotonic_align

Colab notebooks (Examples):

The code was run and the example speech was synthesized in my VS Code environment. I moved the Jupyter notebooks to Colab to share the synthesized examples below:

😲 trim_butterfly_16.ipynb | BS: 16 | NVIDIA GeForce RTX 4080 (x1)
😵 decent_meadow_46.ipynb | BS: 32 | LR: 2e-5 | NVIDIA GeForce RTX 4080 (x1)
⭐ wobbly_frog_53.ipynb | BS: 16 | bf16-mixed | NVIDIA GeForce RTX 4080 (x1)
👽 wobbly_serenity_54.ipynb | BS: 32 | bf16-mixed | NVIDIA GeForce RTX 4080 (x1)
😣 jolly_frog_47.ipynb | BS: 32 | LR: 2e-5 | NVIDIA GeForce RTX 4090 (x1)
🌟 eager_frost_50.ipynb | BS: 16 | NVIDIA GeForce RTX 4090 (x1)
✨ royal_grass_56.ipynb | BS: 16 | bf16-mixed | NVIDIA GeForce RTX 4090 (x1)

MemoryCleanupCallback Added!

import gc
import torch
import lightning as L

class MemoryCleanupCallback(L.Callback):
    """Free cached GPU memory and run garbage collection after each epoch."""

    def on_train_epoch_end(self, trainer, pl_module):
        if torch.cuda.is_available():
            # Release unused cached blocks held by PyTorch's CUDA allocator
            torch.cuda.empty_cache()
        gc.collect()

    def on_validation_epoch_end(self, trainer, pl_module):
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        gc.collect()

MAS(=Monotonic Alignment Search) Installation

MAS is not included in requirements.txt. You can install MAS (Monotonic Alignment Search) with the command below:

resemble-ai/monotonic_align

pip install git+https://github.com/resemble-ai/monotonic_align.git

After installation, you can import it like this:

import monotonic_align
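
To illustrate what monotonic alignment search computes, here is a minimal pure-Python sketch of the dynamic program. It is not the API of the `monotonic_align` package, which provides an optimized implementation; the function name `monotonic_alignment_search` and the list-of-lists input are assumptions made for this example. Given a matrix of log-likelihoods between text tokens and mel frames, it finds the highest-scoring path that starts at the first token, ends at the last token, and either stays on the same token or advances by one at each frame.

```python
def monotonic_alignment_search(log_p):
    """log_p[i][j]: log-likelihood of mel frame j under text token i.

    Returns path[i][j] = 1 if frame j is aligned to token i, else 0.
    Assumes the number of frames is at least the number of tokens.
    """
    n_tokens, n_frames = len(log_p), len(log_p[0])
    neg_inf = float("-inf")

    # q[i][j]: best cumulative score of a monotonic path ending at (i, j)
    q = [[neg_inf] * n_frames for _ in range(n_tokens)]
    q[0][0] = log_p[0][0]
    for j in range(1, n_frames):
        # Token index can never exceed the frame index
        for i in range(min(j + 1, n_tokens)):
            stay = q[i][j - 1]
            advance = q[i - 1][j - 1] if i > 0 else neg_inf
            q[i][j] = max(stay, advance) + log_p[i][j]

    # Backtrack from the last token at the last frame
    path = [[0] * n_frames for _ in range(n_tokens)]
    i = n_tokens - 1
    for j in range(n_frames - 1, -1, -1):
        path[i][j] = 1
        # Move to the previous token if forced (i == j) or if it scored better
        if i > 0 and (i == j or q[i - 1][j - 1] >= q[i][j - 1]):
            i -= 1
    return path


# Toy example: 2 text tokens, 3 mel frames
toy_log_p = [
    [-1.0, -1.0, -5.0],
    [-5.0, -4.0, -1.0],
]
alignment = monotonic_alignment_search(toy_log_p)
```

In this toy case the first two frames fit the first token best and the last frame fits the second token, so each row of `alignment` marks the frames assigned to that token.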

Dataset: LJSpeech

Language: English 🇺🇸
Speaker: Single Speaker
sample_rate: 22.05kHz

Compute `mel_mean`, `mel_std` of ljspeech dataset

Let's assume we are training with LJ Speech.

Download the dataset from here, extract it to your own data directory (in my case: data/LJSpeech/ljs/LJSpeech-1.1), and prepare the filelists so that they point to the extracted data, as in item 5 of the setup in the NVIDIA Tacotron 2 repo.
Go to configs/data/ljspeech.yaml and change:

train_filelist_path: data/filelists/ljs_audio_text_train_filelist.txt
valid_filelist_path: data/filelists/ljs_audio_text_val_filelist.txt
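
As a rough sketch of what preparing a filelist involves: LJSpeech ships a `metadata.csv` whose rows have the form `ID|raw transcript|normalized transcript`, and Tacotron 2 style filelists hold one `wav_path|text` entry per line. The helper below (the name `metadata_to_filelist` is hypothetical) converts metadata rows into that form. The filelists already present under `data/filelists` may use different paths or splits, so treat this only as an illustration.

```python
from pathlib import PurePosixPath


def metadata_to_filelist(metadata_lines, wav_dir):
    """Convert LJSpeech metadata rows into `wav_path|text` filelist lines."""
    entries = []
    for line in metadata_lines:
        line = line.strip()
        if not line:
            continue
        # metadata.csv rows: <id>|<raw transcript>|<normalized transcript>
        parts = line.split("|")
        utt_id, text = parts[0], parts[-1]
        wav_path = PurePosixPath(wav_dir) / f"{utt_id}.wav"
        entries.append(f"{wav_path}|{text}")
    return entries


# Two made-up rows in the metadata.csv layout
rows = [
    "LJ001-0001|Printing, in the only sense|Printing, in the only sense",
    "LJ001-0002|in being comparatively modern.|in being comparatively modern.",
]
filelist = metadata_to_filelist(rows, "data/LJSpeech/ljs/LJSpeech-1.1/wavs")
```

Writing `"\n".join(filelist)` to a text file would give a filelist that `train_filelist_path` or `valid_filelist_path` could point to.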

Generate the normalisation statistics using the dataset's YAML configuration file:

PYTHONPATH=. python matcha/utils/generate_data_statistics.py

Update these values in configs/data/ljspeech.yaml under the data_statistics key.

data_statistics:  # Computed for ljspeech dataset 
  mel_mean: -5.5170512199401855
  mel_std: 2.0643811225891113
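
To make the role of these two numbers concrete, here is a minimal sketch of how a global mean and standard deviation over all mel values can be computed and then used to normalize a spectrogram. It uses plain Python lists and hypothetical helper names (`mel_statistics`, `normalize`); the actual `generate_data_statistics.py` script works on batched tensors from the data loader and may differ in detail.

```python
import math


def mel_statistics(mels):
    """Global mean/std over every value of every mel-spectrogram.

    `mels` is a list of spectrograms, each a list of frames,
    each frame a list of float mel-bin values.
    """
    total, total_sq, count = 0.0, 0.0, 0
    for mel in mels:
        for frame in mel:
            for value in frame:
                total += value
                total_sq += value * value
                count += 1
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clamp at 0 to guard against rounding error
    std = math.sqrt(max(total_sq / count - mean * mean, 0.0))
    return mean, std


def normalize(mel, mean, std):
    """Shift and scale a spectrogram with the dataset-level statistics."""
    return [[(v - mean) / std for v in frame] for frame in mel]


# Toy data: two tiny "spectrograms" with 2 mel bins per frame
toy_mels = [
    [[-6.0, -4.0], [-5.0, -5.0]],
    [[-7.0, -3.0]],
]
mel_mean, mel_std = mel_statistics(toy_mels)
normalized = normalize(toy_mels[0], mel_mean, mel_std)
```

The same two statistics are reused to undo the normalization on the model output before vocoding, which is why they have to match the dataset the model was trained on.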

Now you are ready to train!

Train

First, log in to wandb from the CLI with your API token.

wandb login --relogin '<your-wandb-api-token>'

And you can run training with one of these commands:

PYTHONPATH=. python matcha/train.py experiment=ljspeech

# To run training on a specific GPU:
CUDA_VISIBLE_DEVICES=2 PYTHONPATH=. python matcha/train.py experiment=ljspeech

You can also run multi-GPU training:

# If you run multi-gpu training:
CUDA_VISIBLE_DEVICES=2,3 PYTHONPATH=. python matcha/train.py experiment=ljspeech trainer.devices=[0,1]
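
A note on the indices in the multi-GPU command: `CUDA_VISIBLE_DEVICES=2,3` hides all other GPUs from the process, and the visible ones are renumbered starting from 0, which is why `trainer.devices=[0,1]` refers to physical GPUs 2 and 3. The small sketch below (the helper `logical_to_physical` is hypothetical and needs no CUDA) only illustrates that renumbering.

```python
def logical_to_physical(cuda_visible_devices, logical_ids):
    """Map the device indices a process sees to physical GPU ids.

    `cuda_visible_devices` is the comma-separated value of the
    CUDA_VISIBLE_DEVICES environment variable.
    """
    visible = [d.strip() for d in cuda_visible_devices.split(",") if d.strip()]
    return [visible[i] for i in logical_ids]


# trainer.devices=[0,1] under CUDA_VISIBLE_DEVICES=2,3
mapping = logical_to_physical("2,3", [0, 1])
```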

Synthesize

The code was run and the example speech was synthesized in my VS Code environment. I moved the Jupyter notebook to Colab to share the synthesized examples.

Samples_wobbly_frog_53.ipynb

You can find more samples in the Colab notebooks (Examples) section above.
You can refer to the code for synthesis: matcha/utils/synthesize_utils.py
This notebook is also in this GitHub repo: notebooks/Samples_wobbly_frog_53.ipynb
CLI arguments: to be updated!

Reference

🍵 Paper: Matcha-TTS: A fast TTS architecture with conditional flow matching
└ Github: 🍵 Matcha-TTS Official Github
MAS(Monotonic Alignment Search)
└ resemble-ai/monotonic_align
🔥 Pytorch
⚡ Lightning
🐉🐲🐲 hydra-core

