Automatic Audio Segment based on Text (AAST)

This repository wraps the audio segmentation model provided by MMS Aligner to make it easier to use and more capable. For Japanese, kanji-to-hiragana conversion is supported before audio segmentation (use the -l argument).

Installation

Step 1: Git clone

git clone https://github.com/YJ-20/Automatic-Audio-Segment-based-on-Text.git

Step 2: Run the installation script

cd Automatic-Audio-Segment-based-on-Text
bash install.sh

Run

Step 1: Prepare Data

Create a text file corresponding to the original audio file. The audio is split according to the newlines in the text file, so each line becomes one segment. Supported text file extensions are txt, csv, and xlsx.

Example content of the input text file:
```
Text of the desired first segment
Text of the desired second segment
Text of the desired third segment
```
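Because each line of the text file becomes one segment, a stray newline inside a sentence would produce an extra split. The sketch below shows one way to write a list of segment strings to a txt file with exactly one segment per line. The function name `write_segment_file` and the choice to drop blank segments are assumptions made for illustration; they are not part of this repository.

```python
from pathlib import Path


def write_segment_file(segments, path):
    """Write one segment per line and return the number of lines written."""
    cleaned = []
    for segment in segments:
        # Collapse all whitespace runs (including embedded newlines) into
        # single spaces so each segment occupies exactly one line.
        line = " ".join(segment.split())
        if line:  # assumption: blank segments are dropped, not written
            cleaned.append(line)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)
```

Calling `write_segment_file(["first segment", "second\nsegment"], "transcript.txt")` would write two lines, with the embedded newline in the second item replaced by a space.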

Step 2: Run AAST

You can run the audio segmentation code using either the GUI or the CLI.

Step 2-1: GUI Run

Execute the Python file run.py and follow the buttons in the tkinter GUI window.
```
python run.py
```
Step 2-2: CLI Run

Execute the Python file align_and_segment.py directly. The required arguments -a and -t specify the audio path and the text file path. The optional arguments -o and -l specify the directory where the output files are saved and the language of the audio.
```
python align_and_segment.py -a /path/to/audio.wav -t /path/to/textfile -o /path/to/output_dir -l <iso_code>
```
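When several recordings need segmenting, the CLI can be driven from a small script. The following is a minimal sketch, assuming it is run from the repository directory and that each audio file `name.wav` sits next to a text file `name.txt`. Only the -a, -t, -o, and -l flags come from this README; the helper names `build_command` and `segment_folder` and the folder layout are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_command(audio_path, text_path, output_dir=None, language=None):
    """Build the argument list for align_and_segment.py."""
    command = [sys.executable, "align_and_segment.py",
               "-a", str(audio_path), "-t", str(text_path)]
    if output_dir is not None:
        command += ["-o", str(output_dir)]
    if language is not None:
        command += ["-l", language]
    return command


def segment_folder(data_dir, output_root, language=None):
    """Run the CLI once per .wav file that has a same-named .txt file."""
    for audio_path in sorted(Path(data_dir).glob("*.wav")):
        text_path = audio_path.with_suffix(".txt")
        if not text_path.exists():
            print(f"skipping {audio_path.name}: no matching text file")
            continue
        # One output directory per recording. This sketch does not create
        # it; whether the script does so is not stated in this README.
        output_dir = Path(output_root) / audio_path.stem
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(
            build_command(audio_path, text_path, output_dir, language),
            check=True,
        )
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.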

The command above saves the split audio files under the output directory, one per line of the input text file. Information about each split audio file is stored in the splitted_audio_info.csv file in the output directory. Information stored:

1. original start time (audio_start_sec)
2. audio path (audio_filepath)
3. length (duration)
4. text

{"audio_start_sec": 0.0, "audio_filepath": "/path/to/output/segment1.wav", "duration": 6.8, "text": "she wondered afterwards how she could have spoken with that hard serenity how she could have"}
{"audio_start_sec": 6.8, "audio_filepath": "/path/to/output/segment2.wav", "duration": 5.3, "text": "gone steadily on with story after story poem after poem till"}
{"audio_start_sec": 12.1, "audio_filepath": "/path/to/output/segment3.wav", "duration": 5.9, "text": "allan's grip on her hands relaxed and he fell into a heavy tired sleep"}

Source: MMS Aligner

