This project demonstrates forced alignment of speech audio with text transcriptions using the Montreal Forced Aligner (MFA), including handling of Out-of-Vocabulary (OOV) words.
- Audio files: 6 WAV files (3 from F2BJRLP dataset, 3 from ISLE dataset)
- Transcripts: Corresponding text transcriptions
- Total duration: 97.16 seconds
- Windows 10/11
- Miniconda3
- Download from https://docs.conda.io/en/latest/miniconda.html
- Install with default settings
- Restart terminal
# Create conda environment
conda create -n aligner -c conda-forge montreal-forced-aligner
# Activate environment
conda activate aligner
# Verify installation
mfa version
# Download acoustic model
mfa model download acoustic english_us_arpa
# Download dictionary
mfa model download dictionary english_us_arpa
# Download G2P model (for OOV handling)
mfa model download g2p english_us_arpa
MFA-Forced-Alignment-Assignment/
├── audio/ # Original .wav files
├── transcripts/ # Original .txt transcription files
├── corpus/ # Paired audio + text files (created during usage)
├── dictionary/ # Custom pronunciation dictionary files
├── output_before_oov/ # TextGrid files (initial alignment)
├── output_after_oov/ # TextGrid files (after OOV handling)
├── screenshots/ # Praat visualization screenshots
├── docs/ # Additional documentation
├── REPORT.md # Detailed analysis and observations
└── README.md # This file
MFA requires audio and transcription files to share the same base name in a single directory.
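The pairing requirement above can be checked before any files are copied. The following is a minimal sketch, assuming the `audio/` and `transcripts/` folders from the project layout; the function name `find_unpaired` is hypothetical and not part of MFA.

```python
from pathlib import Path


def find_unpaired(audio_dir, transcript_dir):
    """Return (audio_without_text, text_without_audio) as sorted lists of base names."""
    # Compare file names without their extensions, e.g. "s1.wav" -> "s1".
    wav_stems = {p.stem for p in Path(audio_dir).glob("*.wav")}
    txt_stems = {p.stem for p in Path(transcript_dir).glob("*.txt")}
    return sorted(wav_stems - txt_stems), sorted(txt_stems - wav_stems)


if __name__ == "__main__":
    # Folder names follow the project layout in this README.
    missing_txt, missing_wav = find_unpaired("audio", "transcripts")
    print("Audio files without a transcript:", missing_txt)
    print("Transcripts without an audio file:", missing_wav)
```

If both lists are empty, every recording has a transcript with the same base name and the two folders can be merged into `corpus/`.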
mkdir corpus
cp audio/*.wav corpus/
cp transcripts/*.txt corpus/
# Run the initial alignment (before OOV handling)
mfa align corpus/ english_us_arpa english_us_arpa output_before_oov/ --clean
# Validate the corpus to find OOV words
mfa validate corpus/ english_us_arpa english_us_arpa --ignore_acoustics
The validate command produces logs, including oovs_found.txt, which lists all OOV words and their counts.
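An OOV word is simply a transcript token with no entry in the pronunciation dictionary. The sketch below illustrates that idea and the difference between unique types and total tokens. It is not MFA's implementation: the tokenizer (lowercase letters and apostrophes) is an assumption, so counts may differ slightly from what MFA reports, and the function names are hypothetical.

```python
import re
from collections import Counter


def load_dictionary_words(dict_text):
    """Collect headwords: the first whitespace-separated field of each dictionary line."""
    words = set()
    for line in dict_text.splitlines():
        fields = line.split()
        if fields:
            words.add(fields[0].lower())
    return words


def count_oovs(transcripts, known_words):
    """Count every transcript token that is absent from the dictionary."""
    counts = Counter()
    for text in transcripts:
        # Assumed tokenization: runs of letters and apostrophes, lowercased.
        for token in re.findall(r"[a-z']+", text.lower()):
            if token not in known_words:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the project corpus.
    toy_dictionary = "the\tDH AH0\ncat\tK AE1 T\n"
    toy_transcripts = ["The cat met Praat", "Praat left"]
    oovs = count_oovs(toy_transcripts, load_dictionary_words(toy_dictionary))
    print("unique types:", len(oovs))
    print("total tokens:", sum(oovs.values()))
```

Here `len(oovs)` corresponds to the number of unique OOV types and `sum(oovs.values())` to the total OOV tokens.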
- Save the base dictionary locally:
  mfa model save dictionary english_us_arpa dictionary/base_dict.txt
- Create custom dictionary:
  cp dictionary/base_dict.txt dictionary/custom_dict.txt
  Edit custom_dict.txt and add manual pronunciations for OOV words in ARPABET format.
- Re-run alignment with custom dictionary:
  mfa align corpus/ dictionary/custom_dict.txt english_us_arpa output_after_oov/ --clean
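Manual entries can also be appended with a small script that rejects malformed pronunciations before they reach the dictionary. This is a sketch under assumptions: each dictionary line is taken to be a headword, a tab, then space-separated ARPABET phones; headwords are lowercased on the assumption that the base dictionary is lowercase; and the example word and pronunciation are illustrative, not taken from the project's OOV list.

```python
from pathlib import Path

# Standard ARPABET inventory: vowels carry a stress digit (0, 1 or 2).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}
CONSONANTS = {"B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
              "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH"}


def is_valid_arpabet(phones):
    """True if the list is non-empty and every phone is a known ARPABET symbol."""
    if not phones:
        return False
    for phone in phones:
        if phone in CONSONANTS:
            continue
        if len(phone) == 3 and phone[:2] in VOWELS and phone[2] in "012":
            continue
        return False
    return True


def format_entry(word, phones):
    """Build one dictionary line: headword, tab, space-separated phones."""
    if not is_valid_arpabet(phones):
        raise ValueError(f"invalid ARPABET pronunciation for {word!r}: {phones}")
    return f"{word.lower()}\t{' '.join(phones)}"


def append_entries(dict_path, entries):
    """Append entries (word -> list of phones) to an existing dictionary file."""
    path = Path(dict_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    # Validate everything first so a bad entry leaves the file untouched.
    lines = [format_entry(word, phones) for word, phones in entries.items()]
    # Avoid gluing the first new entry onto an unterminated last line.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(prefix + "\n".join(lines) + "\n")


if __name__ == "__main__":
    # Illustrative entry only; nothing is written to disk here.
    print(format_entry("Praat", ["P", "R", "AA1", "T"]))
```

Calling `append_entries("dictionary/custom_dict.txt", {...})` with the real OOV pronunciations would extend the custom dictionary before the final alignment run.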
- Initial alignment: 6 TextGrid files generated in output_before_oov/
- OOV words identified: 22 unique types (48 total tokens)
- Custom dictionary created with phonetic pronunciations for all OOV items
- Final alignment completed with improved TextGrids in output_after_oov/
TextGrid files can be opened in Praat for visualization:
- Download Praat: https://www.fon.hum.uva.nl/praat/
- Open the corresponding .wav file and .TextGrid file
- Select both objects and click View & Edit
Screenshots of before/after OOV handling comparisons are stored in the screenshots/ folder.
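Besides visual inspection in Praat, the interval boundaries can be read programmatically to compare the two output folders. The sketch below is a minimal regex-based reader, assuming the TextGrids are in Praat's long text format with interval tiers; it is not a full TextGrid parser, and the function name `parse_textgrid` is hypothetical.

```python
import re

# Each tier starts with "item [n]:" and runs until the next tier or end of file.
TIER_RE = re.compile(r"item \[\d+\]:(.*?)(?=item \[\d+\]:|\Z)", re.S)
NAME_RE = re.compile(r'name = "([^"]*)"')
# One interval: start time, end time, and a label (Praat doubles embedded quotes).
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*xmin = ([\d.eE+-]+)\s*xmax = ([\d.eE+-]+)'
    r'\s*text = "((?:[^"]|"")*)"'
)


def parse_textgrid(text):
    """Map tier name -> list of (start, end, label) tuples, in file order."""
    tiers = {}
    for tier_match in TIER_RE.finditer(text):
        body = tier_match.group(1)
        name_match = NAME_RE.search(body)
        if name_match is None:
            continue
        tiers[name_match.group(1)] = [
            (float(start), float(end), label.replace('""', '"'))
            for start, end, label in INTERVAL_RE.findall(body)
        ]
    return tiers


def labelled_intervals(intervals):
    """Drop silent intervals, which have an empty label."""
    return [item for item in intervals if item[2] != ""]


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python read_textgrid.py path/to/file.TextGrid
    if len(sys.argv) > 1:
        grid = parse_textgrid(Path(sys.argv[1]).read_text(encoding="utf-8"))
        for tier_name, intervals in grid.items():
            print(tier_name, len(labelled_intervals(intervals)), "labelled intervals")
```

Running it on the same file name in `output_before_oov/` and `output_after_oov/` gives a quick way to see which word boundaries moved after the custom dictionary was added.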
- Alignment accuracy improved significantly after adding custom pronunciations
- Most OOV issues were caused by proper names and non-standard terms
- Detailed analysis and metrics are provided in REPORT.md
Shashank Vishwakarma
February 2026