Fork of sprocket specialized in aligning normal and electrolarynx speech.
- Prepare WAV files. Use `test_prepare_*_files.py` to rename, resample to 16 kHz, and copy files from the source to the working folder, i.e. under `example/data/wav/`.
  - Also use `test_cut_wavs.py` and `test_stretch_audio.py` to cut initial transients and pre-stretch WAV files if needed.
- Run `initialize.py` steps 1, 2, and 3.
- Modify the `*.yml` and `*.list` files if needed.
  - Add `fake` in the YML file if needed (to provide a fake f0).
  - Use the `f0` and `npow` histograms under `conf/figure` to set reasonable thresholds.
  - See `example/conf/speaker/nasal_tsai_mhint_20211128_cut[0.10].yml` for an example.
- Run `run_sprocket.py` steps 1, 2, 3, and 6. The results will be stored in `example/data/pair/*/aligned/`.
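
The optional trimming of initial transients mentioned in the first step can be sketched with the standard-library `wave` module. This is only an illustration, not the code in `test_cut_wavs.py`: the function name `cut_wav_start` and the 0.10 s default are assumptions (the default is a guess based on the `cut[0.10]` suffix of the example YML file), and it only handles uncompressed PCM WAV files.

```python
import wave


def cut_wav_start(src_path, dst_path, cut_seconds=0.10):
    """Copy src_path to dst_path, dropping the first cut_seconds of audio.

    Hypothetical helper for illustration; returns the number of frames kept.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Number of leading frames to skip, capped at the file length.
        n_skip = min(int(round(cut_seconds * params.framerate)), params.nframes)
        src.setpos(n_skip)
        frames = src.readframes(params.nframes - n_skip)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width, and rate; the frame count in the
        # header is corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return params.nframes - n_skip
```

For a 16 kHz file, a 0.10 s cut removes the first 1600 frames. Writing to a separate destination path keeps the original recording untouched, so the cut length can be revised later.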

Changes relative to upstream sprocket:
- `example/src/yml.py`, `example/src/extract_features.py`, `sprocket/speech/feature_extractor.py`, `sprocket/speech/analyzer.py`: Add the ability to provide a fake f0 (e.g. 100 Hz electrolarynx excitation) and to median-filter the extracted f0 with a specified kernel size.
- `sprocket/speech/extfrm.py`: Log the number of non-silent frames extracted.
- `example/src/estimate_twf_and_jnt.py`: Median-filter the power and also save the joint feature vectors in HDF5 format. Useful for outputting the aligned WAV files.
- `example/run_sprocket.py`: Add an extra step 6, which outputs the aligned WAV files. Also disable steps 4 and 5.
- `example/file_utils.py`: Add utilities to rename and resample files. Used by `test_prepare_{nasal, normal}_files.py`.
- `example/src/output_aligned.py`: Take the results of sprocket step 3 (alignment) and resynthesize aligned WAVs. Used in sprocket step 6.
- `example/test_prepare_{nasal, normal}_files.py`: Prepare nasal and normal files.
- `example/test_stretch_audio.py`: Pre-stretch faster speech files (usually normal speech) to roughly match the slower ones. This depends on the `pysox` package.
- `example/test_cut_wavs.py`: Cut the initial part of WAV files; useful for removing initial transients.
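
To make the fake-f0 and median-filtering options more concrete, here is a minimal pure-Python sketch of the two ideas. It is not the implementation in this repository: the names `fake_f0` and `median_filter` are hypothetical, and the edge handling (a truncated window at the boundaries) is an assumption that may differ from whatever filter the fork actually uses.

```python
from statistics import median


def fake_f0(n_frames, hz=100.0):
    """Constant f0 track, e.g. for a fixed-pitch electrolarynx excitation."""
    return [hz] * n_frames


def median_filter(values, kernel_size=5):
    """Sliding-window median with an odd kernel size.

    Near the edges the window is truncated instead of padded, which is
    an assumption of this sketch.
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# A single-frame f0 spike (300 Hz) is removed by a kernel of size 3.
f0 = [0, 0, 100, 100, 300, 100, 100, 0, 0]
smoothed = median_filter(f0, kernel_size=3)
```

A larger kernel removes longer outlier runs but also blurs genuine voiced/unvoiced transitions, so the kernel size is a trade-off worth checking against the `f0` histograms.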