This project uses ESPnet2 to synthesize speech from text as a byte array and then reconstruct the audio from that byte data.
Follow the steps below to set up and run the project:
1. Clone the repository: `git clone https://github.com/samiabat/ESPnet-text-2-byte-audio-buffer.git`
2. Install the dependencies: `pip install -r requirements.txt`
3. Create a folder named 'model' in the project root directory. Place the 'config.yaml' and 'train.total_count.ave_10best.pth' files inside the 'model' folder.
4. Create a file named 'text.txt' in the project root directory and add the text you want to synthesize.
5. Run the script: `python3 ESPnetT2S.py`

- The `ESPnetTextToByte` class in `ESPnetT2S.py` handles the text-to-speech conversion and byte array creation.
- The `get_byte_data` method in the class writes the byte data to a file named 'audio_byte_file.raw'. ⚠️ The byte data is in `float32` format, so the buffer file must also be read as `float32` when it is loaded.
- Adjust the file paths as needed, and feel free to customize the code to suit your requirements.
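
To illustrate the `float32` note, here is a minimal sketch of reading 'audio_byte_file.raw' back into samples and saving it as a WAV file. It is not part of the project code. The helper names (`load_float32_audio`, `float32_to_int16`, `save_wav`) are hypothetical, and the sketch assumes mono audio, samples in the range [-1.0, 1.0], and a raw file written in the machine's native byte order. The sample rate depends on the model you use, so the value you pass in is your own assumption to verify against the model's 'config.yaml'.

```python
import wave

import numpy as np


def load_float32_audio(path):
    """Read a raw file as a flat array of float32 samples.

    Reading with any other dtype would reinterpret the bytes and
    produce noise, which is why the dtype must match the writer.
    """
    return np.fromfile(path, dtype=np.float32)


def float32_to_int16(samples):
    """Convert float samples (assumed in [-1.0, 1.0]) to 16-bit PCM."""
    clipped = np.clip(samples, -1.0, 1.0)
    # "<i2" = little-endian 16-bit integers, the layout WAV files expect.
    return (clipped * 32767).astype("<i2")


def save_wav(path, samples, sample_rate):
    """Write mono 16-bit PCM audio to a WAV file (mono is an assumption)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(float32_to_int16(samples).tobytes())
```

A typical call would be `samples = load_float32_audio("audio_byte_file.raw")` followed by `save_wav("output.wav", samples, sample_rate)`, where `sample_rate` is the rate your model was trained at.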