Worker that extracts audio from video (for further processing in other workers).
There are two ways to run the worker: with Docker, or locally with Poetry. To run it with Docker:
- Check if Docker is installed
- Make sure you have the .env.override file in your local repo folder
- Open your preferred terminal and navigate to the local repository folder
- To build the image, execute the following command:
docker build . -t audio-extraction-worker
- To run the worker, execute the following command:
docker compose up
To run the worker locally instead, run all commands within WSL if you are on Windows, or within your terminal if you are on Linux.
- Follow the steps here (under "Adding pyproject.toml and generating a poetry.lock based on it") to install Poetry and the dependencies required to run the worker
- Make sure you have the .env.override file in your local repo folder
- Install ffmpeg. You can run this command, for example:
apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
- Navigate inside the scripts folder, then execute the following command:
./run.sh
A run of this worker is expected to download the input video file to /data/input/ if it is not already there, run ffmpeg with the arguments specified in .env.override, and write an audio file to /data/output/. You can also configure the transfer of the output to an S3 bucket.
If you want to test the input file download, we recommend deleting the /data/input/ folder (NOT the /data folder).
The variables unique to this worker affect the output and are the following:
- AE_SAMPLERATE_HZ: The sampling rate of the resulting audio file. Default value is 0, which means the sampling rate of the input video file will be used
- AE_FILE_EXTENSION: The file extension of the output audio file. Default is wav
- AE_CONVERT_TO_MONO: Whether the audio output should be converted to mono format. Defaults to n (no/False)
They can all be modified through the .env.override file and the full list of variables can be found in .env.
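To illustrate how these three variables could shape the ffmpeg call, here is a minimal sketch. It is not the worker's actual code: the function name `build_ffmpeg_command`, the set of accepted "yes" values, and the choice to name the output after the input file are all assumptions made for illustration. Only the standard ffmpeg options `-i`, `-vn`, `-ar`, `-ac` and `-y` are used.

```python
import os
from pathlib import Path


def build_ffmpeg_command(input_path, output_dir, env=None):
    """Build an ffmpeg argument list from the AE_* settings (hypothetical helper)."""
    env = os.environ if env is None else env

    # Defaults mirror the ones documented in the list above.
    sample_rate = int(env.get("AE_SAMPLERATE_HZ", "0"))
    extension = env.get("AE_FILE_EXTENSION", "wav").lstrip(".")
    # Assumption: any of these spellings counts as "yes"; the worker may accept fewer.
    to_mono = env.get("AE_CONVERT_TO_MONO", "n").strip().lower() in {"y", "yes", "true", "1"}

    # Assumption: the output keeps the input's base name.
    output_path = Path(output_dir) / f"{Path(input_path).stem}.{extension}"

    # -y overwrites an existing output, -vn drops the video stream.
    command = ["ffmpeg", "-y", "-i", str(input_path), "-vn"]
    if sample_rate > 0:
        # 0 means "keep the input's sampling rate", so -ar is only added otherwise.
        command += ["-ar", str(sample_rate)]
    if to_mono:
        command += ["-ac", "1"]
    command.append(str(output_path))
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)`. Building a list rather than a single string avoids shell quoting problems with file names that contain spaces.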
You can find an example video input file in /data/input/ and the resulting audio output file in /data/output/.
The pipeline is as follows:
./run.sh or docker compose up -> main.py
main.py checks if the configuration is correct and, if so, runs the pipeline
main.py -> run_pipeline.py
run_pipeline.py makes sure each step of the pipeline is executed successfully:
- Downloading the input file if it's not present -> download.py
- Running the audio extraction of the input -> transcode.py
- Transferring the output to S3 if configured
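
The three steps above can be sketched as a small orchestration function. This is an illustration under stated assumptions, not the contents of run_pipeline.py: the `download`, `transcode` and `upload` callables are hypothetical stand-ins for download.py, transcode.py and the optional S3 transfer, and the error types are chosen only for the example.

```python
from pathlib import Path


def run_pipeline(input_path, download, transcode, upload=None):
    """Run the steps in order and stop at the first failure.

    download, transcode and upload are hypothetical callables standing in
    for download.py, transcode.py and the optional S3 transfer.
    """
    input_path = Path(input_path)

    # Step 1: fetch the input only when it is not already on disk.
    if not input_path.exists():
        download(input_path)
    if not input_path.exists():
        raise FileNotFoundError(f"input file missing after download: {input_path}")

    # Step 2: extract the audio; the callable returns the output file path.
    output_path = Path(transcode(input_path))
    if not output_path.exists():
        raise RuntimeError(f"audio extraction produced no output: {output_path}")

    # Step 3: transfer the result only if an uploader was configured.
    if upload is not None:
        upload(output_path)
    return output_path
```

Passing the steps in as callables keeps the ordering and the success checks in one place, and lets each step be replaced by a stub when testing.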