This script extracts the captions from each video in a YouTube playlist. The results can be found here.
This is the YouTube playlist: https://www.youtube.com/playlist?list=PLkVbIsAWN2lsHdY7ldAAgtJug50pRNQv0
docker run -t \
--env CAPTIONS_REPO=github.com/[username]/youtube-captions.git \
--env GIT_USERNAME=[username] \
--env GIT_TOKEN=[git_token] \
--env GIT_EMAIL=[email] \
--env GOOGLE_API_KEY=[google_api_key] \
repairmanual/caption-extractor:latest
Set env vars:
CAPTIONS_REPO=https://github.com/marosoft/youtube-captions.git
REPO_PATH=/tmp/out
GOOGLE_API_KEY=[YOUR_API_KEY]
Invoke the script:
./process-new-videos.sh
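Before invoking the script, it can help to confirm that the variables listed above are actually set. The following is a minimal sketch in Python, assuming all three variables are required. The helper name `missing_env_vars` is hypothetical and is not part of this repository.

```python
import os

# Variables from the "Set env vars" step; assumed here to all be required.
REQUIRED_VARS = ("CAPTIONS_REPO", "REPO_PATH", "GOOGLE_API_KEY")


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment. An empty list means every variable is present and non-empty.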
Build image:
docker build -t caption-extractor .
Run:
docker run -t \
--env CAPTIONS_REPO=github.com/[username]/youtube-captions.git \
--env GIT_USERNAME=[username] \
--env GIT_TOKEN=[git_token] \
--env GIT_EMAIL=[email] \
--env GOOGLE_API_KEY=[google_api_key] \
caption-extractor
- Create a virtual environment (e.g. python -m venv .env)
- Activate the virtual environment (source .env/bin/activate)
pip install -r requirements.txt
The Google API key is used to retrieve the list of videos from the playlist. Obtain the key from the Google Developer Console.
The YouTube Data API allows 10000 quota units per day by default. See https://developers.google.com/youtube/v3/getting-started#quota and https://developers.google.com/youtube/v3/docs/playlistItems/list
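To illustrate how the API key and the `playlistItems.list` endpoint fit together, here is a minimal sketch that collects the video IDs of a playlist page by page. It is not necessarily how `process-new-videos.py` works. The function names `fetch_page` and `list_video_ids` are hypothetical, and the sketch uses only the Python standard library rather than any Google client package.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/playlistItems"


def fetch_page(params):
    """Perform one playlistItems.list request and return the decoded JSON."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def list_video_ids(playlist_id, api_key, fetch=fetch_page):
    """Collect every video ID in a playlist by following nextPageToken."""
    params = {
        "part": "contentDetails",
        "playlistId": playlist_id,
        "maxResults": 50,  # largest page size the endpoint accepts
        "key": api_key,
    }
    video_ids = []
    while True:
        page = fetch(params)
        for item in page.get("items", []):
            video_ids.append(item["contentDetails"]["videoId"])
        token = page.get("nextPageToken")
        if not token:
            # No further pages: the playlist has been fully listed.
            return video_ids
        params = {**params, "pageToken": token}
```

The `fetch` parameter exists so the paging logic can be exercised without network access. According to the quota documentation linked above, each list request is charged against the daily quota, so requesting the maximum page size keeps the number of calls low.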
export GOOGLE_API_KEY=[YOUR_API_KEY]
python process-new-videos.py -p PLkVbIsAWN2lsHdY7ldAAgtJug50pRNQv0 -o out