This repository contains two scripts for building a facial video clip database. The first script (yt-download.py) downloads YouTube videos in parallel and splits them into smaller clips for efficient processing. The second script (face-extraction.py) processes the clips to extract facial regions, compute embeddings, and save the data in a structured format, including metadata.
The CSV list we used to create our dataset in the paper Anchored Diffusion for Video Face Reenactment is available here.
- **yt-download.py**:
  - Downloads YouTube videos based on a list of video IDs provided in a CSV file.
  - Splits videos into smaller clips of configurable duration.
  - Supports parallel downloads for faster processing.
  - Saves metadata about the downloaded videos.
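The clip-splitting idea can be sketched as computing fixed-length time ranges over the video's duration. This is only an illustration (the real script delegates the cutting to FFmpeg, and `clip_ranges` is a hypothetical helper, not part of the repository):

```python
# Sketch: compute (start, end) second ranges for splitting a video into
# fixed-length clips. Illustrative only -- the actual script uses FFmpeg
# and its splitting logic may differ.
def clip_ranges(total_seconds: int, clip_minutes: int):
    """Return (start, end) boundaries covering the whole video."""
    step = clip_minutes * 60
    return [(start, min(start + step, total_seconds))
            for start in range(0, total_seconds, step)]

print(clip_ranges(150, 1))  # [(0, 60), (60, 120), (120, 150)]
```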
- **face-extraction.py**:
  - Processes video clips to detect and extract facial regions.
  - Computes CLIP embeddings, IQA scores, and other quality metrics for each clip.
  - Extracts audio tracks and optionally converts facial frames into LMDB format.
  - Saves structured metadata as CSV and pickle files.
- Clone the repository:

  ```bash
  git clone <repository_url>
  cd <repository_directory>
  ```
- Create a conda environment from the provided `environment.yaml` file:

  ```bash
  conda env create -f environment.yaml
  conda activate yt-scraper
  ```
- Ensure the following tools are installed:
  - FFmpeg: used for audio and video processing.
  - CUDA Toolkit (if using GPU acceleration).
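Since both scripts shell out to external tools, a quick PATH check before a long run can save time. A minimal standard-library sketch (`missing_tools` is a hypothetical helper, not part of the repository):

```python
import shutil

# Sketch: verify that required external tools are on PATH before
# launching a long run. The tool list is an assumption based on the
# prerequisites above.
def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools()
if missing:
    print("Install before running:", ", ".join(missing))
```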
Use the yt-download.py script to download and split YouTube videos into smaller clips.
Command:

```bash
python yt-download.py --urls <path_to_csv> --records-dir <output_directory> \
    --clip-duration <clip_duration_in_minutes> --num-videos <number_of_videos_to_download> \
    --num-processes <number_of_parallel_processes>
```

Example:

```bash
python yt-download.py --urls urls/faces/yt-@Oscars.csv --records-dir ./downloads \
    --clip-duration 1 --num-videos 10 --num-processes 4
```

Inputs:

- `--urls`: Path to the CSV file containing YouTube video IDs (must have a `video_id` column).
- `--records-dir`: Directory to save downloaded videos and clips.
- `--clip-duration`: Duration of each split clip in minutes.
- `--num-videos`: (Optional) Limit on the number of videos to download.
- `--num-processes`: Number of parallel processes for downloading and splitting.
Outputs:
- Downloaded videos saved in the specified directory.
- Clips saved in subdirectories by video ID.
- Metadata CSV file summarizing the download process.
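The metadata CSV makes it easy to identify failed downloads for a retry pass. A minimal sketch, assuming the boolean columns are serialized as the strings `True`/`False` (the sample rows below are illustrative, and `failed_video_ids` is a hypothetical helper):

```python
import csv
import io

# Illustrative rows shaped like yt-download.py's metadata CSV; the column
# names come from this README, the values are made up.
sample_csv = """\
video_id,downloaded,failed,path,num_clips
abc123,True,False,abc123/video.mp4,12
def456,False,True,,0
ghi789,True,False,ghi789/video.mp4,7
"""

def failed_video_ids(csv_file):
    """Collect IDs of failed downloads so they can be retried."""
    return [row["video_id"] for row in csv.DictReader(csv_file)
            if row["failed"] == "True"]

print(failed_video_ids(io.StringIO(sample_csv)))  # ['def456']
```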
Use the face-extraction.py script to process the video clips, extract facial regions, and compute metrics.
Command:

```bash
python face-extraction.py --input-dir <input_directory> --output-dir <output_directory> \
    --cuda-devices <list_of_cuda_devices> --num-processes <number_of_parallel_processes> [--make-lmdb]
```

Example:

```bash
python face-extraction.py --input-dir ./downloads --output-dir ./processed_faces \
    --cuda-devices cuda:0 cuda:1 --num-processes 4 --make-lmdb
```

Inputs:

- `--input-dir`: Directory containing video clips (generated by `yt-download.py`).
- `--output-dir`: Directory to save processed data, including extracted clips and metadata.
- `--cuda-devices`: List of CUDA devices to use for processing (e.g., `cuda:0 cuda:1`).
- `--num-processes`: Number of parallel processes.
- `--make-lmdb`: (Optional) Convert extracted facial frames into LMDB format.
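One simple way to combine `--cuda-devices` with multiple worker processes is round-robin assignment of clips to devices. The sketch below illustrates that idea only; the script's actual scheduling may differ, and `assign_devices` is a hypothetical helper:

```python
from itertools import cycle

# Sketch: round-robin assignment of clips to CUDA devices. Illustrative
# only -- not the repository's actual scheduling code.
def assign_devices(clips, devices):
    """Map each clip to a device in round-robin order."""
    device_cycle = cycle(devices)
    return {clip: next(device_cycle) for clip in clips}

print(assign_devices(["a.mp4", "b.mp4", "c.mp4"], ["cuda:0", "cuda:1"]))
# {'a.mp4': 'cuda:0', 'b.mp4': 'cuda:1', 'c.mp4': 'cuda:0'}
```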
Outputs:
- Extracted facial clips saved in the output directory.
- LMDB files (if `--make-lmdb` is specified).
- Metadata saved as `metadata.csv` and `metadata.pkl`.
Both scripts generate metadata files summarizing their respective processes.
- **yt-download.py** metadata:
  - `video_id`: YouTube video ID.
  - `downloaded`: Boolean indicating download success.
  - `failed`: Boolean indicating download failure.
  - `path`: Relative path to the downloaded video.
  - `num_clips`: Number of clips generated.
- **face-extraction.py** metadata:
  - `file_original`: Original video file path.
  - `file_relative`: Relative path to the extracted clip.
  - `lmdb_file`: Path to the LMDB file (if created).
  - `audio_file`: Path to the extracted audio file.
  - `clip_score`: CLIP score for consistency.
  - `hyperiqa_score`: HyperIQA score for quality assessment.
  - `clipiqa+_score`: CLIP-IQA+ score for quality assessment.
  - Additional fields for frame ranges, dimensions, duration, and resolution.
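The per-clip scores make it straightforward to filter the dataset by quality after extraction. A minimal sketch using the field names above; the records, thresholds, and `high_quality` helper are all made up for illustration:

```python
# Illustrative post-processing: keep only clips whose scores clear
# arbitrary thresholds. Field names match the metadata table above;
# everything else is a made-up example.
records = [
    {"file_relative": "clip_0001.mp4", "clip_score": 0.91, "hyperiqa_score": 0.62},
    {"file_relative": "clip_0002.mp4", "clip_score": 0.55, "hyperiqa_score": 0.40},
    {"file_relative": "clip_0003.mp4", "clip_score": 0.88, "hyperiqa_score": 0.71},
]

def high_quality(rows, clip_min=0.8, iqa_min=0.6):
    """Return relative paths of clips passing both score thresholds."""
    return [r["file_relative"] for r in rows
            if r["clip_score"] >= clip_min and r["hyperiqa_score"] >= iqa_min]

print(high_quality(records))  # ['clip_0001.mp4', 'clip_0003.mp4']
```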
If you find our code useful in your research or applications, please consider citing our paper:
```bibtex
@article{kligvasser2024anchored,
  title={Anchored diffusion for video face reenactment},
  author={Kligvasser, Idan and Cohen, Regev and Leifman, George and Rivlin, Ehud and Elad, Michael},
  journal={arXiv preprint arXiv:2407.15153},
  year={2024}
}
```

This helps us track the impact of our work and motivates us to continue contributing to the community. Thank you for your support!
