This document provides an overview of the data preprocessing steps required for training.
We will cover the following key steps:
- Frame Cropping and Matting: Extracting frames from videos, cropping images, and parsing the foreground.
- Albedo Extraction: Estimating albedo information.
- Keypoints Detection and Tracking: Estimating keypoints and tracking FLAME parameters across frames.
Note: For the INSTA dataset (Download Here), the cropping and matting step below is not required, as the dataset already provides cropped and matted images. Additionally, when comparing different methods, make sure to use the same images and masks for consistency.
- Download the face-parsing pre-trained model `79999_iter.pth` from: Download Here
- Place the downloaded file into the following directory: `preprocess/submodules/face-parsing.PyTorch/res/cp/`
- Download the RobustVideoMatting pre-trained model from: Download Here
- Place the downloaded file into the following directory: `preprocess/submodules/RobustVideoMatting` (a quick check for both placements is shown below)
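Once both files are in place, a quick listing can confirm the placements (paths exactly as listed above; the RobustVideoMatting checkpoint filename depends on the variant you downloaded):

```bash
# Optional check that the pre-trained weights are in place
ls preprocess/submodules/face-parsing.PyTorch/res/cp/79999_iter.pth
ls preprocess/submodules/RobustVideoMatting/   # should contain the downloaded RVM checkpoint
```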
```bash
fps=30 # Adjust according to the actual video frame rate
resize=512 # Desired image size
python preprocess/crop_and_matting.py \
--source $data_path_dir \
--name $data_name \
--fps $fps \
--image_size $resize $resize \
--matting \
--crop_image \
--mask_clothes True
```
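The command expects `$data_path_dir` and `$data_name` to be set beforehand; a minimal sketch with placeholder values (the path and name below are hypothetical):

```bash
# Placeholder values; adjust them to your own data
data_path_dir=/path/to/raw_capture   # passed to --source (see crop_and_matting.py for the expected input layout)
data_name=subject01                  # subject name used for the processed output
```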
We use Intrinsic Anything to extract albedo.

Note: For videos with local lighting, albedo extraction is required (e.g., the HDTF dataset used in our paper). Videos captured under uniform lighting conditions do not require albedo extraction.
- Download the pretrained weights for albedo extraction from Hugging Face.
- Place the downloaded files into the directory `assets/intrinsic_anything/albedo`.
The folder structure should look like this:
```
intrinsic_anything
└── albedo
    ├── checkpoints
    │   └── last.ckpt
    └── configs
        └── albedo_project.yaml
```
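As a quick sanity check, the two files shown in the tree can be listed directly (paths as above):

```bash
# Optional check that the IntrinsicAnything albedo weights are in place
ls assets/intrinsic_anything/albedo/checkpoints/last.ckpt
ls assets/intrinsic_anything/albedo/configs/albedo_project.yaml
```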
```bash
base_dir=/path/to/subject
python preprocess/submodules/IntrinsicAnything/inference.py \
--input_dir $base_dir/image \
--model_dir assets/intrinsic_anything/albedo \
--output_dir $base_dir/albedo \
--ddim 100 --batch_size 10 --image_interval 3
```
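If albedo maps are needed for several subjects, the same call can be looped over the subject directories; a sketch assuming each subject folder already contains the `image/` directory from the cropping step (the parent path below is a placeholder):

```bash
# Sketch: batch albedo extraction over several subjects (the parent directory is a placeholder)
for base_dir in /path/to/subjects/*; do
    python preprocess/submodules/IntrinsicAnything/inference.py \
        --input_dir $base_dir/image \
        --model_dir assets/intrinsic_anything/albedo \
        --output_dir $base_dir/albedo \
        --ddim 100 --batch_size 10 --image_interval 3
done
```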
This facial tracking process is primarily based on IMAvatar, with some modifications; it is also the pre-tracking method used in our paper.

- Download `deca_model.tar` and `generic_model.pkl` from FLAME2020 (rename the latter to `generic_model2020.pkl`).
- Place them in the `preprocess/submodules/DECA/data` folder (a quick check is shown below).
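Before starting the estimation below, it is worth confirming both files are in the DECA data folder (filenames as stated above):

```bash
# Optional check for the DECA checkpoint and the renamed FLAME model
ls preprocess/submodules/DECA/data/deca_model.tar
ls preprocess/submodules/DECA/data/generic_model2020.pkl
```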
1. DECA FLAME Parameter Estimation
Navigate to the DECA directory and run the demo_reconstruct.py script for FLAME parameter estimation:
```bash
cd preprocess/submodules/DECA
base_dir=/path/to/subject
python demos/demo_reconstruct.py \
-i $base_dir/image \
--savefolder $base_dir/deca \
--saveCode True \
--saveVis False \
--sample_step 1 \
--render_orig False
```
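A quick way to verify that results were written for the whole sequence (the output location comes from `--savefolder` above):

```bash
# Count the input frames and peek at the DECA output folder
ls $base_dir/image | wc -l   # number of input frames
ls $base_dir/deca | head     # per-frame FLAME code estimates should appear here
```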
2. Face Landmark Detection

```bash
cd ../..
python keypoint_detector.py --path $base_dir
```

3. Iris Segmentation with FDLite

```bash
python iris.py --path $base_dir
```

4. Fit FLAME Parameters

```bash
# Pinhole camera intrinsics: focal lengths (fx, fy) and principal point (cx, cy), in pixels
fx=1539.67462
fy=1508.93280
cx=261.442628
cy=253.231895
resize=512
python optimize.py --path $base_dir \
--cx $cx --cy $cy --fx $fx --fy $fy \
--size $resize --n_shape 100 \
--n_expr 100 --with_translation
# Add --shape_from $another_subject_dir if sharing shape parameters
```
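The comment above mentions `--shape_from`; a sketch of how it might be used when a second sequence should reuse the shape parameters of an already-fitted subject (both paths below are placeholders):

```bash
# Sketch: fit a second sequence while reusing the shape of an already-fitted subject
python optimize.py --path /path/to/subject_B \
    --cx $cx --cy $cy --fx $fx --fy $fy \
    --size $resize --n_shape 100 \
    --n_expr 100 --with_translation \
    --shape_from /path/to/subject_A
```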
We would like to express our gratitude to the following open-source repositories and datasets that greatly contributed to this project:

- INSTA for providing preprocessed datasets.
- Intrinsic Anything for providing albedo extraction tools.
- IMAvatar for providing the basis for our facial tracking method.
- DECA for enabling robust FLAME parameter estimation.
- FLAME for supplying the FLAME model for facial parameter estimation.
- RobustVideoMatting for providing the video matting model we used for background removal.
- face-parsing.PyTorch for enabling semantic face parsing, which was essential in data preprocessing.
We also thank the authors and contributors of the tools and models we used throughout this research.