PyTorch Implementation of "PICS: Pairwise Image Compositing with Spatial Interactions", ICLR 2026


PICS: Pairwise Image Compositing with Spatial Interactions

Check out our Project Page for more visual demos!

⏩ Updates

02/08/2026

  • Release training and inference code.
  • Release training data.

🚧 TODO List

  • Release training and inference code
  • Release datasets (LVIS, Objects365, etc. in WebDataset format)
  • Release pretrained models (coming soon)
  • Release any-object compositing code

📦 Installation

Prerequisites

  • System: Linux (Tested on Ubuntu 20.04/22.04).
  • Hardware:
    • GPU: NVIDIA GPU with at least 40GB VRAM (e.g., A6000, A100, H100).
    • RAM: Minimum 64GB system memory recommended.
  • Software:
    • Conda is recommended.
    • Python 3.10 or higher.

Environment setup

Create a new conda environment named PICS and install the dependencies:

```shell
conda env create --file=PICS.yml
conda activate PICS
```

Weights preparation

  1. DINOv2: Download ViT-g/14 and place it at: checkpoints/dinov2_vitg14_pretrain.pth
  2. PICS Checkpoints: (Links will be updated once uploaded to Google Drive/Hugging Face).
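Before training, it is worth verifying that the checkpoint landed in the expected spot. A minimal sanity check (the `weights_ready` helper is ours, not part of the PICS codebase):

```python
from pathlib import Path

# Expected checkpoint location from the instructions above.
DINOV2_CKPT = Path("checkpoints/dinov2_vitg14_pretrain.pth")

def weights_ready(path: Path = DINOV2_CKPT) -> bool:
    """Return True if the DINOv2 checkpoint file is present and non-empty."""
    return path.is_file() and path.stat().st_size > 0

if not weights_ready():
    print(f"Missing {DINOV2_CKPT}: download ViT-g/14 and place it there first.")
```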

🤖 Pretrained Models

Coming soon! We are currently finalizing the model weights for public release.

📚 Dataset

Our training set is a mixture of LVIS, VITON-HD, Objects365, Cityscapes, Mapillary Vistas and BDD100K. We provide the processed two-object compositing data in WebDataset format (.tar shards) below:

| Dataset | #Samples | Size | Download |
| --- | --- | --- | --- |
| LVIS | 34,160 | 7.98GB | Download |
| VITON-HD | 11,647 | 2.53GB | Download |
| Objects365 | 940,764 | 243GB | Download |
| Cityscapes | 536 | 1.21GB | Download |
| Mapillary Vistas | 603 | 582MB | Download |
| BDD100K | 1,012 | 204MB | Download |
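Each `.tar` shard follows the WebDataset convention: all files belonging to one sample share a key prefix, and everything after the first dot is the field extension. A stdlib-only sketch of that layout, assuming generic field names (`jpg`, `json`) rather than the actual PICS keys:

```python
import tarfile
from collections import defaultdict

def read_shard(shard_path: str) -> dict:
    """Group a WebDataset .tar shard's files by sample key.

    WebDataset stores each sample as several files sharing one key prefix,
    e.g. 000001.jpg / 000001.json. Field names here are generic examples,
    not the actual PICS keys.
    """
    samples = defaultdict(dict)
    with tarfile.open(shard_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Split "000001.jpg" into key "000001" and extension "jpg".
            key, _, ext = member.name.partition(".")
            samples[key][ext] = tar.extractfile(member).read()
    return dict(samples)
```

In actual training, the `webdataset` library streams these shards directly; the sketch above only illustrates how samples are grouped inside a shard.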

Data organization

```
PICS/
└── data/
    └── train/
        ├── LVIS/
        │   ├── 00000.tar
        │   └── ...
        ├── VITONHD/
        ├── Objects365/
        ├── Cityscapes/
        ├── MapillaryVistas/
        └── BDD100K/
```
Data preparation instructions

We provide a script that uses SAM (Segment Anything) to extract high-quality object silhouettes from the Objects365 dataset. To process a specific range of data shards, run:

```shell
python scripts/annotate_sam.py --is_train --index_low 00000 --index_high 10000
```
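`--index_low`/`--index_high` select a contiguous range of shards. Assuming the five-digit zero-padded names shown in the data layout, and a half-open range (whether the script's upper bound is inclusive is not documented), the range expands like this (`shard_names` is a hypothetical helper, not part of the repo):

```python
def shard_names(index_low: str, index_high: str) -> list:
    """Expand a zero-padded shard index range into .tar file names.

    Illustrative only: uses a half-open range [low, high) and keeps
    the zero-padding width of `index_low`.
    """
    width = len(index_low)
    return [f"{i:0{width}d}.tar" for i in range(int(index_low), int(index_high))]
```

For example, `shard_names("00000", "00003")` yields `["00000.tar", "00001.tar", "00002.tar"]`.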

To process raw data (e.g., LVIS), run the following command. Replace /path/to/raw_data with your actual local data path:

```shell
python -m datasets.lvis \
    --dataset_dir "/path/to/raw_data" \
    --construct_dataset_dir "data/train/LVIS" \
    --area_ratio 0.02 \
    --is_build_data \
    --is_train
```
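`--area_ratio 0.02` suggests that objects covering less than 2% of the image area are discarded during dataset construction. A hedged sketch of such a filter (function and argument names are ours, not the `datasets.lvis` API):

```python
def keep_object(obj_area: float, image_w: int, image_h: int,
                area_ratio: float = 0.02) -> bool:
    """Keep an annotation only if it covers at least `area_ratio` of the image.

    Hypothetical filter mirroring the --area_ratio flag above.
    """
    return obj_area / (image_w * image_h) >= area_ratio
```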

Training

To train a model on the whole dataset:

```shell
python run_train.py \
    --root_dir 'LOGS/whole_data' \
    --batch_size 16 \
    --logger_freq 1000 \
    --is_joint
```

Inference

```shell
python run_test.py \
    --input "sample" \
    --output "results/sample" \
    --obj_thr 2
```

⚖️ License

This project is licensed under the terms of the MIT license.

📜 Citation
