PyTorch Implementation of "PICS: Pairwise Image Compositing with Spatial Interactions", ICLR 2026


PICS: Pairwise Image Compositing with Spatial Interactions

Check out our Project Page for more visual demos!

⏩ Updates

02/08/2026

  • Release training and inference code.
  • Release training data.

🚧 TODO List

  • Release training and inference code
  • Release datasets (LVIS, Objects365, etc. in WebDataset format)
  • Release pretrained models (coming soon)
  • Release any-object compositing code

📦 Installation

Prerequisites

  • System: Linux (Tested on Ubuntu 20.04/22.04).
  • Hardware:
    • GPU: NVIDIA GPU with at least 40GB VRAM (e.g., A6000, A100, H100).
    • RAM: Minimum 64GB system memory recommended.
  • Software:
    • Conda is recommended.
    • Python 3.10 or higher.

Environment setup

Create a new conda environment named PICS and install the dependencies:

```shell
conda env create --file=PICS.yml
conda activate PICS
```

Weights preparation

  1. DINOv2: Download ViT-g/14 and place it at: checkpoints/dinov2_vitg14_pretrain.pth
  2. PICS Checkpoints: (Links will be updated once uploaded to Google Drive/Hugging Face).
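Before training, it is worth verifying that the checkpoint landed in the expected spot. A minimal sanity check (the `weights_ready` helper is ours, not part of the PICS codebase):

```python
from pathlib import Path

# Expected checkpoint location from the instructions above.
DINOV2_CKPT = Path("checkpoints/dinov2_vitg14_pretrain.pth")

def weights_ready(path: Path = DINOV2_CKPT) -> bool:
    """Return True if the DINOv2 checkpoint file is present and non-empty."""
    return path.is_file() and path.stat().st_size > 0

if not weights_ready():
    print(f"Missing {DINOV2_CKPT}: download ViT-g/14 and place it there first.")
```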

🤖 Pretrained Models

Coming soon! We are currently finalizing the model weights for public release.

📚 Dataset

Our training set is a mixture of LVIS, VITON-HD, Objects365, Cityscapes, Mapillary Vistas and BDD100K. We provide the processed two-object compositing data in WebDataset format (.tar shards) below:

| Dataset | #Samples | Size | Download |
| --- | --- | --- | --- |
| LVIS | 34,160 | 7.98GB | Download |
| VITON-HD | 11,647 | 2.53GB | Download |
| Objects365 | 940,764 | 243GB | Download |
| Cityscapes | 536 | 1.21GB | Download |
| Mapillary Vistas | 603 | 582MB | Download |
| BDD100K | 1,012 | 204MB | Download |
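Each `.tar` shard follows the WebDataset convention: all files belonging to one sample share a key prefix, and everything after the first dot is the field extension. A stdlib-only sketch of that layout, assuming generic field names (`jpg`, `json`) rather than the actual PICS keys:

```python
import tarfile
from collections import defaultdict

def read_shard(shard_path: str) -> dict:
    """Group a WebDataset .tar shard's files by sample key.

    WebDataset stores each sample as several files sharing one key prefix,
    e.g. 000001.jpg / 000001.json. Field names here are generic examples,
    not the actual PICS keys.
    """
    samples = defaultdict(dict)
    with tarfile.open(shard_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Split "000001.jpg" into key "000001" and extension "jpg".
            key, _, ext = member.name.partition(".")
            samples[key][ext] = tar.extractfile(member).read()
    return dict(samples)
```

In actual training, the `webdataset` library streams these shards directly; the sketch above only illustrates how samples are grouped inside a shard.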

Data organization

```
PICS/
└── data/
    └── train/
        ├── LVIS/
        │   ├── 00000.tar
        │   └── ...
        ├── VITONHD/
        ├── Objects365/
        ├── Cityscapes/
        ├── MapillaryVistas/
        └── BDD100K/
```
Data preparation instructions

We provide a script that uses SAM (Segment Anything) to extract high-quality object silhouettes from the Objects365 dataset. To process a specific range of data shards, run:

```shell
python scripts/annotate_sam.py --is_train --index_low 00000 --index_high 10000
```
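`--index_low`/`--index_high` select a contiguous range of shards. Assuming the five-digit zero-padded names shown in the data layout, and a half-open range (whether the script's upper bound is inclusive is not documented), the range expands like this (`shard_names` is a hypothetical helper, not part of the repo):

```python
def shard_names(index_low: str, index_high: str) -> list:
    """Expand a zero-padded shard index range into .tar file names.

    Illustrative only: uses a half-open range [low, high) and keeps
    the zero-padding width of `index_low`.
    """
    width = len(index_low)
    return [f"{i:0{width}d}.tar" for i in range(int(index_low), int(index_high))]
```

For example, `shard_names("00000", "00003")` yields `["00000.tar", "00001.tar", "00002.tar"]`.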

To process raw data (e.g., LVIS), run the following command. Replace /path/to/raw_data with your actual local data path:

```shell
python -m datasets.lvis \
    --dataset_dir "/path/to/raw_data" \
    --construct_dataset_dir "data/train/LVIS" \
    --area_ratio 0.02 \
    --is_build_data \
    --is_train
```
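`--area_ratio 0.02` suggests that objects covering less than 2% of the image area are discarded during dataset construction. A hedged sketch of such a filter (function and argument names are ours, not the `datasets.lvis` API):

```python
def keep_object(obj_area: float, image_w: int, image_h: int,
                area_ratio: float = 0.02) -> bool:
    """Keep an annotation only if it covers at least `area_ratio` of the image.

    Hypothetical filter mirroring the --area_ratio flag above.
    """
    return obj_area / (image_w * image_h) >= area_ratio
```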

Training

To train a model on the whole dataset:

```shell
python run_train.py \
    --root_dir 'LOGS/whole_data' \
    --batch_size 16 \
    --logger_freq 1000 \
    --is_joint
```

Inference

```shell
python run_test.py \
    --input "sample" \
    --output "results/sample" \
    --obj_thr 2
```

⚖️ License

This project is licensed under the terms of the MIT license.

📜 Citation
