
HAMMER: Harnessing MLLMs via Cross-Modal Integration for Intention-Driven 3D Affordance Grounding

CVPR 2026

Lei Yao, Yong Chen, Yuejiao Su, Yi Wang, Moyun Liu, Lap-Pui Chau

Feel free to contact us if you have any questions regarding the code. You can reach us via email or open an issue.

📝 To-Do List

  • Environment installation instructions.
  • Instructions for processing the dataset.
  • Release trained weights.
  • Release training code.
  • Release evaluation code.

🌟 Pipeline

[Pipeline overview figure]

🪜 Installation

# For CUDA 11.8
conda create -n hammer python==3.10
conda activate hammer

# requirements.txt is generated by wandb; 
# if issues occur, manually install missing packages or contact the author.
pip install -r requirements.txt

# Alternative method to install pointnet2_ops and KNN_CUDA
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl


#--------------------------------------#
# For CUDA 12.8
conda create -n hammer python==3.12
conda activate hammer

pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install wandb tensorboard peft deepspeed scikit-learn
pip install transformers==4.54.1

🔍 Data Preprocessing

We provide the preprocessed data for PIADv1, PIADv2 and PIADv1-C datasets. Please download the preprocessed data from hammer and place them in the data folder.

The dataset structure should be as follows:

├── PIADv1/
│   ├── Seen/
│   │   ├── Img/
│   │   ├── Point/
│   │   ├── Img_train.txt
│   │   ├── Img_test.txt
│   │   ├── Point_Extracted_train.txt
│   │   ├── Point_test.txt
│   ├── Unseen/
├── PIADv2/
│   ├── Seen/
│   │   ├── Img/
│   │   ├── Point/
│   │   ├── Img_train.txt
│   │   ├── Img_test.txt
│   │   ├── Img_val.txt
│   │   ├── Point_train.txt
│   │   ├── Point_test.txt
│   │   ├── Point_val.txt
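If you prefer to set up the folders before dropping in the downloaded files, the expected skeleton from the tree above can be created along these lines (a sketch only; the `DATA_ROOT` variable is an assumption, and the archives you download may already contain these directories):

```shell
# Create the PIAD directory skeleton described above.
# DATA_ROOT defaults to ./data, matching the README's "data folder".
DATA_ROOT="${DATA_ROOT:-data}"

# PIADv1 has Seen and Unseen splits, each with Img/ and Point/ subfolders.
for split in Seen Unseen; do
  mkdir -p "$DATA_ROOT/PIADv1/$split/Img" "$DATA_ROOT/PIADv1/$split/Point"
done

# PIADv2 Seen split, also with Img/ and Point/ subfolders.
mkdir -p "$DATA_ROOT/PIADv2/Seen/Img" "$DATA_ROOT/PIADv2/Seen/Point"
```

The split `.txt` files (e.g. `Img_train.txt`, `Point_test.txt`) come with the preprocessed data and go directly inside each split directory.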

We also provide the preprocessing code for the PIADv1 train split in tools/preprocess_piadv1.py. Note that our dataset logic for PIADv2 differs from GREAT; see src/utils/afford_dataset.py for the details.

🚀 Training

# -d for dataset, -p for split, -g for GPU count, -b for batch size,
# -l for learning rate, -e for epochs, -n for experiment name
# for more details, see scripts/train.sh
bash scripts/train.sh -d PIADv1 -p Seen -g 4 -b 64 -l 0.0001 -e 30 -n exp1
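For readers unfamiliar with shell option handling, the flags above could be parsed with `getopts` roughly as follows (a hypothetical sketch, not the actual contents of scripts/train.sh; all variable names and defaults are assumptions):

```shell
# Hypothetical flag parsing mirroring the train.sh interface above.
parse_train_flags() {
  # Assumed defaults; the real script may differ.
  DATASET=PIADv1; SPLIT=Seen; GPUS=1; BATCH=32; LR=0.0001; EPOCHS=30; NAME=exp
  local OPTIND=1
  while getopts "d:p:g:b:l:e:n:" opt; do
    case "$opt" in
      d) DATASET="$OPTARG" ;;   # dataset
      p) SPLIT="$OPTARG"   ;;   # split
      g) GPUS="$OPTARG"    ;;   # GPU count
      b) BATCH="$OPTARG"   ;;   # batch size
      l) LR="$OPTARG"      ;;   # learning rate
      e) EPOCHS="$OPTARG"  ;;   # epochs
      n) NAME="$OPTARG"    ;;   # experiment name
    esac
  done
}

parse_train_flags -d PIADv2 -p Unseen -g 4 -b 64 -l 0.0001 -e 30 -n exp1
echo "training $DATASET/$SPLIT on $GPUS GPUs (batch=$BATCH, lr=$LR, epochs=$EPOCHS, name=$NAME)"
```

See scripts/train.sh itself for the authoritative flag list and defaults.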

🛸 Inference

bash scripts/eval.sh

📚 License

This repository is released under the MIT license.

👏 Acknowledgement

The research work described in this paper was conducted in the JC STEM Lab of Machine Learning and Computer Vision, funded by The Hong Kong Jockey Club Charities Trust. This research received partial support from the Global STEM Professorship Scheme of the Hong Kong Special Administrative Region.

Our code is primarily built upon GLOVER, VLMSAM and GREAT.

📝 Citation

@inproceedings{yao2026hammer,
  title={HAMMER: Harnessing MLLMs via Cross-Modal Integration for Intention-Driven 3D Affordance Grounding},
  author={Yao, Lei and Chen, Yong and Su, Yuejiao and Wang, Yi and Liu, Moyun and Chau, Lap-Pui},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
