(ICASSP 2026) HINT: Composed Image Retrieval with Dual-Path Compositional Contextualized Network

Mingyu Zhang1, Zixu Li1, Zhiwei Chen1, Zhiheng Fu1, Xiaowei Zhu1, Jiajia Nie1, Yinwei Wei1, Yupeng Hu1✉
1School of Software, Shandong University
✉ Corresponding author


Accepted by ICASSP 2026: a novel contextualized network that tackles the neglect of contextual information in Composed Image Retrieval (CIR) by amplifying the similarity gap between matching and non-matching samples.

📌 Introduction

HINT (dual-patH composItional coNtextualized neTwork) is our proposed framework for Composed Image Retrieval (CIR), accepted by ICASSP 2026. Although existing methods have made significant progress, they often neglect contextual information when discriminating matching samples. To address the implicit dependencies and the lack of a differential amplification mechanism, HINT systematically models contextual structure to raise the performance ceiling of CIR models in complex scenarios.

⬆ Back to top

📢 News

  • [2026-03-26] 🚀 Initial setup for the HINT repository. Source code is scheduled for release in April 2026.
  • [2026-01-18] 🔥 Our paper "HINT: COMPOSED IMAGE RETRIEVAL WITH DUAL-PATH COMPOSITIONAL CONTEXTUALIZED NETWORK" has been accepted by ICASSP 2026!

⬆ Back to top

✨ Key Features

  • 🧠 Dual Context Extraction (DCE): Extracts both intra-modal context and cross-modal context, enhancing joint semantic representation by integrating multimodal contextual information.
  • 📏 Quantification of Contextual Relevance (QCR): Evaluates the relevance between cross-modal contextual information and the target image semantics, enabling the quantification of implicit dependencies.
  • 🛡️ Dual-Path Consistency Constraints (DPCC): Optimizes the training process by constraining the representation consistency between multimodal fusion features and the target, stably raising the similarity of matching instances while lowering that of non-matching instances.
  • 🏆 Outstanding Performance: Achieves competitive results on major metrics across two CIR benchmark datasets, FashionIQ and CIRR, demonstrating strong cross-domain generalization ability.

⬆ Back to top

πŸ—οΈ Architecture

HINT architecture

Figure 1. HINT framework consists of three modules: (a) Dual Context Extraction, (b) Quantification of Contextual Relevance, (c) Dual-Path Consistency Constraints.

⬆ Back to top

πŸƒβ€β™‚οΈ Experiment-Results

CIR Task Performance

Experimental Results

Table 1. Performance comparison on FashionIQ and CIRR datasets. HINT achieves a notable relative increase of approximately 9.74% in average R@10 on FashionIQ, and a 1.74% improvement in R@1 on the CIRR test set.
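Note that the "relative increase" above is a ratio over the baseline score, not an absolute gain in percentage points. With placeholder numbers (not the paper's actual scores):

```python
# How a relative increase like ~9.74% is computed: the gain is expressed
# as a fraction of the baseline score. The numbers below are placeholders
# for illustration, not results from the paper.
def relative_increase(baseline, ours):
    """Relative improvement of `ours` over `baseline`, as a percentage."""
    return (ours - baseline) / baseline * 100.0

# e.g. a baseline average R@10 of 50.0 lifted to 54.87 is a ~9.74% relative gain
print(round(relative_increase(50.0, 54.87), 2))  # -> 9.74
```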


⬆ Back to top


📦 Install

1. Clone the repository

git clone https://github.com/zh-mingyu.github.io/HINT.git
cd HINT

2. Setup Python Environment

The code is evaluated with Python 3.8.10 on a machine with CUDA 12.6 drivers (the PyTorch wheels below are built against CUDA 12.1, which those drivers support). We recommend using Anaconda:

conda create -n hint python=3.8
conda activate hint

# Install PyTorch (the evaluated environment uses Torch 2.1.0 built against CUDA 12.1)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
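After installing, a quick sanity check that the key packages are importable can save a failed first run. This stdlib-only snippet is a convenience sketch, not part of the released code; the import names correspond to the pip packages above (open-clip-torch imports as open_clip, scikit-learn as sklearn, salesforce-lavis as lavis).

```python
# Check that the packages from the install step are importable, using only
# the standard library so it runs even in a half-configured environment.
from importlib.util import find_spec

REQUIRED = ["torch", "torchvision", "open_clip", "sklearn", "transformers", "lavis", "timm"]

def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if find_spec(n) is None]

if __name__ == "__main__":
    gaps = missing_packages()
    print("All set!" if not gaps else f"Missing: {', '.join(gaps)}")
```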

⬆ Back to top


📂 Data Preparation

We evaluated our framework on two standard datasets: FashionIQ and CIRR. Please download the datasets first.

FashionIQ Dataset Directory Structure

Please follow the official instructions to download the FashionIQ dataset. Once downloaded, ensure the folder structure looks like this:

├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val | test].json
│   │   ├── cap.toptee.[train | val | test].json
│   │   ├── cap.shirt.[train | val | test].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   ├── split.shirt.[train | val | test].json
│   ├── dress
│   │   ├── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   ├── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   ├── toptee
│   │   ├── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]

CIRR Dataset Directory Structure

Please follow the official instructions to download the CIRR dataset. Once downloaded, ensure the folder structure looks like this:

├── CIRR
│   ├── train
│   │   ├── [0 | 1 | 2 | ...]
│   │   │   ├── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   ├── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   ├── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   ├── cirr
│   ├── captions
│   │   ├── cap.rc2.[train | val | test1].json
│   ├── image_splits
│   │   ├── split.rc2.[train | val | test1].json
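Before launching training, it can help to verify the layouts above so that a mistyped path fails fast instead of mid-epoch. This small validator is a convenience sketch (not part of the released code); the directory names follow the trees shown above.

```python
# Minimal check that the expected top-level dataset layout exists.
# Directory names mirror the FashionIQ / CIRR trees described above.
from pathlib import Path

EXPECTED = {
    "fashioniq": ["captions", "image_splits", "dress", "shirt", "toptee"],
    "cirr": ["train", "dev", "test1", "cirr", "captions", "image_splits"],
}

def check_layout(root, dataset):
    """Return the list of expected sub-directories missing under `root`."""
    root = Path(root)
    return [d for d in EXPECTED[dataset] if not (root / d).is_dir()]
```

An empty return value means the layout matches; otherwise the returned names point to what still needs downloading or moving.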

⬆ Back to top


🚀 Quick Start

1. Training

Our model is trained using the AdamW optimizer. The hyper-parameter $\lambda$ for the loss function is set to 0.2.
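Assuming the overall objective is a weighted sum of the main retrieval loss and the DPCC consistency term (the exact formulation is given in the paper; the names below are illustrative), $\lambda = 0.2$ enters the training loop as:

```python
# Illustrative only: combine a primary retrieval loss with a consistency
# term using the lambda = 0.2 weighting mentioned above. The actual loss
# definitions live in the paper and the released code.
LAMBDA = 0.2

def total_loss(retrieval_loss, consistency_loss, lam=LAMBDA):
    """Weighted sum of the two training objectives."""
    return retrieval_loss + lam * consistency_loss
```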

Training on FashionIQ:

python train.py \
    --dataset fashioniq \
    --fashioniq_path "/path/to/FashionIQ/" \
    --model_dir "./checkpoints/fashioniq_hint" \
    --batch_size 128 \
    --num_epochs 10 \
    --lr 2e-5

Training on CIRR:

python train.py \
    --dataset cirr \
    --cirr_path "/path/to/CIRR/" \
    --model_dir "./checkpoints/cirr_hint" \
    --batch_size 128 \
    --num_epochs 10 \
    --lr 2e-5

💡 Tips:

  • Our model is based on the powerful BLIP-2 architecture. We highly recommend running training on GPUs with sufficient memory (e.g., NVIDIA A40 48G / V100 32G).
  • The best model weights and evaluation metrics produced during training are automatically saved as best_model.pt and metrics_best.json in the directory given by --model_dir.

2. Testing

To generate the prediction files on the CIRR dataset for submission to the CIRR Evaluation Server, run the testing script:

python src/cirr_test_submission.py checkpoints/cirr_hint/

(The script automatically selects the best checkpoint in the given folder and writes the .json prediction files required for online evaluation.)
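For reference, the Recall@K metric reported on FashionIQ and CIRR counts a query as a hit when its ground-truth target ranks inside the top-K retrieved candidates. A minimal sketch (illustrative, not the released evaluation code):

```python
# Minimal Recall@K sketch: sims[i][j] is the similarity between query i and
# candidate j; gt[i] is the index of query i's true target. A query scores
# a hit if its target is among the k highest-similarity candidates.
def recall_at_k(sims, gt, k):
    hits = 0
    for i, row in enumerate(sims):
        # indices of the k highest-scoring candidates for query i
        topk = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += gt[i] in topk
    return hits / len(sims)
```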

⬆ Back to top


πŸ“ Project Structure

Our code is built on a customized version of the LAVIS framework. The core implementations are concentrated in the following files:

HINT/
├── lavis/
│   ├── models/
│   │   └── blip2_models/
│   │       └── HINT.py       # 🧠 Core model implementation: DCE, QCR and DPCC modules
├── train.py                  # 🚀 Training entry point: controls noise_ratio injection and training loops
├── datasets.py
├── test.py
├── utils.py
├── data_utils.py
├── cirr_test_submission.py   # Auxiliary script
├── datasets/                 # Dataset loading and processing logic
└── README.md

🤝 Acknowledgement

This project builds on the pre-trained vision-language representations of BLIP-2 and references the LAVIS framework. We are sincerely grateful for these open-source contributions!

⬆ Back to top

βœ‰οΈ Contact

For any questions, issues, or feedback, please open an issue on GitHub or reach out to us at mingyuzhang@mail.sdu.edu.cn.

⬆ Back to top

🔗 Related Projects

Ecosystem & Other Works from our Team

  • ConeSep (CVPR'26): Web | Code
  • Air-Know (CVPR'26): Web | Code
  • ReTrack (AAAI'26): Web | Code | Paper
  • INTENT (AAAI'26): Web | Code | Paper
  • HUD (ACM MM'25): Web | Code | Paper
  • OFFSET (ACM MM'25): Web | Code | Paper
  • ENCODER (AAAI'25): Web | Code | Paper
  • HABIT (AAAI'26): Web | Code | Paper

πŸ“β­οΈ Citation

If you find our work or this code useful in your research, please consider leaving a Star ⭐ or citing 📝 our paper 🥰. Your support is our greatest motivation!

@inproceedings{HINT2026,
  title={HINT: COMPOSED IMAGE RETRIEVAL WITH DUAL-PATH COMPOSITIONAL CONTEXTUALIZED NETWORK},
  author={Zhang, Mingyu and Li, Zixu and Chen, Zhiwei and Fu, Zhiheng and Zhu, Xiaowei and Nie, Jiajia and Wei, Yinwei and Hu, Yupeng},
  booktitle={Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2026}
}

⬆ Back to top
