† Corresponding author
Accepted by ICASSP 2026: A novel contextualized network tackling the neglect of contextual information in Composed Image Retrieval (CIR) by amplifying similarity differences between matching and non-matching samples.
HINT (dual-patH composItional coNtextualized neTwork) is our proposed framework for Composed Image Retrieval (CIR), accepted by ICASSP 2026. Although existing methods have made significant progress, they often neglect contextual information when discriminating matching samples. To capture implicit cross-modal dependencies and supply the missing mechanism for amplifying similarity differences, HINT systematically models contextual structure, raising the performance ceiling of CIR models in complex scenarios.
- [2026-03-26] 🎉 Initial setup of the HINT repository. Source code is scheduled for release in April 2026.
- [2026-01-18] 🔥 Our paper "HINT: COMPOSED IMAGE RETRIEVAL WITH DUAL-PATH COMPOSITIONAL CONTEXTUALIZED NETWORK" has been accepted by ICASSP 2026!
- 🧠 Dual Context Extraction (DCE): Extracts both intra-modal context and cross-modal context, enhancing the joint semantic representation by integrating multimodal contextual information.
- 📊 Quantification of Contextual Relevance (QCR): Evaluates the relevance between cross-modal contextual information and the target image semantics, enabling the quantification of implicit dependencies.
- 🛡️ Dual-Path Consistency Constraints (DPCC): Optimizes the training process by constraining the representation consistency between multimodal fusion features and the target, ensuring a stable increase in similarity for matching instances while lowering the similarity for non-matching instances.
- 🚀 Outstanding Performance: Achieves competitive results on major metrics across two CIR benchmark datasets, FashionIQ and CIRR, demonstrating strong cross-domain generalization ability.
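The DPCC idea of raising similarity for matching pairs while suppressing it for non-matching pairs follows the standard contrastive pattern. Below is a minimal NumPy sketch of a symmetric, dual-direction InfoNCE-style loss under generic assumptions (unit-normalized embeddings, in-batch negatives); all names are illustrative and this is not the paper's actual formulation:

```python
import numpy as np

def info_nce(a, b, tau=0.07):
    """InfoNCE-style loss: row i of `a` should match row i of `b`;
    every other row in the batch serves as a negative."""
    sim = (a @ b.T) / tau  # (B, B) scaled cosine similarities
    # Row-wise log-softmax; the diagonal holds the matching pairs.
    log_p = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_p))

def dual_path_loss(fusion, target, tau=0.07):
    # Symmetric constraint over both retrieval directions.
    return 0.5 * (info_nce(fusion, target, tau) + info_nce(target, fusion, tau))

rng = np.random.default_rng(0)
f = rng.normal(size=(4, 8))
f /= np.linalg.norm(f, axis=1, keepdims=True)  # unit-normalize embeddings

matched = dual_path_loss(f, f)                         # fusion aligned with its target
mismatched = dual_path_loss(f, np.roll(f, 1, axis=0))  # targets shuffled within batch
```

As expected for a contrastive objective, the loss is near zero when each fusion feature aligns with its own target and grows large when targets are shuffled.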
Table of Contents

- 📖 Introduction
- 📢 News
- ✨ Key Features
- 🏗️ Architecture
- 🏃 Experiment Results
- 📦 Install
- 📁 Data Preparation
- 🚀 Quick Start
- 📂 Project Structure
- 🤝 Acknowledgement
- ✉️ Contact
- 🔗 Related Projects
- 📝 Citation
1. Clone the repository

```bash
git clone https://github.com/zh-mingyu.github.io/HINT.git
cd HINT
```

2. Setup Python Environment
The code is evaluated on Python 3.8.10 and CUDA 12.6. We recommend using Anaconda:
```bash
conda create -n habit python=3.8
conda activate habit

# Install PyTorch (the evaluated environment uses Torch 2.1.0 built against CUDA 12.1)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
```

We evaluated our framework on two standard datasets: FashionIQ and CIRR. Please download the datasets first.
Click to expand: FashionIQ Dataset Directory Structure
Please follow the official instructions to download the FashionIQ dataset. Once downloaded, ensure the folder structure looks like this:
```
FashionIQ
├── captions
│   ├── cap.dress.[train | val | test].json
│   ├── cap.toptee.[train | val | test].json
│   └── cap.shirt.[train | val | test].json
├── image_splits
│   ├── split.dress.[train | val | test].json
│   ├── split.toptee.[train | val | test].json
│   └── split.shirt.[train | val | test].json
├── dress
│   └── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
├── shirt
│   └── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
└── toptee
    └── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]
```
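Before training, the layout above can be sanity-checked with a short script. This is an illustrative helper (not part of the released code) that lists any expected FashionIQ annotation file or image folder missing under a given root:

```python
import tempfile
from pathlib import Path

CATEGORIES = ("dress", "shirt", "toptee")
SPLITS = ("train", "val", "test")

def missing_fashioniq_files(root):
    """List expected FashionIQ files/folders (per the tree above)
    that are absent under `root`."""
    root = Path(root)
    expected = [root / "captions" / f"cap.{c}.{s}.json"
                for c in CATEGORIES for s in SPLITS]
    expected += [root / "image_splits" / f"split.{c}.{s}.json"
                 for c in CATEGORIES for s in SPLITS]
    expected += [root / c for c in CATEGORIES]  # per-category image folders
    return [str(p) for p in expected if not p.exists()]

# Demo on an empty directory: all 9 + 9 + 3 expected entries are reported missing.
demo = missing_fashioniq_files(tempfile.mkdtemp())
```

Running it against your real dataset root (e.g. `missing_fashioniq_files("/path/to/FashionIQ")`) should return an empty list once the download is complete.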
Click to expand: CIRR Dataset Directory Structure
Please follow the official instructions to download the CIRR dataset. Once downloaded, ensure the folder structure looks like this:
```
CIRR
├── train
│   └── [0 | 1 | 2 | ...]
│       └── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
├── dev
│   └── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
├── test1
│   └── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
└── cirr
    ├── captions
    │   └── cap.rc2.[train | val | test1].json
    └── image_splits
        └── split.rc2.[train | val | test1].json
```
Our model is trained using the AdamW optimizer; the hyper-parameters we use are shown in the commands below.
Training on FashionIQ:

```bash
python train.py \
    --dataset fashioniq \
    --fashioniq_path "/path/to/FashionIQ/" \
    --model_dir "./checkpoints/fashioniq_hint" \
    --batch_size 128 \
    --num_epochs 10 \
    --lr 2e-5
```

Training on CIRR:
```bash
python train.py \
    --dataset cirr \
    --cirr_path "/path/to/CIRR/" \
    --model_dir "./checkpoints/cirr_hint" \
    --batch_size 128 \
    --num_epochs 10 \
    --lr 2e-5
```

💡 Tips:
> - Our model is based on the powerful BLIP-2 architecture. We highly recommend running training on GPUs with sufficient memory (e.g., NVIDIA A40 48G / V100 32G).
> - The best model weights and evaluation metrics generated during training are automatically saved as `best_model.pt` and `metrics_best.json` within your specified `--model_dir`.
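For reference, the flags used in the training commands can be mirrored with a minimal argparse setup. This is a hypothetical reconstruction of train.py's interface based only on the commands shown above; the released script may define it differently:

```python
import argparse

def build_parser():
    # Hypothetical CLI mirroring the flags in the training commands above.
    p = argparse.ArgumentParser(description="Train HINT (illustrative CLI)")
    p.add_argument("--dataset", choices=["fashioniq", "cirr"], required=True)
    p.add_argument("--fashioniq_path", type=str, help="FashionIQ root directory")
    p.add_argument("--cirr_path", type=str, help="CIRR root directory")
    p.add_argument("--model_dir", type=str, default="./checkpoints")
    p.add_argument("--batch_size", type=int, default=128)
    p.add_argument("--num_epochs", type=int, default=10)
    p.add_argument("--lr", type=float, default=2e-5)
    return p

# Parse the CIRR example command's arguments programmatically.
args = build_parser().parse_args(["--dataset", "cirr", "--cirr_path", "/path/to/CIRR/"])
```

Unspecified flags fall back to the defaults used in the example commands (batch size 128, 10 epochs, learning rate 2e-5).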
To generate the prediction files on the CIRR dataset for submission to the CIRR Evaluation Server, run the testing script:
```bash
python src/cirr_test_submission.py checkpoints/cirr_hint/
```

(The script automatically outputs the .json prediction files for online evaluation, based on the best checkpoint found in the folder.)
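If you want to confirm which checkpoint the submission step will pick up, the `best_model.pt` naming mentioned in the training tips can be located as follows. This is an illustrative snippet; the actual script's discovery logic may differ:

```python
import tempfile
from pathlib import Path

def find_best_checkpoint(model_dir):
    """Return the best_model.pt path under `model_dir`, or None if absent.
    Assumes the naming convention described in the training tips."""
    ckpt = Path(model_dir) / "best_model.pt"
    return ckpt if ckpt.is_file() else None

# Demo on a throwaway directory containing an empty checkpoint file.
d = Path(tempfile.mkdtemp())
(d / "best_model.pt").write_bytes(b"")
found = find_best_checkpoint(d)
```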
Our code is a deep customization of the LAVIS framework. The core implementations are centralized in the following files:
```
HINT/
├── lavis/
│   └── models/
│       └── blip2_models/
│           └── HINT.py         # 🧠 Core model implementation: DCE, QCR and DPCC modules
├── train.py                    # 🚀 Training entry point: controls noise_ratio injection and training loops
├── datasets.py
├── test.py
├── utils.py
├── data_utils.py
├── cirr_test_submission.py     # Auxiliary script
├── datasets/                   # Dataset loading and processing logic
└── README.md
```
The implementation of this project utilizes the pre-trained vision-language features from BLIP-2 and references the LAVIS framework. We express our sincere gratitude to these open-source contributions!
For any questions, issues, or feedback, please open an issue on GitHub or reach out to us at mingyuzhang@mail.sdu.edu.cn.
Ecosystem & Other Works from our Team
- ConeSep (CVPR'26): Web | Code
- Air-Know (CVPR'26): Web | Code
- ReTrack (AAAI'26): Web | Code | Paper
- INTENT (AAAI'26): Web | Code | Paper
- HUD (ACM MM'25): Web | Code | Paper
- OFFSET (ACM MM'25): Web | Code | Paper
- ENCODER (AAAI'25): Web | Code | Paper
- HABIT (AAAI'26): Web | Code | Paper
If you find our work or this code useful in your research, please consider leaving a Star ⭐ or citing 📝 our paper 🥰. Your support is our greatest motivation!
```bibtex
@inproceedings{HINT2026,
  title={HINT: COMPOSED IMAGE RETRIEVAL WITH DUAL-PATH COMPOSITIONAL CONTEXTUALIZED NETWORK},
  author={Zhang, Mingyu and Li, Zixu and Chen, Zhiwei and Fu, Zhiheng and Zhu, Xiaowei and Nie, Jiajia and Wei, Yinwei and Hu, Yupeng},
  booktitle={Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2026}
}
```








