(AAAI 2026) HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval
† Corresponding author
Accepted by AAAI 2026: a robust progressive learning framework tackling the Noisy Triplet Correspondence (NTC) problem in Composed Image Retrieval (CIR).
HABIT (cHrono-synergiA roBust progressIve learning framework for composed image reTrieval) is our robust learning framework for Composed Image Retrieval, accepted by AAAI 2026. Motivated by an in-depth analysis of the Noisy Triplet Correspondence (NTC) problem in real-world retrieval scenarios, HABIT addresses two shortcomings of existing methods: imprecise estimation of composed semantic discrepancies, and the inability to progressively adapt to modification discrepancies.
- [2026-03-20] 📄 Official paper is released at AAAI 2026.
- [2026-03-18] 💻 Released all code for HABIT.
- [2025-11-08] 🔥 Our paper "HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval" has been accepted by AAAI 2026!
- 🧠 Mutual Knowledge Estimation (MKE): precisely quantifies sample cleanliness by computing the Transition Rate of mutual knowledge between composed features and target images, effectively identifying clean samples that align with the modification semantics.
- ⏳ Dual-consistency Progressive Learning (DPL): introduces a collaborative mechanism between the historical and current models that mimics human habit formation (retaining good habits and calibrating bad ones), enabling robust learning under noisy data.
- 🛡️ Highly Robust to NTC: maintains state-of-the-art (SOTA) retrieval performance under Noisy Triplet Correspondence (NTC) settings with noise ratios of 0%, 20%, 50%, and 80%.
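The historical–current model collaboration in DPL is reminiscent of a momentum (EMA) teacher. Below is a minimal sketch of such a momentum update as an illustration only; the `ema_update` function, the `momentum` value, and the flat parameter lists are hypothetical and not the paper's exact rule.

```python
def ema_update(historical, current, momentum=0.99):
    """Momentum (EMA) update: the historical model drifts slowly toward the
    current model, retaining stable 'habits' while gradually calibrating."""
    return [momentum * h + (1.0 - momentum) * c
            for h, c in zip(historical, current)]

# Toy parameters: after one update the historical copy has barely moved.
hist = [1.0, 2.0]
curr = [0.0, 0.0]
updated = ema_update(hist, curr)
print(updated)  # ≈ [0.99, 1.98]
```

With a large momentum, noisy gradients in the current model perturb the historical model only slightly, which is one common way to stabilize learning under label noise.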
Table 1. Performance comparison on FashionIQ in terms of R@K (%). The best result under each noise ratio is highlighted in bold, while the second-best result is underlined.
Table 2. Performance comparison on the CIRR test set in terms of R@K (%) and Rsub@K (%). The best and second-best results are highlighted in bold and underlined, respectively.
💡 Note for Fully-Supervised CIR Benchmarking:
🎯 The 0% noise setting in the table below is equivalent to the traditional fully-supervised CIR paradigm. We highlight this 0% block to facilitate direct and fair comparisons for researchers working on conventional supervised methods.
- Introduction
- News
- Key Features
- Architecture
- Experiment Results
- Install
- Data Preparation
- Quick Start
- Project Structure
- Acknowledgement
- Contact
- Citation
- Support & Contributing
1. Clone the repository
git clone https://github.com/Lee-zixu/HABIT
cd HABIT

2. Setup Python Environment
The code is evaluated on Python 3.8.10 and CUDA 12.6. We recommend using Anaconda to create an isolated virtual environment:
conda create -n habit python=3.8
conda activate habit
# Install PyTorch (The evaluated environment uses Torch 2.1.0 with CUDA 12.1 compatibility)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16

Note: Key dependencies include salesforce-lavis for the base architecture, open-clip-torch for vision-language features, and scikit-learn for DBSCAN clustering during Noise Discrimination.
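To illustrate how DBSCAN can separate clean from noisy samples, here is a hedged sketch: the per-sample loss values, `eps`, and `min_samples` below are toy choices for illustration, not the features or hyperparameters HABIT actually uses for Noise Discrimination.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy per-sample loss values: clean triplets tend to incur low loss,
# mismatched (noisy) triplets high loss.
losses = np.array([0.10, 0.15, 0.20, 0.18, 0.12,   # clean-looking
                   1.90, 2.00, 2.10, 1.95, 2.05])  # noisy-looking

# Cluster the 1-D loss values; points farther than eps from any dense
# region would be labeled -1 (outliers).
labels = DBSCAN(eps=0.3, min_samples=3).fit_predict(losses.reshape(-1, 1))

# Treat the cluster with the lowest mean loss as the clean set.
clusters = [c for c in set(labels) if c != -1]
clean_cluster = min(clusters, key=lambda c: losses[labels == c].mean())
clean_mask = labels == clean_cluster
print(clean_mask)  # True for the first five (low-loss) samples
```

Density-based clustering avoids fixing a hard loss threshold in advance, which is convenient when the clean/noisy loss gap shifts across training epochs.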
We evaluated our framework on two standard datasets: FashionIQ and CIRR. Please download the datasets first.
Click to expand: FashionIQ Dataset Directory Structure
Please follow the official instructions to download the FashionIQ dataset. Once downloaded, ensure the folder structure looks like this:
├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val | test].json
│   │   ├── cap.toptee.[train | val | test].json
│   │   └── cap.shirt.[train | val | test].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   └── split.shirt.[train | val | test].json
│   ├── dress
│   │   └── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   └── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   └── toptee
│       └── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]
Click to expand: CIRR Dataset Directory Structure
Please follow the official instructions to download the CIRR dataset. Once downloaded, ensure the folder structure looks like this:
├── CIRR
│   ├── train
│   │   └── [0 | 1 | 2 | ...]
│   │       └── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   └── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   └── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   └── cirr
│       ├── captions
│       │   └── cap.rc2.[train | val | test1].json
│       └── image_splits
│           └── split.rc2.[train | val | test1].json
In our implementation, we introduce the noise_ratio parameter to simulate varying degrees of Noisy Triplet Correspondence (NTC) interference. You can reproduce the experimental results from the paper by modifying the --noise_ratio parameter (the evaluated settings are 0.0, 0.2, 0.5, and 0.8).
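For intuition, here is a minimal sketch of how such NTC noise could be simulated: shuffle the target images of a randomly chosen fraction of triplets so their modification texts no longer match. This is an illustration under our own assumptions (the `inject_ntc_noise` helper and its signature are hypothetical); the actual injection logic lives in train.py and may differ.

```python
import random

def inject_ntc_noise(triplets, noise_ratio, seed=0):
    """Corrupt a fraction of (reference, modification_text, target) triplets
    by shuffling their target images among themselves."""
    rng = random.Random(seed)
    n_noisy = int(len(triplets) * noise_ratio)
    idx = rng.sample(range(len(triplets)), n_noisy)

    # Permute the targets of the selected triplets only.
    targets = [triplets[i][2] for i in idx]
    rng.shuffle(targets)

    noisy = list(triplets)
    for i, t in zip(idx, targets):
        ref, mod, _ = noisy[i]
        noisy[i] = (ref, mod, t)
    return noisy

triplets = [(f"ref{i}", f"mod{i}", f"tgt{i}") for i in range(10)]
noisy = inject_ntc_noise(triplets, noise_ratio=0.5)
```

Shuffling within the training set (rather than sampling new images) keeps the marginal distribution of targets unchanged, so only the triplet correspondence is corrupted.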
Training on FashionIQ:
python train.py \
--dataset fashioniq \
--fashioniq_path "/path/to/FashionIQ/" \
--model_dir "./checkpoints/fashioniq_noise0.2" \
--noise_ratio 0.2 \
--batch_size 256 \
--num_epochs 20 \
--lr 2e-5

Training on CIRR:
python train.py \
--dataset cirr \
--cirr_path "/path/to/CIRR/" \
--model_dir "./checkpoints/cirr_noise0.5" \
--noise_ratio 0.5 \
--batch_size 256 \
--num_epochs 20 \
--lr 2e-5

💡 Tips:
> - Our model is based on the powerful BLIP-2 architecture. We highly recommend running the training on GPUs with sufficient memory (e.g., NVIDIA A40 48G / V100 32G).
> - The best model weights and evaluation metrics generated during training are automatically saved as best_model.pt and metrics_best.json in your specified --model_dir.
To generate the prediction files on the CIRR dataset for submission to the CIRR Evaluation Server, run the following command:
python src/cirr_test_submission.py checkpoints/cirr_noise0.5/

(The script automatically locates the best checkpoint in the given folder and outputs the .json prediction files for online evaluation.)
Our code is deeply customized based on the LAVIS framework. The core implementations are centralized in the following files:
HABIT/
├── lavis/
│   └── models/
│       └── blip2_models/
│           └── HABIT.py          # 🧠 Core model implementation: MKE, DPL modules, and loss functions
├── train.py                      # 🚀 Training entry point: controls noise_ratio injection and training loops
├── datasets.py
├── test.py
├── utils.py
├── data_utils.py
├── cirr_test_submission.py       # Auxiliary script for CIRR test submission
├── datasets/                     # Dataset loading and processing logic
└── README.md
The implementation of this project references the LAVIS framework and the noise setting concepts from TME. We express our sincere gratitude to these open-source contributions!
For any questions, issues, or feedback, please open an issue on GitHub or reach out to me at lizixu.cs@gmail.com.
Ecosystem & Other Works from our Team
- TEMA (ACL'26): Web | Code
- ConeSep (CVPR'26): Web | Code
- Air-Know (CVPR'26): Web | Code
- ReTrack (AAAI'26): Web | Code | Paper
- INTENT (AAAI'26): Web | Code | Paper
- HUD (ACM MM'25): Web | Code | Paper
- OFFSET (ACM MM'25): Web | Code | Paper
- ENCODER (AAAI'25): Web | Code | Paper
If you find our work or this code useful in your research, please consider leaving a Star ⭐ or citing 📝 our paper 🥰. Your support is our greatest motivation!
@inproceedings{HABIT,
title={HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval},
author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Zhang, Shiqi and Huang, Qinlei and Fu, Zhiheng and Wei, Yinwei},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}

We welcome all forms of contributions! If you have any questions, ideas, or find a bug, please feel free to:
- Open an Issue for discussions or bug reports.
- Submit a Pull Request to improve the codebase.
This project is released under the terms of the LICENSE file included in this repository.