(AAAI 2026) HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval
† Corresponding author
Accepted by AAAI 2026: a robust progressive learning framework tackling the Noisy Triplet Correspondence (NTC) problem in Composed Image Retrieval (CIR).
HABIT (cHrono-synergiA roBust progressIve learning framework for composed image reTrieval) is our robust learning framework for Composed Image Retrieval, accepted by AAAI 2026. Motivated by an in-depth analysis of the Noisy Triplet Correspondence (NTC) problem in real-world retrieval scenarios, HABIT addresses two shortcomings of existing methods: imprecise estimation of composed semantic discrepancies, and the inability to progressively adapt to modification discrepancies.
- [2026-03-20] 📄 Official paper is released at AAAI 2026.
- [2026-03-18] 💻 Released all code for HABIT.
- [2025-11-08] 🔥 Our paper "HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval" has been accepted by AAAI 2026!
- 🧠 Mutual Knowledge Estimation (MKE): precisely quantifies sample cleanliness by computing the Transition Rate of mutual knowledge between composed features and target images, effectively identifying clean samples that align with the modification semantics.
- ⏳ Dual-consistency Progressive Learning (DPL): introduces a collaborative mechanism between the historical and current models that mimics human habit formation (retaining good habits and calibrating bad ones), enabling robust learning under noisy data.
- 🛡️ Highly Robust to NTC: maintains state-of-the-art (SOTA) retrieval performance under Noisy Triplet Correspondence (NTC) settings with noise ratios of 0%, 20%, 50%, and 80%.
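The historical–current model collaboration in DPL is reminiscent of a momentum (EMA) teacher. Below is a minimal sketch of such a momentum update as an illustration only; the `ema_update` function, the `momentum` value, and the flat parameter lists are hypothetical and not the paper's exact rule.

```python
def ema_update(historical, current, momentum=0.99):
    """Momentum (EMA) update: the historical model drifts slowly toward the
    current model, retaining stable 'habits' while gradually calibrating."""
    return [momentum * h + (1.0 - momentum) * c
            for h, c in zip(historical, current)]

# Toy parameters: after one update the historical copy has barely moved.
hist = [1.0, 2.0]
curr = [0.0, 0.0]
updated = ema_update(hist, curr)
print(updated)  # ≈ [0.99, 1.98]
```

With a large momentum, noisy gradients in the current model perturb the historical model only slightly, which is one common way to stabilize learning under label noise.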
Table 1. Performance comparison on FashionIQ in terms of R@K (%). The best result under each noise ratio is highlighted in bold, while the second-best result is underlined.
Table 2. Performance comparison on the CIRR test set in terms of R@K (%) and Rsub@K (%). The best and second-best results are highlighted in bold and underlined, respectively.
💡 Note for Fully-Supervised CIR Benchmarking:
🎯 The 0% noise setting in the table below is equivalent to the traditional fully-supervised CIR paradigm. We highlight this 0% block to facilitate direct and fair comparisons for researchers working on conventional supervised methods.
- Introduction
- News
- Key Features
- Architecture
- Experiment Results
- Install
- Data Preparation
- Quick Start
- Project Structure
- Acknowledgement
- Contact
- Citation
- Support & Contributing
1. Clone the repository
git clone https://github.com/Lee-zixu/HABIT
cd HABIT

2. Setup Python Environment
The code is evaluated on Python 3.8.10 and CUDA 12.6. We recommend using Anaconda to create an isolated virtual environment:
conda create -n habit python=3.8
conda activate habit
# Install PyTorch (The evaluated environment uses Torch 2.1.0 with CUDA 12.1 compatibility)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16

Note: Key dependencies include salesforce-lavis for the base architecture, open-clip-torch for vision-language features, and scikit-learn for DBSCAN clustering during Noise Discrimination.
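To illustrate how DBSCAN can separate clean from noisy samples, here is a hedged sketch: the per-sample loss values, `eps`, and `min_samples` below are toy choices for illustration, not the features or hyperparameters HABIT actually uses for Noise Discrimination.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy per-sample loss values: clean triplets tend to incur low loss,
# mismatched (noisy) triplets high loss.
losses = np.array([0.10, 0.15, 0.20, 0.18, 0.12,   # clean-looking
                   1.90, 2.00, 2.10, 1.95, 2.05])  # noisy-looking

# Cluster the 1-D loss values; points farther than eps from any dense
# region would be labeled -1 (outliers).
labels = DBSCAN(eps=0.3, min_samples=3).fit_predict(losses.reshape(-1, 1))

# Treat the cluster with the lowest mean loss as the clean set.
clusters = [c for c in set(labels) if c != -1]
clean_cluster = min(clusters, key=lambda c: losses[labels == c].mean())
clean_mask = labels == clean_cluster
print(clean_mask)  # True for the first five (low-loss) samples
```

Density-based clustering avoids fixing a hard loss threshold in advance, which is convenient when the clean/noisy loss gap shifts across training epochs.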
We evaluated our framework on two standard datasets: FashionIQ and CIRR. Please download the datasets first.
Click to expand: FashionIQ Dataset Directory Structure
Please follow the official instructions to download the FashionIQ dataset. Once downloaded, ensure the folder structure looks like this:
├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val | test].json
│   │   ├── cap.toptee.[train | val | test].json
│   │   └── cap.shirt.[train | val | test].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   └── split.shirt.[train | val | test].json
│   ├── dress
│   │   └── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   └── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   └── toptee
│       └── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]
Click to expand: CIRR Dataset Directory Structure
Please follow the official instructions to download the CIRR dataset. Once downloaded, ensure the folder structure looks like this:
├── CIRR
│   ├── train
│   │   └── [0 | 1 | 2 | ...]
│   │       └── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   └── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   └── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   └── cirr
│       ├── captions
│       │   └── cap.rc2.[train | val | test1].json
│       └── image_splits
│           └── split.rc2.[train | val | test1].json
In our implementation, we introduce the noise_ratio parameter to simulate varying degrees of Noisy Triplet Correspondence (NTC) interference. You can reproduce the experimental results from the paper by modifying the --noise_ratio parameter (the evaluated settings are 0.0, 0.2, 0.5, and 0.8).
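For intuition, here is a minimal sketch of how such NTC noise could be simulated: shuffle the target images of a randomly chosen fraction of triplets so their modification texts no longer match. This is an illustration under our own assumptions (the `inject_ntc_noise` helper and its signature are hypothetical); the actual injection logic lives in train.py and may differ.

```python
import random

def inject_ntc_noise(triplets, noise_ratio, seed=0):
    """Corrupt a fraction of (reference, modification_text, target) triplets
    by shuffling their target images among themselves."""
    rng = random.Random(seed)
    n_noisy = int(len(triplets) * noise_ratio)
    idx = rng.sample(range(len(triplets)), n_noisy)

    # Permute the targets of the selected triplets only.
    targets = [triplets[i][2] for i in idx]
    rng.shuffle(targets)

    noisy = list(triplets)
    for i, t in zip(idx, targets):
        ref, mod, _ = noisy[i]
        noisy[i] = (ref, mod, t)
    return noisy

triplets = [(f"ref{i}", f"mod{i}", f"tgt{i}") for i in range(10)]
noisy = inject_ntc_noise(triplets, noise_ratio=0.5)
```

Shuffling within the training set (rather than sampling new images) keeps the marginal distribution of targets unchanged, so only the triplet correspondence is corrupted.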
Training on FashionIQ:
python train.py \
--dataset fashioniq \
--fashioniq_path "/path/to/FashionIQ/" \
--model_dir "./checkpoints/fashioniq_noise0.2" \
--noise_ratio 0.2 \
--batch_size 256 \
--num_epochs 20 \
--lr 2e-5

Training on CIRR:
python train.py \
--dataset cirr \
--cirr_path "/path/to/CIRR/" \
--model_dir "./checkpoints/cirr_noise0.5" \
--noise_ratio 0.5 \
--batch_size 256 \
--num_epochs 20 \
--lr 2e-5

💡 Tips:
> - Our model is based on the powerful BLIP-2 architecture. We highly recommend running the training on GPUs with sufficient memory (e.g., NVIDIA A40 48G / V100 32G).
> - The best model weights and evaluation metrics generated during training are automatically saved as best_model.pt and metrics_best.json in your specified --model_dir.
To generate the prediction files on the CIRR dataset for submission to the CIRR Evaluation Server, run the following command:
python src/cirr_test_submission.py checkpoints/cirr_noise0.5/

(The script automatically locates the best checkpoint in the given folder and outputs the .json prediction files for online evaluation.)
Our code is deeply customized based on the LAVIS framework. The core implementations are centralized in the following files:
HABIT/
├── lavis/
│   └── models/
│       └── blip2_models/
│           └── HABIT.py          # 🧠 Core model implementation: MKE, DPL modules, and loss functions
├── train.py                      # 🚀 Training entry point: controls noise_ratio injection and training loops
├── datasets.py
├── test.py
├── utils.py
├── data_utils.py
├── cirr_test_submission.py       # Auxiliary script for CIRR test submission
├── datasets/                     # Dataset loading and processing logic
└── README.md
The implementation of this project references the LAVIS framework and the noise setting concepts from TME. We express our sincere gratitude to these open-source contributions!
For any questions, issues, or feedback, please open an issue on GitHub or reach out to me at lizixu.cs@gmail.com.
Ecosystem & Other Works from our Team
- TEMA (ACL'26): Web | Code
- ConeSep (CVPR'26): Web | Code
- Air-Know (CVPR'26): Web | Code
- ReTrack (AAAI'26): Web | Code | Paper
- INTENT (AAAI'26): Web | Code | Paper
- HUD (ACM MM'25): Web | Code | Paper
- OFFSET (ACM MM'25): Web | Code | Paper
- ENCODER (AAAI'25): Web | Code | Paper
If you find our work or this code useful in your research, please consider leaving a Star ⭐ or citing 📝 our paper 🥰. Your support is our greatest motivation!
@inproceedings{HABIT,
title={HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval},
author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Zhang, Shiqi and Huang, Qinlei and Fu, Zhiheng and Wei, Yinwei},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}

We welcome all forms of contributions! If you have any questions, ideas, or find a bug, please feel free to:
- Open an Issue for discussions or bug reports.
- Submit a Pull Request to improve the codebase.
This project is released under the terms of the LICENSE file included in this repository.