# CurMIM: Curriculum Masked Image Modeling

Implementation of the ICASSP 2025 paper on curriculum-based masked image modeling for self-supervised visual representation learning.

Hao Liu¹, Kun Wang¹, Yudong Han¹, Haocong Wang¹, Yupeng Hu¹, Chunxiao Wang², Liqiang Nie³

¹ School of Software, Shandong University, Jinan, China
² Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
³ School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
- Paper (IEEE Xplore): CurMIM: Curriculum Masked Image Modeling
## Table of Contents

- Updates
- Introduction
- Highlights
- Method Overview
- Project Structure
- Installation
- Checkpoints
- Dataset
- Usage
- Results
- Citation
- Acknowledgement
- License
- Contact
## Updates

- [04/2026] Initial public code release.
## Introduction

This repository contains the implementation of CurMIM: Curriculum Masked Image Modeling (ICASSP 2025).

Masked Image Modeling (MIM) typically applies a fixed masking strategy throughout pretraining. CurMIM instead follows a curriculum: it progressively adjusts the masking behavior so that the model moves from easier to harder reconstruction targets, which improves the quality of the learned representations.
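The curriculum idea can be sketched as a mask-ratio schedule that ramps reconstruction difficulty over training. The linear shape and the ratio endpoints below are illustrative assumptions, not the paper's exact formulation:

```python
def curriculum_mask_ratio(epoch, total_epochs, start_ratio=0.5, end_ratio=0.75):
    """Linearly ramp the mask ratio from an easy (low) to a hard (high) value.

    Hypothetical schedule for illustration only; CurMIM's actual curriculum
    is defined in the paper and in main_pretrain.py.
    """
    progress = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return start_ratio + (end_ratio - start_ratio) * progress


def num_masked_patches(num_patches, ratio):
    """Number of patch tokens to mask (e.g. 196 patches for a 224px ViT-B/16)."""
    return int(num_patches * ratio)
```

With this sketch, early epochs mask about half of the patches and the final epoch masks 75%, matching the difficulty-ramp intuition described above.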
This repository currently provides:
- pretraining code
- finetuning / linear probing code
- training utilities and distributed training scripts
## Highlights

- Curriculum-based masking for MIM pretraining
- MAE-style pretraining + ViT finetuning workflow
- Support for pretrain, finetune, and linear-probe pipelines
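For intuition on the MAE-style masking used during pretraining: given a mask ratio, a fixed fraction of patch tokens is hidden at random and only the visible patches are encoded. A minimal pure-Python sketch (illustrative only; the batched tensor version lives in `models_mae.py`):

```python
import random


def random_mask_indices(num_patches, mask_ratio, seed=None):
    """Return (masked, visible) patch indices for MAE-style random masking.

    Sketch for illustration; the repository implements masking on batched
    tensors inside the pretraining model.
    """
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[:num_masked]), sorted(order[num_masked:])
```

At the default `--mask_ratio 0.75` and 196 patches, 147 patches are masked and 49 remain visible to the encoder.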
## Project Structure

```
.
|-- asset/                  # framework figure and visual assets
|-- util/                   # data, optimization, lr schedule, misc utils
|-- convGRU.py              # ConvGRU module used in masking dynamics
|-- models_mae.py           # MAE backbone and pretraining model
|-- models_vit.py           # ViT classification model
|-- vision_transformer.py   # transformer utilities
|-- engine_pretrain.py      # pretraining loop
|-- engine_finetune.py      # finetuning/evaluation loop
|-- main_pretrain.py        # entry for MIM pretraining
|-- main_finetune.py        # entry for finetuning
|-- main_linprobe.py        # entry for linear probing
|-- submitit_pretrain.py    # distributed launcher (submitit)
|-- submitit_finetune.py    # distributed launcher (submitit)
|-- submitit_linprobe.py    # distributed launcher (submitit)
|-- README.md
```
## Installation

```shell
git clone https://github.com/iLearn-Lab/ICASSP25-CurMIM.git
cd CurMIM

python -m venv .venv
source .venv/bin/activate    # Linux / Mac
# .venv\Scripts\activate     # Windows

pip install torch torchvision timm==0.3.2 tensorboard
```

## Checkpoints

Pretrained checkpoints are available on Google Drive and Hugging Face.
## Dataset

Follow MAE's dataset preparation instructions for ImageNet.
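For reference, MAE's data loader expects the standard torchvision `ImageFolder` layout, with one subdirectory per class under `train/` and `val/`. The directory and file names below are placeholders:

```
imagenet/
|-- train/
|   |-- class_a/
|   |   `-- img_001.JPEG
|   `-- class_b/
|       `-- img_002.JPEG
`-- val/
    |-- class_a/
    |   `-- img_003.JPEG
    `-- class_b/
        `-- img_004.JPEG
```

Pass the root of this tree (here `imagenet/`) as `--data_path`.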
## Usage

### Pretraining

```shell
python -m torch.distributed.launch --nproc_per_node={GPU_number} ./main_pretrain.py \
    --batch_size 128 \
    --accum_iter 2 \
    --model {model_type} \
    --mask_ratio 0.75 --epochs 300 --warmup_epochs 40 \
    --blr 4e-4 --weight_decay 0.05 \
    --data_path ../path --output_dir ./output_dir/
```

### Finetuning

```shell
python -m torch.distributed.launch --nproc_per_node={GPU_number} ./main_finetune.py \
    --batch_size 128 \
    --nb_classes {nb_classes} \
    --model {model_type} \
    --finetune ./checkpoint.pth \
    --epochs 100 \
    --blr 1e-3 --layer_decay 0.65 --output_dir ./finetune \
    --weight_decay 0.05 --drop_path 0.1 --mixup 0.8 --cutmix 1.0 --reprob 0.25 \
    --dist_eval --data_path ../data/
```

## Results

Fine-tuning performance for models pre-trained on ImageNet-1K (top) and miniImageNet (bottom).
## Citation

```bibtex
@inproceedings{liu2025curmim,
  title={CurMIM: Curriculum Masked Image Modeling},
  author={Liu, Hao and Wang, Kun and Han, Yudong and Wang, Haocong and Hu, Yupeng and Wang, Chunxiao and Nie, Liqiang},
  booktitle={2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2025},
  doi={10.1109/ICASSP49660.2025.10890877}
}
```

## Acknowledgement

- Thanks to the MAE and ViT open-source community for strong baselines and tooling.
- Thanks to all collaborators and contributors of this project.
## License

This project is released under the Apache License 2.0.
## Contact

If you have any questions, feel free to contact me at liuh90210@gmail.com.

