CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture
This repository is the official implementation of CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture.
[arXiv preprint] [Official Publication (IEEE Xplore)]
If you use our code or results, please cite our paper and consider giving this repo a ⭐:
@INPROCEEDINGS{kalapos2024cnnjepa,
author={Kalapos, András and Gyires-Tóth, Bálint},
booktitle={2024 International Conference on Machine Learning and Applications (ICMLA)},
title={CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture},
year={2024},
pages={1111-1114},
doi={10.1109/ICMLA61862.2024.00169}}
[1] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, "Masked Autoencoders Are Scalable Vision Learners," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16000–16009. [paper]
[2] K. Tian, Y. Jiang, Q. Diao, C. Lin, L. Wang, and Z. Yuan, "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling," in The Eleventh International Conference on Learning Representations, 2023. [paper] [code]
[3] M. Assran et al., "Self-Supervised Learning From Images With a Joint-Embedding Predictive Architecture," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 15619–15629. [paper] [code]
Configs are provided for ImageNet-100 and ImageNet-1k.
To pretrain with CNN-JEPA, run:

```bash
PYTHONPATH=. python pretrain/train_ijepacnn.py --config-name ijepacnn_imagenet.yaml
```

Baseline implementations of the following pretraining approaches are also provided:
We recommend using the provided Docker container to run the code.
- Create a keypair, copy the public key to the root of this repo, and edit the Dockerfile accordingly.
- Run `make ssh`.
- Connect on port 2222: `ssh root@<hostname> -i <private_key_path> -p 2222`.

Alternatively, to run the container without starting an ssh server, run `make run`.
To customize Docker build and run, edit the Makefile or the Dockerfile.
⚠️ `make ssh` and `make run` start the container with the `--rm` flag! Only the contents of `/workspace` persist if the container is stopped (via a simple volume mount)!
Install the requirements with `pip install -r requirements.txt`.
To achieve optimal performance on our HPC cluster, we store the datasets in HDF5 format. If `torchvision.datasets.ImageFolder` datasets are efficient on your system, you can use them instead by editing lines 182-183 in `pretrain/trainer_common.py`.
To use the datasets in HDF5 format, first download the datasets, extract them to their default ImageFolder format, then convert them to the HDF5 format we use. For the conversion, we provide a function in `data/hdf5_imagefolder.py`.
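The actual converter lives in `data/hdf5_imagefolder.py`; the sketch below only illustrates the general idea, and its names (`imagefolder_to_hdf5`, the `images`/`labels` dataset layout) are our assumptions, not the repo's API. It walks the class subfolders, assigns integer labels in sorted class order (as `ImageFolder` does), and packs the encoded image bytes plus labels into a single HDF5 file:

```python
from pathlib import Path

import h5py
import numpy as np


def imagefolder_to_hdf5(src_dir: str, dst_file: str) -> None:
    """Pack an ImageFolder-style tree (one subfolder per class) into one
    HDF5 file. Illustrative sketch, not the repo's actual converter."""
    root = Path(src_dir)
    # Classes sorted alphabetically, labels assigned by that order,
    # mirroring torchvision.datasets.ImageFolder.
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {c: i for i, c in enumerate(classes)}

    samples = [
        (path, class_to_idx[cls])
        for cls in classes
        for path in sorted((root / cls).glob("*"))
        if path.is_file()
    ]

    with h5py.File(dst_file, "w") as f:
        images = f.create_dataset(
            "images", (len(samples),), dtype=h5py.vlen_dtype(np.uint8)
        )
        labels = f.create_dataset("labels", (len(samples),), dtype="int64")
        f.attrs["classes"] = classes
        for i, (path, label) in enumerate(samples):
            # Store the still-encoded bytes; decode (e.g. with PIL) at read time.
            images[i] = np.frombuffer(path.read_bytes(), dtype=np.uint8)
            labels[i] = label
```

Keeping the images encoded keeps the file compact and turns many small-file reads into a few large sequential ones, which is what makes HDF5 attractive on networked HPC filesystems.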
Download the datasets from the following links:
Our implementation is based on:
- SparK
- The official I-JEPA implementation that pretrains Vision Transformers
