This repository contains code for mapping the embedding space of one text encoder to another. We evaluate these mappings ("aligners") on several applications, including: mapping a unimodal text encoder to CLIP's text encoder and evaluating on common multimodal tasks; and mapping text encoders to an embedding space invertible by Vec2Text and inverting their embeddings back to text.
This project was done as part of the Advanced Topics in Deep Learning course at TAU.
Run in an isolated Python 3.8+ environment, and make sure the git submodules are updated.
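If the submodules have not been initialized yet, the standard git command is:

>> git submodule update --init --recursive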
Install the forked packages:
>> cd CLIP_benchmark
>> python setup.py install
>> cd vec2text
>> python setup.py install

Next, install the original CLIP-benchmark repository:
>> cd CLIP_benchmark
>> pip install -e .
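To verify that the clip_benchmark CLI is now available on your PATH (a quick sanity check; it should print a usage message):

>> clip_benchmark --help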
Training an aligner requires:
- Creating datasets of source and target text-encoder embeddings; available via slurm-jobs/make_dataset.slurm.
- Training the aligner; available via slurm-jobs/train_to-{clip,text}.slurm (see the submission example below).
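For example, on a Slurm cluster the two steps can be submitted with sbatch; the script names come from the list above, while any cluster-specific flags (partition, GPU count, etc.) are omitted here:

>> sbatch slurm-jobs/make_dataset.slurm   # step 1: build source/target embedding datasets
>> sbatch slurm-jobs/train_to-clip.slurm  # step 2: train an aligner into CLIP's embedding space
>> sbatch slurm-jobs/train_to-text.slurm  # or: train an aligner into the Vec2Text-invertible space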
The following describes how to evaluate an existing aligner, located in ./out/{aligner_dir}/, on different tasks.
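The commands below assume ${OUT_DIR} points at the aligner's directory, e.g.:

>> export OUT_DIR=./out/{aligner_dir}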
Zero-shot classification:

clip_benchmark eval --dataset cifar10 cifar100 imagenet1k --task zeroshot_classification --model source source+aligner target --pretrained NONE \
--model_type our_experimental_models --model_cache_dir "${OUT_DIR}" \
--output "${OUT_DIR}/benchmark_{dataset}_{model}_{task}.json" --batch_size 1024

Zero-shot retrieval:

clip_benchmark eval --dataset wds/flickr8k wds/flickr30k wds/mscoco_captions \
--task zeroshot_retrieval --model source+aligner target --pretrained NONE \
--dataset_root "https://huggingface.co/datasets/clip-benchmark/wds_{dataset_cleaned}/tree/main" \
--model_type our_experimental_models --model_cache_dir ${OUT_DIR} \
--output "${OUT_DIR}/benchmark_{dataset}_{model}_{task}.json" --batch_size 1024

Text inversion (inverting embeddings back to text with Vec2Text):

python cli.py evaluate text_inversion__nq ${OUT_DIR}
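Each benchmark run writes one JSON file per (dataset, model, task) combination, following the --output template above. For example, to pretty-print one result of the zero-shot classification run (the exact filename depends on which dataset and model you evaluated):

>> python -m json.tool "${OUT_DIR}/benchmark_cifar10_source_zeroshot_classification.json"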