Create a new conda environment from scratch:

```bash
module load miniconda3/24.1.2-py310  # for OSC
module load conda                    # for Anvil
conda create -n DRIP python=3.11 -y
conda activate DRIP
python -m pip install -r requirements.txt
```

Activate an existing one:
```bash
module load miniconda3/24.1.2-py310  # for OSC
module load conda                    # for Anvil
conda deactivate
conda activate DRIP
```

Running experiments:

```bash
sbatch scripts/task1/finetune_imagenet.sh
```

Boundary visualization & attention map analysis:

```bash
# for ImageNet
python src/boundary_visual_IN.py
```

GFLOPs measurement:
```bash
# DRIP
python src/FLOP.py --mode DRIP --compression_rate 0.25
# fixed pooling
python src/FLOP.py --mode fixed_pooling --compression_rate 0.25
# original ViT
python src/FLOP.py --mode ViT
```

Examples:
| run | checkpoint | boundaries | attention maps |
|---|---|---|---|
| imagenet_DRIP_4x_01_warmup2 | model_299.pth | ![]() | ![]() |
| imagenet_DRIP_4x_half_LR_no_warmup | model_186.pth | ![]() | ![]() |
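The GFLOPs numbers from `src/FLOP.py` above can be sanity-checked analytically. Below is a minimal sketch, not the repo's implementation, assuming a standard ViT encoder layer (attention costs roughly 4nd² + 2n²d FLOPs for n tokens of width d; the MLP with a 4× hidden expansion costs roughly 8nd²) and assuming that compression simply scales the token count by the compression rate. The CLIP-ViT-L-ish shape constants (width 1024, 24 layers, 576 patches at 336px/patch-14) are illustrative defaults:

```python
def vit_layer_flops(n: int, d: int) -> int:
    """Approximate FLOPs of one ViT encoder layer with n tokens of width d."""
    attention = 4 * n * d * d + 2 * n * n * d  # QKV/output projections + score/value matmuls
    mlp = 8 * n * d * d                        # two linear layers with a 4x hidden expansion
    return attention + mlp

def encoder_flops(n_tokens: int, d: int = 1024, layers: int = 24) -> float:
    """Total GFLOPs for a stack of identical encoder layers."""
    return layers * vit_layer_flops(n_tokens, d) / 1e9

full = encoder_flops(576)                    # 336px / patch14 -> 24x24 = 576 tokens
compressed = encoder_flops(int(576 * 0.25))  # 4x compression keeps ~144 tokens
print(f"full: {full:.1f} GFLOPs, 4x compressed: {compressed:.1f} GFLOPs")
```

Because only the attention-score term is quadratic in n while the (dominant) projection and MLP terms are linear, a 4× token reduction yields somewhere between a 4× and 16× FLOP reduction, landing close to 4× at these widths.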
Go to `src/LLaVA_wrapper/llava_local/model/multimodal_encoder/builder.py` to configure the merging strategy (`ViT`/original, `Fixed`/fixed pooling, `DRIP`/dynamic tokenization) and the corresponding compression rate (0.5/2x, 0.25/4x, 0.1/10x):

```python
MERGE_STRATEGY = "Fixed"  # "ViT", "DRIP", "Fixed", and more!
COMPRESSION_RATE = 0.25
```

Additional note: the ViT backbone from the LLaVA checkpoint is `openai/clip-vit-large-patch14-336`.
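To make the three strategies concrete, here is a minimal, hypothetical sketch (not the repo's implementation) of what each mode does to a sequence of patch tokens, using plain Python lists: `ViT` passes all tokens through, `Fixed` average-pools non-overlapping windows whose size is set by the compression rate, and `DRIP`-style dynamic merging averages each segment between predicted boundaries. In the real model the boundaries are learned; here they are stubbed as given indices:

```python
from typing import List

def vit_keep(tokens: List[float]) -> List[float]:
    """'ViT' mode: original tokens pass through untouched."""
    return list(tokens)

def fixed_pooling(tokens: List[float], compression_rate: float) -> List[float]:
    """'Fixed' mode: average non-overlapping windows of size 1/rate."""
    window = round(1 / compression_rate)  # e.g. rate 0.25 -> windows of 4
    return [
        sum(tokens[i:i + window]) / len(tokens[i:i + window])
        for i in range(0, len(tokens), window)
    ]

def dynamic_merge(tokens: List[float], boundaries: List[int]) -> List[float]:
    """'DRIP'-style mode: average each segment between boundary indices.
    Boundary prediction is learned in the real model; stubbed here."""
    out, start = [], 0
    for end in boundaries + [len(tokens)]:
        segment = tokens[start:end]
        if segment:
            out.append(sum(segment) / len(segment))
        start = end
    return out

tokens = [1.0, 1.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0]
print(vit_keep(tokens))              # 8 tokens, unchanged
print(fixed_pooling(tokens, 0.25))   # 2 tokens: [3.0, 8.0]
print(dynamic_merge(tokens, [2, 5])) # 3 tokens: [1.0, 5.0, 9.0]
```

Note the difference on this toy input: fixed pooling blurs across the value jump at index 2, while boundary-aware merging preserves each homogeneous segment exactly.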
Then we are good to move onto benchmark experiments.
General VQA (4):

```bash
# SQA
sbatch scripts/task3/eval/eval_SQA.sh
# MM-Bench
sbatch scripts/task3/eval/eval_MMBench.sh
# MME
sbatch scripts/task3/eval/eval_MME.sh
# VQAv2 [🚨LONG🚨]
# need to submit the result JSON file to:
# https://eval.ai/web/challenges/challenge-page/830
sbatch scripts/task3/eval/eval_VQAv2.sh
```

Reasoning (1):

```bash
# GQA
sbatch scripts/task3/eval/eval_GQA.sh
```

OCR (1):

```bash
# TextVQA
sbatch scripts/task3/eval/eval_textVQA.sh
```

Hallucination (1):

```bash
# POPE
sbatch scripts/task3/eval/eval_POPE.sh
```

Free Response (1):

```bash
# LLaVA-in-the-wild
sbatch scripts/task3/eval/eval_in_the_wild.sh
```

Before anything, make sure flash attention is installed:
```bash
# install
sbatch flash_attn.sh
# test
sbatch test_flash_attn.sh
# what to expect:
# torch.Size([1, 128, 8, 64]) torch.float16 cuda:0
```

Pretraining and finetuning:

```bash
# ascend
sbatch scripts/task3/pretrain_ascend.sh
# anvil
sbatch scripts/task3/finetune.sh
```

If you have any questions or suggestions, feel free to contact:
- Yusen Peng (peng.1007@osu.edu)
- Sachin Kumar (kumar.1145@osu.edu)
Or open an issue on GitHub.





