ACE-LoRA
Graph-Attentive Context Enhancement for Parameter-Efficient Adaptation of Medical Vision-Language Models
Abstract: The success of CLIP-like vision-language models (VLMs) on natural images has inspired medical counterparts, yet existing approaches largely fall into two extremes: specialist models trained on single-domain data, which capture domain-specific details but generalize poorly, and generalist medical VLMs trained on multi-domain data, which retain broad semantics but dilute fine-grained diagnostic cues. Bridging this specialization–generalization trade-off remains challenging. To address this problem, we propose ACE-LoRA, a parameter-efficient adaptation framework for generalist medical VLMs that maintains robust zero-shot generalization. ACE-LoRA integrates Low-Rank Adaptation (LoRA) modules into frozen image-text encoders and introduces an Attention-based Context Enhancement Hypergraph Neural Network (ACE-HGNN) module that captures higher-order contextual interactions beyond pairwise similarity to enrich global representations with localized diagnostic cues, addressing a key limitation of prior Parameter-Efficient Fine-Tuning (PEFT) methods that overlook fine-grained details. To further enhance cross-modal alignment, we formulate a label-guided InfoNCE loss to effectively suppress false negatives between semantically related image-text pairs. Despite adding only 0.95M trainable parameters, ACE-LoRA consistently outperforms state-of-the-art medical VLMs and PEFT baselines across zero-shot classification, segmentation, and detection benchmarks spanning multiple domains.
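To illustrate the false-negative suppression described above, here is a minimal NumPy sketch of a label-guided InfoNCE loss: image-text pairs that share a class label are treated as positives rather than negatives. This is a simplified, hypothetical rendering for intuition only; the actual loss in the paper is implemented in PyTorch and may differ in details such as temperature handling and symmetrization.

```python
import numpy as np

def label_guided_infonce(sim, labels, tau=0.07):
    """Label-guided contrastive loss over an (N x N) image-text similarity
    matrix `sim`. Pairs with matching class labels are counted as positives,
    so semantically related pairs are not pushed apart as false negatives.
    Illustrative simplification, not the repository's exact loss."""
    logits = sim / tau
    # positive mask: entries (i, j) with the same label are positives
    pos = (labels[:, None] == labels[None, :]).astype(float)
    # log-softmax over each row (each image anchor against all texts)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # average negative log-likelihood over all positives per anchor
    loss = -(pos * log_prob).sum(axis=1) / pos.sum(axis=1)
    return loss.mean()
```

With single-label batches this reduces to standard InfoNCE; the gain appears when several captions in a batch describe the same finding.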
2026/03/18: Our paper and code are publicly available.
Our framework uses Python 3.10.18 and PyTorch 2.1.0 with CUDA 11.8.
Create a Conda environment:
conda create -n env_name python=3.10.18
Install PyTorch:
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
Install remaining dependencies:
pip install -r requirements.txt
MIMIC-CXR: For pretraining, we use the MIMIC-CXR dataset and exclude lateral images. Access to the dataset is available at the following link (note that you must satisfy the dataset provider’s requirements to download the data): [link]
NIH Chest X-ray: For validation, we use the NIH Chest X-ray dataset. The dataset can be accessed at the following link: [link]. After downloading, run dataset_prep/chestx-ray_14_prep.py to split the data and prepare it in the required format.
CheXpert 5×200: For zero-shot classification, we use the CheXpert 5×200 dataset. The dataset can be accessed at the following link: [link].
RSNA: We use the RSNA dataset for both zero-shot classification and object detection. The dataset can be accessed at the following link: [link]. After downloading, run dataset_prep/rsna_dataset_create.py to split the data and prepare it in the required format for both tasks.
SIIM: We use the SIIM dataset for both zero-shot classification and semantic segmentation. The dataset can be accessed at the following link: [link]. After downloading, run dataset_prep/SIIM_generate_class_labels.py to prepare the data for zero-shot classification, and dataset_prep/SIIM_generate_mask.py for semantic segmentation.
To fine-tune the model, run the following command for single-GPU training:
python train.py
For multi-GPU training, run:
python train_multi_gpu.py
Note that training arguments and paths can be specified in run_utils.py (single-GPU) or run_utils_multi_gpu.py (multi-GPU).
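For readers unfamiliar with how LoRA augments a frozen encoder, the sketch below shows the core idea in NumPy: a frozen weight matrix W plus a trainable low-rank update B @ A, with B initialized to zero so training starts from the pretrained model. The class name, rank, and scaling here are illustrative placeholders, not the repository's actual configuration.

```python
import numpy as np

class LoRALinear:
    """A frozen linear layer W with a trainable low-rank correction B @ A.
    Hypothetical sketch of the LoRA modules inserted into the frozen
    image/text encoders; rank and alpha are illustrative defaults."""
    def __init__(self, W, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                        # frozen pretrained weight
        self.A = rng.normal(0, 0.01, size=(rank, d_in))   # trainable
        self.B = np.zeros((d_out, rank))                  # trainable, zero-init
        self.scale = alpha / rank

    def forward(self, x):
        # base projection plus scaled low-rank correction
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```

Because B starts at zero, the adapted layer initially reproduces the frozen encoder exactly; only the small A and B matrices receive gradients, which is why the total trainable budget stays under a million parameters.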
We also provide the ACE-LoRA weights trained on the MIMIC-CXR dataset (checkpoint/ACE_LoRA.pt). After downloading the datasets, you can directly execute the evaluation scripts for your chosen dataset (RSNA, SIIM, or CheXpert) located in the zero_shot_eval folder. For example, after setting the appropriate paths, you can run the following command to evaluate the zero-shot performance of ACE-LoRA on the CheXpert 5×200 dataset:
python zero_shot_eval/zero_shot_eval_ace_lora_chexpert.py
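The evaluation scripts follow the standard CLIP-style zero-shot protocol, which can be sketched as follows: embed one text prompt per class, L2-normalize both image and text embeddings, and predict the class with the highest cosine similarity. This is a framework-agnostic NumPy illustration; the actual scripts additionally handle prompt templates, checkpoint loading, and metric computation.

```python
import numpy as np

def zero_shot_classify(image_feats, class_text_feats):
    """Zero-shot prediction via cosine similarity.
    image_feats: (N, D) image embeddings.
    class_text_feats: (C, D) embeddings of per-class prompts
    (e.g. a prompt describing each finding). Illustrative only."""
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = class_text_feats / np.linalg.norm(class_text_feats, axis=1, keepdims=True)
    sims = img @ txt.T            # (N, C) cosine similarities
    return sims.argmax(axis=1)    # predicted class index per image
```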
This implementation builds upon CLIP-LoRA and LoRA. We gratefully acknowledge their valuable contributions.