
SymMPO: Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization

Wenqi Liu1, Xuemeng Song2, Jiaxi Li3, Yinwei Wei1, Na Zheng4, Jianhua Yin1, Liqiang Nie5
1Shandong University    2Southern University of Science and Technology    3University of Georgia   
4National University of Singapore    5Harbin Institute of Technology, Shenzhen   


Updates

  • [04/2026] Formatted README.
  • [09/2025] We release the training code, model weights, and dataset.
  • [09/2025] SymMPO was accepted by NeurIPS 2025! 🎉🎉🎉
  • [06/2025] We release the arXiv paper.

Introduction

We present SymMPO, a framework for mitigating hallucination in multimodal large language models (MLLMs). Our method introduces a theory-consistent symmetric multimodal preference optimization approach that addresses the hallucination problem from a principled perspective. This repository provides the official implementation, pretrained checkpoints, and evaluation scripts built on top of LLaVA.
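As background for the preference-optimization family that SymMPO belongs to, the sketch below shows a generic DPO-style pairwise preference loss. This is illustrative only and is not the SymMPO objective from the paper; the function name and inputs are hypothetical.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Generic DPO-style pairwise preference loss (illustrative only,
    not the SymMPO objective).
    logp_w / logp_l: policy log-likelihoods of the preferred (w) and
    rejected (l) responses; ref_logp_*: the same under a frozen
    reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log sigmoid(margin), computed stably as log1p(exp(-margin))
    return math.log1p(math.exp(-margin))

# The loss is small when the policy prefers the chosen response more
# strongly than the reference does, relative to the rejected one.
easy = dpo_loss(-1.0, -5.0, -2.0, -2.0)   # policy widened the gap
hard = dpo_loss(-5.0, -1.0, -2.0, -2.0)   # policy inverted the gap
print(easy < hard)  # True
```

Minimizing such a loss pushes the model away from hallucinated (rejected) responses and toward grounded (preferred) ones.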


Project Structure

.
├── asset/                  # Figures
├── eval/                   # Evaluation scripts (HallusionBench, Object-HalBench, MMHal, AMBER, MMSTAR)
├── llava/                  # LLaVA model code (architecture, encoders, projectors)
├── muffin/                 # Training & data utilities
├── script/
│   ├── train/              # Training scripts (full / LoRA)
│   └── eval/               # Evaluation shell scripts
├── run.sh                  # Quick-start training entry
├── requirements.txt
└── README.md

Installation

Our codebase requires CUDA version 11.8.

conda create -n symmpo python=3.10 -y
conda activate symmpo
pip install -r requirements.txt

Checkpoints / Models

Download the following pretrained models:

Model                Link
LLaVA-v1.5-7B        liuhaotian/llava-v1.5-7b
CLIP ViT-L/14@336    openai/clip-vit-large-patch14-336

After downloading, update the model paths:

  1. Set the LLaVA model path in the 3rd line of run.sh.
  2. Set the CLIP model path in:
    • The 4th line of run.sh
    • The 6th line of llava/model/multimodal_encoder/builder.py
    • The 14th line of llava/model/multimodal_encoder/clip_encoder.py

Dataset

Download the dataset and place it at the path specified in run.sh.


Usage

Training

bash run.sh

The default configuration in run.sh:

bash script/train/llava15_train_main.sh \
    SymMPO_test \
    "[Path of your LLaVA model]" \
    "[Path of your vision tower model]" \
    demo_data/similar \
    0,1,2,3 \
    5e-6 \
    0.5
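The training script takes its seven arguments positionally. The variable names below are purely illustrative (run.sh does not define them), and the reading of the last two values as a learning rate and a loss hyper-parameter is an assumption:

```shell
# Illustrative only: these names are not defined by run.sh itself.
EXP_NAME=SymMPO_test                          # experiment / output name
LLAVA_PATH="[Path of your LLaVA model]"       # base LLaVA-v1.5-7B checkpoint
VIT_PATH="[Path of your vision tower model]"  # CLIP ViT-L/14@336
DATA_DIR=demo_data/similar                    # preference data directory
GPUS=0,1,2,3                                  # visible GPU ids
LR=5e-6                                       # learning rate (assumed)
COEF=0.5                                      # loss hyper-parameter (assumed)
echo bash script/train/llava15_train_main.sh \
    "$EXP_NAME" "$LLAVA_PATH" "$VIT_PATH" "$DATA_DIR" "$GPUS" "$LR" "$COEF"
```

Drop the leading echo to actually launch training.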

Evaluation

During evaluation, HallusionBench, Object-HalBench, and MMHal-Bench require LLM-based judging with DeepSeek-V3, GPT-3.5, and GPT-4, respectively, so the corresponding API keys are needed.

HallusionBench

  1. Download Questions and Annotations and Figures.
  2. Run evaluation:
bash script/eval/eval_hallusion.sh [ckpt_path] [base_path or "No"] [YOUR_DEEPSEEK_API_KEY] [GPU_ID]

We default to DeepSeek-V3. Replace [YOUR_DEEPSEEK_API_KEY] with a valid key, or set it directly at line 48 of eval/hallusion_evaluation.py.

Object-HalBench

  1. Download data from COCO.
  2. Install supplementary models:
python -c "import nltk; nltk.download('wordnet'); nltk.download('punkt')"
python -m spacy download en_core_web_trf
  3. Run evaluation:
bash script/eval/eval_objhal.sh [ckpt_path] [base_path or "No"] [YOUR_OPENAI_API_KEY] [GPU_ID]

We default to gpt-3.5-turbo-0125. Replace [YOUR_OPENAI_API_KEY] with a valid key, or set it directly at line 51 of eval/gpt4_grpc.py.

MMHal-Bench

  1. Download data from MMHal-Bench.
  2. Run evaluation:
bash script/eval/eval_mmhal.sh [ckpt_path] [base_path or "No"] [YOUR_OPENAI_API_KEY] [GPU_ID]

We default to gpt-4-1106-preview. Replace [YOUR_OPENAI_API_KEY] with a valid key, or set it directly at line 51 of eval/gpt4_grpc.py.

AMBER

  1. Download AMBER data and images.
  2. Install supplementary model:
python -m spacy download en_core_web_lg
  3. Run evaluation:
bash script/eval/eval_amber.sh [ckpt_path] [base_path or "No"] [GPU_ID] [data_dir]

MMSTAR

  1. Download data from MMSTAR.
  2. Run evaluation:
bash script/eval/eval_mmstar.sh [ckpt_path] [base_path or "No"] [GPU_ID] [data_dir]

Citation

If you find our work helpful, please consider citing:

@inproceedings{
  liu2025mitigating,
  title={Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization},
  author={Wenqi Liu and Xuemeng Song and Jiaxi Li and Yinwei Wei and Na Zheng and Jianhua Yin and Liqiang Nie},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=tIW29IpCwG}
}

Acknowledgement

  • TPO and RLAIF-V: This work extends the implementations provided by these projects.
  • LLaVA: Our training is built on top of the LLaVA model and codebase.
