Skip to content

osu-srml/NH-Fair

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NH-Fair: Benchmarking Bias Mitigation Toward Fairness Without Harm

Official code for "Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to LVLMs" (ICLR 2026).

[Paper]

NH-Fair is a unified fairness benchmark that covers both vision models and large vision-language models (LVLMs) under standardized data, metrics, and training protocols. It provides a tuning-aware, sweep-first pipeline for rigorous, harm-aware fairness evaluation.

Installation

From the release/ directory (this folder):

pip install -e .

You can also use uv: run uv sync (and optional extras such as uv sync --extra lavis or uv sync --extra llm as defined in pyproject.toml), then e.g. uv run python -m release_benchmark.cli.train --help.

Run CLIs as python -m release_benchmark.cli.<command> from any working directory after installation, or set PYTHONPATH=src when running from release/ without installing.

Key Features

  • 7 datasets: CelebA, UTKFace, FairFace, FACET, Waterbirds, HAM10000, Fitzpatrick17k
  • Multiple bias mitigation methods: ERM, GroupDRO, LAFTR, DFR, FairMixup, CLIP-based, OxonFair, and more
  • LVLMs: Qwen2.5-VL, LLaMA 3.2/4, Gemma 3, LLaVA-NeXT — local transformers or OpenAI-compatible gateway
  • Standardized metrics: Accuracy, AUC, DP, EqOpp, EqOdd, worst-group accuracy, accuracy gap

Dataset Setup

Each dataset has a dedicated preprocessing script under data/<dataset>/preprocess.py. See data/README.md for detailed download links and step-by-step instructions.

Quick start:

cd data

# Download raw files into <dataset>/raw/, then run:
python celeba/preprocess.py   --raw_dir celeba/raw   --output_dir celeba
python utk/preprocess.py      --raw_dir utk/raw      --output_dir utk
python fairface/preprocess.py --raw_dir fairface/raw  --output_dir fairface
python facet/preprocess.py    --raw_dir facet/raw     --output_dir facet    --num_workers 8
python waterbirds/preprocess.py --raw_dir waterbirds/raw --output_dir waterbirds
python ham/preprocess.py      --raw_dir ham/raw       --output_dir ham
python fitz/preprocess.py     --raw_dir fitz/raw      --output_dir fitz     --num_workers 8
Dataset Source Sensitive Attr Target
CelebA torchvision Gender 40 binary attributes
UTKFace UTKFace Race / Gender Gender / Race
FairFace FairFace Race / Gender Gender / Race
FACET FACET Gender Face visibility
Waterbirds Waterbirds Background Bird type
HAM10000 HAM10000 Sex / Age Diagnosis
Fitzpatrick17k Fitz17k Skin type Diagnosis

Usage

Supervised Training

python -m release_benchmark.cli.train \
  --dataset celeba --method erm --sa sex --ta 33 \
  --model resnet18 --pretrain 1 --lr 0.001 --bs 128 \
  --epochs 30 --gpu 0

Zero-shot LLM Evaluation

python -m release_benchmark.cli.zeroshot \
  --dataset celeba --method qwen --model Qwen/Qwen2.5-VL-7B-Instruct --sa sex --ta 33 \
  --image_direct --bs 1 --gpu 0

# Same LVLM via OpenAI-compatible server (start one with scripts/launch_llm_gateway_qwen.sh, or use vendors' API)
python -m release_benchmark.cli.zeroshot \
  --dataset celeba --method qwen --vlm_backend gateway \
  --llm_gateway_url http://127.0.0.1:8000/v1 --model Qwen/Qwen2.5-VL-7B-Instruct \
  --sa sex --ta 33 --image_direct --bs 1 --gpu 0

python -m release_benchmark.cli.zeroshot \
  --dataset waterbirds --method blip2 --bs 32 --gpu 0

python -m release_benchmark.cli.zeroshot \
  --dataset celeba --method clip --model vitb16 --sa sex --ta 33 --bs 32 --gpu 0

Sweep (Hyperparameter Search via W&B)

python -m release_benchmark.cli.sweep \
  --dataset celeba --method erm --sa sex --ta 33 --gpu auto

Implemented Methods

Category Methods
Baseline ERM, RandAug, Resample
Fairness GroupDRO, LAFTR, DFR, GapReg, MCDP, FairMixup
Data-centric FIS, BM (Bias Mimicking), FSCL+
CLIP-based CLIP, CLIP-SFID, CLIP-Fairer, BLIP2
Post-hoc OxonFair, Decoupled
LVLMs Qwen2.5-VL, LLaMA 3.2/4, Gemma 3, LLaVA-NeXT

Project Structure

release/
├── data/
│   ├── manifests/        # Dataset manifest examples
│   ├── celeba/           # CelebA preprocessing
│   ├── utk/              # UTKFace preprocessing
│   ├── fairface/         # FairFace preprocessing
│   ├── facet/            # FACET preprocessing
│   ├── waterbirds/       # Waterbirds preprocessing
│   ├── ham/              # HAM10000 preprocessing
│   └── fitz/             # Fitzpatrick17k preprocessing
├── docs/                 # Method audit and sweep guide
├── scripts/              # Shell scripts for common workflows
├── src/release_benchmark/
│   ├── cli/              # Entry points: train, zeroshot, sweep
│   ├── configs/          # Sweep and dataset YAML templates (packaged with the lib)
│   ├── datasets/         # Dataset loaders (FairDataset base class)
│   ├── methods/
│   │   ├── cv/           # Vision methods (ERM, GroupDRO, LAFTR, ...)
│   │   ├── vlm/          # CLIP/BLIP-2; LVLMs (HF + gateway); see scripts/launch_llm_gateway_*.sh
│   │   └── registry.py   # Method name -> class resolution
│   ├── metrics/          # Fairness and performance metrics
│   └── utils/            # Seeds, logging, helpers
└── tests/                # Smoke and registry tests

Environment Variables

Variable Description
WANDB_API_KEY Weights & Biases API key for sweep logging
HF_TOKEN Hugging Face token (required for gated models)

Citation

@inproceedings{
tan2026benchmarking,
title={Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to {LVLM}s},
author={Xuwei Tan and Ziyu Hu and Xueru Zhang},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=GLPmZhhCAE}
}

License

This project is released under the MIT License.

About

NH-Fair: Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to LVLMs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors