NH-Fair: Benchmarking Bias Mitigation Toward Fairness Without Harm

Official code for "Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to LVLMs" (ICLR 2026).

NH-Fair is a unified fairness benchmark that covers both vision models and large vision-language models (LVLMs) under standardized data, metrics, and training protocols. It provides a tuning-aware, sweep-first pipeline for rigorous, harm-aware fairness evaluation.

Installation

From the release/ directory (this folder):

pip install -e .

Alternatively, use uv: run uv sync (optionally with extras such as uv sync --extra lavis or uv sync --extra llm, as defined in pyproject.toml), then launch commands through it, e.g. uv run python -m release_benchmark.cli.train --help.

Run CLIs as python -m release_benchmark.cli.<command> from any working directory after installation, or set PYTHONPATH=src when running from release/ without installing.

Key Features

7 datasets: CelebA, UTKFace, FairFace, FACET, Waterbirds, HAM10000, Fitzpatrick17k
Multiple bias mitigation methods: ERM, GroupDRO, LAFTR, DFR, FairMixup, CLIP-based, OxonFair, and more
LVLMs: Qwen2.5-VL, LLaMA 3.2/4, Gemma 3, LLaVA-NeXT, run either through local transformers or through an OpenAI-compatible gateway
Standardized metrics: Accuracy, AUC, DP, EqOpp, EqOdd, worst-group accuracy, accuracy gap
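
The fairness metrics listed above have several competing definitions in the literature. As a rough illustration only, the sketch below computes common binary-classification versions with NumPy. It assumes hard 0/1 predictions and one sensitive-group label per sample. The function name `fairness_report`, the max-minus-min gap convention, and the choice to take worst-group accuracy over sensitive groups are assumptions of this example and may differ from what release_benchmark.metrics actually implements.

```python
import numpy as np


def fairness_report(y_true, y_pred, group):
    """Common group-fairness gaps for binary labels and hard 0/1 predictions.

    Assumes every group contains at least one positive and one negative
    sample; otherwise the per-group rates below are undefined (NaN).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)

    acc, pos_rate, tpr, fpr = [], [], [], []
    for g in np.unique(group):
        mask = group == g
        yt, yp = y_true[mask], y_pred[mask]
        acc.append((yt == yp).mean())      # per-group accuracy
        pos_rate.append(yp.mean())         # P(pred = 1 | group)
        tpr.append(yp[yt == 1].mean())     # true positive rate
        fpr.append(yp[yt == 0].mean())     # false positive rate

    def gap(values):
        # Largest difference between any two groups.
        return float(max(values) - min(values))

    return {
        "accuracy": float((y_true == y_pred).mean()),
        "worst_group_accuracy": float(min(acc)),
        "accuracy_gap": gap(acc),
        "dp": gap(pos_rate),                         # demographic parity gap
        "eq_opp": gap(tpr),                          # equal opportunity gap
        "eq_odd": float(max(gap(tpr), gap(fpr))),    # equalized odds gap
    }
```

Under this convention a value of 0 for DP, EqOpp, or EqOdd means the groups are treated identically on that criterion, so a mitigation method is "without harm" only if it shrinks these gaps without lowering accuracy or worst-group accuracy.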

Dataset Setup

Each dataset has a dedicated preprocessing script under data/<dataset>/preprocess.py. See data/README.md for detailed download links and step-by-step instructions.

Quick start:

cd data

# Download raw files into <dataset>/raw/, then run:
python celeba/preprocess.py   --raw_dir celeba/raw   --output_dir celeba
python utk/preprocess.py      --raw_dir utk/raw      --output_dir utk
python fairface/preprocess.py --raw_dir fairface/raw  --output_dir fairface
python facet/preprocess.py    --raw_dir facet/raw     --output_dir facet    --num_workers 8
python waterbirds/preprocess.py --raw_dir waterbirds/raw --output_dir waterbirds
python ham/preprocess.py      --raw_dir ham/raw       --output_dir ham
python fitz/preprocess.py     --raw_dir fitz/raw      --output_dir fitz     --num_workers 8
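
To run all seven preprocessing scripts in one go, a small driver can assemble the same commands. This is a minimal sketch, assuming the raw files are already in <dataset>/raw/ and that it is executed from the data/ directory. The names `DATASETS`, `build_commands`, and `run_all` are introduced here for illustration and are not part of the repository; the flags are copied from the commands above.

```python
import subprocess
import sys

# Dataset folder -> extra flags, mirroring the quick-start commands above.
DATASETS = {
    "celeba": [],
    "utk": [],
    "fairface": [],
    "facet": ["--num_workers", "8"],
    "waterbirds": [],
    "ham": [],
    "fitz": ["--num_workers", "8"],
}


def build_commands(python=sys.executable):
    """Return one argument list per dataset preprocessing script."""
    return [
        [python, f"{name}/preprocess.py",
         "--raw_dir", f"{name}/raw", "--output_dir", name, *extra]
        for name, extra in DATASETS.items()
    ]


def run_all(dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first script that exits with an error.
            subprocess.run(cmd, check=True)
```

Calling `run_all()` only prints the commands, which is a cheap way to check paths; `run_all(dry_run=False)` executes them in order.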

Dataset	Source	Sensitive Attribute	Target
CelebA	torchvision	Gender	40 binary attributes
UTKFace	UTKFace	Race / Gender	Gender / Race
FairFace	FairFace	Race / Gender	Gender / Race
FACET	FACET	Gender	Face visibility
Waterbirds	Waterbirds	Background	Bird type
HAM10000	HAM10000	Sex / Age	Diagnosis
Fitzpatrick17k	Fitz17k	Skin type	Diagnosis

Usage

Supervised Training

python -m release_benchmark.cli.train \
  --dataset celeba --method erm --sa sex --ta 33 \
  --model resnet18 --pretrain 1 --lr 0.001 --bs 128 \
  --epochs 30 --gpu 0

Zero-shot LLM Evaluation

python -m release_benchmark.cli.zeroshot \
  --dataset celeba --method qwen --model Qwen/Qwen2.5-VL-7B-Instruct --sa sex --ta 33 \
  --image_direct --bs 1 --gpu 0

# Same LVLM via an OpenAI-compatible server (start one with scripts/launch_llm_gateway_qwen.sh, or use a vendor's API)
python -m release_benchmark.cli.zeroshot \
  --dataset celeba --method qwen --vlm_backend gateway \
  --llm_gateway_url http://127.0.0.1:8000/v1 --model Qwen/Qwen2.5-VL-7B-Instruct \
  --sa sex --ta 33 --image_direct --bs 1 --gpu 0

python -m release_benchmark.cli.zeroshot \
  --dataset waterbirds --method blip2 --bs 32 --gpu 0

python -m release_benchmark.cli.zeroshot \
  --dataset celeba --method clip --model vitb16 --sa sex --ta 33 --bs 32 --gpu 0

Sweep (Hyperparameter Search via W&B)

python -m release_benchmark.cli.sweep \
  --dataset celeba --method erm --sa sex --ta 33 --gpu auto

Implemented Methods

Category	Methods
Baseline	ERM, RandAug, Resample
Fairness	GroupDRO, LAFTR, DFR, GapReg, MCDP, FairMixup
Data-centric	FIS, BM (Bias Mimicking), FSCL+
CLIP-based	CLIP, CLIP-SFID, CLIP-Fairer, BLIP2
Post-hoc	OxonFair, Decoupled
LVLMs	Qwen2.5-VL, LLaMA 3.2/4, Gemma 3, LLaVA-NeXT

Project Structure

release/
├── data/
│   ├── manifests/        # Dataset manifest examples
│   ├── celeba/           # CelebA preprocessing
│   ├── utk/              # UTKFace preprocessing
│   ├── fairface/         # FairFace preprocessing
│   ├── facet/            # FACET preprocessing
│   ├── waterbirds/       # Waterbirds preprocessing
│   ├── ham/              # HAM10000 preprocessing
│   └── fitz/             # Fitzpatrick17k preprocessing
├── docs/                 # Method audit and sweep guide
├── scripts/              # Shell scripts for common workflows
├── src/release_benchmark/
│   ├── cli/              # Entry points: train, zeroshot, sweep
│   ├── configs/          # Sweep and dataset YAML templates (packaged with the lib)
│   ├── datasets/         # Dataset loaders (FairDataset base class)
│   ├── methods/
│   │   ├── cv/           # Vision methods (ERM, GroupDRO, LAFTR, ...)
│   │   ├── vlm/          # CLIP/BLIP-2; LVLMs (HF + gateway); see scripts/launch_llm_gateway_*.sh
│   │   └── registry.py   # Method name -> class resolution
│   ├── metrics/          # Fairness and performance metrics
│   └── utils/            # Seeds, logging, helpers
└── tests/                # Smoke and registry tests

Environment Variables

Variable	Description
`WANDB_API_KEY`	Weights & Biases API key for sweep logging
`HF_TOKEN`	Hugging Face token (required for gated models)
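
A quick pre-flight check can catch a missing key before a long sweep or a gated-model download starts. The sketch below only inspects the process environment; the helper name `missing_env_vars` and the constant `KNOWN_VARS` are hypothetical and not part of release_benchmark.

```python
import os

# Variables described in the table above.
KNOWN_VARS = ("WANDB_API_KEY", "HF_TOKEN")


def missing_env_vars(names=KNOWN_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def warn_if_missing():
    """Print a reminder for each variable that is not set."""
    for name in missing_env_vars():
        print(f"{name} is not set; export it before running the matching CLI.")
```

Neither variable is needed for every command: `WANDB_API_KEY` matters for sweeps and `HF_TOKEN` only for gated models, so a warning here is a reminder rather than an error.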

Citation

@inproceedings{
tan2026benchmarking,
title={Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to {LVLM}s},
author={Xuwei Tan and Ziyu Hu and Xueru Zhang},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=GLPmZhhCAE}
}

License

This project is released under the MIT License.
