Does Calibration Survive Quantization?

Post-hoc probability calibration transfer from FP32 to INT4 transformer models.

Can a calibrator trained on full-precision model outputs be safely deployed on a quantized model without retraining? This project presents a systematic study on FinBERT and introduces a novel fuzzy-gated log-space affine calibration method.

Key Findings

Finding	Detail
Calibration transfer is robust	All 4 calibration methods transfer from FP32 to INT4 with at most a 5% relative increase in mean ECE
Quantization improves uncalibrated ECE	INT4 reduces uncalibrated ECE by 9.4% relative to FP32 (0.254 to 0.230)
Best calibration (our method)	ECE 0.036 +/- 0.002 (FP32), 0.037 +/- 0.002 (INT4)
Accuracy improvement	57.6% to 66.6% via class-prior bias correction

Results

Method	ECE (FP32)	ECE (INT4)	Acc (INT4)	ECE change, FP32 -> INT4 (%)
Uncalibrated	0.254	0.230	57.2%	-9.4%
Temperature Scaling	0.113	0.118	57.2%	+4.5%
Dirichlet	0.068	0.048	64.5%	-28.9%
Plain Affine	0.193	0.161	59.6%	-16.8%
Fuzzy-Gated (Ours)	0.036 +/- 0.002	0.037 +/- 0.002	66.5%	+5.0%

All accuracy improvements over the uncalibrated model are statistically significant (McNemar's test, p < 0.001). Results are aggregated over 3 seeds.
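McNemar's test compares two classifiers evaluated on the same samples and uses only the discordant pairs (samples one model gets right and the other gets wrong). Below is a minimal standard-library sketch of the continuity-corrected chi-square variant; the function name is hypothetical, and whether the repository uses this variant or the exact binomial form is an assumption.

```python
import math
from typing import Sequence, Tuple


def mcnemar_test(y_true: Sequence[int], pred_a: Sequence[int],
                 pred_b: Sequence[int]) -> Tuple[float, float]:
    """Continuity-corrected McNemar test. Returns (chi2_statistic, p_value)."""
    b = c = 0  # b: A right / B wrong, c: A wrong / B right
    for y, pa, pb in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = pa == y, pb == y
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree on correctness
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Chi-square survival function with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

In the setting above, `pred_a` would be the uncalibrated INT4 argmax predictions and `pred_b` the calibrated ones, on the same test samples.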

Method

Fuzzy-Gated Log-Space Affine Calibration

Standard post-hoc calibration applies a single global transform. This method applies confidence-conditioned calibration: different corrections for different confidence regions, gated by learnable Gaussian membership functions.

Raw Probability --> Fuzzy Memberships --> Per-Region Calibration --> Weighted Sum
                     (5 regions)          (log-space affine)         (by membership)
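The gating step can be sketched as follows, assuming memberships are evaluated on the top-class confidence (max probability) and normalized to sum to 1 across the 5 regions. The function name, the choice of input, and the initial centers and widths are illustrative assumptions, not taken from src/fuzzy_membership.py.

```python
import numpy as np


def gaussian_memberships(probs: np.ndarray, centers: np.ndarray,
                         widths: np.ndarray) -> np.ndarray:
    """Normalized Gaussian membership of each sample's confidence in each region.

    probs:   (N, C) predicted probabilities.
    centers: (R,) region centers on the confidence axis (learnable in training).
    widths:  (R,) positive region widths (learnable in training).
    Returns: (N, R) weights; each row sums to 1.
    """
    conf = probs.max(axis=1, keepdims=True)                # (N, 1) top-class confidence
    mu = np.exp(-0.5 * ((conf - centers) / widths) ** 2)   # (N, R) raw memberships
    return mu / (mu.sum(axis=1, keepdims=True) + 1e-12)    # normalize across regions


# Hypothetical initialization: 5 regions spread evenly over the 3-class range [1/3, 1].
centers = np.linspace(1 / 3, 1.0, 5)
widths = np.full(5, 0.1)
```

A sample with confidence 0.9 then draws most of its weight from the region centered near 0.83, while a sample at 0.4 is dominated by the lowest region.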

Each of the 5 confidence regions learns its own scale and bias in log-space (60 total parameters for 3 classes), enabling strong corrections where the model is overconfident and gentle corrections where it is already well-calibrated.
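A minimal sketch of the per-region log-space affine step and the membership-weighted sum. It assumes each region holds a full C x C scale matrix plus a C-vector bias, which is one parameterization that yields exactly 60 parameters for 3 classes (5 x (9 + 3)), and that per-region outputs are blended as logits before a single softmax; both are assumptions. Membership weights are passed in as an argument.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=1, keepdims=True))  # subtract row max for stability
    return e / e.sum(axis=1, keepdims=True)


def fuzzy_affine_calibrate(probs, weights, W, b, eps=1e-12):
    """Apply per-region log-space affine maps and blend them by membership.

    probs:   (N, C) uncalibrated probabilities.
    weights: (N, R) membership weights, rows summing to 1.
    W:       (R, C, C) per-region scale matrices (assumed parameterization).
    b:       (R, C) per-region biases.
    Returns: (N, C) calibrated probabilities.
    """
    log_p = np.log(np.clip(probs, eps, 1.0))           # move to log-probability space
    z = np.einsum("rij,nj->nri", W, log_p) + b[None]   # (N, R, C) per-region logits
    z = (weights[:, :, None] * z).sum(axis=1)          # membership-weighted blend
    return softmax(z)


R, C = 5, 3
W = np.stack([np.eye(C)] * R)  # identity init: every region starts as a no-op
b = np.zeros((R, C))
print(W.size + b.size)         # prints 60
```

With identity scales and zero biases the blended logits equal log p, so the calibrator starts from the uncalibrated probabilities and training moves each region away from that only where needed.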

Training objective: NLL + Soft ECE + Class-wise Soft ECE + Brier Score (all differentiable).
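The four terms can be written out as below. This NumPy sketch assumes equal weights, 15 soft bins, and Gaussian-kernel soft binning for the soft ECE terms; those choices and all function names are assumptions. In training, the same expressions would be written in an autodiff framework so gradients reach the calibrator parameters.

```python
import numpy as np


def soft_ece(conf, target, n_bins=15, temperature=0.01):
    """ECE surrogate with soft (differentiable) bin assignment.

    conf:   (N,) confidences in [0, 1].
    target: (N,) float targets (1.0 = correct / positive class, else 0.0).
    """
    centers = (np.arange(n_bins) + 0.5) / n_bins
    # Soft assignment: softmax over negative squared distance to each bin center.
    logits = -((conf[:, None] - centers[None, :]) ** 2) / temperature
    logits -= logits.max(axis=1, keepdims=True)
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)                 # (N, B), rows sum to 1
    mass = a.sum(axis=0) + 1e-12                      # soft sample count per bin
    bin_conf = (a * conf[:, None]).sum(axis=0) / mass
    bin_acc = (a * target[:, None]).sum(axis=0) / mass
    return float(np.sum(mass / len(conf) * np.abs(bin_acc - bin_conf)))


def calibration_loss(probs, labels, eps=1e-12):
    """NLL + soft ECE + class-wise soft ECE + Brier, equally weighted (assumed)."""
    n, c = probs.shape
    onehot = np.eye(c)[labels]
    nll = -np.mean(np.log(np.clip(probs[np.arange(n), labels], eps, 1.0)))
    correct = (probs.argmax(axis=1) == labels).astype(float)
    top_ece = soft_ece(probs.max(axis=1), correct)
    classwise = np.mean([soft_ece(probs[:, k], onehot[:, k]) for k in range(c)])
    brier = np.mean(np.sum((probs - onehot) ** 2, axis=1))
    return float(nll + top_ece + classwise + brier)
```

The class-wise term treats each class probability as its own confidence score, which penalizes miscalibration on non-top classes that top-label ECE ignores.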

Quick Start

# Setup
conda env create -f environment.yml
conda activate fuzzy-calibration

# Extract FP32 + INT4 probabilities
python scripts/extract_probs.py --model ProsusAI/finbert --dataset lwrf42/financial-sentiment-dataset --seed 42 --quant int4

# Train the fuzzy-gated calibrator
python scripts/train_calibrator.py --model ProsusAI/finbert --dataset lwrf42/financial-sentiment-dataset --seed 42 --max-epochs 200

# Evaluate FP32 -> INT4 calibration transfer
python scripts/run_calibration.py --model ProsusAI/finbert --int4 --seed 42 \
    --load-fp32-probs ./cache/probs/seed42.npz --load-quant-probs ./cache/probs/seed42_int4.npz
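The transfer protocol can be illustrated with temperature scaling, one of the baselines in the Results table: fit on FP32 validation outputs, then apply the fitted parameter unchanged to INT4 outputs. This is a sketch on synthetic data, not scripts/run_calibration.py; the grid search, function names, and the synthetic "FP32"/"INT4" arrays are assumptions.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def nll(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true labels."""
    p_true = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))


def apply_temperature(probs, t, eps=1e-12):
    # log(p) equals the logits minus a per-row constant, which softmax ignores,
    # so softmax(log(p) / T) matches softmax(logits / T) without the raw logits.
    return softmax(np.log(np.clip(probs, eps, 1.0)) / t)


def fit_temperature(probs, labels, grid=np.linspace(0.25, 5.0, 96)):
    """Choose T by grid search on validation NLL (simple stand-in for an optimizer)."""
    return float(min(grid, key=lambda t: nll(apply_temperature(probs, t), labels)))


# Synthetic data: labels are drawn from softmax(logits), so that is calibrated.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4000, 3))
labels = (rng.random((4000, 1)) < softmax(logits).cumsum(axis=1)).argmax(axis=1)
fp32 = softmax(3.0 * logits)                                           # overconfident
int4 = softmax(3.0 * (logits + 0.05 * rng.normal(size=logits.shape)))  # perturbed copy

t = fit_temperature(fp32[:2000], labels[:2000])  # fit on FP32 validation half only
int4_raw_nll = nll(int4[2000:], labels[2000:])
int4_cal_nll = nll(apply_temperature(int4[2000:], t), labels[2000:])  # reuse T on INT4
```

The fitted temperature lands near 3 (the synthetic sharpening factor), and reusing it on the perturbed outputs lowers their NLL without touching any INT4 data during fitting.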

Project Structure

.
├── src/
│   ├── config.py                  # Configuration
│   ├── data_loader.py             # Dataset loading + val/test split
│   ├── transformer_base.py        # Model wrapper (MPS/CUDA auto-detect)
│   ├── fuzzy_calibrator.py        # Fuzzy-gated calibrator
│   ├── fuzzy_membership.py        # Learnable Gaussian membership functions
│   ├── label_wise_calibrator.py   # Label-wise calibration
│   ├── evaluator.py               # ECE, Brier, accuracy metrics
│   └── requirements.txt           # Python dependencies
├── scripts/
│   ├── extract_probs.py           # FP32 + INT4 probability extraction
│   ├── train_calibrator.py        # Train fuzzy-gated calibrator
│   ├── run_calibration.py         # Calibration transfer evaluation
│   └── generate_figures.py        # Figure generation
├── environment.yml                # Conda environment
└── .gitignore

Experimental Setup

Model: FinBERT (110M params), INT4 via BitsAndBytes (418 MB -> 132 MB, 68% size reduction)
Dataset: LWRF Financial Sentiment (95,220 samples, 3 classes)
Metrics: ECE (15 bins), Brier score, accuracy
Seeds: 42, 123, 456
Significance: McNemar's test, paired bootstrap
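The metrics above can be sketched as follows: top-label ECE with 15 equal-width bins, multi-class Brier score (summed over classes, one common convention), and a paired bootstrap percentile interval for an ECE difference. Function names, the Brier convention, and the interval choice are assumptions rather than the contents of src/evaluator.py.

```python
import numpy as np


def ece(probs, labels, n_bins=15):
    """Top-label expected calibration error with equal-width confidence bins."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin i holds confidences in (i/n_bins, (i+1)/n_bins]; bin 0 also takes 0.
    idx = np.digitize(conf, edges[1:-1], right=True)
    total = 0.0
    for i in range(n_bins):
        mask = idx == i
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(total)


def brier(probs, labels):
    """Multi-class Brier score, squared error summed over classes."""
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))


def paired_bootstrap_ece_diff(probs_a, probs_b, labels, n_boot=1000, seed=0):
    """95% percentile interval for ECE(a) - ECE(b), resampling shared indices."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    diffs = np.empty(n_boot)
    for k in range(n_boot):
        s = rng.integers(0, n, n)  # same resample applied to both models
        diffs[k] = ece(probs_a[s], labels[s]) - ece(probs_b[s], labels[s])
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return float(lo), float(hi)
```

Resampling the same indices for both models keeps the comparison paired, so per-sample difficulty cancels out of the interval; an interval excluding 0 indicates a difference beyond resampling noise.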

License

MIT
