gaobb/UniADet

UniADet

Official PyTorch Implementation of One Language-Free Foundation Model Is Enough for Universal Vision Anomaly Detection, 2026.

📢 News

  • [2026-02-02]: ✅ UniADet has been independently reproduced by a third-party team, confirming our state-of-the-art results.
  • [2026-01-13]: 🚀 Initialized the official UniADet code repository.
  • [2026-01-09]: 📄 The UniADet paper is now available on arXiv.

📝 Introduction

UniADet is a language-free framework for universal (zero- and few-shot) visual anomaly detection. It outperforms state-of-the-art language-dependent zero- and few-shot AD models while remaining remarkably simple and efficient.

  • We rethink vision-language AD methods and find that language prompts and text encoders are unnecessary. This insight leads to a framework for universal anomaly detection that is embarrassingly simple (language-free + dual decoupling), efficient (0.015M or 0.02M learnable parameters), effective (SOTA zero-/few-shot results), and general (supports both VLMs and pure vision models).
  • We fully decouple global anomaly classification and local anomaly segmentation across multi-scale hierarchical features, i.e., learning layer-wise cls/seg weights, effectively mitigating the learning conflict between different feature manifolds and substantially improving AD performance.
  • Comprehensive experiments show that our approach achieves state-of-the-art zero-shot and few-shot performance. Notably, our few-shot UniADet is the first to outperform full-shot state-of-the-art methods.
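
The layer-wise decoupling described in the second point can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the repository's implementation: the class name `LayerwiseHeads`, the arguments `num_layers`, `dim`, and `temperature`, the cosine-similarity scoring, and the plain averaging across layers are all hypothetical choices. The only idea taken from the text is that every selected layer has its own normal/anomaly weights for classification and a separate set for segmentation.

```python
import numpy as np

def _l2norm(x, axis=-1, eps=1e-8):
    # Normalize vectors to unit length along the given axis.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def _softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class LayerwiseHeads:
    """Hypothetical sketch: per-layer, per-task normal/anomaly weights."""

    def __init__(self, num_layers, dim, temperature=0.07, seed=0):
        rng = np.random.default_rng(seed)
        # Shape (layers, 2, dim): index 0 = normal, index 1 = anomaly.
        self.w_cls = rng.normal(scale=0.02, size=(num_layers, 2, dim))
        self.w_seg = rng.normal(scale=0.02, size=(num_layers, 2, dim))
        self.temperature = temperature  # assumed value, not from the source

    def num_params(self):
        # 2 tasks x num_layers x 2 classes x dim
        return self.w_cls.size + self.w_seg.size

    def forward(self, global_tokens, patch_tokens):
        """global_tokens: (layers, dim); patch_tokens: (layers, n_patches, dim)."""
        img_scores, maps = [], []
        for l in range(self.w_cls.shape[0]):
            # Classification uses only W_cls^l, applied to the global token.
            logits = _l2norm(global_tokens[l]) @ _l2norm(self.w_cls[l]).T
            img_scores.append(_softmax(logits / self.temperature)[1])
            # Segmentation uses only W_seg^l, applied to the patch tokens.
            logits = _l2norm(patch_tokens[l]) @ _l2norm(self.w_seg[l]).T
            maps.append(_softmax(logits / self.temperature)[:, 1])
        # Average the per-layer anomaly probabilities (an assumed fusion rule).
        return float(np.mean(img_scores)), np.mean(maps, axis=0)

# Toy usage with random features standing in for frozen backbone outputs.
heads = LayerwiseHeads(num_layers=4, dim=16)
rng = np.random.default_rng(1)
score, amap = heads.forward(rng.normal(size=(4, 16)), rng.normal(size=(4, 9, 16)))
```

In this sketch the backbone is never touched, and the learnable parameter count is `2 × num_layers × 2 × dim`, which is why a design of this kind stays small regardless of backbone size.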

💎 UniADet Framework

📊 Language-Free UniADet with Different Foundation Models

| Backbone | Shots | MVTec-AD | VisA | Real-IAD |
| --- | --- | --- | --- | --- |
| CLIP (ViT-L/14@336px) | 0 | 92.4 / 42.8 | 88.0 / 28.0 | 78.6 / 33.6 |
| CLIP (ViT-L/14@336px) | 4 | 97.7 / 58.8 | 93.3 / 36.7 | 84.3 / 37.2 |
| DINOv2 (Register ViT-L/14) | 0 | 93.5 / 50.9 | 91.3 / 32.7 | 82.5 / 43.1 |
| DINOv2 (Register ViT-L/14) | 4 | 98.7 / 65.4 | 96.9 / 45.2 | 90.3 / 48.5 |
| DINOv3 (ViT-L/16) | 0 | 94.0 / 52.7 | 91.9 / 32.5 | 81.2 / 41.6 |
| DINOv3 (ViT-L/16) | 4 | 98.2 / 69.0 | 97.1 / 45.5 | 88.5 / 49.8 |

Note: Performance is measured as Image-AUROC / Pixel-AUPR, here and in the tables below.
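
For readers unfamiliar with the two metrics, the following NumPy sketch shows one common way to compute them. It is illustrative only: the README does not say how the reported numbers were computed, and estimating AUPR by average precision is an assumption here. The function names `auroc` and `average_precision` and all data values are hypothetical.

```python
import numpy as np

def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    Uses O(n_pos * n_neg) memory, so it is only meant for small inputs.
    Assumes both classes are present.
    """
    labels = np.asarray(labels).ravel().astype(bool)
    scores = np.asarray(scores, dtype=float).ravel()
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def average_precision(labels, scores):
    """AUPR estimated as the mean precision at each positive sample,
    ranking by descending score. Assumes no tied scores."""
    labels = np.asarray(labels).ravel().astype(bool)
    scores = np.asarray(scores, dtype=float).ravel()
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision[hits].mean())

# Image level: one label (1 = anomalous) and one score per image.
image_labels = [0, 0, 1, 1]
image_scores = [0.1, 0.4, 0.35, 0.8]
image_auroc = auroc(image_labels, image_scores)  # 0.75

# Pixel level: flatten ground-truth masks and anomaly maps over all images.
masks = np.array([[[0, 0], [1, 1]], [[0, 0], [0, 0]]])
maps = np.array([[[0.2, 0.1], [0.3, 0.9]], [[0.05, 0.15], [0.6, 0.25]]])
pixel_aupr = average_precision(masks, maps)  # 5/6, about 0.833
```

Image-AUROC ranks whole images, while Pixel-AUPR is dominated by the rare anomalous pixels, which is why the second number in each table cell is much lower than the first.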

🏆 Comparison with State-of-the-Arts

| Methods | Venue | Language-Free | Shots | MVTec | VisA | Real-IAD |
| --- | --- | --- | --- | --- | --- | --- |
| UniADet $^‡$ | ours | $\color{red}{\checkmark}$ | 0 | 93.5 / 50.9 | 91.3 / 32.7 | 82.5 / 43.1 |
| WinCLIP | CVPR 23 | $\color{green}{✘}$ | 0 | 90.4 / 18.2 | 75.5 / 5.4 | 67.0 / 3.3 |
| APRIL-GAN | CVPRW 23 | $\color{green}{✘}$ | 0 | 86.1 / - | 78.0 / - | - |
| AnomalyCLIP | ICLR 24 | $\color{green}{✘}$ | 0 | 91.6 / 34.5 | 82.0 / 21.3 | 69.5 / 26.7 |
| AdaCLIP | ECCV 24 | $\color{green}{✘}$ | 0 | 90.7 / 39.1 | 81.7 / 31.0 | 73.3 / 30.5 |
| VCPCLIP | ECCV 24 | $\color{green}{✘}$ | 0 | 92.1 / 49.4 | 83.8 / 30.1 | - |
| Bayes-PFL | CVPR 25 | $\color{green}{✘}$ | 0 | 92.5 / 48.3 | 87.0 / 29.8 | 70.0 / 27.6 |
| AA-CLIP | CVPR 25 | $\color{green}{✘}$ | 0 | 90.5 / - | 84.6 / - | - |
| FE-CLIP | ICCV 25 | $\color{green}{✘}$ | 0 | 91.9 / - | 84.6 / - | - |
| FAPrompt | ICCV 25 | $\color{green}{✘}$ | 0 | 91.9 / - | 84.6 / - | - |
| RareCLIP | ICCV 25 | $\color{green}{✘}$ | 0 | 91.5 / 46.1 | 86.1 / 27.0 | - |
| AdaptCLIP | AAAI 26 | $\color{green}{✘}$ | 0 | 93.5 / 38.3 | 84.8 / 26.1 | 74.2 / 28.2 |

| Methods | Venue | Language-Free | Shots | MVTec | VisA | Real-IAD |
| --- | --- | --- | --- | --- | --- | --- |
| UniADet $^‡$ | ours | $\color{red}{\checkmark}$ | 1 | 97.6 / 63.1 | 95.2 / 42.1 | 88.7 / 48.4 |
| UniADet $^‡$ | ours | $\color{red}{\checkmark}$ | 2 | 98.0 / 64.1 | 96.1 / 44.2 | 89.0 / 46.7 |
| UniADet $^‡$ | ours | $\color{red}{\checkmark}$ | 4 | 98.7 / 65.4 | 96.9 / 45.2 | 90.3 / 48.5 |
| MetaUAS | NeurIPS 24 | $\color{red}{\checkmark}$ | 1 | 90.7 / 59.3 | 81.2 / 42.7 | 80.0 / 36.6 |
| APRIL-GAN | CVPRW 23 | $\color{green}{✘}$ | 4 | 92.8 / 54.5 | 92.6 / 32.2 | - |
| PromptAD | CVPR 24 | $\color{green}{✘}$ | 4 | 96.6 / 52.9 | 89.1 / 31.5 | - |
| UniVAD | CVPR 25 | $\color{green}{✘}$ | 1 | 97.8 / 55.6 | 93.5 / 42.8 | 85.1 / 37.6 |
| AdaptCLIP | AAAI 26 | $\color{green}{✘}$ | 1 | 94.5 / 53.7 | 90.5 / 38.9 | 81.8 / 36.6 |
| AdaptCLIP | AAAI 26 | $\color{green}{✘}$ | 2 | 95.7 / 55.1 | 92.2 / 40.7 | 82.9 / 37.8 |
| AdaptCLIP | AAAI 26 | $\color{green}{✘}$ | 4 | 96.6 / 57.2 | 93.1 / 41.8 | 83.9 / 39.1 |

| Methods | Venue | Language-Free | Setting | MVTec | VisA | Real-IAD |
| --- | --- | --- | --- | --- | --- | --- |
| Dinomaly | CVPR 25 | $\color{red}{\checkmark}$ | multi-class (full train set) | 99.6 / 69.3 | 98.7 / 53.2 | 89.3 / 42.8 |
| UniAD | NeurIPS 24 | $\color{red}{\checkmark}$ | multi-class (full train set) | 96.5 / 44.7 | 90.8 / 33.6 | 83.0 / 21.1 |
| MuSc | ICLR 24 | $\color{red}{\checkmark}$ | online (full test set) | 97.8 / 62.7 | 92.8 / 45.1 | - |

Important

Note1: If you find that any existing zero-shot/few-shot AD methods are missing from the table above, please feel free to open an issue so we can add them.

Note2: Dinomaly and UniAD are multi-class unsupervised AD algorithms, and they require dataset-specific training with full normal images.

Note3: MuSc is an online algorithm that requires statistics from the entire test set to evaluate the current image. Therefore, it is not strictly a zero-shot AD method.

🚀 Complexity and Efficiency Comparisons

| Shots | Methods | Models | Input Size | # Params (M) | Inf. Time (ms) |
| --- | --- | --- | --- | --- | --- |
| 0 | AdaCLIP | CLIP ViT-L/14@336px | 518×518 | 428.8 + 1.1e+1 | 107.4 |
| 0 | AnomalyCLIP | CLIP ViT-L/14@336px | 518×518 | 427.9 + 5.6e+0 | 70.7 |
| 0 | Bayes-PFL | CLIP ViT-L/14@336px | 518×518 | 427.9 + 2.7e+1 | 154.9 |
| 0 | AdaptCLIP-Zero | CLIP ViT-L/14@336px | 518×518 | 427.9 + 6.0e-1 | 57.5 |
| 0 | UniADet $^†$ | CLIP ViT-L/14@336px | 518×518 | 342.9 + 1.5e-2 | 15.7 |
| 0 | UniADet $^‡$ | DINOv2 ViT-L/14 | 518×518 | 303.2 + 2.0e-2 | 41.9 |
| 1 | InCtrl | CLIP ViT-B-16+240 | 240×240 | 208.4 + 3.0e-1 | 59.0 |
| 1 | AnomalyCLIP+ | CLIP ViT-L/14@336px | 518×518 | 427.9 + 5.6e+0 | 76.2 |
| 1 | AdaptCLIP | CLIP ViT-L/14@336px | 518×518 | 342.9 + 1.8e+0 | 58.7 |
| 1 | UniADet $^†$ | CLIP ViT-L/14@336px | 518×518 | 342.9 + 1.5e-2 | 22.4 |
| 1 | UniADet $^‡$ | DINOv2 ViT-L/14 | 518×518 | 303.2 + 2.0e-2 | 48.4 |

Note: The learnable-parameter counts of 1.5e-3 and 2.0e-3 for UniADet $^†$ and UniADet $^‡$ are incorrect. The correct values are 1.5e-2 and 2.0e-2, respectively.
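
To put the table in perspective, the short script below computes how the zero-shot baselines compare with UniADet $^†$ in learnable parameters and inference time. The numbers are copied from the zero-shot rows above. Reading the second term of "# Params (M)" as the learnable part follows the note above. The dictionary layout and the helper name `compare` are illustrative.

```python
# Zero-shot rows from the table above:
# (first term of "# Params" in M, learnable params in M, inference time in ms).
zero_shot = {
    "AdaCLIP": (428.8, 11.0, 107.4),
    "AnomalyCLIP": (427.9, 5.6, 70.7),
    "Bayes-PFL": (427.9, 27.0, 154.9),
    "AdaptCLIP-Zero": (427.9, 0.6, 57.5),
    "UniADet (CLIP)": (342.9, 0.015, 15.7),
}

ref_name = "UniADet (CLIP)"
_, ref_learnable, ref_time = zero_shot[ref_name]

def compare(name):
    """Return (learnable-parameter ratio, inference-time ratio) vs. UniADet."""
    _, learnable, time_ms = zero_shot[name]
    return learnable / ref_learnable, time_ms / ref_time

for name in zero_shot:
    if name != ref_name:
        p_ratio, t_ratio = compare(name)
        print(f"{name}: {p_ratio:.0f}x learnable params, {t_ratio:.1f}x inference time")
```

For example, AnomalyCLIP trains roughly 373 times as many parameters as UniADet $^†$ and takes about 4.5 times as long per image, according to the table's figures.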

🔍 Ablation Studies

Ablation study of the individual components.

| No | DCS | DHF | CAA | Shot | MVTec | VisA |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | $\color{green}{✘}$ | $\color{green}{✘}$ | $\color{green}{✘}$ | 0 | 85.4 / 36.4 | 77.9 / 26.1 |
| 1 | $\color{red}{\checkmark}$ | $\color{green}{✘}$ | $\color{green}{✘}$ | 0 | 91.8 / 38.3 | 85.9 / 27.2 |
| 2 | $\color{red}{\checkmark}$ | $\color{red}{\checkmark}$ | $\color{green}{✘}$ | 0 | 92.2 / 40.7 | 86.0 / 27.6 |
| 3 | $\color{red}{\checkmark}$ | $\color{red}{\checkmark}$ | $\color{red}{\checkmark}$ | 0 | 92.4 / 42.8 | 88.0 / 28.0 |
| 4 | $\color{red}{\checkmark}$ | $\color{red}{\checkmark}$ | random | 0 | 91.3 / 41.5 | 87.5 / 26.6 |
| 5 | $\color{red}{\checkmark}$ | $\color{red}{\checkmark}$ | $\color{red}{\checkmark}$ | 1 | 95.9 / 54.6 | 91.3 / 32.5 |

Note: The ablation studies are conducted with UniADet $^†$ (i.e., using CLIP ViT-L/14@336px).

⚖️ Comparison with Language-Dependent AnomalyCLIP

  • Essential Differences

| Feature | AnomalyCLIP | UniADet (Ours) |
| --- | --- | --- |
| 🧠 Paradigm | 🔴 Language-Dependent | ✅ Language-Free |
| ⚡ Task Decoupling | Shared Cls/Seg Weight $W$ | ✅ Decoupled Cls/Seg Weights $W_{cls}$, $W_{seg}$ |
| 🏗️ Hierarchical Decoupling | Single / Last Layer | ✅ Layer-Wise Cls/Seg Weights $W_{cls}^l$, $W_{seg}^l$ |
| 🤖 Backbones | CLIP Only | ✅ CLIP, DINOv2-R, DINOv3 |
| 📉 Params | ⚠️ 130M Text-Encoder + 5.6M (Heavy) | 🚀 0.015M or 0.020M (Efficient) |

Important

Note1: The official AnomalyCLIP team has clarified that their initial multi-layer claim was a code bug; the effective implementation relies only on the last layer.

Note2: A naive extension of AnomalyCLIP to multiple blocks leads to significant performance degradation.

  • Commonality

| Feature | AnomalyCLIP & UniADet |
| --- | --- |
| Core Objective | Learning Normal / Anomaly Weights |
| Loss Function | CE for Cls and Focal + Dice for Seg |
| Training Data | Auxiliary Data (e.g., VisA or MVTec) |

  • Performance on VisA

| Shots | Backbones | AnomalyCLIP | UniADet (Ours) |
| --- | --- | --- | --- |
| 0-Shot | CLIP | 82.0 / 21.3 | 88.0 / 28.0 |
| 0-Shot | DINOv2-R | Not supported | 91.3 / 32.7 |
| 0-Shot | DINOv3 | Not supported | 91.9 / 32.5 |
| 4-Shot | CLIP | Not supported | 93.3 / 36.7 |
| 4-Shot | DINOv2-R | Not supported | 96.9 / 45.2 |
| 4-Shot | DINOv3 | Not supported | 97.1 / 45.5 |
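
The loss combination listed under Commonality (cross-entropy for classification, focal plus Dice for segmentation) can be written out as follows. This is a minimal NumPy sketch of the standard definitions, not the training code of either method: the focusing parameter `gamma=2.0`, the Dice smoothing term, the equal weighting of the three terms, and all function names are assumptions.

```python
import numpy as np

EPS = 1e-7  # numerical guard for logarithms

def cross_entropy(probs, label):
    """Image-level CE; probs = [p_normal, p_anomaly], label in {0, 1}."""
    return float(-np.log(np.clip(probs[label], EPS, 1.0)))

def focal_loss(p_anomaly, mask, gamma=2.0):
    """Binary focal loss averaged over pixels (gamma is an assumed default)."""
    p = np.clip(p_anomaly, EPS, 1.0 - EPS)
    p_t = np.where(mask == 1, p, 1.0 - p)  # probability of the true class
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

def dice_loss(p_anomaly, mask, smooth=1.0):
    """One minus the soft Dice coefficient between prediction and mask."""
    inter = np.sum(p_anomaly * mask)
    return float(1.0 - (2.0 * inter + smooth) / (np.sum(p_anomaly) + np.sum(mask) + smooth))

def total_loss(cls_probs, label, p_anomaly, mask):
    # Equal weighting of the three terms is an assumption of this sketch.
    return cross_entropy(cls_probs, label) + focal_loss(p_anomaly, mask) + dice_loss(p_anomaly, mask)

# Toy example: a 2x2 mask with one anomalous pixel.
mask = np.array([[0, 1], [0, 0]])
good_map = np.array([[0.1, 0.9], [0.1, 0.1]])  # mostly correct prediction
bad_map = 1.0 - good_map                        # inverted prediction
loss_good = total_loss([0.2, 0.8], 1, good_map, mask)
loss_bad = total_loss([0.8, 0.2], 1, bad_map, mask)
```

Focal loss down-weights the many easy normal pixels, while Dice measures overlap with the anomalous region directly, so the two terms complement each other when anomalies occupy only a small part of the image.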

📌 ToDo List

📖 Citation

If you find this work useful in your research, please consider citing:

@inproceedings{uniadet,
  title={One Language-Free Foundation Model Is Enough for Universal Vision Anomaly Detection},
  author={Gao, Bin-Bin and Wang, Chengjie},
  booktitle={arXiv:2601.05552},
  year={2026}
}
