CLIP bias analysis: hand-crafted features beat CLIP on non-CLIP generators by wfproc · Pull Request #53 · darkshapes/negate

wfproc · 2026-03-25T18:09:39Z

Summary

Investigates whether CLIP embeddings detect genuine AI artifacts or just recognize their own latent fingerprint in images from CLIP-based generators.

Finding: CLIP detection is biased. On the Defactify MS-COCOAI dataset (96K images, 5 labeled generators, semantically matched captions):

Generator	Uses CLIP?	Hand-crafted (64)	CLIP (512)	CLIP Advantage
SD 2.1	Yes	86.5%	96.1%	+9.6pp
SDXL	Yes	93.5%	99.0%	+5.5pp
SD 3	Yes	85.4%	97.5%	+12.1pp
Midjourney v6	?	88.5%	99.5%	+11.0pp
DALL-E 3	No	98.7%	98.2%	-0.5pp

CLIP's advantage disappears entirely on DALL-E 3 (which uses T5, not CLIP). Hand-crafted features (artwork + style, 64 total) actually beat CLIP on non-CLIP generators.

As generators move away from CLIP-based architectures, CLIP detection will become less reliable. The hand-crafted features are the more robust long-term signal.

What's in this PR

tests/test_clip_bias_defactify.py — per-generator CLIP vs hand-crafted benchmark
tests/generate_final_report.py — generates consolidated PDF
results/negate_research_report.pdf — single 5-page report replacing all prior PDFs
Removes old per-experiment PDFs (superseded by consolidated report)

How to reproduce

uv run python tests/test_clip_bias_defactify.py
uv run python tests/generate_final_report.py

See results/EXPERIMENTS.md for full write-up with code links.

Confirms that CLIP-based detection is biased toward generators that use CLIP internally. Tested on Defactify MS-COCOAI dataset (96K images, 5 labeled generators, semantically matched captions): Generator Uses CLIP? Hand-crafted CLIP Delta SD 2.1 YES 86.5% 96.1% +9.6pp SDXL YES 93.5% 99.0% +5.5pp SD 3 YES 85.4% 97.5% +12.1pp Midjourney v6 Unknown 88.5% 99.5% +11.0pp DALL-E 3 NO 98.7% 98.2% -0.5pp CLIP advantage on CLIP generators: +9.1pp average CLIP advantage on non-CLIP generators: -0.5pp (hand-crafted wins) Replaces per-experiment PDFs with single consolidated research report (negate_research_report.pdf) covering all experiments, scaling analysis, CLIP bias findings, and recommended next steps.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLIP bias analysis: hand-crafted features beat CLIP on non-CLIP generators#53

CLIP bias analysis: hand-crafted features beat CLIP on non-CLIP generators#53
wfproc wants to merge 1 commit intodarkshapes:mainfrom
wfproc:feature/artwork-detection

wfproc commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

wfproc commented Mar 25, 2026

Summary

What's in this PR

How to reproduce

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant