
Add style features, CLIP comparison, and experiment analysis#52

Merged
exdysa merged 2 commits into darkshapes:main from wfproc:feature/artwork-detection
Mar 23, 2026

Conversation


@wfproc wfproc commented Mar 23, 2026

Summary

  • Adds style-specific feature extractor (feature_style.py) — 15 features targeting brush strokes, color palettes, composition, and micro-texture
  • Adds comprehensive experiment comparison testing 5 feature approaches on the Hemg AI-Art vs Real-Art dataset (4000 images, 5-fold CV)
  • Adds reproducible benchmark scripts anyone can run with `uv run python tests/test_experiments.py`
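As an illustration of the feature families the summary mentions (brush strokes, color palette, micro-texture), here is a minimal sketch of two such features in NumPy. The feature names and formulas are hypothetical stand-ins; the actual `feature_style.py` computes 15 features and may define them differently:

```python
import numpy as np

def style_features(img: np.ndarray) -> dict:
    """Illustrative style features for an RGB image of shape (H, W, 3),
    values in [0, 1]. Hypothetical examples only, not the PR's real code."""
    gray = img.mean(axis=2)
    # Stroke / micro-texture proxy: mean gradient magnitude of the grayscale image.
    gy, gx = np.gradient(gray)
    stroke_energy = float(np.sqrt(gx ** 2 + gy ** 2).mean())
    # Palette proxy: entropy of a coarse 8-bins-per-channel color histogram.
    quant = (img * 7).astype(int)
    codes = quant[..., 0] * 64 + quant[..., 1] * 8 + quant[..., 2]
    counts = np.bincount(codes.ravel(), minlength=512)
    p = counts / counts.sum()
    p = p[p > 0]
    palette_entropy = float(-(p * np.log2(p)).sum())
    return {"stroke_energy": stroke_energy, "palette_entropy": palette_entropy}

rng = np.random.default_rng(0)
feats = style_features(rng.random((64, 64, 3)))
```

A real extractor would stack such scalars into the 15-dimensional vector fed to the classifier.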

Results

| Approach | Features | Accuracy | AUC |
| --- | --- | --- | --- |
| Artwork (Li & Stamp + FFT) | 49 | 79.4% | 0.886 |
| Style (stroke/palette/comp) | 15 | 78.8% | 0.883 |
| Artwork + Style | 64 | 83.5% | 0.923 |
| CLIP ViT-B/32 | 512 | 89.3% | 0.963 |
| All combined | 576 | 90.0% | 0.966 |

Key finding: CLIP embeddings outperform all hand-crafted features by +10pp on fair art-vs-art data. See results/EXPERIMENTS.md for full analysis, code pointers, limitations, and recommendations.
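The evaluation protocol behind each row of the table (5-fold cross-validation over a fixed feature matrix) can be sketched in pure NumPy. The data here is synthetic and the nearest-centroid classifier is a stand-in for whatever model the benchmark actually trains:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
y = rng.integers(0, 2, size=n)
# Synthetic 49-dim "hand-crafted" feature matrix with a class-dependent shift as signal.
X = rng.normal(size=(n, 49)) + y[:, None] * 0.5

def cv_accuracy(X: np.ndarray, y: np.ndarray, k: int = 5) -> float:
    """Mean held-out accuracy over k folds, using a nearest-class-centroid
    classifier (a stand-in for the PR's real model)."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        c0 = X[train][y[train] == 0].mean(axis=0)
        c1 = X[train][y[train] == 1].mean(axis=0)
        d0 = np.linalg.norm(X[test] - c0, axis=1)
        d1 = np.linalg.norm(X[test] - c1, axis=1)
        pred = (d1 < d0).astype(int)
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))

acc = cv_accuracy(X, y)
```

Swapping in a 512-column embedding matrix instead of `X` reproduces the shape of the CLIP rows; the accuracy/AUC numbers in the table come from the real dataset and model, not this sketch.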

Test plan

  • `uv run pytest -v` passes
  • `uv run python tests/test_experiments.py` reproduces the results table above

wfproc added 2 commits March 23, 2026 10:12
New feature_style.py extracts 15 art-specific features (stroke patterns,
color palette, composition, micro-texture) for AI artwork detection.

EXPERIMENTS.md documents all feature experiments run on the Hemg AI-Art vs
Real-Art dataset (4000 images, 5-fold CV), with results:

  Artwork features (49):     79.4% acc, 0.886 AUC
  Style features (15):       78.8% acc, 0.883 AUC
  Art + Style combined (64): 83.5% acc, 0.923 AUC
  CLIP ViT-B/32 (512):       89.3% acc, 0.963 AUC
  All combined (576):        90.0% acc, 0.966 AUC

Key finding: CLIP embeddings outperform all hand-crafted features by +10pp.
Combining hand-crafted features with CLIP adds only +0.7pp.

See results/EXPERIMENTS.md for full analysis, code links, and limitations.
Reproducible test scripts for evaluating artwork detection features:

- test_experiments.py: runs all 5 feature experiments (artwork, style,
  CLIP, combined) with 5-fold CV on Hemg dataset
- test_fair_evaluation.py: tests on semantically-similar datasets to
  control for content confounds
- test_scale_evaluation.py: measures accuracy vs training set size
  (400 to 4000 samples)
- generate_fair_eval_pdf.py: generates timestamped PDF reports

Run with: uv run python tests/test_experiments.py

Includes result PDFs and JSON for reference. See EXPERIMENTS.md for
interpretation.
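The accuracy-vs-training-size sweep that test_scale_evaluation.py performs (400 to 4000 samples) could be sketched like this. The data generator and classifier are hypothetical stand-ins; the real script loads the Hemg dataset and the project's own feature extractors:

```python
import numpy as np

rng = np.random.default_rng(7)

def make_data(n: int, d: int = 15):
    """Synthetic stand-in for extracted feature vectors with binary labels."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d)) + y[:, None] * 0.4
    return X, y

def holdout_accuracy(n_train: int, n_test: int = 500) -> float:
    """Train a nearest-centroid classifier on n_train samples, evaluate on a
    fixed-size held-out set."""
    Xtr, ytr = make_data(n_train)
    Xte, yte = make_data(n_test)
    c0 = Xtr[ytr == 0].mean(axis=0)
    c1 = Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - c1, axis=1) < np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return float((pred == yte).mean())

# Mirror the 400 -> 4000 sweep from the commit message.
curve = {n: holdout_accuracy(n) for n in (400, 1000, 2000, 4000)}
```

Plotting `curve` gives the accuracy-vs-scale trend; on real data the curve typically flattens as the classifier saturates the feature set's signal.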
@exdysa exdysa merged commit 7a67d76 into darkshapes:main Mar 23, 2026
1 check passed