
Add style features, CLIP comparison, and experiment analysis#52

Merged
exdysa merged 2 commits into darkshapes:main from wfproc:feature/artwork-detection
Mar 23, 2026

Conversation


@wfproc wfproc commented Mar 23, 2026

Summary

  • Adds style-specific feature extractor (feature_style.py) — 15 features targeting brush strokes, color palettes, composition, and micro-texture
  • Adds comprehensive experiment comparison testing 5 feature approaches on the Hemg AI-Art vs Real-Art dataset (4000 images, 5-fold CV)
  • Adds reproducible benchmark scripts anyone can run with `uv run python tests/test_experiments.py`
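As an illustration of the feature families the summary mentions (brush strokes, color palette, micro-texture), here is a minimal sketch of two such features in NumPy. The feature names and formulas are hypothetical stand-ins; the actual `feature_style.py` computes 15 features and may define them differently:

```python
import numpy as np

def style_features(img: np.ndarray) -> dict:
    """Illustrative style features for an RGB image of shape (H, W, 3),
    values in [0, 1]. Hypothetical examples only, not the PR's real code."""
    gray = img.mean(axis=2)
    # Stroke / micro-texture proxy: mean gradient magnitude of the grayscale image.
    gy, gx = np.gradient(gray)
    stroke_energy = float(np.sqrt(gx ** 2 + gy ** 2).mean())
    # Palette proxy: entropy of a coarse 8-bins-per-channel color histogram.
    quant = (img * 7).astype(int)
    codes = quant[..., 0] * 64 + quant[..., 1] * 8 + quant[..., 2]
    counts = np.bincount(codes.ravel(), minlength=512)
    p = counts / counts.sum()
    p = p[p > 0]
    palette_entropy = float(-(p * np.log2(p)).sum())
    return {"stroke_energy": stroke_energy, "palette_entropy": palette_entropy}

rng = np.random.default_rng(0)
feats = style_features(rng.random((64, 64, 3)))
```

A real extractor would stack such scalars into the 15-dimensional vector fed to the classifier.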

Results

| Approach | Features | Accuracy | AUC |
| --- | --- | --- | --- |
| Artwork (Li & Stamp + FFT) | 49 | 79.4% | 0.886 |
| Style (stroke/palette/comp) | 15 | 78.8% | 0.883 |
| Artwork + Style | 64 | 83.5% | 0.923 |
| CLIP ViT-B/32 | 512 | 89.3% | 0.963 |
| All combined | 576 | 90.0% | 0.966 |

Key finding: CLIP embeddings outperform all hand-crafted features by +10pp on fair art-vs-art data. See results/EXPERIMENTS.md for full analysis, code pointers, limitations, and recommendations.
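The evaluation protocol behind each row of the table (5-fold cross-validation over a fixed feature matrix) can be sketched in pure NumPy. The data here is synthetic and the nearest-centroid classifier is a stand-in for whatever model the benchmark actually trains:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
y = rng.integers(0, 2, size=n)
# Synthetic 49-dim "hand-crafted" feature matrix with a class-dependent shift as signal.
X = rng.normal(size=(n, 49)) + y[:, None] * 0.5

def cv_accuracy(X: np.ndarray, y: np.ndarray, k: int = 5) -> float:
    """Mean held-out accuracy over k folds, using a nearest-class-centroid
    classifier (a stand-in for the PR's real model)."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        c0 = X[train][y[train] == 0].mean(axis=0)
        c1 = X[train][y[train] == 1].mean(axis=0)
        d0 = np.linalg.norm(X[test] - c0, axis=1)
        d1 = np.linalg.norm(X[test] - c1, axis=1)
        pred = (d1 < d0).astype(int)
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))

acc = cv_accuracy(X, y)
```

Swapping in a 512-column embedding matrix instead of `X` reproduces the shape of the CLIP rows; the accuracy/AUC numbers in the table come from the real dataset and model, not this sketch.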

Test plan

  • `uv run pytest -v` passes
  • `uv run python tests/test_experiments.py` reproduces the results table above

wfproc added 2 commits March 23, 2026 10:12
New feature_style.py extracts 15 art-specific features (stroke patterns,
color palette, composition, micro-texture) for AI artwork detection.

EXPERIMENTS.md documents all feature experiments run on the Hemg AI-Art vs
Real-Art dataset (4000 images, 5-fold CV), with results:

  Artwork features (49):     79.4% acc, 0.886 AUC
  Style features (15):       78.8% acc, 0.883 AUC
  Art + Style combined (64): 83.5% acc, 0.923 AUC
  CLIP ViT-B/32 (512):       89.3% acc, 0.963 AUC
  All combined (576):        90.0% acc, 0.966 AUC

Key finding: CLIP embeddings outperform all hand-crafted features by +10pp.
Combining hand-crafted features with CLIP adds only +0.7pp.

See results/EXPERIMENTS.md for full analysis, code links, and limitations.
Reproducible test scripts for evaluating artwork detection features:

- test_experiments.py: runs all 5 feature experiments (artwork, style,
  CLIP, combined) with 5-fold CV on Hemg dataset
- test_fair_evaluation.py: tests on semantically-similar datasets to
  control for content confounds
- test_scale_evaluation.py: measures accuracy vs training set size
  (400 to 4000 samples)
- generate_fair_eval_pdf.py: generates timestamped PDF reports

Run with: uv run python tests/test_experiments.py

Includes result PDFs and JSON for reference. See EXPERIMENTS.md for
interpretation.
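The accuracy-vs-training-size sweep that test_scale_evaluation.py performs (400 to 4000 samples) could be sketched like this. The data generator and classifier are hypothetical stand-ins; the real script loads the Hemg dataset and the project's own feature extractors:

```python
import numpy as np

rng = np.random.default_rng(7)

def make_data(n: int, d: int = 15):
    """Synthetic stand-in for extracted feature vectors with binary labels."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d)) + y[:, None] * 0.4
    return X, y

def holdout_accuracy(n_train: int, n_test: int = 500) -> float:
    """Train a nearest-centroid classifier on n_train samples, evaluate on a
    fixed-size held-out set."""
    Xtr, ytr = make_data(n_train)
    Xte, yte = make_data(n_test)
    c0 = Xtr[ytr == 0].mean(axis=0)
    c1 = Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - c1, axis=1) < np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return float((pred == yte).mean())

# Mirror the 400 -> 4000 sweep from the commit message.
curve = {n: holdout_accuracy(n) for n in (400, 1000, 2000, 4000)}
```

Plotting `curve` gives the accuracy-vs-scale trend; on real data the curve typically flattens as the classifier saturates the feature set's signal.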
@exdysa exdysa merged commit 7a67d76 into darkshapes:main Mar 23, 2026
1 check passed