on-device benchmark harness (v1.1)#1

Merged
2dubu merged 3 commits into main from feature/v1.1-benchmark
Apr 28, 2026

@2dubu 2dubu commented Apr 28, 2026

Summary

  • Add an on-device benchmark harness in Examples/PaletteKitDemo so PaletteKit performance claims can be measured per-device, per-content. Pick a real photo or a synthesized fixture, vary size /
    quantizer / downsample, and export per-stage timings as CSV.
  • Apply bench-first methodology — the harness gates v1.x decisions on CPU/GPU work. The first round on iPhone 15 Pro confirmed that single-shot extraction is fast enough at default options that
    progressive extraction is not worth shipping; v1.1 ships the harness itself instead.
  • Trim README API surface to a method skeleton + DocC link to reduce drift; add a "Benchmark on your device" section.

What's included

Bench harness (Examples/PaletteKitDemo/PaletteKitDemo/Bench/)

  • BenchModels — case definitions (size × quantizer × downsample), per-sample structures, summary aggregates with mean per-stage timings, marketing-name mapping for hardware identifiers.
  • BenchFixture — deterministic synthesized image (gradient + 5 colored blobs + per-pixel noise), plus resizeToSquare(_:side:) for real-photo input center-cropped to each grid size.
  • BenchRunner (@MainActor ObservableObject) — orchestrates warmup + measured runs, captures ExtractionTimings per sample, computes p50/p95/min/max + per-stage means, surfaces failure
    counts.
  • BenchView — Source picker (Synthesized / Photo) with PhotosPicker, Configuration card with per-row InfoButton popovers, Run/Reset/Cancel state machine, Grid-based summary table,
    stacked-bar chart, raw samples disclosure.
  • BenchChart (Swift Charts) — horizontal stacked bars showing decode / sample / quantize means per case.
  • BenchExport — Raw and Summary CSV with headers including PaletteKit version, device identifier, marketing name, source descriptor (synthesized or photo WxH), and optional run note.
  • BenchInfo — popover content for each Configuration field; reusable InfoButton component (compact-adapted on iPhone, fixed-width with text wrap).
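The p50/p95/min/max aggregation that BenchRunner performs can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: `LatencySummary`, `percentile`, and `summarize` are hypothetical names, and the nearest-rank percentile definition is an assumption (the harness may interpolate instead).

```swift
import Foundation

/// Hypothetical summary type; the harness's real aggregate may differ.
struct LatencySummary {
    let p50: Double
    let p95: Double
    let min: Double
    let max: Double
}

/// Nearest-rank percentile over an ascending-sorted, non-empty array.
/// Assumption: nearest-rank, with no interpolation between samples.
func percentile(_ sorted: [Double], _ p: Double) -> Double {
    precondition(!sorted.isEmpty, "need at least one sample")
    let rank = Int((p / 100.0 * Double(sorted.count)).rounded(.up))
    // Clamp the 1-based rank into the valid range, then convert to an index.
    let index = Swift.min(Swift.max(rank, 1), sorted.count) - 1
    return sorted[index]
}

/// Summarizes measured samples (milliseconds); returns nil when every run failed.
func summarize(_ samplesMs: [Double]) -> LatencySummary? {
    guard !samplesMs.isEmpty else { return nil }
    let sorted = samplesMs.sorted()
    return LatencySummary(
        p50: percentile(sorted, 50),
        p95: percentile(sorted, 95),
        min: sorted[0],
        max: sorted[sorted.count - 1]
    )
}
```

Warmup runs would be dropped before calling `summarize`, so only measured samples feed the percentiles. With few measured runs, nearest-rank p95 collapses to the maximum, which is worth keeping in mind when reading the summary table.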

Repo

  • README "## API at a glance" table (~30 lines) replaced by a method skeleton and a DocC reference link. ExtractionOptions defaults now live in Options.md as the single source of truth.
  • README "## Benchmark on your device" section introduces the harness and the benchmark/ convention.
  • .gitignore adds benchmark/ so CSV exports stay local without leaking device-specific results into the repo.

Methodology notes

The harness exists because v1.0 shipped with provisional CPU/GPU thresholds (metalAutoThreshold = 500_000) and a quality.stride default that no one had measured under load. v1.1 ships the
measurement infrastructure first; v1.x feature work queues behind it.

Findings from the first round (iPhone 15 Pro, both synthesized and a 12 MP photo):

  • Default options (Downsample.automatic(maxPixels: 1_000_000)) flatten quantize cost to ~80–140 ms regardless of input size. Metal and CPU differ by <5 ms (within noise).
  • Metal becomes meaningfully faster (~5–10%) only when Downsample.disabled AND sampled pixel count ≥ ~1M — essentially the 4096²+ raw case.
  • The synthesized fixture underestimates real-world latency by 30–60% at small sizes: real photos populate more histogram bins, so MMCQ's median-cut priority queue runs longer.
  • Decode + sample become the dominant cost in raw mode at 4096²+ inputs (>60% of total at 8K). Metal does not accelerate those stages, capping its real-world ceiling.
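A quick worked example of why the default options flatten quantize cost: once the input exceeds the pixel cap, every larger size is reduced to roughly the same pixel count before quantization. The sketch below assumes a uniform scale by sqrt(maxPixels / pixels); `cappedSize` is a hypothetical helper, not PaletteKit's implementation, and it ignores quality.stride.

```swift
import Foundation

/// Hypothetical helper: uniformly scale (width, height) so that the pixel
/// count does not exceed maxPixels. Assumed behavior, not PaletteKit's code.
func cappedSize(width: Int, height: Int, maxPixels: Int) -> (width: Int, height: Int) {
    let pixels = width * height
    // Inputs at or below the cap pass through unchanged.
    guard pixels > maxPixels else { return (width, height) }
    // Scaling both sides by sqrt(maxPixels / pixels) scales the area by maxPixels / pixels.
    let scale = (Double(maxPixels) / Double(pixels)).squareRoot()
    return (Swift.max(1, Int(Double(width) * scale)),
            Swift.max(1, Int(Double(height) * scale)))
}

// Illustrative square sides only; the harness's actual grid may differ.
for side in [512, 1024, 4096] {
    let capped = cappedSize(width: side, height: side, maxPixels: 1_000_000)
    print("\(side)x\(side) -> \(capped.width)x\(capped.height)")
}
```

Under this assumption a 4096² input and a 1024² input both reach the quantizer at about 1000 × 1000, so their quantize timings should be close, while decode and sample still scale with the original size.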

Findings stay local (memory/); README / DESIGN_SPEC / threshold corrections accumulate for a single batch update around v1.2 instead of churning per-discovery.

Test plan

  • swift test — existing PaletteKit tests pass (no library changes in this PR).
  • make demo-app regenerates the Xcode project; iOS simulator build succeeds.
  • Bench screen renders; info popovers expand with text wrap; source picker switches Synthesized ↔ Photo; PhotosPicker loads CGImage and shows thumbnail with original dimensions.
  • On-device runs (iPhone 15 Pro / A17 Pro): three matrix configurations executed (auto-only, raw-only, photo+raw); CSV exports include all expected header lines and per-stage columns; chart
    renders bars correctly.
  • Reset returns the screen to first-entry state (configuration + samples + photo selection cleared).
  • CSV header reflects PaletteKit version, hardware identifier, marketing name, source descriptor, optional run note.
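For readers who want a picture of the export shape, here is a rough sketch of a Summary CSV with metadata header lines. The keys palettekit_version, device_marketing, and source, and the columns decode_mean / sample_mean / quantize_mean, are named in the commit message; everything else is an assumption for illustration, including the `#` prefix, the `key=value` layout, the `device_identifier`, `note`, and `case` names, and the types `BenchRunMetadata` and `StageMeans`. BenchExport's real format may differ.

```swift
import Foundation

// Hypothetical metadata record; only some key names come from the PR text.
struct BenchRunMetadata {
    var paletteKitVersion: String
    var deviceIdentifier: String
    var deviceMarketing: String
    var source: String
    var note: String?
}

// Hypothetical per-case row of per-stage means, in milliseconds.
struct StageMeans {
    var caseName: String
    var decodeMean: Double
    var sampleMean: Double
    var quantizeMean: Double
}

func summaryCSV(metadata: BenchRunMetadata, rows: [StageMeans]) -> String {
    // Assumed layout: "#"-prefixed metadata lines, then a column header row.
    var lines = [
        "# palettekit_version=\(metadata.paletteKitVersion)",
        "# device_identifier=\(metadata.deviceIdentifier)",
        "# device_marketing=\(metadata.deviceMarketing)",
        "# source=\(metadata.source)",
    ]
    // The run note is optional, so it is emitted only when present.
    if let note = metadata.note, !note.isEmpty {
        lines.append("# note=\(note)")
    }
    lines.append("case,decode_mean,sample_mean,quantize_mean")
    for row in rows {
        let means = [row.decodeMean, row.sampleMean, row.quantizeMean]
            .map { String(format: "%.3f", $0) }
        lines.append(([row.caseName] + means).joined(separator: ","))
    }
    return lines.joined(separator: "\n")
}
```

This sketch does not quote fields, so a case name or note containing a comma or newline would need escaping in a real exporter.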

2dubu added 3 commits April 26, 2026 22:55
Seven additions on top of the initial bench harness:

- Per-row info ⓘ popovers explaining each Configuration field.
- Swift Charts stacked-bar showing decode/sample/quantize means.
- Photo source — pick from library, center-cropped to each grid size.
- '.auto' quantizer added so the gating decision itself is measured.
- Run note field, flowed into CSV header.
- CSV header gains palettekit_version, device_marketing, source.
- Summary CSV gains decode_mean / sample_mean / quantize_mean.

Plus polish: marketing-name device labels, tighter fonts, Grid-based
summary table, spinner progress, failure surfacing, Run/Reset/Cancel
state separation.
…mark/

Three small repo-level changes for the v1.1 release:

- README gains a 'Benchmark on your device' section so the new bench
  harness is discoverable.
- 'API at a glance' replaced by a concise method skeleton plus a
  link to the DocC reference. ExtractionOptions defaults move to
  Options.md (single source of truth, no README drift).
- benchmark/ is gitignored — the convention for storing CSV exports
  locally without leaking device-specific results into the repo.
@2dubu 2dubu self-assigned this Apr 28, 2026
@2dubu 2dubu merged commit 94e6d34 into main Apr 28, 2026
1 check passed
@2dubu 2dubu deleted the feature/v1.1-benchmark branch April 28, 2026 16:21