
fix(assertScreenshot): report match percent that aligns with the threshold #3261

Merged
proksh merged 4 commits into main from fix/assert-screenshot-current-percent-metric
May 6, 2026

proksh commented May 5, 2026

Problem

assertScreenshot can fail with a "current" value that looks like it should have passed:

Comparison error: Assert screenshot matches X (threshold: 95.0%) - threshold not met, current: 96.46815%

96.46815% > 95%, yet the assertion fails. This is the symptom seen in #3259 and reported by Doximity.

Root cause

The MISMATCH branch in Orchestra#assertScreenshotCommand reported the current match as 100 - ImageComparisonResult.differencePercent. Inside com.github.romankh3:image-comparison, two unrelated metrics are at play:

| Used for | Source | What it measures |
|---|---|---|
| MATCH/MISMATCH gate | ImageComparison.isAllowedPercentOfDifferentPixels (private) | count of pixels exceeding pixelToleranceLevel |
| result.differencePercent | ImageComparisonUtil.getDifferencePercent (public) | average per-channel RGB intensity delta over the image |

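These can diverge in either direction. A pixel past pixelToleranceLevel counts fully toward the gate however modest its color change, while the average weights every pixel by how much it changed. Many moderately changed pixels can therefore push the count past the threshold while keeping the average small, so the threshold check fails even though the message reports a high "match %".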
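To make the divergence concrete, here is a minimal Kotlin sketch of the two metrics as the table describes them, computed over plain ARGB pixel arrays. The function names are hypothetical and are not the library's API. Alpha is ignored. The tolerance is assumed to be a fraction of the maximum RGB distance, 255 * sqrt(3).

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

// Hypothetical helpers, not the image-comparison library's API.
// Pixels are packed 0xAARRGGBB ints; alpha is ignored.
fun red(p: Int) = (p shr 16) and 0xff
fun green(p: Int) = (p shr 8) and 0xff
fun blue(p: Int) = p and 0xff

/** Gate-style metric: percent of pixels whose Euclidean RGB distance exceeds the tolerance. */
fun differentPixelPercent(expected: IntArray, actual: IntArray, pixelToleranceLevel: Double): Double {
    require(expected.size == actual.size && expected.isNotEmpty())
    // Assumption: the tolerance is a fraction of the maximum RGB distance, 255 * sqrt(3).
    val limit = pixelToleranceLevel * 255 * sqrt(3.0)
    var different = 0
    for (i in expected.indices) {
        val dr = red(expected[i]) - red(actual[i])
        val dg = green(expected[i]) - green(actual[i])
        val db = blue(expected[i]) - blue(actual[i])
        if (sqrt((dr * dr + dg * dg + db * db).toDouble()) > limit) different++
    }
    return 100.0 * different / expected.size
}

/** Reported metric: summed per-channel RGB delta as a percent of its maximum (3 * 255 per pixel). */
fun averageColorDeltaPercent(expected: IntArray, actual: IntArray): Double {
    require(expected.size == actual.size && expected.isNotEmpty())
    var total = 0L
    for (i in expected.indices) {
        total += abs(red(expected[i]) - red(actual[i])) +
            abs(green(expected[i]) - green(actual[i])) +
            abs(blue(expected[i]) - blue(actual[i]))
    }
    return 100.0 * total / (3L * 255 * expected.size)
}
```

Suppose 10 of 100 black pixels are turned mid-gray (0x40 per channel). differentPixelPercent(..., 0.1) returns 10.0, while averageColorDeltaPercent returns about 2.51. Reporting 100 minus the average gives roughly 97.5% "match", even though only 90% of pixels pass the tolerance. That is the same shape as the failure above.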

The library does not expose its count-based metric, and 4.4.0 (March 2021) is the latest release.

Fix

Introduce ScreenshotMatch as the single source of truth for the comparison:

  • ScreenshotMatch.matchPercentage(expected, actual, pixelToleranceLevel) walks pixels using the same Euclidean distance rule the library uses internally, returning the count-based percentage.
  • ScreenshotMatch.compare(expected, actual, threshold, diffFile) owns both the pass/fail decision and the reported percent — the threshold gate and the user-visible "current %" are now the same number by construction. The library is invoked only as a diff-PNG renderer on Mismatch.
  • Orchestra.kt becomes a when expression over the typed ScreenshotMatch.Result and drops the ImageComparison / ImageComparisonState imports.
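The bullets above can be sketched roughly as follows. This is a hypothetical, simplified shape written from the description, not the merged ScreenshotMatch code. ScreenshotMatchSketch, its Result cases and the compare signature are assumptions. The tolerance is assumed to scale the maximum RGB distance 255 * sqrt(3). Diff-PNG rendering, the library's only remaining job, is left out.

```kotlin
import java.awt.image.BufferedImage
import kotlin.math.sqrt

// Hypothetical sketch modelled on the PR description; not the merged ScreenshotMatch code.
object ScreenshotMatchSketch {
    sealed class Result {
        data class Match(val matchPercent: Double) : Result()
        data class Mismatch(val matchPercent: Double) : Result()
        data class SizeMismatch(val expected: Pair<Int, Int>, val actual: Pair<Int, Int>) : Result()
    }

    /** Percent of pixels whose RGB Euclidean distance stays within the tolerance (alpha ignored). */
    fun matchPercentage(expected: BufferedImage, actual: BufferedImage, pixelToleranceLevel: Double): Double {
        require(expected.width == actual.width && expected.height == actual.height)
        // Assumption: the tolerance scales the maximum RGB distance, 255 * sqrt(3); compare squared values.
        val limitSquared = (pixelToleranceLevel * 255 * sqrt(3.0)).let { it * it }
        var different = 0L
        for (y in 0 until expected.height) for (x in 0 until expected.width) {
            val e = expected.getRGB(x, y)
            val a = actual.getRGB(x, y)
            if (e == a) continue
            val dr = ((e shr 16) and 0xff) - ((a shr 16) and 0xff)
            val dg = ((e shr 8) and 0xff) - ((a shr 8) and 0xff)
            val db = (e and 0xff) - (a and 0xff)
            if ((dr * dr + dg * dg + db * db).toDouble() > limitSquared) different++
        }
        val total = expected.width.toLong() * expected.height
        return 100.0 * (total - different) / total
    }

    /** One number drives both the pass/fail decision and the reported percent. */
    fun compare(
        expected: BufferedImage,
        actual: BufferedImage,
        thresholdPercent: Double,
        pixelToleranceLevel: Double,
    ): Result {
        if (expected.width != actual.width || expected.height != actual.height) {
            return Result.SizeMismatch(expected.width to expected.height, actual.width to actual.height)
        }
        val percent = matchPercentage(expected, actual, pixelToleranceLevel)
        return if (percent >= thresholdPercent) Result.Match(percent) else Result.Mismatch(percent)
    }
}
```

Match and Mismatch carry the same matchPercent that decided them. A caller therefore has no second number to print that could disagree with the gate.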

Behaviour change

  • Passing assertions: unchanged. The library's gate already used the count-based metric; we replicate it.
  • Failing assertions: the printed current % is now consistent with thresholdPercentage. A failure that previously read current: 96.46815% against threshold: 95.0% now reports a value below 95%, which matches what users expect.
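On the caller side, a when over a typed result can build the failure message from the very number the gate used. The sketch below is illustrative only. ScreenshotResult and failureMessage are hypothetical stand-ins. The size-mismatch wording is invented; only the Mismatch text follows the message format quoted in the Problem section.

```kotlin
// Hypothetical stand-in for the typed result; not Maestro's actual types.
sealed interface ScreenshotResult {
    data class Match(val matchPercent: Double) : ScreenshotResult
    data class Mismatch(val matchPercent: Double) : ScreenshotResult
    data class SizeMismatch(val expected: String, val actual: String) : ScreenshotResult
}

/** Returns null on success, otherwise an error message built from the gate's own number. */
fun failureMessage(name: String, threshold: Double, result: ScreenshotResult): String? = when (result) {
    is ScreenshotResult.Match -> null
    is ScreenshotResult.Mismatch ->
        "Assert screenshot matches $name (threshold: $threshold%) - threshold not met, current: ${result.matchPercent}%"
    is ScreenshotResult.SizeMismatch ->
        "Assert screenshot matches $name - size mismatch: expected ${result.expected}, actual ${result.actual}"
}
```

The when is exhaustive over the sealed interface. Adding a new result case therefore fails compilation until the caller handles it.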

Tests

maestro-orchestra/src/test/.../AssertScreenshotMatchTest.kt adds 5 cases:

  1. compare returns Match when match percent meets threshold — happy path on real screenshots.
  2. compare returns Mismatch when match percent falls below threshold and writes diff file — failure path, asserts the diff PNG is produced.
  3. compare returns SizeMismatch for differently sized images — typed-result coverage.
  4. match percentage agrees with library MATCH/MISMATCH decision at the boundary — pins our pixel walk to the library's gate at three pixelToleranceLevel values (0.0, 0.05, 0.1). Catches drift if the library's pixel-difference math ever changes.
  5. match percentage exposes count-based metric distinct from library's average color delta — locks in the metric distinction the bug came from.

The two expected.png / actual.png fixtures (~200KB each) are real-world iPhone screenshots that differ only by content in the status bar — happy to swap for synthetic images if reviewers prefer.
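If synthetic images are preferred, a pair that differs only in a top band can be generated inside the test instead of checked in. syntheticStatusBarPair is a hypothetical helper. The white and mid-gray colors are arbitrary choices. Their per-pixel change (127 per channel) is roughly half the maximum RGB distance, well past a 0.1 tolerance.

```kotlin
import java.awt.image.BufferedImage

// Hypothetical helper: an expected/actual pair that differs only in a top "status bar" band.
fun syntheticStatusBarPair(width: Int, height: Int, barHeight: Int): Pair<BufferedImage, BufferedImage> {
    require(width > 0 && barHeight in 1..height)
    // Fill a fresh RGB image with white.
    fun base() = BufferedImage(width, height, BufferedImage.TYPE_INT_RGB).also { img ->
        for (y in 0 until height) for (x in 0 until width) img.setRGB(x, y, 0xFFFFFF)
    }
    val expected = base()
    val actual = base()
    // Only the bar changes: white -> mid-gray.
    for (y in 0 until barHeight) for (x in 0 until width) actual.setRGB(x, y, 0x808080)
    return expected to actual
}
```

If a test needs the pair on disk, ImageIO.write(image, "png", file) can persist each image.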

Out of scope

This PR is the fix only. The repro flow + demo-app screen are owned by #3259 and should land separately so the new flow can verify the corrected message end-to-end.

…shold

The MISMATCH error message printed `100 - ImageComparisonResult.differencePercent`,
but `differencePercent` from the underlying image-comparison library is the
*average per-channel RGB intensity delta* across the whole image, while the
MATCH/MISMATCH gate is decided by the *count of pixels exceeding
pixelToleranceLevel*. The two are unrelated metrics, so users could see a
"current" value comfortably above their threshold while the assertion still
failed (e.g. `current: 96.46815%` against `threshold: 95.0%`).

Move the comparison behind a single `ScreenshotMatch.compare(...)` that owns
both the pass/fail decision and the reported percent. The library is now used
only as a diff-image renderer when the assertion fails. A boundary-equivalence
test pins our pixel-walk to the library's MATCH/MISMATCH semantics across
multiple `pixelToleranceLevel` values, so any future drift fails CI.

No behaviour change for passing assertions. Failing assertions now report the
same metric the threshold check was already enforcing.

proksh marked this pull request as draft May 5, 2026 13:48

simon-gilmurray left a comment

🎉

proksh marked this pull request as ready for review May 6, 2026 10:51
proksh merged commit 292a172 into main May 6, 2026
10 checks passed
proksh deleted the fix/assert-screenshot-current-percent-metric branch May 6, 2026 12:14

2 participants