fix(assertScreenshot): report match percent that aligns with the threshold by proksh · Pull Request #3261 · mobile-dev-inc/Maestro

proksh · 2026-05-05T13:35:45Z

Problem

assertScreenshot can fail with a "current" value that looks like it should have passed:

Comparison error: Assert screenshot matches X (threshold: 95.0%) - threshold not met, current: 96.46815%

96.46815% > 95%, yet the assertion fails. This is the symptom seen in #3259 and reported by Doximity.

Root cause

The MISMATCH branch in Orchestra#assertScreenshotCommand printed 100 - ImageComparisonResult.differencePercent. Inside com.github.romankh3:image-comparison, two unrelated metrics are at play:

| Used for | Source | What it measures |
| --- | --- | --- |
| MATCH/MISMATCH gate | `ImageComparison.isAllowedPercentOfDifferentPixels` (private) | count of pixels exceeding `pixelToleranceLevel` |
| `result.differencePercent` | `ImageComparisonUtil.getDifferencePercent` (public) | average per-channel RGB intensity delta over the image |

These can diverge in opposite directions: a small region with a stark change drives the count up while leaving the average tiny, so the threshold check fails while the message reports a high "match %".
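To make the divergence concrete, here is a minimal Kotlin sketch on synthetic images. It is not Maestro or library code: the helper names (`countBasedDifferencePercent`, `averageChannelDeltaPercent`, `whiteImageWithGrayBand`) are hypothetical, and the pixel rule is only assumed to resemble the library's Euclidean distance check described in this PR.

```kotlin
import java.awt.image.BufferedImage
import kotlin.math.abs
import kotlin.math.sqrt

// Illustrative helpers only; all names here are assumptions for this sketch.

/** Percentage of pixels whose Euclidean RGB distance exceeds `tolerance` (0.0..1.0) of the maximum distance. */
fun countBasedDifferencePercent(a: BufferedImage, b: BufferedImage, tolerance: Double): Double {
    require(a.width == b.width && a.height == b.height) { "images must have equal size" }
    val maxDistance = sqrt(3.0 * 255 * 255) // distance between pure black and pure white
    var different = 0L
    for (y in 0 until a.height) {
        for (x in 0 until a.width) {
            val p = a.getRGB(x, y)
            val q = b.getRGB(x, y)
            val dr = ((p shr 16) and 0xff) - ((q shr 16) and 0xff)
            val dg = ((p shr 8) and 0xff) - ((q shr 8) and 0xff)
            val db = (p and 0xff) - (q and 0xff)
            val distance = sqrt((dr * dr + dg * dg + db * db).toDouble())
            if (distance > tolerance * maxDistance) different++
        }
    }
    return 100.0 * different / (a.width.toLong() * a.height)
}

/** Average absolute per-channel delta over the whole image, as a percentage of the 0..255 range. */
fun averageChannelDeltaPercent(a: BufferedImage, b: BufferedImage): Double {
    require(a.width == b.width && a.height == b.height) { "images must have equal size" }
    var sum = 0L
    for (y in 0 until a.height) {
        for (x in 0 until a.width) {
            val p = a.getRGB(x, y)
            val q = b.getRGB(x, y)
            sum += abs(((p shr 16) and 0xff) - ((q shr 16) and 0xff)) +
                abs(((p shr 8) and 0xff) - ((q shr 8) and 0xff)) +
                abs((p and 0xff) - (q and 0xff))
        }
    }
    return 100.0 * sum / (3.0 * 255 * a.width * a.height)
}

/** Synthetic fixture: a white square whose top `bandRows` rows are mid-gray. */
fun whiteImageWithGrayBand(bandRows: Int, size: Int = 100): BufferedImage {
    val image = BufferedImage(size, size, BufferedImage.TYPE_INT_RGB)
    for (y in 0 until size) {
        for (x in 0 until size) {
            image.setRGB(x, y, if (y < bandRows) 0x808080 else 0xFFFFFF)
        }
    }
    return image
}
```

Comparing `whiteImageWithGrayBand(0)` with `whiteImageWithGrayBand(8)` at a tolerance of 0.1, the count-based helper reports 8% of pixels as different (a 92% match, below a 95% threshold), while `100 - averageChannelDeltaPercent(...)` comes out at about 96%. That is the same shape as the confusing message quoted under Problem.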

The library does not expose its count-based metric, and 4.4.0 (March 2021) is the latest release.

Fix

Introduce ScreenshotMatch as the single source of truth for the comparison:

ScreenshotMatch.matchPercentage(expected, actual, pixelToleranceLevel) walks pixels using the same Euclidean distance rule the library uses internally, returning the count-based percentage.
ScreenshotMatch.compare(expected, actual, threshold, diffFile) owns both the pass/fail decision and the reported percent — the threshold gate and the user-visible "current %" are now the same number by construction. The library is invoked only as a diff-PNG renderer on Mismatch.
Orchestra.kt becomes a `when` over the typed ScreenshotMatch.Result and drops the ImageComparison / ImageComparisonState imports.
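The shape described above might look roughly like the following Kotlin sketch. It is not the code in this PR: `MatchResultSketch`, `matchPercentageSketch`, `compareSketch` and `describe` are hypothetical names, the message wording is illustrative, and the diff-file rendering is left out. Only the variant names Match, Mismatch and SizeMismatch come from the description.

```kotlin
import java.awt.image.BufferedImage
import kotlin.math.sqrt

// Hypothetical typed result; the real ScreenshotMatch.Result may carry different fields.
sealed interface MatchResultSketch {
    data class Match(val matchPercent: Double) : MatchResultSketch
    data class Mismatch(val matchPercent: Double) : MatchResultSketch
    object SizeMismatch : MatchResultSketch
}

/** Count-based match percentage; assumes both images have the same dimensions. */
fun matchPercentageSketch(
    expected: BufferedImage,
    actual: BufferedImage,
    pixelToleranceLevel: Double,
): Double {
    val maxDistance = sqrt(3.0 * 255 * 255)
    var matching = 0L
    for (y in 0 until expected.height) {
        for (x in 0 until expected.width) {
            val p = expected.getRGB(x, y)
            val q = actual.getRGB(x, y)
            val dr = ((p shr 16) and 0xff) - ((q shr 16) and 0xff)
            val dg = ((p shr 8) and 0xff) - ((q shr 8) and 0xff)
            val db = (p and 0xff) - (q and 0xff)
            val distance = sqrt((dr * dr + dg * dg + db * db).toDouble())
            if (distance <= pixelToleranceLevel * maxDistance) matching++
        }
    }
    return 100.0 * matching / (expected.width.toLong() * expected.height)
}

/** One number drives both the pass/fail decision and the reported percent. */
fun compareSketch(
    expected: BufferedImage,
    actual: BufferedImage,
    thresholdPercent: Double,
    pixelToleranceLevel: Double,
): MatchResultSketch {
    if (expected.width != actual.width || expected.height != actual.height) {
        return MatchResultSketch.SizeMismatch
    }
    val percent = matchPercentageSketch(expected, actual, pixelToleranceLevel)
    return if (percent >= thresholdPercent) {
        MatchResultSketch.Match(percent)
    } else {
        MatchResultSketch.Mismatch(percent)
    }
}

/** The caller side reduces to an exhaustive `when` over the typed result. */
fun describe(result: MatchResultSketch, thresholdPercent: Double): String = when (result) {
    is MatchResultSketch.Match -> "matched at ${result.matchPercent}%"
    is MatchResultSketch.Mismatch ->
        "threshold not met (threshold: $thresholdPercent%), current: ${result.matchPercent}%"
    MatchResultSketch.SizeMismatch -> "screenshots have different sizes"
}
```

Because `Mismatch` carries the same percentage that was compared against the threshold, a failure message built from it cannot report a value above the threshold.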

Behaviour change

Passing assertions: unchanged. The library's gate already used the count-based metric; we replicate it.
Failing assertions: the printed current % is now consistent with thresholdPercentage. A failure that previously read current: 96.46815% against threshold: 95.0% now reads a number actually below 95% — matching what users expect.

Tests

maestro-orchestra/src/test/.../AssertScreenshotMatchTest.kt adds 5 cases:

compare returns Match when match percent meets threshold — happy path on real screenshots.
compare returns Mismatch when match percent falls below threshold and writes diff file — failure path, asserts the diff PNG is produced.
compare returns SizeMismatch for differently sized images — typed-result coverage.
match percentage agrees with library MATCH/MISMATCH decision at the boundary — pins our pixel walk to the library's gate at three pixelToleranceLevel values (0.0, 0.05, 0.1). Catches drift if the library's pixel-difference math ever changes.
match percentage exposes count-based metric distinct from library's average color delta — locks in the metric distinction the bug came from.

The two expected.png / actual.png fixtures (~200KB each) are real-world iPhone screenshots that differ only by content in the status bar — happy to swap for synthetic images if reviewers prefer.

Out of scope

This PR is the fix only. The repro flow + demo-app screen are owned by #3259 and should land separately so the new flow can verify the corrected message end-to-end.

…shold

The MISMATCH error message printed `100 - ImageComparisonResult.differencePercent`, but `differencePercent` from the underlying image-comparison library is the *average per-channel RGB intensity delta* across the whole image, while the MATCH/MISMATCH gate is decided by the *count of pixels exceeding pixelToleranceLevel*. The two are unrelated metrics, so users could see a "current" value comfortably above their threshold while the assertion still failed (e.g. `current: 96.46815%` against `threshold: 95.0%`).

Move the comparison behind a single `ScreenshotMatch.compare(...)` that owns both the pass/fail decision and the reported percent. The library is now used only as a diff-image renderer when the assertion fails.

A boundary-equivalence test pins our pixel-walk to the library's MATCH/MISMATCH semantics across multiple `pixelToleranceLevel` values, so any future drift fails CI.

No behaviour change for passing assertions. Failing assertions now report the same metric the threshold check was already enforcing.

simon-gilmurray

🎉

….com:mobile-dev-inc/Maestro into fix/assert-screenshot-current-percent-metric

proksh marked this pull request as draft May 5, 2026 13:48

simon-gilmurray approved these changes May 5, 2026

proksh added 3 commits May 6, 2026 15:53

Merge branch 'main' into fix/assert-screenshot-current-percent-metric

015f9ed

added test for failing case

acc66f2

Merge branch 'fix/assert-screenshot-current-percent-metric' of github…

019d081

proksh marked this pull request as ready for review May 6, 2026 10:51

proksh merged commit 292a172 into main May 6, 2026
10 checks passed

proksh deleted the fix/assert-screenshot-current-percent-metric branch May 6, 2026 12:14

proksh merged 4 commits into main from fix/assert-screenshot-current-percent-metric
