fix(assertScreenshot): report match percent that aligns with the threshold#3261
Merged
fix(assertScreenshot): report match percent that aligns with the threshold#3261
Conversation
…shold The MISMATCH error message printed `100 - ImageComparisonResult.differencePercent`, but `differencePercent` from the underlying image-comparison library is the *average per-channel RGB intensity delta* across the whole image, while the MATCH/MISMATCH gate is decided by the *count of pixels exceeding pixelToleranceLevel*. The two are unrelated metrics, so users could see a "current" value comfortably above their threshold while the assertion still failed (e.g. `current: 96.46815%` against `threshold: 95.0%`). Move the comparison behind a single `ScreenshotMatch.compare(...)` that owns both the pass/fail decision and the reported percent. The library is now used only as a diff-image renderer when the assertion fails. A boundary-equivalence test pins our pixel-walk to the library's MATCH/MISMATCH semantics across multiple `pixelToleranceLevel` values, so any future drift fails CI. No behaviour change for passing assertions. Failing assertions now report the same metric the threshold check was already enforcing.
….com:mobile-dev-inc/Maestro into fix/assert-screenshot-current-percent-metric
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
assertScreenshotcan fail with a "current" value that looks like it should have passed:96.46815% > 95%, yet the assertion fails. This is the symptom seen in #3259 and reported by Doximity.Root cause
The MISMATCH branch in
Orchestra#assertScreenshotCommandprinted100 - ImageComparisonResult.differencePercent. Insidecom.github.romankh3:image-comparison, two unrelated metrics are at play:ImageComparison.isAllowedPercentOfDifferentPixels(private)pixelToleranceLevelresult.differencePercentImageComparisonUtil.getDifferencePercent(public)These can diverge in opposite directions: a small region with a stark change drives the count up while leaving the average tiny, so the threshold check fails while the message reports a high "match %".
The library does not expose its count-based metric, and 4.4.0 (March 2021) is the latest release.
Fix
Introduce
ScreenshotMatchas the single source of truth for the comparison:ScreenshotMatch.matchPercentage(expected, actual, pixelToleranceLevel)walks pixels using the same Euclidean distance rule the library uses internally, returning the count-based percentage.ScreenshotMatch.compare(expected, actual, threshold, diffFile)owns both the pass/fail decision and the reported percent — the threshold gate and the user-visible "current %" are now the same number by construction. The library is invoked only as a diff-PNG renderer onMismatch.Orchestra.ktbecomes awhenover the typedScreenshotMatch.Result. Drops theImageComparison/ImageComparisonStateimports.Behaviour change
current %is now consistent withthresholdPercentage. A failure that previously readcurrent: 96.46815%againstthreshold: 95.0%now reads a number actually below 95% — matching what users expect.Tests
maestro-orchestra/src/test/.../AssertScreenshotMatchTest.ktadds 5 cases:compare returns Match when match percent meets threshold— happy path on real screenshots.compare returns Mismatch when match percent falls below threshold and writes diff file— failure path, asserts the diff PNG is produced.compare returns SizeMismatch for differently sized images— typed-result coverage.match percentage agrees with library MATCH/MISMATCH decision at the boundary— pins our pixel walk to the library's gate at threepixelToleranceLevelvalues (0.0, 0.05, 0.1). Catches drift if the library's pixel-difference math ever changes.match percentage exposes count-based metric distinct from library's average color delta— locks in the metric distinction the bug came from.The two
expected.png/actual.pngfixtures (~200KB each) are real-world iPhone screenshots that differ only by content in the status bar — happy to swap for synthetic images if reviewers prefer.Out of scope
This PR is the fix only. The repro flow + demo-app screen are owned by #3259 and should land separately so the new flow can verify the corrected message end-to-end.