test: ObservableQuery equivalence fuzzer (+ fix stale result on coalesced delete+recreate) #36
Merged
…lesced delete+recreate

Adds a randomized equivalence fuzzer that drives long random create/update/delete walks against an observed query and asserts that the value it emits incrementally always equals a fresh full recompute (`Query.get()`), the obviously-correct oracle. Covers sorted and unsorted queries across a spread of filter selectivities.

The fuzzer found a real bug. When a document already in a query's result set is deleted and recreated within the same broadcast batch, the two events coalesce in the event store into a single `added` event. The `added` handler only *added* documents that pass the filter; it never *evicted* an already-cached document whose recreated value now fails the filter, so a stale snapshot lingered in the result. Effect: a query showing a stale row after a delete+recreate that should have removed it.

Fixed by evicting the cached entry in that case, symmetric with the `modified` handler. Confirmed the fuzzer fails against the unfixed code.

The fuzzer uses one long walk per test (resetting the global store between many short trials schedules broadcasts that race the next trial's observer) and a 1ms settle, so the zero-duration broadcast timer and its microtask delivery complete before each comparison.
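The coalescing step can be modeled with a minimal Python sketch (the real event store is Dart). The event tuple shape and `coalesce` are hypothetical. The only rule taken from this PR is that a delete followed by a recreate of the same id in one batch collapses into a single `added` event; "latest event wins" and "`added` then `modified` stays `added`" are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event shape: (kind, doc_id, value); value is None for "removed".
Event = Tuple[str, str, Optional[int]]


def coalesce(batch: List[Event]) -> List[Event]:
    """Collapse one broadcast batch to at most one event per document id (assumed rules)."""
    latest: Dict[str, Event] = {}
    for kind, doc_id, value in batch:
        prev = latest.get(doc_id)
        if prev is not None and prev[0] == "added" and kind == "modified":
            # Created (or recreated) earlier in this batch: still "added".
            kind = "added"
        # Otherwise the latest event wins, so removed -> added becomes a single
        # "added" and the observer never sees the intermediate delete.
        latest[doc_id] = (kind, doc_id, value)
    return list(latest.values())
```

The key consequence for observers: the surviving `added` event carries no hint that the id was already in their cached result, so the handler has to check its own cache.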
Pull request overview
Adds a randomized equivalence test for `ObservableQuery` that compares incrementally maintained results against fresh `Query.get()` recomputes, and fixes a stale-cache bug it surfaced: in the `added`/`hydrated` branch of `_onBroadcast`, an already-cached document whose recreated value no longer passes the filter was not evicted.
Changes:
- Evict the cached doc in the `added`/`hydrated` branch when the new snapshot fails the filter, emitting a `removed` change snapshot symmetric to the `modified` branch.
- New fuzzer test driving long random create/update/delete walks against sorted and unsorted queries with varying selectivities.
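The shape of the fix can be sketched in Python, assuming a filter-only query over integer values. `IncrementalFilter` and `on_event` are hypothetical stand-ins, not the Dart `ObservableQuery` API, and reporting a still-passing cached doc as `modified` is an assumption. What the sketch does reflect from the PR is that `added` and `modified` now share the same "was cached, now fails, so evict and emit `removed`" path.

```python
from typing import Callable, Dict, Optional, Tuple

Change = Tuple[str, str]  # (change kind, doc_id) reported to listeners


class IncrementalFilter:
    """Hypothetical stand-in for ObservableQuery's cached result; not the real Dart API."""

    def __init__(self, predicate: Callable[[int], bool]) -> None:
        self.predicate = predicate
        self.cache: Dict[str, int] = {}  # doc_id -> snapshot currently in the result

    def on_event(self, kind: str, doc_id: str, value: Optional[int]) -> Optional[Change]:
        if kind == "removed":
            if self.cache.pop(doc_id, None) is not None:
                return ("removed", doc_id)
            return None
        # "added" (possibly a coalesced delete+recreate) and "modified".
        if value is not None and self.predicate(value):
            was_cached = doc_id in self.cache
            self.cache[doc_id] = value
            # Assumption: a cached doc that still passes is reported as modified.
            return ("modified" if was_cached else "added", doc_id)
        if doc_id in self.cache:
            # The fix: evict a cached doc whose new snapshot fails the filter,
            # for "added" as well as "modified". Unfixed, "added" skipped this.
            del self.cache[doc_id]
            return ("removed", doc_id)
        return None


q = IncrementalFilter(lambda v: v >= 5)
q.on_event("added", "a", 7)           # "a" enters the result
change = q.on_event("added", "a", 2)  # coalesced delete+recreate, now failing the filter
```

Without the eviction branch, the second call would return `None` and leave `"a": 7` in the cache, which is exactly the stale row described above.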
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| lib/observable_query.dart | Adds eviction + removed change snapshot when an added/hydrated event for an already-cached doc no longer passes the filter. |
| test/core/observable_query_equivalence_test.dart | New randomized equivalence fuzzer comparing emitted results against Query.get() across sorted/unsorted and multiple thresholds. |
Summary
Adds an equivalence fuzzer for `ObservableQuery` and fixes a stale-result bug it found.

`ObservableQuery` maintains its result incrementally: on each broadcast it inspects only the changed documents and patches a cached result rather than recomputing. The property under test: the value it emits must always equal a fresh full recompute of the same query (`Query.get()`), which filters and sorts the whole collection and is the obviously-correct oracle.

The fuzzer
Each test drives one long random walk of create/update/delete operations over a small id/value space (so documents repeatedly cross the filter boundary, exercising the added/removed/modified transitions in `_onBroadcast`) and compares the latest emitted value against the oracle after every step. Covers sorted and unsorted queries across low, middle, and high filter selectivity. A failing case replays from its seed, sorted flag, and threshold.

The bug it found
When a document already in a query's result is deleted and recreated within the same broadcast batch, the two events coalesce in the event store into a single `added` event. The `added` handler only added documents passing the filter; it never evicted an already-cached document whose recreated value now fails the filter. So a stale snapshot lingered in the result.

The `modified` handler already handled its analogous "was cached, now fails → evict" case; the `added` handler didn't. Fixed by evicting the cached entry there too. Confirmed the fuzzer fails against the unfixed code (e.g. seed 1001, round 34) and passes with the fix.

This is the second bug found by automated exploration in this hardening pass (after the `PathRefStore` ref leak in #35).

A note on test determinism
The fuzzer uses one long walk per test rather than many short trials: resetting the global store between trials schedules a broadcast that can race the next trial's observer (a test-isolation artifact, not a production issue). It also waits on a 1ms settle rather than a zero-duration one, so the broadcast's zero-duration timer and its microtask stream delivery are guaranteed to have completed before each comparison. With this shape it's deterministic — verified 0 failures across many thousands of rounds.
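The walk-and-compare loop (one long seeded walk, oracle comparison after every step, replay from seed and threshold) can be sketched in Python. This is a synchronous model, not the Dart test: `run_batch`, `handle`, and `fuzz` are hypothetical names, the query is filter-only (no sort), and delivery is immediate, so no settle delay is needed here. The batch semantics assume that any delete followed by a recreate in one batch surfaces as `added`, per the PR; the `fixed` flag toggles the eviction so the fuzzer can be pointed at the buggy variant.

```python
import random
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str, Optional[int]]     # ("put" | "delete", doc_id, value)
Event = Tuple[str, str, Optional[int]]  # ("added" | "modified" | "removed", doc_id, value)


def run_batch(store: Dict[str, int], ops: List[Op]) -> List[Event]:
    """Apply one batch to the store and return one coalesced event per touched id."""
    before = dict(store)
    deleted = set()
    for kind, doc_id, value in ops:
        if kind == "delete":
            store.pop(doc_id, None)
            deleted.add(doc_id)
        else:
            assert value is not None
            store[doc_id] = value
    events: List[Event] = []
    for doc_id in dict.fromkeys(op[1] for op in ops):
        if doc_id not in store:
            if doc_id in before:
                events.append(("removed", doc_id, None))
        elif doc_id in deleted or doc_id not in before:
            # A delete+recreate inside one batch surfaces as a single "added".
            events.append(("added", doc_id, store[doc_id]))
        else:
            events.append(("modified", doc_id, store[doc_id]))
    return events


def handle(cache: Dict[str, int], threshold: int, event: Event, fixed: bool) -> None:
    """Incrementally patch the cached result of a filter-only query (value >= threshold)."""
    kind, doc_id, value = event
    if kind == "removed":
        cache.pop(doc_id, None)
    elif value is not None and value >= threshold:
        cache[doc_id] = value
    elif kind == "modified" or fixed:
        # Evict a doc that no longer passes. Unfixed, "added" skipped this step.
        cache.pop(doc_id, None)


def fuzz(seed: int, threshold: int, rounds: int = 500,
         fixed: bool = True) -> Optional[Tuple[int, int, int]]:
    """Run one long seeded walk; return (seed, threshold, round) on the first mismatch."""
    rng = random.Random(seed)
    store: Dict[str, int] = {}
    cache: Dict[str, int] = {}
    for rnd in range(rounds):
        ops: List[Op] = []
        for _ in range(rng.randint(1, 4)):
            doc_id = f"d{rng.randrange(4)}"  # tiny id space: ids are reused constantly
            if rng.random() < 0.5:
                ops.append(("delete", doc_id, None))
            else:
                ops.append(("put", doc_id, rng.randrange(10)))  # values straddle the threshold
        for event in run_batch(store, ops):
            handle(cache, threshold, event, fixed)
        # Oracle: full recompute over the whole store, in the role of Query.get().
        expected = {k: v for k, v in store.items() if v >= threshold}
        if cache != expected:
            return (seed, threshold, rnd)
    return None
```

With `fixed=True` every walk should return `None`; with `fixed=False`, running a handful of seeds at a middle threshold such as 5 typically reports a failing `(seed, threshold, round)` that replays deterministically, because the walk depends only on the seed.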
Test plan
- `flutter test test/core/observable_query_equivalence_test.dart`: 6 tests green; stable across 5+ repeated runs
- … `_onBroadcast`
- `flutter test test/core`: 112 tests green (run twice)

Generated by Claude Code