Commit f620c77
Add join fan-out prevention test suite (#37)
* Add comprehensive join fan-out prevention tests
20 tests validating that measures evaluate against their defining table
and are immune to row duplication from joins. Covers all aggregate types
(SUM, AVG, COUNT, MIN, MAX, STDDEV, VARIANCE, MEDIAN, STRING_AGG,
COUNT DISTINCT, MODE, PRODUCT, BIT_XOR, KURTOSIS, SKEWNESS, ENTROPY,
LIST, BOOL_AND, BOOL_OR) and ratio measures across 1:N, M:N, and LEFT
join cardinalities with grouped and filtered query shapes.
* Fix BOOL_AND/BOOL_OR test to be fan-out sensitive and resolve measure name collision
Redesign boolean test data so Dave (no orders) is the only non-premium
and only trial user. If fan-out prevention fails, both results flip.
Rename avg_age to avg_cust_age in fan-out tests to avoid colliding with
the downstream customers_qualified_v measure of the same name.
* Make MIN/MAX, COUNT DISTINCT, and MODE tests fan-out sensitive
MIN/MAX: add a row with the global-min age that has no join match, so MIN
changes if the measure evaluates over joined rows only (correct 15 vs
joined-only 25).
COUNT DISTINCT: join orders to line_items (1:N from orders side) so
order rows are actually duplicated; previous join was M:1 and never
fanned out the defining table.
MODE: redesign data so base mode=20 but fanned-out mode=10 (id=1 has
3 orders, inflating age=10 to appear 3x vs age=20 appearing 2x).
* Make grouped AVG and ratio tests fan-out sensitive
Grouped AVG: replace single-customer-per-group test with tier-based
grouping where gold tier has Alice(30, 3 orders) + Bob(25, 1 order).
Correct AVG=27.5; fan-out would weight Alice 3x giving 28.75.
Grouped ratio: remove per-product grouped profit margin test since
(SUM(rev)-SUM(cost))/SUM(rev) is scale-invariant under uniform row
duplication within a group, so fan-out is fundamentally unobservable there.
The ungrouped ratio test already validates fan-out prevention.
* Make grouped AVG and ratio tests fan-out sensitive
MIN/MAX: put both the global min (age=15) and global max (age=50) on
unmatched rows so both values change if measure uses joined rows only
(25/35 vs correct 15/50).
COUNT DISTINCT: add an order with product 'Thingamajig' from cust_id=99
who has no match in the join target. Correct count is 3 distinct
products; joined-only evaluation would give 2.
1 parent 16aff98 commit f620c77
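The grouped AVG numbers in the message above (27.5 correct, 28.75 under fan-out) can be reproduced with a minimal sketch. It uses Python's built-in sqlite3 rather than the project's own test harness, and the table names, column names, and ids are assumptions made for illustration; only the ages and order counts for Alice and Bob come from the commit message.

```python
import sqlite3

# Hypothetical schema and ids; the ages and order counts mirror the gold-tier
# example in the commit message: Alice (30, 3 orders) and Bob (25, 1 order).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT, age INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice', 'gold', 30), (2, 'Bob', 'gold', 25);
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 1), (13, 2);
""")

# Naive aggregate over the joined rows: Alice's row is repeated once per
# order, so her age is weighted 3x in the average.
fanned_out_avg = conn.execute("""
    SELECT AVG(c.age)
    FROM customers AS c
    JOIN orders AS o ON o.cust_id = c.id
    WHERE c.tier = 'gold'
""").fetchone()[0]

# Aggregate evaluated against the defining table only: each customer
# contributes exactly once, regardless of how many orders they have.
correct_avg = conn.execute(
    "SELECT AVG(age) FROM customers WHERE tier = 'gold'"
).fetchone()[0]

print("fanned-out AVG:", fanned_out_avg)
print("correct AVG:   ", correct_avg)
```

A test is fan-out sensitive only when these two values differ, which is why the earlier single-customer-per-group data could not detect a regression.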
2 files changed
Lines changed: 439 additions & 1 deletion
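Two claims in the commit message lend themselves to a quick arithmetic check: that a ratio of sums is unchanged by uniform row duplication, and that MODE flips from 20 to 10 when one customer's row is repeated three times. The sketch below is plain Python rather than the suite's own queries, and the revenue and cost figures are made up for illustration; the MODE ages follow the message (age=10 appearing 3x after fan-out, age=20 appearing 2x).

```python
from fractions import Fraction
from statistics import mode


def profit_margin(rows):
    """(SUM(rev) - SUM(cost)) / SUM(rev) over (revenue, cost) pairs."""
    rev = sum(r for r, _ in rows)
    cost = sum(c for _, c in rows)
    return Fraction(rev - cost, rev)


# Hypothetical (revenue, cost) rows for one product group.
base_rows = [(100, 60), (50, 20)]

# Uniform duplication: every row repeated the same number of times.
# Numerator and denominator both scale by 3, so the ratio is unchanged.
uniform = base_rows * 3
margin_base = profit_margin(base_rows)
margin_uniform = profit_margin(uniform)

# Non-uniform duplication: only the first row is repeated, so the ratio
# moves and fan-out becomes observable.
skewed = [base_rows[0]] * 3 + [base_rows[1]]
margin_skewed = profit_margin(skewed)

# MODE: base ages have 20 as the most common value; repeating the age=10
# row three times (one per order) makes 10 the most common value instead.
base_ages = [10, 20, 20]
fanned_ages = [10, 10, 10, 20, 20]
mode_base = mode(base_ages)
mode_fanned = mode(fanned_ages)

print(margin_base, margin_uniform, margin_skewed)
print(mode_base, mode_fanned)
```

The uniform case is the situation described for the removed per-product ratio test; the skewed case shows why the same ratio can still detect fan-out when the duplication differs between rows.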