Commit f620c77

Add join fan-out prevention test suite (#37)
* Add comprehensive join fan-out prevention tests

  20 tests validating that measures evaluate against their defining table and are immune to row duplication from joins. Covers all aggregate types (SUM, AVG, COUNT, MIN, MAX, STDDEV, VARIANCE, MEDIAN, STRING_AGG, COUNT DISTINCT, MODE, PRODUCT, BIT_XOR, KURTOSIS, SKEWNESS, ENTROPY, LIST, BOOL_AND, BOOL_OR) and ratio measures across 1:N, M:N, and LEFT join cardinalities with grouped and filtered query shapes.

* Fix BOOL_AND/BOOL_OR test to be fan-out sensitive and resolve measure name collision

  Redesign boolean test data so Dave (no orders) is the only non-premium and only trial user. If fan-out prevention fails, both results flip.

  Rename avg_age to avg_cust_age in fan-out tests to avoid colliding with the downstream customers_qualified_v measure of the same name.

* Make MIN/MAX, COUNT DISTINCT, and MODE tests fan-out sensitive

  MIN/MAX: add row with global-min age that has no join match, so MIN changes if measure evaluates over joined rows only (15 vs 25).

  COUNT DISTINCT: join orders to line_items (1:N from orders side) so order rows are actually duplicated; previous join was M:1 and never fanned out the defining table.

  MODE: redesign data so base mode=20 but fanned-out mode=10 (id=1 has 3 orders, inflating age=10 to appear 3x vs age=20 appearing 2x).

* Make grouped AVG and ratio tests fan-out sensitive

  Grouped AVG: replace single-customer-per-group test with tier-based grouping where gold tier has Alice(30, 3 orders) + Bob(25, 1 order). Correct AVG=27.5; fan-out would weight Alice 3x giving 28.75.

  Grouped ratio: remove per-product grouped profit margin test since (SUM(rev)-SUM(cost))/SUM(rev) is scale-invariant under uniform row duplication within a group, making it fundamentally unobservable. The ungrouped ratio test already validates fan-out prevention.
* Make grouped AVG and ratio tests fan-out sensitive

  MIN/MAX: put both the global min (age=15) and global max (age=50) on unmatched rows so both values change if measure uses joined rows only (25/35 vs correct 15/50).

  COUNT DISTINCT: add an order with product 'Thingamajig' from cust_id=99 who has no match in the join target. Correct count is 3 distinct products; joined-only evaluation would give 2.
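The grouped AVG arithmetic and the scale-invariance argument in the message above can be checked with a small sketch. This is not the project's test code: it uses Python's built-in `sqlite3` rather than the engine under test, and the table layout, column names, and the `margin` helper are assumptions made for illustration. Only Alice(30, 3 orders) and Bob(25, 1 order) come from the commit message.

```python
import sqlite3
from fractions import Fraction

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical schema; only the ages and order counts come from the commit message.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT, age INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice', 'gold', 30), (2, 'Bob', 'gold', 25);
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

# Aggregate over the defining table only: (30 + 25) / 2.
base_avg = con.execute(
    "SELECT AVG(age) FROM customers WHERE tier = 'gold'"
).fetchone()[0]

# Plain aggregate over the joined rows: Alice appears 3x, so (3*30 + 25) / 4.
fanned_avg = con.execute("""
    SELECT AVG(c.age)
    FROM customers c JOIN orders o ON o.cust_id = c.id
    WHERE c.tier = 'gold'
""").fetchone()[0]

print(base_avg, fanned_avg)  # 27.5 28.75


def margin(rows):
    """(SUM(rev) - SUM(cost)) / SUM(rev) over (rev, cost) pairs, computed exactly."""
    rev = sum(r for r, _ in rows)
    cost = sum(c for _, c in rows)
    return Fraction(rev - cost, rev)


# Duplicating every row k times scales numerator and denominator by k,
# so the ratio cannot reveal uniform fan-out. The figures are made up.
rows = [(100, 60), (50, 20)]
print(margin(rows) == margin(rows * 3))  # True
```

The first half shows why the tier-based data makes the test sensitive: the two evaluation strategies give different numbers. The second half shows why the grouped ratio test was dropped: under uniform duplication both strategies agree, so the test cannot fail.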
1 parent 16aff98 commit f620c77

2 files changed

Lines changed: 439 additions & 1 deletion

docs/measures-sql-paper-parity.md

Lines changed: 2 additions & 1 deletion
@@ -32,7 +32,8 @@ This matrix tracks parity for the core language semantics described in sections
 | Listing 12 (queries 1-4) | correlated subquery, self-join, window, and measure forms return same rows | Covered | `test/sql/measures.test:1614`, `test/sql/measures.test:1624`, `test/sql/measures.test:1637`, `test/sql/measures.test:1652` |
 | §5.1 claim | `AT` can access rows excluded by outer `WHERE` (more expressive than `OVER`) | Covered | `test/sql/measures.test:962` |
 | §5.4 composability | derived measures referencing measures in same `SELECT` | Covered | `test/sql/measures.test:772`, `test/sql/measures.test:1499` |
-| §5.3 wide-table safety direction | joins with measures avoid double counting in tested cases | Partial | `test/sql/measures.test:889`, `test/sql/measures.test:1473` |
+| §3.6/§5.3 join fan-out prevention | measures immune to join fan-out across all aggregate types, join cardinalities (1:N, M:N, LEFT), and query shapes (grouped, filtered) | Covered | `test/sql/measures.test:949-1355` (20 tests) |
+| §5.3 wide-table safety direction | joins with measures avoid double counting in tested cases | Covered | `test/sql/measures.test:889`, `test/sql/measures.test:1473`, `test/sql/measures.test:982` |
 | §5.5 security model | measure views preserve SQL security boundaries | Gap | no privilege-based test in suite |
 | §3.4 call-site breadth | explicit use in `HAVING` parity path | Covered | `test/sql/measures.test:1548` |
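The MIN/MAX case from the commit message differs from the duplication cases because MIN and MAX ignore repeated rows and only change when rows are lost. A minimal sketch of that effect follows, again using Python's `sqlite3` as a stand-in for the engine under test. The schema and order rows are hypothetical. The ages 15, 25, 35, and 50 are chosen to reproduce the figures quoted in the message.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical data: customers 1 (age 15) and 4 (age 50) have no orders.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER);
    INSERT INTO customers VALUES (1, 15), (2, 25), (3, 35), (4, 50);
    INSERT INTO orders VALUES (1, 2), (2, 2), (3, 3);
""")

# Evaluated against the defining table: every customer counts.
base = con.execute("SELECT MIN(age), MAX(age) FROM customers").fetchone()

# Evaluated over inner-joined rows: unmatched customers drop out,
# so the extremes move inward.
joined = con.execute("""
    SELECT MIN(c.age), MAX(c.age)
    FROM customers c JOIN orders o ON o.cust_id = c.id
""").fetchone()

print(base)    # (15, 50)
print(joined)  # (25, 35)
```

Placing both extremes on unmatched rows means a measure that wrongly evaluates over joined rows gets both values wrong, which is what makes the redesigned test able to fail.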
