diff --git a/benchmarks/sql/LINQ_TO_TABLE.md b/benchmarks/sql/LINQ_TO_TABLE.md index 3ba956a74..85c002a82 100644 --- a/benchmarks/sql/LINQ_TO_TABLE.md +++ b/benchmarks/sql/LINQ_TO_TABLE.md @@ -4,13 +4,32 @@ Sibling of [LINQ.md](LINQ.md) / [LINQ_TO_DECS.md](LINQ_TO_DECS.md). Plan of reco `table` / `table` as the 6th `_fold` source, plus the `to_table` sink. Edited in-place as PRs land. -Status: **stage 7 committed** (group_by fusion — `can_group_by` + `build_group_by_adapter` on -`TableAdapter`, riding `plan_group_by_core` with the usage-pruned slot walk as the bucket-fill -loop; stage 6 = to_table sink; stage 5 = join probe + table-lead joins, 2742f6db2; stage 4 = +Status: **stage B committed** (point-lookup conjunct extraction — `where(key == X && residual)` +probes and evaluates the residual on the probed element; stage 7 = group_by fusion, #3103; +stage 6 = to_table sink; stage 5 = join probe + table-lead joins, 2742f6db2; stage 4 = point-lookup folds, ac441c4a0; stage 3 = `%linq!` table sources, 29d23baf6; stage 2 = TableAdapter + m7, 571fe879e; stage 1 = `each_kv` builtin, 8751bb9ba; master's fixed-array rework merged in after stage 5, 1ab3e6a67; the JIT inline slot walk landed separately, #3100). +Stage B findings: +- `extract_key_probe` peels the left-assoc `&&` spine: the **leftmost** conjunct must be the + key-equality (operand order + invariance/purity gates on X unchanged); the remaining conjuncts + AND-rebuild, order-preserving, into a residual evaluated on the probed element only, with a + false residual routing to the same miss path. `any(p)`/`count(p)` with a residual ride the + element probe (`hit && residual` / `? 1 : 0`). +- **Leftmost-only is the semantics rule, not a simplification**: keys are unique, so conjuncts + right of the key-equality run at most once on both paths — no purity gate needed on the + residual (pinned by a bump-counter regression test); a conjunct LEFT of it runs per scan + element vs once in the probe, so that order declines and stays the bench's scan control. +- Dead-generality lesson: a where-run coalescing loop in the matcher was written, then deleted — + `collapse_chained_wheres` already merges consecutive wheres at flatten time, so the matcher can + never see two `where_` entries. The conjunct peel alone covers both spellings. +- m7 (2026-06-11 sweep): `point_lookup_residual` (the shape that was a full scan before stage B) + 466 ns/op INTERP vs the `point_lookup_scan` control's 9967 (~21×); 1270 vs 9049 JIT (~7×) — + both normalize to 0.0 ns/elem in the matrix, same cell as the bare probe. The residual probe + costs ~2× the bare one per op (the hit path binds the kv pair, copying the value for the + residual to read) — still O(1), invisible next to the walk. + Stage 7 findings: - **Two overrides were the whole change**: the group_by splice pattern was already adapter-generic (`can_group_by_source` gate → `build_group_by_adapter` → `plan_group_by_core`), so enabling @@ -87,8 +106,8 @@ Stage 4 findings: - Scan-semantics mirroring: `first` panics "sequence contains no elements"; `first_or_default` binds its default eagerly before the probe (same order as the early-exit lane / linq.das). - `collapse_chained_wheres` runs before dispatch, so `where(key==X)|>where(p)` arrives as one - `&&` body → correctly declined (compound predicates keep the scan). Conjunct extraction - (probe + residual predicate on the probed element) is a named deferred edge below. + `&&` body → was correctly declined (compound predicates kept the scan) until stage B built + conjunct extraction (probe + residual predicate on the probed element — findings above). - m7 INTERP (2026-06-11 sweep): `point_lookup` 0.0 ns/elem (O(1) probe) vs `point_lookup_scan` (the same query forced through the walk via a second always-true where) at full scan cost. @@ -275,9 +294,6 @@ reasons" rather than falling out of the architecture). upstream-join arm (returns null → tier-2). The fix is a TableJoin analog of `ArrayJoinAdapter` (lead loop from the pruned slot walk, srcB hash/probe from the stage-5 pieces); the `join_groupby_*` m7 cells are the numbers it would improve. Revisit on demand. -- **Point-lookup conjunct extraction**: `where(kv.key == X && )` (incl. the collapsed - multi-where form) could probe and evaluate the residual on the probed element only. The matcher - currently declines compound predicates; add when a real chain wants it. - **Multiple-`from` (cross / SelectMany) over tables**: the unfused `_cross_join` arm passes the bare source text so the array×array overload resolves without an `each` unsafe trip; a table there has no overload (confusing 30303 cascade). `cross_join` has iterator overloads, so routing diff --git a/benchmarks/sql/results.md b/benchmarks/sql/results.md index 83b006a1e..3903f6fb3 100644 --- a/benchmarks/sql/results.md +++ b/benchmarks/sql/results.md @@ -16,8 +16,9 @@ are stable now). - **m5f XML** — `_fold` over `from_xml_node(root, type)` (`XmlAdapter` fuses + field-prunes). - **m6f JSON** — `_fold` over `from_json(jv, type)` (`JsonAdapter`, same machinery, array walk). - **m7 Table** — `_fold` over `each_kv(table)` (`TableAdapter`; kv usage-pruning picks keys-only / - values-only / zipped slot walks; key-equality `where` + terminator folds to an O(1) probe — the - `point_lookup` / `point_lookup_scan` pair measures it; joins fuse on either side, and a table srcB + values-only / zipped slot walks; key-equality `where` + terminator folds to an O(1) probe, residual + conjuncts right of the key-equality evaluating on the probed element only — the `point_lookup` / + `point_lookup_residual` / `point_lookup_scan` trio measures it; joins fuse on either side, and a table srcB joined on its bare key probes the table instead of building the join hash — the `join_probe` / `join_probe_build` pair measures it; a trailing `to_table()` inserts straight into the result table with no intermediate array — the `to_table` / `to_table_staged` pair measures it; @@ -39,175 +40,177 @@ signal, JIT deltas as indicative.** | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) | |---|---:|---:|---:|---:|---:|---:| -| `aggregate_match` | 34.7 | 5.9 | 6.1 | 60.5 | 158.9 | 19.9 | -| `all_match` | 27.8 | 3.5 | 3.5 | 56.1 | 157.6 | 16.1 | +| `aggregate_match` | 35.0 | 5.9 | 5.8 | 60.8 | 159.8 | 19.1 | +| `all_match` | 27.8 | 3.5 | 3.4 | 56.4 | 155.6 | 15.9 | | `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -| `average_aggregate` | 30.6 | 6.1 | 8.7 | 58.7 | 157.7 | 17.2 | -| `bare_last` | — | 4.2 | 0.0 | 0.0 | 4.2 | 30.1 | -| `bare_order_where` | 279.8 | 117.0 | 125.6 | 302.7 | 296.8 | 164.3 | -| `chained_select_collapse` | — | 17.7 | 17.4 | 70.5 | 154.5 | 28.5 | -| `chained_where` | 36.6 | 6.6 | 7.1 | 105.5 | 177.6 | 23.9 | -| `contains_match` | 0.0 | 2.2 | 1.4 | 27.7 | 71.6 | 6.5 | -| `count_aggregate` | 29.4 | 4.2 | 4.1 | 64.2 | 162.2 | 20.3 | -| `cross_join` | 12628.8 | 3713.6 | — | 4051.3 | 4077.4 | — | -| `decs_count_bare_pred` | — | — | 4.1 | — | — | — | -| `distinct_by_count` | 41.4 | 15.7 | 15.7 | 70.6 | 159.1 | 27.2 | -| `distinct_by_order_take` | 242.2 | 22.0 | 23.4 | 124.9 | 158.7 | 49.5 | -| `distinct_by_order_to_array` | 240.8 | 21.9 | 23.4 | 125.2 | 160.5 | 49.5 | -| `distinct_count` | 41.6 | 15.5 | 15.6 | 70.6 | 154.4 | 27.4 | -| `distinct_count_pred` | 254.8 | 15.8 | 16.0 | 112.5 | 170.7 | 27.3 | +| `average_aggregate` | 30.6 | 6.1 | 8.7 | 58.7 | 164.0 | 17.3 | +| `bare_last` | — | 4.2 | 0.0 | 0.0 | 4.2 | 31.0 | +| `bare_order_where` | 280.6 | 116.8 | 125.5 | 300.5 | 289.3 | 162.6 | +| `chained_select_collapse` | — | 17.6 | 17.5 | 70.5 | 164.8 | 28.0 | +| `chained_where` | 36.7 | 6.6 | 7.1 | 105.6 | 183.3 | 24.0 | +| `contains_match` | 0.0 | 2.2 | 1.4 | 29.2 | 73.0 | 6.6 | +| `count_aggregate` | 29.8 | 4.2 | 4.1 | 63.9 | 154.3 | 20.2 | +| `cross_join` | 12641.5 | 3703.0 | — | 4040.3 | 4032.0 | — | +| `decs_count_bare_pred` | — | — | 4.2 | — | — | — | +| `distinct_by_count` | 41.1 | 15.8 | 15.8 | 71.2 | 162.6 | 26.5 | +| `distinct_by_order_take` | 245.5 | 22.2 | 23.6 | 123.7 | 162.2 | 48.7 | +| `distinct_by_order_to_array` | 247.9 | 22.1 | 23.5 | 125.1 | 163.8 | 48.6 | +| `distinct_count` | 41.5 | 15.6 | 15.8 | 71.2 | 161.9 | 27.1 | +| `distinct_count_pred` | 256.2 | 15.8 | 15.8 | 112.8 | 177.5 | 26.3 | | `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | `element_at_match` | 0.0 | 0.0 | 0.0 | 0.4 | 0.3 | 0.0 | | `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -| `groupby_average` | 176.5 | 29.0 | 29.3 | 124.3 | 187.3 | 41.8 | -| `groupby_count` | 143.2 | 19.0 | 19.4 | 74.7 | 160.0 | 31.1 | -| `groupby_first` | 253.3 | 19.0 | 20.1 | 72.4 | 156.2 | 40.7 | -| `groupby_having_count` | 141.7 | 19.0 | 19.1 | 75.1 | 159.4 | 31.2 | -| `groupby_having_hidden_sum` | 176.8 | 22.2 | 22.3 | 118.8 | 183.3 | 34.2 | -| `groupby_having_post_where` | 174.3 | 20.3 | 20.4 | 114.7 | 180.1 | 32.4 | -| `groupby_max` | 175.5 | 24.7 | 24.9 | 120.5 | 184.2 | 35.0 | -| `groupby_min` | 173.9 | 25.5 | 25.2 | 121.0 | 183.6 | 34.8 | -| `groupby_multi_reducer` | 191.1 | 30.3 | 30.3 | 126.3 | 189.1 | 43.5 | -| `groupby_select_order` | 172.0 | 20.3 | 20.4 | 115.0 | 179.9 | 32.3 | -| `groupby_select_sum` | 201.3 | 38.5 | 38.5 | 102.1 | 185.6 | 50.3 | -| `groupby_sum` | 170.5 | 20.4 | 20.4 | 115.0 | 179.7 | 32.4 | -| `groupby_where_count` | 76.5 | 13.9 | 14.5 | 115.8 | 181.4 | 29.9 | -| `groupby_where_sum` | 87.2 | 14.2 | 14.8 | 116.6 | 181.9 | 31.3 | -| `join_count` | 38.3 | 52.0 | 65.0 | 112.3 | 177.5 | 65.1 | -| `join_groupby_count` | 157.7 | 77.3 | 88.9 | 177.9 | 224.8 | 260.4 | -| `join_groupby_to_array` | 191.6 | 79.1 | 91.3 | 215.2 | 211.3 | 289.4 | -| `join_probe` | — | — | — | — | — | 46.5 | -| `join_probe_build` | — | — | — | — | — | 79.1 | -| `join_select` | 150.4 | 74.1 | 85.0 | 188.7 | 211.5 | 222.8 | -| `join_where_count` | 39.7 | 62.3 | 76.2 | 161.1 | 193.0 | 79.6 | -| `last_match` | 0.0 | 5.8 | 14.0 | 65.2 | 159.4 | 30.8 | -| `long_count_aggregate` | 29.8 | 4.2 | 4.1 | 63.8 | 158.0 | 21.4 | -| `max_aggregate` | 31.3 | 6.1 | 6.8 | 58.8 | 157.3 | 16.9 | -| `min_aggregate` | 31.4 | 6.1 | 6.8 | 59.0 | 157.5 | 16.9 | -| `order_by_multi_key` | 341.1 | 274.6 | 283.0 | 459.8 | 450.9 | 334.7 | -| `order_distinct_take` | 138.7 | 15.7 | 99.0 | 72.6 | 155.7 | 31.6 | -| `order_reverse_normalized` | 38.4 | 16.3 | 20.0 | 70.5 | 162.8 | 32.9 | -| `order_take_desc` | 38.5 | 16.4 | 20.0 | 70.6 | 162.2 | 32.9 | +| `groupby_average` | 171.1 | 29.2 | 29.2 | 123.8 | 217.9 | 40.9 | +| `groupby_count` | 141.4 | 19.1 | 19.2 | 74.8 | 171.7 | 30.8 | +| `groupby_first` | 257.2 | 19.1 | 19.8 | 72.5 | 162.9 | 40.2 | +| `groupby_having_count` | 141.6 | 19.2 | 19.2 | 75.5 | 168.3 | 30.6 | +| `groupby_having_hidden_sum` | 176.8 | 22.2 | 22.6 | 118.4 | 192.2 | 33.5 | +| `groupby_having_post_where` | 172.0 | 20.4 | 20.5 | 114.7 | 188.7 | 31.7 | +| `groupby_max` | 174.6 | 24.8 | 24.9 | 119.4 | 193.4 | 34.2 | +| `groupby_min` | 174.1 | 25.5 | 25.3 | 120.0 | 191.8 | 34.3 | +| `groupby_multi_reducer` | 190.7 | 30.6 | 30.1 | 125.0 | 195.9 | 43.2 | +| `groupby_select_order` | 171.5 | 20.4 | 20.5 | 114.8 | 188.2 | 31.6 | +| `groupby_select_sum` | 198.6 | 38.5 | 38.6 | 101.9 | 193.6 | 49.4 | +| `groupby_sum` | 170.7 | 20.5 | 20.5 | 114.7 | 187.7 | 31.5 | +| `groupby_where_count` | 76.0 | 13.8 | 14.5 | 116.5 | 185.7 | 30.2 | +| `groupby_where_sum` | 87.1 | 14.2 | 14.8 | 117.5 | 186.2 | 30.6 | +| `join_count` | 38.6 | 52.3 | 64.4 | 112.9 | 183.8 | 65.0 | +| `join_groupby_count` | 158.9 | 77.1 | 89.0 | 178.8 | 230.9 | 260.6 | +| `join_groupby_to_array` | 191.1 | 78.5 | 90.7 | 215.8 | 212.8 | 290.5 | +| `join_probe` | — | — | — | — | — | 46.8 | +| `join_probe_build` | — | — | — | — | — | 80.8 | +| `join_select` | 150.7 | 73.7 | 84.2 | 194.2 | 214.5 | 223.2 | +| `join_where_count` | 39.6 | 62.0 | 75.7 | 161.1 | 198.9 | 80.2 | +| `last_match` | 0.0 | 5.7 | 14.0 | 65.4 | 160.2 | 31.4 | +| `long_count_aggregate` | 29.7 | 4.1 | 4.1 | 63.8 | 154.5 | 20.1 | +| `max_aggregate` | 31.0 | 6.1 | 6.9 | 59.0 | 163.6 | 16.9 | +| `min_aggregate` | 30.9 | 6.1 | 6.8 | 59.3 | 163.2 | 16.9 | +| `order_by_multi_key` | 341.9 | 274.9 | 283.2 | 459.1 | 445.7 | 333.6 | +| `order_distinct_take` | 138.9 | 15.9 | 100.1 | 73.0 | 163.1 | 31.0 | +| `order_reverse_normalized` | 38.5 | 16.3 | 20.0 | 70.7 | 170.8 | 32.9 | +| `order_take_desc` | 38.6 | 16.4 | 20.0 | 70.8 | 170.4 | 33.0 | | `point_lookup` | — | — | — | — | — | 0.0 | -| `point_lookup_scan` | — | — | — | — | — | 8.3 | -| `reverse_distinct_by` | 296.8 | 21.1 | 28.3 | 71.3 | 155.8 | 43.8 | -| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.2 | 58.5 | +| `point_lookup_residual` | — | — | — | — | — | 0.0 | +| `point_lookup_scan` | — | — | — | — | — | 8.5 | +| `reverse_distinct_by` | 299.0 | 21.2 | 27.7 | 71.2 | 162.4 | 44.2 | +| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.3 | 58.7 | | `reverse_take_select` | 0.0 | 0.0 | 0.2 | 0.0 | 26.2 | 58.6 | -| `select_count` | 0.1 | 0.0 | 2.2 | 65.1 | 2.2 | 0.0 | -| `select_many` | — | 189.4 | — | — | — | — | -| `select_where` | 195.0 | 11.2 | 19.3 | 197.8 | 188.5 | 37.6 | -| `select_where_count` | 33.0 | 5.2 | 7.4 | 65.3 | 150.2 | 23.0 | -| `select_where_order_take` | 37.0 | 12.2 | 14.8 | 72.7 | 162.8 | 34.6 | -| `select_where_sum` | 37.4 | 7.4 | 7.5 | 66.2 | 157.7 | 24.1 | -| `single_match` | 0.0 | 2.9 | 5.4 | 55.9 | 148.2 | 23.0 | -| `skip_take` | 0.5 | 0.1 | 0.2 | 3.1 | 2.8 | 0.3 | -| `skip_while_match` | 3.5 | 5.3 | 5.3 | 57.7 | 149.0 | 18.2 | -| `sort_first` | 38.2 | 11.0 | 13.4 | 65.6 | 159.4 | 31.6 | -| `sort_take` | 38.6 | 16.1 | 20.4 | 70.5 | 163.4 | 33.0 | -| `sort_take_select` | 38.2 | 16.4 | 20.3 | 71.0 | 163.1 | 33.1 | -| `sum_aggregate` | 30.2 | 2.1 | 2.1 | 54.5 | 156.8 | 13.5 | -| `sum_where` | 33.0 | 4.3 | 4.3 | 63.7 | 158.1 | 20.4 | +| `select_count` | 0.1 | 0.0 | 2.2 | 69.6 | 2.2 | 0.0 | +| `select_many` | — | 191.6 | — | — | — | — | +| `select_where` | 197.3 | 11.0 | 19.3 | 196.5 | 183.9 | 37.5 | +| `select_where_count` | 32.9 | 5.2 | 7.4 | 64.7 | 158.0 | 22.5 | +| `select_where_order_take` | 37.2 | 12.2 | 15.0 | 73.1 | 164.7 | 34.5 | +| `select_where_sum` | 37.4 | 7.5 | 7.5 | 66.7 | 162.3 | 23.3 | +| `single_match` | 0.0 | 2.8 | 5.4 | 58.7 | 150.6 | 22.8 | +| `skip_take` | 0.5 | 0.1 | 0.2 | 3.0 | 2.8 | 0.3 | +| `skip_while_match` | 3.5 | 5.2 | 5.3 | 60.3 | 153.6 | 18.2 | +| `sort_first` | 38.7 | 11.0 | 13.5 | 65.6 | 167.0 | 31.5 | +| `sort_take` | 38.7 | 16.1 | 20.2 | 71.1 | 170.6 | 32.7 | +| `sort_take_select` | 38.5 | 16.4 | 20.2 | 71.3 | 171.4 | 33.1 | +| `sum_aggregate` | 30.5 | 2.1 | 2.1 | 54.7 | 152.7 | 13.4 | +| `sum_where` | 33.1 | 4.3 | 4.3 | 63.7 | 154.0 | 21.0 | | `take_count` | 3.6 | 0.2 | 0.4 | 2.9 | 2.7 | 0.5 | -| `take_count_filtered` | 1.1 | 0.2 | 0.2 | 1.4 | 1.1 | 0.3 | +| `take_count_filtered` | 1.1 | 0.2 | 0.2 | 1.3 | 1.1 | 0.3 | | `take_sum_aggregate` | 0.8 | 0.1 | 0.1 | 0.6 | 0.5 | 0.1 | | `take_where_count` | 0.9 | 0.1 | 0.1 | 0.7 | 0.6 | 0.2 | -| `take_while_match` | 7.8 | 2.4 | 2.5 | 29.0 | 72.5 | 16.6 | -| `to_array_filter` | 70.6 | 11.7 | 11.7 | 71.8 | 161.9 | 28.9 | -| `to_table` | — | 18.7 | 141.8 | 118.5 | 140.1 | 32.2 | -| `to_table_staged` | — | 54.6 | 56.6 | 143.0 | 164.0 | 68.6 | -| `where_join_count` | 39.7 | 29.5 | 41.1 | 132.2 | 166.3 | 46.8 | -| `zip_count_pred` | 39.1 | 15.8 | — | 317.6 | 318.8 | — | -| `zip_dot_product` | 46.9 | 12.6 | 10.6 | 312.4 | 315.1 | — | -| `zip_dot_product_3arg` | 46.8 | 12.7 | — | 311.9 | 315.7 | — | -| `zip_reverse_to_array` | — | 32.1 | — | 348.5 | 349.7 | — | +| `take_while_match` | 8.2 | 2.4 | 2.5 | 30.4 | 75.6 | 16.4 | +| `to_array_filter` | 70.9 | 11.7 | 11.8 | 71.6 | 164.5 | 29.2 | +| `to_table` | — | 18.6 | 143.9 | 117.9 | 143.7 | 32.5 | +| `to_table_staged` | — | 55.7 | 57.7 | 143.7 | 167.5 | 69.7 | +| `where_join_count` | 41.8 | 29.5 | 41.2 | 132.6 | 167.6 | 47.9 | +| `zip_count_pred` | 39.5 | 15.8 | — | 315.7 | 319.9 | — | +| `zip_dot_product` | 49.2 | 12.7 | 10.5 | 310.1 | 316.0 | — | +| `zip_dot_product_3arg` | 48.7 | 12.8 | — | 310.3 | 316.5 | — | +| `zip_reverse_to_array` | — | 31.9 | — | 344.9 | 350.7 | — | ## JIT | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) | |---|---:|---:|---:|---:|---:|---:| -| `aggregate_match` | 34.7 | 0.3 | 0.7 | 29.6 | 25.8 | 7.2 | -| `all_match` | 27.5 | 0.3 | 0.2 | 18.8 | 24.9 | 7.2 | +| `aggregate_match` | 34.9 | 0.3 | 0.7 | 18.8 | 26.4 | 7.3 | +| `all_match` | 27.6 | 0.3 | 0.2 | 18.4 | 25.4 | 7.2 | | `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -| `average_aggregate` | 30.1 | 1.0 | 3.6 | 18.5 | 24.5 | 7.4 | -| `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 | 8.9 | -| `bare_order_where` | 184.9 | 33.8 | 35.0 | 105.9 | 51.7 | 68.4 | -| `chained_select_collapse` | — | 1.1 | 1.1 | 20.5 | 31.9 | 8.1 | -| `chained_where` | 36.6 | 0.6 | 0.9 | 36.3 | 29.9 | 10.6 | -| `contains_match` | 0.0 | 0.2 | 0.1 | 19.3 | 8.8 | 2.5 | -| `count_aggregate` | 29.8 | 0.3 | 0.6 | 29.1 | 25.1 | 7.3 | -| `cross_join` | 5967.3 | 717.5 | — | 831.1 | 766.7 | — | +| `average_aggregate` | 30.2 | 1.0 | 3.6 | 18.4 | 24.8 | 7.4 | +| `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 | 8.8 | +| `bare_order_where` | 187.2 | 34.2 | 35.7 | 104.2 | 53.2 | 68.7 | +| `chained_select_collapse` | — | 1.1 | 1.1 | 20.6 | 33.7 | 8.3 | +| `chained_where` | 36.9 | 0.6 | 0.9 | 38.3 | 31.6 | 10.6 | +| `contains_match` | 0.0 | 0.2 | 0.1 | 17.5 | 8.9 | 2.5 | +| `count_aggregate` | 29.6 | 0.3 | 0.6 | 23.4 | 25.6 | 7.3 | +| `cross_join` | 5998.8 | 737.5 | — | 830.4 | 766.4 | — | | `decs_count_bare_pred` | — | — | 0.6 | — | — | — | -| `distinct_by_count` | 41.4 | 1.1 | 1.1 | 20.5 | 32.0 | 8.0 | -| `distinct_by_order_take` | 238.2 | 1.7 | 2.6 | 45.3 | 37.2 | 19.5 | -| `distinct_by_order_to_array` | 239.9 | 1.7 | 2.7 | 45.3 | 37.1 | 19.7 | -| `distinct_count` | 41.6 | 1.1 | 1.1 | 20.5 | 32.1 | 8.0 | -| `distinct_count_pred` | 252.3 | 1.1 | 1.3 | 37.6 | 41.8 | 8.0 | +| `distinct_by_count` | 41.3 | 1.1 | 1.1 | 20.6 | 33.8 | 8.1 | +| `distinct_by_order_take` | 241.8 | 1.7 | 2.6 | 44.7 | 38.8 | 19.4 | +| `distinct_by_order_to_array` | 241.0 | 1.7 | 2.7 | 45.2 | 38.6 | 19.7 | +| `distinct_count` | 41.6 | 1.1 | 1.1 | 20.7 | 33.6 | 8.1 | +| `distinct_count_pred` | 252.6 | 1.1 | 1.3 | 37.7 | 43.5 | 8.0 | | `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | `element_at_match` | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | | `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -| `groupby_average` | 175.4 | 1.5 | 1.8 | 37.0 | 42.9 | 8.9 | -| `groupby_count` | 153.2 | 1.3 | 1.5 | 20.5 | 32.3 | 8.4 | -| `groupby_first` | 252.8 | 1.3 | 2.3 | 20.5 | 32.9 | 10.0 | -| `groupby_having_count` | 141.8 | 1.3 | 1.5 | 20.5 | 32.3 | 8.5 | -| `groupby_having_hidden_sum` | 176.3 | 1.5 | 1.9 | 36.9 | 43.0 | 8.7 | -| `groupby_having_post_where` | 172.1 | 1.4 | 1.9 | 36.9 | 42.2 | 8.5 | -| `groupby_max` | 176.7 | 1.5 | 1.9 | 37.1 | 45.2 | 8.6 | -| `groupby_min` | 177.2 | 1.5 | 1.9 | 38.3 | 45.6 | 8.5 | -| `groupby_multi_reducer` | 191.7 | 1.6 | 1.9 | 37.2 | 43.7 | 9.0 | -| `groupby_select_order` | 176.6 | 1.4 | 1.6 | 37.8 | 41.9 | 8.4 | -| `groupby_select_sum` | 204.9 | 2.8 | 3.2 | 33.5 | 37.7 | 22.8 | -| `groupby_sum` | 171.6 | 1.4 | 1.9 | 37.7 | 42.0 | 8.4 | -| `groupby_where_count` | 76.9 | 0.9 | 1.3 | 37.1 | 39.7 | 11.2 | -| `groupby_where_sum` | 90.4 | 0.9 | 1.3 | 37.1 | 39.7 | 11.2 | -| `join_count` | 38.6 | 11.2 | 12.6 | 40.7 | 68.3 | 25.1 | -| `join_groupby_count` | 157.4 | 17.2 | 19.2 | 66.2 | 86.0 | 73.5 | -| `join_groupby_to_array` | 191.5 | 17.8 | 19.6 | 78.4 | 35.8 | 80.6 | +| `groupby_average` | 171.5 | 1.6 | 1.8 | 36.8 | 45.6 | 8.9 | +| `groupby_count` | 141.1 | 1.3 | 1.5 | 20.7 | 34.0 | 8.5 | +| `groupby_first` | 254.1 | 1.3 | 2.3 | 20.6 | 34.7 | 10.0 | +| `groupby_having_count` | 141.5 | 1.3 | 1.5 | 20.7 | 34.1 | 8.5 | +| `groupby_having_hidden_sum` | 175.3 | 1.5 | 1.9 | 36.6 | 45.2 | 8.6 | +| `groupby_having_post_where` | 171.9 | 1.4 | 1.9 | 36.6 | 44.3 | 8.5 | +| `groupby_max` | 174.1 | 1.5 | 1.9 | 36.8 | 45.6 | 8.5 | +| `groupby_min` | 175.0 | 1.5 | 2.0 | 36.8 | 46.2 | 8.5 | +| `groupby_multi_reducer` | 190.4 | 1.6 | 1.9 | 36.7 | 46.1 | 9.1 | +| `groupby_select_order` | 171.6 | 1.4 | 1.6 | 36.7 | 44.3 | 8.5 | +| `groupby_select_sum` | 199.2 | 2.8 | 3.2 | 33.0 | 39.7 | 23.0 | +| `groupby_sum` | 172.7 | 1.4 | 1.9 | 36.7 | 44.3 | 8.4 | +| `groupby_where_count` | 76.4 | 0.9 | 1.3 | 36.5 | 42.0 | 11.3 | +| `groupby_where_sum` | 87.5 | 0.9 | 1.3 | 36.6 | 42.0 | 11.2 | +| `join_count` | 38.3 | 10.8 | 12.8 | 40.9 | 70.9 | 25.1 | +| `join_groupby_count` | 157.8 | 17.3 | 19.5 | 66.5 | 90.0 | 73.5 | +| `join_groupby_to_array` | 191.5 | 18.4 | 19.8 | 78.1 | 36.2 | 80.8 | | `join_probe` | — | — | — | — | — | 16.6 | -| `join_probe_build` | — | — | — | — | — | 31.6 | -| `join_select` | 93.0 | 19.6 | 21.7 | 73.2 | 90.5 | 69.5 | -| `join_where_count` | 48.9 | 19.1 | 20.6 | 62.9 | 77.6 | 31.5 | -| `last_match` | 0.0 | 0.5 | 1.4 | 19.5 | 25.5 | 12.0 | -| `long_count_aggregate` | 29.7 | 0.3 | 0.6 | 29.3 | 25.1 | 7.3 | -| `max_aggregate` | 30.7 | 0.3 | 0.5 | 29.6 | 26.3 | 7.3 | -| `min_aggregate` | 31.0 | 0.3 | 0.5 | 29.6 | 26.3 | 7.4 | -| `order_by_multi_key` | 245.0 | 53.4 | 54.5 | 124.5 | 70.3 | 118.9 | -| `order_distinct_take` | 138.7 | 1.1 | 75.0 | 20.8 | 34.4 | 8.0 | -| `order_reverse_normalized` | 38.7 | 0.7 | 1.3 | 19.6 | 27.0 | 9.5 | -| `order_take_desc` | 38.7 | 0.7 | 1.3 | 19.6 | 27.0 | 9.3 | +| `join_probe_build` | — | — | — | — | — | 31.5 | +| `join_select` | 92.8 | 19.6 | 21.7 | 73.1 | 95.2 | 69.5 | +| `join_where_count` | 48.8 | 19.1 | 20.9 | 62.5 | 76.7 | 31.5 | +| `last_match` | 0.0 | 0.5 | 1.4 | 19.3 | 26.0 | 12.0 | +| `long_count_aggregate` | 29.4 | 0.3 | 0.6 | 23.4 | 25.8 | 7.3 | +| `max_aggregate` | 30.9 | 0.3 | 0.5 | 19.1 | 27.1 | 7.5 | +| `min_aggregate` | 31.0 | 0.3 | 0.5 | 19.1 | 27.1 | 7.4 | +| `order_by_multi_key` | 248.9 | 53.1 | 54.3 | 123.5 | 71.6 | 119.2 | +| `order_distinct_take` | 138.5 | 1.1 | 75.3 | 21.0 | 36.0 | 8.1 | +| `order_reverse_normalized` | 38.3 | 0.7 | 1.4 | 22.4 | 27.8 | 9.8 | +| `order_take_desc` | 38.6 | 0.7 | 1.4 | 22.5 | 27.7 | 9.7 | | `point_lookup` | — | — | — | — | — | 0.0 | -| `point_lookup_scan` | — | — | — | — | — | 3.0 | -| `reverse_distinct_by` | 296.5 | 1.6 | 3.2 | 20.5 | 34.2 | 11.1 | +| `point_lookup_residual` | — | — | — | — | — | 0.0 | +| `point_lookup_scan` | — | — | — | — | — | 3.1 | +| `reverse_distinct_by` | 297.8 | 1.6 | 3.2 | 20.7 | 34.6 | 11.0 | | `reverse_take` | 0.0 | 0.0 | 0.0 | 0.0 | 3.7 | 19.5 | -| `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.7 | 19.0 | -| `select_count` | 0.1 | 0.0 | 0.0 | 64.6 | 0.0 | 0.0 | -| `select_many` | — | 61.3 | — | — | — | — | -| `select_where` | 107.1 | 4.2 | 5.2 | 75.6 | 21.9 | 17.7 | -| `select_where_count` | 32.8 | 0.3 | 0.6 | 29.4 | 25.9 | 7.2 | -| `select_where_order_take` | 37.1 | 0.7 | 1.4 | 19.6 | 26.6 | 12.9 | -| `select_where_sum` | 37.2 | 0.4 | 0.6 | 20.2 | 24.9 | 7.4 | -| `single_match` | 0.0 | 0.4 | 1.1 | 44.2 | 22.3 | 9.2 | +| `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.8 | 19.0 | +| `select_count` | 0.1 | 0.0 | 0.0 | 62.0 | 0.0 | 0.0 | +| `select_many` | — | 62.3 | — | — | — | — | +| `select_where` | 107.2 | 4.1 | 5.3 | 74.5 | 22.2 | 17.6 | +| `select_where_count` | 33.0 | 0.3 | 0.6 | 18.8 | 26.6 | 7.3 | +| `select_where_order_take` | 37.0 | 0.7 | 1.4 | 19.4 | 27.2 | 13.0 | +| `select_where_sum` | 37.1 | 0.4 | 0.6 | 18.5 | 25.5 | 7.5 | +| `single_match` | 0.0 | 0.4 | 1.1 | 46.8 | 22.3 | 9.0 | | `skip_take` | 0.3 | 0.0 | 0.0 | 1.2 | 0.2 | 0.1 | -| `skip_while_match` | 3.5 | 0.4 | 0.4 | 44.4 | 21.8 | 7.6 | -| `sort_first` | 38.2 | 0.4 | 1.3 | 18.7 | 26.2 | 9.3 | -| `sort_take` | 38.4 | 0.7 | 1.3 | 19.6 | 27.2 | 9.6 | -| `sort_take_select` | 38.5 | 0.7 | 1.4 | 19.5 | 27.1 | 9.6 | -| `sum_aggregate` | 30.3 | 0.3 | 0.0 | 22.4 | 24.2 | 7.3 | -| `sum_where` | 33.1 | 0.3 | 0.6 | 29.5 | 25.8 | 7.3 | -| `take_count` | 1.8 | 0.1 | 0.1 | 1.2 | 0.3 | 0.1 | +| `skip_while_match` | 3.4 | 0.4 | 0.4 | 46.1 | 21.9 | 7.7 | +| `sort_first` | 38.5 | 0.4 | 1.3 | 18.5 | 26.8 | 9.0 | +| `sort_take` | 38.7 | 0.7 | 1.4 | 22.5 | 27.9 | 9.4 | +| `sort_take_select` | 38.6 | 0.7 | 1.4 | 22.4 | 27.9 | 9.3 | +| `sum_aggregate` | 30.1 | 0.3 | 0.1 | 28.4 | 24.6 | 7.3 | +| `sum_where` | 32.7 | 0.3 | 0.6 | 18.9 | 26.4 | 7.3 | +| `take_count` | 1.8 | 0.1 | 0.1 | 1.2 | 0.2 | 0.1 | | `take_count_filtered` | 1.1 | 0.0 | 0.0 | 0.5 | 0.1 | 0.0 | -| `take_sum_aggregate` | 0.8 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | -| `take_where_count` | 0.9 | 0.0 | 0.0 | 0.3 | 0.0 | 0.0 | -| `take_while_match` | 7.8 | 0.2 | 0.3 | 19.0 | 9.0 | 7.3 | -| `to_array_filter` | 48.4 | 3.3 | 3.3 | 22.0 | 33.5 | 13.0 | -| `to_table` | — | 14.1 | 36.9 | 49.4 | 52.0 | 21.1 | -| `to_table_staged` | — | 25.8 | 26.0 | 53.0 | 61.5 | 33.9 | -| `where_join_count` | 41.7 | 6.0 | 6.7 | 47.6 | 40.5 | 19.7 | -| `zip_count_pred` | 39.6 | 0.1 | — | 113.5 | 33.5 | — | -| `zip_dot_product` | 46.9 | 0.1 | 0.1 | 113.6 | 33.3 | — | -| `zip_dot_product_3arg` | 46.8 | 0.1 | — | 113.6 | 33.3 | — | -| `zip_reverse_to_array` | — | 4.5 | — | 124.9 | 50.6 | — | +| `take_sum_aggregate` | 0.8 | 0.0 | 0.0 | 0.3 | 0.0 | 0.0 | +| `take_where_count` | 0.9 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | +| `take_while_match` | 8.2 | 0.2 | 0.3 | 17.6 | 8.9 | 7.3 | +| `to_array_filter` | 48.9 | 3.3 | 3.4 | 21.8 | 35.3 | 13.0 | +| `to_table` | — | 14.0 | 37.2 | 49.0 | 54.1 | 20.9 | +| `to_table_staged` | — | 25.8 | 26.2 | 52.4 | 64.1 | 33.4 | +| `where_join_count` | 41.6 | 5.9 | 6.8 | 47.2 | 42.6 | 19.8 | +| `zip_count_pred` | 39.5 | 0.1 | — | 113.3 | 33.8 | — | +| `zip_dot_product` | 49.3 | 0.1 | 0.1 | 113.0 | 33.8 | — | +| `zip_dot_product_3arg` | 49.1 | 0.1 | — | 113.1 | 33.9 | — | +| `zip_reverse_to_array` | — | 4.6 | — | 126.3 | 51.9 | — | ## Missing lanes (the `—` cells) @@ -224,7 +227,7 @@ Each empty cell's reason is also in the bench `.das` file's comment; SQL gaps ar - **`order_distinct_take` m4 vs m3f** — `unique_key` hashes workhorse keys directly (array `int`) but string-interpolates structs (decs `DecsBrand`); the gap is per-element string hashing, not decs-walk. `distinct_by_count` is the key-based variant (m4 parity). - **`zip_reverse_to_array` / `zip_*` SQL / Decs** — `reverse` has no SQL order key; zip is not relational / not expressible over one archetype walk. By design. (XML/JSON zip lanes are lit, partially fused.) - **m7 absent families** — `zip_*` / `cross_join` (lockstep pairing over an unordered slot walk is meaningless) and `select_many` (flat fixture, no nested array field; array-only). Everything else in the m7 column is instantiated, and the `groupby_*` family is a fused emit (`plan_group_by_core` over the usage-pruned slot walk). The remaining cascade cells are `join_groupby_*` (join |> group_by over a table lead declines) and the reverse family (no backward slot walk) — both named deferred edges (see `LINQ_TO_TABLE.md`), so those cells are the numbers a fix would improve. -- **`point_lookup` / `point_lookup_scan` non-m7** — m7-only pair: only a table source has a key to probe (`where(kv.key == X)` + terminator → `key_exists` / `tab?[X]`, O(1)); the `_scan` twin forces the same query through the walk (compound `&&` predicate declines the probe) to show the gap. Other sources have no analog by design. +- **`point_lookup` / `point_lookup_residual` / `point_lookup_scan` non-m7** — m7-only trio: only a table source has a key to probe (`where(kv.key == X)` + terminator → `key_exists` / `tab?[X]`, O(1)); the `_residual` twin adds a conjunct right of the key-equality (`key == X && residual`), which probes and evaluates the residual on the probed element only; the `_scan` control puts the residual conjunct FIRST, so the key-equality is not the leftmost conjunct and the probe matcher correctly declines to the walk. Other sources have no analog by design. - **`join_probe` / `join_probe_build` non-m7** — m7-only A/B pair: a table srcB joined on its bare key probes the user's table per lead row (no internal join hash, no build loop); the `_build` twin feeds the identical rows pre-materialized to a kv array, forcing the hashed build. Other sources have no keyed-srcB analog by design. - **`to_table` / `to_table_staged` SQL** — `to_table` isn't an SQL terminator (`_sql` pass-through has no table sink). All in-memory sources are instantiated: array / XML / JSON / table fuse the insert-loop sink (`_staged` is the materialize-then-`to_table_move` shape every chain had before the sink arm); decs declines by design (explicit guard in its loop_or_count lane), so its `to_table` cell is the full tier-2 cascade — currently slower than its `_staged` twin, which fuses the array materialization first. That gap is the motivating number for a future decs sink hook. diff --git a/benchmarks/sql/table.das b/benchmarks/sql/table.das index 0a28d7afd..3e3ec7f8a 100644 --- a/benchmarks/sql/table.das +++ b/benchmarks/sql/table.das @@ -674,9 +674,10 @@ def order_take_desc_m7(b : B?) { } } -// Point-lookup pair: the fused probe (key-equality where + first_or_default → `g_t?[k]`, O(1) total — -// per-element ns reads ~0) vs the same query forced onto the linear scan via a second always-true -// `where` (collapses to a compound `&&` predicate, which the probe matcher correctly declines). +// Point-lookup trio: the bare fused probe (key-equality where + first_or_default → `g_t?[k]`, O(1) total — +// per-element ns reads ~0); the probe with a residual conjunct (key-eq leftmost, residual evaluated on the +// probed element only — also O(1)); and the scan control (residual FIRST, so the key-eq is not the leftmost +// conjunct and the probe matcher correctly declines to the linear walk). [benchmark] def point_lookup_m7(b : B?) { b |> run("point_lookup", N) { @@ -688,10 +689,22 @@ def point_lookup_m7(b : B?) { } } +[benchmark] +def point_lookup_residual_m7(b : B?) { + b |> run("point_lookup_residual", N) { + let row = _fold(unsafe(each_kv(g_t))._where(_.key == N / 2)._where(_.value.price >= 0) + .first_or_default(default)) + b |> accept(row) + if (row.key == 0) { + b->failNow() + } + } +} + [benchmark] def point_lookup_scan_m7(b : B?) { b |> run("point_lookup_scan", N) { - let row = _fold(unsafe(each_kv(g_t))._where(_.key == N / 2)._where(_.value.price >= 0) + let row = _fold(unsafe(each_kv(g_t))._where(_.value.price >= 0)._where(_.key == N / 2) .first_or_default(default)) b |> accept(row) if (row.key == 0) { diff --git a/daslib/linq_fold_table.das b/daslib/linq_fold_table.das index d744c1884..5ee1ea0a2 100644 --- a/daslib/linq_fold_table.das +++ b/daslib/linq_fold_table.das @@ -191,9 +191,9 @@ class TableAdapter : SourceAdapter { } } -// ===== Point-lookup folds — `where(kv.key == X)` + terminator → O(1) key probe ===== -// any/contains → key_exists, count → key_exists?1:0, first[_or_default] (± select) → tab?[X] probe, -// with the scan's exact semantics. Full shape/decline table: linq_fold_patterns.rst (table source row). +// ===== Point-lookup folds — `where(kv.key == X [&& residual])` + terminator → O(1) key probe ===== +// any/contains → key_exists, count → key_exists?1:0, first[_or_default] (± select) → tab?[X] probe, with the +// scan's exact semantics; residual conjuncts run on the probed element only. Table: linq_fold_patterns.rst. [macro_function] def private match_key_probe_side(var keySide, otherSide : Expression?; lane : TableLane; bindName : string) : Expression? { @@ -218,19 +218,57 @@ def private match_key_probe_side(var keySide, otherSide : Expression?; lane : Ta return clone_expression(otherSide) } -// Decompose a peeled predicate body (binder renamed to bindName) as ` == X`. Returns cloned X. +// Decompose a peeled predicate body (binder renamed to bindName) as ` == X [&& conjunct …]`; +// returns cloned X, AND-rebuilds the rest into residual. Key-equality must be LEFTMOST: keys are unique, so +// right of it runs at most once on both paths (no purity gate); a LEFT conjunct runs per scan element — decline. [macro_function] -def private extract_key_probe(var pred : Expression?; lane : TableLane; bindName : string) : Expression? { - if (pred == null || !(pred is ExprOp2)) return null - var op2 = pred as ExprOp2 +def private extract_key_probe(var pred : Expression?; lane : TableLane; bindName : string; var residual : Expression?&) : Expression? { + // peel the left-assoc `&&` spine; right arms collect outermost-first (= reversed source order) + var leaf = pred + var conjuncts : array + while (leaf != null && leaf is ExprOp2 && (leaf as ExprOp2).op == "&&") { + var andOp = leaf as ExprOp2 + conjuncts |> push(andOp.right) + leaf = andOp.left + } + if (leaf == null || !(leaf is ExprOp2)) return null + var op2 = leaf as ExprOp2 if (op2.op != "==") return null var probe = match_key_probe_side(op2.left, op2.right, lane, bindName) if (probe == null) { probe = match_key_probe_side(op2.right, op2.left, lane, bindName) } + if (probe == null) return null + for (c in conjuncts) { + var cc = clone_expression(c) + residual = residual == null ? cc : merge_where_cond(cc, residual) + } return probe } +// Miss-path emission per terminator: any → false, count → 0, first_or_default → bound default, first → panic. +[macro_function] +def private make_probe_miss(termName : string; dName : string) : Expression? { + if (termName == "any") { + return qmacro_expr() { + return false + } + } + if (termName == "count") { + return qmacro_expr() { + return 0 + } + } + if (termName == "first_or_default") { + return qmacro_expr() { + return $i(dName) + } + } + return qmacro_expr() { + panic("sequence contains no elements") + } +} + [macro_function] def try_table_point_lookup(var calls : array>; var adapter : TableAdapter?; at : LineInfo) : Expression? { if (adapter.lane == TableLane.VALUES) return null @@ -240,9 +278,14 @@ def try_table_point_lookup(var calls : array>; var a let termName = calls[n - 1]._1.name let termArgs = length(termCall.arguments) var selCall : ExprCall? - var predArg : Expression? var keyX : Expression? - let bindName = qn("plk_it", at) + var residual : Expression? + let kName = qn("plk_k", at) + let dName = qn("plk_d", at) + let pName = qn("plk_p", at) + // keys-lane elements ARE the key, so the probe local doubles as the residual/projection binder + let bindName = adapter.lane == TableLane.KEYS ? kName : qn("plk_kv", at) + var predArg : Expression? if (n == 1) { // predicate-form terminators, and the keys-lane contains if ((termName == "any" || termName == "count") && termArgs == 2) { @@ -254,6 +297,7 @@ def try_table_point_lookup(var calls : array>; var a return null } } else { + // collapse_chained_wheres ran at flatten time, so a leading where run is already ONE call here if (calls[0]._1.name != "where_" || length(calls[0]._0.arguments) != 2) return null predArg = calls[0]._0.arguments[1] if (n == 3) { @@ -269,28 +313,29 @@ def try_table_point_lookup(var calls : array>; var a } if (predArg != null) { var predBody = peel_lambda_rename_var(predArg, bindName) - keyX = extract_key_probe(predBody, adapter.lane, bindName) + keyX = extract_key_probe(predBody, adapter.lane, bindName, residual) } if (keyX == null) return null let sn = adapter.srcName - // boolean / counting probes - if (termName == "any" || termName == "contains") { + // residual-free boolean / counting probes — bare key_exists, no element touch + if (residual == null && (termName == "any" || termName == "contains")) { var anyStmts <- qmacro_block_to_array() { return key_exists($i(sn), $e(keyX)) } return adapter->wrap_invoke(anyStmts, null, false, at) } - if (termName == "count") { + if (residual == null && termName == "count") { var cntStmts <- qmacro_block_to_array() { return key_exists($i(sn), $e(keyX)) ? 1 : 0 } return adapter->wrap_invoke(cntStmts, null, false, at) } - // element probes: first / first_or_default, ± trailing select - var retT = strip_const_ref(clone_type(termCall._type)) - retT.flags.removeConstant = false - let kName = qn("plk_k", at) - let dName = qn("plk_d", at) + // element probes: first / first_or_default ± select, and any/count carrying a residual conjunct + var retT : TypeDeclPtr + if (termName == "first" || termName == "first_or_default") { + retT = strip_const_ref(clone_type(termCall._type)) + retT.flags.removeConstant = false + } var stmts : array if (termName == "first_or_default") { // eager default bind, matching linq.das argument evaluation order @@ -301,50 +346,60 @@ def try_table_point_lookup(var calls : array>; var a stmts |> push <| qmacro_expr() { let $i(kName) = $e(keyX) } - var missTail : Expression? - if (termName == "first_or_default") { - missTail = qmacro_expr() { - return $i(dName) - } - } else { - missTail = qmacro_expr() { - panic("sequence contains no elements") - } - } if (adapter.lane == TableLane.KEYS) { stmts |> push <| qmacro_expr() { if (!key_exists($i(sn), $i(kName))) { - $e(missTail) + $e(make_probe_miss(termName, dName)) } } - if (selCall != null) { - var proj = peel_lambda_rename_var(selCall.arguments[1], kName) - stmts |> push <| qmacro_expr() { - return $e(proj) + } else { + // KV lane: probe the value pointer, materialize the (key, value) pair on hit. Table safe-index is + // unsafe (the pointer dangles on rehash) — fine here, the generated invoke never mutates the table. + stmts |> push_from <| qmacro_block_to_array() { + let $i(pName) = unsafe($i(sn)?[$i(kName)]) + if ($i(pName) == null) { + $e(make_probe_miss(termName, dName)) } - } else { + } + if (residual != null || selCall != null) { stmts |> push <| qmacro_expr() { - return $i(kName) + let $i(bindName) = (key = $i(kName), value = *$i(pName)) } } - return adapter->wrap_invoke(stmts, retT, false, at) } - // KV lane: probe the value pointer, materialize the (key, value) pair on hit. Table safe-index is - // unsafe (the pointer dangles on rehash) — fine here, the generated invoke never mutates the table. - let pName = qn("plk_p", at) - stmts |> push_from <| qmacro_block_to_array() { - let $i(pName) = unsafe($i(sn)?[$i(kName)]) - if ($i(pName) == null) { - $e(missTail) + // residual conjuncts evaluate on the probed element only — false routes to the same miss path + if (residual != null) { + stmts |> push <| qmacro_expr() { + if (!$e(residual)) { + $e(make_probe_miss(termName, dName)) + } } } + if (termName == "any") { + stmts |> push <| qmacro_expr() { + return true + } + return adapter->wrap_invoke(stmts, null, false, at) + } + if (termName == "count") { + stmts |> push <| qmacro_expr() { + return 1 + } + return adapter->wrap_invoke(stmts, null, false, at) + } if (selCall != null) { - let bName = qn("plk_kv", at) - var proj = peel_lambda_rename_var(selCall.arguments[1], bName) - stmts |> push_from <| qmacro_block_to_array() { - let $i(bName) = (key = $i(kName), value = *$i(pName)) + var proj = peel_lambda_rename_var(selCall.arguments[1], bindName) + stmts |> push <| qmacro_expr() { return $e(proj) } + } elif (adapter.lane == TableLane.KEYS) { + stmts |> push <| qmacro_expr() { + return $i(kName) + } + } elif (residual != null) { + stmts |> push <| qmacro_expr() { + return $i(bindName) + } } else { stmts |> push <| qmacro_expr() { return (key = $i(kName), value = *$i(pName)) diff --git a/doc/source/reference/linq_fold_patterns.rst b/doc/source/reference/linq_fold_patterns.rst index b3f17fcf5..f9035f631 100644 --- a/doc/source/reference/linq_fold_patterns.rst +++ b/doc/source/reference/linq_fold_patterns.rst @@ -150,7 +150,7 @@ Source-side entry points - Optional source — only when the ``pugixml`` module is linked (``require ?pugixml`` + ``static_if (typeinfo builtin_module_exists(pugixml))``). Emits an inlined DOM child-element walk replacing the generator, and **field-prunes** the per-element materialization (pass 2b): the chain body is scanned for the ``Row`` fields it reads, and only those attributes are read via ``read_xml_field`` into scalar locals — unread fields (notably ``string`` fields, whose ``clone_string`` is the alloc cost) are never touched, so a float-only chain runs alloc-free and JIT beats the equivalent SQLite query. A whole-row escape (``to_array`` / identity ``_select(_)`` / pass-to-fn) routes to the full ``build_xml_row`` instead. The ``XmlAdapter`` **rides every pattern row** (``try_splice_patterns`` runs with no ``onlyRow`` restriction); per-row ``requires`` predicates and the adapter's capability hooks (``can_join`` / ``can_group_by`` / ``defers_materialization`` / the ``non_array_source`` gate) decide what fuses, and a shape it can't fuse cascades to tier-2 — see :ref:`linq_fold_xml_patterns` for the full fuse/defer breakdown. ``unsafe`` is required (the source is ``[unsafe_outside_of_for]``) and the node is passed by value (``var root`` — ``_fold``'s macro-arg inference skips the const&→value copy). * - ``unsafe(each_kv(tab))`` / ``keys(tab)`` / ``values(tab)`` - ``extract_table_source`` (``TableAdapter``, ``daslib/linq_fold_table.das``) - - In-tree source — recognized by name **plus** a table-typed argument (``table`` / ``table``), so an unrelated user ``keys`` never fires it. The kv lane (``each_kv``) binds ``kv.key`` / ``kv.value`` and **usage-prunes the walk**: a chain touching only ``.value`` walks ``values(tab)`` alone, only ``.key`` (or neither) walks ``keys(tab)`` alone — half the slot-skip work of the zipped two-iterator form, which is emitted only when both sides (or the whole pair) are read. A whole-pair escape binds a named-tuple copy, so the kv lane fuses **copyable value types only** — a non-copyable-valued ``each_kv`` falls through and the surviving instantiation concept-asserts (error 31400; the keys/values lanes still fuse such tables). Bare ``count()`` / ``long_count()`` folds to O(1) ``length(tab)``; a plain ``distinct`` over raw keys/kv elements is **dropped** before matching (keys are unique by construction; only uniqueness-preserving prefix ops allow the drop — a preceding ``select`` keeps the distinct, and the values lane always keeps it). **Point-lookup folds** (``try_table_point_lookup``): a key-equality ``where`` (``kv.key == X``, bare ``k == X`` on the keys lane, either operand order; predicate-form ``any(p)`` / ``count(p)`` too) against a loop-invariant, side-effect-free ``X`` folds the whole walk to an O(1) probe — ``any`` / keys-lane ``contains(X)`` → ``key_exists(tab, X)``, ``count`` → ``key_exists ? 1 : 0``, ``first`` / ``first_or_default`` (± one trailing ``select``) → a ``tab?[X]`` probe with the scan's exact semantics (panic on a missing ``first``, eagerly-bound default value otherwise). Anything else — compound ``&&`` predicates, other comparison operators, an ``X`` that reads the binder or has side effects (the scan evaluates ``X`` per element, the probe once) — keeps the scan. ``order_by`` / ``take`` / ``first`` observe the table's unspecified slot order, exactly like a hand ``for (k, v in keys(t), values(t))`` loop. **Joins fuse on either side** (``can_join`` is on; the adapter rides the shared ``emit_array_join`` through its own ``wrap_source_loop``): a table *lead* walks its pruned slot iterator(s) as the probe loop; a table in the *srcB slot* joined on its bare key — ``d.key`` on the kv lane, the bare element on a ``keys(set)`` source — skips the join's internal ``table>`` entirely and probes the user's table per lead row (``join_keyb_is_bare_key`` + ``build_join_probe_pieces``; unique table keys make the probe ≡ hash semantics exactly). The probe is itself usage-pruned: count-no-where and key-only shapes stay on ``key_exists``, value shapes bind the matched value **by reference** from a ``tab?[k]`` pointer (no copy), and only a whole-pair use binds the kv tuple. A non-bare b-key keeps the hashed build over the kv iterator; ``group_join`` (outer — its result consumes the whole bucket) always keeps it. **``group_by`` fuses** (``can_group_by`` is on; ``build_group_by_adapter`` hands ``plan_group_by_core`` a fresh ``TableAdapter``, so the bucket-fill loop is the usage-pruned slot walk — a group key over ``kv.value.brand`` walks ``values(tab)`` alone) for the plain-lead shape only: ``join |> group_by`` over a table lead declines (the upstream-join arm returns null) and reverse has no backward slot walk — those shapes cascade to tier-2 (see ``benchmarks/sql/LINQ_TO_TABLE.md``). **``to_table()`` sinks fuse** (table-buffer materializer row above): the chain inserts straight into the result table — a bare ``each_kv(tab).to_table()`` is a reserve-ahead table clone through the fused walk, and a ``keys(tab)`` chain lands in the ``table`` set form. ``unsafe`` is required at an unfused chain head (the sources are ``[unsafe_outside_of_for]``); fused chains rewrite the head before inference. + - In-tree source — recognized by name **plus** a table-typed argument (``table`` / ``table``), so an unrelated user ``keys`` never fires it. The kv lane (``each_kv``) binds ``kv.key`` / ``kv.value`` and **usage-prunes the walk**: a chain touching only ``.value`` walks ``values(tab)`` alone, only ``.key`` (or neither) walks ``keys(tab)`` alone — half the slot-skip work of the zipped two-iterator form, which is emitted only when both sides (or the whole pair) are read. A whole-pair escape binds a named-tuple copy, so the kv lane fuses **copyable value types only** — a non-copyable-valued ``each_kv`` falls through and the surviving instantiation concept-asserts (error 31400; the keys/values lanes still fuse such tables). Bare ``count()`` / ``long_count()`` folds to O(1) ``length(tab)``; a plain ``distinct`` over raw keys/kv elements is **dropped** before matching (keys are unique by construction; only uniqueness-preserving prefix ops allow the drop — a preceding ``select`` keeps the distinct, and the values lane always keeps it). **Point-lookup folds** (``try_table_point_lookup``): a key-equality ``where`` (``kv.key == X``, bare ``k == X`` on the keys lane, either operand order; predicate-form ``any(p)`` / ``count(p)`` too) against a loop-invariant, side-effect-free ``X`` folds the whole walk to an O(1) probe — ``any`` / keys-lane ``contains(X)`` → ``key_exists(tab, X)``, ``count`` → ``key_exists ? 1 : 0``, ``first`` / ``first_or_default`` (± one trailing ``select``) → a ``tab?[X]`` probe with the scan's exact semantics (panic on a missing ``first``, eagerly-bound default value otherwise). The key-equality may carry **residual conjuncts** — ``kv.key == X && residual…`` with the key-equality as the *leftmost* conjunct (a leading run of consecutive ``where`` calls AND-merges first, so ``where(key == X) |> where(res)`` is the same shape): the probe evaluates the residual on the probed element only, a false residual routes to the same miss path, and no purity gate is needed on the residual because keys are unique — under ``&&`` short-circuit it runs at most once on both paths. Anything else — a compound predicate whose leftmost conjunct is not the key-equality, other comparison operators, an ``X`` that reads the binder or has side effects (the scan evaluates ``X`` per element, the probe once) — keeps the scan. ``order_by`` / ``take`` / ``first`` observe the table's unspecified slot order, exactly like a hand ``for (k, v in keys(t), values(t))`` loop. **Joins fuse on either side** (``can_join`` is on; the adapter rides the shared ``emit_array_join`` through its own ``wrap_source_loop``): a table *lead* walks its pruned slot iterator(s) as the probe loop; a table in the *srcB slot* joined on its bare key — ``d.key`` on the kv lane, the bare element on a ``keys(set)`` source — skips the join's internal ``table>`` entirely and probes the user's table per lead row (``join_keyb_is_bare_key`` + ``build_join_probe_pieces``; unique table keys make the probe ≡ hash semantics exactly). The probe is itself usage-pruned: count-no-where and key-only shapes stay on ``key_exists``, value shapes bind the matched value **by reference** from a ``tab?[k]`` pointer (no copy), and only a whole-pair use binds the kv tuple. A non-bare b-key keeps the hashed build over the kv iterator; ``group_join`` (outer — its result consumes the whole bucket) always keeps it. **``group_by`` fuses** (``can_group_by`` is on; ``build_group_by_adapter`` hands ``plan_group_by_core`` a fresh ``TableAdapter``, so the bucket-fill loop is the usage-pruned slot walk — a group key over ``kv.value.brand`` walks ``values(tab)`` alone) for the plain-lead shape only: ``join |> group_by`` over a table lead declines (the upstream-join arm returns null) and reverse has no backward slot walk — those shapes cascade to tier-2 (see ``benchmarks/sql/LINQ_TO_TABLE.md``). **``to_table()`` sinks fuse** (table-buffer materializer row above): the chain inserts straight into the result table — a bare ``each_kv(tab).to_table()`` is a reserve-ahead table clone through the fused walk, and a ``keys(tab)`` chain lands in the ``table`` set form. ``unsafe`` is required at an unfused chain head (the sources are ``[unsafe_outside_of_for]``); fused chains rewrite the head before inference. * - ``unsafe(from_json(jv, type))`` - ``extract_json_source`` (``JsonAdapter``, ``daslib/linq_fold_json.das``) - In-tree source — the adapter is compiled in unconditionally (no ``static_if`` gate, unlike XML's pugixml one), but a program only pulls JSON into scope by requiring ``json`` / ``json_boost`` itself. ``extract_json_source`` matches a ``from_json`` whose first argument is a ``json::JsonValue?``, so a JSON-less program returns null and the chain falls to the array tier. The adapter pulls in **no** json dependency — it emits ``from_json`` / ``read_json_field`` by name (resolved at the user's splice site, like ``linq_fold_decs`` emits ``for_each_archetype``; ``from_JV`` is emitted only for a non-struct element type). Emits an inlined ``for (e in jv.value as _array)`` walk replacing the generator, and **field-prunes** the per-element materialization (pass 2b): only the keys the chain reads are pulled via ``read_json_field`` by name — unread keys (notably ``string`` fields whose materialization clones) are never touched, so a scalar-only chain skips ~all of the full per-row build (3.6× over the full materialize — see ``benchmarks/micro/json_source_shapes.das``). A whole-row escape reads **every** top-level field by name (``emit_full_row_by_name``), so a custom whole-row ``from_JV(Row)`` override is **not** honored (Option B — this is a flat query source, not a deserializer; materialize the array with an explicit ``from_JV`` first for that). ``unsafe`` is required (the source is ``[unsafe_outside_of_for]``). Deferred materialization mirrors XML: order/distinct/take buffer a cheap ``(orderKey, JsonValue?)`` surrogate and materialize only the K survivors — by name (``emit_full_row_by_name``), so a struct survivor reads each field by key; only a non-struct ``Row`` falls back to ``outBind <- from_JV(handle, type)``. The ``JsonAdapter`` also fuses ``join`` / ``join |> group_by`` (``emit_join_hook`` + ``JsonJoinAdapter`` off ``build_group_by_adapter``'s upstream-join arm), reusing the array-join machinery (``build_join_standalone_pieces`` / ``build_join_adapter_pieces``): srcB is collected into a ``table>`` and the field-pruned array walk is the probe side, so the join key reads only its own field per element (e.g. ``read_json_field(jcur, "brand", …)``). Standalone ``group_join`` and a trailing ``where`` / ``select`` / ``count`` over group-join rows defer to tier-2, mirroring XML. diff --git a/tests/linq/test_linq_table_source.das b/tests/linq/test_linq_table_source.das index 6c78bec35..87f1b36c6 100644 --- a/tests/linq/test_linq_table_source.das +++ b/tests/linq/test_linq_table_source.das @@ -210,8 +210,9 @@ def private bump_key(var c : int&) : int { return 2 } -// Point-lookup folds: `where(kv.key == X)` + terminator → O(1) key probe (key_exists / `tab?[X]`). -// Probes must agree with the scan on hit AND miss; non-probe shapes must keep riding the scan. +// Point-lookup folds: `where(kv.key == X [&& residual])` + terminator → O(1) key probe (key_exists / +// `tab?[X]`), residual conjuncts evaluated on the probed element only. Probes must agree with the scan +// on hit AND miss; non-probe shapes (key-eq not the leftmost conjunct) must keep riding the scan. [test] def test_table_point_lookup(t : T?) { @@ -251,7 +252,46 @@ def test_table_point_lookup(t : T?) { t |> equal(_fold(keys(s).contains("z")), false) delete s } - t |> run("first probe panics on a missing key, like the scan") @(t : T?) { + t |> run("residual conjuncts route to the probed element only") @(t : T?) { + var tab <- make_int_table(10) + let h = _fold(each_kv(tab)._where(_.key == 5 && _.value > 0).first_or_default(default)) + t |> equal(h.key, 5) + t |> equal(h.value, 50) + let rf = _fold(each_kv(tab)._where(_.key == 5 && _.value > 500).first_or_default(default)) + t |> equal(rf.key, 0, "residual-false on a hit routes to the miss path") + let ms = _fold(each_kv(tab)._where(_.key == 99 && _.value > 0).first_or_default(default)) + t |> equal(ms.key, 0) + t |> equal(_fold(each_kv(tab)._any(_.key == 3 && _.value == 30)), true) + t |> equal(_fold(each_kv(tab)._any(_.key == 3 && _.value == 31)), false) + t |> equal(_fold(each_kv(tab)._count(_.key == 3 && _.value == 30)), 1) + t |> equal(_fold(each_kv(tab)._count(_.key == 3 && _.value == 31)), 0) + // three conjuncts AND-rebuild in order; trailing select projects the probed element + t |> equal(_fold(each_kv(tab)._where(_.key == 6 && _.value > 0 && _.value < 100)._select(_.value).first_or_default(-1)), 60) + // `key == 2 && key == 3` probes 2, then the residual re-check fails — agrees with the scan + t |> equal(_fold(each_kv(tab)._where(_.key == 2 && _.key == 3).count()), 0) + delete tab + } + t |> run("consecutive wheres AND-merge before conjunct extraction") @(t : T?) { + var tab <- make_int_table(10) + t |> equal(_fold(each_kv(tab)._where(_.key == 5)._where(_.value > 0).any()), true) + t |> equal(_fold(each_kv(tab)._where(_.key == 5)._where(_.value > 500).any()), false) + t |> equal(_fold(each_kv(tab)._where(_.key == 4)._where(_.value > 0)._select(_.value).first()), 40) + delete tab + } + t |> run("keys lane residual") @(t : T?) { + var tab <- make_int_table(10) + t |> equal(_fold(keys(tab)._where(_ == 8 && _ % 2 == 0).first_or_default(-1)), 8) + t |> equal(_fold(keys(tab)._where(_ == 7 && _ % 2 == 0).first_or_default(-1)), -1) + delete tab + } + t |> run("residual evaluates once, like the scan (keys are unique)") @(t : T?) { + var tab <- make_int_table(4) + var evals = 0 + t |> equal(_fold(each_kv(tab)._where(_.key == 2 && bump_key(evals) > 0).count()), 1) + t |> equal(evals, 1, "residual must run once, on the probed element") + delete tab + } + t |> run("first probe panics on a missing key or a failed residual, like the scan") @(t : T?) { var tab <- make_int_table(4) var panicked = false try { @@ -260,13 +300,20 @@ def test_table_point_lookup(t : T?) { panicked = true } t |> equal(panicked, true) + panicked = false + try { + let _r = _fold(each_kv(tab)._where(_.key == 2 && _.value > 999).first()) + } recover { + panicked = true + } + t |> equal(panicked, true) delete tab } t |> run("non-probe shapes stay scans and stay correct") @(t : T?) { var tab <- make_int_table(10) - t |> equal(_fold(each_kv(tab)._where(_.key != 5).count()), 9) // wrong operator - t |> equal(_fold(each_kv(tab)._where(_.key == _.value / 10).count()), 10) // X references the binder - t |> equal(_fold(each_kv(tab)._where(_.key == 5)._where(_.value > 0).any()), true) // collapses to a compound && predicate + t |> equal(_fold(each_kv(tab)._where(_.key != 5).count()), 9) // wrong operator + t |> equal(_fold(each_kv(tab)._where(_.key == _.value / 10).count()), 10) // X references the binder + t |> equal(_fold(each_kv(tab)._where(_.value > 0)._where(_.key == 5).any()), true) // key-eq not leftmost delete tab } t |> run("impure X stays a scan — per-element evaluation preserved") @(t : T?) {