diff --git a/CLAUDE.md b/CLAUDE.md
index 02a293eba..ad985b387 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -286,6 +286,7 @@ Full migration table (when reading older docs that say `var inscope` or `<-` for
 | `for (s in A) { B \|> push(s) }` / `push_clone(s)` (iter-var only) | `B \|> push_from(A)` / `push_clone_from(A)` | PERF022: the bulk overload in builtin.das reserves combined capacity up front. Single name `push`/`push_clone` is overloaded between single-element and bulk (ambiguous when destination is `array<T[]>`); the `_from` suffix names the bulk intent. Source must be `array<T>` or C-array — range/iterator sources are not flagged. `emplace` is out of scope (const iter-var can't be moved) |
 | `var a : array<T>; for (x in SRC) { if (COND) { a \|> push(EXPR) } }` (or `table<K;V>` + `insert`/`a[k]=v`) | `var a <- [for (x in SRC); EXPR; where COND]` (or `\{for (...); k => v; where ...\}`) | STYLE027: var with empty default-init followed by a for-loop that only push/insert into it. Accepts depth ≤ 2 nested fors and if-filters at any depth. `emplace` excluded — move-source-zeroing differs from comprehension element-construction. Iterator-comprehension form (`[$f ...]`) NOT suggested |
 | `var X = clone_expression(E); ... $e(X) ...` (only-uses-are-qmacro-splice) | drop the pre-clone, inline `$e(E)` at each splice site | PERF023: `qmacro`/`qmacro_block`/`qmacro_expr`/`qmacro_block_to_array` go through `apply_template` (templates_boost.das:251), which calls `clone_expression` on every substitution input. Pre-cloning is wasted work. Detection: post-expansion `$e(X)` becomes `add_ptr_ref(X)` inside an `ExprMakeBlock`; visitor tracks splice-wrapper depth via preVisitExprCall/visitExprCall counter on `add_ptr_ref`, classifies each candidate `ExprVar` reference as "safe" when depth>0. Fires only when ALL uses are safe AND ≥1 is observed. Multi-clone-of-same-source flagged too — apply_template clones each substitution independently |
+| hand-rolled `is X` / `as X` / null-guard / `ExprRef2Value`-peel ladders in macro code | `qmatch(e, $e(a) + $e(b))` for source-syntax shapes; `match (e) { if (ExprField(name = "key", value = ExprVar(...))) { ... } }` for node-class shapes | both matchers peel `ExprRef2Value` automatically; `\|\|` alternation, `&&` guards, and `match_expr(local)` cover most ladders. Limits + the qmatch↔match division of labor: `skills/das_macros.md` "`match` (daslib/match)" |
 
 For path/filename ops use `fio` helpers (`base_name`/`dir_name`/`path_join`/etc.) — see `skills/filesystem.md`. Never hand-roll `rfind("/")` / slice — misses Windows separators.
 
diff --git a/benchmarks/sql/LINQ_TO_TABLE.md b/benchmarks/sql/LINQ_TO_TABLE.md
index 85c002a82..20609cec8 100644
--- a/benchmarks/sql/LINQ_TO_TABLE.md
+++ b/benchmarks/sql/LINQ_TO_TABLE.md
@@ -243,13 +243,27 @@ PR1 findings:
 
 End of arc: `skills/linq.md` + linq docs mention the table source.
 
-## Late stage (planned) — reducer shapes & general code hygiene
+## Late stage (IN PROGRESS, 2026-06-11) — reducer shapes & general code hygiene
 
 Cross-source cleanups; none are table-specific. Items 1–2 are user-facing reducer-shape fixes,
 items 3–4 are codebase hygiene investigations (the linq_fold surface is workable but "a tad too
 unwieldy" — the table adapter took several stages, and many fuses read as "add this hook, because
 reasons" rather than falling out of the architecture).
 
+Status: **items 1+2 DONE** (selector overloads `sum/min/max/average(src, selector)`, the
+`_select`-macro bucket-element type stamping, the 2-arg recognizer arm with identity
+canonicalization — branch `bbatkin/linq-reducer-shapes`). **Item 4 partially DONE** (4A
+upstream-join validation dedup + 4B `loop_source_expr`/`loop_source_name` recontract — branch
+`bbatkin/linq-adapter-hygiene`; reverse hooks kept as-is per audit, stringly-Captures →
+typed ChainView deferred, per-source dispatch can't become a registry — macro modules compile
+into separate contexts). **Item 3 DONE with scope notes** (key-probe matchers + `join_keyb_is_bare_key`
+converted to `daslib/match` class patterns — n.b. the AST conversions use BOTH `daslib/ast_match`
+(qmatch) and `daslib/match`, each where it fits; prereq landed = ExprRef2Value transparency in
+match.das. Declined: `is_bucket_reducer_call` (statement-shaped match doesn't fit a
+tuple-returning recognizer) and `extract_decs_bridge` (match.das array patterns reject
+das-vector scrutinees; revisit if the library grows them)). Toolbox doc:
+`skills/das_macros.md` "`match` (daslib/match)".
+
 1. **Identity-lambda reducers**: `_._1 |> max($(v) => v)` (also `min`/`sum`/`average`) fails with
    30303 today — the untyped lambda can't infer on the tier-2 lazy-bucket surface, and
    `recognize_reducer_specs` has no identity arm either. Fix both ends: recognize the identity
diff --git a/benchmarks/sql/results.md b/benchmarks/sql/results.md
index 3903f6fb3..2acab79f7 100644
--- a/benchmarks/sql/results.md
+++ b/benchmarks/sql/results.md
@@ -40,177 +40,177 @@ signal, JIT deltas as indicative.**
 
 | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) |
 |---|---:|---:|---:|---:|---:|---:|
-| `aggregate_match` | 35.0 | 5.9 | 5.8 | 60.8 | 159.8 | 19.1 |
-| `all_match` | 27.8 | 3.5 | 3.4 | 56.4 | 155.6 | 15.9 |
+| `aggregate_match` | 35.0 | 5.8 | 5.8 | 60.5 | 152.8 | 19.1 |
+| `all_match` | 27.9 | 3.5 | 3.4 | 56.2 | 146.1 | 15.8 |
 | `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `average_aggregate` | 30.6 | 6.1 | 8.7 | 58.7 | 164.0 | 17.3 |
+| `average_aggregate` | 30.3 | 6.2 | 8.7 | 58.6 | 156.9 | 17.2 |
 | `bare_last` | — | 4.2 | 0.0 | 0.0 | 4.2 | 31.0 |
-| `bare_order_where` | 280.6 | 116.8 | 125.5 | 300.5 | 289.3 | 162.6 |
-| `chained_select_collapse` | — | 17.6 | 17.5 | 70.5 | 164.8 | 28.0 |
-| `chained_where` | 36.7 | 6.6 | 7.1 | 105.6 | 183.3 | 24.0 |
-| `contains_match` | 0.0 | 2.2 | 1.4 | 29.2 | 73.0 | 6.6 |
-| `count_aggregate` | 29.8 | 4.2 | 4.1 | 63.9 | 154.3 | 20.2 |
-| `cross_join` | 12641.5 | 3703.0 | — | 4040.3 | 4032.0 | — |
-| `decs_count_bare_pred` | — | — | 4.2 | — | — | — |
-| `distinct_by_count` | 41.1 | 15.8 | 15.8 | 71.2 | 162.6 | 26.5 |
-| `distinct_by_order_take` | 245.5 | 22.2 | 23.6 | 123.7 | 162.2 | 48.7 |
-| `distinct_by_order_to_array` | 247.9 | 22.1 | 23.5 | 125.1 | 163.8 | 48.6 |
-| `distinct_count` | 41.5 | 15.6 | 15.8 | 71.2 | 161.9 | 27.1 |
-| `distinct_count_pred` | 256.2 | 15.8 | 15.8 | 112.8 | 177.5 | 26.3 |
+| `bare_order_where` | 282.3 | 117.3 | 126.3 | 301.5 | 287.8 | 162.7 |
+| `chained_select_collapse` | — | 17.5 | 17.6 | 70.5 | 160.3 | 28.1 |
+| `chained_where` | 37.0 | 6.8 | 7.1 | 105.0 | 175.7 | 24.1 |
+| `contains_match` | 0.0 | 2.2 | 1.4 | 27.8 | 68.4 | 6.5 |
+| `count_aggregate` | 30.1 | 4.2 | 4.1 | 63.6 | 147.2 | 20.2 |
+| `cross_join` | 12614.0 | 3686.9 | — | 4012.4 | 4033.2 | — |
+| `decs_count_bare_pred` | — | — | 4.1 | — | — | — |
+| `distinct_by_count` | 41.2 | 15.9 | 15.8 | 70.9 | 156.2 | 26.1 |
+| `distinct_by_order_take` | 242.3 | 22.1 | 23.2 | 125.0 | 158.2 | 48.7 |
+| `distinct_by_order_to_array` | 242.4 | 22.2 | 23.3 | 125.7 | 158.6 | 48.5 |
+| `distinct_count` | 41.3 | 15.8 | 15.7 | 71.1 | 154.4 | 27.2 |
+| `distinct_count_pred` | 253.7 | 16.0 | 15.7 | 112.5 | 171.2 | 26.1 |
 | `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `element_at_match` | 0.0 | 0.0 | 0.0 | 0.4 | 0.3 | 0.0 |
 | `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `groupby_average` | 171.1 | 29.2 | 29.2 | 123.8 | 217.9 | 40.9 |
-| `groupby_count` | 141.4 | 19.1 | 19.2 | 74.8 | 171.7 | 30.8 |
-| `groupby_first` | 257.2 | 19.1 | 19.8 | 72.5 | 162.9 | 40.2 |
-| `groupby_having_count` | 141.6 | 19.2 | 19.2 | 75.5 | 168.3 | 30.6 |
-| `groupby_having_hidden_sum` | 176.8 | 22.2 | 22.6 | 118.4 | 192.2 | 33.5 |
-| `groupby_having_post_where` | 172.0 | 20.4 | 20.5 | 114.7 | 188.7 | 31.7 |
-| `groupby_max` | 174.6 | 24.8 | 24.9 | 119.4 | 193.4 | 34.2 |
-| `groupby_min` | 174.1 | 25.5 | 25.3 | 120.0 | 191.8 | 34.3 |
-| `groupby_multi_reducer` | 190.7 | 30.6 | 30.1 | 125.0 | 195.9 | 43.2 |
-| `groupby_select_order` | 171.5 | 20.4 | 20.5 | 114.8 | 188.2 | 31.6 |
-| `groupby_select_sum` | 198.6 | 38.5 | 38.6 | 101.9 | 193.6 | 49.4 |
-| `groupby_sum` | 170.7 | 20.5 | 20.5 | 114.7 | 187.7 | 31.5 |
-| `groupby_where_count` | 76.0 | 13.8 | 14.5 | 116.5 | 185.7 | 30.2 |
-| `groupby_where_sum` | 87.1 | 14.2 | 14.8 | 117.5 | 186.2 | 30.6 |
-| `join_count` | 38.6 | 52.3 | 64.4 | 112.9 | 183.8 | 65.0 |
-| `join_groupby_count` | 158.9 | 77.1 | 89.0 | 178.8 | 230.9 | 260.6 |
-| `join_groupby_to_array` | 191.1 | 78.5 | 90.7 | 215.8 | 212.8 | 290.5 |
-| `join_probe` | — | — | — | — | — | 46.8 |
-| `join_probe_build` | — | — | — | — | — | 80.8 |
-| `join_select` | 150.7 | 73.7 | 84.2 | 194.2 | 214.5 | 223.2 |
-| `join_where_count` | 39.6 | 62.0 | 75.7 | 161.1 | 198.9 | 80.2 |
-| `last_match` | 0.0 | 5.7 | 14.0 | 65.4 | 160.2 | 31.4 |
-| `long_count_aggregate` | 29.7 | 4.1 | 4.1 | 63.8 | 154.5 | 20.1 |
-| `max_aggregate` | 31.0 | 6.1 | 6.9 | 59.0 | 163.6 | 16.9 |
-| `min_aggregate` | 30.9 | 6.1 | 6.8 | 59.3 | 163.2 | 16.9 |
-| `order_by_multi_key` | 341.9 | 274.9 | 283.2 | 459.1 | 445.7 | 333.6 |
-| `order_distinct_take` | 138.9 | 15.9 | 100.1 | 73.0 | 163.1 | 31.0 |
-| `order_reverse_normalized` | 38.5 | 16.3 | 20.0 | 70.7 | 170.8 | 32.9 |
-| `order_take_desc` | 38.6 | 16.4 | 20.0 | 70.8 | 170.4 | 33.0 |
+| `groupby_average` | 173.5 | 29.2 | 29.2 | 123.7 | 187.6 | 40.9 |
+| `groupby_count` | 142.8 | 19.7 | 19.1 | 74.7 | 159.0 | 30.8 |
+| `groupby_first` | 253.1 | 19.1 | 19.8 | 72.3 | 155.3 | 40.1 |
+| `groupby_having_count` | 141.5 | 19.2 | 19.1 | 75.1 | 158.6 | 30.7 |
+| `groupby_having_hidden_sum` | 176.7 | 22.5 | 22.2 | 118.1 | 183.3 | 33.6 |
+| `groupby_having_post_where` | 172.3 | 20.5 | 20.5 | 114.4 | 180.3 | 31.8 |
+| `groupby_max` | 175.3 | 25.1 | 24.9 | 119.3 | 183.0 | 34.4 |
+| `groupby_min` | 175.5 | 25.6 | 25.2 | 119.7 | 183.9 | 34.5 |
+| `groupby_multi_reducer` | 192.5 | 30.7 | 30.1 | 125.0 | 187.9 | 43.0 |
+| `groupby_select_order` | 172.4 | 20.5 | 20.5 | 114.8 | 180.3 | 31.6 |
+| `groupby_select_sum` | 198.4 | 39.6 | 38.5 | 101.4 | 185.4 | 49.4 |
+| `groupby_sum` | 175.0 | 20.5 | 20.4 | 113.9 | 179.9 | 31.5 |
+| `groupby_where_count` | 76.5 | 13.9 | 14.5 | 116.2 | 178.8 | 29.9 |
+| `groupby_where_sum` | 88.0 | 14.2 | 14.7 | 117.0 | 181.9 | 29.9 |
+| `join_count` | 38.5 | 52.4 | 64.7 | 112.7 | 177.1 | 65.0 |
+| `join_groupby_count` | 157.7 | 77.4 | 89.0 | 178.3 | 223.4 | 260.9 |
+| `join_groupby_to_array` | 190.6 | 79.0 | 91.5 | 214.9 | 212.3 | 290.9 |
+| `join_probe` | — | — | — | — | — | 47.1 |
+| `join_probe_build` | — | — | — | — | — | 81.2 |
+| `join_select` | 149.7 | 73.2 | 85.2 | 194.3 | 206.2 | 223.1 |
+| `join_where_count` | 40.0 | 62.3 | 76.1 | 160.8 | 192.9 | 80.6 |
+| `last_match` | 0.0 | 5.8 | 14.0 | 65.1 | 152.1 | 31.6 |
+| `long_count_aggregate` | 29.8 | 4.2 | 4.1 | 63.5 | 147.5 | 20.7 |
+| `max_aggregate` | 31.3 | 6.1 | 6.8 | 59.0 | 155.3 | 17.1 |
+| `min_aggregate` | 31.3 | 6.1 | 6.8 | 93.0 | 155.1 | 17.2 |
+| `order_by_multi_key` | 335.4 | 277.1 | 281.9 | 458.1 | 443.1 | 333.9 |
+| `order_distinct_take` | 137.7 | 15.9 | 98.5 | 73.0 | 155.2 | 31.1 |
+| `order_reverse_normalized` | 38.4 | 16.3 | 20.0 | 71.2 | 162.5 | 33.3 |
+| `order_take_desc` | 38.7 | 16.5 | 20.0 | 70.1 | 162.6 | 33.4 |
 | `point_lookup` | — | — | — | — | — | 0.0 |
 | `point_lookup_residual` | — | — | — | — | — | 0.0 |
-| `point_lookup_scan` | — | — | — | — | — | 8.5 |
-| `reverse_distinct_by` | 299.0 | 21.2 | 27.7 | 71.2 | 162.4 | 44.2 |
-| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.3 | 58.7 |
-| `reverse_take_select` | 0.0 | 0.0 | 0.2 | 0.0 | 26.2 | 58.6 |
-| `select_count` | 0.1 | 0.0 | 2.2 | 69.6 | 2.2 | 0.0 |
-| `select_many` | — | 191.6 | — | — | — | — |
-| `select_where` | 197.3 | 11.0 | 19.3 | 196.5 | 183.9 | 37.5 |
-| `select_where_count` | 32.9 | 5.2 | 7.4 | 64.7 | 158.0 | 22.5 |
-| `select_where_order_take` | 37.2 | 12.2 | 15.0 | 73.1 | 164.7 | 34.5 |
-| `select_where_sum` | 37.4 | 7.5 | 7.5 | 66.7 | 162.3 | 23.3 |
-| `single_match` | 0.0 | 2.8 | 5.4 | 58.7 | 150.6 | 22.8 |
+| `point_lookup_scan` | — | — | — | — | — | 8.6 |
+| `reverse_distinct_by` | 297.2 | 21.2 | 27.8 | 70.8 | 155.4 | 43.7 |
+| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.2 | 58.4 |
+| `reverse_take_select` | 0.0 | 0.0 | 0.2 | 0.0 | 26.2 | 58.5 |
+| `select_count` | 0.1 | 0.0 | 2.2 | 64.4 | 2.2 | 0.0 |
+| `select_many` | — | 191.1 | — | — | — | — |
+| `select_where` | 198.1 | 11.5 | 19.4 | 196.0 | 184.9 | 37.7 |
+| `select_where_count` | 33.1 | 5.2 | 7.4 | 64.8 | 152.9 | 22.3 |
+| `select_where_order_take` | 37.0 | 12.3 | 14.9 | 72.5 | 157.6 | 34.5 |
+| `select_where_sum` | 37.3 | 7.5 | 7.5 | 66.6 | 154.3 | 23.4 |
+| `single_match` | 0.0 | 2.8 | 5.5 | 55.9 | 143.4 | 23.0 |
 | `skip_take` | 0.5 | 0.1 | 0.2 | 3.0 | 2.8 | 0.3 |
-| `skip_while_match` | 3.5 | 5.2 | 5.3 | 60.3 | 153.6 | 18.2 |
-| `sort_first` | 38.7 | 11.0 | 13.5 | 65.6 | 167.0 | 31.5 |
-| `sort_take` | 38.7 | 16.1 | 20.2 | 71.1 | 170.6 | 32.7 |
-| `sort_take_select` | 38.5 | 16.4 | 20.2 | 71.3 | 171.4 | 33.1 |
-| `sum_aggregate` | 30.5 | 2.1 | 2.1 | 54.7 | 152.7 | 13.4 |
-| `sum_where` | 33.1 | 4.3 | 4.3 | 63.7 | 154.0 | 21.0 |
+| `skip_while_match` | 3.5 | 5.2 | 5.3 | 57.6 | 146.2 | 18.2 |
+| `sort_first` | 38.3 | 11.1 | 13.3 | 65.1 | 159.4 | 31.7 |
+| `sort_take` | 38.6 | 16.1 | 20.1 | 70.6 | 162.9 | 32.8 |
+| `sort_take_select` | 38.6 | 16.4 | 20.1 | 70.7 | 162.8 | 33.1 |
+| `sum_aggregate` | 30.1 | 2.1 | 2.1 | 54.9 | 145.8 | 13.4 |
+| `sum_where` | 32.9 | 4.2 | 4.3 | 63.6 | 147.2 | 20.5 |
 | `take_count` | 3.6 | 0.2 | 0.4 | 2.9 | 2.7 | 0.5 |
 | `take_count_filtered` | 1.1 | 0.2 | 0.2 | 1.3 | 1.1 | 0.3 |
 | `take_sum_aggregate` | 0.8 | 0.1 | 0.1 | 0.6 | 0.5 | 0.1 |
 | `take_where_count` | 0.9 | 0.1 | 0.1 | 0.7 | 0.6 | 0.2 |
-| `take_while_match` | 8.2 | 2.4 | 2.5 | 30.4 | 75.6 | 16.4 |
-| `to_array_filter` | 70.9 | 11.7 | 11.8 | 71.6 | 164.5 | 29.2 |
-| `to_table` | — | 18.6 | 143.9 | 117.9 | 143.7 | 32.5 |
-| `to_table_staged` | — | 55.7 | 57.7 | 143.7 | 167.5 | 69.7 |
-| `where_join_count` | 41.8 | 29.5 | 41.2 | 132.6 | 167.6 | 47.9 |
-| `zip_count_pred` | 39.5 | 15.8 | — | 315.7 | 319.9 | — |
-| `zip_dot_product` | 49.2 | 12.7 | 10.5 | 310.1 | 316.0 | — |
-| `zip_dot_product_3arg` | 48.7 | 12.8 | — | 310.3 | 316.5 | — |
-| `zip_reverse_to_array` | — | 31.9 | — | 344.9 | 350.7 | — |
+| `take_while_match` | 8.2 | 2.5 | 2.5 | 28.9 | 70.3 | 16.4 |
+| `to_array_filter` | 70.3 | 12.1 | 11.8 | 71.3 | 155.8 | 28.7 |
+| `to_table` | — | 18.7 | 141.3 | 117.1 | 140.6 | 32.3 |
+| `to_table_staged` | — | 56.0 | 57.6 | 143.0 | 165.4 | 69.0 |
+| `where_join_count` | 41.8 | 29.6 | 41.3 | 132.8 | 162.5 | 46.9 |
+| `zip_count_pred` | 39.9 | 15.8 | — | 315.5 | 319.1 | — |
+| `zip_dot_product` | 47.0 | 12.6 | 10.7 | 308.7 | 315.5 | — |
+| `zip_dot_product_3arg` | 46.9 | 12.6 | — | 308.9 | 314.2 | — |
+| `zip_reverse_to_array` | — | 32.0 | — | 344.3 | 350.5 | — |
 
 ## JIT
 
 | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) |
 |---|---:|---:|---:|---:|---:|---:|
-| `aggregate_match` | 34.9 | 0.3 | 0.7 | 18.8 | 26.4 | 7.3 |
-| `all_match` | 27.6 | 0.3 | 0.2 | 18.4 | 25.4 | 7.2 |
+| `aggregate_match` | 34.9 | 0.3 | 0.7 | 18.8 | 27.3 | 7.3 |
+| `all_match` | 27.7 | 0.3 | 0.2 | 18.4 | 25.1 | 7.2 |
 | `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `average_aggregate` | 30.2 | 1.0 | 3.6 | 18.4 | 24.8 | 7.4 |
-| `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 | 8.8 |
-| `bare_order_where` | 187.2 | 34.2 | 35.7 | 104.2 | 53.2 | 68.7 |
-| `chained_select_collapse` | — | 1.1 | 1.1 | 20.6 | 33.7 | 8.3 |
-| `chained_where` | 36.9 | 0.6 | 0.9 | 38.3 | 31.6 | 10.6 |
-| `contains_match` | 0.0 | 0.2 | 0.1 | 17.5 | 8.9 | 2.5 |
-| `count_aggregate` | 29.6 | 0.3 | 0.6 | 23.4 | 25.6 | 7.3 |
-| `cross_join` | 5998.8 | 737.5 | — | 830.4 | 766.4 | — |
+| `average_aggregate` | 29.8 | 1.0 | 3.5 | 18.3 | 24.4 | 7.4 |
+| `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 | 8.9 |
+| `bare_order_where` | 185.3 | 33.8 | 34.8 | 104.0 | 51.8 | 68.4 |
+| `chained_select_collapse` | — | 1.1 | 1.1 | 20.5 | 32.0 | 8.2 |
+| `chained_where` | 36.8 | 0.6 | 0.9 | 38.2 | 31.8 | 10.6 |
+| `contains_match` | 0.0 | 0.2 | 0.1 | 14.8 | 9.2 | 2.5 |
+| `count_aggregate` | 29.3 | 0.3 | 0.6 | 23.2 | 25.1 | 7.2 |
+| `cross_join` | 5967.1 | 712.3 | — | 827.0 | 759.4 | — |
 | `decs_count_bare_pred` | — | — | 0.6 | — | — | — |
-| `distinct_by_count` | 41.3 | 1.1 | 1.1 | 20.6 | 33.8 | 8.1 |
-| `distinct_by_order_take` | 241.8 | 1.7 | 2.6 | 44.7 | 38.8 | 19.4 |
-| `distinct_by_order_to_array` | 241.0 | 1.7 | 2.7 | 45.2 | 38.6 | 19.7 |
-| `distinct_count` | 41.6 | 1.1 | 1.1 | 20.7 | 33.6 | 8.1 |
-| `distinct_count_pred` | 252.6 | 1.1 | 1.3 | 37.7 | 43.5 | 8.0 |
+| `distinct_by_count` | 40.9 | 1.1 | 1.1 | 20.5 | 31.9 | 8.0 |
+| `distinct_by_order_take` | 250.9 | 1.7 | 2.6 | 44.6 | 36.9 | 19.4 |
+| `distinct_by_order_to_array` | 253.4 | 1.7 | 2.7 | 45.1 | 36.8 | 19.6 |
+| `distinct_count` | 41.5 | 1.1 | 1.1 | 20.5 | 31.9 | 8.0 |
+| `distinct_count_pred` | 262.0 | 1.1 | 1.3 | 37.6 | 41.6 | 8.1 |
 | `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `element_at_match` | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 |
 | `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `groupby_average` | 171.5 | 1.6 | 1.8 | 36.8 | 45.6 | 8.9 |
-| `groupby_count` | 141.1 | 1.3 | 1.5 | 20.7 | 34.0 | 8.5 |
-| `groupby_first` | 254.1 | 1.3 | 2.3 | 20.6 | 34.7 | 10.0 |
-| `groupby_having_count` | 141.5 | 1.3 | 1.5 | 20.7 | 34.1 | 8.5 |
-| `groupby_having_hidden_sum` | 175.3 | 1.5 | 1.9 | 36.6 | 45.2 | 8.6 |
-| `groupby_having_post_where` | 171.9 | 1.4 | 1.9 | 36.6 | 44.3 | 8.5 |
-| `groupby_max` | 174.1 | 1.5 | 1.9 | 36.8 | 45.6 | 8.5 |
-| `groupby_min` | 175.0 | 1.5 | 2.0 | 36.8 | 46.2 | 8.5 |
-| `groupby_multi_reducer` | 190.4 | 1.6 | 1.9 | 36.7 | 46.1 | 9.1 |
-| `groupby_select_order` | 171.6 | 1.4 | 1.6 | 36.7 | 44.3 | 8.5 |
-| `groupby_select_sum` | 199.2 | 2.8 | 3.2 | 33.0 | 39.7 | 23.0 |
-| `groupby_sum` | 172.7 | 1.4 | 1.9 | 36.7 | 44.3 | 8.4 |
-| `groupby_where_count` | 76.4 | 0.9 | 1.3 | 36.5 | 42.0 | 11.3 |
-| `groupby_where_sum` | 87.5 | 0.9 | 1.3 | 36.6 | 42.0 | 11.2 |
-| `join_count` | 38.3 | 10.8 | 12.8 | 40.9 | 70.9 | 25.1 |
-| `join_groupby_count` | 157.8 | 17.3 | 19.5 | 66.5 | 90.0 | 73.5 |
-| `join_groupby_to_array` | 191.5 | 18.4 | 19.8 | 78.1 | 36.2 | 80.8 |
-| `join_probe` | — | — | — | — | — | 16.6 |
-| `join_probe_build` | — | — | — | — | — | 31.5 |
-| `join_select` | 92.8 | 19.6 | 21.7 | 73.1 | 95.2 | 69.5 |
-| `join_where_count` | 48.8 | 19.1 | 20.9 | 62.5 | 76.7 | 31.5 |
-| `last_match` | 0.0 | 0.5 | 1.4 | 19.3 | 26.0 | 12.0 |
-| `long_count_aggregate` | 29.4 | 0.3 | 0.6 | 23.4 | 25.8 | 7.3 |
-| `max_aggregate` | 30.9 | 0.3 | 0.5 | 19.1 | 27.1 | 7.5 |
-| `min_aggregate` | 31.0 | 0.3 | 0.5 | 19.1 | 27.1 | 7.4 |
-| `order_by_multi_key` | 248.9 | 53.1 | 54.3 | 123.5 | 71.6 | 119.2 |
-| `order_distinct_take` | 138.5 | 1.1 | 75.3 | 21.0 | 36.0 | 8.1 |
-| `order_reverse_normalized` | 38.3 | 0.7 | 1.4 | 22.4 | 27.8 | 9.8 |
-| `order_take_desc` | 38.6 | 0.7 | 1.4 | 22.5 | 27.7 | 9.7 |
+| `groupby_average` | 175.0 | 1.5 | 1.8 | 36.5 | 42.8 | 8.8 |
+| `groupby_count` | 143.1 | 1.3 | 1.5 | 20.5 | 32.2 | 8.4 |
+| `groupby_first` | 263.5 | 1.3 | 2.3 | 20.5 | 33.1 | 10.0 |
+| `groupby_having_count` | 143.6 | 1.3 | 1.5 | 20.5 | 32.7 | 8.4 |
+| `groupby_having_hidden_sum` | 181.0 | 1.5 | 1.9 | 36.5 | 42.6 | 8.5 |
+| `groupby_having_post_where` | 175.9 | 1.4 | 1.9 | 36.5 | 41.8 | 8.4 |
+| `groupby_max` | 178.1 | 1.5 | 1.9 | 36.6 | 43.5 | 8.4 |
+| `groupby_min` | 181.7 | 1.5 | 1.9 | 36.7 | 43.5 | 8.4 |
+| `groupby_multi_reducer` | 194.0 | 1.6 | 1.9 | 36.7 | 43.4 | 8.9 |
+| `groupby_select_order` | 175.6 | 1.4 | 1.6 | 36.4 | 41.9 | 8.3 |
+| `groupby_select_sum` | 204.5 | 2.8 | 3.2 | 32.8 | 37.6 | 22.6 |
+| `groupby_sum` | 176.0 | 1.4 | 1.9 | 36.6 | 41.8 | 8.3 |
+| `groupby_where_count` | 75.9 | 0.9 | 1.3 | 36.4 | 39.7 | 11.2 |
+| `groupby_where_sum` | 87.5 | 0.9 | 1.3 | 36.5 | 39.6 | 11.1 |
+| `join_count` | 38.4 | 10.9 | 12.6 | 40.6 | 68.0 | 25.1 |
+| `join_groupby_count` | 160.4 | 17.4 | 19.2 | 66.2 | 85.9 | 73.1 |
+| `join_groupby_to_array` | 195.5 | 18.4 | 19.6 | 78.0 | 35.9 | 80.4 |
+| `join_probe` | — | — | — | — | — | 16.5 |
+| `join_probe_build` | — | — | — | — | — | 31.6 |
+| `join_select` | 92.8 | 19.6 | 21.6 | 72.9 | 90.9 | 70.0 |
+| `join_where_count` | 39.3 | 19.1 | 20.8 | 62.5 | 78.7 | 31.5 |
+| `last_match` | 0.0 | 0.5 | 1.4 | 19.0 | 25.6 | 11.9 |
+| `long_count_aggregate` | 29.4 | 0.3 | 0.6 | 23.2 | 25.1 | 7.2 |
+| `max_aggregate` | 30.9 | 0.3 | 0.5 | 18.9 | 26.3 | 7.4 |
+| `min_aggregate` | 30.9 | 0.3 | 0.5 | 18.9 | 26.3 | 7.3 |
+| `order_by_multi_key` | 246.7 | 53.3 | 54.3 | 123.6 | 70.3 | 118.8 |
+| `order_distinct_take` | 140.9 | 1.1 | 75.4 | 20.8 | 33.9 | 8.0 |
+| `order_reverse_normalized` | 38.6 | 0.7 | 1.4 | 22.4 | 27.0 | 9.8 |
+| `order_take_desc` | 38.5 | 0.7 | 1.4 | 22.4 | 27.1 | 9.6 |
 | `point_lookup` | — | — | — | — | — | 0.0 |
 | `point_lookup_residual` | — | — | — | — | — | 0.0 |
 | `point_lookup_scan` | — | — | — | — | — | 3.1 |
-| `reverse_distinct_by` | 297.8 | 1.6 | 3.2 | 20.7 | 34.6 | 11.0 |
-| `reverse_take` | 0.0 | 0.0 | 0.0 | 0.0 | 3.7 | 19.5 |
-| `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.8 | 19.0 |
-| `select_count` | 0.1 | 0.0 | 0.0 | 62.0 | 0.0 | 0.0 |
-| `select_many` | — | 62.3 | — | — | — | — |
-| `select_where` | 107.2 | 4.1 | 5.3 | 74.5 | 22.2 | 17.6 |
-| `select_where_count` | 33.0 | 0.3 | 0.6 | 18.8 | 26.6 | 7.3 |
-| `select_where_order_take` | 37.0 | 0.7 | 1.4 | 19.4 | 27.2 | 13.0 |
-| `select_where_sum` | 37.1 | 0.4 | 0.6 | 18.5 | 25.5 | 7.5 |
-| `single_match` | 0.0 | 0.4 | 1.1 | 46.8 | 22.3 | 9.0 |
+| `reverse_distinct_by` | 309.2 | 1.6 | 3.2 | 20.5 | 32.8 | 11.0 |
+| `reverse_take` | 0.0 | 0.0 | 0.0 | 0.0 | 3.7 | 19.4 |
+| `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.8 | 18.9 |
+| `select_count` | 0.1 | 0.0 | 0.0 | 62.5 | 0.0 | 0.0 |
+| `select_many` | — | 61.4 | — | — | — | — |
+| `select_where` | 110.2 | 4.1 | 5.3 | 74.5 | 22.1 | 17.6 |
+| `select_where_count` | 32.9 | 0.3 | 0.6 | 18.6 | 26.0 | 7.3 |
+| `select_where_order_take` | 36.9 | 0.7 | 1.4 | 19.2 | 26.6 | 12.9 |
+| `select_where_sum` | 38.7 | 0.4 | 0.6 | 18.3 | 25.0 | 7.3 |
+| `single_match` | 0.0 | 0.4 | 1.2 | 43.4 | 22.2 | 8.9 |
 | `skip_take` | 0.3 | 0.0 | 0.0 | 1.2 | 0.2 | 0.1 |
-| `skip_while_match` | 3.4 | 0.4 | 0.4 | 46.1 | 21.9 | 7.7 |
-| `sort_first` | 38.5 | 0.4 | 1.3 | 18.5 | 26.8 | 9.0 |
-| `sort_take` | 38.7 | 0.7 | 1.4 | 22.5 | 27.9 | 9.4 |
-| `sort_take_select` | 38.6 | 0.7 | 1.4 | 22.4 | 27.9 | 9.3 |
-| `sum_aggregate` | 30.1 | 0.3 | 0.1 | 28.4 | 24.6 | 7.3 |
-| `sum_where` | 32.7 | 0.3 | 0.6 | 18.9 | 26.4 | 7.3 |
-| `take_count` | 1.8 | 0.1 | 0.1 | 1.2 | 0.2 | 0.1 |
+| `skip_while_match` | 3.4 | 0.4 | 0.4 | 43.6 | 21.9 | 7.6 |
+| `sort_first` | 38.3 | 0.4 | 1.3 | 18.3 | 26.2 | 8.9 |
+| `sort_take` | 38.8 | 0.7 | 1.4 | 22.3 | 27.1 | 9.3 |
+| `sort_take_select` | 38.4 | 0.7 | 1.4 | 22.3 | 26.9 | 9.2 |
+| `sum_aggregate` | 30.1 | 0.3 | 0.1 | 28.3 | 24.3 | 7.2 |
+| `sum_where` | 33.2 | 0.3 | 0.6 | 18.6 | 25.8 | 7.2 |
+| `take_count` | 1.9 | 0.1 | 0.1 | 1.2 | 0.2 | 0.1 |
 | `take_count_filtered` | 1.1 | 0.0 | 0.0 | 0.5 | 0.1 | 0.0 |
 | `take_sum_aggregate` | 0.8 | 0.0 | 0.0 | 0.3 | 0.0 | 0.0 |
 | `take_where_count` | 0.9 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 |
-| `take_while_match` | 8.2 | 0.2 | 0.3 | 17.6 | 8.9 | 7.3 |
-| `to_array_filter` | 48.9 | 3.3 | 3.4 | 21.8 | 35.3 | 13.0 |
-| `to_table` | — | 14.0 | 37.2 | 49.0 | 54.1 | 20.9 |
-| `to_table_staged` | — | 25.8 | 26.2 | 52.4 | 64.1 | 33.4 |
-| `where_join_count` | 41.6 | 5.9 | 6.8 | 47.2 | 42.6 | 19.8 |
-| `zip_count_pred` | 39.5 | 0.1 | — | 113.3 | 33.8 | — |
-| `zip_dot_product` | 49.3 | 0.1 | 0.1 | 113.0 | 33.8 | — |
-| `zip_dot_product_3arg` | 49.1 | 0.1 | — | 113.1 | 33.9 | — |
-| `zip_reverse_to_array` | — | 4.6 | — | 126.3 | 51.9 | — |
+| `take_while_match` | 8.1 | 0.2 | 0.3 | 15.3 | 9.1 | 7.3 |
+| `to_array_filter` | 48.4 | 3.2 | 3.3 | 21.6 | 33.4 | 12.9 |
+| `to_table` | — | 14.0 | 37.2 | 48.7 | 52.2 | 20.8 |
+| `to_table_staged` | — | 25.6 | 26.4 | 52.2 | 61.4 | 33.1 |
+| `where_join_count` | 41.7 | 5.8 | 6.7 | 47.0 | 40.5 | 19.7 |
+| `zip_count_pred` | 39.6 | 0.1 | — | 113.0 | 33.5 | — |
+| `zip_dot_product` | 46.9 | 0.1 | 0.1 | 112.5 | 33.3 | — |
+| `zip_dot_product_3arg` | 47.2 | 0.1 | — | 112.5 | 33.3 | — |
+| `zip_reverse_to_array` | — | 4.6 | — | 123.6 | 50.2 | — |
 <!-- BENCH:TABLES END -->
 
 ## Missing lanes (the `—` cells)
diff --git a/daslib/linq.das b/daslib/linq.das
index 2906aa443..368cdd2ce 100644
--- a/daslib/linq.das
+++ b/daslib/linq.das
@@ -1515,6 +1515,72 @@ def max_by(src : array<auto(TT)>; key) : TT -& -const {
     }
 }
 
+[unused_argument(tt)]
+def private max_sel_impl(var src; tt : auto(TT); selector) : typedecl(selector(type<TT>)) - const - & {
+    var maxx : typedecl(selector(type<TT>)) - const - &
+    var first : bool = true
+    for (x in src) {
+        let k = selector(x)
+        if (first) {
+            maxx := k
+            first = false
+        } elif (_::less(maxx, k)) {
+            maxx := k
+        }
+    }
+    return <- maxx
+}
+
+[unused_argument(tt)]
+def private max_sel_impl_const(src : auto(ARGT); tt : auto(TT); selector) : typedecl(selector(type<TT>)) - const - & {
+    return <- max_sel_impl(unsafe(reinterpret<ARGT -const>(src)), type<TT -const -&>, selector)
+}
+
+def max(var src : iterator<auto(TT)>; selector) : typedecl(selector(type<TT>)) - const - & {
+    //! Maximum of ``selector(x)`` over iterator elements (C# ``Max(selector)``).
+    //! Returns the projected value; see ``max_by`` for the element-returning form.
+    return <- max_sel_impl(src, type<TT -const -&>, selector)
+}
+
+def max(src : array<auto(TT)>; selector) : typedecl(selector(type<TT>)) - const - & {
+    //! Maximum of ``selector(x)`` over array elements (C# ``Max(selector)``).
+    //! Returns the projected value; see ``max_by`` for the element-returning form.
+    return <- max_sel_impl_const(src, type<TT -const -&>, selector)
+}
+
+[unused_argument(tt)]
+def private min_sel_impl(var src; tt : auto(TT); selector) : typedecl(selector(type<TT>)) - const - & {
+    var minn : typedecl(selector(type<TT>)) - const - &
+    var first : bool = true
+    for (x in src) {
+        let k = selector(x)
+        if (first) {
+            minn := k
+            first = false
+        } elif (_::less(k, minn)) {
+            minn := k
+        }
+    }
+    return <- minn
+}
+
+[unused_argument(tt)]
+def private min_sel_impl_const(src : auto(ARGT); tt : auto(TT); selector) : typedecl(selector(type<TT>)) - const - & {
+    return <- min_sel_impl(unsafe(reinterpret<ARGT -const>(src)), type<TT -const -&>, selector)
+}
+
+def min(var src : iterator<auto(TT)>; selector) : typedecl(selector(type<TT>)) - const - & {
+    //! Minimum of ``selector(x)`` over iterator elements (C# ``Min(selector)``).
+    //! Returns the projected value; see ``min_by`` for the element-returning form.
+    return <- min_sel_impl(src, type<TT -const -&>, selector)
+}
+
+def min(src : array<auto(TT)>; selector) : typedecl(selector(type<TT>)) - const - & {
+    //! Minimum of ``selector(x)`` over array elements (C# ``Min(selector)``).
+    //! Returns the projected value; see ``min_by`` for the element-returning form.
+    return <- min_sel_impl_const(src, type<TT -const -&>, selector)
+}
+
 [unused_argument(tt)]
 def private min_max_impl(var src; tt : auto(TT)) : tuple<TT -& -const, TT -& -const> {
     var minn : TT -& -const
@@ -1670,6 +1736,30 @@ def sum(src : array<auto(TT)>) : TT -const -& {
     }
 }
 
+[unused_argument(tt)]
+def private sum_sel_impl(var src; tt : auto(TT); selector) : typedecl(selector(type<TT>)) - const - & {
+    var total : typedecl(selector(type<TT>)) - const - &
+    for (x in src) {
+        total += selector(x)
+    }
+    return <- total
+}
+
+[unused_argument(tt)]
+def private sum_sel_impl_const(src : auto(ARGT); tt : auto(TT); selector) : typedecl(selector(type<TT>)) - const - & {
+    return <- sum_sel_impl(unsafe(reinterpret<ARGT -const>(src)), type<TT -const -&>, selector)
+}
+
+def sum(var src : iterator<auto(TT)>; selector) : typedecl(selector(type<TT>)) - const - & {
+    //! Sum of ``selector(x)`` over iterator elements (C# ``Sum(selector)``).
+    return <- sum_sel_impl(src, type<TT -const -&>, selector)
+}
+
+def sum(src : array<auto(TT)>; selector) : typedecl(selector(type<TT>)) - const - & {
+    //! Sum of ``selector(x)`` over array elements (C# ``Sum(selector)``).
+    return <- sum_sel_impl_const(src, type<TT -const -&>, selector)
+}
+
 def average(var src : iterator<auto(TT)>) : double {
     //! Averages elements in an iterator. Always returns ``double``
     //! (matches SQL ``AVG`` / C# ``Average``); element type must cast to ``double``.
@@ -1702,6 +1792,40 @@ def average(src : array<auto(TT)>) : double {
     return count != 0ul ? total / double(count) : 0lf
 }
 
+def average(var src : iterator<auto(TT)>; selector) : double {
+    //! Average of ``selector(x)`` over iterator elements (C# ``Average(selector)``).
+    //! Always returns ``double``; the projected type must cast to ``double``.
+    var total : double = 0lf
+    var count : uint64 = 0ul
+    for (x in src) {
+        let k = selector(x)
+        static_if (typeinfo stripped_typename(k) == typeinfo stripped_typename(default<double>)) {
+            total += k
+        } else {
+            total += double(k)
+        }
+        count ++
+    }
+    return count != 0ul ? total / double(count) : 0lf
+}
+
+def average(src : array<auto(TT)>; selector) : double {
+    //! Average of ``selector(x)`` over array elements (C# ``Average(selector)``).
+    //! Always returns ``double``; the projected type must cast to ``double``.
+    var total : double = 0lf
+    var count : uint64 = 0ul
+    for (x in src) {
+        let k = selector(x)
+        static_if (typeinfo stripped_typename(k) == typeinfo stripped_typename(default<double>)) {
+            total += k
+        } else {
+            total += double(k)
+        }
+        count ++
+    }
+    return count != 0ul ? total / double(count) : 0lf
+}
+
 [unused_argument(tt)]
 def private min_max_average_impl(var src; tt : auto(TT)) : tuple<TT -const -&, TT -const -&, TT -const -&> {
     var minn : TT -const -&
diff --git a/daslib/linq_boost.das b/daslib/linq_boost.das
index f37ba70e9..099745ae0 100644
--- a/daslib/linq_boost.das
+++ b/daslib/linq_boost.das
@@ -24,6 +24,69 @@ def private clone_iterator_type(it : TypeDeclPtr) : TypeDeclPtr {
     return <- res
 }
 
+// Bucket-surface lambda stamping. When the chain element is the group_by_lazy shape
+// `tuple<K, array<E>>`, an untyped lambda passed to a direct `<bind>._1 |> <op>(<lambda>)`
+// call can't infer against the fully generic tier-2 params (error 30303) — but the chain
+// type knows E, so the macro stamps it, same as the outer-param injection in visit().
+class private BucketLambdaStamper : AstVisitor {
+    bindName : string
+    elemT : TypeDeclPtr
+    def BucketLambdaStamper(n : string; var et : TypeDeclPtr) {
+        bindName = n
+        elemT = et
+    }
+    def override visitExprCall(var expr : ExprCall?) : ExpressionPtr {
+        let nm = string(expr.name)
+        if ((nm != "select" && nm != "select_to_array" && nm != "sum" && nm != "min"
+                && nm != "max" && nm != "average" && nm != "min_by" && nm != "max_by")
+                || length(expr.arguments) != 2) {
+            return expr
+        }
+        var recv = expr.arguments[0]
+        if (recv != null && recv is ExprRef2Value) {
+            recv = (recv as ExprRef2Value).subexpr
+        }
+        if (recv == null || !(recv is ExprField)) return expr
+        var fld = recv as ExprField
+        if (fld.name != "_1") return expr
+        var base = fld.value
+        if (base != null && base is ExprRef2Value) {
+            base = (base as ExprRef2Value).subexpr
+        }
+        if (base == null || !(base is ExprVar) || (base as ExprVar).name != bindName) return expr
+        var lam = expr.arguments[1]
+        if (lam == null || !(lam is ExprMakeBlock)) return expr
+        var blk = (lam as ExprMakeBlock)._block
+        if (blk == null || !(blk is ExprBlock)) return expr
+        var eb = blk as ExprBlock
+        if (length(eb.arguments) != 1) return expr
+        if (eb.arguments[0]._type == null || eb.arguments[0]._type.isAutoOrAlias) {
+            var et = clone_type(elemT)
+            et.flags.ref = false
+            et.flags.constant = true
+            eb.arguments[0]._type = et
+        }
+        return expr
+    }
+}
+
+[macro_function]
+def private stamp_bucket_reducer_lambdas(var projExpr : ExpressionPtr; bindName : string; iterType : TypeDeclPtr) {
+    if (projExpr == null || iterType == null || iterType.baseType != Type.tTuple
+            || length(iterType.argTypes) != 2) {
+        return
+    }
+    var bucketT = iterType.argTypes[1]
+    if (bucketT == null || bucketT.baseType != Type.tArray || bucketT.firstType == null) return
+    var sc = new BucketLambdaStamper(bindName, clone_type(bucketT.firstType))
+    make_visitor(*sc) $(astVisitorAdapter) {
+        visit_expression(projExpr, astVisitorAdapter)
+    }
+    unsafe {
+        delete sc
+    }
+}
+
 class private AstCallMacro_LinqPred2 : AstCallMacro {
     //! Base call macro for LINQ-style two-argument predicate operators.
     predName = "where_"
@@ -41,6 +104,16 @@ class private AstCallMacro_LinqPred2 : AstCallMacro {
         macro_verify(!arg0type.firstType.isAutoOrAlias, prog, call.at, "iterable type cannot be auto or alias")
         // replacing function
         var iterType = clone_iterator_type(arg0type)
+        // bucket-surface inner lambdas (`<bind>._1 |> select/max/…(<untyped lambda>)`) get the
+        // bucket element type stamped before the rewrite — see BucketLambdaStamper above
+        var bucketBind = "_"
+        if (call.arguments[1] is ExprMakeBlock) {
+            var bblk = (call.arguments[1] as ExprMakeBlock)._block
+            if (bblk != null && bblk is ExprBlock && length((bblk as ExprBlock).arguments) == 1) {
+                bucketBind = string((bblk as ExprBlock).arguments[0].name)
+            }
+        }
+        stamp_bucket_reducer_lambdas(call.arguments[1], bucketBind, iterType)
         var res : ExpressionPtr
         if (call.arguments[1] is ExprMakeBlock) {
             // named-variable form: the argument is already a `$(x) => …` block (e.g. emitted by linq_das).
diff --git a/daslib/linq_fold.md b/daslib/linq_fold.md
index 741099e13..937f373a1 100644
--- a/daslib/linq_fold.md
+++ b/daslib/linq_fold.md
@@ -67,7 +67,7 @@ The adapter is an abstract `class SourceAdapter` (`[macro_interface]`, so every
 - `wrap_source_loop(loopShape : LoopDispatch; var body; at) : Expression?` — emit the per-element iteration (array `for`, decs `for_each_archetype{,_find}`, zip lockstep, joins hash+probe). `loopShape` is the loop-framing knob consumed only by nested-callback sources (decs); direct-return sources frame an unconditional loop and ignore it.
 - `wrap_invoke(var stmts; retType; wrapIter; at) : Expression?` — outer invoke binding sources as params.
 
-Emit fns hold a `SourceAdapter?` (via `EmitCtx.src` or an `adapter` local) and call these virtually. **daslang classes have no `is`/`as` downcast** (variant-only), so source-specific data is never pulled off a base pointer by downcasting — it goes through virtual methods. Beyond the 4 dispatch methods the base also declares 6 default-null **per-operation hook methods** (`emit_loop_or_count` / `emit_reverse_skip_into_tail` / `emit_reverse_last_backward` / `emit_distinct_take_loop` / `build_group_by_adapter` / `emit_join_hook`) that the owning source overrides; the generic lane falls back to its inline (array) body when the hook returns null. (`XmlAdapter` overrides the two reverse hooks with a **backward DOM walk** — `last_child`/`previous_sibling`, both O(1) in pugixml: `emit_reverse_skip_into_tail` collects only the last N children for `reverse |> take(N)` (m5f `reverse_take` 88.9 → 0.0 ns/op), and `emit_reverse_last_backward` returns the last element in one step for a no-predicate `last()` / `reverse |> first`. Predicated `[where] |> last` stays on the forward walk — reverse DOM traversal is ~2× cache-hostile per node, profiled — and the named 3-arg `from_xml_node` form falls back to the buffer path since pugixml has no last-named-child primitive.) (`emit_join_hook` is the standalone-join dispatch: the single `join_general` pattern's thin `emit_join` routes to it, so each source supplies its own join body — array `for`+2-param invoke, decs `for_each_archetype`, XML field-pruned DOM walk — with no parallel per-source join pattern.) It also declares **capability methods** the source answers about itself — `can_group_by` / `can_join` / `can_reserve_by_length` / `has_own_loop_or_count_lane` (bool, default false) and `name_prefix` (string) — which replaced the old `kind() : AdapterKind` enum + per-site switches, so a new source only implements the methods (no central enum to extend). 
The `can_group_by` / `can_join` capabilities are queried from the `can_group_by_source` / `can_join_source` `RequiresPredicate`s (which thread the adapter), so the single `group_by` / `join_general` pattern admits any capable source and the adapter's `build_group_by_adapter` / `emit_join_hook` does the source/srcb-shape gating (null → tier-2). Two transitional getters remain — `arrayTop()`/`arraySrcName()` (default null/"" on base, overridden by `ArrayAdapter`); the decs-specific getters were removed in G2a so the base (and thus `linq_fold_common`) is free of `DecsAdapter`/ECS coupling. One decorator subclass lives in `linq_fold_common`: `ProjectedSourceAdapter` wraps any inner adapter to absorb a leading `_select(f)` source projection (the `srcsel` slot) — it binds `projName = f(rawElem)` atop the per-element body and delegates `wrap_source_loop`/`wrap_invoke`/`name_prefix` to the inner adapter, leaving the base no-op `arrayTop`/`arraySrcName`/`can_reserve_by_length` so source-direct fast paths (which would bypass the projection) stay disabled. This lets order/distinct splices fuse over `source |> _select(f) |> …` for any source.
+Emit fns hold a `SourceAdapter?` (via `EmitCtx.src` or an `adapter` local) and call these virtually. **daslang classes have no `is`/`as` downcast** (variant-only), so source-specific data is never pulled off a base pointer by downcasting — it goes through virtual methods. Beyond the 4 dispatch methods the base also declares 6 default-null **per-operation hook methods** (`emit_loop_or_count` / `emit_reverse_skip_into_tail` / `emit_reverse_last_backward` / `emit_distinct_take_loop` / `build_group_by_adapter` / `emit_join_hook`) that the owning source overrides; the generic lane falls back to its inline (array) body when the hook returns null. (`XmlAdapter` overrides the two reverse hooks with a **backward DOM walk** — `last_child`/`previous_sibling`, both O(1) in pugixml: `emit_reverse_skip_into_tail` collects only the last N children for `reverse |> take(N)` (m5f `reverse_take` 88.9 → 0.0 ns/op), and `emit_reverse_last_backward` returns the last element in one step for a no-predicate `last()` / `reverse |> first`. Predicated `[where] |> last` stays on the forward walk — reverse DOM traversal is ~2× cache-hostile per node, profiled — and the named 3-arg `from_xml_node` form falls back to the buffer path since pugixml has no last-named-child primitive.) (`emit_join_hook` is the standalone-join dispatch: the single `join_general` pattern's thin `emit_join` routes to it, so each source supplies its own join body — array `for`+2-param invoke, decs `for_each_archetype`, XML field-pruned DOM walk — with no parallel per-source join pattern.) It also declares **capability methods** the source answers about itself — `can_group_by` / `can_join` / `can_reserve_by_length` / `has_own_loop_or_count_lane` (bool, default false) and `name_prefix` (string) — which replaced the old `kind() : AdapterKind` enum + per-site switches, so a new source only implements the methods (no central enum to extend). 
The `can_group_by` / `can_join` capabilities are queried from the `can_group_by_source` / `can_join_source` `RequiresPredicate`s (which thread the adapter), so the single `group_by` / `join_general` pattern admits any capable source and the adapter's `build_group_by_adapter` / `emit_join_hook` does the source/srcb-shape gating (null → tier-2). The **generic-lane source feed** is the getter pair `loop_source_expr()`/`loop_source_name()` (default null/"" on base; array/table override both, xml/json name-only): the shared array-shaped lanes (counter / early-exit / dedup / order family / hashed join) emit their loops and `length()` reads against the name, and read the expr for compile-time facts (reserve hints, srcB element validation), so overriding the pair is what lights those lanes up for a source. The decs-specific getters were removed in G2a so the base (and thus `linq_fold_common`) is free of `DecsAdapter`/ECS coupling. One decorator subclass lives in `linq_fold_common`: `ProjectedSourceAdapter` wraps any inner adapter to absorb a leading `_select(f)` source projection (the `srcsel` slot) — it binds `projName = f(rawElem)` atop the per-element body and delegates `wrap_source_loop`/`wrap_invoke`/`name_prefix` to the inner adapter, leaving the base no-op `loop_source_expr`/`loop_source_name`/`can_reserve_by_length` so source-direct fast paths (which would bypass the projection) stay disabled. This lets order/distinct splices fuse over `source |> _select(f) |> …` for any source.
 
 **Realized module layout (post-G3d):** `linq_fold_common` (kernel + abstract base + adapter-pure generic lanes — terminator/fold-array plus the source-generic loop_or_count / counter / accumulator / early-exit lanes, with `LoopDispatch` + the per-op `!supports_direct_return` state path that lets nested-callback sources ride the early-exit lane — + `splice_patterns` + `DecsBridgeShape`/`extract_decs_bridge`) ← `linq_fold_array` (Array/Zip/ArrayJoin adapters + the zip/join emit `emit_zip`/`emit_array_join` + array row-builders) and `linq_fold_decs` (Decs/DecsJoin adapters + decs-bridge visitors + the decs dispatcher `emit_loop_or_count_lane_decs` + the decs-specific hooks `emit_decs_count_archsize`/`emit_decs_reverse_skip_into_tail`/`emit_decs_join_impl`/`emit_decs_min_max_by` — the parallel terminator scaffold is gone, decs rides the generic lanes via `DecsAdapter`); the engine `linq_fold` requires all three and holds only the dispatcher + the `LinqFold` macro + the single `register_all_linq_fold_rows`. Adding a source = a new `linq_fold_<src>.das` subclass module + one `require` + one `build_<src>_rows()` call in the engine registrar. Later sources follow that recipe: `linq_fold_json` (`JsonAdapter`/`JsonJoinAdapter`), `pugixml/linq_fold_xml` (`XmlAdapter`, optional), `sqlite/linq_fold_sql` (pass-through detector), and `linq_fold_table` (`TableAdapter` over `each_kv`/`keys`/`values` heads — kv usage-pruned slot walks, no new rows; arc plan in `benchmarks/sql/LINQ_TO_TABLE.md`).
 
@@ -660,7 +660,7 @@ The imperative code has a few subtle co-occurrence rules that may not map cleanl
 - **2026-05-31 (deferred materialization — handle-buffering for buffered reducers)** — the buffered reducers (`order_by`/`sort`/`reverse` + `take`/`first`, `distinct_by |> order_by`) materialized the full `Car` (its `name` clone) for *every* source element before the reducer kept only K — `from_xml_node` builds all N. Fix: the reducer buffers a cheap **surrogate** — `(orderKey, xml_node)` for the order emits, a bare `xml_node` for reverse (no key) — and `build_xml_row` runs only for the K survivors. The comparator is the fixed `_::less(a._0, b._0)` on the precomputed key; where/distinct are consumed during the walk (cheap field reads gate which elements get a surrogate), so they never enter the surrogate. **The abstraction is source-generic** (an "element handle"): the surrogate machinery + materialize-survivors tail live in `linq_fold_common` (`build_surrogate_type` / `build_surrogate_cmp` / `build_surrogate_materialize_loop`); each source supplies `defers_materialization()` + `handle_type()` + `current_handle_expr()` + `materialize_handle()`. Only `XmlAdapter` overrides them this PR (a future `linq_json` is just those 4 hooks); `array`/`decs` inherit the no-defer default and stay **byte-identical** (their backing store is pre-materialized, so their reducers already clone only ~K heap-entrants — confirmed by `benchmarks/micro/sort_distinct_take_shapes.das`, where the `array<Car?>` pointer form is slower or tied). Wired into `emit_bounded_heap` (take), `emit_fused_prefilter` (distinct-only no-take arm — the pure-where case is already materialize-under-guard'd), `emit_streaming_min` (first), and `emit_reverse_buffer_inplace` (reverse + take). **Design validated by hand-coded micro-bench first** (`benchmarks/micro/sort_distinct_take_shapes.das`). 
Wins (m5f INTERP / JIT): `sort_take` 338 → 69 / 17, `order_take_desc` 343 → 69, `distinct_by_order_take` 354 → 126 / 46, `select_where_order_take` 228 → 71, `distinct_by_order_to_array` 356 → 131 / 46, `sort_first` 336 → 64 / 17, `reverse_take` 360 → 90 / 70 — string clones 100 000 → K everywhere. Not deferred (inherent floor / out of scope): `bare_order_where` (already at the under-guard survivor floor), `order_reverse_normalized` (order+reverse → all rows out), `reverse_distinct_by` (tier-2, no fused emit), `groupby_first` (group_by path).
 - **2026-05-31 (deferred materialization — `last` + group-by `first`)** — extends the element-handle deferral to the two remaining survivors-≪-N reducers: the full-walk `last`/`last_or_default` terminator (in `emit_early_exit_lane`) and `first`-per-group inside `plan_group_by_core`. `last` cloned the whole `Car` (`lst := it`) on *every* match and kept only the final one; over a deferring source it now stores the node **handle** per match and runs `materialize_handle` once, for the single survivor. `group_by(brand) |> select((key, first per group))` pinned the whole row (`slot := it`) in `mk_reducer_first`, forcing `wrap_source_loop` to build every element; a new `mk_reducer_first_deferred` materializes from the handle *inside the table miss-branch*, so the walk field-prunes to just the group key and `build_xml_row` runs only once per distinct group. Both ride the same four `SourceAdapter` hooks — only `XmlAdapter` defers; `array`/`decs` pass `null`/no-defer and stay byte-identical (the `emit_reducer_branches` adapter param defaults to `null`; the group-by gate also requires the bind be the raw element — `itName == bind_name`, i.e. no upstream `_select` rebinds it — since the handle yields the raw row). **Design validated by hand-coded micro-bench first** (the `last_match` / `groupby_first` lanes in `benchmarks/micro/sort_distinct_take_shapes.das`). Wins (m5f INTERP / JIT, string clones 100 000 → K): `last_match` 219 → 65 / 21 (K=1), `groupby_first` 339 → 72 / 22 (K=#brands). Closes `groupby_first` (the last item on the prior entry's floor list). Still not deferred: `bare_order_where` / `order_reverse_normalized` (all rows out), `reverse_distinct_by` (tier-2, no fused emit).
 - **2026-05-31 (forward keep-last — `reverse |> distinct[_by]` over forward sources)** — the only buffered shape still falling to tier-2 over a forward source. `reverse() |> distinct_by(K) |> to_array()` means "keep the LAST forward row per key, output in reverse-discovery order." The sole fused emit was `emit_reverse_backward_walk_dset_gate` — a backward **index** walk (`src[len-1-k]`) gated `array_source`, so XML / decs / plain iterators (forward-only, no random access) cascaded: `reverse()` materialized all N, then `distinct_by` walked. New `emit_reverse_distinct_forward_keeplast` (R-2b, gated by the exact complement `non_array_source`) does a single forward pass instead — `table<key; (seq, val)>`, **OVERWRITE** the slot per element (so it ends at the last forward occurrence + its seq), then sort survivors by **descending seq** (`build_surrogate_cmp(true)`) and emit. Output-identical to the backward walk (descending forward-index of each last occurrence), proven by parity vs both `m3f` (array backward walk) and the tier-2 cascade. It rides `emit_terminator_lane` → `wrap_source_loop`, so it's source-generic: **XML defers** (the table holds `(seq, xml_node)` and `build_xml_row` runs only for the K survivors — field-pruned to the key); **decs / iterator** store the full element (no handle), winning single-pass over the cascade's reverse-buffer + second walk. `ctx.top` is `null` for decs (bridge-driven), so `elemType` falls back to `ctx.src->element_type()`; arrays still match the backward-walk row first (registered earlier), so they're byte-identical. **Design validated by hand-coded micro-bench first** (the `reverse_distinct_by` lane in `benchmarks/micro/sort_distinct_take_shapes.das`: INTERP 405.8 → 88.6, JIT 162.6 → 37.0, string clones 100 000 → #keys). Wins: `reverse_distinct_by` m5f **429 → 74 INTERP / 166.6 → 22 JIT** (clones 100 000 → 5), and the previously-`—` decs **m4 lights up at 27.7 / 5.0** (near the array fast path). 
Closes `reverse_distinct_by` — the last forward-source buffered floor.
-- **2026-06-11 (table joins — adapter-generalized `emit_array_join` + table-srcB probe)** — table-arc stage 5 (branch `bbatkin/linq-table-each-kv`; plan: `benchmarks/sql/LINQ_TO_TABLE.md`). Two halves. (1) **Lead generalization**: `emit_array_join` no longer hand-rolls its `for (tup_a in srcA)` — the lead loop, bind name, and lead invoke-param spelling come from the adapter (`wrap_source_loop(LoopDispatch(Each=null))` / `bind_name(at)` / new `SourceAdapter.invoke_param_type()` capability, default `invoke_src_param_type(arrayTop())`), so `TableAdapter` just sets `can_join() = true` and routes `emit_join_hook` to the same emitter: a table-lead join walks the kv usage-pruned slot iterator(s) — a join body touching only `c.value.*` walks `values(tab)` alone — and group joins stay outer over every slot. decs/xml/json hooks untouched (nested-callback walks). (2) **Table-srcB probe**: when the join's srcb is `each_kv(tab)` / `keys(set)` joined on its **bare key** (`join_srcb_table_call` + `join_keyb_is_bare_key` on the peeled keyb), the emitter skips the internal `table<KEY; array<TUPB>>` + build loop entirely — srcB binds the user's table (const param) and the per-A probe is a key lookup, usage-pruned like the point-lookup fold (count-no-where / key-only → `key_exists`, value shapes → by-ref bind off `unsafe(tab?[k])`, whole-pair → kv-tuple bind). Unique table keys ⇒ probe ≡ hash semantics exactly; a bare field read is pure by construction so skipping keyb's per-B evaluation is unobservable; non-bare keybs and `group_join` (result consumes the whole bucket) keep the hashed build. Plumbing: per-pair statements factored into `build_join_pair_core` (`JoinPairCore`), shared by `build_join_standalone_pieces` (keeps the group-join arm + `get`-bucket wrap — hash-mode AST unchanged for the decs/xml/json callers) and the new `build_join_probe_pieces`. m7: `join_count` / `join_where_count` (table lead) leave tier-2; new `join_probe` vs `join_probe_build` A/B lanes.
+- **2026-06-11 (table joins — adapter-generalized `emit_array_join` + table-srcB probe)** — table-arc stage 5 (branch `bbatkin/linq-table-each-kv`; plan: `benchmarks/sql/LINQ_TO_TABLE.md`). Two halves. (1) **Lead generalization**: `emit_array_join` no longer hand-rolls its `for (tup_a in srcA)` — the lead loop, bind name, and lead invoke-param spelling come from the adapter (`wrap_source_loop(LoopDispatch(Each=null))` / `bind_name(at)` / new `SourceAdapter.invoke_param_type()` capability, default `invoke_src_param_type(loop_source_expr())` (named `arrayTop()` at the time)), so `TableAdapter` just sets `can_join() = true` and routes `emit_join_hook` to the same emitter: a table-lead join walks the kv usage-pruned slot iterator(s) — a join body touching only `c.value.*` walks `values(tab)` alone — and group joins stay outer over every slot. decs/xml/json hooks untouched (nested-callback walks). (2) **Table-srcB probe**: when the join's srcb is `each_kv(tab)` / `keys(set)` joined on its **bare key** (`join_srcb_table_call` + `join_keyb_is_bare_key` on the peeled keyb), the emitter skips the internal `table<KEY; array<TUPB>>` + build loop entirely — srcB binds the user's table (const param) and the per-A probe is a key lookup, usage-pruned like the point-lookup fold (count-no-where / key-only → `key_exists`, value shapes → by-ref bind off `unsafe(tab?[k])`, whole-pair → kv-tuple bind). Unique table keys ⇒ probe ≡ hash semantics exactly; a bare field read is pure by construction so skipping keyb's per-B evaluation is unobservable; non-bare keybs and `group_join` (result consumes the whole bucket) keep the hashed build. Plumbing: per-pair statements factored into `build_join_pair_core` (`JoinPairCore`), shared by `build_join_standalone_pieces` (keeps the group-join arm + `get`-bucket wrap — hash-mode AST unchanged for the decs/xml/json callers) and the new `build_join_probe_pieces`. 
m7: `join_count` / `join_where_count` (table lead) leave tier-2; new `join_probe` vs `join_probe_build` A/B lanes.
 - **2026-06-11 (`to_table` sink — fused insert-loop terminator)** — table-arc stage 6 (branch `bbatkin/linq-table-each-kv`; plan: `benchmarks/sql/LINQ_TO_TABLE.md`). Two layers. (1) **Tier-2 surface** (`daslib/linq.das`): selector-free `to_table` over iterators and arrays — `iterator<tuple<K;V> const>` → `table<K;V>` map (insert via `tab[x._0] := x._1`, builtin `to_table` clone semantics), `iterator<K const>` → `table<K>` set (`__builtin_table_set_insert`), plus borrowing `array<tuple<K;V>>` / `array<K>` forms with reserve (builtin only had the consuming `to_table_move` for dynamic arrays). The iterator params are **const-qualified** (`tuple<…> const` / `auto(keyT) const`) — the 50609 mangler-ICE defuse — so the `-const` flavor from `each_kv` chains and the `-&` flavor from `to_sequence` converge on one instantiation. The named kv tuple (`tuple<key:K; value:V>`) matches the positional `tuple<auto;auto>` generic directly. Duplicate keys keep the last occurrence (das insert semantics, not C#'s throw). (2) **Fused emit**: `to_table` joins `loop_terminator_family` + `classify_terminator`'s ARRAY (materializer) lane; the new arm in `emit_loop_or_count_lane` rides `emit_fold_array_lane` via a new `FoldArraySpec.bufDeclStmt` slot (replaces the array buffer decl with `var acc : table<…>`) — where/select/ranges/post-take-where plumbing all shared. Per-element insert by shape: a `(k => v)` `ExprMakeTuple` projection **splits** so key and value each evaluate exactly once (`acc[k] = v` direct, no tuple temp); other projections bind `let kvb := proj` once then index; pass-through spells the kv access with the element tuple's **real field names** (`.key`/`.value`) so the kv usage-pruner maps them (positional `._0` would not bind) — a bare `each_kv(tab).to_table()` is a reserve-ahead table clone through the pruned walk, and `keys(tab)` chains land in the set form via `insert`. 
Reserve fires only on unfiltered walks (`can_reserve_by_length` + no where — a thinned table over-reserves hash buckets, stricter than the array arm), with the take-min variant. Map-vs-set falls out of the terminator call's resolved type (`secondType == void`). Declines that keep tier-2: the 3-arg selector `to_table(key, elementSelector)`, decs sources (explicit guard in `emit_loop_or_count_lane_decs` — its implicit-to_array fall-through would mis-emit an array for a table-typed expr), MakeTuple projections of arity ≠ 2. m7: `to_table` 32.3 vs `to_table_staged` (fused-to_array + builtin `to_table_move`) 71.5 ns/elem INTERP (~2.2×).
 
 ## Open questions
diff --git a/daslib/linq_fold_array.das b/daslib/linq_fold_array.das
index 1b80b82a3..b6830fcae 100644
--- a/daslib/linq_fold_array.das
+++ b/daslib/linq_fold_array.das
@@ -13,7 +13,6 @@ require daslib/ast_boost
 require daslib/ast_match
 require daslib/templates_boost
 require daslib/macro_boost
-require strings
 require daslib/linq_fold_common public
 
 // 2-source lockstep zip payload — feeds `for (itA, itB in srcA, srcB) { body }` and the 2-source outer invoke.
@@ -52,10 +51,10 @@ class ArrayAdapter : SourceAdapter {
     def override emit_join_hook(var c : Captures; var ctx : EmitCtx; at : LineInfo) : Expression? {
         return emit_array_join(c, ctx, at)
     }
-    def override arrayTop() : Expression? {
+    def override loop_source_expr() : Expression? {
         return top
     }
-    def override arraySrcName() : string {
+    def override loop_source_name() : string {
         return srcName
     }
     def override bind_name(at : LineInfo) : string {
@@ -120,44 +119,27 @@ class ArrayAdapter : SourceAdapter {
         if (c.single |> key_exists("upstream_join")) {
             // srcA = chain top; srcB = join's second arg. plan_group_by_core consumes the ArrayJoin adapter via wrap_source_loop.
             var joinCall = c.single["upstream_join"]
-            var srcbSrc = joinCall.arguments[1]
-            if (srcbSrc == null || srcbSrc._type == null || srcbSrc._type.firstType == null
-                    || !(srcbSrc._type.isGoodArrayType || srcbSrc._type.isArray || srcbSrc._type.isIterator)) {
+            var core = extract_upstream_join_core(joinCall)
+            var srcb = extract_upstream_join_array_srcb(joinCall)
+            if (core == null || srcb == null) {
                 return null
             }
-            var keyaLam = joinCall.arguments[2]
-            var keybLam = joinCall.arguments[3]
-            var resultLam = joinCall.arguments[4]
-            if (keyaLam == null || keybLam == null || resultLam == null
-                    || keyaLam._type == null || resultLam._type == null
-                    || resultLam._type.firstType == null) {
-                return null
-            }
-            var keyType = strip_const_ref(clone_type(keyaLam._type.firstType))
-            if (!is_primitive_join_key_type(keyType)) {
-                return null
-            }
-            var resultType = strip_const_ref(clone_type(resultLam._type.firstType))
-            var tupAType = strip_const_ref(clone_type(top._type.firstType))
-            var tupBType = strip_const_ref(clone_type(srcbSrc._type.firstType))
             var topClone = clone_expression(top)
             topClone.genFlags.alwaysSafe = true
-            var srcbClone = clone_expression(srcbSrc)
-            srcbClone.genFlags.alwaysSafe = true
             return new ArrayJoinAdapter(shape = ArrayJoinShape(
                 srcAExpr = topClone,
-                srcBExpr = srcbClone,
+                srcBExpr = srcb.srcbClone,
                 srcAName = qn("ajoin_srcA", at),
                 srcBName = qn("ajoin_srcB", at),
                 srcAType = invoke_src_param_type(top),
-                srcBType = invoke_src_param_type(srcbSrc),
-                tupAType = tupAType,
-                tupBType = tupBType,
-                keyaLam = keyaLam,
-                keybLam = keybLam,
-                resultLam = resultLam,
-                keyType = keyType,
-                resultType = resultType,
+                srcBType = srcb.srcbType,
+                tupAType = strip_const_ref(clone_type(top._type.firstType)),
+                tupBType = srcb.tupBType,
+                keyaLam = core.keyaLam,
+                keybLam = core.keybLam,
+                resultLam = core.resultLam,
+                keyType = core.keyType,
+                resultType = core.resultType,
                 resBindName = qn("ajoin_jres", at)))
         }
         return new ArrayAdapter(top = top, srcName = qn("source", at))
diff --git a/daslib/linq_fold_common.das b/daslib/linq_fold_common.das
index dbe37f40a..d6a3b1eb5 100644
--- a/daslib/linq_fold_common.das
+++ b/daslib/linq_fold_common.das
@@ -11,6 +11,7 @@ module linq_fold_common shared public
 require daslib/linq public
 require daslib/ast_boost
 require daslib/ast_match
+require daslib/match
 require daslib/templates_boost
 require daslib/macro_boost
 require strings
@@ -119,15 +120,17 @@ class SourceAdapter {
     def count_shortcut(opName : string; at : LineInfo) : Expression? {
         return null
     }
-    // transitional source-data getters (removed once all consumers move into subclass methods)
-    def arrayTop() : Expression? {
+    // Generic-lane source feed: the shared array-shaped lanes emit `for`/`length()` against
+    // loop_source_name() and read loop_source_expr() for compile-time facts (reserve hints, srcB
+    // checks; null = decline). Overriding the pair lights up those lanes — see linq_fold.md.
+    def loop_source_expr() : Expression? {
         return null
     }
-    def arraySrcName() : string {
+    def loop_source_name() : string {
         return ""
     }
     def invoke_param_type() : TypeDeclPtr {     // invoke-param spelling for the source argument (join lane's lead param)
-        var top = arrayTop()
+        var top = loop_source_expr()
         return top != null ? invoke_src_param_type(top) : null
     }
 }
@@ -2216,7 +2219,7 @@ def emit_loop_or_count_lane(var c : Captures; var ctx : EmitCtx; at : LineInfo)
     // ctx.src IS the source adapter (ArrayAdapter for arrays, XmlAdapter for XML) — the lanes ride it
     // directly. There's no leading-select slot on this pattern, so ctx.src is never a ProjectedSourceAdapter.
     var top = ctx.top
-    let srcName = ctx.src->arraySrcName()
+    let srcName = ctx.src->loop_source_name()
     let itName = qn("it", at)
     let accName = qn("acc", at)
     let names <- make_range_names(at)
@@ -2436,7 +2439,7 @@ def emit_loop_or_count_lane(var c : Captures; var ctx : EmitCtx; at : LineInfo)
         // (worse than an array's slack), so the gate is stricter than the array arm's.
         var reserveStmts : array<Expression?>
         if (ctx.src->can_reserve_by_length() && whereCond == null && postTakeWhereCond == null) {
-            let rtop = ctx.src->arrayTop()
+            let rtop = ctx.src->loop_source_expr()
             if (rtop != null && rtop._type != null && type_has_length(rtop._type)) {
                 if (takeExpr != null) {
                     reserveStmts |> push <| qmacro_expr() {
@@ -3385,8 +3388,8 @@ def emit_fused_prefilter(var c : Captures; var ctx : EmitCtx; at : LineInfo) : E
     }
     var reserveStmts : array<Expression?>
     if (ctx.src->can_reserve_by_length() && oc.distinctName == "") {
-        let top = ctx.src->arrayTop()
-        let srcName = ctx.src->arraySrcName()
+        let top = ctx.src->loop_source_expr()
+        let srcName = ctx.src->loop_source_name()
         if (top != null && top._type != null && type_has_length(top._type)) {
             reserveStmts |> push <| qmacro_expr() {
                 $i(bufName) |> reserve(length($i(srcName)))
@@ -3707,7 +3710,7 @@ def emit_reverse_walk_overwrite_scalar(var c : Captures; var ctx : EmitCtx; at :
 [macro_function]
 def emit_reverse_backward_index_walk(var c : Captures; var ctx : EmitCtx; at : LineInfo) : Expression? {
     var top = ctx.top
-    let srcName = ctx.src->arraySrcName()
+    let srcName = ctx.src->loop_source_name()
     if (!(c.single |> key_exists("take")) || top._type == null || top._type.firstType == null) return null
     var bufElemType = strip_const_ref(clone_type(top._type.firstType))
     var termselLam : Expression?
@@ -3774,7 +3777,7 @@ def emit_reverse_backward_index_walk(var c : Captures; var ctx : EmitCtx; at : L
 [macro_function]
 def emit_reverse_backward_walk_dset_gate(var c : Captures; var ctx : EmitCtx; at : LineInfo) : Expression? {
     var top = ctx.top
-    let srcName = ctx.src->arraySrcName()
+    let srcName = ctx.src->loop_source_name()
     if (!(c.single |> key_exists("dist"))) return null
     let distinctCall = c.single["dist"]
     let distinctName = call_norm_name(distinctCall)
@@ -4035,8 +4038,8 @@ def emit_reverse_buffer_inplace(var c : Captures; var ctx : EmitCtx; at : LineIn
     // Reserve hint: array source with no prefilter has known length. Decs has no random-access length — skip.
     var reserveStmts : array<Expression?>
     if (ctx.src->can_reserve_by_length() && whereCond == null) {
-        let top = ctx.src->arrayTop()
-        let srcName = ctx.src->arraySrcName()
+        let top = ctx.src->loop_source_expr()
+        let srcName = ctx.src->loop_source_name()
         if (top != null && top._type != null && type_has_length(top._type)) {
             reserveStmts |> push <| qmacro_expr() {
                 $i(bufName) |> reserve(length($i(srcName)))
@@ -4404,14 +4407,14 @@ def is_bucket_reducer_call(expr : Expression?; bindName : string) : tuple<bool;
     if (c.func.fromGeneric != null) {
         fnName = string(c.func.fromGeneric.name)
     }
-    if ((c.arguments |> length) != 1) return (false, "", null)
+    let nArgs = c.arguments |> length
     // Bare reducer: `<reducer>(<bind>._1)`
-    if (fnName == "length" || fnName == "count" || fnName == "long_count"
-            || fnName == "sum" || fnName == "min" || fnName == "max" || fnName == "first" || fnName == "average") {
+    if (nArgs == 1 && (fnName == "length" || fnName == "count" || fnName == "long_count"
+            || fnName == "sum" || fnName == "min" || fnName == "max" || fnName == "first" || fnName == "average")) {
         if (peel_tuple_field_read(c.arguments[0], bindName, 1)) return (true, fnName, null)
     }
     // Inner-select reducer: `<reducer>(select(<bind>._1, <lambda>))`
-    if (fnName == "sum" || fnName == "min" || fnName == "max" || fnName == "first" || fnName == "average") {
+    if (nArgs == 1 && (fnName == "sum" || fnName == "min" || fnName == "max" || fnName == "first" || fnName == "average")) {
         let inner = c.arguments[0]
         if (inner != null && inner is ExprCall) {
             let innerCall = inner as ExprCall
@@ -4426,9 +4429,31 @@ def is_bucket_reducer_call(expr : Expression?; bindName : string) : tuple<bool;
             }
         }
     }
+    // Direct selector reducer: `<reducer>(<bind>._1, <lambda>)` ≡ inner-select form (the 2-arg
+    // tier-2 overloads in linq.das). An identity lambda canonicalizes to the bare reducer.
+    // `first`/`count` stay 1-arg only — their C# 2-arg forms take a PREDICATE, not a selector.
+    if (nArgs == 2 && (fnName == "sum" || fnName == "min" || fnName == "max" || fnName == "average")) {
+        if (peel_tuple_field_read(c.arguments[0], bindName, 1)) {
+            if (is_identity_lambda(c.arguments[1])) return (true, fnName, null)
+            return (true, "{fnName}_inner_select", c.arguments[1])
+        }
+    }
     return (false, "", null)
 }
 
+[macro_function]
+def is_identity_lambda(lam : Expression const?) : bool {
+    if (lam == null || !(lam is ExprMakeBlock)) return false
+    let blk = (lam as ExprMakeBlock)._block
+    if (blk == null || !(blk is ExprBlock)) return false
+    let eb = blk as ExprBlock
+    if (length(eb.arguments) != 1 || length(eb.list) != 1) return false
+    let st = eb.list[0]
+    if (st == null || !(st is ExprReturn)) return false
+    let body = match_peel_r2v((st as ExprReturn).subexpr)
+    return body != null && body is ExprVar && (body as ExprVar).name == eb.arguments[0].name
+}
+
 struct ReducerSpec {
     slot : int                  // target tuple slot: entry._{slot}
     name : string               // bare reducer or "<reducer>_inner_select"
@@ -5617,6 +5642,59 @@ def is_primitive_join_key_type(keyType : TypeDeclPtr) : bool {
             || keyType.baseType == Type.tBool))
 }
 
+// UpstreamJoinCore — source-independent half of a `join |> group_by` upstream-join arm: keya/keyb/result
+// lambdas validated, primitive-key gate applied, key/result types extracted. The source's
+// build_group_by_adapter adds its own srcB validation and builds its Join shape. null = decline to tier-2.
+struct UpstreamJoinCore {
+    keyaLam    : Expression?
+    keybLam    : Expression?
+    resultLam  : Expression?
+    keyType    : TypeDeclPtr
+    resultType : TypeDeclPtr
+}
+
+[macro_function]
+def extract_upstream_join_core(var joinCall : ExprCall?) : UpstreamJoinCore? {
+    var keyaLam = joinCall.arguments[2]
+    var keybLam = joinCall.arguments[3]
+    var resultLam = joinCall.arguments[4]
+    if (keyaLam == null || keybLam == null || resultLam == null
+            || keyaLam._type == null || resultLam._type == null
+            || resultLam._type.firstType == null) {
+        return null
+    }
+    var keyType = strip_const_ref(clone_type(keyaLam._type.firstType))
+    if (!is_primitive_join_key_type(keyType)) {
+        return null
+    }
+    return new UpstreamJoinCore(keyaLam = keyaLam, keybLam = keybLam, resultLam = resultLam,
+                                keyType = keyType,
+                                resultType = strip_const_ref(clone_type(resultLam._type.firstType)))
+}
+
+// Array-shaped srcB of the upstream-join arm (join's second arg): alwaysSafe clone, invoke-param
+// spelling, element tuple type. Used by sources hashing srcB from an array walk (array/json/xml);
+// decs validates srcB via extract_decs_bridge instead.
+struct UpstreamJoinArraySrcB {
+    srcbClone : Expression?
+    srcbType  : TypeDeclPtr
+    tupBType  : TypeDeclPtr
+}
+
+[macro_function]
+def extract_upstream_join_array_srcb(var joinCall : ExprCall?) : UpstreamJoinArraySrcB? {
+    var srcbSrc = joinCall.arguments[1]
+    if (srcbSrc == null || srcbSrc._type == null || srcbSrc._type.firstType == null
+            || !(srcbSrc._type.isGoodArrayType || srcbSrc._type.isArray || srcbSrc._type.isIterator)) {
+        return null
+    }
+    var srcbClone = clone_expression(srcbSrc)
+    srcbClone.genFlags.alwaysSafe = true
+    return new UpstreamJoinArraySrcB(srcbClone = srcbClone,
+                                     srcbType = invoke_src_param_type(srcbSrc),
+                                     tupBType = strip_const_ref(clone_type(srcbSrc._type.firstType)))
+}
+
 // JoinStandalonePieces — shared output of the standalone hashed-join shape builder. Returned to emit_decs_join_impl / emit_array_join / emit_xml_join which then wrap the source loops differently (decs uses for_each_archetype + build_decs_inner_for; array uses plain `for`; xml uses the DOM walk).
 struct JoinStandalonePieces {
     preludeStmts  : array<Expression?>
@@ -5884,25 +5962,23 @@ def join_srcb_table_call(var joinCall : ExprCall?) : ExprCall? {
     return call
 }
 
-// keyb (peeled, binder renamed to bindName) selects the table key itself: bare `kv.key` (kv lane) or the
-// bare element (keys lane). Then the join key IS the table key and a lookup replaces the bucket walk.
+// body (peeled, binder renamed to bindName) selects the table key itself: bare `kv.key` (kv lane) or the
+// bare element (keys lane). Shared by join keyb (key IS the table key ⇒ lookup replaces the bucket walk)
+// and the point-lookup probe matcher in linq_fold_table.
 [macro_function]
-def join_keyb_is_bare_key(var keybBody : Expression?; bindName : string; kvLane : bool) : bool {
-    var k = keybBody
-    if (k != null && k is ExprRef2Value) {
-        k = (k as ExprRef2Value).subexpr
-    }
-    if (!kvLane) {
-        return k != null && k is ExprVar && (k as ExprVar).name == bindName
-    }
-    if (k == null || !(k is ExprField)) return false
-    var f = k as ExprField
-    if (f.name != "key") return false
-    var base = f.value
-    if (base != null && base is ExprRef2Value) {
-        base = (base as ExprRef2Value).subexpr
+def is_bare_key_ref(keyBody : Expression?; bindName : string; kvLane : bool) : bool {
+    match (keyBody) {
+        if (ExprField(name = "key", value = ExprVar(name = match_expr(bindName)))) {
+            return kvLane
+        }
+        if (ExprVar(name = match_expr(bindName))) {
+            return !kvLane
+        }
+        if (_) {
+            return false
+        }
     }
-    return base != null && base is ExprVar && (base as ExprVar).name == bindName
+    return false
 }
 
 // Table-srcB twin of build_join_standalone_pieces: unique table keys ⇒ bucket size ≤ 1, so there is no
@@ -6080,7 +6156,7 @@ def emit_array_join(var c : Captures; var ctx : EmitCtx; at : LineInfo) : Expres
     // (only the result projection + an optional lead `where` splice for group joins), or any non-count
     // iterator-typed context.
     if ((isGroupJoin && (countOnly || whereLam != null || selectLam != null)) || (!countOnly && ctx.expr_is_iterator)) return null
-    var topSrc = ctx.src->arrayTop()
+    var topSrc = ctx.src->loop_source_expr()
     if (topSrc == null || topSrc._type == null || topSrc._type.firstType == null) return null
     var srcbSrc = joinCall.arguments[1]
     if (srcbSrc == null || srcbSrc._type == null || srcbSrc._type.firstType == null) return null
@@ -6089,7 +6165,7 @@ def emit_array_join(var c : Captures; var ctx : EmitCtx; at : LineInfo) : Expres
     if (keyaLam == null || keybLam == null || keyaLam._type == null) return null
     let keyType = strip_const_ref(clone_type(keyaLam._type.firstType))
     if (!is_primitive_join_key_type(keyType)) return null
-    let srcAName  = ctx.src->arraySrcName()
+    let srcAName  = ctx.src->loop_source_name()
     let srcBName  = qn("ajoin_srcB", at)
     let tupAName  = ctx.src->bind_name(at)
     let tupBName  = qn("ajoin_tup_b", at)
@@ -6100,7 +6176,7 @@ def emit_array_join(var c : Captures; var ctx : EmitCtx; at : LineInfo) : Expres
     let tupBType = strip_const_ref(clone_type(srcbSrc._type.firstType))
     var srcbTab = join_srcb_table_call(joinCall)
     let srcbKv = srcbTab != null && get_call_short_name(srcbTab) == "each_kv"
-    let probeMode = srcbTab != null && !isGroupJoin && join_keyb_is_bare_key(keybBody, tupBName, srcbKv)
+    let probeMode = srcbTab != null && !isGroupJoin && is_bare_key_ref(keybBody, tupBName, srcbKv)
     var pieces : JoinStandalonePieces?
     if (probeMode) {
         pieces = build_join_probe_pieces(joinCall, whereLam, selectLam, leadWhereLam, countOnly, keyaBody, tupAName, srcBName, srcbKv, "ajoin", at)
diff --git a/daslib/linq_fold_decs.das b/daslib/linq_fold_decs.das
index 1d6b661e4..aac8539fe 100644
--- a/daslib/linq_fold_decs.das
+++ b/daslib/linq_fold_decs.das
@@ -96,29 +96,19 @@ class DecsAdapter : SourceAdapter {
             if (bridgeB == null) {
                 return null
             }
-            var keyaLam = joinCall.arguments[2]
-            var keybLam = joinCall.arguments[3]
-            var resultLam = joinCall.arguments[4]
-            if (keyaLam == null || keybLam == null || resultLam == null
-                    || keyaLam._type == null || resultLam._type == null
-                    || resultLam._type.firstType == null) {
+            var core = extract_upstream_join_core(joinCall)
+            if (core == null) {
                 return null
             }
-            var keyType = strip_const_ref(clone_type(keyaLam._type.firstType))
-            if (!is_primitive_join_key_type(keyType)) {
-                return null
-            }
-            var resultType = strip_const_ref(clone_type(resultLam._type.firstType))
-            var tupBType = strip_const_ref(clone_type(bridgeB.elementType))
             return new DecsJoinAdapter(shape = DecsJoinShape(
                 bridgeA = bridge,
                 bridgeB = bridgeB,
-                keyaLam = keyaLam,
-                keybLam = keybLam,
-                resultLam = resultLam,
-                keyType = keyType,
-                tupBType = tupBType,
-                resultType = resultType,
+                keyaLam = core.keyaLam,
+                keybLam = core.keybLam,
+                resultLam = core.resultLam,
+                keyType = core.keyType,
+                tupBType = strip_const_ref(clone_type(bridgeB.elementType)),
+                resultType = core.resultType,
                 resBindName = qn("djoin_jres", at)))
         }
         return new DecsAdapter(bridge = bridge, tupName = qn("decs_tup", at))
diff --git a/daslib/linq_fold_json.das b/daslib/linq_fold_json.das
index 3b8f3acbd..94736006c 100644
--- a/daslib/linq_fold_json.das
+++ b/daslib/linq_fold_json.das
@@ -20,7 +20,6 @@ module linq_fold_json shared public
 require daslib/ast_boost
 require daslib/ast_match
 require daslib/templates_boost
-require strings
 require daslib/linq_fold_common public
 
 // Field-pruning row-usage scanning (RowUsageScanner / collect_row_usage / RowFieldFlattener /
@@ -278,47 +277,31 @@ class JsonAdapter : SourceAdapter {
             // join |> group_by over JSON: srcA = the JSON value (array-walked), srcB = join's second arg.
             // plan_group_by_core drives the bucket-fill through JsonJoinAdapter.wrap_source_loop (pruning free).
             var joinCall = c.single["upstream_join"]
-            var srcbSrc = joinCall.arguments[1]
-            if (srcbSrc == null || srcbSrc._type == null || srcbSrc._type.firstType == null
-                    || !(srcbSrc._type.isGoodArrayType || srcbSrc._type.isArray || srcbSrc._type.isIterator)) {
+            var core = extract_upstream_join_core(joinCall)
+            var srcb = extract_upstream_join_array_srcb(joinCall)
+            if (core == null || srcb == null) {
                 return null
             }
-            var keyaLam = joinCall.arguments[2]
-            var keybLam = joinCall.arguments[3]
-            var resultLam = joinCall.arguments[4]
-            if (keyaLam == null || keybLam == null || resultLam == null
-                    || keyaLam._type == null || resultLam._type == null
-                    || resultLam._type.firstType == null) {
-                return null
-            }
-            var keyType = strip_const_ref(clone_type(keyaLam._type.firstType))
-            if (!is_primitive_join_key_type(keyType)) {
-                return null
-            }
-            var resultType = strip_const_ref(clone_type(resultLam._type.firstType))
-            var tupBType = strip_const_ref(clone_type(srcbSrc._type.firstType))
-            var srcbClone = clone_expression(srcbSrc)
-            srcbClone.genFlags.alwaysSafe = true
             return new JsonJoinAdapter(shape = JsonJoinShape(
                 jsonExpr = clone_expression(jsonExpr),
                 srcName = qn("jsrc", at),
                 elemType = clone_type(elemType),
-                srcBExpr = srcbClone,
+                srcBExpr = srcb.srcbClone,
                 srcBName = qn("jjoin_srcB", at),
-                srcBType = invoke_src_param_type(srcbSrc),
-                keyaLam = keyaLam,
-                keybLam = keybLam,
-                resultLam = resultLam,
-                keyType = keyType,
-                tupBType = tupBType,
-                resultType = resultType,
+                srcBType = srcb.srcbType,
+                keyaLam = core.keyaLam,
+                keybLam = core.keybLam,
+                resultLam = core.resultLam,
+                keyType = core.keyType,
+                tupBType = srcb.tupBType,
+                resultType = core.resultType,
                 resBindName = qn("jjoin_jres", at)))
         }
         // join |> group_by deferred to tier-2 unless it hit the upstream-join arm; plain group_by clones self.
         return new JsonAdapter(jsonExpr = clone_expression(jsonExpr), srcName = qn("jsrc", at),
                                elemType = clone_type(elemType))
     }
-    def override arraySrcName() : string {
+    def override loop_source_name() : string {
         return srcName
     }
     def override bind_name(at : LineInfo) : string {
diff --git a/daslib/linq_fold_table.das b/daslib/linq_fold_table.das
index 5ee1ea0a2..8cb5d8e71 100644
--- a/daslib/linq_fold_table.das
+++ b/daslib/linq_fold_table.das
@@ -17,6 +17,7 @@ module linq_fold_table shared public
 
 require daslib/ast_boost
 require daslib/ast_match
+require daslib/match
 require daslib/templates_boost
 require daslib/macro_boost
 require daslib/linq_fold_common public
@@ -120,7 +121,7 @@ class TableAdapter : SourceAdapter {
         return true   // single flat for-loop inside the invoke; a mid-loop `return` exits the invoke
     }
     def override can_reserve_by_length() : bool {
-        return true   // length(tab) is O(1); the shared reserve hint reads arrayTop/arraySrcName
+        return true   // length(tab) is O(1); the shared reserve hint reads loop_source_expr/loop_source_name
     }
     def override const can_join() : bool {
         return true   // rides emit_array_join: direct-return lead loop via wrap_source_loop
@@ -137,12 +138,12 @@ class TableAdapter : SourceAdapter {
         return new TableAdapter(tabExpr = clone_expression(tabExpr), srcName = qn("tsrc", at),
                                 elemType = clone_type(elemType), lane = lane)
     }
-    def override arrayTop() : Expression? {
+    def override loop_source_expr() : Expression? {
         // Feeds the reserve hint (type_has_length covers tables). The backward-index reverse lanes that
-        // also read arrayTop gate on array_source, which is false here — matchTop stays iterator-typed.
+        // also read loop_source_expr gate on array_source, which is false here — matchTop stays iterator-typed.
         return tabExpr
     }
-    def override arraySrcName() : string {
+    def override loop_source_name() : string {
         return srcName
     }
     def override bind_name(at : LineInfo) : string {
@@ -197,24 +198,11 @@ class TableAdapter : SourceAdapter {
 
 [macro_function]
 def private match_key_probe_side(var keySide, otherSide : Expression?; lane : TableLane; bindName : string) : Expression? {
-    var k = keySide
-    if (k != null && k is ExprRef2Value) {
-        k = (k as ExprRef2Value).subexpr
-    }
-    if (lane == TableLane.KV) {
-        if (k == null || !(k is ExprField)) return null
-        var f = k as ExprField
-        if (f.name != "key") return null
-        var base = f.value
-        if (base != null && base is ExprRef2Value) {
-            base = (base as ExprRef2Value).subexpr
-        }
-        if (base == null || !(base is ExprVar) || (base as ExprVar).name != bindName) return null
-    } else {
-        if (k == null || !(k is ExprVar) || (k as ExprVar).name != bindName) return null
-    }
     // X must be loop-invariant AND side-effect free — the scan evaluates X per element, a probe once
-    if (expr_uses_var(otherSide, bindName) || has_sideeffects(otherSide)) return null
+    if (!is_bare_key_ref(keySide, bindName, lane == TableLane.KV)
+            || expr_uses_var(otherSide, bindName) || has_sideeffects(otherSide)) {
+        return null
+    }
     return clone_expression(otherSide)
 }
 
@@ -231,12 +219,11 @@ def private extract_key_probe(var pred : Expression?; lane : TableLane; bindName
         conjuncts |> push(andOp.right)
         leaf = andOp.left
     }
-    if (leaf == null || !(leaf is ExprOp2)) return null
-    var op2 = leaf as ExprOp2
-    if (op2.op != "==") return null
-    var probe = match_key_probe_side(op2.left, op2.right, lane, bindName)
+    var lhs, rhs : ExpressionPtr
+    if (!qmatch(leaf, $e(lhs) == $e(rhs)).matched) return null
+    var probe = match_key_probe_side(lhs, rhs, lane, bindName)
     if (probe == null) {
-        probe = match_key_probe_side(op2.right, op2.left, lane, bindName)
+        probe = match_key_probe_side(rhs, lhs, lane, bindName)
     }
     if (probe == null) return null
     for (c in conjuncts) {
diff --git a/daslib/match.das b/daslib/match.das
index 31ba2ad1d..dc9ca4217 100644
--- a/daslib/match.das
+++ b/daslib/match.das
@@ -110,12 +110,43 @@ def match_copy(what : TypeDeclPtr; wths : Expression?; makeType : TypeDeclPtr; a
     return match_any(makeType, wths, access_new, to)
 }
 
+def public match_peel_r2v(var e : Expression?) : Expression? {
+    //! Strip post-typer ``ExprRef2Value`` wrappers. The ``match`` macro emits this around the
+    //! source side of AST class patterns so a clean pattern (``ExprField(...)``) matches a
+    //! typer-wrapped source, mirroring ``qmatch``'s transparency rule. ``$v`` captures of
+    //! Expression-typed values go through it too, so they bind the peeled node.
+    while (e != null && e is ExprRef2Value) {
+        e = (e as ExprRef2Value).subexpr
+    }
+    return e
+}
+
+def public match_peel_r2v(e : Expression const?) : Expression const? {
+    //! Pointee-const flavor — same peel, preserves the caller's const view.
+    var c : Expression const? = e
+    while (c != null && c is ExprRef2Value) {
+        c = (c as ExprRef2Value).subexpr
+    }
+    return c
+}
+
 [macro_function]
 def match_struct(what : TypeDeclPtr; wths : ExprMakeStruct?; access : Expression?; var to : MatchTo) {
     // if its a pointer to ast::Expression or such, we need to 'as' cast first
     if (wths.makeType.isHandle && isExpression(wths.makeType)
             && what.isPointer && what.firstType != null && isExpression(what.firstType)) {
-        // `is` on a null AST pointer panics — guard like the pointer-to-structure path below
+        // Post-typer ExprRef2Value wrappers are invisible (no surface syntax) — peel the source
+        // side before the RTTI guard, mirroring qmatch. An explicit ExprRef2Value(...) pattern
+        // still matches the wrapper itself. `is` on a null AST pointer panics — both arms guard
+        // for null like the pointer-to-structure path below.
+        if (wths.makeType.annotation.name != "ExprRef2Value") {
+            var peeled = qmacro(match_peel_r2v($e(access)))
+            peeled |> force_at(wths.at)
+            var cond_pz = qmacro($e(peeled) != null)
+            cond_pz |> force_at(wths.at)
+            to.conditions |> emplace(cond_pz)
+            return match_as_is(what, wths, wths.makeType, wths.makeType.annotation.name, peeled, to)
+        }
         var cond_nz = qmacro($e(access) != null)
         cond_nz |> force_at(wths.at)
         to.conditions |> emplace(cond_nz)
@@ -390,7 +421,8 @@ def match_expr(what : TypeDeclPtr; wthmt : ExprCall?; access : Expression?; var
         return false
     }
     assume value = wthmt.arguments[0]
-    if (value._type != null && !is_same_type(what, value._type, RefMatters.no, ConstMatters.no, TemporaryMatters.no)) {
+    if (value._type != null && !is_same_type(what, value._type, RefMatters.no, ConstMatters.no, TemporaryMatters.no)
+            && !(is_string_like(what) && is_string_like(value._type))) {   // das_string == string is a language-level compare
         to.errors |> match_error("mismatching expression type {describe(what)} vs {describe(value._type)}", value.at)
         return false
     }
@@ -474,7 +506,7 @@ def match_or(what : TypeDeclPtr; wth : ExprOp2?; access : Expression?; var to :
     return true
 }
 
-[macro_function, unused_argument(what)]
+[macro_function]
 def match_tag(what : TypeDeclPtr; wth : Expression?; access : Expression?; var to : MatchTo) {
     var tag = wth as ExprTag
     if (tag.name != "v") {
@@ -487,8 +519,14 @@ def match_tag(what : TypeDeclPtr; wth : Expression?; access : Expression?; var t
             to.errors |> match_error("duplicate variable {tname}", tag.at)
             return false
         }
+        var cap = clone_expression(access)
+        // an Expression-typed capture binds the peeled node (see match_peel_r2v)
+        if (what != null && what.isPointer && what.firstType != null && isExpression(what.firstType)) {
+            cap = qmacro(match_peel_r2v($e(cap)))
+            cap |> force_at(tag.at)
+        }
         unsafe {
-            to.declarations[tname] = clone_expression(access)
+            to.declarations[tname] = cap
         }
         log_m("\tadd variable {tname} as {describe(access)}\n")
         return true
diff --git a/doc/reflections/das2rst.das b/doc/reflections/das2rst.das
index eba876524..b4da6ef1a 100644
--- a/doc/reflections/das2rst.das
+++ b/doc/reflections/das2rst.das
@@ -987,7 +987,7 @@ def document_module_fuzzer(_root : string) {
 def document_module_match(_root : string) {
     var mod = find_module("match")
     var groups <- array<DocGroup>(
-        hide_group(group_by_regex("Implementation details", mod, %regex~(match_type|match_expr)$%%))
+        hide_group(group_by_regex("Implementation details", mod, %regex~(match_type|match_expr|match_peel_r2v)$%%))
     )
     document("Pattern matching", mod, "match.rst", groups)
 }
@@ -1017,7 +1017,7 @@ def document_module_linq(_root : string) {
 def document_module_linq_boost(_root : string) {
     var mod = find_module("linq_boost")
     var groups <- array<DocGroup>(
-        hide_group(group_by_regex("Implementation details", mod, %regex~(match_type|match_expr)$%%))
+        hide_group(group_by_regex("Implementation details", mod, %regex~(match_type|match_expr|match_peel_r2v)$%%))
     )
     document("Boost module for LINQ", mod, "linq_boost.rst", groups)
 }
diff --git a/doc/source/conf.py b/doc/source/conf.py
index addfc9b45..f1886d28e 100644
--- a/doc/source/conf.py
+++ b/doc/source/conf.py
@@ -260,8 +260,10 @@
 
 # Additional stuff for the LaTeX preamble.
 #
-# These unicode characters is not known to LaTeX. If new used in .rst
-# they should be added here.
+# Unicode characters pdflatex doesn't know (Sphinx's bundled set covers most
+# common ones — only stragglers land here). When a new char breaks the
+# docs/pdflatex preflight gate, add one line; keep \ensuremath so the mapping
+# works in both text and math mode.
 'preamble': r'''
 \newunicodechar{✓}{\checkmark}
 \newunicodechar{✗}{\texttimes}
@@ -273,6 +275,13 @@
 \DeclareUnicodeCharacter{2265}{\ensuremath{\geq}}
 \DeclareUnicodeCharacter{21D2}{\ensuremath{\Rightarrow}}
 \DeclareUnicodeCharacter{00D7}{\ensuremath{\times}}
+\DeclareUnicodeCharacter{2248}{\ensuremath{\approx}}
+\DeclareUnicodeCharacter{2208}{\ensuremath{\in}}
+\DeclareUnicodeCharacter{2209}{\ensuremath{\notin}}
+\DeclareUnicodeCharacter{221A}{\ensuremath{\surd}}
+\DeclareUnicodeCharacter{221E}{\ensuremath{\infty}}
+\DeclareUnicodeCharacter{2227}{\ensuremath{\wedge}}
+\DeclareUnicodeCharacter{2228}{\ensuremath{\vee}}
 ''',
 
 # Latex figure (float) alignment
diff --git a/doc/source/reference/linq_fold_patterns.rst b/doc/source/reference/linq_fold_patterns.rst
index f9035f631..0ef4c8abe 100644
--- a/doc/source/reference/linq_fold_patterns.rst
+++ b/doc/source/reference/linq_fold_patterns.rst
@@ -224,7 +224,7 @@ Array-source patterns
      - Theme 8 (audit 3b). The where_+order fused-loop path generalizes: when upstream ``distinct[_by]`` is present, declare ``var order_dset : table<...>`` and wrap the per-element ``push_clone`` with a set-gated ``if (!key_exists(...))`` block. Single source pass + in-place sort, no ``distinct_by_to_array`` intermediate iterator setup. Composes with ``where_`` (filter before distinct gate) and terminal ``_select`` (project at return). **Bails** (cascades) on ``distinct[_by] + order_by + first[_or_default]`` (streaming-min path has no dset hook) and on chains where ``take(N)`` is present (use the bounded-heap path via Theme 3 Phase 3 instead).
    * - ``._group_by(K)._select(reduce).to_array()``
      - pattern ``group_by_array`` (sub-codegen ``plan_group_by_core`` → ``reducer_emitters`` lookup)
-     - Per-key bucket reducer; single hash, one entry per group. PR D1: reducer dispatch is now a ``table<string; ReducerEmitterFn>`` lookup into named ``mk_reducer_*`` fns.
+     - Per-key bucket reducer; single hash, one entry per group. PR D1: reducer dispatch is now a ``table<string; ReducerEmitterFn>`` lookup into named ``mk_reducer_*`` fns. Accepted ``reduce`` spellings per slot (``is_bucket_reducer_call``): bare ``_._1 |> <r>()`` for ``length`` / ``count`` / ``long_count`` / ``sum`` / ``min`` / ``max`` / ``first`` / ``average``; inner-select ``_._1 |> select(L) |> <r>()`` and the equivalent direct selector ``_._1 |> <r>(L)`` (the 2-arg tier-2 overloads) for ``sum`` / ``min`` / ``max`` / ``average`` — an identity ``L`` canonicalizes to the bare form. ``first`` / ``count`` take no selector by design (their C# 2-arg forms are predicates). Untyped ``L`` params are fine on the bucket surface — the ``_select`` macro stamps the bucket element type before inference (``BucketLambdaStamper``), so fused and unfused chains accept the same spelling.
    * - ``._group_by(K)._having(P)._select(...).to_array()``
      - pattern ``group_by_array`` (sub-codegen ``plan_group_by_core``)
      - HAVING filter on the bucket reference (pre-aggregate); can lift hidden reducer slots referenced by ``P`` but absent from the select.
diff --git a/modules/dasPUGIXML/daslib/linq_fold_xml.das b/modules/dasPUGIXML/daslib/linq_fold_xml.das
index dcfb1fbe9..e2ee9052b 100644
--- a/modules/dasPUGIXML/daslib/linq_fold_xml.das
+++ b/modules/dasPUGIXML/daslib/linq_fold_xml.das
@@ -285,49 +285,33 @@ class XmlAdapter : SourceAdapter {
             // join |> group_by over XML: srcA = the XML node (DOM-walked), srcB = join's second arg.
             // plan_group_by_core drives the bucket-fill through XmlJoinAdapter.wrap_source_loop (pruning free).
             var joinCall = c.single["upstream_join"]
-            var srcbSrc = joinCall.arguments[1]
-            if (srcbSrc == null || srcbSrc._type == null || srcbSrc._type.firstType == null
-                    || !(srcbSrc._type.isGoodArrayType || srcbSrc._type.isArray || srcbSrc._type.isIterator)) {
+            var core = extract_upstream_join_core(joinCall)
+            var srcb = extract_upstream_join_array_srcb(joinCall)
+            if (core == null || srcb == null) {
                 return null
             }
-            var keyaLam = joinCall.arguments[2]
-            var keybLam = joinCall.arguments[3]
-            var resultLam = joinCall.arguments[4]
-            if (keyaLam == null || keybLam == null || resultLam == null
-                    || keyaLam._type == null || resultLam._type == null
-                    || resultLam._type.firstType == null) {
-                return null
-            }
-            var keyType = strip_const_ref(clone_type(keyaLam._type.firstType))
-            if (!is_primitive_join_key_type(keyType)) {
-                return null
-            }
-            var resultType = strip_const_ref(clone_type(resultLam._type.firstType))
-            var tupBType = strip_const_ref(clone_type(srcbSrc._type.firstType))
-            var srcbClone = clone_expression(srcbSrc)
-            srcbClone.genFlags.alwaysSafe = true
             var nameJoinClone : Expression? = nameArg != null ? clone_expression(nameArg) : null
             return new XmlJoinAdapter(shape = XmlJoinShape(
                 nodeExpr = clone_expression(nodeExpr),
                 srcName = qn("xnode", at),
                 elemType = clone_type(elemType),
                 nameArg = nameJoinClone,
-                srcBExpr = srcbClone,
+                srcBExpr = srcb.srcbClone,
                 srcBName = qn("xjoin_srcB", at),
-                srcBType = invoke_src_param_type(srcbSrc),
-                keyaLam = keyaLam,
-                keybLam = keybLam,
-                resultLam = resultLam,
-                keyType = keyType,
-                tupBType = tupBType,
-                resultType = resultType,
+                srcBType = srcb.srcbType,
+                keyaLam = core.keyaLam,
+                keybLam = core.keybLam,
+                resultLam = core.resultLam,
+                keyType = core.keyType,
+                tupBType = srcb.tupBType,
+                resultType = core.resultType,
                 resBindName = qn("xjoin_jres", at)))
         }
         var nameClone : Expression? = nameArg != null ? clone_expression(nameArg) : null
         return new XmlAdapter(nodeExpr = clone_expression(nodeExpr), srcName = qn("xnode", at),
                               elemType = clone_type(elemType), nameArg = nameClone)
     }
-    def override arraySrcName() : string {
+    def override loop_source_name() : string {
         return srcName
     }
     def override bind_name(at : LineInfo) : string {
diff --git a/skills/das_macros.md b/skills/das_macros.md
index 8599639c6..2014071fc 100644
--- a/skills/das_macros.md
+++ b/skills/das_macros.md
@@ -144,7 +144,7 @@ Note adapters can still *emit* code referencing the contributor's symbols by nam
 
 ## `qmatch` — predicate-style pattern matching
 
-Prefer `qmatch(expr, <pattern>).matched` over hand-rolled `is X / as X` cascades when matching structural AST shapes. `qmatch` is RTTI-strict and won't traverse `ExprRef2Value` — peel first via `qm_peel_ref2value`.
+Prefer `qmatch(expr, <pattern>).matched` over hand-rolled `is X / as X` cascades when matching structural AST shapes. `ExprRef2Value` wrappers are transparent on both sides (pattern + source) — see ast_match.das's header; `$e` captures bind the peeled node.
 
 ```das
 // HAND-ROLLED (avoid)
@@ -179,6 +179,39 @@ Canonical examples in `modules/dasSQLITE/daslib/sqlite_linq.das` — search for
 
 **Not every probe fits qmatch.** Shapes with cross-statement constraints (e.g., "3 statements with specific types where push target equals res var and recordNames count matches sources count") exceed qmatch's grammar — fall back to hand-rolled `is X / as X` for those. Self-circular file dependencies are also out: `ast_match.das` itself can't use `qmatch` to define its own grammar.
 
+## `match` (daslib/match) — node-class destructuring; pairs with `qmatch`
+
+`daslib/match` is the OTHER pattern matcher, and for AST work the two divide cleanly. Pick by what the pattern looks like:
+
+- **`qmatch`** when the pattern is *daslang source syntax* — `qmatch(that, $e(fa) * $e(fb) + $e(other))`. Operator trees, call shapes, field chains spelled as code.
+- **`match`** when the pattern is *node classes and fields* — `match (e) { if (ExprSwizzle(mask = "xy", value = $v(v))) … }` — or plain value dispatch (`match (op) { if ("*") … }`, enum tables like flatten's `zero_const_of`).
+
+What `match` does that hand-rolled ladders and qmatch don't (all test-pinned in `tests/match/`):
+
+```das
+match (keySide) {
+    // nested class patterns, literal field values, capture; null guard + is/as + ExprRef2Value
+    // peel are all emitted for you
+    if (ExprField(name = "key", value = ExprVar(name = match_expr(bindName)))) {
+        return lane == TableLane.KV
+    }
+    if (ExprVar(name = match_expr(bindName))) {
+        return lane != TableLane.KV
+    }
+    if (_) {
+        return false
+    }
+}
+return false   // match is statement-shaped; flow analysis wants the trailing return
+```
+
+- **Alternation `||` works in field position** — `ExprOp2(op = "+" || "-")` — and at arm level; guards compose with `&&` referencing captures and locals.
+- **`match_expr(localVar)`** compares a field against a runtime expression (das_string fields compare against `string` locals directly).
+- **das-vector fields can NOT be destructured** — `ExprCall.arguments` / `ExprBlock.list` are `dasvector`-backed and the array-pattern arm rejects them ("is not an array"). Capture the node and index/length-check manually; this is why deep block-shape probes (`extract_decs_bridge`) stay hand-rolled.
+- **Statement-shaped, not expression-shaped** — a tuple-returning recognizer that mixes name dispatch with structural probes (`is_bucket_reducer_call`) usually reads better hand-rolled; convert only when the ladder is the function.
+
+Canonical conversions: `is_key_ref` / `join_keyb_is_bare_key` (daslib/linq_fold_table.das / linq_fold_common.das), flatten_opt's `component_read_of` + `zero_const_of`. flatten_opt and the linq_fold family require BOTH libraries and use each where it fits — do the same.
+
 ## `qmacro` vs `quote` (code generation)
 
 - **`qmacro(expr)`** — quasi-quote with reification splices (`$v()`, `$e()`, `$c()`, `$t()`, `$i()`, `$f()`, `$a()`, `$b()` etc.). Use when the generated code contains interpolated values.
@@ -418,24 +451,9 @@ Loud failure tells the user that immediately; silent `return null`
 re-queues the macro and lets the daslang pipeline emit a confusing
 infer-time cascade instead.
 
-### Peel `ExprRef2Value` before `qmatch`
-
-Post-Mode-2-expansion AST walking will see field reads wrapped in `ExprRef2Value`. `qmatch` is RTTI-strict — it matches `ExprField` but not `ExprRef2Value(ExprField(...))`. **Route through `qm_peel_ref2value`** (the single source of truth in `daslib/ast_match.das`) instead of hand-rolling either a `while`-peel or an `if`-peel:
-
-```das
-require daslib/ast_match
-
-qm_peel_ref2value(node)
-if (node == null) {
-    macro_error(prog, at, "_where: ExprRef2Value with null subexpr")
-    return ""
-}
-// now `qmatch(node, _.$f(name))` etc. work as expected
-```
-
-`qm_peel_ref2value` currently uses `while (e is ExprRef2Value)` rather than single-`if`-peeling. The conservative loop is intentional until block-folding is fully audited — `tests/ast_match/test_ref2value_skip.das` exercises a triple-wrap shape, and `src/ast/ast_block_folding.cpp` synthesis paths could theoretically produce a nested wrapper. Once that audit lands, the helper switches to single-`if` peel in one place and every consumer follows automatically.
+### `ExprRef2Value` transparency (qmatch + match)
 
-Auto-peel **inside** `qmatch` itself remains a TODO documented in `daslib/ast_match.das`. Until then, every analyzer entry point that takes an expression coming out of post-expansion (predicate body, projection body, classifier helpers like `is_const_or_captured_var`) needs the `qm_peel_ref2value` call at the top.
+Post-typer AST walking sees field reads wrapped in `ExprRef2Value` (no surface syntax). Both matchers peel it automatically. `qmatch` strips it on the pattern AND source side at every dispatch (see the ast_match.das header; `$e` captures bind the peeled node). `match` peels it for AST class patterns and `$v` captures via `match_peel_r2v`; an explicit `ExprRef2Value(...)` pattern in `match` still matches the wrapper itself, and only `match` can spell it. Hand-written analyzers that DON'T go through a matcher still need `qm_peel_ref2value(node)` (the in-place helper in `daslib/ast_match.das`) at their entry; never hand-roll the `while (… is ExprRef2Value)` loop.
 
 ### When you really do need raw arguments
 
diff --git a/skills/preflight.md b/skills/preflight.md
index 2197061a9..9d7dd0239 100644
--- a/skills/preflight.md
+++ b/skills/preflight.md
@@ -31,7 +31,7 @@ the CI ref, never a working-tree copy.
 | `extended_checks.yml` | every PR | linux + darwin15-arm64 + windows, ALL release modules ON |
 | `wasm_build.yml` | every PR | emscripten build of `web/` on 3 OSes + `wasm_cross` |
 | `build_eastl.yml` | every PR | EASTL shadow-config build + no-fileio build (linux clang) |
-| `doc.yml` | only if `doc/**`, `daslib/**`, or `src/builtin/**` changed | six doc gates |
+| `doc.yml` | only if `doc/**`, `daslib/**`, or `src/builtin/**` changed | seven doc gates |
 | `playground-e2e.yml` | only if `site/**` / `web/examples/ui/**` changed | Playwright on the web playground |
 
 ## build.yml — the build matrix
@@ -129,10 +129,10 @@ cmake -B build -DDAS_HV_DISABLED=OFF -DDAS_LLVM_DISABLED=OFF -DDAS_AUDIO_DISABLE
 | dasImgui install | `<daslang> utils/daspkg/main.das -- install dasImgui --branch master` | **the externals-coupling gate**: CI builds external dasImgui from ITS master against YOUR branch — an ABI break vs externals reds this on an unrelated-looking step. See `skills/abi_break_sweep.md` |
 | Coverage | `<daslang> dastest/dastest.das -- --cov-path coverage.lcov --color --test tests/language --timeout 1800` + `dascov` | rarely needed locally |
 
-## doc.yml — the six gates
+## doc.yml — the seven gates
 
 Only triggered when `doc/**`, `daslib/**`, or `src/builtin/**` changed — but
-`daslib/**` means **any daslib edit** runs all six. CI stops at the FIRST
+`daslib/**` means **any daslib edit** runs all seven. CI stops at the FIRST
 das2rst panic, so one CI round can hide N-1 further issues — loop gate 1
 locally until clean. Needs a daslang built with `DAS_HV_DISABLED=OFF` and
 `DAS_PUGIXML_DISABLED=OFF` (das2rst documents those modules). Step-by-step
@@ -146,8 +146,16 @@ workflow: `skills/make_pr.md` §4; conventions: `skills/documentation_rst.md`.
 | 4 | no untracked generated RST | `git ls-files --others --exclude-standard doc/source/stdlib/` → must be empty; `git add` the new files |
 | 5 | LaTeX sphinx, warnings-as-errors | `sphinx-build -W --keep-going -b latex -d doc/sphinx-build doc/source build/latex` |
 | 6 | HTML sphinx, warnings-as-errors | `sphinx-build -W --keep-going -b html -d doc/sphinx-build doc/source build/site` — delete `doc/sphinx-build` first; cached builds hide errors |
-
-(The PDF compile after gate 6 is `continue_on_error` — not a gate.)
+| 7 | both PDFs compile (latexmk, mirrors CI's latex-action) | `latexmk -pdf -interaction=nonstopmode -halt-on-error -cd build/latex/daslangstdlib.tex` then `…/daslang.tex` |
+
+Gate 7 is **stricter than CI on purpose**: CI marks the PDF compiles
+`continue-on-error`, so an undeclared unicode char (the U+2261 incident)
+ships broken release PDFs without ever going red. The fix is always one
+line — `\DeclareUnicodeCharacter{XXXX}{…}` in `doc/source/conf.py`
+`latex_elements['preamble']` (sphinx's bundled set covers most common chars;
+only stragglers need declaring). Tool discovery: preflight probes PATH, then
+`~/Library/Python/*/bin` + `~/.local/bin` for sphinx-build and
+`/Library/TeX/texbin` for latexmk (mac: `brew install --cask basictex`).
 
 ## wasm_build.yml
 
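Review note on the gate 7 paragraph in skills/preflight.md: the discovery order it describes (PATH first, then the per-user fallback directories) can be reproduced by hand when preflight reports a skip. The sketch below is a shell approximation under stated assumptions: `find_tool`, `run_gate7` and the `gate7.log` file name are hypothetical names chosen for illustration, not part of preflight. Unlike the daslang helper, it tries the `~/Library/Python/*/bin` glob on every platform, which is harmless because an unmatched glob just fails the executable check.

```shell
#!/bin/sh
# Sketch only: find_tool and run_gate7 are hypothetical helpers.

# Print the first usable path for tool $1: PATH wins, then each fallback
# candidate passed as a further argument. Returns 1 when nothing is found.
find_tool() {
    name=$1
    shift
    if command -v "$name" >/dev/null 2>&1; then
        printf '%s\n' "$name"
        return 0
    fi
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# An unmatched glob stays literal in sh and simply fails the -x test.
sphinx=$(find_tool sphinx-build "$HOME"/Library/Python/*/bin/sphinx-build "$HOME/.local/bin/sphinx-build") || sphinx=""
latexmk=$(find_tool latexmk /Library/TeX/texbin/latexmk) || latexmk=""
echo "sphinx-build: ${sphinx:-not found}"
echo "latexmk: ${latexmk:-not found}"

# Gate 7 as documented: compile both PDFs, show only the log tail on failure.
# Defined but not invoked here; call it after a successful sphinx latex build.
run_gate7() {
    for root in daslangstdlib.tex daslang.tex; do
        if ! "$latexmk" -pdf -interaction=nonstopmode -halt-on-error -cd "build/latex/$root" >gate7.log 2>&1; then
            tail -n 60 gate7.log   # the actionable error sits at the end
            return 1
        fi
    done
}
```

If either tool prints "not found", the corresponding gates would be reported as skipped rather than failed, so a clean preflight run on such a box says nothing about the PDF build.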
diff --git a/skills/writing_tests.md b/skills/writing_tests.md
index 07a393932..3a3cd110d 100644
--- a/skills/writing_tests.md
+++ b/skills/writing_tests.md
@@ -13,7 +13,20 @@ If a specific file genuinely can't AOT (emitter bug, interpreted-only by design)
 `options no_aot` IN THE FILE **and** exclude it from the directory's AOT glob, with a
 comment + issue link on both. Glob exclusion alone is NOT enough — test_aot still *runs*
 the file and trips 50101 on its missing stubs; `options no_aot` is what makes the runtime
-skip AOT linking for it.
+skip AOT linking for it. (2026-06-11: in-file `options no_aot` currently fails in the AOT
+hash itself — fix incoming on master; until it lands, interp-only tests are gated by the
+directory filter below instead.)
+
+## The `tests/.das_test` directory filter — and its root-path caveat
+
+`tests/.das_test` is a daslang script dastest compiles per run; its `can_visit_folder`
+pinvoke gates whole directories per mode — e.g. `no_aot/`, `ast/`, `ast_match/` are
+skipped under `--use-aot` / `-jit`, module dirs (dasSQLITE, dasPUGIXML…) skip when the
+module isn't built in. **The filter is looked up only at the `--test` ROOT path** —
+`--test tests` finds and applies it, but `--test tests/flatten` looks for
+`tests/flatten/.das_test` (absent) and walks into `no_aot/` unfiltered, producing
+false `error[50101]` / JIT failures. For AOT/JIT validation, sweep `--test tests`
+(CI's form) or target individual files — never a subtree that contains gated dirs.
 
 ## Test file structure
 
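Review note on the `tests/.das_test` caveat in skills/writing_tests.md: the root-path rule is easy to misread, so here is a small shell model of it. This is a sketch of the documented behavior only, not dastest's actual lookup code; `filter_applies` is a hypothetical name and the directory tree is a throwaway built in a temp dir.

```shell
#!/bin/sh
# Sketch only: models the documented lookup rule, not dastest's real code.

# Succeeds when a filter script sits directly at the --test root $1.
# Parent directories are deliberately NOT searched, matching the caveat.
filter_applies() {
    [ -f "$1/.das_test" ]
}

# Throwaway tree mirroring the layout described in the text.
demo=$(mktemp -d)
mkdir -p "$demo/tests/flatten/no_aot"
: > "$demo/tests/.das_test"

for root in "$demo/tests" "$demo/tests/flatten"; do
    if filter_applies "$root"; then
        echo "${root#"$demo"/}: filter found, gated dirs are skipped"
    else
        echo "${root#"$demo"/}: no filter, no_aot/ is walked unfiltered"
    fi
done
```

The second root is the failure mode from the text: the filter exists one level up, but a subtree run never sees it, which is why AOT/JIT validation should sweep `--test tests` or target individual files.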
diff --git a/tests/linq/test_linq_reducer_shapes.das b/tests/linq/test_linq_reducer_shapes.das
new file mode 100644
index 000000000..6b1a6b667
--- /dev/null
+++ b/tests/linq/test_linq_reducer_shapes.das
@@ -0,0 +1,141 @@
+options gen2
+require daslib/linq
+require dastest/testing_boost public
+
+require daslib/linq_boost
+
+// Reducer-shape coverage (LINQ_TO_TABLE.md "Late stage" items 1+2):
+// selector overloads sum/min/max/average(src, selector), untyped lambda params
+// on the group-bucket surface, and identity-lambda canonicalization.
+
+[test]
+def test_selector_reducers_tier2(t : T?) {
+    t |> run("max with selector over array returns projected max") @(tt : T?) {
+        tt |> equal(6, [3, 1, 2] |> max(@(v : int) => v * 2))
+    }
+    t |> run("min with selector over array") @(tt : T?) {
+        tt |> equal(2, [3, 1, 2] |> min(@(v : int) => v * 2))
+    }
+    t |> run("sum with selector over array") @(tt : T?) {
+        tt |> equal(9, [3, 1, 2] |> sum(@(v : int) => v + 1))
+    }
+    t |> run("average with selector over array returns double") @(tt : T?) {
+        tt |> equal(4.0lf, [1, 2, 3] |> average(@(v : int) => v * 2))
+    }
+    t |> run("selector reducers over iterator sources") @(tt : T?) {
+        let arr312 = [3, 1, 2]
+        let arr123 = [1, 2, 3]
+        tt |> equal(6, unsafe(each(arr312)) |> max(@(v : int) => v * 2))
+        tt |> equal(9, unsafe(each(arr312)) |> sum(@(v : int) => v + 1))
+        tt |> equal(2, unsafe(each(arr312)) |> min(@(v : int) => v * 2))
+        tt |> equal(4.0lf, unsafe(each(arr123)) |> average(@(v : int) => v * 2))
+    }
+    t |> run("selector projecting to a different type") @(tt : T?) {
+        tt |> equal(3.5lf, [3, 1, 2, 8] |> average(@(v : int) => v))
+    }
+}
+
+[test]
+def test_inner_select_untyped_param(t : T?) {
+    t |> run("untyped inner-select lambda infers (fused bare form)") @(tt : T?) {
+        let got <- _fold([10, 20, 30, 11, 21, 31, 12]._group_by(_ % 3)._select(_._1 |> select(@(x) => x * 2) |> sum))
+        tt |> equal(3, length(got))
+        var total = 0
+        for (s in got) {
+            total += s
+        }
+        tt |> equal(270, total)
+    }
+    t |> run("untyped inner-select matches typed spelling (named tuple, fused + unfused)") @(tt : T?) {
+        let arr = [10, 20, 30, 11, 21, 31, 12, 22]
+        let foldGot  <- _fold(arr._group_by(_ % 4)._select((K = _._0, S = _._1 |> select(@(x) => x + 1) |> sum)))
+        let typedRef <- (arr._group_by(_ % 4)._select((K = _._0, S = _._1 |> select(@(x : int) => x + 1) |> sum)))
+        tt |> equal(length(typedRef), length(foldGot))
+        var refMap : table<int; int>
+        for (r in typedRef) {
+            refMap[r.K] = r.S
+        }
+        for (f in foldGot) {
+            tt |> equal(refMap[f.K], f.S)
+        }
+    }
+    t |> run("untyped inner-select over a table source (kv bucket)") @(tt : T?) {
+        var prices <- { "apple" => 10, "banana" => 21, "cherry" => 30 }
+        var got <- _fold(each_kv(prices)._group_by(_.value % 2)._select(_._1 |> select(@(c) => c.value) |> sum))
+        var total = 0
+        for (s in got) {
+            total += s
+        }
+        tt |> equal(61, total)
+    }
+}
+
+[test]
+def test_two_arg_reducers_over_bucket(t : T?) {
+    t |> run("2-arg max over bucket == select-then-max spelling (fused)") @(tt : T?) {
+        let arr = [10, 20, 30, 11, 21, 31, 12]
+        let a <- _fold(arr._group_by(_ % 3)._select((K = _._0, M = _._1 |> max(@(v) => v * 2))))
+        let b <- _fold(arr._group_by(_ % 3)._select((K = _._0, M = _._1 |> select(@(x : int) => x * 2) |> max)))
+        tt |> equal(length(b), length(a))
+        var refMap : table<int; int>
+        for (r in b) {
+            refMap[r.K] = r.M
+        }
+        for (f in a) {
+            tt |> equal(refMap[f.K], f.M)
+        }
+    }
+    t |> run("2-arg sum/min/average over bucket (fused bare form)") @(tt : T?) {
+        let arr = [10, 20, 30, 11, 21, 31, 12]
+        let sums <- _fold(arr._group_by(_ % 3)._select(_._1 |> sum(@(v) => v + 1)))
+        var total = 0
+        for (s in sums) {
+            total += s
+        }
+        tt |> equal(135 + 7, total)   // sum of all elements + one per element
+        let avgs <- _fold(arr._group_by(_ % 3)._select(_._1 |> average(@(v) => v * 2)))
+        tt |> equal(3, length(avgs))
+    }
+    t |> run("2-arg reducer matches unfused parity") @(tt : T?) {
+        let arr = [10, 20, 30, 11, 21, 31, 12, 22]
+        let foldGot <- _fold(arr._group_by(_ % 4)._select((K = _._0, M = _._1 |> max(@(v) => v + 5))))
+        let refGot  <- (arr._group_by(_ % 4)._select((K = _._0, M = _._1 |> max(@(v) => v + 5))))
+        tt |> equal(length(refGot), length(foldGot))
+        var refMap : table<int; int>
+        for (r in refGot) {
+            refMap[r.K] = r.M
+        }
+        for (f in foldGot) {
+            tt |> equal(refMap[f.K], f.M)
+        }
+    }
+}
+
+[test]
+def test_identity_lambda_canonicalizes(t : T?) {
+    t |> run("identity-lambda reducer == bare reducer (fused)") @(tt : T?) {
+        let arr = [10, 20, 30, 11, 21, 31, 12]
+        let a <- _fold(arr._group_by(_ % 3)._select((K = _._0, M = _._1 |> max(@(v) => v))))
+        let b <- _fold(arr._group_by(_ % 3)._select((K = _._0, M = _._1 |> max())))
+        tt |> equal(length(b), length(a))
+        var refMap : table<int; int>
+        for (r in b) {
+            refMap[r.K] = r.M
+        }
+        for (f in a) {
+            tt |> equal(refMap[f.K], f.M)
+        }
+    }
+    t |> run("identity-lambda min and sum (fused bare form)") @(tt : T?) {
+        let arr = [10, 20, 30, 11, 21, 31, 12]
+        let mins  <- _fold(arr._group_by(_ % 3)._select(_._1 |> min(@(v) => v)))
+        let bare  <- _fold(arr._group_by(_ % 3)._select(_._1 |> min()))
+        tt |> equal(length(bare), length(mins))
+        var total = 0
+        let sums <- _fold(arr._group_by(_ % 3)._select(_._1 |> sum(@(v) => v)))
+        for (s in sums) {
+            total += s
+        }
+        tt |> equal(135, total)
+    }
+}
diff --git a/tests/match/test_match_r2v_peel.das b/tests/match/test_match_r2v_peel.das
new file mode 100644
index 000000000..b9adcd35c
--- /dev/null
+++ b/tests/match/test_match_r2v_peel.das
@@ -0,0 +1,103 @@
+options gen2
+require dastest/testing_boost public
+require daslib/match
+require daslib/ast
+require daslib/ast_boost
+
+// ExprRef2Value transparency: the typer wraps reads of references in
+// ExprRef2Value (no surface syntax), so AST class patterns peel it on the
+// source side at every nesting level — mirroring qmatch. An explicit
+// ExprRef2Value(...) pattern still matches the wrapper itself, and $v
+// captures of Expression-typed values bind the peeled node.
+
+[sideeffects]
+def match_field_of_var(E : ExpressionPtr) : string {
+    match (E) {
+        if (ExprField(name = $v(fname), value = ExprVar(name = $v(vname)))) {
+            return "{vname}.{fname}"
+        }
+        if (_) {
+            return "other"
+        }
+    }
+}
+
+[sideeffects]
+def match_wrapper_explicitly(E : ExpressionPtr) : string {
+    match (E) {
+        if (ExprRef2Value(subexpr = ExprVar(name = $v(vname)))) {
+            return "wrapped {vname}"
+        }
+        if (ExprVar(name = $v(vname))) {
+            return "bare {vname}"
+        }
+        if (_) {
+            return "other"
+        }
+    }
+}
+
+[sideeffects]
+def capture_field_value(E : ExpressionPtr) : string {
+    match (E) {
+        if (ExprField(value = $v(base))) {
+            return "{base.__rtti}"
+        }
+        if (_) {
+            return "other"
+        }
+    }
+}
+
+[test]
+def test_r2v_peel(t : T?) {
+    t |> run("top-level wrapper peels for class patterns") @@(t : T?) {
+        ast_gc_guard() {
+            var v <- new ExprVar(name := "a")
+            var fld <- new ExprField(name := "foo", value = v)
+            var wrapped <- new ExprRef2Value(subexpr = fld)
+            t |> equal("a.foo", match_field_of_var(wrapped))
+        }
+    }
+    t |> run("nested wrapper peels at the field boundary") @@(t : T?) {
+        ast_gc_guard() {
+            var v <- new ExprVar(name := "b")
+            var wrappedVar <- new ExprRef2Value(subexpr = v)
+            var fld <- new ExprField(name := "bar", value = wrappedVar)
+            t |> equal("b.bar", match_field_of_var(fld))
+        }
+    }
+    t |> run("unwrapped source still matches (no-op peel)") @@(t : T?) {
+        ast_gc_guard() {
+            var v <- new ExprVar(name := "c")
+            var fld <- new ExprField(name := "baz", value = v)
+            t |> equal("c.baz", match_field_of_var(fld))
+        }
+    }
+    t |> run("explicit ExprRef2Value pattern matches the wrapper itself") @@(t : T?) {
+        ast_gc_guard() {
+            var v <- new ExprVar(name := "d")
+            var wrapped <- new ExprRef2Value(subexpr = v)
+            t |> equal("wrapped d", match_wrapper_explicitly(wrapped))
+            var bare <- new ExprVar(name := "e")
+            t |> equal("bare e", match_wrapper_explicitly(bare))
+        }
+    }
+    t |> run("$v capture of an Expression field binds the peeled node") @@(t : T?) {
+        ast_gc_guard() {
+            var v <- new ExprVar(name := "f")
+            var wrappedVar <- new ExprRef2Value(subexpr = v)
+            var fld <- new ExprField(name := "qux", value = wrappedVar)
+            t |> equal("ExprVar", capture_field_value(fld))
+        }
+    }
+    t |> run("double wrapper peels fully") @@(t : T?) {
+        ast_gc_guard() {
+            var v <- new ExprVar(name := "g")
+            var w1 <- new ExprRef2Value(subexpr = v)
+            var w2 <- new ExprRef2Value(subexpr = w1)
+            var fld <- new ExprField(name := "deep", value = w2)
+            t |> equal("g.deep", match_field_of_var(fld))
+        }
+    }
+}
diff --git a/utils/preflight/README.md b/utils/preflight/README.md
index 1f071f29a..206d32df3 100644
--- a/utils/preflight/README.md
+++ b/utils/preflight/README.md
@@ -9,7 +9,7 @@ manual commands this tool automates live in
 # on changed C++ (full src+tests-cpp sweep when a header changed)
 daslang utils/preflight/main.das
 
-# full tier: adds dasgen freshness, CI-only-das compile sweep, the six doc
+# full tier: adds dasgen freshness, CI-only-das compile sweep, the seven doc
 # gates, ctest -L small, interpreter/JIT/AOT suites, sequence smoke
 daslang utils/preflight/main.das -- --full
 
diff --git a/utils/preflight/main.das b/utils/preflight/main.das
index bb2d82d09..3a8d2d326 100644
--- a/utils/preflight/main.das
+++ b/utils/preflight/main.das
@@ -9,7 +9,7 @@ options persistent_heap
 // frontend (syntax-only) pass on changed C++ — escalates to a full
 // src+tests-cpp sweep when a header changed, since a header edit can break
 // template instantiation in TUs the diff never touched. --full adds dasgen
-// freshness, the CI-only-das compile sweep, the six doc gates, tests-cpp
+// freshness, the CI-only-das compile sweep, the seven doc gates, tests-cpp
 // (ctest -L small), interpreter/JIT/AOT suites, and the sequence smoke.
 //
 // Cross-platform: Windows / macOS / Linux+WSL. Subprocesses go through
@@ -243,6 +243,44 @@ def find_clang(cli_path, build_dir : string) : tuple<exe : string; is_cl : bool>
     return (exe = "", is_cl = false)
 }
 
+// sphinx-build / latexmk are often OFF PATH on dev boxes: pip --user puts
+// sphinx in ~/Library/Python/<ver>/bin (mac) / ~/.local/bin (linux); BasicTeX
+// puts latexmk in /Library/TeX/texbin (mac). PATH wins when present.
+def find_sphinx_build() : string {
+    if (tool_available("sphinx-build")) return "sphinx-build"
+    if (!has_env_variable("HOME")) return ""
+    let home = get_env_variable("HOME")
+    var candidates : array<string>
+    if (get_platform_name() == "darwin") {
+        expand_glob("{home}/Library/Python/*/bin/sphinx-build", candidates)
+    }
+    candidates |> push("{home}/.local/bin/sphinx-build")
+    for (p in candidates) {
+        if (fexist(p) && tool_available(p)) return p
+    }
+    return ""
+}
+
+def find_latexmk() : string {
+    if (tool_available("latexmk")) return "latexmk"
+    let mac_tex = "/Library/TeX/texbin/latexmk"
+    if (get_platform_name() == "darwin" && fexist(mac_tex) && tool_available(mac_tex)) return mac_tex
+    return ""
+}
+
+// latexmk logs run thousands of lines; the actionable error sits at the end.
+def last_lines(s : string; n : int) : string {
+    let lines <- split(s, "\n")
+    let total = length(lines)
+    if (total <= n) return s
+    var keep : array<string>
+    keep |> reserve(n)
+    for (i in range(total - n, total)) {
+        keep |> push(lines[i])
+    }
+    return join(keep, "\n")
+}
+
 // TUs the clang frontend pass can compile standalone: core + tests-cpp.
 // Module/util TUs need external or build-generated headers; CI's clang-cl
 // lane covers those.
@@ -437,20 +475,19 @@ def gate_dasgen(ctx : PreflightCtx) : GateResult {
 
-// ===== docs gates (mirror doc.yml 1-6) =====
+// ===== docs gates (mirror doc.yml 1-7) =====
 
-def skip_sphinx_gates(reason : string; var results : array<GateResult>&) {
-    let names = ["docs/sphinx-latex", "docs/sphinx-html"]
+def skip_gates(names : array<string>; reason : string; var results : array<GateResult>&) {
     results |> reserve(length(results) + length(names))
     for (g in names) {
         results |> emplace(GateResult(name = g, status = GateStatus.Skip, detail = reason))
     }
 }
 
+def skip_sphinx_gates(reason : string; var results : array<GateResult>&) {
+    skip_gates(["docs/sphinx-latex", "docs/sphinx-html", "docs/pdflatex"], reason, results)
+}
+
 def docs_skip_all(reason : string; var results : array<GateResult>&) {
-    let names = ["docs/das2rst", "docs/stubs", "docs/uncategorized", "docs/untracked", "docs/sphinx-latex", "docs/sphinx-html"]
-    results |> reserve(length(results) + length(names))
-    for (g in names) {
-        results |> emplace(GateResult(name = g, status = GateStatus.Skip, detail = reason))
-    }
+    skip_gates(["docs/das2rst", "docs/stubs", "docs/uncategorized", "docs/untracked", "docs/sphinx-latex", "docs/sphinx-html", "docs/pdflatex"], reason, results)
 }
 
 def gate_docs(ctx : PreflightCtx; var results : array<GateResult>&) {
@@ -511,8 +548,9 @@ def gate_docs(ctx : PreflightCtx; var results : array<GateResult>&) {
 
     // gates 5+6: sphinx -W, latex then html. Delete the doctree cache first —
     // CI builds from a fresh clone, a stale local cache hides warnings.
-    if (!tool_available("sphinx-build")) {
-        skip_sphinx_gates("sphinx-build not found — pip install -r doc/requirements.txt", results)
+    let sphinx = find_sphinx_build()
+    if (empty(sphinx)) {
+        skip_sphinx_gates("sphinx-build not found (PATH, ~/Library/Python/*/bin, ~/.local/bin) — pip install -r doc/requirements.txt", results)
         return
     }
     rmdir_rec("doc/sphinx-build")
@@ -523,13 +561,40 @@ def gate_docs(ctx : PreflightCtx; var results : array<GateResult>&) {
         return
     }
     t0 = ref_time_ticks()
-    let r5 = run_argv(["sphinx-build", "-W", "--keep-going", "-b", "latex", "-d", "doc/sphinx-build", "doc/source", "build/latex"])
+    let r5 = run_argv([sphinx, "-W", "--keep-going", "-b", "latex", "-d", "doc/sphinx-build", "doc/source", "build/latex"])
     results |> emplace(GateResult(name = "docs/sphinx-latex", status = r5.rc == 0 ? GateStatus.Pass : GateStatus.Fail,
         seconds = seconds_since(t0), detail = r5.rc == 0 ? "" : "latex builder warnings (warnings-as-errors)", output = r5.out))
     t0 = ref_time_ticks()
-    let r6 = run_argv(["sphinx-build", "-W", "--keep-going", "-b", "html", "-d", "doc/sphinx-build", "doc/source", "build/site"])
+    let r6 = run_argv([sphinx, "-W", "--keep-going", "-b", "html", "-d", "doc/sphinx-build", "doc/source", "build/site"])
     results |> emplace(GateResult(name = "docs/sphinx-html", status = r6.rc == 0 ? GateStatus.Pass : GateStatus.Fail,
         seconds = seconds_since(t0), detail = r6.rc == 0 ? "" : "html builder warnings (warnings-as-errors)", output = r6.out))
+
+    // gate 7: compile both PDFs like CI's latex-action does (latexmk; -cd runs in
+    // build/latex so the sphinx .sty files resolve). CI marks these steps
+    // continue-on-error, so an undeclared unicode char ships broken release PDFs
+    // silently — locally it is a FAIL with a one-line remedy.
+    if (r5.rc != 0) {
+        results |> emplace(GateResult(name = "docs/pdflatex", status = GateStatus.Skip,
+            detail = "sphinx-latex failed — no .tex sources to compile"))
+        return
+    }
+    let latexmk = find_latexmk()
+    if (empty(latexmk)) {
+        results |> emplace(GateResult(name = "docs/pdflatex", status = GateStatus.Skip,
+            detail = "latexmk not found (PATH, /Library/TeX/texbin) — brew install --cask basictex (mac) / texlive (linux)"))
+        return
+    }
+    t0 = ref_time_ticks()
+    for (root in ["daslangstdlib.tex", "daslang.tex"]) {
+        let rl = run_argv([latexmk, "-pdf", "-interaction=nonstopmode", "-halt-on-error", "-cd", "build/latex/{root}"], 900.0)
+        if (rl.rc != 0) {
+            results |> emplace(GateResult(name = "docs/pdflatex", status = GateStatus.Fail, seconds = seconds_since(t0),
+                detail = "{root}: pdflatex rejected the docs — new unicode char? add \\DeclareUnicodeCharacter to doc/source/conf.py 'preamble'",
+                output = last_lines(rl.out, 60)))
+            return
+        }
+    }
+    results |> emplace(GateResult(name = "docs/pdflatex", status = GateStatus.Pass, seconds = seconds_since(t0)))
 }
 
 // ===== CI-only das compile sweep =====
@@ -763,7 +828,7 @@ def gate_table() : array<GateInfo> {
         GateInfo(name = "cpp-syntax", full_only = false, doc = "clang frontend pass on changed C++; header change → full src+tests-cpp sweep"),
         GateInfo(name = "dasgen", full_only = true, doc = "gen_bind.das freshness vs include/daScript/builtin/"),
         GateInfo(name = "ci-das", full_only = true, doc = "compile-only sweep of CI-only das surface (ci_only_das.txt)"),
-        GateInfo(name = "docs", full_only = true, doc = "the six doc.yml gates (das2rst, stubs, uncategorized, untracked, sphinx ×2)"),
+        GateInfo(name = "docs", full_only = true, doc = "the seven doc.yml gates (das2rst, stubs, uncategorized, untracked, sphinx ×2, pdflatex)"),
         GateInfo(name = "tests-cpp", full_only = true, doc = "ctest -L small"),
         GateInfo(name = "tests-interp", full_only = true, doc = "full interpreter suite"),
         GateInfo(name = "tests-jit", full_only = true, doc = "full JIT suite (skips when dasLLVM absent)"),