From 8751bb9ba4e8de07512b55203e1ccf4090ef5757 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Wed, 10 Jun 2026 23:06:18 -0700
Subject: [PATCH 01/11] =?UTF-8?q?each=5Fkv(table)=20=E2=80=94=20kv-pair=20?=
 =?UTF-8?q?iteration=20as=20named=20tuples;=20fix=20generator=20zip=20lowe?=
 =?UTF-8?q?ring=20over=20an=20empty=20source?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

each_kv(tab) yields (key, value) named tuples — read-only copies, strict can_copy gate on the
value type (no clone fallback; matches insert's own gate). Explicit reject overloads for
table<K> void values ("iterate keys() instead") and dim-array values, which otherwise mis-bind
and cascade. Pure daslib: a generator zipping the keys/values builtin slot-walk iterators.
PR1 of the LINQ table-source arc (plan: benchmarks/sql/LINQ_TO_TABLE.md).

Also fixes a pre-existing generator-lowering bug exposed by the empty-table test: the yield-for
lowering emitted `loop &&= _builtin_iterator_first(...)` per source, short-circuiting first()
on later sources when an earlier one came up empty — but end_loop closes ALL sources, and
closing a never-opened container iterator unlocks a container whose lock magic was already
cleared ("table/array magic mismatch on unlock"). Reachable on master by any generator zipping
two lockable containers with the first one empty. Now emits `loop = first(...) && loop`,
matching SimNode_ForWithIterator's always-evaluate-first semantics.
Regression: tests/language/generator_zip_empty.das (written first, failed, now green).

Validation: full INTERP suite 10891/0 fail; AOT tests/language 1054 + tests/linq 1893; JIT lane
green on new files; lint (MCP + CI) clean; das2rst no stubs/Uncategorized; Sphinx clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 benchmarks/sql/LINQ_TO_TABLE.md               | 107 ++++++++++++++++++
 daslib/builtin.das                            |  46 ++++++++
 doc/reflections/das2rst.das                   |   2 +-
 ...ion-builtin-each_kv-0xdb81e5ca7a0e3baa.rst |   1 +
 src/ast/ast_generate.cpp                      |  11 +-
 tests/language/failed_each_kv.das             |  23 ++++
 tests/language/generator_zip_empty.das        |  65 +++++++++++
 tests/language/table_each_kv.das              |  84 ++++++++++++++
 tests/linq/test_linq_table_source.das         |  32 ++++++
 9 files changed, 367 insertions(+), 4 deletions(-)
 create mode 100644 benchmarks/sql/LINQ_TO_TABLE.md
 create mode 100644 doc/source/stdlib/handmade/function-builtin-each_kv-0xdb81e5ca7a0e3baa.rst
 create mode 100644 tests/language/failed_each_kv.das
 create mode 100644 tests/language/generator_zip_empty.das
 create mode 100644 tests/language/table_each_kv.das
 create mode 100644 tests/linq/test_linq_table_source.das
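
Reviewer note (not part of the commit): the lowering fix can be modelled outside the compiler.
The sketch below is a simplified C++ stand-in, not the code in ast_generate.cpp. `Source`,
`open_short_circuit`, `open_always`, and `close_all_balanced` are hypothetical names, and the
lock is reduced to a plain counter (the real runtime checks a lock magic instead). It only
shows why `loop &&= first(...)` leaves later sources unopened while end_loop still closes them.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a lockable container iterator:
// first() takes the container lock, close() releases it.
struct Source {
    int remaining;      // elements left to yield
    int lockCount = 0;  // +1 in first(), -1 in close(); 0 means balanced

    explicit Source(int n) : remaining(n) { assert(n >= 0); }

    bool first() { ++lockCount; return remaining > 0; }
    void close() { lockCount -= 1; }
};

// Old lowering, `loop &&= first(src)`: once loop is false,
// first() is skipped for every later source.
inline bool open_short_circuit(std::vector<Source> & srcs) {
    bool loop = true;
    for (auto & s : srcs) loop = loop && s.first();
    return loop;
}

// New lowering, `loop = first(src) && loop`: first() sits on the left,
// so it runs for every source regardless of earlier results.
inline bool open_always(std::vector<Source> & srcs) {
    bool loop = true;
    for (auto & s : srcs) loop = s.first() && loop;
    return loop;
}

// Models end_loop: closes ALL sources, then reports whether every
// lock count came back to zero.
inline bool close_all_balanced(std::vector<Source> & srcs) {
    bool balanced = true;
    for (auto & s : srcs) {
        s.close();
        balanced = balanced && (s.lockCount == 0);
    }
    return balanced;
}
```

With an empty first source and a non-empty second one, both variants report that the loop
should not run, but only the always-evaluate variant leaves every lock balanced after the
close pass. That corresponds to the first case in tests/language/generator_zip_empty.das.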
diff --git a/benchmarks/sql/LINQ_TO_TABLE.md b/benchmarks/sql/LINQ_TO_TABLE.md
new file mode 100644
index 000000000..579356b8b
--- /dev/null
+++ b/benchmarks/sql/LINQ_TO_TABLE.md
@@ -0,0 +1,107 @@
+# LINQ → TABLE — arc plan
+
+Sibling of [LINQ.md](LINQ.md) / [LINQ_TO_DECS.md](LINQ_TO_DECS.md). Plan of record for
+`table<K;V>` / `table<K>` as the 6th `_fold` source, plus the `to_table` sink.
+Edited in-place as PRs land.
+
+Status: **PR1 in flight** (`each_kv` builtin).
+
+PR1 findings:
+- **Pre-existing generator-lowering bug, fixed in PR1**: the yield-for lowering emitted
+  `loop &&= _builtin_iterator_first(...)` per source — short-circuiting `first()` on later
+  sources when an earlier one came up empty, while the end-of-loop path closes ALL sources.
+  Closing a never-opened container iterator unlocks a container whose lock magic was already
+  cleared → "table/array magic mismatch on unlock". Reachable before each_kv (any generator
+  zipping two lockable containers, first one empty). Fix: `loop = first(...) && loop`
+  (ast_generate.cpp), matching SimNode_ForWithIterator. Regression:
+  `tests/language/generator_zip_empty.das`.
+- `each_kv` needs explicit reject overloads for `table<K>` (void values → "iterate keys()
+  instead") and dim-array values — the bare generic otherwise mis-binds (valT drops the dim)
+  and cascades confusing errors from inside builtin.das.
+- Tier-2 chain heads need `unsafe(each_kv(tab))` — same `[unsafe_outside_of_for]` contract as
+  `each(arr)`; fused chains (PR2) rewrite the head before inference so the wrap disappears.
+- builtin module documents via handmade RST (it is a `get_module` C++-flow module in das2rst),
+  so each_kv has both `//!` in-source docs and a filled handmade file.
+
+## Settled decisions
+
+- **kv surface** = `kv.key` / `kv.value` named tuple, **read-only** (a by-value tuple has no
+  write-through; in-place mutation stays the domain of `for (k, v in keys(t), values(t))`).
+- **Pipe head** = `each_kv(tab)`; `keys(tab)` / `values(tab)` are recognized as table sources too.
+- **`each_kv` is pure daslib** with a strict `can_copy` gate on the value type — no clone
+  fallback, ever (a hidden per-element `clone` of an `array<…>` value is the exact sadness the
+  gate bans). Matches existing language ergonomics: plain `insert` already concept-asserts
+  `can_copy` on values ([builtin.das ~921](../../daslib/builtin.das)), so non-copyable-valued
+  tables only arise via `insert_clone` / `tab[k] <- v`.
+- **Uniform gate enforcement falls out free**: for non-copyable values `extract_table_source`
+  returns null → the chain defers to tier-2 → the real `each_kv` instantiates → `concept_assert`
+  fires (error 31400). One error source; deferral never silently changes semantics.
+- Shape (probe-validated): two const/var overloads mirroring `keys`,
+  `generator<tuple<key:keyT; value:valT> -const> capture(<- kit, <- vit)` zipping the two builtin
+  slot-walk iterators. Multi-source `for` + `yield` works in *generators*; iterator
+  *comprehensions* reject it ("can't yield from inside the block") — hence the generator form.
+- No profiling pre-PR; straight to m7 bench lanes. Scan lanes before the join probe. Sink in
+  this arc.
+
+## PR sequence
+
+1. **`each_kv` in builtin.das** — the validated shape next to `keys`/`values`
+   (`[unsafe_outside_of_for, nodiscard]`); das2rst "Containers" group; tests
+   (`tests/language/table_each_kv.das` + `failed_` can_copy compile-fail); INTERP/AOT/JIT.
+   Standalone value: a kv iterator for plain `for` loops.
+2. **`TableAdapter` core (`daslib/linq_fold_table.das`) + m7.** `extract_table_source`
+   name-matches `each_kv`/`keys`/`values` at the spine head, **type-gated on the arg being a
+   table** (names too generic to trust bare). Three lanes: keys (by value), values (by ref),
+   kv — `wrap_source_loop` emits `for (k, v in keys(t), values(t))`, `RowFieldFlattener`
+   rewrites the field reads, and **usage pruning** drops to a keys-only / values-only
+   single-iterator walk when the body touches one side (the table analog of XML field-pruning).
+   Capabilities: `can_reserve_by_length` / `supports_direct_return` = true, `count_shortcut` →
+   `length(tab)`, any/empty → `!empty(tab)`, `distinct` on keys/kv → identity (keys unique by
+   construction; values-lane distinct stays real). `can_group_by`/`can_join` = false → tier-2.
+   New `benchmarks/sql/table.das` with `<family>_m7` runners (fixture `table<int; Car>`;
+   expected values order-insensitive — slot order ≠ insertion order), results.md re-sweep,
+   linq_fold_patterns.rst rows, linq_fold.md module-layout update, fused-vs-tier-2 agreement
+   tests.
+3. **`%linq!` `from_in` arm.** `from kv in tab` → `each_kv(tab)` (table-typed value dispatch,
+   no annotation needed — like arrays); set form `from k in s` over `table<K>` → `keys(s)`.
+   linq_das.rst update.
+4. **Point-lookup folds.** `where(kv.key == X)` + terminator, X loop-invariant:
+   `any`/`contains` → `key_exists`, `first`/`first_or_default` (± trailing
+   `select(kv.value…)`) → `tab?[X]` probe, `count` → `key_exists ? 1 : 0`; set-form
+   `contains(x)` → `key_exists`. The table analog of the JSON const-key fold. m7 point-lookup
+   bench lane vs linear scan.
+5. **Join probe.** `emit_join_hook`: when srcB is `each_kv(tab)`/`keys(tab)` and the b-key
+   selector is bare `kv.key`, probe the user's table instead of building the join's internal
+   `table<key;…>`. Semantics are exactly inner-equi-join with unique B keys — which a das table
+   guarantees. Bench vs the build-side baseline.
+6. **`to_table` / `to_table_move` terminators.** Chain of `tuple<K;V>` (incl. kv elements) →
+   `table<K;V>`; chain of bare hashable K → `table<K>` set. Selector-free — key/value shaping
+   composes via a preceding `select(k => v)`, matching the existing `to_table` vocabulary over
+   tuple arrays ([builtin.das ~1664](../../daslib/builtin.das)). Tier-2 generic in linq.das +
+   fused insert-loop emit (reserve when count is known). Duplicate-key policy: das `insert`
+   semantics (last-wins), documented — not C#'s throw.
+
+End of arc: `skills/linq.md` + linq docs mention the table source.
+
+## Risks / watch items
+
+- **Mangler ICE 50609** (iterator element-const collision) — `each_kv` yields `-const` non-ref
+  tuples; the known footgun lives in iterator-typed generic params on the tier-2 side;
+  mitigation (const-qualify) is known.
+- **Lock semantics unchanged**: fused loops use the same builtin iterators as hand code —
+  mutating the table mid-chain panics exactly as today.
+- `values()` on `table<K>` already concept-asserts, so set-form `each_kv` errors cleanly for
+  free.
+
+## Deferred edges (named, not built)
+
+- **Key-as-handle deferred materialization**: for `order_by` over kv with large (copyable)
+  values, buffer `(orderKey, key)` surrogates and materialize survivors via `tab?[key]` — K
+  probes instead of N value copies. The table handle is its key; clean fit for the existing
+  4-hook surface. Revisit once m7 numbers show whether it matters.
+- Set-ops probe (`except`/`intersect` where the *other* side is a `table<K>`) — rides the
+  engine-wide set-ops edge.
+- Fused-kv-over-non-copyable values (loosening the uniform gate) — only if a real use case
+  begs.
+- Dim-array-valued tables (`table<K; V[N]>`) in `each_kv` — `keys`/`values` carry dedicated
+  overloads; add an `each_kv` one only on demand.
diff --git a/daslib/builtin.das b/daslib/builtin.das
index 0ae5b349c..389955c11 100644
--- a/daslib/builtin.das
+++ b/daslib/builtin.das
@@ -1392,6 +1392,52 @@ def values(var a : table<auto(keyT); auto(valT)[]> ==const | #) : iterator<valT[
     return <- it
 }
 
+def each_kv(a : table<auto(keyT)> ==const | #) {
+    concept_assert(false, "can't each_kv a table<...; void> — iterate keys() instead")
+}
+
+def each_kv(var a : table<auto(keyT)> ==const | #) {
+    concept_assert(false, "can't each_kv a table<...; void> — iterate keys() instead")
+}
+
+def each_kv(a : table<auto(keyT); auto(valT)[]> ==const | #) {
+    concept_assert(false, "each_kv of a table with dim-array values is not supported")
+}
+
+def each_kv(var a : table<auto(keyT); auto(valT)[]> ==const | #) {
+    concept_assert(false, "each_kv of a table with dim-array values is not supported")
+}
+
+[unsafe_outside_of_for, nodiscard]
+def each_kv(a : table<auto(keyT); auto(valT)> ==const | #) : iterator<tuple<key : keyT; value : valT> -const> {
+    //! Iterates over a table as `(key, value)` named tuples. Both fields are copies (read-only view);
+    //! requires a copyable value type — non-copyable values (arrays, tables) are rejected at compile time.
+    concept_assert(typeinfo can_copy(type<valT>), "each_kv requires a copyable value type")
+    var kit <- unsafe(keys(a))
+    var vit <- unsafe(values(a))
+    return <- generator<tuple<key : keyT; value : valT> -const> capture(<- kit, <- vit) {
+        for (k, v in kit, vit) {
+            yield (key = k, value = v)
+        }
+        return false
+    }
+}
+
+[unsafe_outside_of_for, nodiscard]
+def each_kv(var a : table<auto(keyT); auto(valT)> ==const | #) : iterator<tuple<key : keyT; value : valT> -const> {
+    //! Iterates over a table as `(key, value)` named tuples. Both fields are copies (read-only view);
+    //! requires a copyable value type — non-copyable values (arrays, tables) are rejected at compile time.
+    concept_assert(typeinfo can_copy(type<valT>), "each_kv requires a copyable value type")
+    var kit <- unsafe(keys(a))
+    var vit <- unsafe(values(a))
+    return <- generator<tuple<key : keyT; value : valT> -const> capture(<- kit, <- vit) {
+        for (k, v in kit, vit) {
+            yield (key = k, value = v)
+        }
+        return false
+    }
+}
+
 def get_key(a : table<auto(keyT)> ==const; value) {
     concept_assert(false, "can't get_key of a table<...; void>")
 }
diff --git a/doc/reflections/das2rst.das b/doc/reflections/das2rst.das
index 8c2e94975..80400e330 100644
--- a/doc/reflections/das2rst.das
+++ b/doc/reflections/das2rst.das
@@ -161,7 +161,7 @@ def document_module_builtin(_root : string) {
         hide_group(group_by_regex("Internal pointer arithmetics", mod, %regex~i_das_%%)),
         hide_group(group_by_regex("Internal clone infrastructure", mod, %regex~clone%%)),
         hide_group(group_by_regex("Internal finalize infrastructure", mod, %regex~finalize%%)),
-        group_by_regex("Containers", mod, %regex~(capacity|clear|length|resize|resize_no_init|reserve|each|emplace|emplace_from|erase|find|
+        group_by_regex("Containers", mod, %regex~(capacity|clear|length|resize|resize_no_init|reserve|each|each_kv|emplace|emplace_from|erase|find|
 find_for_edit|find_if_exists|find_index|find_index_if|has_value|key_exists|keys|values|get_key|lock|each_enum|each_ref|
 find_for_edit_if_exists|lock_forever|next|nothing|pop|push|push_from|push_clone|push_clone_from|back|sort|stable_sort|to_array|to_table|to_array_move|
 to_table_move|empty|subarray|insert|move_to_ref|copy_to_local|move_to_local|get|remove_value|erase_if|resize_and_init|
diff --git a/doc/source/stdlib/handmade/function-builtin-each_kv-0xdb81e5ca7a0e3baa.rst b/doc/source/stdlib/handmade/function-builtin-each_kv-0xdb81e5ca7a0e3baa.rst
new file mode 100644
index 000000000..bc34a40cd
--- /dev/null
+++ b/doc/source/stdlib/handmade/function-builtin-each_kv-0xdb81e5ca7a0e3baa.rst
@@ -0,0 +1 @@
+Iterates over a table as ``(key, value)`` named tuples. Both fields are copies — a read-only view in unspecified (slot) order; the value type must be copyable, and non-copyable values (arrays, tables) are rejected at compile time.
diff --git a/src/ast/ast_generate.cpp b/src/ast/ast_generate.cpp
index 9325f3f03..55c934521 100644
--- a/src/ast/ast_generate.cpp
+++ b/src/ast/ast_generate.cpp
@@ -1631,13 +1631,18 @@ namespace das {
             vvar->init = rein;
             veqt->variables.push_back(vvar);
             blk->list.push_back(veqt);
-            // loop &= _builtin_iterator_first(it0,pvar0)
+            // loop = _builtin_iterator_first(it0,pvar0) && loop
+            // first() on the LEFT so it runs for EVERY source even when an earlier one came up
+            // empty (matches SimNode_ForWithIterator) — end_loop closes all sources, and closing
+            // a never-opened container iterator unlocks a container whose lock was never taken.
             auto cbif = new ExprCall(expr->at, "_builtin_iterator_first");
             cbif->generated = true;
             cbif->arguments.push_back(new ExprVar(expr->at, srcName));
             cbif->arguments.push_back(new ExprVar(expr->at, pVarName));
-            auto lande = new ExprOp2(expr->at,"&&=",
-                                              new ExprVar(expr->at,loopVar),cbif);
+            auto land = new ExprOp2(expr->at,"&&",
+                                              cbif,new ExprVar(expr->at,loopVar));
+            auto lande = new ExprCopy(expr->at,
+                                              new ExprVar(expr->at,loopVar),land);
             blk->list.push_back(lande);
         }
         auto bll = new ExprLabel(expr->at, begin_loop_label,
diff --git a/tests/language/failed_each_kv.das b/tests/language/failed_each_kv.das
new file mode 100644
index 000000000..f151fe0b7
--- /dev/null
+++ b/tests/language/failed_each_kv.das
@@ -0,0 +1,23 @@
+// each_kv compile-time rejections: non-copyable value type, void values (set form — use keys()),
+// and dim-array values (no dedicated overload). One statement per reject; the void/dim arms
+// cascade the standard "void iteration" pair (30192/30107) behind the concept_assert.
+options gen2
+
+expect 31400:3, 30192:2, 30107:2
+
+[export]
+def main() {
+    var nonCopyable : table<int; array<int>>
+    var n = 0
+    for (kv in each_kv(nonCopyable)) {
+        n++
+    }
+    var voidValues : table<int>
+    for (kv in each_kv(voidValues)) {
+        n++
+    }
+    var dimValues : table<int; int[3]>
+    for (kv in each_kv(dimValues)) {
+        n++
+    }
+}
diff --git a/tests/language/generator_zip_empty.das b/tests/language/generator_zip_empty.das
new file mode 100644
index 000000000..6db1ca3f2
--- /dev/null
+++ b/tests/language/generator_zip_empty.das
@@ -0,0 +1,65 @@
+options gen2
+
+require dastest/testing_boost public
+
+// Regression: the generator for-loop lowering emitted `loop &&= _builtin_iterator_first(...)`,
+// short-circuiting first() on later sources when an earlier one came up empty — but the
+// end-of-loop path closes ALL sources, and closing a never-opened container iterator unlocks
+// a container whose lock magic was already cleared ("magic mismatch on unlock").
+// first() must run for every source, matching SimNode_ForWithIterator.
+
+[test]
+def test_generator_zip_empty_source(t : T?) {
+    t |> run("two array iterators, first empty") @(t : T?) {
+        let a : array<int>
+        var b : array<int>
+        b |> push(1)
+        var ait <- unsafe(each(a))
+        var bit <- unsafe(each(b))
+        var g <- generator<int -const> capture(<- ait, <- bit) {
+            for (x, y in ait, bit) {
+                yield x + y
+            }
+            return false
+        }
+        var n = 0
+        for (_v in g) {
+            n++
+        }
+        t |> equal(n, 0)
+    }
+    t |> run("two array iterators, second empty") @(t : T?) {
+        var a : array<int>
+        let b : array<int>
+        a |> push(1)
+        var ait <- unsafe(each(a))
+        var bit <- unsafe(each(b))
+        var g <- generator<int -const> capture(<- ait, <- bit) {
+            for (x, y in ait, bit) {
+                yield x + y
+            }
+            return false
+        }
+        var n = 0
+        for (_v in g) {
+            n++
+        }
+        t |> equal(n, 0)
+    }
+    t |> run("two table iterators, table empty") @(t : T?) {
+        let tab : table<int; int>
+        var kit <- unsafe(keys(tab))
+        var vit <- unsafe(values(tab))
+        var g <- generator<tuple<k : int; v : int> -const> capture(<- kit, <- vit) {
+            for (k, v in kit, vit) {
+                yield (k = k, v = v)
+            }
+            return false
+        }
+        var n = 0
+        for (_kv in g) {
+            n++
+        }
+        t |> equal(n, 0)
+    }
+}
diff --git a/tests/language/table_each_kv.das b/tests/language/table_each_kv.das
new file mode 100644
index 000000000..5e7529130
--- /dev/null
+++ b/tests/language/table_each_kv.das
@@ -0,0 +1,84 @@
+options gen2
+
+require dastest/testing_boost public
+
+struct Pt {
+    x : int
+    y : int
+}
+
+[test]
+def test_each_kv(t : T?) {
+    t |> run("int keys, int values") @(t : T?) {
+        var tab : table<int; int>
+        for (i in range(10)) {
+            tab |> insert(i, i * 10)
+        }
+        var n = 0
+        var ksum = 0
+        var vsum = 0
+        for (kv in each_kv(tab)) {
+            n++
+            ksum += kv.key
+            vsum += kv.value
+        }
+        t |> equal(n, 10)
+        t |> equal(ksum, 45)
+        t |> equal(vsum, 450)
+    }
+    t |> run("string keys, float values") @(t : T?) {
+        var tab : table<string; float>
+        tab |> insert("a", 1.5)
+        tab |> insert("b", 2.5)
+        var s = 0.0
+        for (kv in each_kv(tab)) {
+            s += kv.value
+        }
+        t |> equal(s, 4.0)
+    }
+    t |> run("struct values") @(t : T?) {
+        var tab : table<int; Pt>
+        tab |> insert(1, Pt(x = 10, y = 20))
+        tab |> insert(2, Pt(x = 30, y = 40))
+        var xs = 0
+        var ys = 0
+        for (kv in each_kv(tab)) {
+            xs += kv.value.x
+            ys += kv.value.y
+        }
+        t |> equal(xs, 40)
+        t |> equal(ys, 60)
+    }
+    t |> run("empty table yields nothing") @(t : T?) {
+        let tab : table<int; int>
+        var n = 0
+        for (_kv in each_kv(tab)) {
+            n++
+        }
+        t |> equal(n, 0)
+    }
+    t |> run("agrees with zipped keys/values") @(t : T?) {
+        var tab : table<int; string>
+        tab |> insert(1, "one")
+        tab |> insert(2, "two")
+        tab |> insert(3, "three")
+        var roundTrip : table<int; string>
+        for (kv in each_kv(tab)) {
+            roundTrip |> insert(kv.key, kv.value)
+        }
+        t |> equal(length(roundTrip), length(tab))
+        for (k, v in keys(tab), values(tab)) {
+            t |> success(key_exists(roundTrip, k))
+            t |> equal(roundTrip?[k] ?? "", v)
+        }
+    }
+    t |> run("element is a copy — table unchanged") @(t : T?) {
+        var tab : table<int; Pt>
+        tab |> insert(1, Pt(x = 10, y = 20))
+        for (kv in each_kv(tab)) {
+            var local = kv
+            local.value.x = 999
+        }
+        t |> equal((tab?[1] ?? Pt()).x, 10)
+    }
+}
diff --git a/tests/linq/test_linq_table_source.das b/tests/linq/test_linq_table_source.das
new file mode 100644
index 000000000..e85407216
--- /dev/null
+++ b/tests/linq/test_linq_table_source.das
@@ -0,0 +1,32 @@
+options gen2
+
+require dastest/testing_boost public
+require daslib/linq_boost
+
+// Tier-2 LINQ over a table source via each_kv (the fused TableAdapter lands separately).
+// each_kv is [unsafe_outside_of_for], so a chain head needs the explicit unsafe(...) wrap —
+// same contract as each(arr) outside a fused chain.
+
+[test]
+def test_each_kv_tier2(t : T?) {
+    t |> run("where/select/to_array over each_kv") @(t : T?) {
+        var tab : table<string; int>
+        tab |> insert("a", 1)
+        tab |> insert("b", 2)
+        tab |> insert("c", 3)
+        var vals <- unsafe(each_kv(tab)) |> _where(_.value > 1) |> _select(_.value) |> to_array()
+        vals |> sort()  // slot order is unspecified
+        t |> equal(length(vals), 2)
+        t |> equal(vals[0], 2)
+        t |> equal(vals[1], 3)
+        delete vals
+    }
+    t |> run("keys participate in the chain") @(t : T?) {
+        var tab : table<int; int>
+        for (i in range(6)) {
+            tab |> insert(i, i * i)
+        }
+        let n = unsafe(each_kv(tab)) |> _where(_.key % 2 == 0) |> count()
+        t |> equal(n, 3)
+    }
+}

From 7b93056d7552cd3a614e5f988f169dc92f81ffce Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Wed, 10 Jun 2026 23:11:25 -0700
Subject: [PATCH 02/11] linq-table arc: whole story stays on this branch; PR
 after the fixed-array rework merges in

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 benchmarks/sql/LINQ_TO_TABLE.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/benchmarks/sql/LINQ_TO_TABLE.md b/benchmarks/sql/LINQ_TO_TABLE.md
index 579356b8b..7206bda75 100644
--- a/benchmarks/sql/LINQ_TO_TABLE.md
+++ b/benchmarks/sql/LINQ_TO_TABLE.md
@@ -4,7 +4,14 @@ Sibling of [LINQ.md](LINQ.md) / [LINQ_TO_DECS.md](LINQ_TO_DECS.md). Plan of reco
 `table<K;V>` / `table<K>` as the 6th `_fold` source, plus the `to_table` sink.
 Edited in-place as PRs land.
 
-Status: **PR1 in flight** (`each_kv` builtin).
+Status: **stage 1 committed** (`each_kv` builtin, 8751bb9ba).
+
+**Branch strategy (Boris, 2026-06-10):** the ENTIRE arc stays on `bbatkin/linq-table-each-kv`
+as stacked stage commits — no per-stage PRs. A major fixed-array rework is in flight on master;
+merging that INTO this branch once (after it lands) beats making every rework merge fight this
+work. Cut the PR only after the rework has landed and been merged in here. At that merge,
+re-validate the `each_kv` dim-array-value reject overload and `auto(valT)[]` matching — fixed
+arrays are exactly what is being reworked.
 
 PR1 findings:
 - **Pre-existing generator-lowering bug, fixed in PR1**: the yield-for lowering emitted
@@ -43,7 +50,7 @@ PR1 findings:
 - No profiling pre-PR; straight to m7 bench lanes. Scan lanes before the join probe. Sink in
   this arc.
 
-## PR sequence
+## Stage sequence (commits on this branch)
 
 1. **`each_kv` in builtin.das** — the validated shape next to `keys`/`values`
    (`[unsafe_outside_of_for, nodiscard]`); das2rst "Containers" group; tests

From 571fe879e5a4d487d885fbea92987e1a6e14b266 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Wed, 10 Jun 2026 23:58:44 -0700
Subject: [PATCH 03/11] =?UTF-8?q?linq=5Ffold:=20TableAdapter=20=E2=80=94?=
 =?UTF-8?q?=20table<K;V>/table<K>=20as=20the=206th=20source,=20m7=20bench?=
 =?UTF-8?q?=20lane?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

daslib/linq_fold_table.das: TableAdapter + extract_table_source. each_kv/keys/values chain heads
(name + table-typed-arg match) emit fused slot walks inside a single-param invoke binding the
table. The kv lane usage-prunes the walk from the body's it.key/it.value reads: one side touched
-> single-iterator keys()/values() walk (half the slot-skip work), both -> zipped two-iterator
for, whole-pair escape -> named-tuple bind (copyable values only; non-copyable falls through and
the surviving each_kv instantiation concept-asserts). Bare count/long_count folds to O(1)
length(tab); plain distinct over raw keys/kv elements is dropped (keys unique by construction;
uniqueness-preserving prefixes only). group_by/join/reverse defer to tier-2 (staged: point-lookup
folds, join probe — see benchmarks/sql/LINQ_TO_TABLE.md).

Notable mechanics: the qmacro grammar allows $i() only in the FIRST iterator slot of a
multi-source for, so the kv zip header uses literal loop-var names (ZipAdapter's itA/itB trade);
keys() yields non-const elements, so the engine-visible bind is a let-rebind (workhorse copy,
free); the dispatcher clears removeConstant on cloned element types so the -const iterator
spelling doesn't leak into buffer types and break push_clone unification.

benchmarks/sql/table.das: m7 lane (45 families, kv-form chains, order-insensitive guards) +
fixture_table in _common + m7 column in _update_results + results.md re-sweep (2026-06-10).
INTERP profile: pruned scans sit between array and XML (sum_aggregate 13.4 ns/elem vs array 2.1 /
XML 54.3 / JSON 146.7; contains_match 6.6 keys-pruned); deferred markers groupby ~160-190 /
join ~195-230 / reverse_take 58.7 flag the staged tier-2 cells.

tests/linq/test_linq_table_source.das: 24 fused-vs-hand-loop agreement tests across all lanes
(count shortcuts, accumulators, early-exit, to_array slot-order, order/distinct/take, dropped-
distinct correctness, values-distinct stays real, iterator-typed result, set form, tier-2 heads).
Docs: linq_fold_patterns.rst source row, linq_fold.md layout, LINQ_TO_TABLE.md findings.

Validation: full INTERP suite 10912/0 fail; AOT tests/linq 1914; JIT lane green; MCP + CI lint
clean; Sphinx clean; full 6-lane bench sweep regenerated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 benchmarks/sql/LINQ_TO_TABLE.md             |  18 +-
 benchmarks/sql/_common.das                  |  18 +
 benchmarks/sql/_update_results.das          |   4 +-
 benchmarks/sql/results.md                   | 328 ++++++-----
 benchmarks/sql/table.das                    | 620 ++++++++++++++++++++
 daslib/linq_fold.das                        |  23 +
 daslib/linq_fold.md                         |   2 +-
 daslib/linq_fold_table.das                  | 212 +++++++
 doc/source/reference/linq_fold_patterns.rst |   3 +
 tests/linq/test_linq_table_source.das       | 207 ++++++-
 10 files changed, 1266 insertions(+), 169 deletions(-)
 create mode 100644 benchmarks/sql/table.das
 create mode 100644 daslib/linq_fold_table.das
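
Reviewer note (not part of the commit): the kv-lane usage pruning described in the message
reads as a small decision table. The C++ sketch below restates it under assumptions; it is
not the macro code in daslib/linq_fold_table.das. `Walk`, `BodyUsage`, and `pick_walk` are
hypothetical names, checking the whole-pair escape first is an assumed priority, and the case
where the body reads neither side is left as `Unspecified` because the message does not
describe it.

```cpp
// Hypothetical model of the walk the fused loop would emit.
enum class Walk {
    KeysOnly,    // single-iterator keys() walk
    ValuesOnly,  // single-iterator values() walk
    Zipped,      // two-iterator for over keys() and values()
    TupleBind,   // whole pair escapes: bind the named tuple
    Unspecified  // body reads neither side; not covered by the commit message
};

// Which parts of the kv element the chain body uses (assumed inputs).
struct BodyUsage {
    bool readsKey;     // body reads it.key
    bool readsValue;   // body reads it.value
    bool pairEscapes;  // the whole kv element is used as a value
};

// Restates the pruning rule: one side touched gives a single-iterator walk,
// both sides give the zipped walk, a whole-pair escape gives the tuple bind.
inline Walk pick_walk(const BodyUsage & u) {
    if (u.pairEscapes) return Walk::TupleBind;  // assumed to take priority
    if (u.readsKey && u.readsValue) return Walk::Zipped;
    if (u.readsKey) return Walk::KeysOnly;
    if (u.readsValue) return Walk::ValuesOnly;
    return Walk::Unspecified;
}
```

For example, a chain whose body only reads `it.value` (such as a sum over values) would map
to the values-only walk in this model, and one that filters on `it.key` alone to the
keys-only walk.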

diff --git a/benchmarks/sql/LINQ_TO_TABLE.md b/benchmarks/sql/LINQ_TO_TABLE.md
index 7206bda75..72acd2be6 100644
--- a/benchmarks/sql/LINQ_TO_TABLE.md
+++ b/benchmarks/sql/LINQ_TO_TABLE.md
@@ -4,7 +4,23 @@ Sibling of [LINQ.md](LINQ.md) / [LINQ_TO_DECS.md](LINQ_TO_DECS.md). Plan of reco
 `table<K;V>` / `table<K>` as the 6th `_fold` source, plus the `to_table` sink.
 Edited in-place as PRs land.
 
-Status: **stage 1 committed** (`each_kv` builtin, 8751bb9ba).
+Status: **stage 2 committed** (TableAdapter + m7; stage 1 = `each_kv` builtin, 8751bb9ba).
+
+Stage 2 findings:
+- m7 INTERP profile (2026-06-10 sweep): pruned scans sit between array and XML — `sum_aggregate`
+  13.4 ns/elem (array 2.1, XML 54.3, JSON 146.7), `contains_match` 6.6 via the keys-pruned walk,
+  pure-select `count` hits the O(1) shortcut (0.0). Deferred markers: `groupby_count` 162.6 /
+  `groupby_sum` 192.8 / `join_count` 195.0 / `join_where_count` 229.1 / `reverse_take` 58.7 —
+  the tier-2 cells stages 4–5 erase.
+- The qmacro grammar only allows `$i()` in the FIRST iterator slot of a multi-source `for` — the
+  kv zip header uses literal `_tab_kv_key_` / `_tab_kv_value_` names (ZipAdapter's itA/itB trade).
+- `keys()` yields NON-const elements (writable temp copies) — the engine-visible bind is a `let`
+  rebind (workhorse copy, free); push_clone's `==const` composition needs it.
+- `keys`/`each_kv` spell their element `-const` (iterator variance); the dispatcher clears
+  `removeConstant` on the cloned types or `array<tuple<…> -const>` buffer spellings break
+  push_clone unification.
+- Bare `<src>.to_array()` is not a recognized chain for ANY source (only suffix variants like
+  `where_to_array` exist) — a keys-snapshot needs an op in the chain. Shared engine edge.
 
 **Branch strategy (Boris, 2026-06-10):** the ENTIRE arc stays on `bbatkin/linq-table-each-kv`
 as stacked stage commits — no per-stage PRs. A major fixed-array rework is in flight on master;
diff --git a/benchmarks/sql/_common.das b/benchmarks/sql/_common.das
index 587c47ec6..657469dc8 100644
--- a/benchmarks/sql/_common.das
+++ b/benchmarks/sql/_common.das
@@ -92,6 +92,24 @@ def public fixture_json(n : int) : JsonValue? {
     return JV([for (c in fixture_array(n)); JV(c)])
 }
 
+// Table fold lane (m7): same Car schema keyed by id in a table<int; Car>. Same deterministic row
+// generator as fixture_array so the table lane is directly comparable to the array (m3f) lane; table
+// slot order is unspecified, so m7 expectations stay order-insensitive (aggregates / counts).
+def public fixture_table(n : int) : table<int; Car> {
+    var t <- {
+        for (i in range(n));
+        i + 1 => Car(
+            id = i + 1,
+            name = "Car{i}",
+            price = (i * 37) % 1000,
+            brand = i % BRAND_COUNT,
+            year = 2010 + (i * 7) % 16,
+            dealer_id = (i % DEALER_COUNT) + 1
+        )
+    }
+    return <- t
+}
+
 def public fixture_dealers_array() : array<Dealer> {
     var arr : array<Dealer>
     arr |> resize(DEALER_COUNT)
diff --git a/benchmarks/sql/_update_results.das b/benchmarks/sql/_update_results.das
index b6cb2569e..4017c43d4 100644
--- a/benchmarks/sql/_update_results.das
+++ b/benchmarks/sql/_update_results.das
@@ -46,8 +46,8 @@ struct Config {
     help : bool
 }
 
-let LANES = ["m1", "m3f", "m4", "m5f", "m6f"]
-let HEADERS = ["SQL (m1)", "Array (m3f)", "Decs (m4)", "XML fold (m5f)", "JSON fold (m6f)"]
+let LANES = ["m1", "m3f", "m4", "m5f", "m6f", "m7"]
+let HEADERS = ["SQL (m1)", "Array (m3f)", "Decs (m4)", "XML fold (m5f)", "JSON fold (m6f)", "Table fold (m7)"]
 let BEGIN_MARKER = "<!-- BENCH:TABLES BEGIN -->"
 let END_MARKER = "<!-- BENCH:TABLES END -->"
 
diff --git a/benchmarks/sql/results.md b/benchmarks/sql/results.md
index 73614dec5..519254b0b 100644
--- a/benchmarks/sql/results.md
+++ b/benchmarks/sql/results.md
@@ -1,19 +1,22 @@
-# Benchmarks — SQL / Array / Decs / XML / JSON comparison
+# Benchmarks — SQL / Array / Decs / XML / JSON / Table comparison
 
-Five lanes run the same query families over one `Car` schema (n = 100 000 cars, 100 dealers,
+Six lanes run the same query families over one `Car` schema (n = 100 000 cars, 100 dealers,
 5 brands); cells are ns/op, `—` = intentionally absent lane (see "Missing lanes"). The tables
 between the `BENCH:TABLES` markers are machine-generated (see "How to re-run"); all other text
 is hand-edited.
 
-Each lane lives in its own file (`array.das` / `decs.das` / `xml.das` / `json.das` / `sql.das`) with
-the source fixture built once in `[init]`; the sweep runs one process per file, so a lane is never
-contaminated by another lane's code in the same process (this is why JIT cells are stable now).
+Each lane lives in its own file (`array.das` / `decs.das` / `xml.das` / `json.das` / `sql.das` /
+`table.das`) with the source fixture built once in `[init]`; the sweep runs one process per file,
+so a lane is never contaminated by another lane's code in the same process (this is why JIT cells
+are stable now).
 
 - **m1 SQL** — `_fold(db |> select_from(type<Car>) |> …)` over in-memory SQLite; `_fold` passes the chain to `_sql`.
 - **m3f Array** — `_fold` over `each(array<Car>)`.
 - **m4 Decs** — `_fold` over `from_decs_template(type<DecsCar>)` (per-archetype walk).
 - **m5f XML** — `_fold` over `from_xml_node(root, type<Car>)` (`XmlAdapter` fuses + field-prunes).
 - **m6f JSON** — `_fold` over `from_json(jv, type<Car>)` (`JsonAdapter`, same machinery, array walk).
+- **m7 Table** — `_fold` over `each_kv(table<int; Car>)` (`TableAdapter`; kv usage-pruning picks keys-only /
+  values-only / zipped slot walks; group_by / join / reverse defer to tier-2 until their stages land).
 
 `0.00` = early-exit terminator below timer resolution ("free"). Chain shapes are in
 `benchmarks/README.md`; the splice arms each fires are in `doc/source/reference/linq_fold_patterns.rst`.
@@ -22,169 +25,169 @@ contaminated by another lane's code in the same process (this is why JIT cells a
 signal, JIT deltas as indicative.**
 
 <!-- BENCH:TABLES BEGIN -->
-*Generated 2026-06-06 by `benchmarks/sql/_update_results.das` — ns/op; `—` = absent lane. Edit the prose around the markers, not the tables.*
+*Generated 2026-06-10 by `benchmarks/sql/_update_results.das` — ns/op; `—` = absent lane. Edit the prose around the markers, not the tables.*
 
 ## INTERP
 
-| Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) |
-|---|---:|---:|---:|---:|---:|
-| `aggregate_match` | 35.1 | 5.9 | 5.9 | 60.9 | 159.9 |
-| `all_match` | 28.0 | 3.5 | 3.4 | 56.6 | 154.0 |
-| `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `average_aggregate` | 30.7 | 5.9 | 8.8 | 60.4 | 164.6 |
-| `bare_last` | — | 4.2 | 0.0 | 0.0 | 4.2 |
-| `bare_order_where` | 283.1 | 116.2 | 128.0 | 304.3 | 288.6 |
-| `chained_select_collapse` | — | 17.8 | 17.8 | 70.4 | 162.7 |
-| `chained_where` | 37.0 | 6.6 | 7.1 | 104.8 | 185.2 |
-| `contains_match` | 0.0 | 2.3 | 1.5 | 29.1 | 73.0 |
-| `count_aggregate` | 29.9 | 4.1 | 4.2 | 64.1 | 154.6 |
-| `cross_join` | 12610.5 | 3738.5 | — | 4039.6 | 4042.5 |
-| `decs_count_bare_pred` | — | — | 4.2 | — | — |
-| `distinct_by_count` | 41.7 | 16.1 | 16.0 | 70.7 | 163.5 |
-| `distinct_by_order_take` | 240.1 | 22.0 | 23.3 | 123.9 | 161.9 |
-| `distinct_by_order_to_array` | 242.0 | 22.3 | 23.4 | 124.4 | 162.7 |
-| `distinct_count` | 41.6 | 15.8 | 15.8 | 71.2 | 161.8 |
-| `distinct_count_pred` | 252.2 | 15.8 | 15.9 | 112.2 | 178.7 |
-| `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `element_at_match` | 0.0 | 0.0 | 0.0 | 0.4 | 0.3 |
-| `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `groupby_average` | 173.2 | 29.5 | 29.4 | 122.6 | 195.2 |
-| `groupby_count` | 142.1 | 19.2 | 19.2 | 74.5 | 167.8 |
-| `groupby_first` | 252.4 | 19.1 | 19.8 | 71.7 | 163.3 |
-| `groupby_having_count` | 141.9 | 19.1 | 19.2 | 74.2 | 167.7 |
-| `groupby_having_hidden_sum` | 176.5 | 22.6 | 22.8 | 118.4 | 192.1 |
-| `groupby_having_post_where` | 171.4 | 19.6 | 19.2 | 114.6 | 188.3 |
-| `groupby_max` | 174.1 | 25.1 | 25.0 | 120.0 | 193.1 |
-| `groupby_min` | 173.7 | 25.2 | 24.9 | 120.1 | 193.1 |
-| `groupby_multi_reducer` | 190.3 | 30.9 | 30.6 | 124.8 | 196.7 |
-| `groupby_select_order` | 171.8 | 19.2 | 19.1 | 115.1 | 189.8 |
-| `groupby_select_sum` | 201.8 | 38.5 | 37.9 | 102.8 | 195.0 |
-| `groupby_sum` | 172.7 | 19.1 | 19.1 | 115.3 | 188.2 |
-| `groupby_where_count` | 76.3 | 13.9 | 14.2 | 116.0 | 186.3 |
-| `groupby_where_sum` | 87.7 | 14.0 | 14.5 | 116.3 | 186.7 |
-| `join_count` | 38.7 | 51.5 | 64.2 | 113.9 | 183.3 |
-| `join_groupby_count` | 156.9 | 77.3 | 89.9 | 178.0 | 230.6 |
-| `join_groupby_to_array` | 190.3 | 78.4 | 90.8 | 215.4 | 212.7 |
-| `join_select` | 150.2 | 72.6 | 85.0 | 189.2 | 215.2 |
-| `join_where_count` | 40.1 | 61.4 | 76.7 | 162.3 | 199.0 |
-| `last_match` | 0.0 | 5.8 | 13.8 | 65.5 | 160.3 |
-| `long_count_aggregate` | 29.9 | 4.2 | 4.1 | 64.0 | 154.6 |
-| `max_aggregate` | 31.5 | 6.0 | 6.7 | 58.9 | 163.7 |
-| `min_aggregate` | 31.2 | 6.0 | 6.8 | 59.2 | 162.9 |
-| `order_by_multi_key` | 345.5 | 281.8 | 285.5 | 460.6 | 445.5 |
-| `order_distinct_take` | 138.4 | 15.7 | 100.4 | 72.9 | 163.4 |
-| `order_reverse_normalized` | 38.6 | 16.3 | 20.0 | 70.2 | 170.3 |
-| `order_take_desc` | 38.5 | 16.2 | 20.0 | 70.1 | 171.7 |
-| `reverse_distinct_by` | 296.5 | 21.3 | 27.6 | 70.9 | 162.9 |
-| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.0 |
-| `reverse_take_select` | 0.0 | 0.0 | 0.2 | 0.0 | 26.1 |
-| `select_count` | 0.1 | 0.0 | 2.2 | 69.7 | 2.2 |
-| `select_many` | — | 191.3 | — | — | — |
-| `select_where` | 196.2 | 11.2 | 19.4 | 196.7 | 183.3 |
-| `select_where_count` | 33.0 | 5.8 | 7.4 | 65.0 | 157.8 |
-| `select_where_order_take` | 37.3 | 12.3 | 14.9 | 72.8 | 167.2 |
-| `select_where_sum` | 37.3 | 7.4 | 7.4 | 66.8 | 163.2 |
-| `single_match` | 0.0 | 2.8 | 5.5 | 58.6 | 155.5 |
-| `skip_take` | 0.5 | 0.1 | 0.2 | 3.0 | 2.8 |
-| `skip_while_match` | 3.5 | 5.3 | 5.3 | 60.7 | 153.9 |
-| `sort_first` | 38.5 | 11.1 | 13.3 | 64.8 | 168.2 |
-| `sort_take` | 38.7 | 16.3 | 20.2 | 70.9 | 170.8 |
-| `sort_take_select` | 38.5 | 16.3 | 20.7 | 71.3 | 171.0 |
-| `sum_aggregate` | 30.3 | 2.1 | 2.1 | 54.4 | 153.3 |
-| `sum_where` | 33.1 | 4.4 | 4.3 | 64.2 | 154.7 |
-| `take_count` | 3.7 | 0.2 | 0.4 | 2.9 | 2.7 |
-| `take_count_filtered` | 1.1 | 0.2 | 0.2 | 1.3 | 1.1 |
-| `take_sum_aggregate` | 0.8 | 0.1 | 0.1 | 0.6 | 0.5 |
-| `take_where_count` | 0.9 | 0.1 | 0.1 | 0.7 | 0.6 |
-| `take_while_match` | 7.9 | 2.4 | 2.4 | 30.3 | 77.7 |
-| `to_array_filter` | 69.8 | 11.7 | 12.1 | 71.9 | 165.4 |
-| `where_join_count` | 39.7 | 29.2 | 41.8 | 133.1 | 168.8 |
-| `zip_count_pred` | 39.3 | 15.8 | — | 315.5 | 317.1 |
-| `zip_dot_product` | 46.9 | 12.7 | 10.7 | 317.9 | 314.0 |
-| `zip_dot_product_3arg` | 46.7 | 12.6 | — | 310.7 | 314.2 |
-| `zip_reverse_to_array` | — | 31.8 | — | 344.1 | 349.6 |
+| Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) |
+|---|---:|---:|---:|---:|---:|---:|
+| `aggregate_match` | 34.7 | 5.9 | 5.8 | 60.1 | 152.3 | 19.0 |
+| `all_match` | 27.3 | 3.5 | 3.4 | 55.6 | 147.0 | 15.8 |
+| `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
+| `average_aggregate` | 29.8 | 5.9 | 8.8 | 58.3 | 156.2 | 17.2 |
+| `bare_last` | — | 4.2 | 0.0 | 0.0 | 4.2 | 29.2 |
+| `bare_order_where` | 277.1 | 118.1 | 126.8 | 300.9 | 292.2 | 166.4 |
+| `chained_select_collapse` | — | 17.7 | 17.4 | 70.1 | 155.4 | 27.8 |
+| `chained_where` | 35.8 | 6.6 | 7.1 | 104.2 | 174.7 | 24.1 |
+| `contains_match` | 0.0 | 2.2 | 1.4 | 27.5 | 68.5 | 6.6 |
+| `count_aggregate` | 29.2 | 4.1 | 4.1 | 63.4 | 147.5 | 20.2 |
+| `cross_join` | 13122.7 | 3685.9 | — | 3995.6 | 4066.2 | — |
+| `decs_count_bare_pred` | — | — | 4.1 | — | — | — |
+| `distinct_by_count` | 40.8 | 15.6 | 15.6 | 70.2 | 154.0 | 26.4 |
+| `distinct_by_order_take` | 240.7 | 22.1 | 23.4 | 122.7 | 161.6 | 48.5 |
+| `distinct_by_order_to_array` | 239.2 | 22.2 | 23.5 | 123.6 | 161.7 | 48.4 |
+| `distinct_count` | 40.7 | 15.9 | 15.7 | 70.5 | 155.8 | 26.9 |
+| `distinct_count_pred` | 251.0 | 16.1 | 15.8 | 111.5 | 178.0 | 26.3 |
+| `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
+| `element_at_match` | 0.0 | 0.0 | 0.0 | 0.4 | 0.3 | 0.0 |
+| `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
+| `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
+| `groupby_average` | 173.3 | 29.3 | 29.3 | 122.9 | 190.0 | — |
+| `groupby_count` | 143.5 | 19.4 | 19.4 | 75.4 | 161.0 | 162.6 |
+| `groupby_first` | 251.7 | 19.5 | 20.1 | 72.1 | 156.9 | — |
+| `groupby_having_count` | 140.7 | 19.5 | 19.5 | 74.7 | 161.2 | — |
+| `groupby_having_hidden_sum` | 176.1 | 22.5 | 22.6 | 118.0 | 183.5 | — |
+| `groupby_having_post_where` | 172.8 | 20.8 | 20.8 | 114.1 | 180.4 | — |
+| `groupby_max` | 173.5 | 24.8 | 25.3 | 119.7 | 185.2 | — |
+| `groupby_min` | 173.8 | 25.2 | 25.1 | 119.8 | 184.7 | — |
+| `groupby_multi_reducer` | 189.5 | 30.5 | 30.6 | 124.3 | 188.4 | — |
+| `groupby_select_order` | 169.9 | 20.8 | 20.8 | 114.3 | 180.9 | — |
+| `groupby_select_sum` | 196.9 | 38.6 | 38.1 | 101.6 | 186.6 | — |
+| `groupby_sum` | 170.5 | 21.2 | 20.8 | 114.4 | 180.2 | 192.8 |
+| `groupby_where_count` | 75.6 | 14.1 | 14.3 | 115.2 | 177.8 | — |
+| `groupby_where_sum` | 86.4 | 14.1 | 14.6 | 116.2 | 178.1 | — |
+| `join_count` | 38.0 | 51.2 | 64.2 | 112.7 | 176.9 | 195.0 |
+| `join_groupby_count` | 157.7 | 86.1 | 88.2 | 177.4 | 221.8 | — |
+| `join_groupby_to_array` | 194.9 | 80.3 | 91.7 | 214.8 | 212.1 | — |
+| `join_select` | 150.3 | 72.4 | 84.4 | 187.8 | 209.0 | — |
+| `join_where_count` | 39.0 | 61.6 | 76.7 | 159.8 | 193.6 | 229.1 |
+| `last_match` | 0.0 | 5.9 | 13.9 | 64.9 | 152.3 | 31.0 |
+| `long_count_aggregate` | 28.7 | 4.1 | 4.1 | 63.3 | 147.5 | 20.3 |
+| `max_aggregate` | 30.6 | 6.0 | 6.8 | 58.4 | 156.1 | 17.0 |
+| `min_aggregate` | 30.5 | 6.0 | 6.8 | 58.4 | 155.1 | 17.0 |
+| `order_by_multi_key` | 338.7 | 272.3 | 286.1 | 457.7 | 448.2 | 333.0 |
+| `order_distinct_take` | 138.4 | 15.9 | 99.2 | 72.4 | 156.5 | 31.0 |
+| `order_reverse_normalized` | 37.9 | 16.3 | 20.0 | 70.4 | 162.9 | — |
+| `order_take_desc` | 37.8 | 16.3 | 20.3 | 69.8 | 163.3 | 33.2 |
+| `reverse_distinct_by` | 294.1 | 21.2 | 28.0 | 70.8 | 155.4 | — |
+| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.1 | 58.7 |
+| `reverse_take_select` | 0.0 | 0.0 | 0.2 | 0.0 | 26.1 | — |
+| `select_count` | 0.1 | 0.0 | 2.2 | 64.8 | 2.2 | 0.0 |
+| `select_many` | — | 191.0 | — | — | — | — |
+| `select_where` | 194.7 | 11.5 | 19.3 | 195.9 | 185.7 | 37.5 |
+| `select_where_count` | 32.3 | 5.1 | 7.4 | 64.6 | 150.7 | 21.8 |
+| `select_where_order_take` | 36.2 | 12.2 | 15.0 | 72.3 | 158.5 | 34.4 |
+| `select_where_sum` | 37.1 | 7.5 | 7.5 | 66.3 | 160.5 | 23.2 |
+| `single_match` | 0.0 | 2.9 | 5.5 | 56.9 | 151.1 | 22.8 |
+| `skip_take` | 0.5 | 0.1 | 0.2 | 3.0 | 2.8 | 0.3 |
+| `skip_while_match` | 3.5 | 5.3 | 5.3 | 57.3 | 146.6 | 18.2 |
+| `sort_first` | 37.6 | 11.1 | 13.3 | 64.6 | 159.5 | 31.7 |
+| `sort_take` | 38.0 | 16.2 | 20.9 | 70.2 | 161.9 | 33.0 |
+| `sort_take_select` | 37.6 | 16.3 | 20.9 | 70.8 | 162.7 | 33.3 |
+| `sum_aggregate` | 29.7 | 2.1 | 2.1 | 54.3 | 146.7 | 13.4 |
+| `sum_where` | 31.9 | 4.3 | 4.3 | 63.6 | 148.1 | 20.5 |
+| `take_count` | 3.6 | 0.2 | 0.4 | 2.9 | 2.7 | 0.5 |
+| `take_count_filtered` | 1.1 | 0.2 | 0.2 | 1.3 | 1.1 | 0.3 |
+| `take_sum_aggregate` | 0.8 | 0.1 | 0.1 | 0.6 | 0.5 | 0.1 |
+| `take_where_count` | 0.9 | 0.1 | 0.1 | 0.7 | 0.6 | 0.2 |
+| `take_while_match` | 7.8 | 2.4 | 2.4 | 28.8 | 71.4 | 16.8 |
+| `to_array_filter` | 70.3 | 11.8 | 11.7 | 71.1 | 157.4 | 28.8 |
+| `where_join_count` | 41.0 | 29.0 | 41.5 | 133.0 | 163.1 | — |
+| `zip_count_pred` | 39.0 | 15.8 | — | 313.5 | 319.6 | — |
+| `zip_dot_product` | 46.1 | 12.6 | 10.5 | 308.6 | 317.2 | — |
+| `zip_dot_product_3arg` | 46.1 | 12.8 | — | 308.7 | 316.5 | — |
+| `zip_reverse_to_array` | — | 31.6 | — | 343.1 | 351.0 | — |
 
 ## JIT
 
-| Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) |
-|---|---:|---:|---:|---:|---:|
-| `aggregate_match` | 35.9 | 0.3 | 0.7 | 16.7 | 26.7 |
-| `all_match` | 27.8 | 0.3 | 0.2 | 16.6 | 25.7 |
-| `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `average_aggregate` | 30.7 | 1.0 | 3.6 | 16.6 | 25.3 |
-| `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 |
-| `bare_order_where` | 187.8 | 34.6 | 35.7 | 105.3 | 53.0 |
-| `chained_select_collapse` | — | 1.1 | 1.1 | 21.4 | 34.0 |
-| `chained_where` | 37.0 | 0.6 | 0.8 | 33.9 | 31.4 |
-| `contains_match` | 0.0 | 0.2 | 0.1 | 17.3 | 9.3 |
-| `count_aggregate` | 29.6 | 0.3 | 0.6 | 16.7 | 25.9 |
-| `cross_join` | 5984.0 | 751.9 | — | 833.5 | 768.9 |
-| `decs_count_bare_pred` | — | — | 0.6 | — | — |
-| `distinct_by_count` | 42.0 | 1.1 | 1.1 | 21.4 | 34.3 |
-| `distinct_by_order_take` | 238.6 | 1.7 | 2.9 | 45.9 | 40.7 |
-| `distinct_by_order_to_array` | 241.0 | 1.7 | 2.7 | 46.1 | 40.3 |
-| `distinct_count` | 41.5 | 1.1 | 1.1 | 21.4 | 33.0 |
-| `distinct_count_pred` | 253.3 | 1.1 | 1.3 | 38.6 | 45.2 |
-| `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `element_at_match` | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 |
-| `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `groupby_average` | 170.2 | 1.6 | 1.9 | 36.5 | 45.6 |
-| `groupby_count` | 141.7 | 1.4 | 1.5 | 21.8 | 34.1 |
-| `groupby_first` | 253.3 | 1.3 | 2.3 | 21.8 | 34.8 |
-| `groupby_having_count` | 140.7 | 1.3 | 1.5 | 21.5 | 34.2 |
-| `groupby_having_hidden_sum` | 175.1 | 1.5 | 1.7 | 36.4 | 45.3 |
-| `groupby_having_post_where` | 169.9 | 1.4 | 1.9 | 36.3 | 44.5 |
-| `groupby_max` | 173.3 | 1.5 | 1.9 | 36.4 | 45.9 |
-| `groupby_min` | 173.0 | 1.5 | 1.8 | 36.4 | 46.0 |
-| `groupby_multi_reducer` | 189.8 | 1.6 | 2.0 | 36.7 | 46.1 |
-| `groupby_select_order` | 170.2 | 1.4 | 1.9 | 36.3 | 44.8 |
-| `groupby_select_sum` | 198.8 | 2.8 | 3.2 | 31.3 | 40.2 |
-| `groupby_sum` | 170.7 | 1.4 | 1.6 | 36.3 | 44.2 |
-| `groupby_where_count` | 75.7 | 0.9 | 1.3 | 36.6 | 42.3 |
-| `groupby_where_sum` | 87.0 | 0.9 | 1.3 | 36.6 | 43.7 |
-| `join_count` | 39.0 | 10.8 | 11.9 | 42.9 | 75.7 |
-| `join_groupby_count` | 156.8 | 17.2 | 19.2 | 69.8 | 95.1 |
-| `join_groupby_to_array` | 190.7 | 18.3 | 20.1 | 80.7 | 37.6 |
-| `join_select` | 93.3 | 20.0 | 21.8 | 75.3 | 100.1 |
-| `join_where_count` | 39.8 | 19.0 | 20.7 | 63.1 | 81.0 |
-| `last_match` | 0.0 | 0.5 | 1.4 | 17.5 | 26.2 |
-| `long_count_aggregate` | 29.7 | 0.3 | 0.6 | 16.7 | 25.9 |
-| `max_aggregate` | 31.2 | 0.3 | 0.5 | 16.7 | 27.3 |
-| `min_aggregate` | 31.2 | 0.3 | 0.5 | 16.9 | 27.4 |
-| `order_by_multi_key` | 250.8 | 54.1 | 54.9 | 124.3 | 72.9 |
-| `order_distinct_take` | 140.2 | 1.1 | 75.2 | 21.6 | 36.8 |
-| `order_reverse_normalized` | 38.5 | 0.7 | 1.4 | 21.7 | 28.1 |
-| `order_take_desc` | 38.5 | 0.7 | 1.3 | 21.7 | 28.2 |
-| `reverse_distinct_by` | 297.5 | 1.6 | 3.2 | 21.8 | 35.5 |
-| `reverse_take` | 0.0 | 0.0 | 0.0 | 0.0 | 3.8 |
-| `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.8 |
-| `select_count` | 0.1 | 0.0 | 0.0 | 66.8 | 0.0 |
-| `select_many` | — | 63.2 | — | — | — |
-| `select_where` | 110.6 | 4.2 | 5.3 | 75.2 | 22.6 |
-| `select_where_count` | 32.8 | 0.3 | 0.6 | 16.9 | 26.8 |
-| `select_where_order_take` | 37.1 | 0.7 | 1.4 | 17.6 | 27.6 |
-| `select_where_sum` | 38.3 | 0.4 | 0.6 | 16.6 | 25.6 |
-| `single_match` | 0.0 | 0.4 | 1.1 | 46.5 | 22.4 |
-| `skip_take` | 0.3 | 0.0 | 0.0 | 1.2 | 0.2 |
-| `skip_while_match` | 3.5 | 0.4 | 0.4 | 46.5 | 22.2 |
-| `sort_first` | 38.4 | 0.4 | 1.3 | 16.7 | 27.0 |
-| `sort_take` | 38.9 | 0.7 | 1.4 | 21.7 | 28.0 |
-| `sort_take_select` | 38.5 | 0.7 | 1.3 | 21.8 | 27.6 |
-| `sum_aggregate` | 30.2 | 0.3 | 0.1 | 16.9 | 24.6 |
-| `sum_where` | 33.2 | 0.3 | 0.6 | 16.6 | 26.4 |
-| `take_count` | 1.9 | 0.1 | 0.1 | 1.2 | 0.2 |
-| `take_count_filtered` | 1.1 | 0.0 | 0.0 | 0.4 | 0.1 |
-| `take_sum_aggregate` | 0.8 | 0.0 | 0.0 | 0.2 | 0.0 |
-| `take_where_count` | 0.9 | 0.0 | 0.0 | 0.2 | 0.0 |
-| `take_while_match` | 7.8 | 0.2 | 0.3 | 17.0 | 9.1 |
-| `to_array_filter` | 49.0 | 3.3 | 3.3 | 20.1 | 35.9 |
-| `where_join_count` | 40.0 | 5.9 | 6.7 | 47.9 | 44.9 |
-| `zip_count_pred` | 39.4 | 0.1 | — | 115.0 | 34.0 |
-| `zip_dot_product` | 46.9 | 0.1 | 0.1 | 117.9 | 33.9 |
-| `zip_dot_product_3arg` | 47.1 | 0.1 | — | 115.0 | 34.0 |
-| `zip_reverse_to_array` | — | 4.7 | — | 126.6 | 51.1 |
+| Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) |
+|---|---:|---:|---:|---:|---:|---:|
+| `aggregate_match` | 35.0 | 0.3 | 0.6 | 21.7 | 27.3 | 13.6 |
+| `all_match` | 27.8 | 0.3 | 0.2 | 18.1 | 25.9 | 13.5 |
+| `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
+| `average_aggregate` | 29.9 | 1.0 | 3.6 | 18.0 | 24.4 | 13.4 |
+| `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 | 17.1 |
+| `bare_order_where` | 186.2 | 34.0 | 35.3 | 106.3 | 52.4 | 78.7 |
+| `chained_select_collapse` | — | 1.1 | 1.1 | 20.4 | 33.0 | 14.0 |
+| `chained_where` | 35.9 | 0.6 | 0.8 | 35.5 | 31.5 | 17.6 |
+| `contains_match` | 0.0 | 0.2 | 0.1 | 14.8 | 9.2 | 4.7 |
+| `count_aggregate` | 29.5 | 0.3 | 0.6 | 20.4 | 25.1 | 13.4 |
+| `cross_join` | 5964.4 | 734.4 | — | 834.2 | 772.7 | — |
+| `decs_count_bare_pred` | — | — | 0.6 | — | — | — |
+| `distinct_by_count` | 41.0 | 1.1 | 1.1 | 20.4 | 32.0 | 14.0 |
+| `distinct_by_order_take` | 237.4 | 1.7 | 2.6 | 48.4 | 37.1 | 29.9 |
+| `distinct_by_order_to_array` | 237.2 | 1.7 | 2.7 | 47.5 | 36.8 | 30.0 |
+| `distinct_count` | 40.8 | 1.1 | 1.1 | 20.5 | 31.9 | 14.0 |
+| `distinct_count_pred` | 249.8 | 1.1 | 1.3 | 37.6 | 41.7 | 14.0 |
+| `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
+| `element_at_match` | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 |
+| `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
+| `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
+| `groupby_average` | 170.1 | 1.5 | 1.9 | 35.7 | 43.0 | — |
+| `groupby_count` | 141.1 | 1.3 | 1.5 | 20.5 | 32.2 | 43.0 |
+| `groupby_first` | 251.0 | 1.3 | 2.3 | 20.5 | 32.9 | — |
+| `groupby_having_count` | 141.1 | 1.3 | 1.5 | 20.5 | 32.1 | — |
+| `groupby_having_hidden_sum` | 173.9 | 1.5 | 1.7 | 35.8 | 42.7 | — |
+| `groupby_having_post_where` | 170.2 | 1.4 | 1.9 | 35.8 | 41.8 | — |
+| `groupby_max` | 172.3 | 1.5 | 1.9 | 35.9 | 43.6 | — |
+| `groupby_min` | 173.0 | 1.5 | 1.8 | 35.8 | 43.6 | — |
+| `groupby_multi_reducer` | 191.8 | 1.6 | 1.9 | 36.1 | 43.7 | — |
+| `groupby_select_order` | 170.5 | 1.4 | 1.9 | 35.8 | 42.0 | — |
+| `groupby_select_sum` | 195.5 | 2.8 | 3.2 | 32.3 | 37.6 | — |
+| `groupby_sum` | 169.8 | 1.4 | 1.6 | 35.8 | 42.0 | 51.2 |
+| `groupby_where_count` | 75.7 | 0.9 | 1.3 | 35.9 | 39.7 | — |
+| `groupby_where_sum` | 86.4 | 0.9 | 1.3 | 35.9 | 39.6 | — |
+| `join_count` | 37.9 | 11.0 | 11.7 | 43.4 | 68.3 | 62.9 |
+| `join_groupby_count` | 156.2 | 18.2 | 20.0 | 68.3 | 86.7 | — |
+| `join_groupby_to_array` | 189.2 | 17.5 | 19.4 | 80.2 | 36.1 | — |
+| `join_select` | 92.8 | 19.6 | 21.6 | 74.4 | 94.1 | — |
+| `join_where_count` | 39.1 | 18.9 | 20.6 | 64.5 | 77.9 | 80.0 |
+| `last_match` | 0.0 | 0.5 | 1.4 | 18.6 | 25.9 | 22.9 |
+| `long_count_aggregate` | 28.7 | 0.3 | 0.6 | 20.4 | 26.6 | 13.4 |
+| `max_aggregate` | 30.6 | 0.3 | 0.5 | 18.1 | 26.7 | 13.4 |
+| `min_aggregate` | 30.6 | 0.3 | 0.5 | 18.2 | 26.3 | 13.4 |
+| `order_by_multi_key` | 247.0 | 53.4 | 54.8 | 125.3 | 70.3 | 128.9 |
+| `order_distinct_take` | 137.9 | 1.1 | 75.6 | 20.9 | 34.1 | 14.0 |
+| `order_reverse_normalized` | 37.8 | 0.7 | 1.3 | 24.6 | 27.0 | — |
+| `order_take_desc` | 38.0 | 0.7 | 1.3 | 24.5 | 26.9 | 17.7 |
+| `reverse_distinct_by` | 295.4 | 1.5 | 3.2 | 20.4 | 32.7 | — |
+| `reverse_take` | 0.0 | 0.0 | 0.0 | 0.0 | 3.7 | 26.8 |
+| `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.8 | — |
+| `select_count` | 0.1 | 0.0 | 0.0 | 63.4 | 0.0 | 0.0 |
+| `select_many` | — | 61.5 | — | — | — | — |
+| `select_where` | 110.5 | 4.3 | 5.3 | 76.1 | 22.1 | 27.9 |
+| `select_where_count` | 32.1 | 0.3 | 0.6 | 18.4 | 25.9 | 13.3 |
+| `select_where_order_take` | 36.3 | 0.7 | 1.4 | 18.9 | 26.6 | 22.9 |
+| `select_where_sum` | 37.0 | 0.4 | 0.6 | 17.9 | 24.9 | 13.3 |
+| `single_match` | 0.0 | 0.4 | 1.1 | 43.4 | 22.2 | 17.2 |
+| `skip_take` | 0.3 | 0.0 | 0.0 | 1.3 | 0.2 | 0.1 |
+| `skip_while_match` | 3.5 | 0.4 | 0.4 | 43.5 | 21.8 | 13.2 |
+| `sort_first` | 37.7 | 0.4 | 1.4 | 17.9 | 26.1 | 17.1 |
+| `sort_take` | 38.0 | 0.7 | 1.5 | 24.5 | 26.8 | 17.7 |
+| `sort_take_select` | 37.8 | 0.7 | 1.3 | 24.5 | 26.9 | 17.7 |
+| `sum_aggregate` | 29.6 | 0.3 | 0.1 | 23.3 | 24.3 | 13.4 |
+| `sum_where` | 32.1 | 0.3 | 0.6 | 18.4 | 25.9 | 13.3 |
+| `take_count` | 1.8 | 0.1 | 0.1 | 1.2 | 0.3 | 0.4 |
+| `take_count_filtered` | 1.1 | 0.0 | 0.0 | 0.4 | 0.1 | 0.2 |
+| `take_sum_aggregate` | 0.8 | 0.0 | 0.0 | 0.2 | 0.0 | 0.1 |
+| `take_where_count` | 0.8 | 0.0 | 0.0 | 0.2 | 0.0 | 0.1 |
+| `take_while_match` | 7.8 | 0.2 | 0.3 | 14.7 | 9.0 | 13.3 |
+| `to_array_filter` | 47.1 | 3.3 | 3.3 | 21.3 | 33.6 | 20.0 |
+| `where_join_count` | 39.0 | 5.8 | 6.7 | 49.5 | 40.6 | — |
+| `zip_count_pred` | 39.1 | 0.1 | — | 116.7 | 33.5 | — |
+| `zip_dot_product` | 46.3 | 0.1 | 0.1 | 116.6 | 33.4 | — |
+| `zip_dot_product_3arg` | 46.1 | 0.1 | — | 116.5 | 33.4 | — |
+| `zip_reverse_to_array` | — | 4.6 | — | 127.7 | 50.0 | — |
 <!-- BENCH:TABLES END -->
 
 ## Missing lanes (the `—` cells)
@@ -200,6 +203,7 @@ Each empty cell's reason is also in the bench `.das` file's comment; SQL gaps ar
 - **`reverse_distinct_by` m4 / m5f** — array uses the backward-index walk; non-array sources fuse the forward keep-last splice (decs 27.6/5.0, XML 74.5/22.2); SQL uses MAX(pk).
 - **`order_distinct_take` m4 vs m3f** — `unique_key` hashes workhorse keys directly (array `int`) but string-interpolates structs (decs `DecsBrand`); the gap is per-element string hashing, not decs-walk. `distinct_by_count` is the key-based variant (m4 parity).
 - **`zip_reverse_to_array` / `zip_*` SQL / Decs** — `reverse` has no SQL order key; zip is not relational / not expressible over one archetype walk. By design. (XML/JSON zip lanes are lit, partially fused.)
+- **m7 absent families** — `zip_*` / `cross_join` (lockstep over an unordered slot walk is meaningless), `select_many` (flat fixture, no nested array field), `order_reverse_normalized` / `reverse_take_select` / `reverse_distinct_by` (no backward slot walk; `reverse_take` is kept as the single deferral marker), the group-by tail beyond `groupby_count`/`groupby_sum` and joins beyond `join_count`/`join_where_count` (table group_by/join fusion is staged — see `LINQ_TO_TABLE.md`; the four marker cells track the tier-2 cost until then), `decs_count_bare_pred` (decs-only).
 
 ## Accepted floors
 
diff --git a/benchmarks/sql/table.das b/benchmarks/sql/table.das
new file mode 100644
index 000000000..66564b963
--- /dev/null
+++ b/benchmarks/sql/table.das
@@ -0,0 +1,620 @@
+options gen2
+options persistent_heap
+
+require _common public
+
+// Per-source table benchmark lane (m7): the same Car rows keyed by id in a table<int; Car>, chains in
+// each_kv form (`_.key` / `_.value.<field>`) so the kv usage-pruner picks the cheapest iterator set.
+// Table slot order is unspecified — guards stay order-insensitive. Functions stay named <family>_m7.
+
+let N = 100000
+
+typedef CarKV = tuple<key : int; value : Car>
+
+var g_t : table<int; Car>
+var g_dealers : array<Dealer>
+
+[init]
+def table_bench_init {
+    g_t <- fixture_table(N)
+    g_dealers <- fixture_dealers_array()
+}
+
+[finalize]
+def table_bench_fini {
+    delete g_t
+    delete g_dealers
+}
+
+[benchmark]
+def aggregate_match_m7(b : B?) {
+    b |> run("aggregate_match", N) {
+        let total = _fold(unsafe(each_kv(g_t))._where(_.value.price > 200)
+            .aggregate(0, $(acc : int, c : CarKV) => acc + c.value.price))
+        b |> accept(total)
+        if (total == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def all_match_m7(b : B?) {
+    b |> run("all_match", N) {
+        let yes = _fold(unsafe(each_kv(g_t))._all(_.value.price < 9999))
+        b |> accept(yes)
+        if (!yes) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def any_match_m7(b : B?) {
+    b |> run("any_match", N) {
+        let yes = _fold(unsafe(each_kv(g_t))._any(_.value.price > 500))
+        b |> accept(yes)
+        if (!yes) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def average_aggregate_m7(b : B?) {
+    b |> run("average_aggregate", N) {
+        let a = _fold(unsafe(each_kv(g_t))._select(double(_.value.price)).average())
+        b |> accept(a)
+        if (a == 0.0lf) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def bare_last_m7(b : B?) {
+    b |> run("bare_last", N) {
+        let row = _fold(unsafe(each_kv(g_t)).last())
+        b |> accept(row)
+        if (row.key == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def bare_order_where_m7(b : B?) {
+    b |> run("bare_order_where", N) {
+        let rows <- _fold(unsafe(each_kv(g_t))._where(_.value.price > 500)
+                                   ._order_by(_.value.price)
+                                   .to_array())
+        b |> accept(rows)
+        if (empty(rows)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def chained_select_collapse_m7(b : B?) {
+    b |> run("chained_select_collapse", N) {
+        let c = _fold(unsafe(each_kv(g_t)) |> _select(_.value.brand) |> _select(_ + 1) |> distinct() |> count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def chained_where_m7(b : B?) {
+    b |> run("chained_where", N) {
+        let c = _fold(unsafe(each_kv(g_t))._where(_.value.price > 500)
+                               ._where(_.value.year >= 2015)
+                               .count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def contains_match_m7(b : B?) {
+    b |> run("contains_match", N) {
+        let yes = _fold(unsafe(each_kv(g_t))._select(_.key).contains(50000))
+        b |> accept(yes)
+        if (!yes) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def count_aggregate_m7(b : B?) {
+    b |> run("count_aggregate", N) {
+        let c = _fold(unsafe(each_kv(g_t))._where(_.value.price > 500).count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def distinct_by_count_m7(b : B?) {
+    b |> run("distinct_by_count", N) {
+        let c = _fold(unsafe(each_kv(g_t))._distinct_by(_.value.brand).count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def distinct_by_order_take_m7(b : B?) {
+    b |> run("distinct_by_order_take", N) {
+        unsafe {
+            let rows <- _fold(each_kv(g_t)._distinct_by(_.value.dealer_id)._order_by(_.value.price).take(10).to_array())
+            b |> accept(rows)
+            if (empty(rows)) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+[benchmark]
+def distinct_by_order_to_array_m7(b : B?) {
+    b |> run("distinct_by_order_to_array", N) {
+        unsafe {
+            let rows <- _fold(each_kv(g_t)._distinct_by(_.value.dealer_id)._order_by(_.value.price).to_array())
+            b |> accept(rows)
+            if (empty(rows)) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+[benchmark]
+def distinct_count_m7(b : B?) {
+    b |> run("distinct_count", N) {
+        let rows <- _fold(unsafe(each_kv(g_t))._select(_.value.brand).distinct().to_array())
+        b |> accept(rows)
+        if (empty(rows)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def distinct_count_pred_m7(b : B?) {
+    b |> run("distinct_count_pred", N) {
+        unsafe {
+            let c = _fold(each_kv(g_t) |> _distinct_by(_.value.brand) |> count($(c) => c.value.year > 2009))
+            b |> accept(c)
+            if (c == 0) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+[benchmark]
+def distinct_take_m7(b : B?) {
+    b |> run("distinct_take", N) {
+        unsafe {
+            let rows <- _fold(each_kv(g_t)._select(_.value.brand).distinct().take(3).to_array())
+            b |> accept(rows)
+            if (empty(rows)) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+[benchmark]
+def element_at_match_m7(b : B?) {
+    b |> run("element_at_match", N) {
+        let row = _fold(unsafe(each_kv(g_t))._where(_.value.price > 500).element_at(100))
+        b |> accept(row)
+        if (row.key == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def first_match_m7(b : B?) {
+    b |> run("first_match", N) {
+        let row = _fold(unsafe(each_kv(g_t))._where(_.value.price > 500).first())
+        b |> accept(row)
+        if (row.value.price <= 500) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def first_or_default_match_m7(b : B?) {
+    let sentinel : CarKV = (key = -1, value = Car(id = -1, name = "none", price = 0, brand = 0, year = 0, dealer_id = 0))
+    b |> run("first_or_default_match", N) {
+        let row = _fold(unsafe(each_kv(g_t))._where(_.value.price > 500).first_or_default(sentinel))
+        b |> accept(row)
+        if (row.key == -1) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def groupby_count_m7(b : B?) {
+    b |> run("groupby_count", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._group_by(_.value.brand)
+                            ._select((Brand = _._0, N = _._1 |> length))
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def groupby_sum_m7(b : B?) {
+    b |> run("groupby_sum", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._group_by(_.value.brand)
+                            ._select((Brand = _._0,
+                                      TotalPrice = _._1 |> select($(c : CarKV) => c.value.price) |> sum()))
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def join_count_m7(b : B?) {
+    b |> run("join_count", N) {
+        let c = _fold(unsafe(each_kv(g_t)) |> _join(g_dealers,
+                                                    $(c : CarKV, d : Dealer) => c.value.dealer_id == d.id,
+                                                    $(c : CarKV, d : Dealer) => (c.value.name, d.name))
+                                           |> count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def join_where_count_m7(b : B?) {
+    b |> run("join_where_count", N) {
+        let c = _fold(unsafe(each_kv(g_t)) |> _join(g_dealers,
+                                $(c : CarKV, d : Dealer) => c.value.dealer_id == d.id,
+                                $(c : CarKV, d : Dealer) => (CarPrice = c.value.price, DealerId = d.id))
+                       |> _where(_.CarPrice > 500)
+                       |> count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def last_match_m7(b : B?) {
+    b |> run("last_match", N) {
+        let row = _fold(unsafe(each_kv(g_t))._where(_.value.price > 500).last())
+        b |> accept(row)
+        if (row.value.price <= 500) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def long_count_aggregate_m7(b : B?) {
+    b |> run("long_count_aggregate", N) {
+        let c = _fold(unsafe(each_kv(g_t))._where(_.value.price > 500).long_count())
+        b |> accept(c)
+        if (c == 0l) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def max_aggregate_m7(b : B?) {
+    b |> run("max_aggregate", N) {
+        let m = _fold(unsafe(each_kv(g_t))._select(_.value.price).max())
+        b |> accept(m)
+        if (m == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def min_aggregate_m7(b : B?) {
+    b |> run("min_aggregate", N) {
+        let m = _fold(unsafe(each_kv(g_t))._select(_.value.price).min())
+        b |> accept(m)
+        if (m > 999) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def order_by_multi_key_m7(b : B?) {
+    b |> run("order_by_multi_key", N) {
+        let rows <- _fold(unsafe(each_kv(g_t))._where(_.value.price > 500)
+                                   ._order_by_keys((_.value.brand, _.value.price), 0u)
+                                   .to_array())
+        b |> accept(rows)
+        if (empty(rows)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def order_distinct_take_m7(b : B?) {
+    b |> run("order_distinct_take", N) {
+        unsafe {
+            let rows <- _fold(each_kv(g_t)._select(_.value.brand)._order_by(_).distinct().take(5).to_array())
+            b |> accept(rows)
+            if (empty(rows)) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+[benchmark]
+def order_take_desc_m7(b : B?) {
+    b |> run("order_take_desc", N) {
+        let rows <- _fold(unsafe(each_kv(g_t))._order_by_descending(_.value.price).take(10).to_array())
+        b |> accept(rows)
+        if (empty(rows)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def reverse_take_m7(b : B?) {
+    b |> run("reverse_take", N) {
+        unsafe {
+            let rows <- _fold(each_kv(g_t).reverse().take(10).to_array())
+            b |> accept(rows)
+            if (empty(rows)) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+[benchmark]
+def select_count_m7(b : B?) {
+    b |> run("select_count", N) {
+        let c = _fold(unsafe(each_kv(g_t))._select(_.value.price * 2).count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def select_where_m7(b : B?) {
+    b |> run("select_where", N) {
+        let rows <- _fold(unsafe(each_kv(g_t))._where(_.value.price > 500).to_array())
+        b |> accept(rows)
+        if (empty(rows)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def select_where_count_m7(b : B?) {
+    b |> run("select_where_count", N) {
+        let c = _fold(unsafe(each_kv(g_t))._select(_.value.price * 2)._where(_ > 1000).count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def select_where_order_take_m7(b : B?) {
+    b |> run("select_where_order_take", N) {
+        let rows <- _fold(unsafe(each_kv(g_t))._where(_.value.price > 500)
+                                   ._order_by(_.value.price)
+                                   .take(10)
+                                   .to_array())
+        b |> accept(rows)
+        if (empty(rows)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def select_where_sum_m7(b : B?) {
+    b |> run("select_where_sum", N) {
+        let s = _fold(unsafe(each_kv(g_t))._select(_.value.price * 2)._where(_ > 1000).sum())
+        b |> accept(s)
+        if (s == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def single_match_m7(b : B?) {
+    b |> run("single_match", N) {
+        let row = _fold(unsafe(each_kv(g_t))._where(_.key == 42).single())
+        b |> accept(row)
+        if (row.key != 42) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def skip_take_m7(b : B?) {
+    b |> run("skip_take", N) {
+        let rows <- _fold(unsafe(each_kv(g_t)).skip(1000).take(100).to_array())
+        b |> accept(rows)
+        if (empty(rows)) {
+            b->failNow()
+        }
+    }
+}
+
+// skip_while / take_while measure the gated walk: slot order is unspecified, so the predicates are
+// chosen to be uniformly false (skip nothing) / uniformly true (take everything) — full deterministic walks.
+[benchmark]
+def skip_while_match_m7(b : B?) {
+    b |> run("skip_while_match", N) {
+        let total = _fold(unsafe(each_kv(g_t))._skip_while(_.key < 0).count())
+        b |> accept(total)
+        if (total == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def sort_first_m7(b : B?) {
+    b |> run("sort_first", N) {
+        let row = _fold(unsafe(each_kv(g_t))._order_by(_.value.price).first())
+        b |> accept(row)
+        if (row.key == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def sort_take_m7(b : B?) {
+    b |> run("sort_take", N) {
+        unsafe {
+            let rows <- _fold(each_kv(g_t)._order_by(_.value.price).take(10).to_array())
+            b |> accept(rows)
+            if (empty(rows)) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+[benchmark]
+def sort_take_select_m7(b : B?) {
+    b |> run("sort_take_select", N) {
+        unsafe {
+            let rows <- _fold(each_kv(g_t)._order_by(_.value.price).take(10)._select(_.value.name).to_array())
+            b |> accept(rows)
+            if (empty(rows)) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+[benchmark]
+def sum_aggregate_m7(b : B?) {
+    b |> run("sum_aggregate", N) {
+        let s = _fold(unsafe(each_kv(g_t))._select(_.value.price).sum())
+        b |> accept(s)
+        if (s == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def sum_where_m7(b : B?) {
+    b |> run("sum_where", N) {
+        let s = _fold(unsafe(each_kv(g_t))._where(_.value.price > 500)._select(_.value.price).sum())
+        b |> accept(s)
+        if (s == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def take_count_m7(b : B?) {
+    b |> run("take_count", N) {
+        let rows <- _fold(unsafe(each_kv(g_t)).take(1000).to_array())
+        b |> accept(rows)
+        if (empty(rows)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def take_count_filtered_m7(b : B?) {
+    b |> run("take_count_filtered", N) {
+        let c = _fold(unsafe(each_kv(g_t))._where(_.value.price > 500).take(1000).count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def take_sum_aggregate_m7(b : B?) {
+    b |> run("take_sum_aggregate", N) {
+        let s = _fold(unsafe(each_kv(g_t))._select(_.value.price).take(1000).sum())
+        b |> accept(s)
+        if (s == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def take_where_count_m7(b : B?) {
+    b |> run("take_where_count", N) {
+        let c = _fold(unsafe(each_kv(g_t)).take(1000)._where(_.value.price > 500).count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def take_while_match_m7(b : B?) {
+    b |> run("take_while_match", N) {
+        let total = _fold(unsafe(each_kv(g_t))._take_while(_.key > 0).count())
+        b |> accept(total)
+        if (total == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def to_array_filter_m7(b : B?) {
+    b |> run("to_array_filter", N) {
+        let prices <- _fold(unsafe(each_kv(g_t))._where(_.value.price > 500)._select(_.value.price).to_array())
+        b |> accept(prices)
+        if (empty(prices)) {
+            b->failNow()
+        }
+    }
+}
diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index 71cc33a1d..d09f9f027 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -36,6 +36,7 @@ require daslib/linq_fold_common public
 require daslib/linq_fold_array public
 require daslib/linq_fold_decs public
 require daslib/linq_fold_json public          // in-tree JSON source adapter — emits by name, pulls in no json dep
+require daslib/linq_fold_table public         // in-tree table source adapter — each_kv/keys/values chain heads
 require ?pugixml pugixml/linq_fold_xml       // optional XML source adapter — loaded only when pugixml is linked
 require ?sqlite sqlite/linq_fold_sql         // optional SQL source pass-through — loaded only when sqlite is linked
 
@@ -211,6 +212,28 @@ def private try_splice_patterns(prog : ProgramPtr; var expr : Expression?) : Exp
             new JsonAdapter(jsonExpr = clone_expression(jsonCall.arguments[0]), srcName = qn("jsrc", at),
                             elemType = clone_type(jsonCall._type.firstType)), exprIsIter, at)
     }
+    // Table adapter (in-tree, no static_if — extract_table_source name-matches each_kv/keys/values with a table-typed arg). The kv lane fuses only copyable values: a non-copyable-valued each_kv falls through to the array tier, where the surviving each_kv instantiation concept-asserts with the user-facing message.
+    var tabCall = extract_table_source(top)
+    if (tabCall != null) {
+        let tabName = get_call_short_name(tabCall)
+        let valT = tabCall.arguments[0]._type.secondType
+        if (tabName != "each_kv" || (valT != null && valT.canCopy)) {
+            let lane = tabName == "each_kv" ? TableLane.KV : (tabName == "keys" ? TableLane.KEYS : TableLane.VALUES)
+            if (lane != TableLane.VALUES) {
+                drop_redundant_distinct(calls)   // keys are unique by construction; values can repeat
+            }
+            var ttopClone = clone_expression(top)
+            // keys/each_kv spell their element `-const` (iterator-variance concern); that flag must not leak into emitted var/buffer type spellings (`array<tuple<…> -const>` breaks push_clone unification).
+            if (ttopClone._type != null && ttopClone._type.firstType != null) {
+                ttopClone._type.firstType.flags.removeConstant = false
+            }
+            var elemT = clone_type(tabCall._type.firstType)
+            elemT.flags.removeConstant = false
+            return run_splice_adapter(calls, ttopClone, ttopClone,
+                new TableAdapter(tabExpr = clone_expression(tabCall.arguments[0]), srcName = qn("tsrc", at),
+                                 elemType = elemT, lane = lane), exprIsIter, at)
+        }
+    }
     top = peel_each(top)
     var topClone = clone_expression(top)
     return run_splice_adapter(calls, top, topClone,
diff --git a/daslib/linq_fold.md b/daslib/linq_fold.md
index eda6a38c1..8f313f4b4 100644
--- a/daslib/linq_fold.md
+++ b/daslib/linq_fold.md
@@ -69,7 +69,7 @@ The adapter is an abstract `class SourceAdapter` (`[macro_interface]`, so every
 
 Emit fns hold a `SourceAdapter?` (via `EmitCtx.src` or an `adapter` local) and call these virtually. **daslang classes have no `is`/`as` downcast** (variant-only), so source-specific data is never pulled off a base pointer by downcasting — it goes through virtual methods. Beyond the 4 dispatch methods the base also declares 6 default-null **per-operation hook methods** (`emit_loop_or_count` / `emit_reverse_skip_into_tail` / `emit_reverse_last_backward` / `emit_distinct_take_loop` / `build_group_by_adapter` / `emit_join_hook`) that the owning source overrides; the generic lane falls back to its inline (array) body when the hook returns null. (`XmlAdapter` overrides the two reverse hooks with a **backward DOM walk** — `last_child`/`previous_sibling`, both O(1) in pugixml: `emit_reverse_skip_into_tail` collects only the last N children for `reverse |> take(N)` (m5f `reverse_take` 88.9 → 0.0 ns/op), and `emit_reverse_last_backward` returns the last element in one step for a no-predicate `last()` / `reverse |> first`. Predicated `[where] |> last` stays on the forward walk — reverse DOM traversal is ~2× cache-hostile per node, profiled — and the named 3-arg `from_xml_node` form falls back to the buffer path since pugixml has no last-named-child primitive.) (`emit_join_hook` is the standalone-join dispatch: the single `join_general` pattern's thin `emit_join` routes to it, so each source supplies its own join body — array `for`+2-param invoke, decs `for_each_archetype`, XML field-pruned DOM walk — with no parallel per-source join pattern.) It also declares **capability methods** the source answers about itself — `can_group_by` / `can_join` / `can_reserve_by_length` / `has_own_loop_or_count_lane` (bool, default false) and `name_prefix` (string) — which replaced the old `kind() : AdapterKind` enum + per-site switches, so a new source only implements the methods (no central enum to extend). 
The `can_group_by` / `can_join` capabilities are queried from the `can_group_by_source` / `can_join_source` `RequiresPredicate`s (which thread the adapter), so the single `group_by` / `join_general` pattern admits any capable source and the adapter's `build_group_by_adapter` / `emit_join_hook` does the source/srcb-shape gating (null → tier-2). Two transitional getters remain — `arrayTop()`/`arraySrcName()` (default null/"" on base, overridden by `ArrayAdapter`); the decs-specific getters were removed in G2a so the base (and thus `linq_fold_common`) is free of `DecsAdapter`/ECS coupling. One decorator subclass lives in `linq_fold_common`: `ProjectedSourceAdapter` wraps any inner adapter to absorb a leading `_select(f)` source projection (the `srcsel` slot) — it binds `projName = f(rawElem)` atop the per-element body and delegates `wrap_source_loop`/`wrap_invoke`/`name_prefix` to the inner adapter, leaving the base no-op `arrayTop`/`arraySrcName`/`can_reserve_by_length` so source-direct fast paths (which would bypass the projection) stay disabled. This lets order/distinct splices fuse over `source |> _select(f) |> …` for any source.
 
-**Realized module layout (post-G3d):** `linq_fold_common` (kernel + abstract base + adapter-pure generic lanes — terminator/fold-array plus the source-generic loop_or_count / counter / accumulator / early-exit lanes, with `LoopDispatch` + the per-op `!supports_direct_return` state path that lets nested-callback sources ride the early-exit lane — + `splice_patterns` + `DecsBridgeShape`/`extract_decs_bridge`) ← `linq_fold_array` (Array/Zip/ArrayJoin adapters + the zip/join emit `emit_zip`/`emit_array_join` + array row-builders) and `linq_fold_decs` (Decs/DecsJoin adapters + decs-bridge visitors + the decs dispatcher `emit_loop_or_count_lane_decs` + the decs-specific hooks `emit_decs_count_archsize`/`emit_decs_reverse_skip_into_tail`/`emit_decs_join_impl`/`emit_decs_min_max_by` — the parallel terminator scaffold is gone, decs rides the generic lanes via `DecsAdapter`); the engine `linq_fold` requires all three and holds only the dispatcher + the `LinqFold` macro + the single `register_all_linq_fold_rows`. Adding a source = a new `linq_fold_<src>.das` subclass module + one `require` + one `build_<src>_rows()` call in the engine registrar.
+**Realized module layout (post-G3d):** `linq_fold_common` (kernel + abstract base + adapter-pure generic lanes — terminator/fold-array plus the source-generic loop_or_count / counter / accumulator / early-exit lanes, with `LoopDispatch` + the per-op `!supports_direct_return` state path that lets nested-callback sources ride the early-exit lane — + `splice_patterns` + `DecsBridgeShape`/`extract_decs_bridge`) ← `linq_fold_array` (Array/Zip/ArrayJoin adapters + the zip/join emit `emit_zip`/`emit_array_join` + array row-builders) and `linq_fold_decs` (Decs/DecsJoin adapters + decs-bridge visitors + the decs dispatcher `emit_loop_or_count_lane_decs` + the decs-specific hooks `emit_decs_count_archsize`/`emit_decs_reverse_skip_into_tail`/`emit_decs_join_impl`/`emit_decs_min_max_by` — the parallel terminator scaffold is gone, decs rides the generic lanes via `DecsAdapter`); the engine `linq_fold` requires all three and holds only the dispatcher + the `LinqFold` macro + the single `register_all_linq_fold_rows`. Adding a source = a new `linq_fold_<src>.das` subclass module + one `require` + one `build_<src>_rows()` call in the engine registrar. Later sources follow that recipe: `linq_fold_json` (`JsonAdapter`/`JsonJoinAdapter`), `pugixml/linq_fold_xml` (`XmlAdapter`, optional), `sqlite/linq_fold_sql` (pass-through detector), and `linq_fold_table` (`TableAdapter` over `each_kv`/`keys`/`values` heads — kv usage-pruned slot walks, no new rows; arc plan in `benchmarks/sql/LINQ_TO_TABLE.md`).
 
 ## Goal
 
diff --git a/daslib/linq_fold_table.das b/daslib/linq_fold_table.das
new file mode 100644
index 000000000..b089bb0b6
--- /dev/null
+++ b/daslib/linq_fold_table.das
@@ -0,0 +1,212 @@
+options gen2
+options indenting = 4
+options no_unused_block_arguments = false
+options no_unused_function_arguments = false
+options _comment_hygiene = true
+
+module linq_fold_table shared public
+
+//! linq_fold table source adapter: ``TableAdapter`` + ``extract_table_source``. Lets ``_fold`` over an
+//! ``each_kv(tab)`` / ``keys(tab)`` / ``values(tab)`` chain emit a fused slot-walk loop over the table
+//! instead of riding the generic iterator tier. The kv lane scans which sides of the pair the chain
+//! touches and prunes the walk to a keys-only / values-only single iterator (the table analog of XML
+//! field-pruning); bare ``count()`` folds to O(1) ``length(tab)``; a plain ``distinct`` over raw keys/kv
+//! elements is dropped (keys are unique by construction). In-tree companion to ``daslib/linq_fold``
+//! (required unconditionally; the matcher returns null for non-table chains). Emits ``keys`` / ``values``
+//! / the kv zip BY NAME at the user's splice site. See benchmarks/sql/LINQ_TO_TABLE.md.
+
+require daslib/ast_boost
+require daslib/ast_match
+require daslib/templates_boost
+require daslib/linq_fold_common public
+
+enum TableLane {
+    KV
+    KEYS
+    VALUES
+}
+
+// Per-element table walk; the KV lane prunes to the cheapest iterator set from the body's
+// `it.key` / `it.value` usage (the table analog of XML field-pruning, see LINQ_TO_TABLE.md).
+[macro_function]
+def private build_table_walk(lane : TableLane; srcName, bindName : string; var body : Expression?;
+                             var breakGuard : Expression?; at : LineInfo) : Expression? {
+    var inner : array<Expression?>
+    if (breakGuard != null) {
+        inner |> push <| qmacro_expr() {
+            break if ($e(breakGuard))
+        }
+    }
+    // Literal loop-var names below: the qmacro grammar only allows $i() in the FIRST iterator slot of a multi-source for, so the zip header uses fixed names (same trade ZipAdapter makes with itA/itB) — they live only inside the generated invoke. keys() deliberately yields NON-const elements (writable temp copies), so the engine-visible bind is a `let` rebind of the loop var — keys are workhorse types, the copy is free, and downstream ==const composition (push_clone of a bare projected key) needs the const.
+    if (lane == TableLane.KEYS) {
+        inner |> push <| qmacro_expr() {
+            let $i(bindName) = _tab_kv_key_
+        }
+        inner |> push(body)
+        return <- qmacro_block() {
+            for (_tab_kv_key_ in keys($i(srcName))) {
+                $b(inner)
+            }
+        }
+    }
+    if (lane == TableLane.VALUES) {
+        // values over the const table param yield `V& const` — bind directly, no rebind copy
+        inner |> push(body)
+        return <- qmacro_block() {
+            for ($i(bindName) in values($i(srcName))) {
+                $b(inner)
+            }
+        }
+    }
+    // KV lane
+    let vName = "_tab_kv_value_"
+    var (allUsed, usedFields) = collect_row_usage(body, bindName)
+    if (allUsed) {
+        inner |> push <| qmacro_expr() {
+            let $i(bindName) = (key = _tab_kv_key_, value = _tab_kv_value_)
+        }
+        inner |> push(body)
+        return <- qmacro_block() {
+            for (_tab_kv_key_, _tab_kv_value_ in keys($i(srcName)), values($i(srcName))) { // nolint:LINT002 parallel-for must bind both vars
+                $b(inner)
+            }
+        }
+    }
+    let useKey = usedFields |> has_value("key")
+    let useValue = usedFields |> has_value("value")
+    let kLocal = qn("tkey", at)
+    var fieldToLocal <- { "key" => kLocal, "value" => vName }
+    body = flatten_row_to_locals(body, bindName, fieldToLocal)
+    if (useKey) {
+        inner |> push <| qmacro_expr() {
+            let $i(kLocal) = _tab_kv_key_
+        }
+    }
+    inner |> push(body)
+    if (useValue && !useKey) {
+        return <- qmacro_block() {
+            for ($i(vName) in values($i(srcName))) {
+                $b(inner)
+            }
+        }
+    }
+    if (useKey && useValue) {
+        return <- qmacro_block() {
+            for (_tab_kv_key_, _tab_kv_value_ in keys($i(srcName)), values($i(srcName))) { // nolint:LINT002 parallel-for must bind both vars
+                $b(inner)
+            }
+        }
+    }
+    // key-only and no-field-touched bodies (e.g. a bare count walk) both ride the cheaper single keys iterator
+    return <- qmacro_block() {
+        for (_tab_kv_key_ in keys($i(srcName))) { // nolint:LINT002 body may not read the key (bare-count walk)
+            $b(inner)
+        }
+    }
+}
+
+// Single flat for-loop over the table inside a 1-param invoke binding the table, like ArrayAdapter over
+// an array — except the loop iterator(s) derive from the table param at the splice site.
+class TableAdapter : SourceAdapter {
+    tabExpr  : Expression?      // the table expression (argument of each_kv/keys/values)
+    srcName  : string           // invoke param name binding the table
+    elemType : TypeDeclPtr      // KV: tuple<key:K;value:V>; KEYS: K; VALUES: V
+    lane     : TableLane
+    def override name_prefix() : string {
+        return "tab_"
+    }
+    def override supports_direct_return() : bool {
+        return true   // single flat for-loop inside the invoke; a mid-loop `return` exits the invoke
+    }
+    def override can_reserve_by_length() : bool {
+        return true   // length(tab) is O(1); the shared reserve hint reads arrayTop/arraySrcName
+    }
+    def override arrayTop() : Expression? {
+        // Feeds the reserve hint (type_has_length covers tables). The backward-index reverse lanes that
+        // also read arrayTop gate on array_source, which is false here — matchTop stays iterator-typed.
+        return tabExpr
+    }
+    def override arraySrcName() : string {
+        return srcName
+    }
+    def override bind_name(at : LineInfo) : string {
+        return qn("it", at)
+    }
+    def override element_type() : TypeDeclPtr {
+        return clone_type(elemType)
+    }
+    def override count_shortcut(opName : string; at : LineInfo) : Expression? {
+        return emit_length_shortcut(opName, tabExpr, srcName, at)
+    }
+    def override wrap_source_loop(loopShape : LoopDispatch; var body : Expression?; at : LineInfo) : Expression? {
+        return build_table_walk(lane, srcName, bind_name(at), body, null, at)
+    }
+    def override emit_distinct_take_loop(bindName : string; takenName : string; takeLimName : string; var perElement : Expression?; at : LineInfo) : Expression? {
+        var breakGuard = qmacro($i(takenName) >= $i(takeLimName))
+        return build_table_walk(lane, srcName, bindName, perElement, breakGuard, at)
+    }
+    def override wrap_invoke(var stmts : array<Expression?>; retType : TypeDeclPtr; wrapIter : bool; at : LineInfo) : Expression? {
+        // Const-accepting param: the source table is often a `let`, and a non-const source has const added cleanly.
+        var tabType = strip_const_ref(clone_type(tabExpr._type))
+        tabType.flags.constant = true
+        var tabClone = clone_expression(tabExpr)
+        tabClone.genFlags.alwaysSafe = true
+        let sn = srcName
+        var emission : Expression?
+        if (retType != null) {
+            emission = qmacro(invoke($($i(sn) : $t(tabType)) : $t(retType) {
+                $b(stmts)
+            }, $e(tabClone)))
+        } else {
+            emission = qmacro(invoke($($i(sn) : $t(tabType)) {
+                $b(stmts)
+            }, $e(tabClone)))
+        }
+        emission = finalize_invoke(emission, at)
+        if (wrapIter) {
+            emission = qmacro($e(emission).to_sequence_move())
+            emission.force_generated(true)
+        }
+        return emission
+    }
+}
+
+// Recognize an `each_kv(tab)` / `keys(tab)` / `values(tab)` chain top. Returns the call (caller reads
+// arguments[0] = the table, `_type.firstType` = element); null otherwise. Name + table-typed-arg match,
+// like extract_json_source — the strong arg-type gate keeps an unrelated user `keys` from firing this.
+[macro_function]
+def extract_table_source(var top : Expression?) : ExprCall? {
+    if (top == null || !(top is ExprCall)) return null
+    var c = top as ExprCall
+    let name = get_call_short_name(c)
+    if ((name != "each_kv" && name != "keys" && name != "values")
+            || c._type == null || !c._type.isIterator || c._type.firstType == null
+            || (c.arguments |> length) != 1) {
+        return null
+    }
+    let srcT = c.arguments[0]._type
+    if (srcT == null || !srcT.isGoodTableType) return null
+    return c
+}
+
+// Drop plain `distinct` over raw keys/kv elements — table keys are unique by construction, so the whole
+// dedup-set machinery is a no-op. Dropped only when every call BEFORE the distinct preserves element uniqueness
+// (filters/ranges/reorders — never a `select`, which reshapes elements). `distinct_by` keeps its own key.
+[macro_function]
+def drop_redundant_distinct(var calls : array<tuple<ExprCall?; LinqCall?>>) {
+    var dropIdx = -1
+    for (i in range(length(calls))) {
+        let name = calls[i]._1.name
+        if (name == "distinct") {
+            dropIdx = i
+            break
+        }
+        // uniqueness-preserving prefix ops only
+        if (name != "where_" && name != "skip" && name != "take"
+                && name != "skip_while" && name != "take_while" && name != "reverse") {
+            return
+        }
+    }
+    if (dropIdx < 0 || length(calls) <= 1) return   // keep a bare `distinct`-only chain for the generic lanes
+    calls |> erase(dropIdx)
+}
diff --git a/doc/source/reference/linq_fold_patterns.rst b/doc/source/reference/linq_fold_patterns.rst
index eb0db3dda..2d5bdc524 100644
--- a/doc/source/reference/linq_fold_patterns.rst
+++ b/doc/source/reference/linq_fold_patterns.rst
@@ -148,6 +148,9 @@ Source-side entry points
    * - ``unsafe(from_xml_node(node[, name], type<Row>))``
      - ``extract_xml_source`` (``XmlAdapter``, ``modules/dasPUGIXML/daslib/linq_fold_xml.das``)
      - Optional source — only when the ``pugixml`` module is linked (``require ?pugixml`` + ``static_if (typeinfo builtin_module_exists(pugixml))``). Emits an inlined DOM child-element walk replacing the generator, and **field-prunes** the per-element materialization (pass 2b): the chain body is scanned for the ``Row`` fields it reads, and only those attributes are read via ``read_xml_field`` into scalar locals — unread fields (notably ``string`` fields, whose ``clone_string`` is the alloc cost) are never touched, so a float-only chain runs alloc-free and JIT beats the equivalent SQLite query. A whole-row escape (``to_array`` / identity ``_select(_)`` / pass-to-fn) routes to the full ``build_xml_row`` instead. The ``XmlAdapter`` **rides every pattern row** (``try_splice_patterns`` runs with no ``onlyRow`` restriction); per-row ``requires`` predicates and the adapter's capability hooks (``can_join`` / ``can_group_by`` / ``defers_materialization`` / the ``non_array_source`` gate) decide what fuses, and a shape it can't fuse cascades to tier-2 — see :ref:`linq_fold_xml_patterns` for the full fuse/defer breakdown. ``unsafe`` is required (the source is ``[unsafe_outside_of_for]``) and the node is passed by value (``var root`` — ``_fold``'s macro-arg inference skips the const&→value copy).
+   * - ``unsafe(each_kv(tab))`` / ``keys(tab)`` / ``values(tab)``
+     - ``extract_table_source`` (``TableAdapter``, ``daslib/linq_fold_table.das``)
+     - In-tree source — recognized by name **plus** a table-typed argument (``table<K;V>`` / ``table<K>``), so an unrelated user ``keys`` never fires it. The kv lane (``each_kv``) binds ``kv.key`` / ``kv.value`` and **usage-prunes the walk**: a chain touching only ``.value`` walks ``values(tab)`` alone, only ``.key`` (or neither) walks ``keys(tab)`` alone — half the slot-skip work of the zipped two-iterator form, which is emitted only when both sides (or the whole pair) are read. A whole-pair escape binds a named-tuple copy, so the kv lane fuses **copyable value types only** — a non-copyable-valued ``each_kv`` falls through and the surviving instantiation concept-asserts (error 31400; the keys/values lanes still fuse such tables). Bare ``count()`` / ``long_count()`` folds to O(1) ``length(tab)``; a plain ``distinct`` over raw keys/kv elements is **dropped** before matching (keys are unique by construction; only uniqueness-preserving prefix ops allow the drop — a preceding ``select`` keeps the distinct, and the values lane always keeps it). ``order_by`` / ``take`` / ``first`` observe the table's unspecified slot order, exactly like a hand ``for (k, v in keys(t), values(t))`` loop. ``can_join`` / ``can_group_by`` are off and reverse has no backward slot walk — those shapes cascade to tier-2 (the join probe and key-lookup folds are staged: see ``benchmarks/sql/LINQ_TO_TABLE.md``). ``unsafe`` is required at an unfused chain head (the sources are ``[unsafe_outside_of_for]``); fused chains rewrite the head before inference.
    * - ``unsafe(from_json(jv, type<Row>))``
      - ``extract_json_source`` (``JsonAdapter``, ``daslib/linq_fold_json.das``)
      - In-tree source — the adapter is compiled in unconditionally (no ``static_if`` gate, unlike XML's pugixml one), but a program only pulls JSON into scope by requiring ``json`` / ``json_boost`` itself. ``extract_json_source`` matches a ``from_json`` whose first argument is a ``json::JsonValue?``, so a JSON-less program returns null and the chain falls to the array tier. The adapter pulls in **no** json dependency — it emits ``from_json`` / ``read_json_field`` by name (resolved at the user's splice site, like ``linq_fold_decs`` emits ``for_each_archetype``; ``from_JV`` is emitted only for a non-struct element type). Emits an inlined ``for (e in jv.value as _array)`` walk replacing the generator, and **field-prunes** the per-element materialization (pass 2b): only the keys the chain reads are pulled via ``read_json_field`` by name — unread keys (notably ``string`` fields whose materialization clones) are never touched, so a scalar-only chain skips ~all of the full per-row build (3.6× over the full materialize — see ``benchmarks/micro/json_source_shapes.das``). A whole-row escape reads **every** top-level field by name (``emit_full_row_by_name``), so a custom whole-row ``from_JV(Row)`` override is **not** honored (Option B — this is a flat query source, not a deserializer; materialize the array with an explicit ``from_JV`` first for that). ``unsafe`` is required (the source is ``[unsafe_outside_of_for]``). Deferred materialization mirrors XML: order/distinct/take buffer a cheap ``(orderKey, JsonValue?)`` surrogate and materialize only the K survivors — by name (``emit_full_row_by_name``), so a struct survivor reads each field by key; only a non-struct ``Row`` falls back to ``outBind <- from_JV(handle, type<Row>)``. 
The ``JsonAdapter`` also fuses ``join`` / ``join |> group_by`` (``emit_join_hook`` + ``JsonJoinAdapter`` off ``build_group_by_adapter``'s upstream-join arm), reusing the array-join machinery (``build_join_standalone_pieces`` / ``build_join_adapter_pieces``): srcB is collected into a ``table<KEY; array<TUPB>>`` and the field-pruned array walk is the probe side, so the join key reads only its own field per element (e.g. ``read_json_field(jcur, "brand", …)``). Standalone ``group_join`` and a trailing ``where`` / ``select`` / ``count`` over group-join rows defer to tier-2, mirroring XML.
diff --git a/tests/linq/test_linq_table_source.das b/tests/linq/test_linq_table_source.das
index e85407216..bcbfef726 100644
--- a/tests/linq/test_linq_table_source.das
+++ b/tests/linq/test_linq_table_source.das
@@ -2,10 +2,211 @@ options gen2
 
 require dastest/testing_boost public
 require daslib/linq_boost
+require strings
 
-// Tier-2 LINQ over a table source via each_kv (the fused TableAdapter lands separately).
-// each_kv is [unsafe_outside_of_for], so a chain head needs the explicit unsafe(...) wrap —
-// same contract as each(arr) outside a fused chain.
+// Table source (each_kv / keys / values) through _fold — fused TableAdapter lanes must agree with
+// hand loops over the same table. Slot order is unspecified but stable per table instance, so
+// order-sensitive expectations compare against a keys()/values() walk of the same table.
+
+struct Pt {
+    x : int
+    y : int
+}
+
+def make_int_table(n : int) : table<int; int> {
+    var t <- { for (i in range(n)); i => i * 10 }
+    return <- t
+}
+
+def make_pt_table : table<string; Pt> {
+    var t : table<string; Pt>
+    t |> insert("a", Pt(x = 1, y = 10))
+    t |> insert("b", Pt(x = 2, y = 20))
+    t |> insert("c", Pt(x = 3, y = 30))
+    t |> insert("d", Pt(x = 4, y = 40))
+    return <- t
+}
+
+[test]
+def test_table_fold_count_shortcuts(t : T?) {
+    t |> run("bare count is length") @(t : T?) {
+        var tab <- make_int_table(10)
+        t |> equal(_fold(each_kv(tab).count()), 10)
+        t |> equal(_fold(keys(tab).count()), 10)
+        t |> equal(_fold(values(tab).count()), 10)
+        t |> equal(_fold(each_kv(tab).long_count()), 10l)
+        delete tab
+    }
+    t |> run("count with predicate walks") @(t : T?) {
+        var tab <- make_int_table(10)
+        t |> equal(_fold(each_kv(tab)._where(_.key % 2 == 0).count()), 5)
+        t |> equal(_fold(each_kv(tab)._where(_.value > 50).count()), 4)
+        delete tab
+    }
+    t |> run("empty table") @(t : T?) {
+        let e : table<int; int>
+        t |> equal(_fold(each_kv(e).count()), 0)
+        t |> equal(_fold(each_kv(e).any()), false)
+        t |> equal(_fold(each_kv(e)._where(_.key > 0).count()), 0)
+    }
+}
+
+[test]
+def test_table_fold_accumulators(t : T?) {
+    t |> run("sum/min/max over values and keys") @(t : T?) {
+        var tab <- make_int_table(10)
+        t |> equal(_fold(values(tab).sum()), 450)
+        t |> equal(_fold(keys(tab).sum()), 45)
+        t |> equal(_fold(each_kv(tab)._select(_.value).sum()), 450)
+        t |> equal(_fold(each_kv(tab)._select(_.key).min()), 0)
+        t |> equal(_fold(each_kv(tab)._select(_.value).max()), 90)
+        // body touches both sides — zipped walk
+        t |> equal(_fold(each_kv(tab)._select(_.key + _.value).sum()), 495)
+        delete tab
+    }
+    t |> run("early exit: any/all/contains") @(t : T?) {
+        var tab <- make_int_table(10)
+        t |> equal(_fold(each_kv(tab).any()), true)
+        t |> equal(_fold(each_kv(tab)._where(_.value > 80).any()), true)
+        t |> equal(_fold(each_kv(tab)._where(_.value > 90).any()), false)
+        t |> equal(_fold(each_kv(tab)._select(_.key)._all(_ >= 0)), true)
+        t |> equal(_fold(values(tab).contains(40)), true)
+        t |> equal(_fold(values(tab).contains(41)), false)
+        delete tab
+    }
+}
+
+[test]
+def test_table_fold_to_array_agreement(t : T?) {
+    t |> run("kv where+select agrees with hand loop, in slot order") @(t : T?) {
+        var tab <- make_pt_table()
+        var expected : array<int>
+        for (_k, v in keys(tab), values(tab)) {
+            if (v.x > 1) {
+                expected |> push(v.y)
+            }
+        }
+        var got <- _fold(each_kv(tab)._where(_.value.x > 1)._select(_.value.y).to_array())
+        t |> equal(length(got), length(expected))
+        for (i in range(length(expected))) {
+            t |> equal(got[i], expected[i])
+        }
+        delete got
+        delete expected
+        delete tab
+    }
+    t |> run("keys to_array in slot order") @(t : T?) {
+        var tab <- make_pt_table()
+        var expected : array<string>
+        for (k in keys(tab)) {
+            expected |> push(k)
+        }
+        // bare `<src>.to_array()` is not a recognized chain (any source) — keep a where on it
+        var got <- _fold(keys(tab)._where(_ != "zzz").to_array())
+        t |> equal(length(got), length(expected))
+        for (i in range(length(expected))) {
+            t |> equal(got[i], expected[i])
+        }
+        delete got
+        delete expected
+        delete tab
+    }
+    t |> run("whole-kv escape: identity to_array") @(t : T?) {
+        var tab <- make_pt_table()
+        var got <- _fold(each_kv(tab)._where(_.value.x >= 3).to_array())
+        t |> equal(length(got), 2)
+        for (kv in got) {
+            let expectedPt = tab?[kv.key] ?? Pt()
+            t |> equal(expectedPt.x, kv.value.x)
+            t |> equal(expectedPt.y, kv.value.y)
+        }
+        delete got
+        delete tab
+    }
+}
+
+[test]
+def test_table_fold_order_distinct_take(t : T?) {
+    t |> run("order_by key descending") @(t : T?) {
+        var tab <- make_int_table(10)
+        var got <- _fold(each_kv(tab)._select(_.key).order_by_descending(@(k : int) => k).to_array())
+        t |> equal(length(got), 10)
+        for (i in range(10)) {
+            t |> equal(got[i], 9 - i)
+        }
+        delete got
+        delete tab
+    }
+    t |> run("redundant distinct over keys/kv is dropped but correct") @(t : T?) {
+        var tab <- make_int_table(10)
+        t |> equal(_fold(each_kv(tab).distinct().count()), 10)
+        t |> equal(_fold(each_kv(tab)._where(_.key > 4).distinct().count()), 5)
+        t |> equal(_fold(keys(tab).distinct().count()), 10)
+        delete tab
+    }
+    t |> run("values distinct stays real") @(t : T?) {
+        var dup : table<int; int>
+        dup |> insert(1, 7)
+        dup |> insert(2, 7)
+        dup |> insert(3, 8)
+        t |> equal(_fold(values(dup).distinct().count()), 2)
+        t |> equal(_fold(each_kv(dup)._select(_.value).distinct().count()), 2)
+        delete dup
+    }
+    t |> run("take/skip ride the walk") @(t : T?) {
+        var tab <- make_int_table(10)
+        t |> equal(_fold(each_kv(tab)._select(_.value).take(3).count()), 3)
+        t |> equal(_fold(each_kv(tab)._select(_.key).skip(4).count()), 6)
+        var firstTwo <- _fold(keys(tab).take(2).to_array())
+        var expected : array<int>
+        for (k in keys(tab)) {
+            if (length(expected) < 2) {
+                expected |> push(k)
+            }
+        }
+        t |> equal(length(firstTwo), 2)
+        t |> equal(firstTwo[0], expected[0])
+        t |> equal(firstTwo[1], expected[1])
+        delete firstTwo
+        delete expected
+        delete tab
+    }
+    t |> run("first / first_or_default") @(t : T?) {
+        var tab <- make_int_table(10)
+        t |> equal(_fold(each_kv(tab)._where(_.key == 7)._select(_.value).first()), 70)
+        t |> equal(_fold(each_kv(tab)._where(_.key == 99)._select(_.value).first_or_default(-1)), -1)
+        delete tab
+    }
+}
+
+[test]
+def test_table_fold_iterator_result(t : T?) {
+    t |> run("chain consumed as iterator") @(t : T?) {
+        var tab <- make_int_table(6)
+        var s = 0
+        for (v in _fold(each_kv(tab)._where(_.key % 2 == 0)._select(_.value))) {
+            s += v
+        }
+        t |> equal(s, 0 + 20 + 40)
+        delete tab
+    }
+}
+
+[test]
+def test_table_fold_set_form(t : T?) {
+    t |> run("keys over table<K> set") @(t : T?) {
+        var s : table<string>
+        s |> insert("alpha")
+        s |> insert("beta")
+        s |> insert("gamma")
+        t |> equal(_fold(keys(s).count()), 3)
+        t |> equal(_fold(keys(s)._where(_ |> length() > 4).count()), 2)
+        delete s
+    }
+}
+
+// Tier-2 over the raw each_kv iterator (no _fold) — the [unsafe_outside_of_for] contract requires the
+// explicit unsafe(...) wrap at a bare chain head; fused chains rewrite the head before inference.
 
 [test]
 def test_each_kv_tier2(t : T?) {

From c00f655c29c16fd00a1b5e8fc40e7295ed7ebff3 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Thu, 11 Jun 2026 00:10:11 -0700
Subject: [PATCH 04/11] linq-table arc: link #3096 (qmacro multi-source for $i
 limitation) in the plan doc

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 benchmarks/sql/LINQ_TO_TABLE.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/benchmarks/sql/LINQ_TO_TABLE.md b/benchmarks/sql/LINQ_TO_TABLE.md
index 72acd2be6..92483eae5 100644
--- a/benchmarks/sql/LINQ_TO_TABLE.md
+++ b/benchmarks/sql/LINQ_TO_TABLE.md
@@ -14,6 +14,8 @@ Stage 2 findings:
   the tier-2 cells stages 4–5 erase.
 - The qmacro grammar only allows `$i()` in the FIRST iterator slot of a multi-source `for` — the
   kv zip header uses literal `_tab_kv_key_` / `_tab_kv_value_` names (ZipAdapter's itA/itB trade).
+  Filed as [#3096](https://github.com/GaijinEntertainment/daScript/issues/3096) (grammar fix
+  and/or a templates_boost loop-builder helper).
 - `keys()` yields NON-const elements (writable temp copies) — the engine-visible bind is a `let`
   rebind (workhorse copy, free); push_clone's `==const` composition needs it.
 - `keys`/`each_kv` spell their element `-const` (iterator variance); the dispatcher clears

From 29d23baf6dada5ce73c4638048eb799b961d014f Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Thu, 11 Jun 2026 00:54:21 -0700
Subject: [PATCH 05/11] =?UTF-8?q?linq=5Fdas:=20tables=20as=20%linq!=20sour?=
 =?UTF-8?q?ces=20=E2=80=94=20untyped=20`from`=20dispatches=20via=201-arg?=
 =?UTF-8?q?=20from=5Fin?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`from kv in tab` over table<K;V> → each_kv (kv.key/kv.value), table<K> set →
keys, anything else → each (arrays unchanged, ast-verified identical emission).
The reader can't tell an array from a table, so every untyped fused source now
emits `from_in(src)` and FromInMacro dispatches by the inferred value type.
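
A minimal sketch of the new surface, modeled on the tests added in this patch
(test_table_kv_where_select); the `Car` struct and the table contents here are
illustrative only, and the requires mirror tests/linq/failed_linq_das_table.das:

```das
options gen2

require daslib/linq_das
require daslib/linq_fold

// Illustrative row type, not part of daslib.
struct Car {
    name  : string
    price : int
}

[export]
def main() {
    var tab : table<int; Car>
    tab |> insert(1, Car(name = "cheap", price = 50))
    tab |> insert(2, Car(name = "mid", price = 150))
    tab |> insert(3, Car(name = "lux", price = 300))
    // Untyped `from` over table<K;V>: the reader emits from_in(tab), which
    // dispatches to each_kv, so each row is a (key, value) pair.
    let prices <- %linq! from kv in tab where kv.value.price > 100 select kv.value.price %%
    // Slot order is unspecified, so only order-insensitive facts are checked.
    print("rows with price > 100: {length(prices)}\n")
}
```

With this data the equivalent test in the patch expects two rows (mid and lux);
add an `orderby` if the result order matters.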

FromInMacro's reject arms switch from `return call` to macro_error + return null (the
_sql idiom): returning the call reports the AST as changed on every pass and churns
to the 50-pass infer cap (30507). The not-inferred arm is now also gated on isAutoOrAlias
and doubles as the defer for local sources whose type settles a pass later.

Joins over tables already work on either side at tier-2 (tested both ways);
cross/SelectMany over tables stays a named deferred edge in LINQ_TO_TABLE.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 benchmarks/sql/LINQ_TO_TABLE.md      |  29 ++++++-
 daslib/linq_das.das                  |  81 +++++++++++++------
 doc/source/reference/linq_das.rst    |  54 +++++++++----
 tests/linq/failed_linq_das_table.das |  29 +++++++
 tests/linq/test_linq_das.das         | 115 +++++++++++++++++++++++++++
 5 files changed, 267 insertions(+), 41 deletions(-)
 create mode 100644 tests/linq/failed_linq_das_table.das

diff --git a/benchmarks/sql/LINQ_TO_TABLE.md b/benchmarks/sql/LINQ_TO_TABLE.md
index 92483eae5..10a62d34b 100644
--- a/benchmarks/sql/LINQ_TO_TABLE.md
+++ b/benchmarks/sql/LINQ_TO_TABLE.md
@@ -4,7 +4,28 @@ Sibling of [LINQ.md](LINQ.md) / [LINQ_TO_DECS.md](LINQ_TO_DECS.md). Plan of reco
 `table<K;V>` / `table<K>` as the 6th `_fold` source, plus the `to_table` sink.
 Edited in-place as PRs land.
 
-Status: **stage 2 committed** (TableAdapter + m7; stage 1 = `each_kv` builtin, 8751bb9ba).
+Status: **stage 3 committed** (`%linq!` table sources; stage 2 = TableAdapter + m7, 571fe879e;
+stage 1 = `each_kv` builtin, 8751bb9ba).
+
+Stage 3 findings:
+- The untyped `from c in <src>` now emits the **1-arg `from_in(src)`** for every source (the reader
+  can't tell an array from a table); FromInMacro dispatches at infer time — `table<K;V>` →
+  `each_kv`, `table<K>` set → `keys`, anything else → `each` (arrays land on the identical fused
+  emission as before, ast-verified). The `unsafe($c(...))` qmacro form puts `alwaysSafe` on the call
+  itself (templates_boost `carry_tag_safe_flags`), so extractors/peel_each still see a bare ExprCall.
+- **Call-macro reject/defer idiom**: macro_error + `return null` (the `_sql` idiom). Returning
+  `call` after an error reports the AST as changed on every pass and churns to the 50-pass infer cap (30507).
+  All FromInMacro rejects switched to null; the "source type is not inferred" arm (now also gated on
+  `isAutoOrAlias`) doubles as the DEFER — errors clear per pass while other inference progresses
+  (a local `var tab <- {...}` source reaches the visit before its own type settles), and only stick
+  if the source genuinely never infers.
+- A rejecting `from_in` leaves the chain head unresolved, so `_fold`'s "expecting linq expression"
+  verify lands on the same generated line with the same cerr — the error report collapses the pair
+  to ONE 50503 (`+1 more on this line`); failed-test `expect` counts are post-collapse.
+- **Joins over tables already work on either side** (tier-2; the kv pair is that side's row) —
+  tested both directions. Stage 5's probe will optimize the table-as-srcB case.
+- The non-copyable-value gate composes through the reader unchanged: fused dispatch declines,
+  tier-2 instantiates the real `each_kv`, one clean 31400.
 
 Stage 2 findings:
 - m7 INTERP profile (2026-06-10 sweep): pruned scans sit between array and XML — `sum_aggregate`
@@ -120,6 +141,12 @@ End of arc: `skills/linq.md` + linq docs mention the table source.
 
 ## Deferred edges (named, not built)
 
+- **Multiple-`from` (cross / SelectMany) over tables**: the unfused `_cross_join` arm passes the
+  bare source text so the array×array overload resolves without an `each` unsafe trip; a table
+  there has no overload (confusing 30303 cascade). `cross_join` has iterator overloads, so routing
+  the unfused untyped sources through `from_in` would work — but it changes overload selection for
+  every existing untyped array cross query. Documented as unsupported (join a table instead);
+  revisit on demand.
 - **Key-as-handle deferred materialization**: for `order_by` over kv with large (copyable)
   values, buffer `(orderKey, key)` surrogates and materialize survivors via `tab?[key]` — K
   probes instead of N value copies. The table handle is its key; clean fit for the existing
diff --git a/daslib/linq_das.das b/daslib/linq_das.das
index b0fa73a79..4162b6f2e 100644
--- a/daslib/linq_das.das
+++ b/daslib/linq_das.das
@@ -14,17 +14,21 @@ require daslib/linq_boost public
 // The `%linq! … %%` inline reader macro rewrites a C#-like query into a `_fold(...)` chain:
 //
 //     %linq! from c in cars where c.price > 100 orderby c.price select c.name %%
-//        →   ( _fold( each(cars) |> _where($(c) => c.price > 100) |> _order_by($(c) => c.price) |> _select($(c) => c.name) |> to_array() ) )
+//        →   ( _fold( from_in(cars) |> _where($(c) => c.price > 100) |> _order_by($(c) => c.price) |> _select($(c) => c.name) |> to_array() ) )
 //     %linq! from c in cars group c by c.brand %%
-//        →   ( _fold( each(cars) |> _group_by_lazy($(c) => c.brand) |> to_array() ) )
+//        →   ( _fold( from_in(cars) |> _group_by_lazy($(c) => c.brand) |> to_array() ) )
 //     %linq! from c in cars join d in dealers on c.brand equals d.brand select (N = c.name, City = d.city) %%
-//        →   ( _fold( each(cars) |> _join(each(dealers), $(c, d) => c.brand == d.brand, $(c, d) => (N = c.name, City = d.city)) |> to_array() ) )
+//        →   ( _fold( from_in(cars) |> _join(from_in(dealers), $(c, d) => c.brand == d.brand, $(c, d) => (N = c.name, City = d.city)) |> to_array() ) )
 //     %linq! from c in cars from d in dealers select (N = c.name, City = d.city) %%
 //        →   ( _fold( (cars) |> _cross_join((dealers), $(c, d) => (N = c.name, City = d.city)) |> to_array() ) )
 //
-// An untyped `from c in <arr>` is an array source. A typed `from c : Row in <src>` selects a non-array
-// source: `decs` (a keyword marker → `from_decs_template`), or any value whose type the `from_in` call
-// macro dispatches (a SQL runner → `select_from`, an XML node → `from_xml_node`). The range variable is
+// (`from_in(arr)` resolves during inference to `each(arr)` — see FromInMacro at the bottom.)
+//
+// An untyped `from c in <src>` is an array or table source — the 1-arg `from_in` call macro dispatches
+// by the value type (array → `each`, `table<K;V>` → `each_kv` with `kv.key`/`kv.value`, `table<K>` set →
+// `keys`). A typed `from c : Row in <src>` selects a row-typed source: `decs` (a keyword marker →
+// `from_decs_template`), or any value whose type the 2-arg `from_in` dispatches (a SQL runner →
+// `select_from`, an XML node → `from_xml_node`). The range variable is
 // spliced verbatim as the block parameter. `orderby <expr> [descending]` is a single sort key. `group c
 // by <key>` is a terminal yielding `tuple<key; array<elem>>` per bucket — IGrouping, in-memory sources
 // only (over SQL it errors: SQL needs an aggregating projection). A trailing `iterator` keyword yields an
@@ -234,13 +238,14 @@ def private strip_trailing_keyword(text : string; kw : string) : tuple<string; b
 }
 
 // Source builder for a clause. `fused` distinguishes a source that will be a fused `_fold` chain head /
-// fused-op source (single-`from`, `join` — keep `each` so the fusion picks the array up as a loop source)
-// from an UNFUSED operator argument (the uncorrelated multiple-`from`, whose `_cross_join` runs at tier-3
-// passthrough — pass the bare array so `each`'s `[unsafe_outside_of_for]` does not trip and the array×array
-// cross_join overload is selected). array (untyped) → each(...) / bare; decs (keyword marker) →
-// from_decs_template; any other typed value source → from_in (dispatches by the source value's type).
+// fused-op source (single-`from`, `join`) from an UNFUSED operator argument (the uncorrelated
+// multiple-`from`, whose `_cross_join` runs at tier-3 passthrough — pass the bare array so `each`'s
+// `[unsafe_outside_of_for]` does not trip and the array×array cross_join overload is selected). An
+// untyped fused source goes through the 1-arg `from_in` (the reader can't tell an array from a table;
+// the call macro dispatches by the inferred value type: table → each_kv/keys, anything else → each);
+// decs (keyword marker) → from_decs_template; a typed value source → 2-arg from_in.
 def private build_src(rowType : string; srcText : string; fused : bool) : string {
-    if (rowType == "") return fused ? "each({srcText})" : "({srcText})"
+    if (rowType == "") return fused ? "from_in({srcText})" : "({srcText})"
     if (srcText == "decs") return "from_decs_template(type<{rowType}>)"
     return "from_in({srcText}, type<{rowType}>)"
 }
@@ -1230,20 +1235,45 @@ def private transpile_query(query : string; prog : ProgramPtr; at : LineInfo) :
 
 [call_macro(name="from_in")]
 class FromInMacro : AstCallMacro {
-    //! Typed-source dispatcher for the C# query form `from c : Row in <src>`. Rewrites
-    //! `from_in(src, type<Row>)` to the concrete `_fold` source builder by `src`'s type — a SQL runner
-    //! → `select_from`, an XML node → `from_xml_node`, a JSON value → `from_json`. Must be a call macro:
-    //! a plain function would leave `from_in(...)` at the chain head, which `_fold`'s name-based source
+    //! Source dispatcher for the C# query `from` clause. The typed form `from c : Row in <src>` arrives
+    //! as `from_in(src, type<Row>)` and rewrites to the concrete `_fold` source builder by `src`'s type —
+    //! a SQL runner → `select_from`, an XML node → `from_xml_node`, a JSON value → `from_json`. The
+    //! UNTYPED form `from c in <src>` arrives as `from_in(src)` — a table → `each_kv` (`keys` for the
+    //! `table<K>` set form), anything else → `each` (the array path). Must be a call macro: a plain
+    //! function would leave `from_in(...)` at the chain head, which `_fold`'s name-based source
     //! detection cannot route.
     def override visit(prog : ProgramPtr; mod : Module?; var call : ExprCallMacro?) : ExpressionPtr {
-        if (length(call.arguments) != 2) {
-            macro_error(prog, call.at, "from_in(src, type<Row>): expected 2 arguments")
-            return call
+        // Every reject below is macro_error + return null (the `_sql` idiom): infer stabilizes and the
+        // error sticks. Returning `call` instead would report the AST as changed on every pass and churn to the
+        // 50-pass cap (30507). The not-inferred arm doubles as the DEFER: errors clear per pass, so while
+        // other inference still progresses (e.g. a local `var tab <- {...}` source settling) the error is
+        // discarded and the macro re-runs; it only sticks if the source genuinely never infers.
+        if (length(call.arguments) != 1 && length(call.arguments) != 2) {
+            macro_error(prog, call.at, "from_in(src[, type<Row>]): expected 1 or 2 arguments")
+            return null
         }
         let srcT = call.arguments[0]._type
-        if (srcT == null) {
+        if (srcT == null || srcT.isAutoOrAlias) {
             macro_error(prog, call.at, "from_in: source type is not inferred")
-            return call
+            return null
+        }
+        if (length(call.arguments) == 1) {
+            // Untyped `from c in <src>` — table sources dispatch on the value type; anything else keeps
+            // the historical array emit (`each` carries its own diagnostics for non-iterable sources).
+            // `unsafe(...)` over a $c tag lands as alwaysSafe on the call itself ([unsafe_outside_of_for]
+            // heads are fine fused or unfused), and extractors/peel_each see the bare ExprCall.
+            if (srcT.isGoodTableType) {
+                if (srcT.secondType == null || srcT.secondType.baseType == Type.tVoid)
+                    return qmacro(unsafe($c("keys")($e(call.arguments[0]))))
+                return qmacro(unsafe($c("each_kv")($e(call.arguments[0]))))
+            }
+            return qmacro(unsafe($c("each")($e(call.arguments[0]))))
+        }
+        // A table source carries its row shape (tuple<key;value> / the key type) — the annotation has
+        // nothing to add and the typed builders below would all mis-fire. Reject early with the fix.
+        if (srcT.isGoodTableType) {
+            macro_error(prog, call.at, "linq: a table source takes no row-type annotation — write `from kv in <tab>` (kv.key / kv.value)")
+            return null
         }
         // SQL: db is a sqlite_boost::SqlRunner → select_from(db, type<Row>)
         if (srcT.structType != null && srcT.structType.name == "SqlRunner"
@@ -1260,8 +1290,8 @@ class FromInMacro : AstCallMacro {
                 && srcT.firstType.structType.name == "JsonValue"
                 && srcT.firstType.structType._module != null && srcT.firstType.structType._module.name == "json")
             return qmacro(unsafe($c("from_json")($e(call.arguments[0]), $e(call.arguments[1]))))
-        macro_error(prog, call.at, "linq: unsupported source for `from c : Row in <src>` — expected a SQL runner, an XML node, or a JSON value (use `in decs` for decs, or an array for the untyped `from c in <arr>` form)")
-        return call
+        macro_error(prog, call.at, "linq: unsupported source for `from c : Row in <src>` — expected a SQL runner, an XML node, or a JSON value (use `in decs` for decs; arrays and tables use the untyped `from c in <src>` form)")
+        return null
     }
 }
 
@@ -1269,8 +1299,9 @@ class FromInMacro : AstCallMacro {
 class LinqDasReader : AstReaderMacro {
     //! C#-style LINQ query reader macro.
     //! ``%linq! from c [: Row] in src [where <pred>] [ join d [: RowB] in B on <kA> equals <kB> | from d [: RowB] in B ] [orderby <expr> [descending]] ( select <proj> | group c by <key> ) [iterator] %%``
-    //! rewrites to a ``_fold(...)`` chain. Sources: array (untyped `from c in arr`), or a typed
-    //! ``from c : Row in src`` over decs / SQL / XML / JSON. ``orderby`` is a single sort key; ``group c by <key>``
+    //! rewrites to a ``_fold(...)`` chain. Sources: array or table (untyped `from c in src`, dispatched
+    //! by value type — a ``table<K;V>`` binds ``kv.key``/``kv.value`` pairs, a ``table<K>`` set binds
+    //! keys), or a typed ``from c : Row in src`` over decs / SQL / XML / JSON. ``orderby`` is a single sort key; ``group c by <key>``
     //! is a terminal yielding ``tuple<key; array<elem>>`` buckets (in-memory sources only). A second range
     //! variable comes from either a ``join`` (single inner equi-join) or a second ``from`` (SelectMany):
     //! uncorrelated (``from d in B``) is the cross product, emitting ``_cross_join`` (pushes down to a SQL
diff --git a/doc/source/reference/linq_das.rst b/doc/source/reference/linq_das.rst
index 183053e30..6d699c139 100644
--- a/doc/source/reference/linq_das.rst
+++ b/doc/source/reference/linq_das.rst
@@ -20,7 +20,10 @@ rewrites to, and is re-parsed in place as:
 
 .. code-block:: das
 
-    var names <- ( _fold( each(cars) |> _where($(c) => c.price > 100) |> _select($(c) => c.name) |> to_array() ) )
+    var names <- ( _fold( from_in(cars) |> _where($(c) => c.price > 100) |> _select($(c) => c.name) |> to_array() ) )
+
+where ``from_in(cars)`` resolves during inference to ``each(cars)`` (see
+`Sources`_ — for a table source it resolves to ``each_kv`` / ``keys`` instead).
 
 The macro lives in the lexer's inline reader-macro slot (``%name!``), so a
 query is an ordinary expression — it can be assigned, passed as an argument, or
@@ -45,7 +48,9 @@ between body clauses** — it is inlined away before the rest is parsed (see
 :ref:`linq_das_let`):
 
 - ``from <var> in <source>`` — the element bind ``<var>`` names the per-row
-  value. With no type annotation, ``<source>`` is an ``array<T>``.
+  value. With no type annotation, ``<source>`` is an ``array<T>`` or a table —
+  a ``table<K;V>`` binds read-only ``(key, value)`` pairs (``kv.key`` /
+  ``kv.value``), a ``table<K>`` set binds its keys.
 - ``let <name> = <expr>`` — optional, repeatable, and free to appear between any
   body clauses; binds a computed value reused in the clauses that follow it (see
   :ref:`linq_das_let`).
@@ -78,16 +83,23 @@ Clauses may span multiple lines inside the ``%linq! … %%`` body.
 Sources
 -------
 
-An **untyped** ``from c in <arr>`` is an array source. A **typed** range
-variable ``from c : Row in <src>`` selects a non-array source — the row type
-``Row`` is supplied on the range variable (C#-faithful ``from Type c in src``)
-because the source value alone does not carry it:
+An **untyped** ``from c in <src>`` is an array or table source, dispatched by
+the source value's type. A **typed** range variable ``from c : Row in <src>``
+selects a row-typed source — the row type ``Row`` is supplied on the range
+variable (C#-faithful ``from Type c in src``) because the source value alone
+does not carry it:
 
 .. code-block:: das
 
     // array (untyped) — `each(arr)`
     var a <- %linq! from c in cars where c.price > 100 select c.name %%
 
+    // table<K;V> (untyped) — `each_kv(tab)`: read-only (key, value) pairs, fused by the TableAdapter
+    var t <- %linq! from kv in carsById where kv.value.price > 100 select kv.value.name %%
+
+    // table<K> set (untyped) — `keys(s)`
+    var k <- %linq! from id in soldIds where id > 100 select id %%
+
     // decs — the `decs` keyword marker → `from_decs_template(type<CarComp>)`
     var d <- %linq! from c : CarComp in decs where c.price > 100 select c.name %%
 
@@ -100,12 +112,18 @@ because the source value alone does not carry it:
     // JSON — a JsonValue? array → `from_json`, fused by the JsonAdapter
     var j <- %linq! from c : Car in carsJson where c.price > 100 select c.name %%
 
-For value sources (SQL, XML, JSON) the reader emits
-``from_in(<src>, type<Row>)``; the ``from_in`` call macro dispatches on the
-source value's type to the concrete builder (so a new backend is a new
-``from_in`` branch, never a parser change). ``decs`` has no source value, so it
-is emitted directly as ``from_decs_template`` and never goes through
-``from_in``. The row type's required annotation depends on the source —
+Untyped sources go through the 1-arg ``from_in(<src>)``, typed value sources
+(SQL, XML, JSON) through ``from_in(<src>, type<Row>)``; the ``from_in`` call
+macro dispatches on the source value's type to the concrete builder (so a new
+backend is a new ``from_in`` branch, never a parser change). ``decs`` has no
+source value, so it is emitted directly as ``from_decs_template`` and never
+goes through ``from_in``. A table element is the ``each_kv`` named tuple
+``(key, value)`` — both fields are **copies** (read-only view), the value type
+must be copyable (a ``table<K; array<T>>`` source rejects at compile time —
+see :ref:`the table source row in linq_fold_patterns <linq_fold_patterns>`),
+and slot order is unspecified, so add an ``orderby`` when the result order
+matters. A table source takes no row-type annotation (its element shape comes
+from the table type itself). The row type's required annotation depends on the source —
 ``[decs_template]`` for decs, ``[sql_table]`` / ``[sql_view]`` for SQL, a plain
 struct for XML and JSON. The JSON source is a ``JsonValue?`` holding a JSON
 **array** of objects (``from c : Car in jv["cars"]`` descends into a nested
@@ -142,7 +160,7 @@ own ``_where`` filter, AND-folded in source order:
 
     // two predicates — both apply
     var names <- %linq! from c in cars where c.price > 100 where c.brand == "eco" select c.name %%
-    // expands to: _fold( each(cars) |> _where($(c) => c.price > 100) |> _where($(c) => c.brand == "eco") |> _select($(c) => c.name) |> to_array() )
+    // expands to: _fold( from_in(cars) |> _where($(c) => c.price > 100) |> _where($(c) => c.brand == "eco") |> _select($(c) => c.name) |> to_array() )
 
 Over a **SQL** source the predicates push down as one ANDed ``WHERE`` (a single
 statement, no intermediate materialize). On a two-source query (``join`` / second
@@ -339,8 +357,10 @@ Join
 
 ``join <var2> [ : <Row2> ] in <src2> on <keyA> equals <keyB>`` adds a single
 **inner equi-join** — one new range variable, one equality key. The second
-source is built exactly like the first (untyped → array, typed → the
-``from_in`` dispatch), so it may be a different kind of source than the left.
+source is built exactly like the first (untyped → array/table, typed → the
+``from_in`` dispatch), so it may be a different kind of source than the left —
+a table works on either side (its kv pair is that side's row, e.g.
+``on c.brand equals p.key``); note that a table used as the left source is walked in slot order, which is unspecified.
 
 The reader picks one of two emit shapes from the **post-join** clauses (it
 transpiles before type inference and cannot see the source, so it decides
@@ -460,6 +480,10 @@ subset. Both slots are repeatable (see :ref:`linq_das_filtering`):
 terminal carries ``(c, b)`` as a pair, in-memory only (same SQL boundary as
 ``join``).
 
+Table sources are **not supported** in a multiple-``from`` query (the
+cross/flatten arms are array-shaped) — ``join`` a table instead, or
+materialize it first.
+
 Correlated ``from`` (flatten)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/tests/linq/failed_linq_das_table.das b/tests/linq/failed_linq_das_table.das
new file mode 100644
index 000000000..9ba635ec2
--- /dev/null
+++ b/tests/linq/failed_linq_das_table.das
@@ -0,0 +1,29 @@
+options gen2
+
+// linq_das table-source rejections — INFER-level (FromInMacro), unlike the reader-level rejects in
+// failed_linq_das.das (so inference DOES run here, and `_fold`'s own "expecting linq expression"
+// verify lands next to a rejecting from_in's message — same line, same cerr, so the report collapses
+// the pair to ONE 50503 with a `+1 more on this line` suffix). A rejecting from_in arm is macro_error +
+// return null: infer stabilizes and the errors stick, with no 50-pass churn.
+expect 50503:1   // typed annotation over a table source (`from kv : Car in tab`), collapsed with the _fold verify
+expect 31400:1   // non-copyable value type: untyped from over table<int; array<int>> → each_kv concept_assert
+
+require daslib/linq_das
+require daslib/linq_fold
+
+struct Car {
+    name  : string
+    price : int
+}
+
+def trigger_table_typed_annotation() {
+    var tab : table<int; Car>
+    let x <- %linq! from kv : Car in tab select kv %%
+}
+
+def trigger_table_non_copyable_value() {
+    // the uniform can_copy gate: fused dispatch declines, tier-2 instantiates the real each_kv,
+    // its concept_assert carries the user-facing message
+    var tab : table<int; array<int>>
+    let x <- %linq! from kv in tab select kv.key %%
+}
diff --git a/tests/linq/test_linq_das.das b/tests/linq/test_linq_das.das
index 695173869..bdfce8c2b 100644
--- a/tests/linq/test_linq_das.das
+++ b/tests/linq/test_linq_das.das
@@ -195,6 +195,121 @@ def test_decs_iterator_output(t : T?) {
     t |> equal(length(names), 2, "decs iterator output")
 }
 
+// ===== table source (untyped `from kv in tab` — 1-arg from_in dispatches table<K;V> → each_kv,
+// ===== table<K> set → keys; kv.key / kv.value pair surface; slot order unspecified → order-insensitive checks) =====
+
+def private mk_car_tab() : table<int; Car> {
+    return <- {
+        1 => Car(name = "cheap", price = 50, brand = "eco"),
+        2 => Car(name = "mid", price = 150, brand = "eco"),
+        3 => Car(name = "lux", price = 300, brand = "lux")
+    }
+}
+
+[test]
+def test_table_kv_where_select(t : T?) {
+    let tab <- mk_car_tab()
+    let prices <- %linq! from kv in tab where kv.value.price > 100 select kv.value.price %%
+    t |> equal(length(prices), 2, "kv.value predicate filters, projection rides the values-pruned walk")
+    t |> equal(prices[0] + prices[1], 450, "mid + lux, slot-order-insensitive")
+}
+
+[test]
+def test_table_kv_key_only(t : T?) {
+    let tab <- mk_car_tab()
+    let ks <- %linq! from kv in tab where kv.key != 2 select kv.key %%
+    t |> equal(length(ks), 2, "key-only chain rides the keys-pruned walk")
+    t |> equal(ks[0] + ks[1], 4, "keys 1 + 3")
+}
+
+[test]
+def test_table_kv_identity_select(t : T?) {
+    let tab <- mk_car_tab()
+    let rows <- %linq! from kv in tab where kv.value.brand == "eco" select kv %%
+    t |> equal(length(rows), 2, "identity select returns (key, value) tuples")
+    t |> equal(rows[0].key + rows[1].key, 3, "eco keys 1 + 2")
+    t |> equal(rows[0].value.price + rows[1].value.price, 200, "eco prices 50 + 150")
+}
+
+[test]
+def test_table_kv_orderby(t : T?) {
+    let tab <- mk_car_tab()
+    // orderby makes the unspecified slot order deterministic
+    let names <- %linq! from kv in tab orderby kv.value.price descending select kv.value.name %%
+    t |> equal(length(names), 3)
+    t |> equal(names[0], "lux")
+    t |> equal(names[2], "cheap")
+}
+
+[test]
+def test_table_kv_group_by(t : T?) {
+    let tab <- mk_car_tab()
+    let buckets <- %linq! from kv in tab group kv by kv.value.brand %%
+    t |> equal(length(buckets), 2, "two brands")
+    for (b in buckets) {
+        t |> equal(length(b._1), b._0 == "eco" ? 2 : 1, "eco bucket has 2 cars, lux has 1")
+    }
+}
+
+[test]
+def test_table_kv_let(t : T?) {
+    let tab <- mk_car_tab()
+    let doubled <- %linq! from kv in tab let p = kv.value.price * 2 where p >= 300 select p %%
+    t |> equal(length(doubled), 2, "let binding inlines over the kv pair")
+    t |> equal(doubled[0] + doubled[1], 900, "2*150 + 2*300")
+}
+
+[test]
+def test_table_kv_iterator_output(t : T?) {
+    let tab <- mk_car_tab()
+    let got <- [for (nm in %linq! from kv in tab orderby kv.value.price select kv.value.name iterator %%); nm]
+    t |> equal(length(got), 3)
+    t |> equal(got[0], "cheap")
+}
+
+[test]
+def test_table_set_form(t : T?) {
+    var s : table<int> <- { 5, 7, 9 }
+    let big <- %linq! from k in s where k > 6 select k %%
+    t |> equal(length(big), 2, "table<K> set source rides keys()")
+    t |> equal(big[0] + big[1], 16, "7 + 9")
+}
+
+[test]
+def test_table_arbitrary_range_var_name(t : T?) {
+    let tab <- mk_car_tab()
+    // the kv pair name is the range variable — any identifier works
+    let names <- %linq! from entry in tab where entry.value.price > 200 select entry.value.name %%
+    t |> equal(length(names), 1)
+    t |> equal(names[0], "lux")
+}
+
+[test]
+def test_table_as_join_right_source(t : T?) {
+    // a table works on either side of a join (tier-2; the kv pair is that side's row).
+    // left side is the array → result follows array order, deterministic
+    let cars <- mk_cars()
+    let prio <- { "eco" => 10, "lux" => 99 }
+    let rows <- %linq! from c in cars join p in prio on c.brand equals p.key select "{c.name}={p.value}" %%
+    t |> equal(length(rows), 3)
+    t |> equal(rows[0], "cheap=10")
+    t |> equal(rows[1], "mid=10")
+    t |> equal(rows[2], "lux=99")
+}
+
+[test]
+def test_table_as_join_left_source(t : T?) {
+    let cars <- mk_cars()
+    let prio <- { "eco" => 10, "lux" => 99 }
+    // left side is the table → slot order, so sort before asserting
+    var rows <- %linq! from p in prio join c in cars on p.key equals c.brand select "{c.name}={p.value}" %%
+    rows |> sort()
+    t |> equal(length(rows), 3)
+    t |> equal(rows[0], "cheap=10")
+    t |> equal(rows[1], "lux=99")
+    t |> equal(rows[2], "mid=10")
+}
+
 // ===== orderby (single key, optional `descending`) =====
 
 [test]

From ac441c4a0e7b97951b6e9cd3057201de3384e8e8 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Thu, 11 Jun 2026 01:27:14 -0700
Subject: [PATCH 06/11] =?UTF-8?q?linq=5Ffold:=20table=20point-lookup=20fol?=
 =?UTF-8?q?ds=20=E2=80=94=20where(kv.key=20=3D=3D=20X)=20+=20terminator=20?=
 =?UTF-8?q?=E2=86=92=20O(1)=20probe?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

try_table_point_lookup runs ahead of pattern dispatch in the table arm:
any / keys-lane contains → key_exists, count → key_exists ? 1 : 0,
first / first_or_default (± one trailing select) → an unsafe(tab?[X]) probe
with the scan's exact semantics (panic on missing first, eagerly-bound
default). Predicate-form any(p)/count(p) and either operand order match too.
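
As a rough illustration of the mapping above, here is a small Python model (not
daslang, and not the code the fold emits). A dict stands in for the table, and
every function name is hypothetical; only the panic text is taken from the
LINQ_TO_TABLE.md notes in this patch.

```python
# Model only: a dict stands in for table<K;V>; every name here is hypothetical.

def probe_any(tab, x):
    # any / keys-lane contains: a plain key-existence check
    return x in tab

def probe_count(tab, x):
    # keys are unique, so a key-equality count is either 0 or 1
    return 1 if x in tab else 0

def probe_first(tab, x):
    # mirrors the scan: a missing element is an error, not a default
    if x not in tab:
        raise LookupError("sequence contains no elements")
    return (x, tab[x])

def probe_first_or_default(tab, x, default):
    # `default` is evaluated by the caller before the lookup runs (eager binding)
    return (x, tab[x]) if x in tab else default

tab = {1: "cheap", 2: "mid", 3: "lux"}
```

Each helper touches the table once, which is the point of the fold: the result
matches what the walk would return, without visiting every slot.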

X must be loop-invariant AND side-effect free — the scan evaluates X per
element, a probe once; a regression test pins per-element evaluation for an
impure X. Compound && predicates (incl. collapsed multi-where) decline the
probe; conjunct extraction is a named deferred edge in LINQ_TO_TABLE.md.
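
To make the side-effect requirement concrete, a minimal Python model (again not
daslang; `impure_x`, `scan_count` and `probe_count` are hypothetical names, and
a dict stands in for the table). It only shows why an impure X is observable:
the walk evaluates it once per element, a probe exactly once.

```python
# Model only: counts how often the key expression X is evaluated.
calls = 0

def impure_x():
    """Hypothetical impure key expression: bumps a counter on every evaluation."""
    global calls
    calls += 1
    return 2

def scan_count(tab, x_expr):
    # Linear walk: X is re-evaluated for every element visited.
    hits = 0
    for k in tab:
        if k == x_expr():
            hits += 1
    return hits

def probe_count(tab, x_expr):
    # O(1) probe: X is evaluated exactly once.
    return 1 if x_expr() in tab else 0

tab = {1: "cheap", 2: "mid", 3: "lux"}

calls = 0
scan_result = scan_count(tab, impure_x)
scan_calls = calls

calls = 0
probe_result = probe_count(tab, impure_x)
probe_calls = calls
```

Both paths return the same count, but the counter ends at a different value, so
swapping the walk for a probe would change observable behavior. That is why the
matcher declines an impure X.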

m7 INTERP: point_lookup 0.0 ns/elem vs point_lookup_scan 8.4 ns/elem (the same
query forced through the walk); results.md re-swept.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 benchmarks/sql/LINQ_TO_TABLE.md             |  25 +-
 benchmarks/sql/results.md                   | 275 ++++++++++----------
 benchmarks/sql/table.das                    |  26 ++
 daslib/linq_fold.das                        |  17 +-
 daslib/linq_fold_table.das                  | 163 ++++++++++++
 doc/source/reference/linq_fold_patterns.rst |   2 +-
 tests/linq/test_linq_table_source.das       |  73 ++++++
 7 files changed, 438 insertions(+), 143 deletions(-)

diff --git a/benchmarks/sql/LINQ_TO_TABLE.md b/benchmarks/sql/LINQ_TO_TABLE.md
index 10a62d34b..77d3efdf1 100644
--- a/benchmarks/sql/LINQ_TO_TABLE.md
+++ b/benchmarks/sql/LINQ_TO_TABLE.md
@@ -4,8 +4,26 @@ Sibling of [LINQ.md](LINQ.md) / [LINQ_TO_DECS.md](LINQ_TO_DECS.md). Plan of reco
 `table<K;V>` / `table<K>` as the 6th `_fold` source, plus the `to_table` sink.
 Edited in-place as PRs land.
 
-Status: **stage 3 committed** (`%linq!` table sources; stage 2 = TableAdapter + m7, 571fe879e;
-stage 1 = `each_kv` builtin, 8751bb9ba).
+Status: **stage 4 committed** (point-lookup folds; stage 3 = `%linq!` table sources, 29d23baf6;
+stage 2 = TableAdapter + m7, 571fe879e; stage 1 = `each_kv` builtin, 8751bb9ba).
+
+Stage 4 findings:
+- `try_table_point_lookup` (linq_fold_table.das) runs in the dispatcher arm BEFORE pattern dispatch;
+  shapes per plan — where(key==X)+any/count/first/first_or_default(±select), predicate-form
+  any(p)/count(p), keys-lane contains — all emit through `TableAdapter.wrap_invoke` (probe inside
+  the same 1-param const-table invoke as the walks).
+- **Invariance alone is not enough**: X must also be side-effect free (`has_sideeffects`) — the scan
+  evaluates X per element, a probe once; an impure X (e.g. a counter bump) would change observable
+  behavior. Covered by a regression test asserting per-element evaluation is preserved.
+- Table safe-index `tab?[k]` is **unsafe** (31034: the pointer dangles on rehash), so the generated
+  probe wraps it in `unsafe(...)`; the invoke never mutates the table. Deref after the null check is plain `*p`.
+- Scan-semantics mirroring: `first` panics "sequence contains no elements"; `first_or_default`
+  binds its default eagerly before the probe (same order as the early-exit lane / linq.das).
+- `collapse_chained_wheres` runs before dispatch, so `where(key==X)|>where(p)` arrives as one
+  `&&` body → correctly declined (compound predicates keep the scan). Conjunct extraction
+  (probe + residual predicate on the probed element) is a named deferred edge below.
+- m7 INTERP (2026-06-11 sweep): `point_lookup` 0.0 ns/elem (O(1) probe) vs `point_lookup_scan`
+  8.4 ns/elem (the same query forced through the walk via a second always-true `where`).
 
 Stage 3 findings:
 - The untyped `from c in <src>` now emits the **1-arg `from_in(src)`** for every source (the reader
@@ -141,6 +159,9 @@ End of arc: `skills/linq.md` + linq docs mention the table source.
 
 ## Deferred edges (named, not built)
 
+- **Point-lookup conjunct extraction**: `where(kv.key == X && <residual>)` (incl. the collapsed
+  multi-where form) could probe and evaluate the residual on the probed element only. The matcher
+  currently declines compound predicates; add when a real chain wants it.
 - **Multiple-`from` (cross / SelectMany) over tables**: the unfused `_cross_join` arm passes the
   bare source text so the array×array overload resolves without an `each` unsafe trip; a table
   there has no overload (confusing 30303 cascade). `cross_join` has iterator overloads, so routing
diff --git a/benchmarks/sql/results.md b/benchmarks/sql/results.md
index 519254b0b..aede8f85a 100644
--- a/benchmarks/sql/results.md
+++ b/benchmarks/sql/results.md
@@ -16,7 +16,9 @@ are stable now).
 - **m5f XML** — `_fold` over `from_xml_node(root, type<Car>)` (`XmlAdapter` fuses + field-prunes).
 - **m6f JSON** — `_fold` over `from_json(jv, type<Car>)` (`JsonAdapter`, same machinery, array walk).
 - **m7 Table** — `_fold` over `each_kv(table<int; Car>)` (`TableAdapter`; kv usage-pruning picks keys-only /
-  values-only / zipped slot walks; group_by / join / reverse defer to tier-2 until their stages land).
+  values-only / zipped slot walks; key-equality `where` + terminator folds to an O(1) probe — the
+  `point_lookup` / `point_lookup_scan` pair measures it; group_by / join / reverse defer to tier-2
+  until their stages land).
 
 `0.00` = early-exit terminator below timer resolution ("free"). Chain shapes are in
 `benchmarks/README.md`; the splice arms each fires are in `doc/source/reference/linq_fold_patterns.rst`.
@@ -25,169 +27,173 @@ are stable now).
 signal, JIT deltas as indicative.**
 
 <!-- BENCH:TABLES BEGIN -->
-*Generated 2026-06-10 by `benchmarks/sql/_update_results.das` — ns/op; `—` = absent lane. Edit the prose around the markers, not the tables.*
+*Generated 2026-06-11 by `benchmarks/sql/_update_results.das` — ns/op; `—` = absent lane. Edit the prose around the markers, not the tables.*
 
 ## INTERP
 
 | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) |
 |---|---:|---:|---:|---:|---:|---:|
-| `aggregate_match` | 34.7 | 5.9 | 5.8 | 60.1 | 152.3 | 19.0 |
-| `all_match` | 27.3 | 3.5 | 3.4 | 55.6 | 147.0 | 15.8 |
+| `aggregate_match` | 34.9 | 5.9 | 5.8 | 60.7 | 160.3 | 19.1 |
+| `all_match` | 27.5 | 3.5 | 3.4 | 55.9 | 154.1 | 15.8 |
 | `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `average_aggregate` | 29.8 | 5.9 | 8.8 | 58.3 | 156.2 | 17.2 |
+| `average_aggregate` | 30.5 | 5.9 | 8.8 | 60.2 | 163.1 | 17.3 |
 | `bare_last` | — | 4.2 | 0.0 | 0.0 | 4.2 | 29.2 |
-| `bare_order_where` | 277.1 | 118.1 | 126.8 | 300.9 | 292.2 | 166.4 |
-| `chained_select_collapse` | — | 17.7 | 17.4 | 70.1 | 155.4 | 27.8 |
-| `chained_where` | 35.8 | 6.6 | 7.1 | 104.2 | 174.7 | 24.1 |
-| `contains_match` | 0.0 | 2.2 | 1.4 | 27.5 | 68.5 | 6.6 |
-| `count_aggregate` | 29.2 | 4.1 | 4.1 | 63.4 | 147.5 | 20.2 |
-| `cross_join` | 13122.7 | 3685.9 | — | 3995.6 | 4066.2 | — |
+| `bare_order_where` | 278.2 | 117.7 | 126.7 | 299.6 | 292.7 | 166.4 |
+| `chained_select_collapse` | — | 17.7 | 17.4 | 70.4 | 168.3 | 27.8 |
+| `chained_where` | 35.9 | 6.6 | 7.1 | 104.9 | 184.0 | 24.1 |
+| `contains_match` | 0.0 | 2.3 | 1.5 | 29.1 | 72.4 | 6.6 |
+| `count_aggregate` | 30.0 | 4.1 | 4.2 | 63.7 | 155.2 | 20.2 |
+| `cross_join` | 12604.3 | 3685.2 | — | 4006.6 | 4040.5 | — |
 | `decs_count_bare_pred` | — | — | 4.1 | — | — | — |
-| `distinct_by_count` | 40.8 | 15.6 | 15.6 | 70.2 | 154.0 | 26.4 |
-| `distinct_by_order_take` | 240.7 | 22.1 | 23.4 | 122.7 | 161.6 | 48.5 |
-| `distinct_by_order_to_array` | 239.2 | 22.2 | 23.5 | 123.6 | 161.7 | 48.4 |
-| `distinct_count` | 40.7 | 15.9 | 15.7 | 70.5 | 155.8 | 26.9 |
-| `distinct_count_pred` | 251.0 | 16.1 | 15.8 | 111.5 | 178.0 | 26.3 |
+| `distinct_by_count` | 40.9 | 15.6 | 15.6 | 70.6 | 162.2 | 26.3 |
+| `distinct_by_order_take` | 239.3 | 22.1 | 23.4 | 123.3 | 162.4 | 48.6 |
+| `distinct_by_order_to_array` | 237.8 | 22.1 | 23.5 | 124.1 | 163.3 | 48.6 |
+| `distinct_count` | 41.2 | 15.8 | 15.7 | 70.8 | 163.6 | 26.9 |
+| `distinct_count_pred` | 252.2 | 15.7 | 15.9 | 112.1 | 178.4 | 26.3 |
 | `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `element_at_match` | 0.0 | 0.0 | 0.0 | 0.4 | 0.3 | 0.0 |
 | `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `groupby_average` | 173.3 | 29.3 | 29.3 | 122.9 | 190.0 | — |
-| `groupby_count` | 143.5 | 19.4 | 19.4 | 75.4 | 161.0 | 162.6 |
-| `groupby_first` | 251.7 | 19.5 | 20.1 | 72.1 | 156.9 | — |
-| `groupby_having_count` | 140.7 | 19.5 | 19.5 | 74.7 | 161.2 | — |
-| `groupby_having_hidden_sum` | 176.1 | 22.5 | 22.6 | 118.0 | 183.5 | — |
-| `groupby_having_post_where` | 172.8 | 20.8 | 20.8 | 114.1 | 180.4 | — |
-| `groupby_max` | 173.5 | 24.8 | 25.3 | 119.7 | 185.2 | — |
-| `groupby_min` | 173.8 | 25.2 | 25.1 | 119.8 | 184.7 | — |
-| `groupby_multi_reducer` | 189.5 | 30.5 | 30.6 | 124.3 | 188.4 | — |
-| `groupby_select_order` | 169.9 | 20.8 | 20.8 | 114.3 | 180.9 | — |
-| `groupby_select_sum` | 196.9 | 38.6 | 38.1 | 101.6 | 186.6 | — |
-| `groupby_sum` | 170.5 | 21.2 | 20.8 | 114.4 | 180.2 | 192.8 |
-| `groupby_where_count` | 75.6 | 14.1 | 14.3 | 115.2 | 177.8 | — |
-| `groupby_where_sum` | 86.4 | 14.1 | 14.6 | 116.2 | 178.1 | — |
-| `join_count` | 38.0 | 51.2 | 64.2 | 112.7 | 176.9 | 195.0 |
-| `join_groupby_count` | 157.7 | 86.1 | 88.2 | 177.4 | 221.8 | — |
-| `join_groupby_to_array` | 194.9 | 80.3 | 91.7 | 214.8 | 212.1 | — |
-| `join_select` | 150.3 | 72.4 | 84.4 | 187.8 | 209.0 | — |
-| `join_where_count` | 39.0 | 61.6 | 76.7 | 159.8 | 193.6 | 229.1 |
-| `last_match` | 0.0 | 5.9 | 13.9 | 64.9 | 152.3 | 31.0 |
-| `long_count_aggregate` | 28.7 | 4.1 | 4.1 | 63.3 | 147.5 | 20.3 |
-| `max_aggregate` | 30.6 | 6.0 | 6.8 | 58.4 | 156.1 | 17.0 |
-| `min_aggregate` | 30.5 | 6.0 | 6.8 | 58.4 | 155.1 | 17.0 |
-| `order_by_multi_key` | 338.7 | 272.3 | 286.1 | 457.7 | 448.2 | 333.0 |
-| `order_distinct_take` | 138.4 | 15.9 | 99.2 | 72.4 | 156.5 | 31.0 |
-| `order_reverse_normalized` | 37.9 | 16.3 | 20.0 | 70.4 | 162.9 | — |
-| `order_take_desc` | 37.8 | 16.3 | 20.3 | 69.8 | 163.3 | 33.2 |
-| `reverse_distinct_by` | 294.1 | 21.2 | 28.0 | 70.8 | 155.4 | — |
-| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.1 | 58.7 |
-| `reverse_take_select` | 0.0 | 0.0 | 0.2 | 0.0 | 26.1 | — |
-| `select_count` | 0.1 | 0.0 | 2.2 | 64.8 | 2.2 | 0.0 |
-| `select_many` | — | 191.0 | — | — | — | — |
-| `select_where` | 194.7 | 11.5 | 19.3 | 195.9 | 185.7 | 37.5 |
-| `select_where_count` | 32.3 | 5.1 | 7.4 | 64.6 | 150.7 | 21.8 |
-| `select_where_order_take` | 36.2 | 12.2 | 15.0 | 72.3 | 158.5 | 34.4 |
-| `select_where_sum` | 37.1 | 7.5 | 7.5 | 66.3 | 160.5 | 23.2 |
-| `single_match` | 0.0 | 2.9 | 5.5 | 56.9 | 151.1 | 22.8 |
+| `groupby_average` | 170.5 | 29.3 | 29.3 | 122.7 | 197.8 | — |
+| `groupby_count` | 141.6 | 19.5 | 19.4 | 74.7 | 169.0 | 163.3 |
+| `groupby_first` | 252.3 | 19.4 | 20.1 | 71.8 | 163.5 | — |
+| `groupby_having_count` | 141.3 | 19.5 | 19.5 | 74.3 | 169.3 | — |
+| `groupby_having_hidden_sum` | 175.7 | 22.4 | 22.6 | 118.5 | 192.1 | — |
+| `groupby_having_post_where` | 171.6 | 20.8 | 21.6 | 114.8 | 188.9 | — |
+| `groupby_max` | 174.1 | 24.7 | 25.6 | 119.8 | 192.6 | — |
+| `groupby_min` | 173.5 | 25.1 | 26.2 | 119.9 | 193.4 | — |
+| `groupby_multi_reducer` | 189.9 | 30.2 | 30.6 | 125.1 | 196.0 | — |
+| `groupby_select_order` | 174.3 | 20.8 | 20.8 | 114.6 | 189.8 | — |
+| `groupby_select_sum` | 197.9 | 38.5 | 40.7 | 101.5 | 196.1 | — |
+| `groupby_sum` | 171.2 | 20.7 | 20.8 | 115.0 | 190.5 | 192.9 |
+| `groupby_where_count` | 75.7 | 14.0 | 14.3 | 115.5 | 187.7 | — |
+| `groupby_where_sum` | 86.5 | 14.1 | 14.7 | 116.3 | 186.7 | — |
+| `join_count` | 38.3 | 51.2 | 64.3 | 113.1 | 184.5 | 194.6 |
+| `join_groupby_count` | 157.7 | 79.1 | 88.6 | 177.7 | 232.0 | — |
+| `join_groupby_to_array` | 189.0 | 78.1 | 90.1 | 215.3 | 215.6 | — |
+| `join_select` | 151.5 | 72.6 | 85.0 | 188.5 | 215.8 | — |
+| `join_where_count` | 48.8 | 61.5 | 76.7 | 160.0 | 201.9 | 229.1 |
+| `last_match` | 0.0 | 5.9 | 13.9 | 65.1 | 159.0 | 30.9 |
+| `long_count_aggregate` | 28.9 | 4.1 | 4.2 | 63.3 | 154.6 | 20.3 |
+| `max_aggregate` | 30.7 | 6.0 | 6.9 | 58.7 | 163.1 | 17.0 |
+| `min_aggregate` | 30.6 | 6.0 | 6.9 | 58.6 | 163.3 | 17.1 |
+| `order_by_multi_key` | 339.9 | 271.4 | 283.6 | 458.8 | 446.1 | 334.3 |
+| `order_distinct_take` | 137.9 | 15.9 | 100.3 | 72.5 | 164.1 | 31.1 |
+| `order_reverse_normalized` | 38.3 | 16.2 | 20.3 | 70.7 | 170.9 | — |
+| `order_take_desc` | 38.2 | 16.2 | 20.6 | 70.1 | 170.2 | 33.3 |
+| `point_lookup` | — | — | — | — | — | 0.0 |
+| `point_lookup_scan` | — | — | — | — | — | 8.4 |
+| `reverse_distinct_by` | 294.0 | 21.1 | 28.1 | 71.1 | 162.6 | — |
+| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.1 | 58.9 |
+| `reverse_take_select` | 0.0 | 0.0 | 0.2 | 0.0 | 26.2 | — |
+| `select_count` | 0.1 | 0.0 | 2.2 | 68.3 | 2.2 | 0.0 |
+| `select_many` | — | 191.5 | — | — | — | — |
+| `select_where` | 197.5 | 11.2 | 19.4 | 195.6 | 183.7 | 37.5 |
+| `select_where_count` | 32.2 | 5.1 | 7.5 | 64.8 | 157.1 | 21.9 |
+| `select_where_order_take` | 36.2 | 12.2 | 15.1 | 72.5 | 165.1 | 34.5 |
+| `select_where_sum` | 37.1 | 7.5 | 7.5 | 66.4 | 162.2 | 23.3 |
+| `single_match` | 0.0 | 2.9 | 5.5 | 58.5 | 151.1 | 22.8 |
 | `skip_take` | 0.5 | 0.1 | 0.2 | 3.0 | 2.8 | 0.3 |
-| `skip_while_match` | 3.5 | 5.3 | 5.3 | 57.3 | 146.6 | 18.2 |
-| `sort_first` | 37.6 | 11.1 | 13.3 | 64.6 | 159.5 | 31.7 |
-| `sort_take` | 38.0 | 16.2 | 20.9 | 70.2 | 161.9 | 33.0 |
-| `sort_take_select` | 37.6 | 16.3 | 20.9 | 70.8 | 162.7 | 33.3 |
-| `sum_aggregate` | 29.7 | 2.1 | 2.1 | 54.3 | 146.7 | 13.4 |
-| `sum_where` | 31.9 | 4.3 | 4.3 | 63.6 | 148.1 | 20.5 |
+| `skip_while_match` | 3.5 | 5.3 | 5.3 | 60.2 | 153.8 | 18.3 |
+| `sort_first` | 38.0 | 11.1 | 13.3 | 65.0 | 167.1 | 31.7 |
+| `sort_take` | 38.2 | 16.3 | 21.1 | 70.2 | 170.7 | 33.2 |
+| `sort_take_select` | 38.1 | 16.3 | 21.8 | 71.1 | 170.6 | 33.3 |
+| `sum_aggregate` | 30.6 | 2.1 | 2.1 | 54.8 | 152.8 | 13.5 |
+| `sum_where` | 32.9 | 4.4 | 4.3 | 63.4 | 154.2 | 20.6 |
 | `take_count` | 3.6 | 0.2 | 0.4 | 2.9 | 2.7 | 0.5 |
 | `take_count_filtered` | 1.1 | 0.2 | 0.2 | 1.3 | 1.1 | 0.3 |
 | `take_sum_aggregate` | 0.8 | 0.1 | 0.1 | 0.6 | 0.5 | 0.1 |
 | `take_where_count` | 0.9 | 0.1 | 0.1 | 0.7 | 0.6 | 0.2 |
-| `take_while_match` | 7.8 | 2.4 | 2.4 | 28.8 | 71.4 | 16.8 |
-| `to_array_filter` | 70.3 | 11.8 | 11.7 | 71.1 | 157.4 | 28.8 |
-| `where_join_count` | 41.0 | 29.0 | 41.5 | 133.0 | 163.1 | — |
-| `zip_count_pred` | 39.0 | 15.8 | — | 313.5 | 319.6 | — |
-| `zip_dot_product` | 46.1 | 12.6 | 10.5 | 308.6 | 317.2 | — |
-| `zip_dot_product_3arg` | 46.1 | 12.8 | — | 308.7 | 316.5 | — |
-| `zip_reverse_to_array` | — | 31.6 | — | 343.1 | 351.0 | — |
+| `take_while_match` | 7.8 | 2.4 | 2.4 | 30.3 | 75.4 | 16.5 |
+| `to_array_filter` | 70.0 | 11.8 | 11.7 | 71.3 | 164.9 | 28.7 |
+| `where_join_count` | 41.2 | 29.0 | 42.0 | 132.1 | 168.9 | — |
+| `zip_count_pred` | 39.2 | 15.9 | — | 313.8 | 322.0 | — |
+| `zip_dot_product` | 46.1 | 12.6 | 10.6 | 308.6 | 319.3 | — |
+| `zip_dot_product_3arg` | 46.1 | 12.8 | — | 309.7 | 319.0 | — |
+| `zip_reverse_to_array` | — | 31.6 | — | 343.4 | 353.5 | — |
 
 ## JIT
 
 | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) |
 |---|---:|---:|---:|---:|---:|---:|
-| `aggregate_match` | 35.0 | 0.3 | 0.6 | 21.7 | 27.3 | 13.6 |
-| `all_match` | 27.8 | 0.3 | 0.2 | 18.1 | 25.9 | 13.5 |
+| `aggregate_match` | 35.1 | 0.3 | 0.6 | 21.8 | 26.0 | 13.4 |
+| `all_match` | 27.8 | 0.3 | 0.2 | 18.1 | 25.2 | 13.5 |
 | `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `average_aggregate` | 29.9 | 1.0 | 3.6 | 18.0 | 24.4 | 13.4 |
+| `average_aggregate` | 30.1 | 1.0 | 3.6 | 18.1 | 24.6 | 13.5 |
 | `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 | 17.1 |
-| `bare_order_where` | 186.2 | 34.0 | 35.3 | 106.3 | 52.4 | 78.7 |
-| `chained_select_collapse` | — | 1.1 | 1.1 | 20.4 | 33.0 | 14.0 |
-| `chained_where` | 35.9 | 0.6 | 0.8 | 35.5 | 31.5 | 17.6 |
-| `contains_match` | 0.0 | 0.2 | 0.1 | 14.8 | 9.2 | 4.7 |
-| `count_aggregate` | 29.5 | 0.3 | 0.6 | 20.4 | 25.1 | 13.4 |
-| `cross_join` | 5964.4 | 734.4 | — | 834.2 | 772.7 | — |
+| `bare_order_where` | 185.6 | 34.0 | 35.2 | 106.5 | 53.5 | 78.9 |
+| `chained_select_collapse` | — | 1.1 | 1.1 | 20.6 | 33.4 | 14.0 |
+| `chained_where` | 36.2 | 0.6 | 0.8 | 35.6 | 31.4 | 17.7 |
+| `contains_match` | 0.0 | 0.2 | 0.1 | 17.5 | 9.0 | 4.7 |
+| `count_aggregate` | 29.3 | 0.3 | 0.6 | 20.5 | 25.3 | 13.5 |
+| `cross_join` | 5962.8 | 733.1 | — | 836.0 | 773.4 | — |
 | `decs_count_bare_pred` | — | — | 0.6 | — | — | — |
-| `distinct_by_count` | 41.0 | 1.1 | 1.1 | 20.4 | 32.0 | 14.0 |
-| `distinct_by_order_take` | 237.4 | 1.7 | 2.6 | 48.4 | 37.1 | 29.9 |
-| `distinct_by_order_to_array` | 237.2 | 1.7 | 2.7 | 47.5 | 36.8 | 30.0 |
-| `distinct_count` | 40.8 | 1.1 | 1.1 | 20.5 | 31.9 | 14.0 |
-| `distinct_count_pred` | 249.8 | 1.1 | 1.3 | 37.6 | 41.7 | 14.0 |
+| `distinct_by_count` | 41.2 | 1.1 | 1.1 | 20.6 | 33.3 | 14.0 |
+| `distinct_by_order_take` | 237.1 | 1.7 | 2.6 | 47.4 | 39.1 | 30.3 |
+| `distinct_by_order_to_array` | 242.4 | 1.8 | 2.6 | 47.4 | 38.7 | 30.3 |
+| `distinct_count` | 40.9 | 1.1 | 1.1 | 20.6 | 33.3 | 14.0 |
+| `distinct_count_pred` | 250.6 | 1.1 | 1.3 | 37.7 | 43.5 | 14.0 |
 | `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `element_at_match` | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 |
+| `element_at_match` | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 |
 | `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `groupby_average` | 170.1 | 1.5 | 1.9 | 35.7 | 43.0 | — |
-| `groupby_count` | 141.1 | 1.3 | 1.5 | 20.5 | 32.2 | 43.0 |
-| `groupby_first` | 251.0 | 1.3 | 2.3 | 20.5 | 32.9 | — |
-| `groupby_having_count` | 141.1 | 1.3 | 1.5 | 20.5 | 32.1 | — |
-| `groupby_having_hidden_sum` | 173.9 | 1.5 | 1.7 | 35.8 | 42.7 | — |
-| `groupby_having_post_where` | 170.2 | 1.4 | 1.9 | 35.8 | 41.8 | — |
-| `groupby_max` | 172.3 | 1.5 | 1.9 | 35.9 | 43.6 | — |
-| `groupby_min` | 173.0 | 1.5 | 1.8 | 35.8 | 43.6 | — |
-| `groupby_multi_reducer` | 191.8 | 1.6 | 1.9 | 36.1 | 43.7 | — |
-| `groupby_select_order` | 170.5 | 1.4 | 1.9 | 35.8 | 42.0 | — |
-| `groupby_select_sum` | 195.5 | 2.8 | 3.2 | 32.3 | 37.6 | — |
-| `groupby_sum` | 169.8 | 1.4 | 1.6 | 35.8 | 42.0 | 51.2 |
-| `groupby_where_count` | 75.7 | 0.9 | 1.3 | 35.9 | 39.7 | — |
-| `groupby_where_sum` | 86.4 | 0.9 | 1.3 | 35.9 | 39.6 | — |
-| `join_count` | 37.9 | 11.0 | 11.7 | 43.4 | 68.3 | 62.9 |
-| `join_groupby_count` | 156.2 | 18.2 | 20.0 | 68.3 | 86.7 | — |
-| `join_groupby_to_array` | 189.2 | 17.5 | 19.4 | 80.2 | 36.1 | — |
-| `join_select` | 92.8 | 19.6 | 21.6 | 74.4 | 94.1 | — |
-| `join_where_count` | 39.1 | 18.9 | 20.6 | 64.5 | 77.9 | 80.0 |
-| `last_match` | 0.0 | 0.5 | 1.4 | 18.6 | 25.9 | 22.9 |
-| `long_count_aggregate` | 28.7 | 0.3 | 0.6 | 20.4 | 26.6 | 13.4 |
-| `max_aggregate` | 30.6 | 0.3 | 0.5 | 18.1 | 26.7 | 13.4 |
-| `min_aggregate` | 30.6 | 0.3 | 0.5 | 18.2 | 26.3 | 13.4 |
-| `order_by_multi_key` | 247.0 | 53.4 | 54.8 | 125.3 | 70.3 | 128.9 |
-| `order_distinct_take` | 137.9 | 1.1 | 75.6 | 20.9 | 34.1 | 14.0 |
-| `order_reverse_normalized` | 37.8 | 0.7 | 1.3 | 24.6 | 27.0 | — |
-| `order_take_desc` | 38.0 | 0.7 | 1.3 | 24.5 | 26.9 | 17.7 |
-| `reverse_distinct_by` | 295.4 | 1.5 | 3.2 | 20.4 | 32.7 | — |
-| `reverse_take` | 0.0 | 0.0 | 0.0 | 0.0 | 3.7 | 26.8 |
-| `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.8 | — |
-| `select_count` | 0.1 | 0.0 | 0.0 | 63.4 | 0.0 | 0.0 |
-| `select_many` | — | 61.5 | — | — | — | — |
-| `select_where` | 110.5 | 4.3 | 5.3 | 76.1 | 22.1 | 27.9 |
-| `select_where_count` | 32.1 | 0.3 | 0.6 | 18.4 | 25.9 | 13.3 |
-| `select_where_order_take` | 36.3 | 0.7 | 1.4 | 18.9 | 26.6 | 22.9 |
-| `select_where_sum` | 37.0 | 0.4 | 0.6 | 17.9 | 24.9 | 13.3 |
-| `single_match` | 0.0 | 0.4 | 1.1 | 43.4 | 22.2 | 17.2 |
-| `skip_take` | 0.3 | 0.0 | 0.0 | 1.3 | 0.2 | 0.1 |
-| `skip_while_match` | 3.5 | 0.4 | 0.4 | 43.5 | 21.8 | 13.2 |
-| `sort_first` | 37.7 | 0.4 | 1.4 | 17.9 | 26.1 | 17.1 |
-| `sort_take` | 38.0 | 0.7 | 1.5 | 24.5 | 26.8 | 17.7 |
-| `sort_take_select` | 37.8 | 0.7 | 1.3 | 24.5 | 26.9 | 17.7 |
-| `sum_aggregate` | 29.6 | 0.3 | 0.1 | 23.3 | 24.3 | 13.4 |
-| `sum_where` | 32.1 | 0.3 | 0.6 | 18.4 | 25.9 | 13.3 |
-| `take_count` | 1.8 | 0.1 | 0.1 | 1.2 | 0.3 | 0.4 |
+| `groupby_average` | 171.1 | 1.6 | 1.9 | 35.9 | 45.7 | — |
+| `groupby_count` | 141.4 | 1.3 | 1.5 | 20.6 | 33.9 | 45.7 |
+| `groupby_first` | 250.7 | 1.3 | 2.3 | 20.6 | 34.3 | — |
+| `groupby_having_count` | 141.5 | 1.3 | 1.5 | 20.6 | 33.8 | — |
+| `groupby_having_hidden_sum` | 174.4 | 1.5 | 1.7 | 35.9 | 45.3 | — |
+| `groupby_having_post_where` | 170.1 | 1.4 | 2.0 | 35.8 | 44.2 | — |
+| `groupby_max` | 175.6 | 1.5 | 2.0 | 36.0 | 46.0 | — |
+| `groupby_min` | 172.4 | 1.5 | 1.8 | 36.0 | 46.0 | — |
+| `groupby_multi_reducer` | 189.6 | 1.6 | 2.0 | 36.1 | 46.1 | — |
+| `groupby_select_order` | 170.1 | 1.4 | 1.9 | 35.9 | 44.3 | — |
+| `groupby_select_sum` | 197.0 | 2.8 | 3.2 | 32.2 | 40.0 | — |
+| `groupby_sum` | 170.5 | 1.4 | 1.6 | 35.9 | 43.4 | 54.2 |
+| `groupby_where_count` | 75.6 | 0.9 | 1.3 | 36.0 | 41.7 | — |
+| `groupby_where_sum` | 86.2 | 0.9 | 1.3 | 35.9 | 41.7 | — |
+| `join_count` | 38.2 | 11.0 | 11.7 | 43.6 | 71.4 | 63.1 |
+| `join_groupby_count` | 156.8 | 18.0 | 20.1 | 68.5 | 90.1 | — |
+| `join_groupby_to_array` | 189.5 | 17.4 | 19.4 | 80.5 | 36.0 | — |
+| `join_select` | 93.2 | 19.6 | 21.7 | 74.8 | 94.5 | — |
+| `join_where_count` | 48.3 | 19.0 | 20.7 | 64.5 | 78.3 | 80.0 |
+| `last_match` | 0.0 | 0.5 | 1.4 | 18.8 | 25.9 | 22.9 |
+| `long_count_aggregate` | 28.8 | 0.3 | 0.6 | 20.6 | 25.4 | 13.5 |
+| `max_aggregate` | 30.5 | 0.3 | 0.5 | 18.3 | 26.7 | 13.4 |
+| `min_aggregate` | 30.6 | 0.3 | 0.5 | 18.3 | 26.6 | 13.5 |
+| `order_by_multi_key` | 249.4 | 53.4 | 54.8 | 125.6 | 71.1 | 129.8 |
+| `order_distinct_take` | 138.1 | 1.1 | 75.6 | 20.9 | 35.8 | 14.0 |
+| `order_reverse_normalized` | 38.0 | 0.7 | 1.4 | 24.6 | 27.6 | — |
+| `order_take_desc` | 37.9 | 0.7 | 1.3 | 24.6 | 27.9 | 17.8 |
+| `point_lookup` | — | — | — | — | — | 0.0 |
+| `point_lookup_scan` | — | — | — | — | — | 6.1 |
+| `reverse_distinct_by` | 295.6 | 1.6 | 3.2 | 20.6 | 34.3 | — |
+| `reverse_take` | 0.0 | 0.0 | 0.0 | 0.0 | 3.7 | 26.9 |
+| `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.7 | — |
+| `select_count` | 0.1 | 0.0 | 0.0 | 68.7 | 0.0 | 0.0 |
+| `select_many` | — | 64.0 | — | — | — | — |
+| `select_where` | 110.6 | 4.2 | 5.3 | 76.5 | 22.0 | 28.1 |
+| `select_where_count` | 32.3 | 0.3 | 0.6 | 18.6 | 26.7 | 13.5 |
+| `select_where_order_take` | 37.1 | 0.7 | 1.4 | 19.1 | 27.4 | 23.0 |
+| `select_where_sum` | 36.9 | 0.4 | 0.6 | 18.2 | 25.2 | 13.4 |
+| `single_match` | 0.0 | 0.4 | 1.1 | 46.3 | 22.2 | 17.3 |
+| `skip_take` | 0.3 | 0.0 | 0.0 | 1.3 | 0.2 | 0.2 |
+| `skip_while_match` | 3.5 | 0.4 | 0.4 | 46.7 | 21.7 | 13.3 |
+| `sort_first` | 38.3 | 0.4 | 1.3 | 18.2 | 26.7 | 17.3 |
+| `sort_take` | 38.2 | 0.7 | 1.4 | 24.7 | 27.8 | 17.8 |
+| `sort_take_select` | 37.6 | 0.7 | 1.4 | 24.7 | 27.8 | 17.8 |
+| `sum_aggregate` | 29.3 | 0.3 | 0.1 | 23.4 | 24.6 | 13.5 |
+| `sum_where` | 31.8 | 0.3 | 0.6 | 18.6 | 26.4 | 13.4 |
+| `take_count` | 1.9 | 0.1 | 0.1 | 1.2 | 0.2 | 0.2 |
 | `take_count_filtered` | 1.1 | 0.0 | 0.0 | 0.4 | 0.1 | 0.2 |
 | `take_sum_aggregate` | 0.8 | 0.0 | 0.0 | 0.2 | 0.0 | 0.1 |
-| `take_where_count` | 0.8 | 0.0 | 0.0 | 0.2 | 0.0 | 0.1 |
-| `take_while_match` | 7.8 | 0.2 | 0.3 | 14.7 | 9.0 | 13.3 |
-| `to_array_filter` | 47.1 | 3.3 | 3.3 | 21.3 | 33.6 | 20.0 |
-| `where_join_count` | 39.0 | 5.8 | 6.7 | 49.5 | 40.6 | — |
-| `zip_count_pred` | 39.1 | 0.1 | — | 116.7 | 33.5 | — |
-| `zip_dot_product` | 46.3 | 0.1 | 0.1 | 116.6 | 33.4 | — |
-| `zip_dot_product_3arg` | 46.1 | 0.1 | — | 116.5 | 33.4 | — |
-| `zip_reverse_to_array` | — | 4.6 | — | 127.7 | 50.0 | — |
+| `take_where_count` | 0.9 | 0.0 | 0.0 | 0.2 | 0.0 | 0.1 |
+| `take_while_match` | 7.7 | 0.2 | 0.3 | 17.3 | 9.0 | 13.4 |
+| `to_array_filter` | 48.2 | 3.2 | 3.3 | 21.6 | 35.0 | 20.4 |
+| `where_join_count` | 41.2 | 5.8 | 6.7 | 49.6 | 41.9 | — |
+| `zip_count_pred` | 38.6 | 0.1 | — | 117.0 | 33.9 | — |
+| `zip_dot_product` | 46.0 | 0.1 | 0.1 | 116.8 | 33.8 | — |
+| `zip_dot_product_3arg` | 45.9 | 0.1 | — | 116.8 | 33.7 | — |
+| `zip_reverse_to_array` | — | 4.6 | — | 128.3 | 51.4 | — |
 <!-- BENCH:TABLES END -->
 
 ## Missing lanes (the `—` cells)
@@ -204,6 +210,7 @@ Each empty cell's reason is also in the bench `.das` file's comment; SQL gaps ar
 - **`order_distinct_take` m4 vs m3f** — `unique_key` hashes workhorse keys directly (array `int`) but string-interpolates structs (decs `DecsBrand`); the gap is per-element string hashing, not decs-walk. `distinct_by_count` is the key-based variant (m4 parity).
 - **`zip_reverse_to_array` / `zip_*` SQL / Decs** — `reverse` has no SQL order key; zip is not relational / not expressible over one archetype walk. By design. (XML/JSON zip lanes are lit, partially fused.)
 - **m7 absent families** — `zip_*` / `cross_join` (lockstep over an unordered slot walk is meaningless), `select_many` (flat fixture, no nested array field), `order_reverse_normalized` / `reverse_take_select` / `reverse_distinct_by` (no backward slot walk; `reverse_take` is kept as the single deferral marker), the group-by tail beyond `groupby_count`/`groupby_sum` and joins beyond `join_count`/`join_where_count` (table group_by/join fusion is staged — see `LINQ_TO_TABLE.md`; the four marker cells track the tier-2 cost until then), `decs_count_bare_pred` (decs-only).
+- **`point_lookup` / `point_lookup_scan` non-m7** — m7-only pair: only a table source has a key to probe (`where(kv.key == X)` + terminator → `key_exists` / `tab?[X]`, O(1)); the `_scan` twin forces the same query through the walk (compound `&&` predicate declines the probe) to show the gap. Other sources have no analog by design.
 
 ## Accepted floors
 
diff --git a/benchmarks/sql/table.das b/benchmarks/sql/table.das
index 66564b963..2b49e1c31 100644
--- a/benchmarks/sql/table.das
+++ b/benchmarks/sql/table.das
@@ -388,6 +388,32 @@ def order_take_desc_m7(b : B?) {
     }
 }
 
+// Point-lookup pair: the fused probe (key-equality where + first_or_default → `g_t?[k]`, O(1) total —
+// per-element ns reads ~0) vs the same query forced onto the linear scan via a second always-true
+// `where` (collapses to a compound `&&` predicate, which the probe matcher correctly declines).
+[benchmark]
+def point_lookup_m7(b : B?) {
+    b |> run("point_lookup", N) {
+        let row = _fold(unsafe(each_kv(g_t))._where(_.key == N / 2).first_or_default(default<CarKV>))
+        b |> accept(row)
+        if (row.key == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def point_lookup_scan_m7(b : B?) {
+    b |> run("point_lookup_scan", N) {
+        let row = _fold(unsafe(each_kv(g_t))._where(_.key == N / 2)._where(_.value.price >= 0)
+            .first_or_default(default<CarKV>))
+        b |> accept(row)
+        if (row.key == 0) {
+            b->failNow()
+        }
+    }
+}
+
 [benchmark]
 def reverse_take_m7(b : B?) {
     b |> run("reverse_take", N) {
diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index d09f9f027..3f81673c7 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -219,9 +219,6 @@ def private try_splice_patterns(prog : ProgramPtr; var expr : Expression?) : Exp
         let valT = tabCall.arguments[0]._type.secondType
         if (tabName != "each_kv" || (valT != null && valT.canCopy)) {
             let lane = tabName == "each_kv" ? TableLane.KV : (tabName == "keys" ? TableLane.KEYS : TableLane.VALUES)
-            if (lane != TableLane.VALUES) {
-                drop_redundant_distinct(calls)   // keys are unique by construction; values can repeat
-            }
             var ttopClone = clone_expression(top)
             // keys/each_kv spell their element `-const` (iterator-variance concern); that flag must not leak into emitted var/buffer type spellings (`array<tuple<…> -const>` breaks push_clone unification).
             if (ttopClone._type != null && ttopClone._type.firstType != null) {
@@ -229,9 +226,17 @@ def private try_splice_patterns(prog : ProgramPtr; var expr : Expression?) : Exp
             }
             var elemT = clone_type(tabCall._type.firstType)
             elemT.flags.removeConstant = false
-            return run_splice_adapter(calls, ttopClone, ttopClone,
-                new TableAdapter(tabExpr = clone_expression(tabCall.arguments[0]), srcName = qn("tsrc", at),
-                                 elemType = elemT, lane = lane), exprIsIter, at)
+            var tadapter = new TableAdapter(tabExpr = clone_expression(tabCall.arguments[0]), srcName = qn("tsrc", at),
+                                            elemType = elemT, lane = lane)
+            if (!exprIsIter) {
+                // `where(kv.key == X)` + terminator → O(1) key probe instead of the walk
+                var probe = try_table_point_lookup(calls, tadapter, at)
+                if (probe != null) return probe
+            }
+            if (lane != TableLane.VALUES) {
+                drop_redundant_distinct(calls)   // keys are unique by construction; values can repeat
+            }
+            return run_splice_adapter(calls, ttopClone, ttopClone, tadapter, exprIsIter, at)
         }
     }
     top = peel_each(top)
diff --git a/daslib/linq_fold_table.das b/daslib/linq_fold_table.das
index b089bb0b6..c9fc01181 100644
--- a/daslib/linq_fold_table.das
+++ b/daslib/linq_fold_table.das
@@ -18,6 +18,7 @@ module linq_fold_table shared public
 require daslib/ast_boost
 require daslib/ast_match
 require daslib/templates_boost
+require daslib/macro_boost
 require daslib/linq_fold_common public
 
 enum TableLane {
@@ -171,6 +172,168 @@ class TableAdapter : SourceAdapter {
     }
 }
 
+// ===== Point-lookup folds — `where(kv.key == X)` + terminator → O(1) key probe =====
+// any/contains → key_exists, count → key_exists?1:0, first[_or_default] (± select) → tab?[X] probe,
+// with the scan's exact semantics. Full shape/decline table: linq_fold_patterns.rst (table source row).
+
+[macro_function]
+def private match_key_probe_side(var keySide, otherSide : Expression?; lane : TableLane; bindName : string) : Expression? {
+    var k = keySide
+    if (k != null && k is ExprRef2Value) {
+        k = (k as ExprRef2Value).subexpr
+    }
+    if (lane == TableLane.KV) {
+        if (k == null || !(k is ExprField)) return null
+        var f = k as ExprField
+        if (f.name != "key") return null
+        var base = f.value
+        if (base != null && base is ExprRef2Value) {
+            base = (base as ExprRef2Value).subexpr
+        }
+        if (base == null || !(base is ExprVar) || (base as ExprVar).name != bindName) return null
+    } else {
+        if (k == null || !(k is ExprVar) || (k as ExprVar).name != bindName) return null
+    }
+    // X must be loop-invariant AND side-effect free — the scan evaluates X per element, a probe once
+    if (expr_uses_var(otherSide, bindName) || has_sideeffects(otherSide)) return null
+    return clone_expression(otherSide)
+}
+
+// Decompose a peeled predicate body (binder renamed to bindName) as `<key-ref> == X`. Returns cloned X.
+[macro_function]
+def private extract_key_probe(var pred : Expression?; lane : TableLane; bindName : string) : Expression? {
+    if (pred == null || !(pred is ExprOp2)) return null
+    var op2 = pred as ExprOp2
+    if (op2.op != "==") return null
+    var probe = match_key_probe_side(op2.left, op2.right, lane, bindName)
+    if (probe == null) {
+        probe = match_key_probe_side(op2.right, op2.left, lane, bindName)
+    }
+    return probe
+}
+
+[macro_function]
+def try_table_point_lookup(var calls : array<tuple<ExprCall?; LinqCall?>>; var adapter : TableAdapter?; at : LineInfo) : Expression? {
+    if (adapter.lane == TableLane.VALUES) return null
+    let n = length(calls)
+    if (n < 1 || n > 3) return null
+    var termCall = calls[n - 1]._0
+    let termName = calls[n - 1]._1.name
+    let termArgs = length(termCall.arguments)
+    var selCall : ExprCall?
+    var predArg : Expression?
+    var keyX : Expression?
+    let bindName = qn("plk_it", at)
+    if (n == 1) {
+        // predicate-form terminators, and the keys-lane contains
+        if ((termName == "any" || termName == "count") && termArgs == 2) {
+            predArg = termCall.arguments[1]
+        } elif (termName == "contains" && termArgs == 2 && adapter.lane == TableLane.KEYS) {
+            // element evaluated exactly once on both paths — no invariance/purity gate needed
+            keyX = clone_expression(termCall.arguments[1])
+        } else {
+            return null
+        }
+    } else {
+        if (calls[0]._1.name != "where_" || length(calls[0]._0.arguments) != 2) return null
+        predArg = calls[0]._0.arguments[1]
+        if (n == 3) {
+            if (calls[1]._1.name != "select" || length(calls[1]._0.arguments) != 2
+                    || (termName != "first" && termName != "first_or_default")) {
+                return null
+            }
+            selCall = calls[1]._0
+        } elif (!((termName == "any" || termName == "count" || termName == "first") && termArgs == 1)
+                && !(termName == "first_or_default" && termArgs == 2)) {
+            return null
+        }
+    }
+    if (predArg != null) {
+        var predBody = peel_lambda_rename_var(predArg, bindName)
+        keyX = extract_key_probe(predBody, adapter.lane, bindName)
+    }
+    if (keyX == null) return null
+    let sn = adapter.srcName
+    // boolean / counting probes
+    if (termName == "any" || termName == "contains") {
+        var anyStmts <- qmacro_block_to_array() {
+            return key_exists($i(sn), $e(keyX))
+        }
+        return adapter->wrap_invoke(anyStmts, null, false, at)
+    }
+    if (termName == "count") {
+        var cntStmts <- qmacro_block_to_array() {
+            return key_exists($i(sn), $e(keyX)) ? 1 : 0
+        }
+        return adapter->wrap_invoke(cntStmts, null, false, at)
+    }
+    // element probes: first / first_or_default, ± trailing select
+    var retT = strip_const_ref(clone_type(termCall._type))
+    retT.flags.removeConstant = false
+    let kName = qn("plk_k", at)
+    let dName = qn("plk_d", at)
+    var stmts : array<Expression?>
+    if (termName == "first_or_default") {
+        // eager default bind, matching linq.das argument evaluation order
+        stmts |> push <| qmacro_expr() {
+            let $i(dName) = $e(termCall.arguments[1])
+        }
+    }
+    stmts |> push <| qmacro_expr() {
+        let $i(kName) = $e(keyX)
+    }
+    var missTail : Expression?
+    if (termName == "first_or_default") {
+        missTail = qmacro_expr() {
+            return $i(dName)
+        }
+    } else {
+        missTail = qmacro_expr() {
+            panic("sequence contains no elements")
+        }
+    }
+    if (adapter.lane == TableLane.KEYS) {
+        stmts |> push <| qmacro_expr() {
+            if (!key_exists($i(sn), $i(kName))) {
+                $e(missTail)
+            }
+        }
+        if (selCall != null) {
+            var proj = peel_lambda_rename_var(selCall.arguments[1], kName)
+            stmts |> push <| qmacro_expr() {
+                return $e(proj)
+            }
+        } else {
+            stmts |> push <| qmacro_expr() {
+                return $i(kName)
+            }
+        }
+        return adapter->wrap_invoke(stmts, retT, false, at)
+    }
+    // KV lane: probe the value pointer, materialize the (key, value) pair on hit. Table safe-index is
+    // unsafe (the pointer dangles on rehash) — fine here, the generated invoke never mutates the table.
+    let pName = qn("plk_p", at)
+    stmts |> push_from <| qmacro_block_to_array() {
+        let $i(pName) = unsafe($i(sn)?[$i(kName)])
+        if ($i(pName) == null) {
+            $e(missTail)
+        }
+    }
+    if (selCall != null) {
+        let bName = qn("plk_kv", at)
+        var proj = peel_lambda_rename_var(selCall.arguments[1], bName)
+        stmts |> push_from <| qmacro_block_to_array() {
+            let $i(bName) = (key = $i(kName), value = *$i(pName))
+            return $e(proj)
+        }
+    } else {
+        stmts |> push <| qmacro_expr() {
+            return (key = $i(kName), value = *$i(pName))
+        }
+    }
+    return adapter->wrap_invoke(stmts, retT, false, at)
+}
+
 // Recognize an `each_kv(tab)` / `keys(tab)` / `values(tab)` chain top. Returns the call (caller reads
 // arguments[0] = the table, `_type.firstType` = element); null otherwise. Name + table-typed-arg match,
 // like extract_json_source — the strong arg-type gate keeps an unrelated user `keys` from firing this.
diff --git a/doc/source/reference/linq_fold_patterns.rst b/doc/source/reference/linq_fold_patterns.rst
index 2d5bdc524..3ced024a9 100644
--- a/doc/source/reference/linq_fold_patterns.rst
+++ b/doc/source/reference/linq_fold_patterns.rst
@@ -150,7 +150,7 @@ Source-side entry points
      - Optional source — only when the ``pugixml`` module is linked (``require ?pugixml`` + ``static_if (typeinfo builtin_module_exists(pugixml))``). Emits an inlined DOM child-element walk replacing the generator, and **field-prunes** the per-element materialization (pass 2b): the chain body is scanned for the ``Row`` fields it reads, and only those attributes are read via ``read_xml_field`` into scalar locals — unread fields (notably ``string`` fields, whose ``clone_string`` is the alloc cost) are never touched, so a float-only chain runs alloc-free and JIT beats the equivalent SQLite query. A whole-row escape (``to_array`` / identity ``_select(_)`` / pass-to-fn) routes to the full ``build_xml_row`` instead. The ``XmlAdapter`` **rides every pattern row** (``try_splice_patterns`` runs with no ``onlyRow`` restriction); per-row ``requires`` predicates and the adapter's capability hooks (``can_join`` / ``can_group_by`` / ``defers_materialization`` / the ``non_array_source`` gate) decide what fuses, and a shape it can't fuse cascades to tier-2 — see :ref:`linq_fold_xml_patterns` for the full fuse/defer breakdown. ``unsafe`` is required (the source is ``[unsafe_outside_of_for]``) and the node is passed by value (``var root`` — ``_fold``'s macro-arg inference skips the const&→value copy).
    * - ``unsafe(each_kv(tab))`` / ``keys(tab)`` / ``values(tab)``
      - ``extract_table_source`` (``TableAdapter``, ``daslib/linq_fold_table.das``)
-     - In-tree source — recognized by name **plus** a table-typed argument (``table<K;V>`` / ``table<K>``), so an unrelated user ``keys`` never fires it. The kv lane (``each_kv``) binds ``kv.key`` / ``kv.value`` and **usage-prunes the walk**: a chain touching only ``.value`` walks ``values(tab)`` alone, only ``.key`` (or neither) walks ``keys(tab)`` alone — half the slot-skip work of the zipped two-iterator form, which is emitted only when both sides (or the whole pair) are read. A whole-pair escape binds a named-tuple copy, so the kv lane fuses **copyable value types only** — a non-copyable-valued ``each_kv`` falls through and the surviving instantiation concept-asserts (error 31400; the keys/values lanes still fuse such tables). Bare ``count()`` / ``long_count()`` folds to O(1) ``length(tab)``; a plain ``distinct`` over raw keys/kv elements is **dropped** before matching (keys are unique by construction; only uniqueness-preserving prefix ops allow the drop — a preceding ``select`` keeps the distinct, and the values lane always keeps it). ``order_by`` / ``take`` / ``first`` observe the table's unspecified slot order, exactly like a hand ``for (k, v in keys(t), values(t))`` loop. ``can_join`` / ``can_group_by`` are off and reverse has no backward slot walk — those shapes cascade to tier-2 (the join probe and key-lookup folds are staged: see ``benchmarks/sql/LINQ_TO_TABLE.md``). ``unsafe`` is required at an unfused chain head (the sources are ``[unsafe_outside_of_for]``); fused chains rewrite the head before inference.
+     - In-tree source — recognized by name **plus** a table-typed argument (``table<K;V>`` / ``table<K>``), so an unrelated user ``keys`` never fires it. The kv lane (``each_kv``) binds ``kv.key`` / ``kv.value`` and **usage-prunes the walk**: a chain touching only ``.value`` walks ``values(tab)`` alone, only ``.key`` (or neither) walks ``keys(tab)`` alone — half the slot-skip work of the zipped two-iterator form, which is emitted only when both sides (or the whole pair) are read. A whole-pair escape binds a named-tuple copy, so the kv lane fuses **copyable value types only** — a non-copyable-valued ``each_kv`` falls through and the surviving instantiation concept-asserts (error 31400; the keys/values lanes still fuse such tables). Bare ``count()`` / ``long_count()`` folds to O(1) ``length(tab)``; a plain ``distinct`` over raw keys/kv elements is **dropped** before matching (keys are unique by construction; only uniqueness-preserving prefix ops allow the drop — a preceding ``select`` keeps the distinct, and the values lane always keeps it). **Point-lookup folds** (``try_table_point_lookup``): a key-equality ``where`` (``kv.key == X``, bare ``k == X`` on the keys lane, either operand order; predicate-form ``any(p)`` / ``count(p)`` too) against a loop-invariant, side-effect-free ``X`` folds the whole walk to an O(1) probe — ``any`` / keys-lane ``contains(X)`` → ``key_exists(tab, X)``, ``count`` → ``key_exists ? 1 : 0``, ``first`` / ``first_or_default`` (± one trailing ``select``) → a ``tab?[X]`` probe with the scan's exact semantics (panic on a missing ``first``, eagerly-bound default value otherwise). Anything else — compound ``&&`` predicates, other comparison operators, an ``X`` that reads the binder or has side effects (the scan evaluates ``X`` per element, the probe once) — keeps the scan. ``order_by`` / ``take`` / ``first`` observe the table's unspecified slot order, exactly like a hand ``for (k, v in keys(t), values(t))`` loop. ``can_join`` / ``can_group_by`` are off and reverse has no backward slot walk — those shapes cascade to tier-2 (the join probe is staged: see ``benchmarks/sql/LINQ_TO_TABLE.md``). ``unsafe`` is required at an unfused chain head (the sources are ``[unsafe_outside_of_for]``); fused chains rewrite the head before inference.
    * - ``unsafe(from_json(jv, type<Row>))``
      - ``extract_json_source`` (``JsonAdapter``, ``daslib/linq_fold_json.das``)
      - In-tree source — the adapter is compiled in unconditionally (no ``static_if`` gate, unlike XML's pugixml one), but a program only pulls JSON into scope by requiring ``json`` / ``json_boost`` itself. ``extract_json_source`` matches a ``from_json`` whose first argument is a ``json::JsonValue?``, so a JSON-less program returns null and the chain falls to the array tier. The adapter pulls in **no** json dependency — it emits ``from_json`` / ``read_json_field`` by name (resolved at the user's splice site, like ``linq_fold_decs`` emits ``for_each_archetype``; ``from_JV`` is emitted only for a non-struct element type). Emits an inlined ``for (e in jv.value as _array)`` walk replacing the generator, and **field-prunes** the per-element materialization (pass 2b): only the keys the chain reads are pulled via ``read_json_field`` by name — unread keys (notably ``string`` fields whose materialization clones) are never touched, so a scalar-only chain skips ~all of the full per-row build (3.6× over the full materialize — see ``benchmarks/micro/json_source_shapes.das``). A whole-row escape reads **every** top-level field by name (``emit_full_row_by_name``), so a custom whole-row ``from_JV(Row)`` override is **not** honored (Option B — this is a flat query source, not a deserializer; materialize the array with an explicit ``from_JV`` first for that). ``unsafe`` is required (the source is ``[unsafe_outside_of_for]``). Deferred materialization mirrors XML: order/distinct/take buffer a cheap ``(orderKey, JsonValue?)`` surrogate and materialize only the K survivors — by name (``emit_full_row_by_name``), so a struct survivor reads each field by key; only a non-struct ``Row`` falls back to ``outBind <- from_JV(handle, type<Row>)``. The ``JsonAdapter`` also fuses ``join`` / ``join |> group_by`` (``emit_join_hook`` + ``JsonJoinAdapter`` off ``build_group_by_adapter``'s upstream-join arm), reusing the array-join machinery (``build_join_standalone_pieces`` / ``build_join_adapter_pieces``): srcB is collected into a ``table<KEY; array<TUPB>>`` and the field-pruned array walk is the probe side, so the join key reads only its own field per element (e.g. ``read_json_field(jcur, "brand", …)``). Standalone ``group_join`` and a trailing ``where`` / ``select`` / ``count`` over group-join rows defer to tier-2, mirroring XML.
diff --git a/tests/linq/test_linq_table_source.das b/tests/linq/test_linq_table_source.das
index bcbfef726..630960d18 100644
--- a/tests/linq/test_linq_table_source.das
+++ b/tests/linq/test_linq_table_source.das
@@ -205,6 +205,79 @@ def test_table_fold_set_form(t : T?) {
     }
 }
 
+def private bump_key(var c : int&) : int {
+    c++
+    return 2
+}
+
+// Point-lookup folds: `where(kv.key == X)` + terminator → O(1) key probe (key_exists / `tab?[X]`).
+// Probes must agree with the scan on hit AND miss; non-probe shapes must keep riding the scan.
+
+[test]
+def test_table_point_lookup(t : T?) {
+    t |> run("kv any/count/first probes, hit and miss") @(t : T?) {
+        var tab <- make_int_table(10)
+        let k = 7
+        t |> equal(_fold(each_kv(tab)._where(_.key == k).any()), true)
+        t |> equal(_fold(each_kv(tab)._where(_.key == 99).any()), false)
+        t |> equal(_fold(each_kv(tab)._any(_.key == k)), true)
+        t |> equal(_fold(each_kv(tab)._where(_.key == k).count()), 1)
+        t |> equal(_fold(each_kv(tab)._where(_.key == 99).count()), 0)
+        t |> equal(_fold(each_kv(tab)._count(_.key == 3)), 1)
+        let f = _fold(each_kv(tab)._where(k == _.key).first())   // flipped operand side
+        t |> equal(f.key, 7)
+        t |> equal(f.value, 70)
+        let m = _fold(each_kv(tab)._where(_.key == 99).first_or_default(default<tuple<key : int; value : int>>))
+        t |> equal(m.key, 0)
+        delete tab
+    }
+    t |> run("probe + trailing select projects the probed element") @(t : T?) {
+        var tab <- make_int_table(10)
+        t |> equal(_fold(each_kv(tab)._where(_.key == 4)._select(_.value).first()), 40)
+        t |> equal(_fold(each_kv(tab)._where(_.key == 99)._select(_.value).first_or_default(-1)), -1)
+        t |> equal(_fold(each_kv(tab)._where(_.key == 4)._select("{_.key}:{_.value}").first()), "4:40")
+        delete tab
+    }
+    t |> run("keys lane probes + set form contains") @(t : T?) {
+        var tab <- make_int_table(10)
+        t |> equal(_fold(keys(tab)._where(_ == 5).any()), true)
+        t |> equal(_fold(keys(tab)._where(_ == 5).first()), 5)
+        t |> equal(_fold(keys(tab)._where(_ == 5)._select(_ * 100).first()), 500)
+        t |> equal(_fold(keys(tab).contains(5)), true)
+        t |> equal(_fold(keys(tab).contains(55)), false)
+        delete tab
+        var s : table<string> <- { "x", "y" }
+        t |> equal(_fold(keys(s).contains("y")), true)
+        t |> equal(_fold(keys(s).contains("z")), false)
+        delete s
+    }
+    t |> run("first probe panics on a missing key, like the scan") @(t : T?) {
+        var tab <- make_int_table(4)
+        var panicked = false
+        try {
+            let _r = _fold(each_kv(tab)._where(_.key == 99).first())
+        } recover {
+            panicked = true
+        }
+        t |> equal(panicked, true)
+        delete tab
+    }
+    t |> run("non-probe shapes stay scans and stay correct") @(t : T?) {
+        var tab <- make_int_table(10)
+        t |> equal(_fold(each_kv(tab)._where(_.key != 5).count()), 9)                      // wrong operator
+        t |> equal(_fold(each_kv(tab)._where(_.key == _.value / 10).count()), 10)          // X references the binder
+        t |> equal(_fold(each_kv(tab)._where(_.key == 5)._where(_.value > 0).any()), true) // collapses to a compound && predicate
+        delete tab
+    }
+    t |> run("impure X stays a scan — per-element evaluation preserved") @(t : T?) {
+        var tab <- make_int_table(4)
+        var evals = 0
+        t |> equal(_fold(each_kv(tab)._where(_.key == bump_key(evals)).count()), 1)
+        t |> equal(evals, 4, "side-effectful X must evaluate per element, not once")
+        delete tab
+    }
+}
+
 // Tier-2 over the raw each_kv iterator (no _fold) — the [unsafe_outside_of_for] contract requires the
 // explicit unsafe(...) wrap at a bare chain head; fused chains rewrite the head before inference.
 

From 2742f6db2fe386541bbc9952a08ac97327907368 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Thu, 11 Jun 2026 02:32:18 -0700
Subject: [PATCH 07/11] =?UTF-8?q?linq=5Ffold:=20table=20joins=20=E2=80=94?=
 =?UTF-8?q?=20adapter-generalized=20emit=5Farray=5Fjoin=20+=20table-srcB?=
 =?UTF-8?q?=20key=20probe?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Stage 5 of the table arc (benchmarks/sql/LINQ_TO_TABLE.md). Two halves:

1. Lead generalization: emit_array_join takes its lead loop, bind name, and
   lead invoke-param spelling from the adapter (wrap_source_loop / bind_name /
   new SourceAdapter.invoke_param_type), so TableAdapter sets can_join=true and
   routes emit_join_hook to the same emitter — table-lead joins walk the kv
   usage-pruned slot iterators (a join touching only c.value.* walks values(tab)
   alone), group joins stay outer over every slot.

2. Table-srcB probe: a join whose srcb is each_kv(tab)/keys(set) joined on its
   bare key skips the internal table<KEY; array<TUPB>> + build loop — srcB binds
   the user's table and the per-A probe is a key lookup, usage-pruned like the
   point-lookup fold (count/key-only -> key_exists, value shapes -> by-ref bind
   off tab?[k], whole-pair -> kv tuple). Unique table keys make probe == hash
   semantics exactly; non-bare keybs and group joins keep the hashed build.

Per-pair statements factored into build_join_pair_core, shared by
build_join_standalone_pieces (group-join arm + bucket wrap unchanged for the
decs/xml/json callers) and the new build_join_probe_pieces.

m7 sweep: join_count 195.0 -> 65.6 ns/elem INTERP, join_where_count 229.1 ->
81.4; new join_probe 47.3 vs join_probe_build 79.1 (probe ~1.7x on identical
rows). Tests: fused-vs-hand-loop agreement both leads, probe shapes, declines
(non-bare keyb, group join), %linq! set-srcB + into forms. INTERP 10947/0,
AOT+JIT linq 1949/1949, Sphinx -W clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 benchmarks/sql/LINQ_TO_TABLE.md             |  37 +-
 benchmarks/sql/results.md                   | 280 ++++++-------
 benchmarks/sql/table.das                    |  43 ++
 daslib/linq_fold.md                         |   1 +
 daslib/linq_fold_common.das                 | 414 +++++++++++++++-----
 daslib/linq_fold_table.das                  |  12 +-
 doc/source/reference/linq_das.rst           |   5 +
 doc/source/reference/linq_fold_patterns.rst |  47 ++-
 tests/linq/test_linq_das.das                |  28 +-
 tests/linq/test_linq_table_source.das       | 190 +++++++++
 10 files changed, 799 insertions(+), 258 deletions(-)
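
Reviewer note (not part of the commit): the claim that unique table keys make the srcB probe equivalent to the hashed build can be illustrated with a small language-neutral sketch. It is written in Python rather than daslang. The names `join_hashed`, `join_probe`, `orders`, `dealers` and `dealer_id` are hypothetical stand-ins, not the emitted code or any daslib API.

```python
from collections import defaultdict


def join_hashed(src_a, table_b, key_a):
    """Hashed build: bucket every B row by key, then probe the buckets per A row."""
    buckets = defaultdict(list)
    for k, v in table_b.items():      # the build loop that the probe path skips
        buckets[k].append((k, v))
    out = []
    for a in src_a:
        for b in buckets.get(key_a(a), ()):
            out.append((a, b))
    return out


def join_probe(src_a, table_b, key_a):
    """Probe: look each A key up directly in the caller's table."""
    out = []
    for a in src_a:
        k = key_a(a)
        if k in table_b:              # unique keys: at most one match per A row
            out.append((a, (k, table_b[k])))
    return out


# Hypothetical sample data: A rows carry a dealer id, B is a table keyed by that id.
orders = [("o1", 2), ("o2", 5), ("o3", 2)]
dealers = {1: "north", 2: "south", 3: "east"}
dealer_id = lambda order: order[1]
```

Because every bucket in the hashed build holds at most one row when the B side is a table joined on its own key, both functions return the same pairs in the same A order. The probe only drops the intermediate bucket table and its build loop.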

diff --git a/benchmarks/sql/LINQ_TO_TABLE.md b/benchmarks/sql/LINQ_TO_TABLE.md
index 77d3efdf1..6cd55c533 100644
--- a/benchmarks/sql/LINQ_TO_TABLE.md
+++ b/benchmarks/sql/LINQ_TO_TABLE.md
@@ -4,8 +4,35 @@ Sibling of [LINQ.md](LINQ.md) / [LINQ_TO_DECS.md](LINQ_TO_DECS.md). Plan of reco
 `table<K;V>` / `table<K>` as the 6th `_fold` source, plus the `to_table` sink.
 Edited in-place as PRs land.
 
-Status: **stage 4 committed** (point-lookup folds; stage 3 = `%linq!` table sources, 29d23baf6;
-stage 2 = TableAdapter + m7, 571fe879e; stage 1 = `each_kv` builtin, 8751bb9ba).
+Status: **stage 5 committed** (join probe + table-lead joins; stage 4 = point-lookup folds,
+ac441c4a0; stage 3 = `%linq!` table sources, 29d23baf6; stage 2 = TableAdapter + m7, 571fe879e;
+stage 1 = `each_kv` builtin, 8751bb9ba).
+
+Stage 5 findings:
+- **`emit_array_join` generalized instead of a parallel `emit_table_join`**: the lead loop, bind
+  name, and lead invoke-param spelling now come from the adapter (`wrap_source_loop` /
+  `bind_name` / new `invoke_param_type` capability), so `TableAdapter.emit_join_hook` just routes
+  to `emit_array_join` and the kv usage-pruner sees the whole probe body for free — a table-lead
+  join touching only `c.value.*` walks `values(tab)` alone. Any future direct-return loop source
+  joins the same way; decs/xml/json keep their own hooks (nested-callback walks).
+- **srcB probe**: `join_srcb_table_call` (each_kv/keys over a table in the srcb slot) +
+  `join_keyb_is_bare_key` (peeled keyb is bare `d.key` / bare set element) switch the emitter to
+  `build_join_probe_pieces` — srcB binds the user's table (const param), no internal
+  `table<KEY; array<TUPB>>`, no build loop; the per-A probe usage-prunes like the point lookup
+  (count-no-where / key-only → `key_exists`, value shapes → by-ref bind off `tab?[k]`, whole-pair
+  → kv tuple bind). Skipping keyb's per-B evaluation is unobservable (a bare field read is pure
+  by construction — no `has_sideeffects` gate needed, unlike stage 4's X).
+- **Shared per-pair core**: `build_join_pair_core` factored out of `build_join_standalone_pieces`
+  (which keeps the group-join arm + bucket wrap); both builders emit identical per-pair
+  statements, so hash-mode AST is unchanged for the decs/xml/json callers of the standalone
+  builder. Group joins never probe — their result consumes the whole bucket.
+- The `_join` predicate splitter is **position-based** (`<a-side> == <b-side>`); a flipped
+  `d.key == a` fails to compile for any source (pre-existing). The probe matcher therefore only
+  sees keyb on the b-side.
+- m7 (2026-06-11 sweep): table-lead joins leave tier-2 — `join_count` 195.0 → 65.6 ns/elem
+  INTERP (33.1 JIT), `join_where_count` 229.1 → 81.4 (37.9 JIT). The probe A/B pair:
+  `join_probe` 47.3 vs `join_probe_build` 79.1 INTERP (24.2 vs 38.1 JIT) — skipping the
+  internal hash is ~1.7× on identical rows.
 
 Stage 4 findings:
 - `try_table_point_lookup` (linq_fold_table.das) runs in the dispatcher arm BEFORE pattern dispatch;
@@ -172,6 +199,12 @@ End of arc: `skills/linq.md` + linq docs mention the table source.
   values, buffer `(orderKey, key)` surrogates and materialize survivors via `tab?[key]` — K
   probes instead of N value copies. The table handle is its key; clean fit for the existing
   4-hook surface. Revisit once m7 numbers show whether it matters.
+- **decs/xml/json lead × table srcB probe**: those leads keep their own `emit_join_hook`
+  (nested-callback walks) and hash a table srcB like any iterator. Correct, just unprobed —
+  port `build_join_probe_pieces` into their hooks if a real chain wants it.
+- **Group-join probe**: a table srcB group join could bind a 0/1-element bucket from the probe
+  instead of hashing; the result lambda consumes `array<B>`, so it needs a synthesized
+  one-element array per hit. Hashed build is correct; revisit on demand.
 - Set-ops probe (`except`/`intersect` where the *other* side is a `table<K>`) — rides the
   engine-wide set-ops edge.
 - Fused-kv-over-non-copyable values (loosening the uniform gate) — only if a real use case
diff --git a/benchmarks/sql/results.md b/benchmarks/sql/results.md
index aede8f85a..4a9015f62 100644
--- a/benchmarks/sql/results.md
+++ b/benchmarks/sql/results.md
@@ -17,8 +17,9 @@ are stable now).
 - **m6f JSON** — `_fold` over `from_json(jv, type<Car>)` (`JsonAdapter`, same machinery, array walk).
 - **m7 Table** — `_fold` over `each_kv(table<int; Car>)` (`TableAdapter`; kv usage-pruning picks keys-only /
   values-only / zipped slot walks; key-equality `where` + terminator folds to an O(1) probe — the
-  `point_lookup` / `point_lookup_scan` pair measures it; group_by / join / reverse defer to tier-2
-  until their stages land).
+  `point_lookup` / `point_lookup_scan` pair measures it; joins fuse on either side, and a table srcB
+  joined on its bare key probes the table instead of building the join hash — the `join_probe` /
+  `join_probe_build` pair measures it; group_by / reverse defer to tier-2 until their stages land).
 
 `0.00` = early-exit terminator below timer resolution ("free"). Chain shapes are in
 `benchmarks/README.md`; the splice arms each fires are in `doc/source/reference/linq_fold_patterns.rst`.
@@ -33,167 +34,171 @@ signal, JIT deltas as indicative.**
 
 | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) |
 |---|---:|---:|---:|---:|---:|---:|
-| `aggregate_match` | 34.9 | 5.9 | 5.8 | 60.7 | 160.3 | 19.1 |
-| `all_match` | 27.5 | 3.5 | 3.4 | 55.9 | 154.1 | 15.8 |
+| `aggregate_match` | 34.8 | 5.9 | 5.8 | 60.6 | 159.5 | 19.2 |
+| `all_match` | 27.5 | 3.5 | 3.4 | 56.1 | 154.1 | 16.4 |
 | `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `average_aggregate` | 30.5 | 5.9 | 8.8 | 60.2 | 163.1 | 17.3 |
-| `bare_last` | — | 4.2 | 0.0 | 0.0 | 4.2 | 29.2 |
-| `bare_order_where` | 278.2 | 117.7 | 126.7 | 299.6 | 292.7 | 166.4 |
-| `chained_select_collapse` | — | 17.7 | 17.4 | 70.4 | 168.3 | 27.8 |
-| `chained_where` | 35.9 | 6.6 | 7.1 | 104.9 | 184.0 | 24.1 |
-| `contains_match` | 0.0 | 2.3 | 1.5 | 29.1 | 72.4 | 6.6 |
-| `count_aggregate` | 30.0 | 4.1 | 4.2 | 63.7 | 155.2 | 20.2 |
-| `cross_join` | 12604.3 | 3685.2 | — | 4006.6 | 4040.5 | — |
+| `average_aggregate` | 30.6 | 5.9 | 8.8 | 58.4 | 164.3 | 17.3 |
+| `bare_last` | — | 4.2 | 0.0 | 0.0 | 4.2 | 30.6 |
+| `bare_order_where` | 284.5 | 117.8 | 126.7 | 300.9 | 291.5 | 163.8 |
+| `chained_select_collapse` | — | 18.3 | 17.5 | 70.4 | 162.2 | 28.0 |
+| `chained_where` | 36.1 | 6.6 | 7.1 | 104.9 | 183.8 | 24.1 |
+| `contains_match` | 0.0 | 2.2 | 1.4 | 29.1 | 72.0 | 6.6 |
+| `count_aggregate` | 29.8 | 4.1 | 4.1 | 63.7 | 155.9 | 20.3 |
+| `cross_join` | 12556.2 | 3697.8 | — | 4012.8 | 4069.8 | — |
 | `decs_count_bare_pred` | — | — | 4.1 | — | — | — |
-| `distinct_by_count` | 40.9 | 15.6 | 15.6 | 70.6 | 162.2 | 26.3 |
-| `distinct_by_order_take` | 239.3 | 22.1 | 23.4 | 123.3 | 162.4 | 48.6 |
-| `distinct_by_order_to_array` | 237.8 | 22.1 | 23.5 | 124.1 | 163.3 | 48.6 |
-| `distinct_count` | 41.2 | 15.8 | 15.7 | 70.8 | 163.6 | 26.9 |
-| `distinct_count_pred` | 252.2 | 15.7 | 15.9 | 112.1 | 178.4 | 26.3 |
+| `distinct_by_count` | 41.0 | 15.7 | 15.6 | 70.6 | 160.7 | 26.6 |
+| `distinct_by_order_take` | 239.3 | 22.1 | 23.4 | 123.7 | 163.1 | 48.5 |
+| `distinct_by_order_to_array` | 238.9 | 22.1 | 23.5 | 124.2 | 163.1 | 48.8 |
+| `distinct_count` | 41.0 | 15.8 | 15.8 | 70.8 | 162.4 | 27.0 |
+| `distinct_count_pred` | 254.3 | 15.8 | 15.9 | 112.2 | 177.8 | 26.8 |
 | `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `element_at_match` | 0.0 | 0.0 | 0.0 | 0.4 | 0.3 | 0.0 |
 | `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `groupby_average` | 170.5 | 29.3 | 29.3 | 122.7 | 197.8 | — |
-| `groupby_count` | 141.6 | 19.5 | 19.4 | 74.7 | 169.0 | 163.3 |
-| `groupby_first` | 252.3 | 19.4 | 20.1 | 71.8 | 163.5 | — |
-| `groupby_having_count` | 141.3 | 19.5 | 19.5 | 74.3 | 169.3 | — |
-| `groupby_having_hidden_sum` | 175.7 | 22.4 | 22.6 | 118.5 | 192.1 | — |
-| `groupby_having_post_where` | 171.6 | 20.8 | 21.6 | 114.8 | 188.9 | — |
-| `groupby_max` | 174.1 | 24.7 | 25.6 | 119.8 | 192.6 | — |
-| `groupby_min` | 173.5 | 25.1 | 26.2 | 119.9 | 193.4 | — |
-| `groupby_multi_reducer` | 189.9 | 30.2 | 30.6 | 125.1 | 196.0 | — |
-| `groupby_select_order` | 174.3 | 20.8 | 20.8 | 114.6 | 189.8 | — |
-| `groupby_select_sum` | 197.9 | 38.5 | 40.7 | 101.5 | 196.1 | — |
-| `groupby_sum` | 171.2 | 20.7 | 20.8 | 115.0 | 190.5 | 192.9 |
-| `groupby_where_count` | 75.7 | 14.0 | 14.3 | 115.5 | 187.7 | — |
-| `groupby_where_sum` | 86.5 | 14.1 | 14.7 | 116.3 | 186.7 | — |
-| `join_count` | 38.3 | 51.2 | 64.3 | 113.1 | 184.5 | 194.6 |
-| `join_groupby_count` | 157.7 | 79.1 | 88.6 | 177.7 | 232.0 | — |
-| `join_groupby_to_array` | 189.0 | 78.1 | 90.1 | 215.3 | 215.6 | — |
-| `join_select` | 151.5 | 72.6 | 85.0 | 188.5 | 215.8 | — |
-| `join_where_count` | 48.8 | 61.5 | 76.7 | 160.0 | 201.9 | 229.1 |
-| `last_match` | 0.0 | 5.9 | 13.9 | 65.1 | 159.0 | 30.9 |
-| `long_count_aggregate` | 28.9 | 4.1 | 4.2 | 63.3 | 154.6 | 20.3 |
-| `max_aggregate` | 30.7 | 6.0 | 6.9 | 58.7 | 163.1 | 17.0 |
-| `min_aggregate` | 30.6 | 6.0 | 6.9 | 58.6 | 163.3 | 17.1 |
-| `order_by_multi_key` | 339.9 | 271.4 | 283.6 | 458.8 | 446.1 | 334.3 |
-| `order_distinct_take` | 137.9 | 15.9 | 100.3 | 72.5 | 164.1 | 31.1 |
-| `order_reverse_normalized` | 38.3 | 16.2 | 20.3 | 70.7 | 170.9 | — |
-| `order_take_desc` | 38.2 | 16.2 | 20.6 | 70.1 | 170.2 | 33.3 |
+| `groupby_average` | 171.8 | 29.2 | 29.3 | 123.7 | 197.4 | — |
+| `groupby_count` | 141.9 | 19.5 | 19.5 | 75.0 | 167.5 | 162.7 |
+| `groupby_first` | 252.6 | 19.5 | 20.2 | 72.2 | 162.7 | — |
+| `groupby_having_count` | 141.8 | 19.5 | 19.5 | 74.8 | 169.1 | — |
+| `groupby_having_hidden_sum` | 175.7 | 23.3 | 22.6 | 118.8 | 192.7 | — |
+| `groupby_having_post_where` | 171.2 | 20.8 | 20.8 | 114.6 | 189.2 | — |
+| `groupby_max` | 173.9 | 24.9 | 25.4 | 120.5 | 193.1 | — |
+| `groupby_min` | 173.7 | 25.0 | 25.1 | 120.0 | 192.9 | — |
+| `groupby_multi_reducer` | 190.8 | 30.2 | 30.6 | 124.9 | 196.2 | — |
+| `groupby_select_order` | 170.9 | 20.8 | 20.8 | 114.8 | 188.6 | — |
+| `groupby_select_sum` | 198.9 | 38.6 | 38.2 | 101.7 | 195.2 | — |
+| `groupby_sum` | 170.8 | 20.8 | 20.8 | 114.9 | 188.4 | 192.8 |
+| `groupby_where_count` | 76.0 | 14.1 | 14.3 | 116.6 | 186.3 | — |
+| `groupby_where_sum` | 86.7 | 14.1 | 14.7 | 116.4 | 186.4 | — |
+| `join_count` | 38.3 | 51.3 | 64.6 | 113.1 | 183.4 | 65.6 |
+| `join_groupby_count` | 157.6 | 77.4 | 88.8 | 177.7 | 230.9 | — |
+| `join_groupby_to_array` | 189.1 | 78.0 | 90.6 | 215.4 | 213.5 | — |
+| `join_probe` | — | — | — | — | — | 47.3 |
+| `join_probe_build` | — | — | — | — | — | 79.1 |
+| `join_select` | 152.6 | 72.5 | 84.7 | 188.7 | 214.4 | — |
+| `join_where_count` | 48.6 | 61.6 | 76.8 | 160.4 | 199.8 | 81.4 |
+| `last_match` | 0.0 | 6.1 | 13.9 | 65.1 | 159.7 | 31.0 |
+| `long_count_aggregate` | 29.1 | 4.1 | 4.1 | 63.4 | 154.3 | 21.2 |
+| `max_aggregate` | 30.7 | 6.0 | 6.8 | 58.6 | 163.1 | 17.0 |
+| `min_aggregate` | 31.2 | 6.0 | 6.9 | 58.7 | 163.6 | 17.0 |
+| `order_by_multi_key` | 348.8 | 272.2 | 282.9 | 458.7 | 449.2 | 334.0 |
+| `order_distinct_take` | 137.8 | 15.9 | 99.3 | 72.5 | 162.8 | 31.3 |
+| `order_reverse_normalized` | 38.1 | 16.3 | 20.0 | 70.7 | 170.6 | — |
+| `order_take_desc` | 38.5 | 16.2 | 20.4 | 70.1 | 170.4 | 33.3 |
 | `point_lookup` | — | — | — | — | — | 0.0 |
 | `point_lookup_scan` | — | — | — | — | — | 8.4 |
-| `reverse_distinct_by` | 294.0 | 21.1 | 28.1 | 71.1 | 162.6 | — |
-| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.1 | 58.9 |
+| `reverse_distinct_by` | 295.5 | 21.3 | 28.0 | 70.9 | 162.2 | — |
+| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.2 | 58.8 |
 | `reverse_take_select` | 0.0 | 0.0 | 0.2 | 0.0 | 26.2 | — |
-| `select_count` | 0.1 | 0.0 | 2.2 | 68.3 | 2.2 | 0.0 |
-| `select_many` | — | 191.5 | — | — | — | — |
-| `select_where` | 197.5 | 11.2 | 19.4 | 195.6 | 183.7 | 37.5 |
-| `select_where_count` | 32.2 | 5.1 | 7.5 | 64.8 | 157.1 | 21.9 |
-| `select_where_order_take` | 36.2 | 12.2 | 15.1 | 72.5 | 165.1 | 34.5 |
-| `select_where_sum` | 37.1 | 7.5 | 7.5 | 66.4 | 162.2 | 23.3 |
-| `single_match` | 0.0 | 2.9 | 5.5 | 58.5 | 151.1 | 22.8 |
+| `select_count` | 0.1 | 0.0 | 2.2 | 69.3 | 2.2 | 0.0 |
+| `select_many` | — | 190.7 | — | — | — | — |
+| `select_where` | 207.9 | 11.2 | 19.5 | 195.5 | 188.7 | 37.6 |
+| `select_where_count` | 32.4 | 5.1 | 7.4 | 64.6 | 158.7 | 21.7 |
+| `select_where_order_take` | 36.3 | 12.3 | 15.1 | 72.7 | 164.5 | 34.5 |
+| `select_where_sum` | 37.2 | 7.5 | 7.5 | 66.5 | 164.6 | 23.3 |
+| `single_match` | 0.0 | 2.9 | 5.5 | 58.4 | 151.5 | 22.6 |
 | `skip_take` | 0.5 | 0.1 | 0.2 | 3.0 | 2.8 | 0.3 |
-| `skip_while_match` | 3.5 | 5.3 | 5.3 | 60.2 | 153.8 | 18.3 |
-| `sort_first` | 38.0 | 11.1 | 13.3 | 65.0 | 167.1 | 31.7 |
-| `sort_take` | 38.2 | 16.3 | 21.1 | 70.2 | 170.7 | 33.2 |
-| `sort_take_select` | 38.1 | 16.3 | 21.8 | 71.1 | 170.6 | 33.3 |
-| `sum_aggregate` | 30.6 | 2.1 | 2.1 | 54.8 | 152.8 | 13.5 |
-| `sum_where` | 32.9 | 4.4 | 4.3 | 63.4 | 154.2 | 20.6 |
-| `take_count` | 3.6 | 0.2 | 0.4 | 2.9 | 2.7 | 0.5 |
+| `skip_while_match` | 3.5 | 5.3 | 5.3 | 59.9 | 153.1 | 18.3 |
+| `sort_first` | 37.9 | 11.0 | 13.3 | 64.9 | 167.0 | 32.0 |
+| `sort_take` | 38.4 | 16.3 | 20.9 | 70.5 | 171.5 | 33.3 |
+| `sort_take_select` | 38.2 | 16.3 | 20.9 | 71.0 | 170.8 | 33.2 |
+| `sum_aggregate` | 29.6 | 2.1 | 2.1 | 54.4 | 153.0 | 13.5 |
+| `sum_where` | 32.1 | 4.4 | 11.5 | 63.8 | 154.6 | 21.3 |
+| `take_count` | 3.9 | 0.2 | 0.4 | 2.9 | 2.7 | 0.5 |
 | `take_count_filtered` | 1.1 | 0.2 | 0.2 | 1.3 | 1.1 | 0.3 |
 | `take_sum_aggregate` | 0.8 | 0.1 | 0.1 | 0.6 | 0.5 | 0.1 |
 | `take_where_count` | 0.9 | 0.1 | 0.1 | 0.7 | 0.6 | 0.2 |
-| `take_while_match` | 7.8 | 2.4 | 2.4 | 30.3 | 75.4 | 16.5 |
-| `to_array_filter` | 70.0 | 11.8 | 11.7 | 71.3 | 164.9 | 28.7 |
-| `where_join_count` | 41.2 | 29.0 | 42.0 | 132.1 | 168.9 | — |
-| `zip_count_pred` | 39.2 | 15.9 | — | 313.8 | 322.0 | — |
-| `zip_dot_product` | 46.1 | 12.6 | 10.6 | 308.6 | 319.3 | — |
-| `zip_dot_product_3arg` | 46.1 | 12.8 | — | 309.7 | 319.0 | — |
-| `zip_reverse_to_array` | — | 31.6 | — | 343.4 | 353.5 | — |
+| `take_while_match` | 7.8 | 2.4 | 2.4 | 30.2 | 75.6 | 16.4 |
+| `to_array_filter` | 70.2 | 11.8 | 11.8 | 71.5 | 165.1 | 29.0 |
+| `where_join_count` | 41.2 | 29.1 | 41.7 | 132.7 | 168.6 | — |
+| `zip_count_pred` | 39.3 | 15.9 | — | 315.0 | 321.2 | — |
+| `zip_dot_product` | 46.2 | 12.6 | 10.6 | 309.2 | 319.0 | — |
+| `zip_dot_product_3arg` | 46.2 | 12.8 | — | 309.4 | 320.7 | — |
+| `zip_reverse_to_array` | — | 31.7 | — | 345.0 | 353.4 | — |
 
 ## JIT
 
 | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) |
 |---|---:|---:|---:|---:|---:|---:|
-| `aggregate_match` | 35.1 | 0.3 | 0.6 | 21.8 | 26.0 | 13.4 |
-| `all_match` | 27.8 | 0.3 | 0.2 | 18.1 | 25.2 | 13.5 |
+| `aggregate_match` | 35.0 | 0.3 | 0.6 | 21.7 | 27.1 | 13.5 |
+| `all_match` | 27.9 | 0.3 | 0.2 | 18.1 | 26.2 | 13.5 |
 | `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `average_aggregate` | 30.1 | 1.0 | 3.6 | 18.1 | 24.6 | 13.5 |
-| `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 | 17.1 |
-| `bare_order_where` | 185.6 | 34.0 | 35.2 | 106.5 | 53.5 | 78.9 |
-| `chained_select_collapse` | — | 1.1 | 1.1 | 20.6 | 33.4 | 14.0 |
-| `chained_where` | 36.2 | 0.6 | 0.8 | 35.6 | 31.4 | 17.7 |
-| `contains_match` | 0.0 | 0.2 | 0.1 | 17.5 | 9.0 | 4.7 |
-| `count_aggregate` | 29.3 | 0.3 | 0.6 | 20.5 | 25.3 | 13.5 |
-| `cross_join` | 5962.8 | 733.1 | — | 836.0 | 773.4 | — |
+| `average_aggregate` | 30.5 | 1.0 | 3.6 | 18.1 | 25.7 | 13.5 |
+| `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 | 17.2 |
+| `bare_order_where` | 188.1 | 35.3 | 35.5 | 106.7 | 53.3 | 79.0 |
+| `chained_select_collapse` | — | 1.1 | 1.1 | 20.6 | 33.5 | 14.1 |
+| `chained_where` | 36.1 | 0.6 | 0.8 | 35.7 | 32.0 | 17.7 |
+| `contains_match` | 0.0 | 0.2 | 0.1 | 17.5 | 9.2 | 4.7 |
+| `count_aggregate` | 29.6 | 0.3 | 0.6 | 20.6 | 26.4 | 13.5 |
+| `cross_join` | 5976.1 | 733.7 | — | 837.5 | 767.7 | — |
 | `decs_count_bare_pred` | — | — | 0.6 | — | — | — |
-| `distinct_by_count` | 41.2 | 1.1 | 1.1 | 20.6 | 33.3 | 14.0 |
-| `distinct_by_order_take` | 237.1 | 1.7 | 2.6 | 47.4 | 39.1 | 30.3 |
-| `distinct_by_order_to_array` | 242.4 | 1.8 | 2.6 | 47.4 | 38.7 | 30.3 |
-| `distinct_count` | 40.9 | 1.1 | 1.1 | 20.6 | 33.3 | 14.0 |
-| `distinct_count_pred` | 250.6 | 1.1 | 1.3 | 37.7 | 43.5 | 14.0 |
+| `distinct_by_count` | 41.2 | 1.1 | 1.1 | 20.6 | 33.6 | 14.1 |
+| `distinct_by_order_take` | 239.4 | 1.7 | 2.6 | 47.4 | 39.2 | 30.1 |
+| `distinct_by_order_to_array` | 239.3 | 1.7 | 2.7 | 47.4 | 38.9 | 30.1 |
+| `distinct_count` | 41.3 | 1.1 | 1.1 | 20.5 | 33.7 | 14.1 |
+| `distinct_count_pred` | 252.4 | 1.1 | 1.3 | 37.4 | 43.4 | 14.1 |
 | `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `element_at_match` | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 |
+| `element_at_match` | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 |
 | `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `groupby_average` | 171.1 | 1.6 | 1.9 | 35.9 | 45.7 | — |
-| `groupby_count` | 141.4 | 1.3 | 1.5 | 20.6 | 33.9 | 45.7 |
-| `groupby_first` | 250.7 | 1.3 | 2.3 | 20.6 | 34.3 | — |
-| `groupby_having_count` | 141.5 | 1.3 | 1.5 | 20.6 | 33.8 | — |
-| `groupby_having_hidden_sum` | 174.4 | 1.5 | 1.7 | 35.9 | 45.3 | — |
-| `groupby_having_post_where` | 170.1 | 1.4 | 2.0 | 35.8 | 44.2 | — |
-| `groupby_max` | 175.6 | 1.5 | 2.0 | 36.0 | 46.0 | — |
-| `groupby_min` | 172.4 | 1.5 | 1.8 | 36.0 | 46.0 | — |
-| `groupby_multi_reducer` | 189.6 | 1.6 | 2.0 | 36.1 | 46.1 | — |
-| `groupby_select_order` | 170.1 | 1.4 | 1.9 | 35.9 | 44.3 | — |
-| `groupby_select_sum` | 197.0 | 2.8 | 3.2 | 32.2 | 40.0 | — |
-| `groupby_sum` | 170.5 | 1.4 | 1.6 | 35.9 | 43.4 | 54.2 |
-| `groupby_where_count` | 75.6 | 0.9 | 1.3 | 36.0 | 41.7 | — |
-| `groupby_where_sum` | 86.2 | 0.9 | 1.3 | 35.9 | 41.7 | — |
-| `join_count` | 38.2 | 11.0 | 11.7 | 43.6 | 71.4 | 63.1 |
-| `join_groupby_count` | 156.8 | 18.0 | 20.1 | 68.5 | 90.1 | — |
-| `join_groupby_to_array` | 189.5 | 17.4 | 19.4 | 80.5 | 36.0 | — |
-| `join_select` | 93.2 | 19.6 | 21.7 | 74.8 | 94.5 | — |
-| `join_where_count` | 48.3 | 19.0 | 20.7 | 64.5 | 78.3 | 80.0 |
-| `last_match` | 0.0 | 0.5 | 1.4 | 18.8 | 25.9 | 22.9 |
-| `long_count_aggregate` | 28.8 | 0.3 | 0.6 | 20.6 | 25.4 | 13.5 |
-| `max_aggregate` | 30.5 | 0.3 | 0.5 | 18.3 | 26.7 | 13.4 |
-| `min_aggregate` | 30.6 | 0.3 | 0.5 | 18.3 | 26.6 | 13.5 |
-| `order_by_multi_key` | 249.4 | 53.4 | 54.8 | 125.6 | 71.1 | 129.8 |
-| `order_distinct_take` | 138.1 | 1.1 | 75.6 | 20.9 | 35.8 | 14.0 |
-| `order_reverse_normalized` | 38.0 | 0.7 | 1.4 | 24.6 | 27.6 | — |
-| `order_take_desc` | 37.9 | 0.7 | 1.3 | 24.6 | 27.9 | 17.8 |
+| `groupby_average` | 170.7 | 1.6 | 1.9 | 35.9 | 44.3 | — |
+| `groupby_count` | 141.5 | 1.3 | 1.5 | 20.6 | 32.7 | 42.9 |
+| `groupby_first` | 252.2 | 1.3 | 2.3 | 20.6 | 33.3 | — |
+| `groupby_having_count` | 141.3 | 1.3 | 1.5 | 20.6 | 33.3 | — |
+| `groupby_having_hidden_sum` | 175.6 | 1.5 | 1.7 | 36.0 | 45.2 | — |
+| `groupby_having_post_where` | 171.9 | 1.6 | 2.0 | 35.9 | 44.3 | — |
+| `groupby_max` | 172.8 | 1.5 | 1.9 | 36.0 | 45.9 | — |
+| `groupby_min` | 173.4 | 1.5 | 1.8 | 35.9 | 45.9 | — |
+| `groupby_multi_reducer` | 190.6 | 1.6 | 2.0 | 36.2 | 46.1 | — |
+| `groupby_select_order` | 170.6 | 1.4 | 1.9 | 35.7 | 44.2 | — |
+| `groupby_select_sum` | 198.6 | 2.8 | 3.2 | 32.2 | 39.7 | — |
+| `groupby_sum` | 170.3 | 1.4 | 1.7 | 35.8 | 44.2 | 51.5 |
+| `groupby_where_count` | 76.0 | 0.9 | 1.3 | 36.1 | 41.8 | — |
+| `groupby_where_sum` | 86.7 | 0.9 | 1.3 | 36.0 | 41.7 | — |
+| `join_count` | 38.3 | 10.9 | 11.7 | 43.5 | 71.4 | 33.1 |
+| `join_groupby_count` | 157.6 | 18.2 | 20.1 | 68.5 | 89.9 | — |
+| `join_groupby_to_array` | 189.7 | 17.6 | 19.5 | 80.3 | 36.2 | — |
+| `join_probe` | — | — | — | — | — | 24.2 |
+| `join_probe_build` | — | — | — | — | — | 38.1 |
+| `join_select` | 95.4 | 19.7 | 21.7 | 75.0 | 94.3 | — |
+| `join_where_count` | 39.4 | 18.9 | 20.8 | 64.4 | 78.4 | 37.9 |
+| `last_match` | 0.0 | 0.5 | 1.4 | 18.9 | 26.8 | 22.9 |
+| `long_count_aggregate` | 29.0 | 0.3 | 0.6 | 20.5 | 26.4 | 13.5 |
+| `max_aggregate` | 30.7 | 0.3 | 0.5 | 18.4 | 27.7 | 13.5 |
+| `min_aggregate` | 30.7 | 0.3 | 0.5 | 18.4 | 27.7 | 13.5 |
+| `order_by_multi_key` | 252.6 | 53.4 | 55.0 | 125.4 | 71.9 | 129.1 |
+| `order_distinct_take` | 137.9 | 1.1 | 75.7 | 20.9 | 36.0 | 14.0 |
+| `order_reverse_normalized` | 38.2 | 0.7 | 1.4 | 24.6 | 28.5 | — |
+| `order_take_desc` | 38.1 | 0.7 | 1.4 | 24.6 | 28.4 | 17.7 |
 | `point_lookup` | — | — | — | — | — | 0.0 |
-| `point_lookup_scan` | — | — | — | — | — | 6.1 |
-| `reverse_distinct_by` | 295.6 | 1.6 | 3.2 | 20.6 | 34.3 | — |
+| `point_lookup_scan` | — | — | — | — | — | 6.0 |
+| `reverse_distinct_by` | 295.4 | 1.5 | 3.2 | 20.6 | 34.6 | — |
 | `reverse_take` | 0.0 | 0.0 | 0.0 | 0.0 | 3.7 | 26.9 |
-| `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.7 | — |
-| `select_count` | 0.1 | 0.0 | 0.0 | 68.7 | 0.0 | 0.0 |
-| `select_many` | — | 64.0 | — | — | — | — |
-| `select_where` | 110.6 | 4.2 | 5.3 | 76.5 | 22.0 | 28.1 |
-| `select_where_count` | 32.3 | 0.3 | 0.6 | 18.6 | 26.7 | 13.5 |
-| `select_where_order_take` | 37.1 | 0.7 | 1.4 | 19.1 | 27.4 | 23.0 |
-| `select_where_sum` | 36.9 | 0.4 | 0.6 | 18.2 | 25.2 | 13.4 |
-| `single_match` | 0.0 | 0.4 | 1.1 | 46.3 | 22.2 | 17.3 |
-| `skip_take` | 0.3 | 0.0 | 0.0 | 1.3 | 0.2 | 0.2 |
-| `skip_while_match` | 3.5 | 0.4 | 0.4 | 46.7 | 21.7 | 13.3 |
-| `sort_first` | 38.3 | 0.4 | 1.3 | 18.2 | 26.7 | 17.3 |
-| `sort_take` | 38.2 | 0.7 | 1.4 | 24.7 | 27.8 | 17.8 |
-| `sort_take_select` | 37.6 | 0.7 | 1.4 | 24.7 | 27.8 | 17.8 |
-| `sum_aggregate` | 29.3 | 0.3 | 0.1 | 23.4 | 24.6 | 13.5 |
-| `sum_where` | 31.8 | 0.3 | 0.6 | 18.6 | 26.4 | 13.4 |
-| `take_count` | 1.9 | 0.1 | 0.1 | 1.2 | 0.2 | 0.2 |
-| `take_count_filtered` | 1.1 | 0.0 | 0.0 | 0.4 | 0.1 | 0.2 |
+| `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.9 | — |
+| `select_count` | 0.1 | 0.0 | 0.0 | 66.0 | 0.0 | 0.0 |
+| `select_many` | — | 62.7 | — | — | — | — |
+| `select_where` | 109.1 | 4.1 | 5.3 | 76.2 | 23.0 | 28.1 |
+| `select_where_count` | 32.3 | 0.3 | 0.6 | 18.5 | 27.2 | 13.4 |
+| `select_where_order_take` | 36.5 | 0.7 | 1.4 | 19.0 | 27.9 | 23.0 |
+| `select_where_sum` | 37.1 | 0.4 | 0.6 | 18.0 | 26.3 | 13.4 |
+| `single_match` | 0.0 | 0.4 | 1.1 | 46.3 | 23.2 | 17.4 |
+| `skip_take` | 0.3 | 0.0 | 0.0 | 1.3 | 0.2 | 0.1 |
+| `skip_while_match` | 3.5 | 0.4 | 0.4 | 45.8 | 22.7 | 13.3 |
+| `sort_first` | 37.9 | 0.4 | 1.3 | 18.1 | 27.5 | 17.3 |
+| `sort_take` | 37.9 | 0.7 | 1.4 | 24.6 | 28.3 | 17.8 |
+| `sort_take_select` | 37.8 | 0.7 | 1.4 | 24.6 | 28.4 | 17.8 |
+| `sum_aggregate` | 29.9 | 0.3 | 0.1 | 23.2 | 25.6 | 13.5 |
+| `sum_where` | 32.1 | 0.3 | 0.6 | 18.5 | 27.2 | 13.4 |
+| `take_count` | 1.8 | 0.1 | 0.1 | 1.2 | 0.3 | 0.2 |
+| `take_count_filtered` | 1.1 | 0.0 | 0.0 | 0.4 | 0.1 | 0.1 |
 | `take_sum_aggregate` | 0.8 | 0.0 | 0.0 | 0.2 | 0.0 | 0.1 |
 | `take_where_count` | 0.9 | 0.0 | 0.0 | 0.2 | 0.0 | 0.1 |
-| `take_while_match` | 7.7 | 0.2 | 0.3 | 17.3 | 9.0 | 13.4 |
-| `to_array_filter` | 48.2 | 3.2 | 3.3 | 21.6 | 35.0 | 20.4 |
-| `where_join_count` | 41.2 | 5.8 | 6.7 | 49.6 | 41.9 | — |
-| `zip_count_pred` | 38.6 | 0.1 | — | 117.0 | 33.9 | — |
-| `zip_dot_product` | 46.0 | 0.1 | 0.1 | 116.8 | 33.8 | — |
-| `zip_dot_product_3arg` | 45.9 | 0.1 | — | 116.8 | 33.7 | — |
-| `zip_reverse_to_array` | — | 4.6 | — | 128.3 | 51.4 | — |
+| `take_while_match` | 7.8 | 0.2 | 0.3 | 17.1 | 9.3 | 13.5 |
+| `to_array_filter` | 47.4 | 3.3 | 3.3 | 21.5 | 35.1 | 20.2 |
+| `where_join_count` | 39.4 | 5.8 | 6.8 | 49.7 | 42.3 | — |
+| `zip_count_pred` | 39.4 | 0.1 | — | 117.0 | 33.9 | — |
+| `zip_dot_product` | 46.5 | 0.1 | 0.1 | 117.1 | 33.8 | — |
+| `zip_dot_product_3arg` | 46.4 | 0.1 | — | 116.9 | 33.7 | — |
+| `zip_reverse_to_array` | — | 4.5 | — | 128.4 | 51.3 | — |
 <!-- BENCH:TABLES END -->
 
 ## Missing lanes (the `—` cells)
@@ -209,8 +214,9 @@ Each empty cell's reason is also in the bench `.das` file's comment; SQL gaps ar
 - **`reverse_distinct_by` m4 / m5f** — array uses the backward-index walk; non-array sources fuse the forward keep-last splice (decs 27.6/5.0, XML 74.5/22.2); SQL uses MAX(pk).
 - **`order_distinct_take` m4 vs m3f** — `unique_key` hashes workhorse keys directly (array `int`) but string-interpolates structs (decs `DecsBrand`); the gap is per-element string hashing, not decs-walk. `distinct_by_count` is the key-based variant (m4 parity).
 - **`zip_reverse_to_array` / `zip_*` SQL / Decs** — `reverse` has no SQL order key; zip is not relational / not expressible over one archetype walk. By design. (XML/JSON zip lanes are lit, partially fused.)
-- **m7 absent families** — `zip_*` / `cross_join` (lockstep over an unordered slot walk is meaningless), `select_many` (flat fixture, no nested array field), `order_reverse_normalized` / `reverse_take_select` / `reverse_distinct_by` (no backward slot walk; `reverse_take` is kept as the single deferral marker), the group-by tail beyond `groupby_count`/`groupby_sum` and joins beyond `join_count`/`join_where_count` (table group_by/join fusion is staged — see `LINQ_TO_TABLE.md`; the four marker cells track the tier-2 cost until then), `decs_count_bare_pred` (decs-only).
+- **m7 absent families** — `zip_*` / `cross_join` (lockstep over an unordered slot walk is meaningless), `select_many` (flat fixture, no nested array field), `order_reverse_normalized` / `reverse_take_select` / `reverse_distinct_by` (no backward slot walk; `reverse_take` is kept as the single deferral marker), the group-by tail beyond `groupby_count`/`groupby_sum` (table group_by fusion is staged — see `LINQ_TO_TABLE.md`; the two marker cells track the tier-2 cost until then) plus the join-composition lanes (`join_select` / `where_join_count` would fuse today but aren't instantiated; `join_groupby_*` needs the staged group_by), `decs_count_bare_pred` (decs-only).
 - **`point_lookup` / `point_lookup_scan` non-m7** — m7-only pair: only a table source has a key to probe (`where(kv.key == X)` + terminator → `key_exists` / `tab?[X]`, O(1)); the `_scan` twin forces the same query through the walk (compound `&&` predicate declines the probe) to show the gap. Other sources have no analog by design.
+- **`join_probe` / `join_probe_build` non-m7** — m7-only A/B pair: a table srcB joined on its bare key probes the user's table per lead row (no internal join hash, no build loop); the `_build` twin feeds the identical rows pre-materialized to a kv array, forcing the hashed build. Other sources have no keyed-srcB analog by design.
 
 ## Accepted floors
 
diff --git a/benchmarks/sql/table.das b/benchmarks/sql/table.das
index 2b49e1c31..e33e7ee64 100644
--- a/benchmarks/sql/table.das
+++ b/benchmarks/sql/table.das
@@ -10,20 +10,31 @@ require _common public
 let N = 100000
 
 typedef CarKV = tuple<key : int; value : Car>
+typedef DKV = tuple<key : int; value : Dealer>
 
 var g_t : table<int; Car>
 var g_dealers : array<Dealer>
+var g_dealer_t : table<int; Dealer>   // dealers keyed by id — the join_probe srcB
+var g_dealer_kv : array<DKV>          // same rows pre-materialized in slot order — the build-side baseline
 
 [init]
 def table_bench_init {
     g_t <- fixture_table(N)
     g_dealers <- fixture_dealers_array()
+    for (d in g_dealers) {
+        g_dealer_t |> insert(d.id, d)
+    }
+    for (k, v in keys(g_dealer_t), values(g_dealer_t)) {
+        g_dealer_kv |> push((key = k, value = v))
+    }
 }
 
 [finalize]
 def table_bench_fini {
     delete g_t
     delete g_dealers
+    delete g_dealer_t
+    delete g_dealer_kv
 }
 
 [benchmark]
@@ -292,6 +303,38 @@ def join_count_m7(b : B?) {
     }
 }
 
+[benchmark]
+def join_probe_m7(b : B?) {
+    // srcB is a table joined on its bare key → fused key probe, no internal join hash
+    b |> run("join_probe", N) {
+        let c = _fold(unsafe(each_kv(g_t)) |> _join(unsafe(each_kv(g_dealer_t)),
+                                $(c : CarKV, d : DKV) => c.value.dealer_id == d.key,
+                                $(c : CarKV, d : DKV) => (CarPrice = c.value.price, DealerName = d.value.name))
+                       |> _where(_.CarPrice > 500)
+                       |> count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def join_probe_build_m7(b : B?) {
+    // build-side baseline: identical rows from the pre-materialized kv array — the join hashes srcB
+    b |> run("join_probe_build", N) {
+        let c = _fold(unsafe(each_kv(g_t)) |> _join(g_dealer_kv,
+                                $(c : CarKV, d : DKV) => c.value.dealer_id == d.key,
+                                $(c : CarKV, d : DKV) => (CarPrice = c.value.price, DealerName = d.value.name))
+                       |> _where(_.CarPrice > 500)
+                       |> count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
+
 [benchmark]
 def join_where_count_m7(b : B?) {
     b |> run("join_where_count", N) {
diff --git a/daslib/linq_fold.md b/daslib/linq_fold.md
index 8f313f4b4..67388d8db 100644
--- a/daslib/linq_fold.md
+++ b/daslib/linq_fold.md
@@ -660,6 +660,7 @@ The imperative code has a few subtle co-occurrence rules that may not map cleanl
 - **2026-05-31 (deferred materialization — handle-buffering for buffered reducers)** — the buffered reducers (`order_by`/`sort`/`reverse` + `take`/`first`, `distinct_by |> order_by`) materialized the full `Car` (its `name` clone) for *every* source element before the reducer kept only K — `from_xml_node` builds all N. Fix: the reducer buffers a cheap **surrogate** — `(orderKey, xml_node)` for the order emits, a bare `xml_node` for reverse (no key) — and `build_xml_row` runs only for the K survivors. The comparator is the fixed `_::less(a._0, b._0)` on the precomputed key; where/distinct are consumed during the walk (cheap field reads gate which elements get a surrogate), so they never enter the surrogate. **The abstraction is source-generic** (an "element handle"): the surrogate machinery + materialize-survivors tail live in `linq_fold_common` (`build_surrogate_type` / `build_surrogate_cmp` / `build_surrogate_materialize_loop`); each source supplies `defers_materialization()` + `handle_type()` + `current_handle_expr()` + `materialize_handle()`. Only `XmlAdapter` overrides them this PR (a future `linq_json` is just those 4 hooks); `array`/`decs` inherit the no-defer default and stay **byte-identical** (their backing store is pre-materialized, so their reducers already clone only ~K heap-entrants — confirmed by `benchmarks/micro/sort_distinct_take_shapes.das`, where the `array<Car?>` pointer form is slower or tied). Wired into `emit_bounded_heap` (take), `emit_fused_prefilter` (distinct-only no-take arm — the pure-where case is already materialize-under-guard'd), `emit_streaming_min` (first), and `emit_reverse_buffer_inplace` (reverse + take). **Design validated by hand-coded micro-bench first** (`benchmarks/micro/sort_distinct_take_shapes.das`). 
Wins (m5f INTERP / JIT): `sort_take` 338 → 69 / 17, `order_take_desc` 343 → 69, `distinct_by_order_take` 354 → 126 / 46, `select_where_order_take` 228 → 71, `distinct_by_order_to_array` 356 → 131 / 46, `sort_first` 336 → 64 / 17, `reverse_take` 360 → 90 / 70 — string clones 100 000 → K everywhere. Not deferred (inherent floor / out of scope): `bare_order_where` (already at the under-guard survivor floor), `order_reverse_normalized` (order+reverse → all rows out), `reverse_distinct_by` (tier-2, no fused emit), `groupby_first` (group_by path).
 - **2026-05-31 (deferred materialization — `last` + group-by `first`)** — extends the element-handle deferral to the two remaining survivors-≪-N reducers: the full-walk `last`/`last_or_default` terminator (in `emit_early_exit_lane`) and `first`-per-group inside `plan_group_by_core`. `last` cloned the whole `Car` (`lst := it`) on *every* match and kept only the final one; over a deferring source it now stores the node **handle** per match and runs `materialize_handle` once, for the single survivor. `group_by(brand) |> select((key, first per group))` pinned the whole row (`slot := it`) in `mk_reducer_first`, forcing `wrap_source_loop` to build every element; a new `mk_reducer_first_deferred` materializes from the handle *inside the table miss-branch*, so the walk field-prunes to just the group key and `build_xml_row` runs only once per distinct group. Both ride the same four `SourceAdapter` hooks — only `XmlAdapter` defers; `array`/`decs` pass `null`/no-defer and stay byte-identical (the `emit_reducer_branches` adapter param defaults to `null`; the group-by gate also requires the bind be the raw element — `itName == bind_name`, i.e. no upstream `_select` rebinds it — since the handle yields the raw row). **Design validated by hand-coded micro-bench first** (the `last_match` / `groupby_first` lanes in `benchmarks/micro/sort_distinct_take_shapes.das`). Wins (m5f INTERP / JIT, string clones 100 000 → K): `last_match` 219 → 65 / 21 (K=1), `groupby_first` 339 → 72 / 22 (K=#brands). Closes `groupby_first` (the last item on the prior entry's floor list). Still not deferred: `bare_order_where` / `order_reverse_normalized` (all rows out), `reverse_distinct_by` (tier-2, no fused emit).
 - **2026-05-31 (forward keep-last — `reverse |> distinct[_by]` over forward sources)** — the only buffered shape still falling to tier-2 over a forward source. `reverse() |> distinct_by(K) |> to_array()` means "keep the LAST forward row per key, output in reverse-discovery order." The sole fused emit was `emit_reverse_backward_walk_dset_gate` — a backward **index** walk (`src[len-1-k]`) gated `array_source`, so XML / decs / plain iterators (forward-only, no random access) cascaded: `reverse()` materialized all N, then `distinct_by` walked. New `emit_reverse_distinct_forward_keeplast` (R-2b, gated by the exact complement `non_array_source`) does a single forward pass instead — `table<key; (seq, val)>`, **OVERWRITE** the slot per element (so it ends at the last forward occurrence + its seq), then sort survivors by **descending seq** (`build_surrogate_cmp(true)`) and emit. Output-identical to the backward walk (descending forward-index of each last occurrence), proven by parity vs both `m3f` (array backward walk) and the tier-2 cascade. It rides `emit_terminator_lane` → `wrap_source_loop`, so it's source-generic: **XML defers** (the table holds `(seq, xml_node)` and `build_xml_row` runs only for the K survivors — field-pruned to the key); **decs / iterator** store the full element (no handle), winning single-pass over the cascade's reverse-buffer + second walk. `ctx.top` is `null` for decs (bridge-driven), so `elemType` falls back to `ctx.src->element_type()`; arrays still match the backward-walk row first (registered earlier), so they're byte-identical. **Design validated by hand-coded micro-bench first** (the `reverse_distinct_by` lane in `benchmarks/micro/sort_distinct_take_shapes.das`: INTERP 405.8 → 88.6, JIT 162.6 → 37.0, string clones 100 000 → #keys). Wins: `reverse_distinct_by` m5f **429 → 74 INTERP / 166.6 → 22 JIT** (clones 100 000 → 5), and the previously-`—` decs **m4 lights up at 27.7 / 5.0** (near the array fast path). 
Closes `reverse_distinct_by` — the last forward-source buffered floor.
+- **2026-06-11 (table joins — adapter-generalized `emit_array_join` + table-srcB probe)** — table-arc stage 5 (branch `bbatkin/linq-table-each-kv`; plan: `benchmarks/sql/LINQ_TO_TABLE.md`). Two halves. (1) **Lead generalization**: `emit_array_join` no longer hand-rolls its `for (tup_a in srcA)` — the lead loop, bind name, and lead invoke-param spelling come from the adapter (`wrap_source_loop(LoopDispatch(Each=null))` / `bind_name(at)` / new `SourceAdapter.invoke_param_type()` capability, default `invoke_src_param_type(arrayTop())`), so `TableAdapter` just sets `can_join() = true` and routes `emit_join_hook` to the same emitter: a table-lead join walks the kv usage-pruned slot iterator(s) — a join body touching only `c.value.*` walks `values(tab)` alone — and group joins stay outer over every slot. decs/xml/json hooks untouched (nested-callback walks). (2) **Table-srcB probe**: when the join's srcb is `each_kv(tab)` / `keys(set)` joined on its **bare key** (`join_srcb_table_call` + `join_keyb_is_bare_key` on the peeled keyb), the emitter skips the internal `table<KEY; array<TUPB>>` + build loop entirely — srcB binds the user's table (const param) and the per-A probe is a key lookup, usage-pruned like the point-lookup fold (count-no-where / key-only → `key_exists`, value shapes → by-ref bind off `unsafe(tab?[k])`, whole-pair → kv-tuple bind). Unique table keys ⇒ probe ≡ hash semantics exactly; a bare field read is pure by construction so skipping keyb's per-B evaluation is unobservable; non-bare keybs and `group_join` (result consumes the whole bucket) keep the hashed build. Plumbing: per-pair statements factored into `build_join_pair_core` (`JoinPairCore`), shared by `build_join_standalone_pieces` (keeps the group-join arm + `get`-bucket wrap — hash-mode AST unchanged for the decs/xml/json callers) and the new `build_join_probe_pieces`. m7: `join_count` / `join_where_count` (table lead) leave tier-2; new `join_probe` vs `join_probe_build` A/B lanes.
 
 ## Open questions
 
diff --git a/daslib/linq_fold_common.das b/daslib/linq_fold_common.das
index f08036349..3037749f2 100644
--- a/daslib/linq_fold_common.das
+++ b/daslib/linq_fold_common.das
@@ -126,6 +126,10 @@ class SourceAdapter {
     def arraySrcName() : string {
         return ""
     }
+    def invoke_param_type() : TypeDeclPtr {     // invoke-param spelling for the source argument (join lane's lead param)
+        var top = arrayTop()
+        return top != null ? invoke_src_param_type(top) : null
+    }
 }
 
 // Decorator adapter — absorbs a leading `_select(f)` source projection: binds `projName = f(innerBind)` atop
@@ -163,11 +167,9 @@ class ProjectedSourceAdapter : SourceAdapter {
     }
 }
 
-// ===== Field-pruning row-usage scanner (shared by the XML / JSON source adapters) =====
-// Scan a chain body for which Row fields it reads, so a deferred source's per-element materialization
-// reads only those fields. Pure AST (bind name → field reads); each adapter keeps its own per-field read
-// / full-row build. A whole-`it` ref (to_array push_clone, identity select, pass-to-fn) sets
-// allFieldsUsed → the caller falls back to full materialization.
+// ===== Field-pruning row-usage scanner (shared by the XML / JSON / table source adapters) =====
+// Scans a chain body for which Row fields it reads so per-element materialization fetches only those;
+// a whole-`it` ref (to_array push_clone, identity select, pass-to-fn) sets allFieldsUsed → full row.
 
 class RowUsageScanner : AstVisitor {
     bindName      : string
@@ -5541,6 +5543,120 @@ def extract_join_lead_where(var c : Captures) : Expression? {
     return call != null ? call.arguments[1] : null
 }
 
+// JoinPairCore — per-matched-(a,b)-pair statements shared by the hashed-join builder (wraps them in the
+// bucket walk) and the table-srcB probe builder (runs them once per key hit). pairStmts reference
+// tupAName + bElemName. Group joins are NOT built here (their result consumes the whole bucket).
+struct JoinPairCore {
+    preludeStmts  : array<Expression?>
+    pairStmts     : array<Expression?>
+    returnStmt    : Expression?
+    invokeRetType : TypeDeclPtr
+    cntName       : string   // count terminator's accumulator; "" otherwise
+}
+
+[macro_function]
+def build_join_pair_core(
+                         var joinCall  : ExprCall?;
+                         var whereLam  : Expression?;
+                         var selectLam : Expression?;
+                         countOnly     : bool;
+                         tupAName      : string;
+                         bElemName     : string;
+                         namePrefix    : string;
+                         at            : LineInfo
+                         ) : JoinPairCore? {
+    let cntName     = qn(namePrefix + "_cnt", at)
+    let bufName     = qn(namePrefix + "_buf", at)
+    let resBindName = qn(namePrefix + "_res", at)
+    var preludeStmts : array<Expression?>
+    var pairStmts : array<Expression?>
+    var returnStmt : Expression?
+    var invokeRetType : TypeDeclPtr
+    if (countOnly) {
+        invokeRetType = new TypeDecl(baseType = Type.tInt, at = at)
+        preludeStmts |> push <| qmacro_expr() {
+            var $i(cntName) : int = 0
+        }
+        if (whereLam == null) {
+            pairStmts |> push <| qmacro_expr() {
+                $i(cntName) ++
+            }
+        } else {
+            // HAVING-shape: bind result, evaluate predicate, conditional incr.
+            var resultLam = joinCall.arguments[4]
+            if (resultLam == null || resultLam._type == null || resultLam._type.firstType == null) return null
+            var resultBody = peel_lambda_rename_2vars(resultLam, tupAName, bElemName)
+            if (resultBody == null) return null
+            let joinResultType = strip_const_ref(clone_type(resultLam._type.firstType))
+            var wherePred = peel_lambda_replace_var(whereLam, qmacro($i(resBindName)))
+            pairStmts |> push_from <| qmacro_block_to_array() {
+                let $i(resBindName) : $t(joinResultType) = $e(resultBody)
+                if ($e(wherePred)) {
+                    $i(cntName) ++
+                }
+            }
+        }
+        returnStmt = qmacro_expr() {
+            return $i(cntName)
+        }
+        return <- new JoinPairCore(preludeStmts <- preludeStmts, pairStmts <- pairStmts,
+                                   returnStmt = returnStmt, invokeRetType = invokeRetType, cntName = cntName)
+    }
+    var resultLam = joinCall.arguments[4]
+    if (resultLam == null || resultLam._type == null || resultLam._type.firstType == null) return null
+    var resultBody = peel_lambda_rename_2vars(resultLam, tupAName, bElemName)
+    if (resultBody == null) return null
+    // Buffer element type: the after-select projection when a select trails, else the result lambda's return type. Always use peel_lambda_single_return: selCall._type.firstType may stay an unresolved typedecl(result_selector(type<TT>)) when the chain doesn't end with to_array() (the array-overload select returns an array directly, so no enclosing wrap forces resolution). The lambda body's _type is always resolved after inner-first expansion.
+    var resultType : TypeDeclPtr
+    if (selectLam != null) {
+        var selBody = peel_lambda_single_return(selectLam)
+        if (selBody == null || selBody._type == null) return null
+        resultType = strip_const_ref(clone_type(selBody._type))
+    } else {
+        // strip_const_ref: a scalar/field result (`$(c,d)=>c.name`, `string const&`) would make the buffer
+        // `array<string const>` and push_clone fail (error[30913]). A named-tuple result is already a
+        // fresh non-const value (no-op there) — which is why only bare-scalar join projections tripped it.
+        resultType = strip_const_ref(clone_type(resultLam._type.firstType))
+    }
+    if (resultType == null) return null
+    invokeRetType = new TypeDecl(baseType = Type.tArray, firstType = clone_type(resultType), at = at)
+    preludeStmts |> push <| qmacro_expr() {
+        var $i(bufName) : array<$t(resultType)>
+    }
+    let needBind = selectLam != null || whereLam != null
+    if (needBind) {
+        let joinResultType = strip_const_ref(clone_type(resultLam._type.firstType))
+        var pushExpr : Expression?
+        if (selectLam != null) {
+            var projBody = peel_lambda_replace_var(selectLam, qmacro($i(resBindName)))
+            pushExpr = qmacro($i(bufName) |> push_clone($e(projBody)))
+        } else {
+            pushExpr = qmacro($i(bufName) |> push_clone($i(resBindName)))
+        }
+        if (whereLam != null) {
+            var wherePred = peel_lambda_replace_var(whereLam, qmacro($i(resBindName)))
+            pairStmts |> push_from <| qmacro_block_to_array() {
+                let $i(resBindName) : $t(joinResultType) = $e(resultBody)
+                if ($e(wherePred)) {
+                    $e(pushExpr)
+                }
+            }
+        } else {
+            pairStmts |> push_from <| qmacro_block_to_array() {
+                let $i(resBindName) : $t(joinResultType) = $e(resultBody)
+                $e(pushExpr)
+            }
+        }
+    } else {
+        pairStmts |> push <| qmacro($i(bufName) |> push_clone($e(resultBody)))
+    }
+    returnStmt = qmacro_expr() {
+        return <- $i(bufName)
+    }
+    return <- new JoinPairCore(preludeStmts <- preludeStmts, pairStmts <- pairStmts,
+                               returnStmt = returnStmt, invokeRetType = invokeRetType, cntName = "")
+}
+
 [macro_function]
 def build_join_standalone_pieces(
                                  var joinCall   : ExprCall?;
@@ -5558,46 +5674,34 @@ def build_join_standalone_pieces(
                                  ) : JoinStandalonePieces? {
     let bElemName   = qn(namePrefix + "_b", at)
     let arrName     = qn(namePrefix + "_arr", at)
-    let cntName     = qn(namePrefix + "_cnt", at)
     let bufName     = qn(namePrefix + "_buf", at)
-    let resBindName = qn(namePrefix + "_res", at)
     let emptyArrName = qn(namePrefix + "_empty", at)   // group-join: empty bucket for an unmatched left row
     var preludeStmts : array<Expression?>
     var probeInnerStmts : array<Expression?>
     var returnStmt : Expression?
     var invokeRetType : TypeDeclPtr
     var groupEmptyExpr : Expression?   // group-join only: the `buf |> push_clone(result(a, <empty>))` for the outer branch
-    if (countOnly) {
-        invokeRetType = new TypeDecl(baseType = Type.tInt, at = at)
-        preludeStmts |> push <| qmacro_expr() {
-            var $i(cntName) : int = 0
-        }
-        if (whereLam == null) {
+    if (!isGroupJoin) {
+        var core = build_join_pair_core(joinCall, whereLam, selectLam, countOnly, tupAName, bElemName, namePrefix, at)
+        if (core == null) return null
+        preludeStmts <- core.preludeStmts
+        returnStmt = core.returnStmt
+        invokeRetType = core.invokeRetType
+        if (countOnly && whereLam == null) {
             // Fast path (PR D2-A guarantee): bucket-length sum at bucket granularity, never enters per-pair loop.
+            let cntName = core.cntName
             probeInnerStmts |> push <| qmacro_expr() {
                 $i(cntName) += length($i(arrName))
             }
         } else {
-            // HAVING-shape: bind result, evaluate predicate, conditional incr.
-            var resultLam = joinCall.arguments[4]
-            if (resultLam == null || resultLam._type == null || resultLam._type.firstType == null) return null
-            var resultBody = peel_lambda_rename_2vars(resultLam, tupAName, bElemName)
-            if (resultBody == null) return null
-            let joinResultType = strip_const_ref(clone_type(resultLam._type.firstType))
-            var wherePred = peel_lambda_replace_var(whereLam, qmacro($i(resBindName)))
+            var pairStmts <- core.pairStmts
             probeInnerStmts |> push <| qmacro_expr() {
                 for ($i(bElemName) in $i(arrName)) {
-                    let $i(resBindName) : $t(joinResultType) = $e(resultBody)
-                    if ($e(wherePred)) {
-                        $i(cntName) ++
-                    }
+                    $b(pairStmts)
                 }
             }
         }
-        returnStmt = qmacro_expr() {
-            return $i(cntName)
-        }
-    } elif (isGroupJoin) {
+    } else {
         // group join: the result's 2nd param is the WHOLE bucket (the group), so we emit one buffer row per
         // left row (no per-match inner loop). emit_*_join guarantees no trailing where/select/count reaches here.
         var resultLam = joinCall.arguments[4]
@@ -5621,66 +5725,6 @@ def build_join_standalone_pieces(
         returnStmt = qmacro_expr() {
             return <- $i(bufName)
         }
-    } else {
-        var resultLam = joinCall.arguments[4]
-        if (resultLam == null || resultLam._type == null || resultLam._type.firstType == null) return null
-        var resultBody = peel_lambda_rename_2vars(resultLam, tupAName, bElemName)
-        if (resultBody == null) return null
-        // Buffer element type = after-select projection when select is trailing, else result-lam return type. Use peel_lambda_single_return universally: selCall._type.firstType may stay as unresolved typedecl(result_selector(type<TT>)) when chain doesn't end with to_array() (array-overload select returns array directly so no enclosing wrap forces resolution). Lambda-body's _type is always resolved post inner-first expansion.
-        var resultType : TypeDeclPtr
-        if (selectLam != null) {
-            var selBody = peel_lambda_single_return(selectLam)
-            if (selBody == null || selBody._type == null) return null
-            resultType = strip_const_ref(clone_type(selBody._type))
-        } else {
-            // strip_const_ref: a scalar/field result (`$(c,d)=>c.name`, `string const&`) would make the buffer
-            // `array<string const>` and push_clone fail (error[30913]). A named-tuple result is already a
-            // fresh non-const value (no-op there) — which is why only bare-scalar join projections tripped it.
-            resultType = strip_const_ref(clone_type(resultLam._type.firstType))
-        }
-        if (resultType == null) return null
-        invokeRetType = new TypeDecl(baseType = Type.tArray, firstType = clone_type(resultType), at = at)
-        preludeStmts |> push <| qmacro_expr() {
-            var $i(bufName) : array<$t(resultType)>
-        }
-        let needBind = selectLam != null || whereLam != null
-        if (needBind) {
-            let joinResultType = strip_const_ref(clone_type(resultLam._type.firstType))
-            var pushExpr : Expression?
-            if (selectLam != null) {
-                var projBody = peel_lambda_replace_var(selectLam, qmacro($i(resBindName)))
-                pushExpr = qmacro($i(bufName) |> push_clone($e(projBody)))
-            } else {
-                pushExpr = qmacro($i(bufName) |> push_clone($i(resBindName)))
-            }
-            if (whereLam != null) {
-                var wherePred = peel_lambda_replace_var(whereLam, qmacro($i(resBindName)))
-                probeInnerStmts |> push <| qmacro_expr() {
-                    for ($i(bElemName) in $i(arrName)) {
-                        let $i(resBindName) : $t(joinResultType) = $e(resultBody)
-                        if ($e(wherePred)) {
-                            $e(pushExpr)
-                        }
-                    }
-                }
-            } else {
-                probeInnerStmts |> push <| qmacro_expr() {
-                    for ($i(bElemName) in $i(arrName)) {
-                        let $i(resBindName) : $t(joinResultType) = $e(resultBody)
-                        $e(pushExpr)
-                    }
-                }
-            }
-        } else {
-            probeInnerStmts |> push <| qmacro_expr() {
-                for ($i(bElemName) in $i(arrName)) {
-                    $i(bufName) |> push_clone($e(resultBody))
-                }
-            }
-        }
-        returnStmt = qmacro_expr() {
-            return <- $i(bufName)
-        }
     }
     // Probe-bucket wrap: get(hash, keya, $(var arr) { probeInnerStmts }) — matches table.get's mutating overload.
     var probeOuter <- qmacro_expr() {
@@ -5721,6 +5765,145 @@ def build_join_standalone_pieces(
     )
 }
 
+// srcb (arguments[1] of `join(...)`) spelled as `each_kv(tab)` / `keys(tab)` — the table-probe candidate.
+// Matched by name plus a table-typed argument, like extract_table_source; values() carries no key, so it stays hashed.
+[macro_function]
+def join_srcb_table_call(var joinCall : ExprCall?) : ExprCall? {
+    if (joinCall == null || (joinCall.arguments |> length) < 2) return null
+    var srcb = joinCall.arguments[1]
+    if (srcb == null || !(srcb is ExprCall)) return null
+    var call = srcb as ExprCall
+    let name = get_call_short_name(call)
+    if ((name != "each_kv" && name != "keys")
+            || call._type == null || !call._type.isIterator || call._type.firstType == null
+            || (call.arguments |> length) != 1) {
+        return null
+    }
+    let srcT = call.arguments[0]._type
+    if (srcT == null || !srcT.isGoodTableType) return null
+    return call
+}
+
+// keyb (peeled, binder renamed to bindName) selects the table key itself: bare `kv.key` (kv lane) or the
+// bare element (keys lane). Then the join key IS the table key and a lookup replaces the bucket walk.
+[macro_function]
+def join_keyb_is_bare_key(var keybBody : Expression?; bindName : string; kvLane : bool) : bool {
+    var k = keybBody
+    if (k != null && k is ExprRef2Value) {
+        k = (k as ExprRef2Value).subexpr
+    }
+    if (!kvLane) {
+        return k != null && k is ExprVar && (k as ExprVar).name == bindName
+    }
+    if (k == null || !(k is ExprField)) return false
+    var f = k as ExprField
+    if (f.name != "key") return false
+    var base = f.value
+    if (base != null && base is ExprRef2Value) {
+        base = (base as ExprRef2Value).subexpr
+    }
+    return base != null && base is ExprVar && (base as ExprVar).name == bindName
+}
+
+// Table-srcB twin of build_join_standalone_pieces: unique table keys ⇒ bucket size ≤ 1, so there is no
+// internal hash — probeOuter looks the lead key up in the table bound at tabSrcName. Group joins never
+// come here (caller gates); the count-no-where fast path probes key_exists only.
+[macro_function]
+def build_join_probe_pieces(
+                            var joinCall   : ExprCall?;
+                            var whereLam   : Expression?;
+                            var selectLam  : Expression?;
+                            var leadWhereLam : Expression?;
+                            countOnly      : bool;
+                            var keyaBody   : Expression?;
+                            tupAName       : string;
+                            tabSrcName     : string;
+                            srcbKv         : bool;
+                            namePrefix     : string;
+                            at             : LineInfo
+                            ) : JoinStandalonePieces? {
+    let bElemName = qn(namePrefix + "_b", at)
+    let kName     = qn(namePrefix + "_k", at)
+    let pName     = qn(namePrefix + "_p", at)
+    var core = build_join_pair_core(joinCall, whereLam, selectLam, countOnly, tupAName, bElemName, namePrefix, at)
+    if (core == null) return null
+    var pairStmts <- core.pairStmts
+    var probeOuter : Expression?
+    if (countOnly && whereLam == null) {
+        // membership is enough — pairStmts is the bare `cnt++` (the bucket-length analog for a unique key)
+        probeOuter = qmacro_expr() {
+            if (key_exists($i(tabSrcName), $e(keyaBody))) {
+                $b(pairStmts)
+            }
+        }
+    } elif (srcbKv) {
+        // kv lane, pruned to what the pair statements read from b: key-only shapes stay on key_exists,
+        // value shapes bind by reference from the probed pointer (no copy), whole-pair use binds the kv
+        // tuple. The safe-index `tab?[k]` needs `unsafe` (its pointer dangles on rehash); fine here, since the generated invoke never mutates the table.
+        let vName = qn(namePrefix + "_v", at)
+        var pairBlock = qmacro_block() {
+            $b(pairStmts)
+        }
+        var (allUsed, usedFields) = collect_row_usage(pairBlock, bElemName)
+        if (allUsed) {
+            probeOuter = qmacro_block() {
+                let $i(kName) = $e(keyaBody)
+                let $i(pName) = unsafe($i(tabSrcName)?[$i(kName)])
+                if ($i(pName) != null) {
+                    let $i(bElemName) = (key = $i(kName), value = *$i(pName))
+                    $e(pairBlock)
+                }
+            }
+        } else {
+            var fieldToLocal <- { "key" => kName, "value" => vName }
+            var flatBlock = flatten_row_to_locals(pairBlock, bElemName, fieldToLocal)
+            if (usedFields |> has_value("value")) {
+                probeOuter = qmacro_block() {
+                    let $i(kName) = $e(keyaBody)
+                    let $i(pName) = unsafe($i(tabSrcName)?[$i(kName)])
+                    if ($i(pName) != null) {
+                        let $i(vName) & = unsafe(*$i(pName))
+                        $e(flatBlock)
+                    }
+                }
+            } else {
+                probeOuter = qmacro_block() {
+                    let $i(kName) = $e(keyaBody)
+                    if (key_exists($i(tabSrcName), $i(kName))) {
+                        $e(flatBlock)
+                    }
+                }
+            }
+        }
+    } else {
+        // keys lane: the element IS the key
+        probeOuter = qmacro_block() {
+            let $i(kName) = $e(keyaBody)
+            if (key_exists($i(tabSrcName), $i(kName))) {
+                let $i(bElemName) = $i(kName)
+                $b(pairStmts)
+            }
+        }
+    }
+    // Leading `where` — same fusion as the hashed builder: filter srcA inside the per-A probe.
+    if (leadWhereLam != null) {
+        var leadPred = peel_lambda_rename_var(leadWhereLam, tupAName)
+        if (leadPred == null) return null
+        var probeInner = probeOuter
+        probeOuter = qmacro_expr() {
+            if ($e(leadPred)) {
+                $e(probeInner)
+            }
+        }
+    }
+    return <- new JoinStandalonePieces(
+        preludeStmts <- core.preludeStmts,
+        probeOuter <- probeOuter,
+        returnStmt = core.returnStmt,
+        invokeRetType = core.invokeRetType
+    )
+}
+
 // ── join splice (one pattern for every source) ───────
 // `can_join()` admits the adapter; `emit_join_hook()` supplies the source-specific emit + srcb gate
 // (null → tier-2). decs / array / xml each override emit_join_hook — no parallel per-source pattern.
@@ -5750,7 +5933,9 @@ def emit_join(var c : Captures; var ctx : EmitCtx; at : LineInfo) : Expression?
 [macro_function]
 def emit_array_join(var c : Captures; var ctx : EmitCtx; at : LineInfo) : Expression? {
     // nolint:STYLE014 — emits a hashed equi-join as a 2-source invoke (mirrors Zip's 2-source wrap):
-    // build a hash from srcB, probe it with srcA. Generated code (canonical no-count/no-where/no-select shape):
+    // build a hash from srcB, probe it with srcA. The lead loop and param spelling come from the adapter
+    // (wrap_source_loop / invoke_param_type), so any direct-return loop source rides this — array `for`,
+    // table slot walk. Generated code (canonical no-count/no-where/no-select shape, array lead):
     //   invoke(
     //       (srcA, srcB) {
     //           var buf  : array<Result>
@@ -5770,6 +5955,9 @@ def emit_array_join(var c : Captures; var ctx : EmitCtx; at : LineInfo) : Expres
     //   count()    → `var cnt = 0`; probe body is `cnt += length(bucket)` (bucket-length fast path, no per-pair loop); `return cnt`
     //   where(p)   → per pair: `let res = result(a, b); if (p(res)) { ...push/incr... }`
     //   select(f)  → push `f(res)` instead of `res` (buffer element type = f's return type)
+    // Table-srcB probe: when srcB is `each_kv(tab)` / `keys(tab)` joined on its bare key (inner join only),
+    // there is no hash/build loop — srcB binds the table itself and the per-A probe is a key lookup
+    // (build_join_probe_pieces). Unique table keys make probe ≡ hash semantics exactly.
     var joinCall = c.single["join"]
     if (!srcb_is_array_shaped(c, "join")) return null   // srcb must be array/iterator-shaped (formerly the array_join_srcb_is_array requires gate, moved here so the unified pattern asks the adapter)
     let isGroupJoin = c.single_name |> key_exists("join") && c.single_name["join"] == "group_join"
@@ -5801,41 +5989,55 @@ def emit_array_join(var c : Captures; var ctx : EmitCtx; at : LineInfo) : Expres
     if (keyaLam == null || keybLam == null || keyaLam._type == null) return null
     let keyType = strip_const_ref(clone_type(keyaLam._type.firstType))
     if (!is_primitive_join_key_type(keyType)) return null
-    let srcAName  = qn("ajoin_srcA", at)
+    let srcAName  = ctx.src->arraySrcName()
     let srcBName  = qn("ajoin_srcB", at)
-    let tupAName  = qn("ajoin_tup_a", at)
+    let tupAName  = ctx.src->bind_name(at)
     let tupBName  = qn("ajoin_tup_b", at)
     let hashName  = qn("ajoin_hash", at)
     var keyaBody = peel_lambda_rename_var(keyaLam, tupAName)
     var keybBody = peel_lambda_rename_var(keybLam, tupBName)
     if (keyaBody == null || keybBody == null) return null
     let tupBType = strip_const_ref(clone_type(srcbSrc._type.firstType))
-    var pieces = build_join_standalone_pieces(joinCall, whereLam, selectLam, leadWhereLam, countOnly, keyaBody, tupAName, hashName, tupBType, isGroupJoin, "ajoin", at)
+    var srcbTab = join_srcb_table_call(joinCall)
+    let srcbKv = srcbTab != null && get_call_short_name(srcbTab) == "each_kv"
+    let probeMode = srcbTab != null && !isGroupJoin && join_keyb_is_bare_key(keybBody, tupBName, srcbKv)
+    var pieces : JoinStandalonePieces?
+    if (probeMode) {
+        pieces = build_join_probe_pieces(joinCall, whereLam, selectLam, leadWhereLam, countOnly, keyaBody, tupAName, srcBName, srcbKv, "ajoin", at)
+    } else {
+        pieces = build_join_standalone_pieces(joinCall, whereLam, selectLam, leadWhereLam, countOnly, keyaBody, tupAName, hashName, tupBType, isGroupJoin, "ajoin", at)
+    }
     if (pieces == null) return null
     var allStmts : array<Expression?>
     allStmts |> push_from(pieces.preludeStmts)
-    allStmts |> push <| qmacro_expr() {
-        var $i(hashName) : table<$t(keyType); array<$t(tupBType)>>
-    }
-    allStmts |> push <| qmacro_expr() {
-        for ($i(tupBName) in $i(srcBName)) {
-            // nolint:PERF006 per-key bucket size unknown ahead of time
-            $i(hashName)[$e(keybBody)] |> push_clone($i(tupBName))
+    if (!probeMode) {
+        allStmts |> push <| qmacro_expr() {
+            var $i(hashName) : table<$t(keyType); array<$t(tupBType)>>
         }
-    }
-    allStmts |> push <| qmacro_expr() {
-        for ($i(tupAName) in $i(srcAName)) {
-            $e(pieces.probeOuter)
+        allStmts |> push <| qmacro_expr() {
+            for ($i(tupBName) in $i(srcBName)) {
+                // nolint:PERF006 per-key bucket size unknown ahead of time
+                $i(hashName)[$e(keybBody)] |> push_clone($i(tupBName))
+            }
         }
     }
+    allStmts |> push <| ctx.src->wrap_source_loop(LoopDispatch(Each = null), pieces.probeOuter, at)
     allStmts |> push(pieces.returnStmt)
     var invokeRetType = pieces.invokeRetType
     var topClone = clone_expression(topSrc)
     topClone.genFlags.alwaysSafe = true
-    var srcbClone = clone_expression(srcbSrc)
+    var srcbArg = probeMode ? srcbTab.arguments[0] : srcbSrc
+    var srcbClone = clone_expression(srcbArg)
     srcbClone.genFlags.alwaysSafe = true
-    let srcAParamType = invoke_src_param_type(topSrc)
-    let srcBParamType = invoke_src_param_type(srcbSrc)
+    let srcAParamType = ctx.src->invoke_param_type()
+    var srcBParamType : TypeDeclPtr
+    if (probeMode) {
+        // bind the user's table itself, const-accepting (a non-const source adds-const cleanly)
+        srcBParamType = strip_const_ref(clone_type(srcbArg._type))
+        srcBParamType.flags.constant = true
+    } else {
+        srcBParamType = invoke_src_param_type(srcbSrc)
+    }
     var emission = qmacro(invoke(
         $($i(srcAName) : $t(srcAParamType), $i(srcBName) : $t(srcBParamType)) : $t(invokeRetType) {
             $b(allStmts)
diff --git a/daslib/linq_fold_table.das b/daslib/linq_fold_table.das
index c9fc01181..89b20b630 100644
--- a/daslib/linq_fold_table.das
+++ b/daslib/linq_fold_table.das
@@ -122,6 +122,12 @@ class TableAdapter : SourceAdapter {
     def override can_reserve_by_length() : bool {
         return true   // length(tab) is O(1); the shared reserve hint reads arrayTop/arraySrcName
     }
+    def override const can_join() : bool {
+        return true   // rides emit_array_join: direct-return lead loop via wrap_source_loop
+    }
+    def override emit_join_hook(var c : Captures; var ctx : EmitCtx; at : LineInfo) : Expression? {
+        return emit_array_join(c, ctx, at)
+    }
     def override arrayTop() : Expression? {
         // Feeds the reserve hint (type_has_length covers tables). The backward-index reverse lanes that
         // also read arrayTop gate on array_source, which is false here — matchTop stays iterator-typed.
@@ -146,10 +152,14 @@ class TableAdapter : SourceAdapter {
         var breakGuard = qmacro($i(takenName) >= $i(takeLimName))
         return build_table_walk(lane, srcName, bindName, perElement, breakGuard, at)
     }
-    def override wrap_invoke(var stmts : array<Expression?>; retType : TypeDeclPtr; wrapIter : bool; at : LineInfo) : Expression? {
+    def override invoke_param_type() : TypeDeclPtr {
         // Const-accepting param: the source table is often a `let`, and a non-const source adds-const cleanly.
         var tabType = strip_const_ref(clone_type(tabExpr._type))
         tabType.flags.constant = true
+        return tabType
+    }
+    def override wrap_invoke(var stmts : array<Expression?>; retType : TypeDeclPtr; wrapIter : bool; at : LineInfo) : Expression? {
+        var tabType = invoke_param_type()
         var tabClone = clone_expression(tabExpr)
         tabClone.genFlags.alwaysSafe = true
         let sn = srcName
diff --git a/doc/source/reference/linq_das.rst b/doc/source/reference/linq_das.rst
index 6d699c139..124867e38 100644
--- a/doc/source/reference/linq_das.rst
+++ b/doc/source/reference/linq_das.rst
@@ -361,6 +361,11 @@ source is built exactly like the first (untyped → array/table, typed → the
 ``from_in`` dispatch), so it may be a different kind of source than the left —
 a table works on either side (its kv pair is that side's row, e.g.
 ``on c.brand equals p.key``); note a table left source walks in slot order.
+A right-side table joined on its **bare key** — ``equals p.key``, or the bare
+element for a ``table<K>`` set — fuses as a per-row key probe of that table
+(no internal join hash gets built); any other right-key expression keeps the
+ordinary hashed join. Either way the results are identical — table keys are
+unique, so the probed "bucket" is the same 0-or-1 rows the hash would hold.
 
 The reader picks one of two emit shapes from the **post-join** clauses (it
 transpiles before type inference and cannot see the source, so it decides
diff --git a/doc/source/reference/linq_fold_patterns.rst b/doc/source/reference/linq_fold_patterns.rst
index 3ced024a9..c7ef01e48 100644
--- a/doc/source/reference/linq_fold_patterns.rst
+++ b/doc/source/reference/linq_fold_patterns.rst
@@ -150,7 +150,7 @@ Source-side entry points
      - Optional source — only when the ``pugixml`` module is linked (``require ?pugixml`` + ``static_if (typeinfo builtin_module_exists(pugixml))``). Emits an inlined DOM child-element walk replacing the generator, and **field-prunes** the per-element materialization (pass 2b): the chain body is scanned for the ``Row`` fields it reads, and only those attributes are read via ``read_xml_field`` into scalar locals — unread fields (notably ``string`` fields, whose ``clone_string`` is the alloc cost) are never touched, so a float-only chain runs alloc-free and JIT beats the equivalent SQLite query. A whole-row escape (``to_array`` / identity ``_select(_)`` / pass-to-fn) routes to the full ``build_xml_row`` instead. The ``XmlAdapter`` **rides every pattern row** (``try_splice_patterns`` runs with no ``onlyRow`` restriction); per-row ``requires`` predicates and the adapter's capability hooks (``can_join`` / ``can_group_by`` / ``defers_materialization`` / the ``non_array_source`` gate) decide what fuses, and a shape it can't fuse cascades to tier-2 — see :ref:`linq_fold_xml_patterns` for the full fuse/defer breakdown. ``unsafe`` is required (the source is ``[unsafe_outside_of_for]``) and the node is passed by value (``var root`` — ``_fold``'s macro-arg inference skips the const&→value copy).
    * - ``unsafe(each_kv(tab))`` / ``keys(tab)`` / ``values(tab)``
      - ``extract_table_source`` (``TableAdapter``, ``daslib/linq_fold_table.das``)
-     - In-tree source — recognized by name **plus** a table-typed argument (``table<K;V>`` / ``table<K>``), so an unrelated user ``keys`` never fires it. The kv lane (``each_kv``) binds ``kv.key`` / ``kv.value`` and **usage-prunes the walk**: a chain touching only ``.value`` walks ``values(tab)`` alone, only ``.key`` (or neither) walks ``keys(tab)`` alone — half the slot-skip work of the zipped two-iterator form, which is emitted only when both sides (or the whole pair) are read. A whole-pair escape binds a named-tuple copy, so the kv lane fuses **copyable value types only** — a non-copyable-valued ``each_kv`` falls through and the surviving instantiation concept-asserts (error 31400; the keys/values lanes still fuse such tables). Bare ``count()`` / ``long_count()`` folds to O(1) ``length(tab)``; a plain ``distinct`` over raw keys/kv elements is **dropped** before matching (keys are unique by construction; only uniqueness-preserving prefix ops allow the drop — a preceding ``select`` keeps the distinct, and the values lane always keeps it). **Point-lookup folds** (``try_table_point_lookup``): a key-equality ``where`` (``kv.key == X``, bare ``k == X`` on the keys lane, either operand order; predicate-form ``any(p)`` / ``count(p)`` too) against a loop-invariant, side-effect-free ``X`` folds the whole walk to an O(1) probe — ``any`` / keys-lane ``contains(X)`` → ``key_exists(tab, X)``, ``count`` → ``key_exists ? 1 : 0``, ``first`` / ``first_or_default`` (± one trailing ``select``) → a ``tab?[X]`` probe with the scan's exact semantics (panic on a missing ``first``, eagerly-bound default value otherwise). Anything else — compound ``&&`` predicates, other comparison operators, an ``X`` that reads the binder or has side effects (the scan evaluates ``X`` per element, the probe once) — keeps the scan. ``order_by`` / ``take`` / ``first`` observe the table's unspecified slot order, exactly like a hand ``for (k, v in keys(t), values(t))`` loop. 
``can_join`` / ``can_group_by`` are off and reverse has no backward slot walk — those shapes cascade to tier-2 (the join probe is staged: see ``benchmarks/sql/LINQ_TO_TABLE.md``). ``unsafe`` is required at an unfused chain head (the sources are ``[unsafe_outside_of_for]``); fused chains rewrite the head before inference.
+     - In-tree source — recognized by name **plus** a table-typed argument (``table<K;V>`` / ``table<K>``), so an unrelated user ``keys`` never fires it. The kv lane (``each_kv``) binds ``kv.key`` / ``kv.value`` and **usage-prunes the walk**: a chain touching only ``.value`` walks ``values(tab)`` alone, only ``.key`` (or neither) walks ``keys(tab)`` alone — half the slot-skip work of the zipped two-iterator form, which is emitted only when both sides (or the whole pair) are read. A whole-pair escape binds a named-tuple copy, so the kv lane fuses **copyable value types only** — a non-copyable-valued ``each_kv`` falls through and the surviving instantiation concept-asserts (error 31400; the keys/values lanes still fuse such tables). Bare ``count()`` / ``long_count()`` folds to O(1) ``length(tab)``; a plain ``distinct`` over raw keys/kv elements is **dropped** before matching (keys are unique by construction; only uniqueness-preserving prefix ops allow the drop — a preceding ``select`` keeps the distinct, and the values lane always keeps it). **Point-lookup folds** (``try_table_point_lookup``): a key-equality ``where`` (``kv.key == X``, bare ``k == X`` on the keys lane, either operand order; predicate-form ``any(p)`` / ``count(p)`` too) against a loop-invariant, side-effect-free ``X`` folds the whole walk to an O(1) probe — ``any`` / keys-lane ``contains(X)`` → ``key_exists(tab, X)``, ``count`` → ``key_exists ? 1 : 0``, ``first`` / ``first_or_default`` (± one trailing ``select``) → a ``tab?[X]`` probe with the scan's exact semantics (panic on a missing ``first``, eagerly-bound default value otherwise). Anything else — compound ``&&`` predicates, other comparison operators, an ``X`` that reads the binder or has side effects (the scan evaluates ``X`` per element, the probe once) — keeps the scan. ``order_by`` / ``take`` / ``first`` observe the table's unspecified slot order, exactly like a hand ``for (k, v in keys(t), values(t))`` loop. 
**Joins fuse on either side** (``can_join`` is on; the adapter rides the shared ``emit_array_join`` through its own ``wrap_source_loop``): a table *lead* walks its pruned slot iterator(s) as the probe loop; a table in the *srcB slot* joined on its bare key — ``d.key`` on the kv lane, the bare element on a ``keys(set)`` source — skips the join's internal ``table<KEY; array<TUPB>>`` entirely and probes the user's table per lead row (``join_keyb_is_bare_key`` + ``build_join_probe_pieces``; unique table keys make the probe ≡ hash semantics exactly). The probe is itself usage-pruned: count-no-where and key-only shapes stay on ``key_exists``, value shapes bind the matched value **by reference** from a ``tab?[k]`` pointer (no copy), and only a whole-pair use binds the kv tuple. A non-bare b-key keeps the hashed build over the kv iterator; ``group_join`` (outer — its result consumes the whole bucket) always keeps it. ``can_group_by`` is off and reverse has no backward slot walk — those shapes cascade to tier-2 (see ``benchmarks/sql/LINQ_TO_TABLE.md``). ``unsafe`` is required at an unfused chain head (the sources are ``[unsafe_outside_of_for]``); fused chains rewrite the head before inference.
    * - ``unsafe(from_json(jv, type<Row>))``
      - ``extract_json_source`` (``JsonAdapter``, ``daslib/linq_fold_json.das``)
      - In-tree source — the adapter is compiled in unconditionally (no ``static_if`` gate, unlike XML's pugixml one), but a program only pulls JSON into scope by requiring ``json`` / ``json_boost`` itself. ``extract_json_source`` matches a ``from_json`` whose first argument is a ``json::JsonValue?``, so a JSON-less program returns null and the chain falls to the array tier. The adapter pulls in **no** json dependency — it emits ``from_json`` / ``read_json_field`` by name (resolved at the user's splice site, like ``linq_fold_decs`` emits ``for_each_archetype``; ``from_JV`` is emitted only for a non-struct element type). Emits an inlined ``for (e in jv.value as _array)`` walk replacing the generator, and **field-prunes** the per-element materialization (pass 2b): only the keys the chain reads are pulled via ``read_json_field`` by name — unread keys (notably ``string`` fields whose materialization clones) are never touched, so a scalar-only chain skips ~all of the full per-row build (3.6× over the full materialize — see ``benchmarks/micro/json_source_shapes.das``). A whole-row escape reads **every** top-level field by name (``emit_full_row_by_name``), so a custom whole-row ``from_JV(Row)`` override is **not** honored (Option B — this is a flat query source, not a deserializer; materialize the array with an explicit ``from_JV`` first for that). ``unsafe`` is required (the source is ``[unsafe_outside_of_for]``). Deferred materialization mirrors XML: order/distinct/take buffer a cheap ``(orderKey, JsonValue?)`` surrogate and materialize only the K survivors — by name (``emit_full_row_by_name``), so a struct survivor reads each field by key; only a non-struct ``Row`` falls back to ``outBind <- from_JV(handle, type<Row>)``. 
The ``JsonAdapter`` also fuses ``join`` / ``join |> group_by`` (``emit_join_hook`` + ``JsonJoinAdapter`` off ``build_group_by_adapter``'s upstream-join arm), reusing the array-join machinery (``build_join_standalone_pieces`` / ``build_join_adapter_pieces``): srcB is collected into a ``table<KEY; array<TUPB>>`` and the field-pruned array walk is the probe side, so the join key reads only its own field per element (e.g. ``read_json_field(jcur, "brand", …)``). Standalone ``group_join`` and a trailing ``where`` / ``select`` / ``count`` over group-join rows defer to tier-2, mirroring XML.
@@ -423,12 +423,16 @@ Array-array equi-join
 ``emit_array_join`` is the array-source mirror of ``emit_decs_join`` —
 hashed equi-join over two array / iterator sources. Algorithm is
 identical (collect srcb into ``table<KEY; array<TUPB>>`` in one pass,
-then walk srca and probe via ``table.get``) but the per-source
-iteration is a plain ``for (elem in src) { ... }`` loop instead of
-``for_each_archetype + build_decs_inner_for``. Both sources bind as
-invoke parameters (2-source wrap, mirrors ``Zip``). Same primitive
-equi-key gate as the decs side; non-primitive keys cascade to
-``join_impl_const``.
+then walk srca and probe via ``table.get``) but the lead iteration
+comes from the adapter (``wrap_source_loop`` / ``bind_name`` /
+``invoke_param_type``), so any direct-return loop source rides it —
+``ArrayAdapter`` frames a plain ``for (elem in src)``, ``TableAdapter``
+its pruned slot walk (vs ``for_each_archetype + build_decs_inner_for``
+on the decs side). Both sources bind as invoke parameters (2-source
+wrap, mirrors ``Zip``). Same primitive equi-key gate as the decs side;
+non-primitive keys cascade to ``join_impl_const``. When srcB is a
+table walked on its bare key, the internal hash is skipped entirely —
+see the table-source row above and the probe row below.
 
 .. list-table::
    :header-rows: 1
@@ -468,6 +472,28 @@ equi-key gate as the decs side; non-primitive keys cascade to
        ``join`` is the separate trailing slot. Composes with the trailing
        ``_where`` / ``_select`` forms. Wrapping lives in the shared
        ``build_join_standalone_pieces``, so decs / XML / JSON inherit it.
+   * - ``arrA |> _join(unsafe(each_kv(tab)), <on a == d.key>, ...)`` (or ``keys(set)`` with a bare-element key; any terminator/where/select form above)
+     - probe mode (``join_srcb_table_call`` + ``join_keyb_is_bare_key`` → ``build_join_probe_pieces``)
+     - **Table-srcB probe**: the b-key selector IS the table key, so no
+       hash and no build loop — srcB binds the user's table itself
+       (const param) and the per-A probe is a key lookup. Unique table
+       keys ⇒ bucket ≤ 1 ⇒ probe ≡ hash semantics exactly (b-key is a
+       bare field read, so skipping its per-B evaluation is
+       unobservable). Usage-pruned like the point-lookup fold:
+       count-no-where / key-only shapes probe ``key_exists`` (value
+       never touched), value shapes bind by reference from
+       ``tab?[k]``, a whole-pair use binds the kv tuple. Non-bare
+       b-keys and ``group_join`` keep the hashed build over the kv
+       iterator. Composes with every lead the emitter serves (array
+       lead, table lead — table×table probes both sides).
+   * - ``unsafe(each_kv(tabA)) |> _join(srcB, on, into) |> ...`` (table lead; ``keys`` / ``values`` lanes too)
+     - pattern ``join_general`` → ``TableAdapter.emit_join_hook`` → ``emit_array_join``
+     - **Table lead**: same emitter, lead loop framed by
+       ``TableAdapter.wrap_source_loop`` — the kv usage-pruner sees the
+       whole probe body (key lambda + result + trailing where/select),
+       so a join touching only ``c.value.*`` walks ``values(tab)``
+       alone. All srcB modes compose (hashed array/iterator srcB,
+       table-srcB probe); ``group_join`` stays outer over every slot.
    * - ``arrA |> _group_join(arrB, on, into)`` (+ optional leading ``_where``)
      - pattern ``join_general`` with the ``group_join`` literal (``isGroupJoin``)
      - C# GroupJoin (**outer**): one result row per srcA row — ``result(a,
@@ -477,10 +503,11 @@ equi-key gate as the decs side; non-primitive keys cascade to
        "group_join"]``; ``isGroupJoin`` threads through
        ``build_join_standalone_pieces``, which rebinds the result lambda's 2nd
        param to the whole bucket (``array<TUPB>``) so the per-group aggregate
-       runs inside the result. **Array sources only** — decs / XML / JSON group joins
-       defer to tier-2 (their ``emit_join_hook`` returns ``null`` for
+       runs inside the result. **Array / table leads only** — decs / XML / JSON
+       group joins defer to tier-2 (their ``emit_join_hook`` returns ``null`` for
        ``group_join``); a trailing ``where`` / ``select`` / ``count`` over the
-       group rows also defers.
+       group rows also defers, and a table srcB keeps the hashed build (the
+       probe never serves group joins).
    * - ``arrA |> _join(arrB, ...) |> _group_by(K) |> _select(reduce) |> count() / to_array()``
      - ``plan_group_by_core`` via ``SourceAdapter.ArrayJoin`` (chunk N+2)
      - Cross-arm composition. ``emit_group_by``'s Array branch
diff --git a/tests/linq/test_linq_das.das b/tests/linq/test_linq_das.das
index bdfce8c2b..7c37fcd07 100644
--- a/tests/linq/test_linq_das.das
+++ b/tests/linq/test_linq_das.das
@@ -286,7 +286,8 @@ def test_table_arbitrary_range_var_name(t : T?) {
 
 [test]
 def test_table_as_join_right_source(t : T?) {
-    // a table works on either side of a join (tier-2; the kv pair is that side's row).
+    // a table works on either side of a join (the kv pair is that side's row). A right-side table
+    // joined on its bare `.key` fuses as a key probe — no internal join hash.
     // left side is the array → result follows array order, deterministic
     let cars <- mk_cars()
     let prio <- { "eco" => 10, "lux" => 99 }
@@ -301,7 +302,7 @@ def test_table_as_join_right_source(t : T?) {
 def test_table_as_join_left_source(t : T?) {
     let cars <- mk_cars()
     let prio <- { "eco" => 10, "lux" => 99 }
-    // left side is the table → slot order, so sort before asserting
+    // left side is the table → fused slot walk; slot order, so sort before asserting
     var rows <- %linq! from p in prio join c in cars on p.key equals c.brand select "{c.name}={p.value}" %%
     rows |> sort()
     t |> equal(length(rows), 3)
@@ -310,6 +311,29 @@ def test_table_as_join_left_source(t : T?) {
     t |> equal(rows[2], "mid=10")
 }
 
+[test]
+def test_set_as_join_source(t : T?) {
+    // a table<K> set joins on its bare element — membership probe
+    let cars <- mk_cars()
+    let vip : table<string> <- { "lux" }
+    let names <- %linq! from c in cars join b in vip on c.brand equals b select c.name %%
+    t |> equal(length(names), 1)
+    t |> equal(names[0], "lux")
+}
+
+[test]
+def test_table_join_into_group(t : T?) {
+    // `into` (group join) over a table right source stays OUTER — the probe never fires (its result
+    // consumes the whole bucket), the hashed walk does
+    let cars <- mk_cars()
+    let prio <- { "eco" => 10 }
+    let rows <- %linq! from c in cars join p in prio on c.brand equals p.key into ps select (Name = c.name, N = length(ps)) %%
+    t |> equal(length(rows), 3)
+    t |> equal(rows[0].N, 1)
+    t |> equal(rows[1].N, 1)
+    t |> equal(rows[2].N, 0)
+}
+
 // ===== orderby (single key, optional `descending`) =====
 
 [test]
diff --git a/tests/linq/test_linq_table_source.das b/tests/linq/test_linq_table_source.das
index 630960d18..ed0330600 100644
--- a/tests/linq/test_linq_table_source.das
+++ b/tests/linq/test_linq_table_source.das
@@ -278,6 +278,196 @@ def test_table_point_lookup(t : T?) {
     }
 }
 
+typedef IKV = tuple<key : int; value : int>
+
+// Joins: a table in the srcB slot joined on its bare key (`d.key` / bare set element) probes the user's
+// table instead of building the join's internal hash; a table lead rides the same emitter through the
+// pruned slot walk. Either way must agree with the hash/hand-loop semantics: inner joins drop misses,
+// keep lead duplicates; group joins stay outer.
+
+[test]
+def test_table_join_srcb_probe(t : T?) {
+    t |> run("inner join on the table key: count + rows in lead order") @(t : T?) {
+        var ids <- [0, 1, 1, 4, 9, 2, 0, 7]   // dups stay, 9/7 miss
+        var dtab <- make_int_table(5)
+        let n = _fold(ids |> _join(unsafe(each_kv(dtab)),
+                                   $(a : int, d : IKV) => a == d.key,
+                                   $(a : int, d : IKV) => d.value)
+                          |> count())
+        t |> equal(n, 6)
+        var got <- _fold(ids |> _join(unsafe(each_kv(dtab)),
+                                      $(a : int, d : IKV) => a == d.key,
+                                      $(a : int, d : IKV) => (A = a, V = d.value))
+                             |> to_array())
+        var expected <- [(A = 0, V = 0), (A = 1, V = 10), (A = 1, V = 10), (A = 4, V = 40), (A = 2, V = 20), (A = 0, V = 0)]
+        t |> equal(length(got), length(expected))
+        for (i in range(length(expected))) {
+            t |> equal(got[i].A, expected[i].A)
+            t |> equal(got[i].V, expected[i].V)
+        }
+        delete got
+        delete expected
+        delete dtab
+        delete ids
+    }
+    t |> run("trailing where + select on the probed pair") @(t : T?) {
+        var ids <- [0, 1, 1, 4, 9, 2, 0, 7]
+        var dtab <- make_int_table(5)
+        var got <- _fold(ids |> _join(unsafe(each_kv(dtab)),
+                                      $(a : int, d : IKV) => a == d.key,
+                                      $(a : int, d : IKV) => (A = a, V = d.value))
+                             |> _where(_.V > 5)
+                             |> _select("{_.A}:{_.V}")
+                             |> to_array())
+        t |> equal(length(got), 4)
+        t |> equal(got[0], "1:10")
+        t |> equal(got[1], "1:10")
+        t |> equal(got[2], "4:40")
+        t |> equal(got[3], "2:20")
+        delete got
+        delete dtab
+        delete ids
+    }
+    t |> run("empty table srcB matches nothing") @(t : T?) {
+        var ids <- [0, 1, 2]
+        let e : table<int; int>
+        let n = _fold(ids |> _join(unsafe(each_kv(e)),
+                                   $(a : int, d : IKV) => a == d.key,
+                                   $(a : int, d : IKV) => d.value)
+                          |> count())
+        t |> equal(n, 0)
+        delete ids
+    }
+    t |> run("set srcB joins on membership") @(t : T?) {
+        var ids <- [0, 1, 1, 4, 9]
+        var s : table<int> <- { 1, 4 }
+        let n = _fold(ids |> _join(unsafe(keys(s)),
+                                   $(a : int, k : int) => a == k,
+                                   $(a : int, k : int) => a)
+                          |> count())
+        t |> equal(n, 3)
+        delete s
+        delete ids
+    }
+    t |> run("non-bare b key stays hashed and agrees") @(t : T?) {
+        var ids <- [0, 1, 1, 4, 9, 2, 0, 7]
+        var dtab <- make_int_table(5)
+        // value/10 == key for this fixture, so the hashed walk must find the same 6 pairs
+        let n = _fold(ids |> _join(unsafe(each_kv(dtab)),
+                                   $(a : int, d : IKV) => a == d.value / 10,
+                                   $(a : int, d : IKV) => d.value)
+                          |> count())
+        t |> equal(n, 6)
+        delete dtab
+        delete ids
+    }
+    t |> run("group join over a table srcB stays outer") @(t : T?) {
+        var ids <- [0, 9, 4]
+        var dtab <- make_int_table(5)
+        var got <- _fold(ids |> _group_join(unsafe(each_kv(dtab)),
+                                            $(a : int, d : IKV) => a == d.key,
+                                            $(a : int, ds : array<IKV>) => (A = a, N = length(ds)))
+                             |> to_array())
+        t |> equal(length(got), 3)
+        t |> equal(got[0].N, 1)
+        t |> equal(got[1].N, 0)
+        t |> equal(got[2].N, 1)
+        delete got
+        delete dtab
+        delete ids
+    }
+    t |> run("table lead joining a table srcB probes both sides") @(t : T?) {
+        var ctab <- make_int_table(8)
+        var dtab <- make_int_table(5)
+        let n = _fold(each_kv(ctab) |> _join(unsafe(each_kv(dtab)),
+                                             $(c : IKV, d : IKV) => c.key == d.key,
+                                             $(c : IKV, d : IKV) => c.value + d.value)
+                                    |> count())
+        t |> equal(n, 5)
+        delete dtab
+        delete ctab
+    }
+}
+
+[test]
+def test_table_join_lead(t : T?) {
+    t |> run("table lead, array srcB: count keeps bucket multiplicity") @(t : T?) {
+        var ctab <- make_int_table(10)
+        var darr <- [0, 2, 2, 4, 11]
+        let n = _fold(each_kv(ctab) |> _join(darr,
+                                             $(c : IKV, d : int) => c.key == d,
+                                             $(c : IKV, d : int) => (K = c.key, V = c.value))
+                                    |> count())
+        t |> equal(n, 4)
+        delete darr
+        delete ctab
+    }
+    t |> run("table lead to_array agrees with a hand loop in slot order") @(t : T?) {
+        var ctab <- make_int_table(10)
+        var darr <- [0, 2, 2, 4, 11]
+        var expected : array<int>
+        for (k, v in keys(ctab), values(ctab)) {
+            for (d in darr) {
+                if (k == d) {
+                    expected |> push(v + d)
+                }
+            }
+        }
+        var got <- _fold(each_kv(ctab) |> _join(darr,
+                                                $(c : IKV, d : int) => c.key == d,
+                                                $(c : IKV, d : int) => c.value + d)
+                                       |> to_array())
+        t |> equal(length(got), length(expected))
+        for (i in range(length(expected))) {
+            t |> equal(got[i], expected[i])
+        }
+        delete got
+        delete expected
+        delete darr
+        delete ctab
+    }
+    t |> run("lead where filters before the join") @(t : T?) {
+        var ctab <- make_int_table(10)
+        var darr <- [0, 2, 2, 4, 11]
+        let n = _fold(each_kv(ctab) |> _where(_.key > 1)
+                                    |> _join(darr,
+                                             $(c : IKV, d : int) => c.key == d,
+                                             $(c : IKV, d : int) => c.value)
+                                    |> count())
+        t |> equal(n, 3)
+        delete darr
+        delete ctab
+    }
+    t |> run("values lane lead") @(t : T?) {
+        var ctab <- make_int_table(10)
+        var darr <- [0, 2, 2, 4, 11]
+        let n = _fold(values(ctab) |> _join(darr,
+                                            $(v : int, d : int) => v == d * 10,
+                                            $(v : int, d : int) => v)
+                                   |> count())
+        t |> equal(n, 4)
+        delete darr
+        delete ctab
+    }
+    t |> run("table lead group_join stays outer over every slot") @(t : T?) {
+        var ctab <- make_int_table(10)
+        var darr <- [0, 2, 2, 4, 11]
+        var got <- _fold(each_kv(ctab) |> _group_join(darr,
+                                                      $(c : IKV, d : int) => c.key == d,
+                                                      $(c : IKV, ds : array<int>) => length(ds))
+                                       |> to_array())
+        t |> equal(length(got), 10)
+        var total = 0
+        for (n in got) {
+            total += n
+        }
+        t |> equal(total, 4)
+        delete got
+        delete darr
+        delete ctab
+    }
+}
+
 // Tier-2 over the raw each_kv iterator (no _fold) — the [unsafe_outside_of_for] contract requires the
 // explicit unsafe(...) wrap at a bare chain head; fused chains rewrite the head before inference.
 

From 0b90ee59b8c3f5f0db93e8531f327497f8bbd766 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Thu, 11 Jun 2026 03:05:08 -0700
Subject: [PATCH 08/11] linq table arc: record fixed-array-rework merge +
 each_kv re-validation in the plan doc

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 benchmarks/sql/LINQ_TO_TABLE.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/benchmarks/sql/LINQ_TO_TABLE.md b/benchmarks/sql/LINQ_TO_TABLE.md
index 6cd55c533..e3fb497cc 100644
--- a/benchmarks/sql/LINQ_TO_TABLE.md
+++ b/benchmarks/sql/LINQ_TO_TABLE.md
@@ -97,6 +97,16 @@ work. Cut the PR only after the rework has landed and been merged in here. At th
 re-validate the `each_kv` dim-array-value reject overload and `auto(valT)[]` matching — fixed
 arrays are exactly what is being reworked.
 
+**Merge done (2026-06-11, after stage 5):** rework (#3095) merged in; one conflict in
+`daslib/builtin.das` — master deleted the dim-array `values()` overloads (plain `auto(valT)`
+now binds the whole `T[N]`), our `each_kv` block kept. Re-validation: `auto(valT)[]` in table
+value position still matches dim-valued tables (the reject overload fires its 31400), and the
+plain `each_kv` generic still does NOT match `table<K; V[N]>` (table-position generic matching
+doesn't bind fixed-array values), so the explicit rejects remain the right design — without
+them the dim case would be a cryptic 30341, not a workable path. The dim-array-valued each_kv
+deferred edge is therefore engine-gated (table generic matcher), not ours. Gates green: full
+INTERP 10965/10971 (0 failed, 6 skipped), AOT linq 1949/1949, JIT linq 1949/1949.
+
 PR1 findings:
 - **Pre-existing generator-lowering bug, fixed in PR1**: the yield-for lowering emitted
   `loop &&= _builtin_iterator_first(...)` per source — short-circuiting `first()` on later

From b72f62515cb3a927220d990b8f2a7a2271b83a2a Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Thu, 11 Jun 2026 03:53:59 -0700
Subject: [PATCH 09/11] =?UTF-8?q?linq=5Ffold:=20to=5Ftable=20sink=20?=
 =?UTF-8?q?=E2=80=94=20fused=20insert-loop=20terminator=20+=20selector-fre?=
 =?UTF-8?q?e=20tier-2=20forms?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Stage 6 of the table arc (benchmarks/sql/LINQ_TO_TABLE.md), closing the arc. Two layers:

1. Tier-2 surface (daslib/linq.das): selector-free to_table over iterators and
   arrays — iterator<tuple<K;V>> -> table<K;V> map, iterator<K> -> table<K> set,
   plus borrowing array forms with reserve. Iterator params are const-qualified
   (the 50609 mangler-ICE defuse) so each_kv's -const flavor and to_sequence's -&
   flavor converge on one instantiation. Duplicate keys keep the last occurrence
   (das insert semantics, not C#'s throw).

2. Fused emit: to_table joins loop_terminator_family + the ARRAY materializer
   lane; the new arm rides emit_fold_array_lane via FoldArraySpec.bufDeclStmt
   (table buffer instead of the array decl) — where/select/ranges plumbing all
   shared. A (k => v) MakeTuple projection splits so key and value evaluate
   exactly once; other projections bind to a local; pass-through spells the kv
   access with the element tuple's real field names so the kv usage-pruner maps
   them. Reserve fires on unfiltered walks only (table over-reserve is worse
   than an array's slack), with the take-min variant. Map-vs-set falls out of
   the resolved terminator type. Declines that keep tier-2: the 3-arg selector
   form, decs sources (explicit guard — the decs lane's implicit-to_array
   fall-through would mis-emit an array for a table-typed expr).

m7: to_table 32.5 vs to_table_staged (materialize + builtin to_table_move) 68.3
ns/elem INTERP (28.8 vs 41.6 JIT). 13 new tests (58/58 in the arc file); full
INTERP 10978/10984 0 failed, AOT linq 1962/1962, JIT linq 1962/1962, Sphinx -W
clean. results.md re-swept (82 families); skills/linq.md gains the table-source
+ to_table section (end-of-arc item).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
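Reviewer note (sits below the three-dash line, so it is not part of the commit message): the
sink semantics described above can be modeled in a few lines. This is only a sketch under
stated assumptions: plain Python dicts stand in for daslang tables, the function names
`to_table_staged` / `to_table_fused` are hypothetical illustrations and not daslib APIs, and
the sample rows are made up. It shows the two properties the message claims: duplicate keys
keep the last occurrence, and a split (k => v) projection evaluates each side once per
surviving element. Reserve behavior is not modeled.

```python
# Illustrative model only: plain Python dicts stand in for daslang tables.
# All names here are hypothetical; none of them is a daslib function.


def to_table_staged(source, where, select_kv):
    """Materialize every surviving pair into a list first, then convert."""
    staged = [select_kv(row) for row in source if where(row)]  # intermediate array
    table = {}
    for key, value in staged:
        table[key] = value  # duplicate keys: the last occurrence wins
    return table


def to_table_fused(source, where, key_of, value_of):
    """One walk: filter, project, and insert straight into the result table."""
    table = {}
    for row in source:
        if not where(row):
            continue
        key = key_of(row)      # evaluated exactly once per surviving row
        value = value_of(row)  # evaluated exactly once per surviving row
        table[key] = value     # duplicate keys: the last occurrence wins
    return table


# Made-up sample data: "eco" appears twice, "mid" is filtered out.
rows = [("eco", 10), ("lux", 99), ("eco", 30), ("mid", 5)]


def keep(row):
    return row[1] > 5


calls = {"key": 0, "value": 0}


def key_of(row):
    calls["key"] += 1
    return row[0]


def value_of(row):
    calls["value"] += 1
    return row[1] * 2


fused = to_table_fused(rows, keep, key_of, value_of)
staged = to_table_staged(rows, keep, lambda row: (row[0], row[1] * 2))
```

Both paths produce the same table; the fused one simply never builds the intermediate list,
which is the difference the `to_table` / `to_table_staged` benchmark pair is described as
measuring.
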
 benchmarks/sql/LINQ_TO_TABLE.md             |  34 ++-
 benchmarks/sql/results.md                   | 285 ++++++++++----------
 benchmarks/sql/table.das                    |  27 ++
 daslib/linq.das                             |  41 +++
 daslib/linq_fold.md                         |   1 +
 daslib/linq_fold_common.das                 | 108 +++++++-
 daslib/linq_fold_decs.das                   |   2 +
 doc/source/reference/linq_fold_patterns.rst |  10 +-
 skills/linq.md                              |  24 ++
 tests/linq/test_linq_table_source.das       | 125 +++++++++
 10 files changed, 508 insertions(+), 149 deletions(-)

diff --git a/benchmarks/sql/LINQ_TO_TABLE.md b/benchmarks/sql/LINQ_TO_TABLE.md
index e3fb497cc..36122143f 100644
--- a/benchmarks/sql/LINQ_TO_TABLE.md
+++ b/benchmarks/sql/LINQ_TO_TABLE.md
@@ -4,9 +4,37 @@ Sibling of [LINQ.md](LINQ.md) / [LINQ_TO_DECS.md](LINQ_TO_DECS.md). Plan of reco
 `table<K;V>` / `table<K>` as the 6th `_fold` source, plus the `to_table` sink.
 Edited in-place as PRs land.
 
-Status: **stage 5 committed** (join probe + table-lead joins; stage 4 = point-lookup folds,
-ac441c4a0; stage 3 = `%linq!` table sources, 29d23baf6; stage 2 = TableAdapter + m7, 571fe879e;
-stage 1 = `each_kv` builtin, 8751bb9ba).
+Status: **stage 6 committed — arc complete** (to_table sink; stage 5 = join probe + table-lead
+joins, 2742f6db2; stage 4 = point-lookup folds, ac441c4a0; stage 3 = `%linq!` table sources,
+29d23baf6; stage 2 = TableAdapter + m7, 571fe879e; stage 1 = `each_kv` builtin, 8751bb9ba;
+master's fixed-array rework merged in after stage 5, 1ab3e6a67).
+
+Stage 6 findings:
+- **Tier-2 surface required for typing**: `_fold`'s argument must fully type before the macro
+  runs, so the selector-free `to_table` generics in `daslib/linq.das` are load-bearing — map vs
+  set in the fused emit falls out of the *resolved* terminator type (`secondType == void`), not
+  from chain inspection. Iterator forms are const-qualified (`tuple<…> const` / `auto(keyT)
+  const`) — the standard 50609 mangler-ICE defuse — and the named kv tuple matches the
+  positional `tuple<auto;auto>` generic directly.
+- **The fused arm is ~60 lines riding existing machinery**: `to_table` joins
+  `loop_terminator_family` + the ARRAY materializer lane; a new `FoldArraySpec.bufDeclStmt` slot
+  swaps the array buffer decl for the table decl and `emit_fold_array_lane` does the rest
+  (where/select/ranges/reserve plumbing shared with to_array chains).
+- **Field names matter for the kv pruner**: the pass-through insert must spell `it.key` /
+  `it.value` (the element tuple's real field names), not positional `._0`/`._1` — the row-usage
+  scanner maps named fields only, and an unmapped reference leaves the bind var undeclared.
+  A `(k => v)` MakeTuple projection splits so each side evaluates exactly once.
+- **`to_table_move` is not a chain terminator**: over an iterator there is nothing to steal —
+  elements are yielded temporaries, so "move" reduces to clone. The consuming builtin
+  `to_table_move(array)` forms still serve materialized arrays (the bench staged baseline uses
+  exactly that); a fused move of non-copyable select-temps stays a deferred edge
+  (fused-kv-non-copyable).
+- **decs needs an explicit decline**: `emit_loop_or_count_lane_decs` falls through unknown
+  terminators to its implicit-to_array arm, which would mis-emit an array for a table-typed
+  expr — guarded with `if (termName == "to_table") return null` (tier-2 cascade).
+- **where-after-select + any terminator already cascades** (pre-existing lane behavior, count
+  and to_table alike) — not a stage-6 regression; left as-is.
+- m7: `to_table` 32.3 vs `to_table_staged` 71.5 ns/elem INTERP (~2.2× over materialize-then-convert).
 
 Stage 5 findings:
 - **`emit_array_join` generalized instead of a parallel `emit_table_join`**: the lead loop, bind
diff --git a/benchmarks/sql/results.md b/benchmarks/sql/results.md
index 4a9015f62..30402ff49 100644
--- a/benchmarks/sql/results.md
+++ b/benchmarks/sql/results.md
@@ -19,7 +19,9 @@ are stable now).
   values-only / zipped slot walks; key-equality `where` + terminator folds to an O(1) probe — the
   `point_lookup` / `point_lookup_scan` pair measures it; joins fuse on either side, and a table srcB
   joined on its bare key probes the table instead of building the join hash — the `join_probe` /
-  `join_probe_build` pair measures it; group_by / reverse defer to tier-2 until their stages land).
+  `join_probe_build` pair measures it; a trailing `to_table()` inserts straight into the result
+  table with no intermediate array — the `to_table` / `to_table_staged` pair measures it;
+  group_by / reverse defer to tier-2).
 
 `0.00` = early-exit terminator below timer resolution ("free"). Chain shapes are in
 `benchmarks/README.md`; the splice arms each fires are in `doc/source/reference/linq_fold_patterns.rst`.
@@ -34,171 +36,175 @@ signal, JIT deltas as indicative.**
 
 | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) |
 |---|---:|---:|---:|---:|---:|---:|
-| `aggregate_match` | 34.8 | 5.9 | 5.8 | 60.6 | 159.5 | 19.2 |
-| `all_match` | 27.5 | 3.5 | 3.4 | 56.1 | 154.1 | 16.4 |
+| `aggregate_match` | 34.9 | 5.9 | 5.8 | 60.8 | 158.9 | 19.5 |
+| `all_match` | 27.7 | 3.5 | 3.4 | 56.0 | 153.2 | 15.9 |
 | `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `average_aggregate` | 30.6 | 5.9 | 8.8 | 58.4 | 164.3 | 17.3 |
-| `bare_last` | — | 4.2 | 0.0 | 0.0 | 4.2 | 30.6 |
-| `bare_order_where` | 284.5 | 117.8 | 126.7 | 300.9 | 291.5 | 163.8 |
-| `chained_select_collapse` | — | 18.3 | 17.5 | 70.4 | 162.2 | 28.0 |
-| `chained_where` | 36.1 | 6.6 | 7.1 | 104.9 | 183.8 | 24.1 |
-| `contains_match` | 0.0 | 2.2 | 1.4 | 29.1 | 72.0 | 6.6 |
-| `count_aggregate` | 29.8 | 4.1 | 4.1 | 63.7 | 155.9 | 20.3 |
-| `cross_join` | 12556.2 | 3697.8 | — | 4012.8 | 4069.8 | — |
+| `average_aggregate` | 30.3 | 5.9 | 8.7 | 58.5 | 163.4 | 17.3 |
+| `bare_last` | — | 4.2 | 0.0 | 0.0 | 4.2 | 30.1 |
+| `bare_order_where` | 282.9 | 118.2 | 125.0 | 300.5 | 290.8 | 163.1 |
+| `chained_select_collapse` | — | 17.8 | 17.5 | 70.4 | 161.7 | 27.7 |
+| `chained_where` | 41.5 | 6.6 | 7.1 | 104.8 | 182.1 | 24.0 |
+| `contains_match` | 0.0 | 2.2 | 1.4 | 28.9 | 71.5 | 6.6 |
+| `count_aggregate` | 29.6 | 4.3 | 4.1 | 63.5 | 154.0 | 20.2 |
+| `cross_join` | 12896.3 | 3681.4 | — | 4018.5 | 4096.4 | — |
 | `decs_count_bare_pred` | — | — | 4.1 | — | — | — |
-| `distinct_by_count` | 41.0 | 15.7 | 15.6 | 70.6 | 160.7 | 26.6 |
-| `distinct_by_order_take` | 239.3 | 22.1 | 23.4 | 123.7 | 163.1 | 48.5 |
-| `distinct_by_order_to_array` | 238.9 | 22.1 | 23.5 | 124.2 | 163.1 | 48.8 |
-| `distinct_count` | 41.0 | 15.8 | 15.8 | 70.8 | 162.4 | 27.0 |
-| `distinct_count_pred` | 254.3 | 15.8 | 15.9 | 112.2 | 177.8 | 26.8 |
+| `distinct_by_count` | 41.4 | 15.7 | 15.7 | 70.4 | 161.3 | 26.8 |
+| `distinct_by_order_take` | 239.9 | 22.3 | 23.3 | 123.9 | 162.0 | 48.8 |
+| `distinct_by_order_to_array` | 237.8 | 22.3 | 23.3 | 124.3 | 162.5 | 48.8 |
+| `distinct_count` | 41.8 | 15.9 | 15.7 | 70.7 | 161.8 | 27.0 |
+| `distinct_count_pred` | 252.1 | 15.8 | 15.6 | 111.9 | 176.7 | 26.6 |
 | `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `element_at_match` | 0.0 | 0.0 | 0.0 | 0.4 | 0.3 | 0.0 |
 | `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `groupby_average` | 171.8 | 29.2 | 29.3 | 123.7 | 197.4 | — |
-| `groupby_count` | 141.9 | 19.5 | 19.5 | 75.0 | 167.5 | 162.7 |
-| `groupby_first` | 252.6 | 19.5 | 20.2 | 72.2 | 162.7 | — |
-| `groupby_having_count` | 141.8 | 19.5 | 19.5 | 74.8 | 169.1 | — |
-| `groupby_having_hidden_sum` | 175.7 | 23.3 | 22.6 | 118.8 | 192.7 | — |
-| `groupby_having_post_where` | 171.2 | 20.8 | 20.8 | 114.6 | 189.2 | — |
-| `groupby_max` | 173.9 | 24.9 | 25.4 | 120.5 | 193.1 | — |
-| `groupby_min` | 173.7 | 25.0 | 25.1 | 120.0 | 192.9 | — |
-| `groupby_multi_reducer` | 190.8 | 30.2 | 30.6 | 124.9 | 196.2 | — |
-| `groupby_select_order` | 170.9 | 20.8 | 20.8 | 114.8 | 188.6 | — |
-| `groupby_select_sum` | 198.9 | 38.6 | 38.2 | 101.7 | 195.2 | — |
-| `groupby_sum` | 170.8 | 20.8 | 20.8 | 114.9 | 188.4 | 192.8 |
-| `groupby_where_count` | 76.0 | 14.1 | 14.3 | 116.6 | 186.3 | — |
-| `groupby_where_sum` | 86.7 | 14.1 | 14.7 | 116.4 | 186.4 | — |
-| `join_count` | 38.3 | 51.3 | 64.6 | 113.1 | 183.4 | 65.6 |
-| `join_groupby_count` | 157.6 | 77.4 | 88.8 | 177.7 | 230.9 | — |
-| `join_groupby_to_array` | 189.1 | 78.0 | 90.6 | 215.4 | 213.5 | — |
-| `join_probe` | — | — | — | — | — | 47.3 |
-| `join_probe_build` | — | — | — | — | — | 79.1 |
-| `join_select` | 152.6 | 72.5 | 84.7 | 188.7 | 214.4 | — |
-| `join_where_count` | 48.6 | 61.6 | 76.8 | 160.4 | 199.8 | 81.4 |
-| `last_match` | 0.0 | 6.1 | 13.9 | 65.1 | 159.7 | 31.0 |
-| `long_count_aggregate` | 29.1 | 4.1 | 4.1 | 63.4 | 154.3 | 21.2 |
-| `max_aggregate` | 30.7 | 6.0 | 6.8 | 58.6 | 163.1 | 17.0 |
-| `min_aggregate` | 31.2 | 6.0 | 6.9 | 58.7 | 163.6 | 17.0 |
-| `order_by_multi_key` | 348.8 | 272.2 | 282.9 | 458.7 | 449.2 | 334.0 |
-| `order_distinct_take` | 137.8 | 15.9 | 99.3 | 72.5 | 162.8 | 31.3 |
-| `order_reverse_normalized` | 38.1 | 16.3 | 20.0 | 70.7 | 170.6 | — |
-| `order_take_desc` | 38.5 | 16.2 | 20.4 | 70.1 | 170.4 | 33.3 |
+| `groupby_average` | 171.0 | 29.4 | 29.0 | 123.0 | 196.4 | — |
+| `groupby_count` | 142.4 | 19.2 | 19.1 | 74.8 | 167.1 | 164.5 |
+| `groupby_first` | 251.1 | 19.2 | 19.7 | 72.1 | 162.2 | — |
+| `groupby_having_count` | 142.0 | 19.1 | 19.1 | 74.7 | 166.3 | — |
+| `groupby_having_hidden_sum` | 176.6 | 22.3 | 22.3 | 118.0 | 187.9 | — |
+| `groupby_having_post_where` | 173.2 | 20.5 | 20.4 | 114.4 | 187.4 | — |
+| `groupby_max` | 173.5 | 24.9 | 24.8 | 119.6 | 191.4 | — |
+| `groupby_min` | 173.8 | 25.3 | 24.8 | 119.6 | 192.5 | — |
+| `groupby_multi_reducer` | 190.5 | 30.4 | 30.0 | 124.7 | 196.1 | — |
+| `groupby_select_order` | 172.1 | 20.5 | 20.4 | 114.3 | 188.6 | — |
+| `groupby_select_sum` | 199.6 | 38.5 | 38.0 | 101.5 | 194.4 | — |
+| `groupby_sum` | 172.1 | 20.5 | 20.4 | 114.6 | 187.6 | 194.6 |
+| `groupby_where_count` | 76.4 | 14.1 | 14.2 | 115.1 | 185.8 | — |
+| `groupby_where_sum` | 87.5 | 14.2 | 14.5 | 116.0 | 186.7 | — |
+| `join_count` | 38.4 | 51.4 | 63.6 | 112.9 | 183.8 | 65.4 |
+| `join_groupby_count` | 158.4 | 77.8 | 87.8 | 177.4 | 233.1 | — |
+| `join_groupby_to_array` | 189.8 | 78.7 | 89.6 | 214.7 | 214.1 | — |
+| `join_probe` | — | — | — | — | — | 46.9 |
+| `join_probe_build` | — | — | — | — | — | 79.5 |
+| `join_select` | 151.8 | 72.8 | 84.9 | 189.5 | 217.4 | — |
+| `join_where_count` | 39.7 | 61.7 | 78.7 | 160.5 | 199.8 | 81.6 |
+| `last_match` | 0.0 | 5.9 | 14.0 | 65.0 | 159.2 | 31.0 |
+| `long_count_aggregate` | 29.9 | 4.1 | 4.1 | 63.4 | 154.0 | 20.1 |
+| `max_aggregate` | 31.1 | 6.0 | 6.8 | 58.7 | 162.1 | 16.9 |
+| `min_aggregate` | 31.0 | 6.0 | 6.9 | 58.7 | 162.9 | 17.0 |
+| `order_by_multi_key` | 340.9 | 270.9 | 279.5 | 459.2 | 446.7 | 336.4 |
+| `order_distinct_take` | 138.7 | 15.9 | 98.6 | 72.6 | 162.8 | 31.6 |
+| `order_reverse_normalized` | 38.8 | 16.3 | 19.8 | 70.9 | 169.9 | — |
+| `order_take_desc` | 38.5 | 16.3 | 19.9 | 70.1 | 170.8 | 33.3 |
 | `point_lookup` | — | — | — | — | — | 0.0 |
-| `point_lookup_scan` | — | — | — | — | — | 8.4 |
-| `reverse_distinct_by` | 295.5 | 21.3 | 28.0 | 70.9 | 162.2 | — |
-| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.2 | 58.8 |
-| `reverse_take_select` | 0.0 | 0.0 | 0.2 | 0.0 | 26.2 | — |
-| `select_count` | 0.1 | 0.0 | 2.2 | 69.3 | 2.2 | 0.0 |
-| `select_many` | — | 190.7 | — | — | — | — |
-| `select_where` | 207.9 | 11.2 | 19.5 | 195.5 | 188.7 | 37.6 |
-| `select_where_count` | 32.4 | 5.1 | 7.4 | 64.6 | 158.7 | 21.7 |
-| `select_where_order_take` | 36.3 | 12.3 | 15.1 | 72.7 | 164.5 | 34.5 |
-| `select_where_sum` | 37.2 | 7.5 | 7.5 | 66.5 | 164.6 | 23.3 |
-| `single_match` | 0.0 | 2.9 | 5.5 | 58.4 | 151.5 | 22.6 |
-| `skip_take` | 0.5 | 0.1 | 0.2 | 3.0 | 2.8 | 0.3 |
-| `skip_while_match` | 3.5 | 5.3 | 5.3 | 59.9 | 153.1 | 18.3 |
-| `sort_first` | 37.9 | 11.0 | 13.3 | 64.9 | 167.0 | 32.0 |
-| `sort_take` | 38.4 | 16.3 | 20.9 | 70.5 | 171.5 | 33.3 |
-| `sort_take_select` | 38.2 | 16.3 | 20.9 | 71.0 | 170.8 | 33.2 |
-| `sum_aggregate` | 29.6 | 2.1 | 2.1 | 54.4 | 153.0 | 13.5 |
-| `sum_where` | 32.1 | 4.4 | 11.5 | 63.8 | 154.6 | 21.3 |
-| `take_count` | 3.9 | 0.2 | 0.4 | 2.9 | 2.7 | 0.5 |
+| `point_lookup_scan` | — | — | — | — | — | 8.3 |
+| `reverse_distinct_by` | 295.3 | 21.3 | 28.2 | 71.1 | 161.9 | — |
+| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.1 | 58.5 |
+| `reverse_take_select` | 0.0 | 0.0 | 0.2 | 0.0 | 26.1 | — |
+| `select_count` | 0.1 | 0.0 | 2.2 | 68.3 | 2.2 | 0.0 |
+| `select_many` | — | 191.7 | — | — | — | — |
+| `select_where` | 204.1 | 11.2 | 19.3 | 197.1 | 183.4 | 37.7 |
+| `select_where_count` | 32.5 | 5.1 | 7.4 | 64.9 | 156.9 | 22.7 |
+| `select_where_order_take` | 37.1 | 12.3 | 14.8 | 72.8 | 165.4 | 35.3 |
+| `select_where_sum` | 37.1 | 7.5 | 7.5 | 66.5 | 161.9 | 25.0 |
+| `single_match` | 0.0 | 2.9 | 5.5 | 58.2 | 151.2 | 22.6 |
+| `skip_take` | 0.5 | 0.1 | 0.2 | 3.1 | 2.8 | 0.3 |
+| `skip_while_match` | 3.5 | 5.3 | 5.3 | 60.0 | 153.2 | 18.2 |
+| `sort_first` | 38.4 | 11.1 | 13.3 | 65.1 | 166.7 | 32.2 |
+| `sort_take` | 38.7 | 16.3 | 20.0 | 70.8 | 170.4 | 33.1 |
+| `sort_take_select` | 38.7 | 16.3 | 20.1 | 71.3 | 170.6 | 33.3 |
+| `sum_aggregate` | 30.5 | 2.1 | 2.1 | 54.6 | 153.2 | 13.4 |
+| `sum_where` | 33.2 | 4.3 | 4.2 | 63.4 | 154.6 | 20.4 |
+| `take_count` | 3.8 | 0.2 | 0.4 | 2.9 | 2.7 | 0.5 |
 | `take_count_filtered` | 1.1 | 0.2 | 0.2 | 1.3 | 1.1 | 0.3 |
 | `take_sum_aggregate` | 0.8 | 0.1 | 0.1 | 0.6 | 0.5 | 0.1 |
 | `take_where_count` | 0.9 | 0.1 | 0.1 | 0.7 | 0.6 | 0.2 |
-| `take_while_match` | 7.8 | 2.4 | 2.4 | 30.2 | 75.6 | 16.4 |
-| `to_array_filter` | 70.2 | 11.8 | 11.8 | 71.5 | 165.1 | 29.0 |
-| `where_join_count` | 41.2 | 29.1 | 41.7 | 132.7 | 168.6 | — |
-| `zip_count_pred` | 39.3 | 15.9 | — | 315.0 | 321.2 | — |
-| `zip_dot_product` | 46.2 | 12.6 | 10.6 | 309.2 | 319.0 | — |
-| `zip_dot_product_3arg` | 46.2 | 12.8 | — | 309.4 | 320.7 | — |
-| `zip_reverse_to_array` | — | 31.7 | — | 345.0 | 353.4 | — |
+| `take_while_match` | 7.8 | 2.4 | 2.4 | 30.1 | 76.2 | 16.4 |
+| `to_array_filter` | 71.1 | 11.8 | 11.7 | 71.3 | 164.3 | 28.9 |
+| `to_table` | — | — | — | — | — | 32.5 |
+| `to_table_staged` | — | — | — | — | — | 68.3 |
+| `where_join_count` | 41.5 | 28.8 | 40.9 | 132.1 | 167.0 | — |
+| `zip_count_pred` | 39.5 | 15.8 | — | 318.5 | 320.2 | — |
+| `zip_dot_product` | 47.2 | 12.6 | 10.8 | 312.7 | 318.6 | — |
+| `zip_dot_product_3arg` | 47.1 | 12.7 | — | 312.8 | 317.5 | — |
+| `zip_reverse_to_array` | — | 31.4 | — | 348.8 | 352.2 | — |
 
 ## JIT
 
 | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) |
 |---|---:|---:|---:|---:|---:|---:|
-| `aggregate_match` | 35.0 | 0.3 | 0.6 | 21.7 | 27.1 | 13.5 |
-| `all_match` | 27.9 | 0.3 | 0.2 | 18.1 | 26.2 | 13.5 |
+| `aggregate_match` | 35.1 | 0.3 | 0.6 | 22.8 | 26.2 | 13.5 |
+| `all_match` | 27.9 | 0.3 | 0.2 | 17.5 | 25.3 | 13.6 |
 | `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `average_aggregate` | 30.5 | 1.0 | 3.6 | 18.1 | 25.7 | 13.5 |
-| `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 | 17.2 |
-| `bare_order_where` | 188.1 | 35.3 | 35.5 | 106.7 | 53.3 | 79.0 |
-| `chained_select_collapse` | — | 1.1 | 1.1 | 20.6 | 33.5 | 14.1 |
-| `chained_where` | 36.1 | 0.6 | 0.8 | 35.7 | 32.0 | 17.7 |
-| `contains_match` | 0.0 | 0.2 | 0.1 | 17.5 | 9.2 | 4.7 |
-| `count_aggregate` | 29.6 | 0.3 | 0.6 | 20.6 | 26.4 | 13.5 |
-| `cross_join` | 5976.1 | 733.7 | — | 837.5 | 767.7 | — |
+| `average_aggregate` | 30.5 | 1.0 | 3.5 | 17.4 | 24.7 | 13.5 |
+| `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 | 17.1 |
+| `bare_order_where` | 186.2 | 34.1 | 35.0 | 104.9 | 52.8 | 78.8 |
+| `chained_select_collapse` | — | 1.1 | 1.1 | 20.6 | 33.5 | 14.0 |
+| `chained_where` | 36.9 | 0.6 | 0.8 | 34.7 | 31.3 | 17.8 |
+| `contains_match` | 0.0 | 0.2 | 0.1 | 17.5 | 8.9 | 4.7 |
+| `count_aggregate` | 29.7 | 0.3 | 0.6 | 17.5 | 25.5 | 13.4 |
+| `cross_join` | 5965.9 | 731.0 | — | 833.2 | 770.0 | — |
 | `decs_count_bare_pred` | — | — | 0.6 | — | — | — |
-| `distinct_by_count` | 41.2 | 1.1 | 1.1 | 20.6 | 33.6 | 14.1 |
-| `distinct_by_order_take` | 239.4 | 1.7 | 2.6 | 47.4 | 39.2 | 30.1 |
-| `distinct_by_order_to_array` | 239.3 | 1.7 | 2.7 | 47.4 | 38.9 | 30.1 |
-| `distinct_count` | 41.3 | 1.1 | 1.1 | 20.5 | 33.7 | 14.1 |
-| `distinct_count_pred` | 252.4 | 1.1 | 1.3 | 37.4 | 43.4 | 14.1 |
+| `distinct_by_count` | 41.7 | 1.1 | 1.1 | 20.6 | 33.5 | 14.0 |
+| `distinct_by_order_take` | 239.3 | 1.7 | 2.6 | 46.3 | 38.8 | 30.1 |
+| `distinct_by_order_to_array` | 240.2 | 1.7 | 2.7 | 46.4 | 38.7 | 30.3 |
+| `distinct_count` | 41.6 | 1.1 | 1.1 | 20.6 | 33.5 | 14.0 |
+| `distinct_count_pred` | 251.7 | 1.1 | 1.3 | 37.7 | 43.8 | 14.0 |
 | `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `element_at_match` | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 |
 | `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `groupby_average` | 170.7 | 1.6 | 1.9 | 35.9 | 44.3 | — |
-| `groupby_count` | 141.5 | 1.3 | 1.5 | 20.6 | 32.7 | 42.9 |
-| `groupby_first` | 252.2 | 1.3 | 2.3 | 20.6 | 33.3 | — |
-| `groupby_having_count` | 141.3 | 1.3 | 1.5 | 20.6 | 33.3 | — |
-| `groupby_having_hidden_sum` | 175.6 | 1.5 | 1.7 | 36.0 | 45.2 | — |
-| `groupby_having_post_where` | 171.9 | 1.6 | 2.0 | 35.9 | 44.3 | — |
-| `groupby_max` | 172.8 | 1.5 | 1.9 | 36.0 | 45.9 | — |
-| `groupby_min` | 173.4 | 1.5 | 1.8 | 35.9 | 45.9 | — |
-| `groupby_multi_reducer` | 190.6 | 1.6 | 2.0 | 36.2 | 46.1 | — |
-| `groupby_select_order` | 170.6 | 1.4 | 1.9 | 35.7 | 44.2 | — |
-| `groupby_select_sum` | 198.6 | 2.8 | 3.2 | 32.2 | 39.7 | — |
-| `groupby_sum` | 170.3 | 1.4 | 1.7 | 35.8 | 44.2 | 51.5 |
-| `groupby_where_count` | 76.0 | 0.9 | 1.3 | 36.1 | 41.8 | — |
-| `groupby_where_sum` | 86.7 | 0.9 | 1.3 | 36.0 | 41.7 | — |
-| `join_count` | 38.3 | 10.9 | 11.7 | 43.5 | 71.4 | 33.1 |
-| `join_groupby_count` | 157.6 | 18.2 | 20.1 | 68.5 | 89.9 | — |
-| `join_groupby_to_array` | 189.7 | 17.6 | 19.5 | 80.3 | 36.2 | — |
-| `join_probe` | — | — | — | — | — | 24.2 |
+| `groupby_average` | 171.5 | 1.5 | 1.9 | 35.5 | 45.5 | — |
+| `groupby_count` | 142.1 | 1.3 | 1.5 | 20.6 | 33.8 | 42.7 |
+| `groupby_first` | 251.9 | 1.3 | 2.3 | 20.6 | 34.4 | — |
+| `groupby_having_count` | 141.3 | 1.3 | 1.5 | 20.6 | 33.9 | — |
+| `groupby_having_hidden_sum` | 176.9 | 1.5 | 1.7 | 35.5 | 45.2 | — |
+| `groupby_having_post_where` | 171.1 | 1.4 | 1.9 | 35.5 | 44.1 | — |
+| `groupby_max` | 173.4 | 1.5 | 1.9 | 35.5 | 45.8 | — |
+| `groupby_min` | 172.8 | 1.5 | 1.8 | 35.6 | 45.8 | — |
+| `groupby_multi_reducer` | 190.2 | 1.6 | 1.9 | 35.8 | 46.1 | — |
+| `groupby_select_order` | 170.9 | 1.4 | 1.9 | 35.4 | 44.3 | — |
+| `groupby_select_sum` | 200.0 | 2.8 | 3.2 | 31.8 | 39.9 | — |
+| `groupby_sum` | 170.9 | 1.4 | 1.6 | 35.5 | 44.3 | 51.2 |
+| `groupby_where_count` | 76.3 | 0.9 | 1.3 | 35.6 | 41.8 | — |
+| `groupby_where_sum` | 87.6 | 0.9 | 1.3 | 35.6 | 41.9 | — |
+| `join_count` | 38.2 | 10.9 | 11.8 | 42.6 | 71.5 | 32.2 |
+| `join_groupby_count` | 156.9 | 17.6 | 19.5 | 68.3 | 89.8 | — |
+| `join_groupby_to_array` | 189.8 | 17.5 | 19.4 | 79.3 | 36.1 | — |
+| `join_probe` | — | — | — | — | — | 24.0 |
 | `join_probe_build` | — | — | — | — | — | 38.1 |
-| `join_select` | 95.4 | 19.7 | 21.7 | 75.0 | 94.3 | — |
-| `join_where_count` | 39.4 | 18.9 | 20.8 | 64.4 | 78.4 | 37.9 |
-| `last_match` | 0.0 | 0.5 | 1.4 | 18.9 | 26.8 | 22.9 |
-| `long_count_aggregate` | 29.0 | 0.3 | 0.6 | 20.5 | 26.4 | 13.5 |
-| `max_aggregate` | 30.7 | 0.3 | 0.5 | 18.4 | 27.7 | 13.5 |
-| `min_aggregate` | 30.7 | 0.3 | 0.5 | 18.4 | 27.7 | 13.5 |
-| `order_by_multi_key` | 252.6 | 53.4 | 55.0 | 125.4 | 71.9 | 129.1 |
-| `order_distinct_take` | 137.9 | 1.1 | 75.7 | 20.9 | 36.0 | 14.0 |
-| `order_reverse_normalized` | 38.2 | 0.7 | 1.4 | 24.6 | 28.5 | — |
-| `order_take_desc` | 38.1 | 0.7 | 1.4 | 24.6 | 28.4 | 17.7 |
+| `join_select` | 94.0 | 19.6 | 21.7 | 73.8 | 95.2 | — |
+| `join_where_count` | 39.8 | 18.9 | 20.8 | 63.5 | 78.3 | 37.8 |
+| `last_match` | 0.0 | 0.5 | 1.4 | 18.2 | 25.9 | 22.9 |
+| `long_count_aggregate` | 29.2 | 0.3 | 0.6 | 17.5 | 25.5 | 13.4 |
+| `max_aggregate` | 31.0 | 0.3 | 0.5 | 17.4 | 27.1 | 13.4 |
+| `min_aggregate` | 31.1 | 0.3 | 0.5 | 17.4 | 27.0 | 13.5 |
+| `order_by_multi_key` | 250.0 | 53.1 | 54.7 | 123.6 | 71.3 | 129.4 |
+| `order_distinct_take` | 138.1 | 1.1 | 75.3 | 20.9 | 35.7 | 14.0 |
+| `order_reverse_normalized` | 38.5 | 0.7 | 1.3 | 22.0 | 27.7 | — |
+| `order_take_desc` | 38.2 | 0.7 | 1.3 | 22.0 | 27.5 | 17.8 |
 | `point_lookup` | — | — | — | — | — | 0.0 |
 | `point_lookup_scan` | — | — | — | — | — | 6.0 |
-| `reverse_distinct_by` | 295.4 | 1.5 | 3.2 | 20.6 | 34.6 | — |
-| `reverse_take` | 0.0 | 0.0 | 0.0 | 0.0 | 3.7 | 26.9 |
+| `reverse_distinct_by` | 295.7 | 1.5 | 3.2 | 20.5 | 34.4 | — |
+| `reverse_take` | 0.0 | 0.0 | 0.0 | 0.0 | 3.8 | 26.9 |
 | `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.9 | — |
-| `select_count` | 0.1 | 0.0 | 0.0 | 66.0 | 0.0 | 0.0 |
-| `select_many` | — | 62.7 | — | — | — | — |
-| `select_where` | 109.1 | 4.1 | 5.3 | 76.2 | 23.0 | 28.1 |
-| `select_where_count` | 32.3 | 0.3 | 0.6 | 18.5 | 27.2 | 13.4 |
-| `select_where_order_take` | 36.5 | 0.7 | 1.4 | 19.0 | 27.9 | 23.0 |
-| `select_where_sum` | 37.1 | 0.4 | 0.6 | 18.0 | 26.3 | 13.4 |
-| `single_match` | 0.0 | 0.4 | 1.1 | 46.3 | 23.2 | 17.4 |
-| `skip_take` | 0.3 | 0.0 | 0.0 | 1.3 | 0.2 | 0.1 |
-| `skip_while_match` | 3.5 | 0.4 | 0.4 | 45.8 | 22.7 | 13.3 |
-| `sort_first` | 37.9 | 0.4 | 1.3 | 18.1 | 27.5 | 17.3 |
-| `sort_take` | 37.9 | 0.7 | 1.4 | 24.6 | 28.3 | 17.8 |
-| `sort_take_select` | 37.8 | 0.7 | 1.4 | 24.6 | 28.4 | 17.8 |
-| `sum_aggregate` | 29.9 | 0.3 | 0.1 | 23.2 | 25.6 | 13.5 |
-| `sum_where` | 32.1 | 0.3 | 0.6 | 18.5 | 27.2 | 13.4 |
-| `take_count` | 1.8 | 0.1 | 0.1 | 1.2 | 0.3 | 0.2 |
+| `select_count` | 0.1 | 0.0 | 0.0 | 67.0 | 0.0 | 0.0 |
+| `select_many` | — | 62.5 | — | — | — | — |
+| `select_where` | 110.7 | 4.1 | 5.3 | 74.8 | 22.1 | 27.9 |
+| `select_where_count` | 32.6 | 0.3 | 0.6 | 17.4 | 26.3 | 13.4 |
+| `select_where_order_take` | 36.7 | 0.7 | 1.3 | 18.4 | 27.3 | 23.1 |
+| `select_where_sum` | 37.2 | 0.4 | 0.6 | 17.4 | 25.6 | 13.4 |
+| `single_match` | 0.0 | 0.4 | 1.1 | 46.2 | 22.3 | 17.3 |
+| `skip_take` | 0.3 | 0.0 | 0.0 | 1.3 | 0.3 | 0.2 |
+| `skip_while_match` | 3.4 | 0.4 | 0.4 | 46.0 | 21.8 | 13.2 |
+| `sort_first` | 38.4 | 0.4 | 1.3 | 17.4 | 26.7 | 17.2 |
+| `sort_take` | 38.6 | 0.7 | 1.3 | 22.0 | 27.9 | 17.8 |
+| `sort_take_select` | 38.3 | 0.7 | 1.3 | 21.9 | 27.7 | 17.8 |
+| `sum_aggregate` | 30.6 | 0.3 | 0.1 | 17.7 | 24.9 | 13.5 |
+| `sum_where` | 33.0 | 0.3 | 0.6 | 17.4 | 26.3 | 13.4 |
+| `take_count` | 1.9 | 0.1 | 0.1 | 1.2 | 0.3 | 0.2 |
 | `take_count_filtered` | 1.1 | 0.0 | 0.0 | 0.4 | 0.1 | 0.1 |
 | `take_sum_aggregate` | 0.8 | 0.0 | 0.0 | 0.2 | 0.0 | 0.1 |
 | `take_where_count` | 0.9 | 0.0 | 0.0 | 0.2 | 0.0 | 0.1 |
-| `take_while_match` | 7.8 | 0.2 | 0.3 | 17.1 | 9.3 | 13.5 |
-| `to_array_filter` | 47.4 | 3.3 | 3.3 | 21.5 | 35.1 | 20.2 |
-| `where_join_count` | 39.4 | 5.8 | 6.8 | 49.7 | 42.3 | — |
-| `zip_count_pred` | 39.4 | 0.1 | — | 117.0 | 33.9 | — |
-| `zip_dot_product` | 46.5 | 0.1 | 0.1 | 117.1 | 33.8 | — |
-| `zip_dot_product_3arg` | 46.4 | 0.1 | — | 116.9 | 33.7 | — |
-| `zip_reverse_to_array` | — | 4.5 | — | 128.4 | 51.3 | — |
+| `take_while_match` | 7.8 | 0.2 | 0.3 | 17.4 | 8.9 | 13.4 |
+| `to_array_filter` | 48.9 | 3.2 | 3.3 | 20.8 | 35.5 | 20.2 |
+| `to_table` | — | — | — | — | — | 28.8 |
+| `to_table_staged` | — | — | — | — | — | 41.6 |
+| `where_join_count` | 41.3 | 5.7 | 6.8 | 48.8 | 41.8 | — |
+| `zip_count_pred` | 39.8 | 0.1 | — | 115.3 | 33.8 | — |
+| `zip_dot_product` | 47.3 | 0.1 | 0.1 | 115.4 | 33.8 | — |
+| `zip_dot_product_3arg` | 47.1 | 0.1 | — | 115.3 | 33.7 | — |
+| `zip_reverse_to_array` | — | 4.5 | — | 127.0 | 51.4 | — |
 <!-- BENCH:TABLES END -->
 
 ## Missing lanes (the `—` cells)
@@ -214,9 +220,10 @@ Each empty cell's reason is also in the bench `.das` file's comment; SQL gaps ar
 - **`reverse_distinct_by` m4 / m5f** — array uses the backward-index walk; non-array sources fuse the forward keep-last splice (decs 27.6/5.0, XML 74.5/22.2); SQL uses MAX(pk).
 - **`order_distinct_take` m4 vs m3f** — `unique_key` hashes workhorse keys directly (array `int`) but string-interpolates structs (decs `DecsBrand`); the gap is per-element string hashing, not decs-walk. `distinct_by_count` is the key-based variant (m4 parity).
 - **`zip_reverse_to_array` / `zip_*` SQL / Decs** — `reverse` has no SQL order key; zip is not relational / not expressible over one archetype walk. By design. (XML/JSON zip lanes are lit, partially fused.)
-- **m7 absent families** — `zip_*` / `cross_join` (lockstep over an unordered slot walk is meaningless), `select_many` (flat fixture, no nested array field), `order_reverse_normalized` / `reverse_take_select` / `reverse_distinct_by` (no backward slot walk; `reverse_take` is kept as the single deferral marker), the group-by tail beyond `groupby_count`/`groupby_sum` (table group_by fusion is staged — see `LINQ_TO_TABLE.md`; the two marker cells track the tier-2 cost until then) plus the join-composition lanes (`join_select` / `where_join_count` would fuse today but aren't instantiated; `join_groupby_*` needs the staged group_by), `decs_count_bare_pred` (decs-only).
+- **m7 absent families** — `zip_*` / `cross_join` (lockstep over an unordered slot walk is meaningless), `select_many` (flat fixture, no nested array field), `order_reverse_normalized` / `reverse_take_select` / `reverse_distinct_by` (no backward slot walk; `reverse_take` is kept as the single deferral marker), the group-by tail beyond `groupby_count`/`groupby_sum` (table group_by fusion is a named deferred edge — see `LINQ_TO_TABLE.md`; the two marker cells track the tier-2 cost) plus the join-composition lanes (`join_select` / `where_join_count` would fuse today but aren't instantiated; `join_groupby_*` needs the deferred group_by), `decs_count_bare_pred` (decs-only).
 - **`point_lookup` / `point_lookup_scan` non-m7** — m7-only pair: only a table source has a key to probe (`where(kv.key == X)` + terminator → `key_exists` / `tab?[X]`, O(1)); the `_scan` twin forces the same query through the walk (compound `&&` predicate declines the probe) to show the gap. Other sources have no analog by design.
 - **`join_probe` / `join_probe_build` non-m7** — m7-only A/B pair: a table srcB joined on its bare key probes the user's table per lead row (no internal join hash, no build loop); the `_build` twin feeds the identical rows pre-materialized to a kv array, forcing the hashed build. Other sources have no keyed-srcB analog by design.
+- **`to_table` / `to_table_staged` non-m7** — m7-only A/B pair for the `to_table()` sink: the fused insert-loop lands the kv chain straight in the result table (reserve from O(1) length); the `_staged` twin materializes the same projection to an array first, then converts via the consuming builtin `to_table_move` — the shape every chain had before the sink arm. The sink itself works over any direct-loop source (the array lane fuses it too); only the bench pair is table-scoped.
 
 ## Accepted floors
 
diff --git a/benchmarks/sql/table.das b/benchmarks/sql/table.das
index e33e7ee64..d0afb6557 100644
--- a/benchmarks/sql/table.das
+++ b/benchmarks/sql/table.das
@@ -687,3 +687,30 @@ def to_array_filter_m7(b : B?) {
         }
     }
 }
+
+[benchmark]
+def to_table_m7(b : B?) {
+    // fused insert-loop sink: the kv chain lands straight in the result table, reserve from O(1) length
+    b |> run("to_table", N) {
+        var tab <- _fold(unsafe(each_kv(g_t)) |> _select((_.key => _.value.price)) |> to_table())
+        b |> accept(length(tab))
+        if (empty(tab)) {
+            b->failNow()
+        }
+        delete tab
+    }
+}
+
+[benchmark]
+def to_table_staged_m7(b : B?) {
+    // staged baseline: materialize the kv tuples to an array, then convert — the shape without the sink arm
+    b |> run("to_table_staged", N) {
+        var rows <- _fold(unsafe(each_kv(g_t)) |> _select((_.key => _.value.price)) |> to_array())
+        var tab <- to_table_move(rows)
+        b |> accept(length(tab))
+        if (empty(tab)) {
+            b->failNow()
+        }
+        delete tab
+    }
+}
diff --git a/daslib/linq.das b/daslib/linq.das
index 206ac14d7..2906aa443 100644
--- a/daslib/linq.das
+++ b/daslib/linq.das
@@ -150,6 +150,47 @@ def to_table(a : array<auto(TT)>; key : block<(v : TT -&) : auto>; elementSelect
     return <- to_table_impl_const(a, type<TT -const -&>, key, elementSelector)
 }
 
+def to_table(var a : iterator<tuple<auto(keyT); auto(valT)> const>) : table<keyT -const; valT> {
+    //! Collects an iterator of `(key, value)` tuples (e.g. an `each_kv` chain or a `k => v`
+    //! projection) into a `table<keyT; valT>`. Duplicate keys keep the last occurrence.
+    var tab : table<keyT -const; valT>
+    for (x in a) {
+        tab[x._0] := x._1
+    }
+    return <- tab
+}
+
+def to_table(var a : iterator<auto(keyT) const>) : table<keyT -const> {
+    //! Collects an iterator of bare hashable keys into the `table<keyT>` set form.
+    var tab : table<keyT -const>
+    for (at in a) {
+        __builtin_table_set_insert(tab, at)
+    }
+    return <- tab
+}
+
+def to_table(a : array<tuple<auto(keyT); auto(valT)>>) : table<keyT -const; valT> {
+    //! Collects an array of `(key, value)` tuples into a `table<keyT; valT>` without consuming
+    //! the source (values are cloned). Duplicate keys keep the last occurrence.
+    var tab : table<keyT -const; valT>
+    tab |> reserve(length(a))
+    for (x in a) {
+        tab[x._0] := x._1
+    }
+    return <- tab
+}
+
+def to_table(a : array<auto(keyT)>) : table<keyT -const> {
+    //! Collects an array of bare hashable keys into the `table<keyT>` set form without
+    //! consuming the source.
+    var tab : table<keyT -const>
+    tab |> reserve(length(a))
+    for (at in a) {
+        __builtin_table_set_insert(tab, at)
+    }
+    return <- tab
+}
+
 [unused_argument(tt)]
 def private concat_impl(var a; var b; tt : auto(TT); reserveSize : int) : array<TT -const -&> {
     //! Concatenates two arrays or iterators
diff --git a/daslib/linq_fold.md b/daslib/linq_fold.md
index 67388d8db..741099e13 100644
--- a/daslib/linq_fold.md
+++ b/daslib/linq_fold.md
@@ -661,6 +661,7 @@ The imperative code has a few subtle co-occurrence rules that may not map cleanl
 - **2026-05-31 (deferred materialization — `last` + group-by `first`)** — extends the element-handle deferral to the two remaining survivors-≪-N reducers: the full-walk `last`/`last_or_default` terminator (in `emit_early_exit_lane`) and `first`-per-group inside `plan_group_by_core`. `last` cloned the whole `Car` (`lst := it`) on *every* match and kept only the final one; over a deferring source it now stores the node **handle** per match and runs `materialize_handle` once, for the single survivor. `group_by(brand) |> select((key, first per group))` pinned the whole row (`slot := it`) in `mk_reducer_first`, forcing `wrap_source_loop` to build every element; a new `mk_reducer_first_deferred` materializes from the handle *inside the table miss-branch*, so the walk field-prunes to just the group key and `build_xml_row` runs only once per distinct group. Both ride the same four `SourceAdapter` hooks — only `XmlAdapter` defers; `array`/`decs` pass `null`/no-defer and stay byte-identical (the `emit_reducer_branches` adapter param defaults to `null`; the group-by gate also requires the bind be the raw element — `itName == bind_name`, i.e. no upstream `_select` rebinds it — since the handle yields the raw row). **Design validated by hand-coded micro-bench first** (the `last_match` / `groupby_first` lanes in `benchmarks/micro/sort_distinct_take_shapes.das`). Wins (m5f INTERP / JIT, string clones 100 000 → K): `last_match` 219 → 65 / 21 (K=1), `groupby_first` 339 → 72 / 22 (K=#brands). Closes `groupby_first` (the last item on the prior entry's floor list). Still not deferred: `bare_order_where` / `order_reverse_normalized` (all rows out), `reverse_distinct_by` (tier-2, no fused emit).
 - **2026-05-31 (forward keep-last — `reverse |> distinct[_by]` over forward sources)** — the only buffered shape still falling to tier-2 over a forward source. `reverse() |> distinct_by(K) |> to_array()` means "keep the LAST forward row per key, output in reverse-discovery order." The sole fused emit was `emit_reverse_backward_walk_dset_gate` — a backward **index** walk (`src[len-1-k]`) gated `array_source`, so XML / decs / plain iterators (forward-only, no random access) cascaded: `reverse()` materialized all N, then `distinct_by` walked. New `emit_reverse_distinct_forward_keeplast` (R-2b, gated by the exact complement `non_array_source`) does a single forward pass instead — `table<key; (seq, val)>`, **OVERWRITE** the slot per element (so it ends at the last forward occurrence + its seq), then sort survivors by **descending seq** (`build_surrogate_cmp(true)`) and emit. Output-identical to the backward walk (descending forward-index of each last occurrence), proven by parity vs both `m3f` (array backward walk) and the tier-2 cascade. It rides `emit_terminator_lane` → `wrap_source_loop`, so it's source-generic: **XML defers** (the table holds `(seq, xml_node)` and `build_xml_row` runs only for the K survivors — field-pruned to the key); **decs / iterator** store the full element (no handle), winning single-pass over the cascade's reverse-buffer + second walk. `ctx.top` is `null` for decs (bridge-driven), so `elemType` falls back to `ctx.src->element_type()`; arrays still match the backward-walk row first (registered earlier), so they're byte-identical. **Design validated by hand-coded micro-bench first** (the `reverse_distinct_by` lane in `benchmarks/micro/sort_distinct_take_shapes.das`: INTERP 405.8 → 88.6, JIT 162.6 → 37.0, string clones 100 000 → #keys). Wins: `reverse_distinct_by` m5f **429 → 74 INTERP / 166.6 → 22 JIT** (clones 100 000 → 5), and the previously-`—` decs **m4 lights up at 27.7 / 5.0** (near the array fast path). Closes `reverse_distinct_by` — the last forward-source buffered floor.
 - **2026-06-11 (table joins — adapter-generalized `emit_array_join` + table-srcB probe)** — table-arc stage 5 (branch `bbatkin/linq-table-each-kv`; plan: `benchmarks/sql/LINQ_TO_TABLE.md`). Two halves. (1) **Lead generalization**: `emit_array_join` no longer hand-rolls its `for (tup_a in srcA)` — the lead loop, bind name, and lead invoke-param spelling come from the adapter (`wrap_source_loop(LoopDispatch(Each=null))` / `bind_name(at)` / new `SourceAdapter.invoke_param_type()` capability, default `invoke_src_param_type(arrayTop())`), so `TableAdapter` just sets `can_join() = true` and routes `emit_join_hook` to the same emitter: a table-lead join walks the kv usage-pruned slot iterator(s) — a join body touching only `c.value.*` walks `values(tab)` alone — and group joins stay outer over every slot. decs/xml/json hooks untouched (nested-callback walks). (2) **Table-srcB probe**: when the join's srcb is `each_kv(tab)` / `keys(set)` joined on its **bare key** (`join_srcb_table_call` + `join_keyb_is_bare_key` on the peeled keyb), the emitter skips the internal `table<KEY; array<TUPB>>` + build loop entirely — srcB binds the user's table (const param) and the per-A probe is a key lookup, usage-pruned like the point-lookup fold (count-no-where / key-only → `key_exists`, value shapes → by-ref bind off `unsafe(tab?[k])`, whole-pair → kv-tuple bind). Unique table keys ⇒ probe ≡ hash semantics exactly; a bare field read is pure by construction so skipping keyb's per-B evaluation is unobservable; non-bare keybs and `group_join` (result consumes the whole bucket) keep the hashed build. Plumbing: per-pair statements factored into `build_join_pair_core` (`JoinPairCore`), shared by `build_join_standalone_pieces` (keeps the group-join arm + `get`-bucket wrap — hash-mode AST unchanged for the decs/xml/json callers) and the new `build_join_probe_pieces`. m7: `join_count` / `join_where_count` (table lead) leave tier-2; new `join_probe` vs `join_probe_build` A/B lanes.
+- **2026-06-11 (`to_table` sink — fused insert-loop terminator)** — table-arc stage 6 (branch `bbatkin/linq-table-each-kv`; plan: `benchmarks/sql/LINQ_TO_TABLE.md`). Two layers. (1) **Tier-2 surface** (`daslib/linq.das`): selector-free `to_table` over iterators and arrays — `iterator<tuple<K;V> const>` → `table<K;V>` map (insert via `tab[x._0] := x._1`, builtin `to_table` clone semantics), `iterator<K const>` → `table<K>` set (`__builtin_table_set_insert`), plus borrowing `array<tuple<K;V>>` / `array<K>` forms with reserve (builtin only had the consuming `to_table_move` for dynamic arrays). The iterator params are **const-qualified** (`tuple<…> const` / `auto(keyT) const`) — the 50609 mangler-ICE defuse — so the `-const` flavor from `each_kv` chains and the `-&` flavor from `to_sequence` converge on one instantiation. The named kv tuple (`tuple<key:K; value:V>`) matches the positional `tuple<auto;auto>` generic directly. Duplicate keys keep the last occurrence (das insert semantics, not C#'s throw). (2) **Fused emit**: `to_table` joins `loop_terminator_family` + `classify_terminator`'s ARRAY (materializer) lane; the new arm in `emit_loop_or_count_lane` rides `emit_fold_array_lane` via a new `FoldArraySpec.bufDeclStmt` slot (replaces the array buffer decl with `var acc : table<…>`) — where/select/ranges/post-take-where plumbing all shared. Per-element insert by shape: a `(k => v)` `ExprMakeTuple` projection **splits** so key and value each evaluate exactly once (`acc[k] = v` direct, no tuple temp); other projections bind `let kvb := proj` once then index; pass-through spells the kv access with the element tuple's **real field names** (`.key`/`.value`) so the kv usage-pruner maps them (positional `._0` would not bind) — a bare `each_kv(tab).to_table()` is a reserve-ahead table clone through the pruned walk, and `keys(tab)` chains land in the set form via `insert`. Reserve fires only on unfiltered walks (`can_reserve_by_length` + no where — a thinned table over-reserves hash buckets, stricter than the array arm), with the take-min variant. Map-vs-set falls out of the terminator call's resolved type (`secondType == void`). Declines that keep tier-2: the 3-arg selector `to_table(key, elementSelector)`, decs sources (explicit guard in `emit_loop_or_count_lane_decs` — its implicit-to_array fall-through would mis-emit an array for a table-typed expr), MakeTuple projections of arity ≠ 2. m7: `to_table` 32.3 vs `to_table_staged` (fused-to_array + builtin `to_table_move`) 71.5 ns/elem INTERP (~2.2×).
 
 ## Open questions
 
diff --git a/daslib/linq_fold_common.das b/daslib/linq_fold_common.das
index 3037749f2..dbe37f40a 100644
--- a/daslib/linq_fold_common.das
+++ b/daslib/linq_fold_common.das
@@ -427,7 +427,7 @@ var alias_table : table<string; array<string>> <- {
                                      "min_max_average", "min_max_average_by",
                                      "any", "all", "contains", "first", "first_or_default",
                                      "last", "last_or_default", "single", "single_or_default",
-                                     "element_at", "element_at_or_default"],
+                                     "element_at", "element_at_or_default", "to_table"],
     // PR D1 — order-by-with-key (excludes bare `order` / `order_descending` which lack a key arg).
     // Used by plan_group_by's trailing_order slot.
     "order_by_family"            => ["order_by", "order_by_descending"]
@@ -956,8 +956,9 @@ enum LinqLane {
 def classify_terminator(name : string) : LinqLane {
     if (name == "count") return LinqLane.COUNTER
     // take/skip/take_while/skip_while trailing (after to_array strip) → ARRAY lane with implicit materialization.
+    // to_table is a materializer like the no-terminator (to_array) shape — same ARRAY lane, table buffer.
     if (name == "where_" || name == "select" || name == "take" || name == "skip"
-            || name == "take_while" || name == "skip_while") return LinqLane.ARRAY
+            || name == "take_while" || name == "skip_while" || name == "to_table") return LinqLane.ARRAY
     if (name == "sum" || name == "min" || name == "max" || name == "average" || name == "long_count") return LinqLane.ACCUMULATOR
     // EARLY_EXIT is also the dispatch lane for full-walk single-return terminators (last/single/element_at/aggregate) — same emit_early_exit_lane shape, different per-op state.
     if (name == "first" || name == "first_or_default" || name == "any" || name == "all" || name == "contains"
@@ -1354,6 +1355,7 @@ struct FoldArraySpec {
     prologueStmts    : array<Expression?>   // BEFORE bufDecl — takeN bind, etc.
     bufElemType      : TypeDeclPtr
     bufName          : string
+    bufDeclStmt      : Expression?          // non-null replaces the default `var buf : array<bufElemType>` decl (e.g. a table buffer for to_table)
     postBufDeclStmts : array<Expression?>   // AFTER bufDecl + optional dsetDecl, BEFORE reserve — e.g. early-return guards (bounded_heap)
     reserveStmts     : array<Expression?>   // AFTER postBufDecl, BEFORE source-loop — caller composes
     preCondStmts    : array<Expression?>   // bound-vars INSIDE for-loop body, OUTSIDE if-gate
@@ -1429,8 +1431,12 @@ def emit_fold_array_lane(var spec : FoldArraySpec; var adapter : SourceAdapter?;
     bodyStmts |> push_from(spec.prologueStmts)
     let bufName = spec.bufName
     var bufElemType = spec.bufElemType
-    bodyStmts |> push <| qmacro_expr() {
-        var $i(bufName) : array<$t(bufElemType)>
+    if (spec.bufDeclStmt != null) {
+        bodyStmts |> push <| spec.bufDeclStmt
+    } else {
+        bodyStmts |> push <| qmacro_expr() {
+            var $i(bufName) : array<$t(bufElemType)>
+        }
     }
     if (spec.distinctGate != null) {
         let dg = spec.distinctGate
@@ -2366,6 +2372,100 @@ def emit_loop_or_count_lane(var c : Captures; var ctx : EmitCtx; at : LineInfo)
         prepend_binds(stmts, intermediateBinds)
         wrap_with_ranges(stmts, skipExpr, takeExpr, skipWhileCond, takeWhileCond, names)
         loopBody = prepend_precond(wrap_with_condition(stmts_to_expr(stmts), whereCond), preCondStmts)
+    } elif (lastName == "to_table") {
+        // to_table materializer — the array arm's skeleton with a table buffer and key inserts.
+        // Map form splits a `(k => v)` projection so each side evaluates once; other projections
+        // bind to a local first. Duplicate keys keep the last occurrence (das insert semantics).
+        var termCall = c.single["term"]
+        // null type, or the selector-based to_table(key, elementSelector) — keep the tier-2 path
+        if (termCall._type == null || length(termCall.arguments) != 1) return null
+        var tabType = strip_const_ref(clone_type(termCall._type))
+        let isSet = tabType.secondType == null || tabType.secondType.baseType == Type.tVoid
+        var stmts : array<Expression?>
+        var pushExpr : Expression?
+        if (isSet) {
+            var keyExpr = projection != null ? projection : qmacro($i(itName))
+            pushExpr = qmacro_expr() {
+                $i(accName) |> insert($e(keyExpr))
+            }
+        } elif (projection == null) {
+            // spell the kv access with the element tuple's real field names — the kv usage-pruner
+            // maps named fields (`.key`/`.value` on an each_kv lane); positional `._0` would not bind
+            var f0 = "_0"
+            var f1 = "_1"
+            let elemTupT = top._type.firstType
+            if (elemTupT != null && elemTupT.argNames |> length == 2) {
+                f0 = string(elemTupT.argNames[0])
+                f1 = string(elemTupT.argNames[1])
+            }
+            pushExpr = qmacro_expr() {
+                $i(accName)[$i(itName).$f(f0)] := $i(itName).$f(f1)
+            }
+        } else {
+            var proj = projection
+            if (proj is ExprRef2Value) {
+                proj = (proj as ExprRef2Value).subexpr
+            }
+            if (proj is ExprMakeTuple) {
+                var mt = proj as ExprMakeTuple
+                if (mt.values |> length != 2) return null
+                var keyExpr = mt.values[0]
+                var valExpr = mt.values[1]
+                pushExpr = qmacro_expr() {
+                    $i(accName)[$e(keyExpr)] := $e(valExpr)
+                }
+            } else {
+                let kvbName = qn("kvb", at)
+                var bindInit = qmacro_expr() {
+                    let $i(kvbName) := $e(projection)
+                }
+                var bindInsert = qmacro_expr() {
+                    $i(accName)[$i(kvbName)._0] := $i(kvbName)._1
+                }
+                var bindStmts : array<Expression?> <- [bindInit, bindInsert]
+                pushExpr = stmts_to_expr(bindStmts)
+            }
+        }
+        stmts |> push(wrap_with_condition(pushExpr, postTakeWhereCond))
+        prepend_binds(stmts, intermediateBinds)
+        wrap_with_ranges(stmts, skipExpr, takeExpr, skipWhileCond, takeWhileCond, names)
+        var perElementPush = stmts_to_expr(stmts)
+        var prologueStmts : array<Expression?>
+        append_ranges_prelude(prologueStmts, skipExpr, takeExpr, skipWhileCond, names)
+        // Reserve only on an unfiltered walk — a where-thinned table over-reserves hash buckets
+        // (worse than an array's slack), so the gate is stricter than the array arm's.
+        var reserveStmts : array<Expression?>
+        if (ctx.src->can_reserve_by_length() && whereCond == null && postTakeWhereCond == null) {
+            let rtop = ctx.src->arrayTop()
+            if (rtop != null && rtop._type != null && type_has_length(rtop._type)) {
+                if (takeExpr != null) {
+                    reserveStmts |> push <| qmacro_expr() {
+                        $i(accName) |> reserve($e(takeExpr) < length($i(srcName)) ? $e(takeExpr) : length($i(srcName)))
+                    }
+                } else {
+                    reserveStmts |> push <| qmacro_expr() {
+                        $i(accName) |> reserve(length($i(srcName)))
+                    }
+                }
+            }
+        }
+        var bufDecl = qmacro_expr() {
+            var $i(accName) : $t(tabType)
+        }
+        var tailStmts : array<Expression?>
+        tailStmts |> push(buffer_return(accName, false))
+        return emit_fold_array_lane(FoldArraySpec(
+            bufDeclStmt = bufDecl,
+            bufElemType = elementType,
+            bufName = accName,
+            prologueStmts <- prologueStmts,
+            reserveStmts <- reserveStmts,
+            preCondStmts <- preCondStmts,
+            whereCond = whereCond,
+            perElementPush = perElementPush,
+            tailStmts <- tailStmts,
+            wrapIter = false
+        ), ctx.src, at)
     } else {
         // Range-prelude in prologueStmts so the lane emits it BEFORE bufDecl — matches the emit_array_lane shape.
         var stmts : array<Expression?>
diff --git a/daslib/linq_fold_decs.das b/daslib/linq_fold_decs.das
index 67076715a..1d6b661e4 100644
--- a/daslib/linq_fold_decs.das
+++ b/daslib/linq_fold_decs.das
@@ -324,6 +324,8 @@ def emit_loop_or_count_lane_decs(var bridge : DecsBridgeShape?; tupName : string
         if (rangeInfo.postTakeWhereCond == null) return null
     }
     // Terminator classification mirrors plan_decs_unroll imperative (linq_fold.das:5781-5791); differs from classify_terminator's 4-lane split because decs has dedicated min_max_by / walk / element_at emit fns with hoisted state.
+    // to_table is not implemented for decs — decline before the implicit-to_array arm mis-emits an array for a table-typed expr.
+    if (termName == "to_table") return null
     let isAccum = (termName == "count" || termName == "long_count" || termName == "sum"
         || termName == "min" || termName == "max" || termName == "average")
     let isEarlyExit = (termName == "first" || termName == "first_or_default"
diff --git a/doc/source/reference/linq_fold_patterns.rst b/doc/source/reference/linq_fold_patterns.rst
index c7ef01e48..fb22a137e 100644
--- a/doc/source/reference/linq_fold_patterns.rst
+++ b/doc/source/reference/linq_fold_patterns.rst
@@ -150,7 +150,7 @@ Source-side entry points
      - Optional source — only when the ``pugixml`` module is linked (``require ?pugixml`` + ``static_if (typeinfo builtin_module_exists(pugixml))``). Emits an inlined DOM child-element walk replacing the generator, and **field-prunes** the per-element materialization (pass 2b): the chain body is scanned for the ``Row`` fields it reads, and only those attributes are read via ``read_xml_field`` into scalar locals — unread fields (notably ``string`` fields, whose ``clone_string`` is the alloc cost) are never touched, so a float-only chain runs alloc-free and JIT beats the equivalent SQLite query. A whole-row escape (``to_array`` / identity ``_select(_)`` / pass-to-fn) routes to the full ``build_xml_row`` instead. The ``XmlAdapter`` **rides every pattern row** (``try_splice_patterns`` runs with no ``onlyRow`` restriction); per-row ``requires`` predicates and the adapter's capability hooks (``can_join`` / ``can_group_by`` / ``defers_materialization`` / the ``non_array_source`` gate) decide what fuses, and a shape it can't fuse cascades to tier-2 — see :ref:`linq_fold_xml_patterns` for the full fuse/defer breakdown. ``unsafe`` is required (the source is ``[unsafe_outside_of_for]``) and the node is passed by value (``var root`` — ``_fold``'s macro-arg inference skips the const&→value copy).
    * - ``unsafe(each_kv(tab))`` / ``keys(tab)`` / ``values(tab)``
      - ``extract_table_source`` (``TableAdapter``, ``daslib/linq_fold_table.das``)
-     - In-tree source — recognized by name **plus** a table-typed argument (``table<K;V>`` / ``table<K>``), so an unrelated user ``keys`` never fires it. The kv lane (``each_kv``) binds ``kv.key`` / ``kv.value`` and **usage-prunes the walk**: a chain touching only ``.value`` walks ``values(tab)`` alone, only ``.key`` (or neither) walks ``keys(tab)`` alone — half the slot-skip work of the zipped two-iterator form, which is emitted only when both sides (or the whole pair) are read. A whole-pair escape binds a named-tuple copy, so the kv lane fuses **copyable value types only** — a non-copyable-valued ``each_kv`` falls through and the surviving instantiation concept-asserts (error 31400; the keys/values lanes still fuse such tables). Bare ``count()`` / ``long_count()`` folds to O(1) ``length(tab)``; a plain ``distinct`` over raw keys/kv elements is **dropped** before matching (keys are unique by construction; only uniqueness-preserving prefix ops allow the drop — a preceding ``select`` keeps the distinct, and the values lane always keeps it). **Point-lookup folds** (``try_table_point_lookup``): a key-equality ``where`` (``kv.key == X``, bare ``k == X`` on the keys lane, either operand order; predicate-form ``any(p)`` / ``count(p)`` too) against a loop-invariant, side-effect-free ``X`` folds the whole walk to an O(1) probe — ``any`` / keys-lane ``contains(X)`` → ``key_exists(tab, X)``, ``count`` → ``key_exists ? 1 : 0``, ``first`` / ``first_or_default`` (± one trailing ``select``) → a ``tab?[X]`` probe with the scan's exact semantics (panic on a missing ``first``, eagerly-bound default value otherwise). Anything else — compound ``&&`` predicates, other comparison operators, an ``X`` that reads the binder or has side effects (the scan evaluates ``X`` per element, the probe once) — keeps the scan. ``order_by`` / ``take`` / ``first`` observe the table's unspecified slot order, exactly like a hand ``for (k, v in keys(t), values(t))`` loop. 
**Joins fuse on either side** (``can_join`` is on; the adapter rides the shared ``emit_array_join`` through its own ``wrap_source_loop``): a table *lead* walks its pruned slot iterator(s) as the probe loop; a table in the *srcB slot* joined on its bare key — ``d.key`` on the kv lane, the bare element on a ``keys(set)`` source — skips the join's internal ``table<KEY; array<TUPB>>`` entirely and probes the user's table per lead row (``join_keyb_is_bare_key`` + ``build_join_probe_pieces``; unique table keys make the probe ≡ hash semantics exactly). The probe is itself usage-pruned: count-no-where and key-only shapes stay on ``key_exists``, value shapes bind the matched value **by reference** from a ``tab?[k]`` pointer (no copy), and only a whole-pair use binds the kv tuple. A non-bare b-key keeps the hashed build over the kv iterator; ``group_join`` (outer — its result consumes the whole bucket) always keeps it. ``can_group_by`` is off and reverse has no backward slot walk — those shapes cascade to tier-2 (see ``benchmarks/sql/LINQ_TO_TABLE.md``). ``unsafe`` is required at an unfused chain head (the sources are ``[unsafe_outside_of_for]``); fused chains rewrite the head before inference.
+     - In-tree source — recognized by name **plus** a table-typed argument (``table<K;V>`` / ``table<K>``), so an unrelated user ``keys`` never fires it. The kv lane (``each_kv``) binds ``kv.key`` / ``kv.value`` and **usage-prunes the walk**: a chain touching only ``.value`` walks ``values(tab)`` alone, only ``.key`` (or neither) walks ``keys(tab)`` alone — half the slot-skip work of the zipped two-iterator form, which is emitted only when both sides (or the whole pair) are read. A whole-pair escape binds a named-tuple copy, so the kv lane fuses **copyable value types only** — a non-copyable-valued ``each_kv`` falls through and the surviving instantiation concept-asserts (error 31400; the keys/values lanes still fuse such tables). Bare ``count()`` / ``long_count()`` folds to O(1) ``length(tab)``; a plain ``distinct`` over raw keys/kv elements is **dropped** before matching (keys are unique by construction; only uniqueness-preserving prefix ops allow the drop — a preceding ``select`` keeps the distinct, and the values lane always keeps it). **Point-lookup folds** (``try_table_point_lookup``): a key-equality ``where`` (``kv.key == X``, bare ``k == X`` on the keys lane, either operand order; predicate-form ``any(p)`` / ``count(p)`` too) against a loop-invariant, side-effect-free ``X`` folds the whole walk to an O(1) probe — ``any`` / keys-lane ``contains(X)`` → ``key_exists(tab, X)``, ``count`` → ``key_exists ? 1 : 0``, ``first`` / ``first_or_default`` (± one trailing ``select``) → a ``tab?[X]`` probe with the scan's exact semantics (panic on a missing ``first``, eagerly-bound default value otherwise). Anything else — compound ``&&`` predicates, other comparison operators, an ``X`` that reads the binder or has side effects (the scan evaluates ``X`` per element, the probe once) — keeps the scan. ``order_by`` / ``take`` / ``first`` observe the table's unspecified slot order, exactly like a hand ``for (k, v in keys(t), values(t))`` loop. 
**Joins fuse on either side** (``can_join`` is on; the adapter rides the shared ``emit_array_join`` through its own ``wrap_source_loop``): a table *lead* walks its pruned slot iterator(s) as the probe loop; a table in the *srcB slot* joined on its bare key — ``d.key`` on the kv lane, the bare element on a ``keys(set)`` source — skips the join's internal ``table<KEY; array<TUPB>>`` entirely and probes the user's table per lead row (``join_keyb_is_bare_key`` + ``build_join_probe_pieces``; unique table keys make the probe ≡ hash semantics exactly). The probe is itself usage-pruned: count-no-where and key-only shapes stay on ``key_exists``, value shapes bind the matched value **by reference** from a ``tab?[k]`` pointer (no copy), and only a whole-pair use binds the kv tuple. A non-bare b-key keeps the hashed build over the kv iterator; ``group_join`` (outer — its result consumes the whole bucket) always keeps it. ``can_group_by`` is off and reverse has no backward slot walk — those shapes cascade to tier-2 (see ``benchmarks/sql/LINQ_TO_TABLE.md``). **``to_table()`` sinks fuse** (table-buffer materializer row above): the chain inserts straight into the result table — a bare ``each_kv(tab).to_table()`` is a reserve-ahead table clone through the fused walk, and a ``keys(tab)`` chain lands in the ``table<K>`` set form. ``unsafe`` is required at an unfused chain head (the sources are ``[unsafe_outside_of_for]``); fused chains rewrite the head before inference.
    * - ``unsafe(from_json(jv, type<Row>))``
      - ``extract_json_source`` (``JsonAdapter``, ``daslib/linq_fold_json.das``)
      - In-tree source — the adapter is compiled in unconditionally (no ``static_if`` gate, unlike XML's pugixml one), but a program only pulls JSON into scope by requiring ``json`` / ``json_boost`` itself. ``extract_json_source`` matches a ``from_json`` whose first argument is a ``json::JsonValue?``, so a JSON-less program returns null and the chain falls to the array tier. The adapter pulls in **no** json dependency — it emits ``from_json`` / ``read_json_field`` by name (resolved at the user's splice site, like ``linq_fold_decs`` emits ``for_each_archetype``; ``from_JV`` is emitted only for a non-struct element type). Emits an inlined ``for (e in jv.value as _array)`` walk replacing the generator, and **field-prunes** the per-element materialization (pass 2b): only the keys the chain reads are pulled via ``read_json_field`` by name — unread keys (notably ``string`` fields whose materialization clones) are never touched, so a scalar-only chain skips ~all of the full per-row build (3.6× over the full materialize — see ``benchmarks/micro/json_source_shapes.das``). A whole-row escape reads **every** top-level field by name (``emit_full_row_by_name``), so a custom whole-row ``from_JV(Row)`` override is **not** honored (Option B — this is a flat query source, not a deserializer; materialize the array with an explicit ``from_JV`` first for that). ``unsafe`` is required (the source is ``[unsafe_outside_of_for]``). Deferred materialization mirrors XML: order/distinct/take buffer a cheap ``(orderKey, JsonValue?)`` surrogate and materialize only the K survivors — by name (``emit_full_row_by_name``), so a struct survivor reads each field by key; only a non-struct ``Row`` falls back to ``outBind <- from_JV(handle, type<Row>)``. 
The ``JsonAdapter`` also fuses ``join`` / ``join |> group_by`` (``emit_join_hook`` + ``JsonJoinAdapter`` off ``build_group_by_adapter``'s upstream-join arm), reusing the array-join machinery (``build_join_standalone_pieces`` / ``build_join_adapter_pieces``): srcB is collected into a ``table<KEY; array<TUPB>>`` and the field-pruned array walk is the probe side, so the join key reads only its own field per element (e.g. ``read_json_field(jcur, "brand", …)``). Standalone ``group_join`` and a trailing ``where`` / ``select`` / ``count`` over group-join rows defer to tier-2, mirroring XML.
@@ -192,6 +192,9 @@ Array-source patterns
    * - ``._where(P).take_while(P2).<...>`` / ``.skip_while(P2).<...>``
      - ``plan_loop_or_count`` (predicate-driven ranges)
      - ``take_while`` exits on first non-match; ``skip_while`` toggles state.
+   * - ``._where(P)._select(K => V).to_table()`` (and bare / set forms)
+     - ``plan_loop_or_count`` (table-buffer materializer)
+     - Insert-loop straight into the result table — no intermediate array. A ``(k => v)`` tuple projection splits so key and value each evaluate once; other tuple projections bind to a local; a scalar chain lands in the ``table<K>`` set form. Reserve from O(1) source length on unfiltered walks. Duplicate keys keep the last occurrence (das ``insert`` semantics, not C#'s throw). The selector-based ``to_table(key, elementSelector)`` and decs sources keep the tier-2 path.
    * - ``._order_by(K).first()`` / ``.first_or_default()``
      - ``plan_order_family`` (streaming-min) → ``emit_streaming_min``
      - Single ``var best`` + ``var seen``, no buffer; one comparison per element.
@@ -676,8 +679,9 @@ Common cases that fall back:
   ``join_impl``.
 - **Aggregations on lazy groupings**: ``_group_by_lazy(K)._select(F)``
   with a non-bucket-reducing ``_select``.
-- **Materialization-only chains** that the standard linq surface
-  already lowers efficiently — e.g. ``to_table()`` on a finite array.
+- **Selector-based ``to_table(key, elementSelector)``** — the 3-arg form (source
+  plus two selectors) keeps its tier-2 generic; only the selector-free ``to_table()``
+  terminator splices (see the table-buffer materializer row above).
 - **Chained ``_select(f) |> _select(g)`` with an impure inner**
   (``_ % N``, ``_ / N``, user-call inner that the typer can't prove
   pure). The ``collapse_chained_selects`` pre-pass is gated on
diff --git a/skills/linq.md b/skills/linq.md
index 8be49b8db..675c06be0 100644
--- a/skills/linq.md
+++ b/skills/linq.md
@@ -98,6 +98,30 @@ There are several tripwires to know about — they're not arbitrary, they fall o
 - **String `join(arr, sep)` lives in `strings` / `strings_boost`** — `linq` itself has a different `join` (SQL-style two-iterator inner join with key + result projection). They coexist; the typer picks the right one by argument types. If you see "module strings_boost is not visible" and "missing argument blk" pointed at your join call, you're missing `require daslib/strings_boost` (or `require strings`).
 - **The `_` placeholder is local to the closest enclosing `_<op>(...)`.** If you nest, give inner closures explicit names (`@@(x) => ...`). Don't try to shadow `_` between outer and inner shorthand calls.
 
+### Table sources and the `to_table` sink
+
+A `table<K;V>` (or `table<K>` set) is a first-class chain source — no key/value arrays needed:
+
+```das
+// each_kv yields (key, value) named tuples; keys/values give one lane.
+// Wrap the head in unsafe(...) — the sources are [unsafe_outside_of_for].
+let pricey = _fold(unsafe(each_kv(cars)) |> _where(_.value.price > 500) |> count())
+let ids <- _fold(unsafe(keys(cars)) |> _where(_ > 100) |> to_array())
+
+// to_table() lands a chain in a table: kv (or any (k => v) tuple) chain → table<K;V>,
+// scalar chain → table<K> set. Duplicate keys keep the last occurrence.
+var byId <- _fold(each(orders) |> _select(_.id => _.total) |> to_table())
+var index <- _fold(unsafe(each_kv(cars)) |> _where(_.value.in_stock) |> to_table())
+```
+
+The fused emitter walks only the iterators the chain touches (a `.value`-only chain never
+touches keys), folds `where(kv.key == X) … first/any/count` to an O(1) probe, joins on a bare
+table key by probing the table instead of hashing, and inserts straight into the `to_table`
+result with no intermediate array. `%linq!` queries dispatch table sources automatically
+(`from kv in tab`). Slot order is unspecified — don't write order-sensitive expectations over
+table chains. The 3-arg `to_table(it, keyBlock, elementSelectorBlock)` ToDictionary form also
+exists (tier-2 only). Full pattern reference: `doc/source/reference/linq_fold_patterns.rst`.
+
 ## Don't mix styles
 
 Pick **one** style per transformation and stay in it:
diff --git a/tests/linq/test_linq_table_source.das b/tests/linq/test_linq_table_source.das
index ed0330600..8900c05f2 100644
--- a/tests/linq/test_linq_table_source.das
+++ b/tests/linq/test_linq_table_source.das
@@ -280,6 +280,10 @@ def test_table_point_lookup(t : T?) {
 
 typedef IKV = tuple<key : int; value : int>
 
+def flip_kv(kv : IKV) : tuple<int; int> {
+    return (kv.value => kv.key)
+}
+
 // Joins: a table in the srcB slot joined on its bare key (`d.key` / bare set element) probes the user's
 // table instead of building the join's internal hash; a table lead rides the same emitter through the
 // pruned slot walk. Either way must agree with the hash/hand-loop semantics: inner joins drop misses,
@@ -494,3 +498,124 @@ def test_each_kv_tier2(t : T?) {
         t |> equal(n, 3)
     }
 }
+
+[test]
+def test_to_table_sink(t : T?) {
+    t |> run("kv pass-through + where agrees with a hand loop") @(t : T?) {
+        var src <- make_int_table(6)
+        var expected : table<int; int>
+        for (k, v in keys(src), values(src)) {
+            if (k > 1) {
+                expected[k] = v
+            }
+        }
+        var got <- _fold(each_kv(src) |> _where(_.key > 1) |> to_table())
+        t |> equal(length(got), length(expected))
+        for (k, v in keys(expected), values(expected)) {
+            t |> equal(got?[k] ?? -1, v)
+        }
+        delete got
+        delete expected
+        delete src
+    }
+    t |> run("projection remaps keys and values") @(t : T?) {
+        var src <- make_int_table(5)
+        var got <- _fold(each_kv(src) |> _select((_.key * 2) => _.value + 1) |> to_table())
+        t |> equal(length(got), 5)
+        t |> equal(got?[8] ?? -1, 41)
+        delete got
+        delete src
+    }
+    t |> run("bare to_table clones the table through the fused walk") @(t : T?) {
+        var src <- make_int_table(5)
+        var got <- _fold(each_kv(src) |> to_table())
+        t |> equal(length(got), 5)
+        t |> equal(got?[0] ?? -1, 0)
+        t |> equal(got?[4] ?? -1, 40)
+        delete got
+        delete src
+    }
+    t |> run("keys chain lands in the set form") @(t : T?) {
+        var src <- make_int_table(5)
+        var got <- _fold(keys(src) |> _where(_ != 3) |> to_table())
+        t |> equal(length(got), 4)
+        t |> success(key_exists(got, 0) && !key_exists(got, 3))
+        delete got
+        delete src
+    }
+    t |> run("array source with a tuple projection") @(t : T?) {
+        var arr <- [for (i in range(4)); i + 1]
+        var got <- _fold(arr |> _select(_ => _ * _) |> to_table())
+        t |> equal(length(got), 4)
+        t |> equal(got?[3] ?? -1, 9)
+        delete got
+        delete arr
+    }
+    t |> run("duplicate keys keep the last occurrence") @(t : T?) {
+        var arr <- [for (i in range(6)); i]
+        var got <- _fold(arr |> _select((_ % 2) => _) |> to_table())
+        t |> equal(length(got), 2)
+        t |> equal(got?[0] ?? -1, 4)
+        t |> equal(got?[1] ?? -1, 5)
+        delete got
+        delete arr
+    }
+    t |> run("take bounds the walk and the reserve") @(t : T?) {
+        var src <- make_int_table(5)
+        var got <- _fold(keys(src) |> take(2) |> to_table())
+        t |> equal(length(got), 2)
+        delete got
+        delete src
+    }
+    t |> run("non-tuple-literal projection rides the bind arm") @(t : T?) {
+        var src <- make_int_table(5)
+        var got <- _fold(each_kv(src) |> _select(flip_kv(_)) |> to_table())
+        t |> equal(length(got), 5)
+        t |> equal(got?[40] ?? -1, 4)
+        delete got
+        delete src
+    }
+    t |> run("string keys through the fused map arm") @(t : T?) {
+        var src <- make_int_table(4)
+        var got <- _fold(each_kv(src) |> _select(("k{_.key}" => _.value)) |> to_table())
+        t |> equal(length(got), 4)
+        t |> equal(got?["k2"] ?? -1, 20)
+        delete got
+        delete src
+    }
+    t |> run("tier-2 iterator to_table agrees with the fused emit") @(t : T?) {
+        var src <- make_int_table(6)
+        var fused <- _fold(each_kv(src) |> _where(_.key % 2 == 0) |> to_table())
+        var tier2 <- to_table(unsafe(each_kv(src)) |> _where(_.key % 2 == 0))
+        t |> equal(length(fused), length(tier2))
+        for (k, v in keys(tier2), values(tier2)) {
+            t |> equal(fused?[k] ?? -1, v)
+        }
+        delete fused
+        delete tier2
+        delete src
+    }
+    t |> run("array-input to_table forms borrow the source") @(t : T?) {
+        var pairs <- [for (i in range(3)); (i => i * i)]
+        var m <- to_table(pairs)
+        t |> equal(length(pairs), 3)   // source intact
+        t |> equal(length(m), 3)
+        t |> equal(m?[2] ?? -1, 4)
+        var bare <- [for (i in range(4)); i * 100]
+        var s <- to_table(bare)
+        t |> equal(length(bare), 4)
+        t |> success(key_exists(s, 300))
+        delete s
+        delete bare
+        delete m
+        delete pairs
+    }
+    t |> run("selector-based to_table keeps its tier-2 path") @(t : T?) {
+        var arr <- [for (i in range(4)); i]
+        var got <- to_table(to_sequence(arr), $(x : int) => x, $(x : int) => x * 10)
+        t |> equal(length(got), 4)
+        t |> equal(got?[3] ?? -1, 30)
+        delete got
+        delete arr
+    }
+}

From 9331bbc2ce26016528b061008a8199b3cfdcd6a3 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Thu, 11 Jun 2026 09:14:37 -0700
Subject: [PATCH 10/11] bench: fill the m7 column + light up to_table across
 all in-memory sources
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The m7 column had 26 empty cells; only 7 were principled (zip_* x4 / cross_join —
lockstep pairing over an unordered slot walk is meaningless; select_many — flat
fixture; decs_count_bare_pred — decs-only). The rest were scoping debt:

- join_select / where_join_count — fuse today via the stage-5 join work; lanes
  simply hadn't been written. where_join_count lands at 46.8 ns/elem INTERP
  (lead-where pruned join); join_select 222.9 (iterator-typed join bail, tier-2).
- 12 groupby_* + join_groupby_count/to_array + order_reverse_normalized /
  reverse_take_select / reverse_distinct_by — instantiated as tier-2-cascade
  cells (table group_by fusion and a backward slot walk are named deferred
  edges); the cells now show the cost a future fix would improve.

to_table / to_table_staged gain m3f/m4/m5f/m6f lanes (only SQL stays absent —
_sql has no table sink): array fuses at 18.7 vs 54.8 staged (~3x), XML 118.2 vs
144.8, JSON 144.3 vs 166.8; decs declines by design and its 144.0 vs 56.8
staged gap is the motivating number for a future decs sink hook.

results.md re-swept (all 82 families, m7 dashes 26 -> 7); missing-lanes prose
rewritten to match.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 benchmarks/sql/array.das  |  27 ++++
 benchmarks/sql/decs.das   |  27 ++++
 benchmarks/sql/json.das   |  27 ++++
 benchmarks/sql/results.md | 294 +++++++++++++++++++-------------------
 benchmarks/sql/table.das  | 284 ++++++++++++++++++++++++++++++++++++
 benchmarks/sql/xml.das    |  27 ++++
 6 files changed, 539 insertions(+), 147 deletions(-)

diff --git a/benchmarks/sql/array.das b/benchmarks/sql/array.das
index 882eabcef..5259714e0 100644
--- a/benchmarks/sql/array.das
+++ b/benchmarks/sql/array.das
@@ -942,6 +942,33 @@ def to_array_filter_m3f(b : B?) {
     }
 }
 
+[benchmark]
+def to_table_m3f(b : B?) {
+    // fused insert-loop sink: the chain lands straight in the result table
+    b |> run("to_table", N) {
+        var tab <- _fold(each(g_arr) |> _select((_.id => _.price)) |> to_table())
+        b |> accept(length(tab))
+        if (empty(tab)) {
+            b->failNow()
+        }
+        delete tab
+    }
+}
+
+[benchmark]
+def to_table_staged_m3f(b : B?) {
+    // staged baseline: materialize the kv tuples to an array, then convert
+    b |> run("to_table_staged", N) {
+        var rows <- _fold(each(g_arr) |> _select((_.id => _.price)) |> to_array())
+        var tab <- to_table_move(rows)
+        b |> accept(length(tab))
+        if (empty(tab)) {
+            b->failNow()
+        }
+        delete tab
+    }
+}
+
 [benchmark]
 def where_join_count_m3f(b : B?) {
     b |> run("where_join_count", N) {
diff --git a/benchmarks/sql/decs.das b/benchmarks/sql/decs.das
index 8c44b5fd8..8ad4ff833 100644
--- a/benchmarks/sql/decs.das
+++ b/benchmarks/sql/decs.das
@@ -914,6 +914,33 @@ def to_array_filter_m4(b : B?) {
     }
 }
 
+[benchmark]
+def to_table_m4(b : B?) {
+    // fused insert-loop sink: the chain lands straight in the result table
+    b |> run("to_table", N) {
+        var tab <- _fold(from_decs_template(type<DecsCar>) |> _select((_.id => _.price)) |> to_table())
+        b |> accept(length(tab))
+        if (empty(tab)) {
+            b->failNow()
+        }
+        delete tab
+    }
+}
+
+[benchmark]
+def to_table_staged_m4(b : B?) {
+    // staged baseline: materialize the kv tuples to an array, then convert
+    b |> run("to_table_staged", N) {
+        var rows <- _fold(from_decs_template(type<DecsCar>) |> _select((_.id => _.price)) |> to_array())
+        var tab <- to_table_move(rows)
+        b |> accept(length(tab))
+        if (empty(tab)) {
+            b->failNow()
+        }
+        delete tab
+    }
+}
+
 [benchmark]
 def where_join_count_m4(b : B?) {
     b |> run("where_join_count", N) {
diff --git a/benchmarks/sql/json.das b/benchmarks/sql/json.das
index 6533ae6d8..b5e5acc45 100644
--- a/benchmarks/sql/json.das
+++ b/benchmarks/sql/json.das
@@ -894,6 +894,33 @@ def to_array_filter_m6f(b : B?) {
     }
 }
 
+[benchmark]
+def to_table_m6f(b : B?) {
+    // fused insert-loop sink: the chain lands straight in the result table
+    b |> run("to_table", N) {
+        var tab <- _fold(unsafe(from_json(g_jv, type<Car>)) |> _select((_.id => _.price)) |> to_table())
+        b |> accept(length(tab))
+        if (empty(tab)) {
+            b->failNow()
+        }
+        delete tab
+    }
+}
+
+[benchmark]
+def to_table_staged_m6f(b : B?) {
+    // staged baseline: materialize the kv tuples to an array, then convert
+    b |> run("to_table_staged", N) {
+        var rows <- _fold(unsafe(from_json(g_jv, type<Car>)) |> _select((_.id => _.price)) |> to_array())
+        var tab <- to_table_move(rows)
+        b |> accept(length(tab))
+        if (empty(tab)) {
+            b->failNow()
+        }
+        delete tab
+    }
+}
+
 [benchmark]
 def where_join_count_m6f(b : B?) {
     b |> run("where_join_count", N) {
diff --git a/benchmarks/sql/results.md b/benchmarks/sql/results.md
index 30402ff49..bb13ccbc7 100644
--- a/benchmarks/sql/results.md
+++ b/benchmarks/sql/results.md
@@ -36,175 +36,175 @@ signal, JIT deltas as indicative.**
 
 | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) |
 |---|---:|---:|---:|---:|---:|---:|
-| `aggregate_match` | 34.9 | 5.9 | 5.8 | 60.8 | 158.9 | 19.5 |
-| `all_match` | 27.7 | 3.5 | 3.4 | 56.0 | 153.2 | 15.9 |
+| `aggregate_match` | 35.0 | 5.9 | 5.9 | 60.5 | 159.7 | 19.0 |
+| `all_match` | 27.7 | 3.5 | 3.4 | 56.1 | 153.8 | 15.8 |
 | `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `average_aggregate` | 30.3 | 5.9 | 8.7 | 58.5 | 163.4 | 17.3 |
+| `average_aggregate` | 30.1 | 6.0 | 8.8 | 60.1 | 163.7 | 17.2 |
 | `bare_last` | — | 4.2 | 0.0 | 0.0 | 4.2 | 30.1 |
-| `bare_order_where` | 282.9 | 118.2 | 125.0 | 300.5 | 290.8 | 163.1 |
-| `chained_select_collapse` | — | 17.8 | 17.5 | 70.4 | 161.7 | 27.7 |
-| `chained_where` | 41.5 | 6.6 | 7.1 | 104.8 | 182.1 | 24.0 |
-| `contains_match` | 0.0 | 2.2 | 1.4 | 28.9 | 71.5 | 6.6 |
-| `count_aggregate` | 29.6 | 4.3 | 4.1 | 63.5 | 154.0 | 20.2 |
-| `cross_join` | 12896.3 | 3681.4 | — | 4018.5 | 4096.4 | — |
-| `decs_count_bare_pred` | — | — | 4.1 | — | — | — |
-| `distinct_by_count` | 41.4 | 15.7 | 15.7 | 70.4 | 161.3 | 26.8 |
-| `distinct_by_order_take` | 239.9 | 22.3 | 23.3 | 123.9 | 162.0 | 48.8 |
-| `distinct_by_order_to_array` | 237.8 | 22.3 | 23.3 | 124.3 | 162.5 | 48.8 |
-| `distinct_count` | 41.8 | 15.9 | 15.7 | 70.7 | 161.8 | 27.0 |
-| `distinct_count_pred` | 252.1 | 15.8 | 15.6 | 111.9 | 176.7 | 26.6 |
+| `bare_order_where` | 278.1 | 117.1 | 126.5 | 302.8 | 288.8 | 163.0 |
+| `chained_select_collapse` | — | 17.9 | 17.6 | 70.7 | 172.6 | 27.9 |
+| `chained_where` | 36.9 | 6.6 | 7.1 | 105.4 | 183.4 | 23.8 |
+| `contains_match` | 0.0 | 2.2 | 1.4 | 29.0 | 72.4 | 6.5 |
+| `count_aggregate` | 29.7 | 4.2 | 4.1 | 63.6 | 154.3 | 20.1 |
+| `cross_join` | 12597.0 | 3721.0 | — | 4040.3 | 4113.3 | — |
+| `decs_count_bare_pred` | — | — | 4.2 | — | — | — |
+| `distinct_by_count` | 41.6 | 16.4 | 15.8 | 70.8 | 162.9 | 26.9 |
+| `distinct_by_order_take` | 241.1 | 22.1 | 23.7 | 124.2 | 162.4 | 49.2 |
+| `distinct_by_order_to_array` | 241.0 | 22.2 | 23.8 | 125.0 | 163.2 | 48.9 |
+| `distinct_count` | 41.8 | 15.7 | 15.9 | 70.7 | 162.9 | 27.1 |
+| `distinct_count_pred` | 253.4 | 15.9 | 15.9 | 112.7 | 179.4 | 26.7 |
 | `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `element_at_match` | 0.0 | 0.0 | 0.0 | 0.4 | 0.3 | 0.0 |
 | `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `groupby_average` | 171.0 | 29.4 | 29.0 | 123.0 | 196.4 | — |
-| `groupby_count` | 142.4 | 19.2 | 19.1 | 74.8 | 167.1 | 164.5 |
-| `groupby_first` | 251.1 | 19.2 | 19.7 | 72.1 | 162.2 | — |
-| `groupby_having_count` | 142.0 | 19.1 | 19.1 | 74.7 | 166.3 | — |
-| `groupby_having_hidden_sum` | 176.6 | 22.3 | 22.3 | 118.0 | 187.9 | — |
-| `groupby_having_post_where` | 173.2 | 20.5 | 20.4 | 114.4 | 187.4 | — |
-| `groupby_max` | 173.5 | 24.9 | 24.8 | 119.6 | 191.4 | — |
-| `groupby_min` | 173.8 | 25.3 | 24.8 | 119.6 | 192.5 | — |
-| `groupby_multi_reducer` | 190.5 | 30.4 | 30.0 | 124.7 | 196.1 | — |
-| `groupby_select_order` | 172.1 | 20.5 | 20.4 | 114.3 | 188.6 | — |
-| `groupby_select_sum` | 199.6 | 38.5 | 38.0 | 101.5 | 194.4 | — |
-| `groupby_sum` | 172.1 | 20.5 | 20.4 | 114.6 | 187.6 | 194.6 |
-| `groupby_where_count` | 76.4 | 14.1 | 14.2 | 115.1 | 185.8 | — |
-| `groupby_where_sum` | 87.5 | 14.2 | 14.5 | 116.0 | 186.7 | — |
-| `join_count` | 38.4 | 51.4 | 63.6 | 112.9 | 183.8 | 65.4 |
-| `join_groupby_count` | 158.4 | 77.8 | 87.8 | 177.4 | 233.1 | — |
-| `join_groupby_to_array` | 189.8 | 78.7 | 89.6 | 214.7 | 214.1 | — |
-| `join_probe` | — | — | — | — | — | 46.9 |
-| `join_probe_build` | — | — | — | — | — | 79.5 |
-| `join_select` | 151.8 | 72.8 | 84.9 | 189.5 | 217.4 | — |
-| `join_where_count` | 39.7 | 61.7 | 78.7 | 160.5 | 199.8 | 81.6 |
-| `last_match` | 0.0 | 5.9 | 14.0 | 65.0 | 159.2 | 31.0 |
-| `long_count_aggregate` | 29.9 | 4.1 | 4.1 | 63.4 | 154.0 | 20.1 |
-| `max_aggregate` | 31.1 | 6.0 | 6.8 | 58.7 | 162.1 | 16.9 |
-| `min_aggregate` | 31.0 | 6.0 | 6.9 | 58.7 | 162.9 | 17.0 |
-| `order_by_multi_key` | 340.9 | 270.9 | 279.5 | 459.2 | 446.7 | 336.4 |
-| `order_distinct_take` | 138.7 | 15.9 | 98.6 | 72.6 | 162.8 | 31.6 |
-| `order_reverse_normalized` | 38.8 | 16.3 | 19.8 | 70.9 | 169.9 | — |
-| `order_take_desc` | 38.5 | 16.3 | 19.9 | 70.1 | 170.8 | 33.3 |
+| `groupby_average` | 173.6 | 29.2 | 29.3 | 123.6 | 195.4 | 198.4 |
+| `groupby_count` | 144.5 | 19.2 | 19.2 | 75.0 | 168.4 | 164.3 |
+| `groupby_first` | 253.9 | 19.1 | 19.8 | 72.7 | 163.4 | 164.1 |
+| `groupby_having_count` | 142.6 | 19.2 | 19.2 | 75.4 | 168.9 | 186.7 |
+| `groupby_having_hidden_sum` | 176.8 | 22.2 | 22.9 | 118.6 | 192.0 | 216.5 |
+| `groupby_having_post_where` | 172.4 | 20.5 | 20.5 | 114.6 | 188.7 | 194.8 |
+| `groupby_max` | 175.4 | 24.9 | 25.2 | 120.0 | 192.4 | 202.4 |
+| `groupby_min` | 175.2 | 24.9 | 25.3 | 120.7 | 193.2 | 204.5 |
+| `groupby_multi_reducer` | 192.0 | 30.8 | 30.2 | 125.6 | 196.8 | 232.3 |
+| `groupby_select_order` | 172.8 | 20.5 | 20.5 | 115.2 | 188.1 | 195.3 |
+| `groupby_select_sum` | 199.8 | 38.7 | 38.7 | 102.3 | 193.5 | 191.2 |
+| `groupby_sum` | 176.8 | 20.5 | 20.5 | 114.9 | 188.1 | 194.5 |
+| `groupby_where_count` | 76.2 | 13.8 | 14.5 | 116.0 | 185.7 | 165.2 |
+| `groupby_where_sum` | 87.7 | 14.1 | 14.9 | 116.5 | 187.3 | 180.6 |
+| `join_count` | 38.0 | 51.3 | 64.7 | 113.3 | 183.5 | 66.0 |
+| `join_groupby_count` | 160.1 | 76.7 | 89.9 | 178.6 | 230.9 | 259.6 |
+| `join_groupby_to_array` | 194.1 | 78.4 | 91.4 | 216.3 | 212.7 | 290.0 |
+| `join_probe` | — | — | — | — | — | 46.6 |
+| `join_probe_build` | — | — | — | — | — | 79.9 |
+| `join_select` | 150.3 | 72.7 | 86.0 | 190.7 | 215.4 | 222.9 |
+| `join_where_count` | 39.1 | 61.6 | 79.4 | 161.2 | 198.1 | 80.1 |
+| `last_match` | 0.0 | 5.9 | 13.9 | 64.8 | 159.6 | 31.0 |
+| `long_count_aggregate` | 29.5 | 4.1 | 4.1 | 63.2 | 155.1 | 20.2 |
+| `max_aggregate` | 30.8 | 6.2 | 7.0 | 58.6 | 163.2 | 17.4 |
+| `min_aggregate` | 31.1 | 6.2 | 6.8 | 58.6 | 163.5 | 17.3 |
+| `order_by_multi_key` | 336.9 | 274.1 | 281.9 | 458.6 | 445.5 | 335.6 |
+| `order_distinct_take` | 140.6 | 15.9 | 99.4 | 72.3 | 163.8 | 31.6 |
+| `order_reverse_normalized` | 38.6 | 16.3 | 20.0 | 70.1 | 170.7 | 33.1 |
+| `order_take_desc` | 38.3 | 16.5 | 20.6 | 70.1 | 170.9 | 33.1 |
 | `point_lookup` | — | — | — | — | — | 0.0 |
-| `point_lookup_scan` | — | — | — | — | — | 8.3 |
-| `reverse_distinct_by` | 295.3 | 21.3 | 28.2 | 71.1 | 161.9 | — |
-| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.1 | 58.5 |
-| `reverse_take_select` | 0.0 | 0.0 | 0.2 | 0.0 | 26.1 | — |
-| `select_count` | 0.1 | 0.0 | 2.2 | 68.3 | 2.2 | 0.0 |
-| `select_many` | — | 191.7 | — | — | — | — |
-| `select_where` | 204.1 | 11.2 | 19.3 | 197.1 | 183.4 | 37.7 |
-| `select_where_count` | 32.5 | 5.1 | 7.4 | 64.9 | 156.9 | 22.7 |
-| `select_where_order_take` | 37.1 | 12.3 | 14.8 | 72.8 | 165.4 | 35.3 |
-| `select_where_sum` | 37.1 | 7.5 | 7.5 | 66.5 | 161.9 | 25.0 |
-| `single_match` | 0.0 | 2.9 | 5.5 | 58.2 | 151.2 | 22.6 |
-| `skip_take` | 0.5 | 0.1 | 0.2 | 3.1 | 2.8 | 0.3 |
-| `skip_while_match` | 3.5 | 5.3 | 5.3 | 60.0 | 153.2 | 18.2 |
-| `sort_first` | 38.4 | 11.1 | 13.3 | 65.1 | 166.7 | 32.2 |
-| `sort_take` | 38.7 | 16.3 | 20.0 | 70.8 | 170.4 | 33.1 |
-| `sort_take_select` | 38.7 | 16.3 | 20.1 | 71.3 | 170.6 | 33.3 |
-| `sum_aggregate` | 30.5 | 2.1 | 2.1 | 54.6 | 153.2 | 13.4 |
-| `sum_where` | 33.2 | 4.3 | 4.2 | 63.4 | 154.6 | 20.4 |
-| `take_count` | 3.8 | 0.2 | 0.4 | 2.9 | 2.7 | 0.5 |
+| `point_lookup_scan` | — | — | — | — | — | 8.4 |
+| `reverse_distinct_by` | 308.2 | 21.2 | 27.9 | 70.8 | 163.1 | 44.6 |
+| `reverse_take` | 0.1 | 0.0 | 0.2 | 0.0 | 26.4 | 58.9 |
+| `reverse_take_select` | 0.0 | 0.0 | 0.2 | 0.0 | 26.3 | 58.6 |
+| `select_count` | 0.1 | 0.0 | 2.2 | 68.5 | 2.2 | 0.0 |
+| `select_many` | — | 192.1 | — | — | — | — |
+| `select_where` | 197.4 | 11.2 | 19.4 | 196.4 | 183.1 | 37.8 |
+| `select_where_count` | 32.6 | 5.1 | 7.4 | 64.4 | 157.5 | 22.8 |
+| `select_where_order_take` | 36.6 | 12.5 | 15.1 | 72.3 | 164.9 | 35.1 |
+| `select_where_sum` | 37.1 | 7.4 | 7.5 | 66.3 | 162.5 | 23.6 |
+| `single_match` | 0.0 | 2.8 | 5.4 | 58.0 | 151.0 | 22.8 |
+| `skip_take` | 0.5 | 0.1 | 0.2 | 3.0 | 2.8 | 0.3 |
+| `skip_while_match` | 3.4 | 5.3 | 5.3 | 59.9 | 153.2 | 18.3 |
+| `sort_first` | 37.9 | 11.1 | 13.4 | 65.2 | 166.1 | 32.2 |
+| `sort_take` | 38.3 | 16.3 | 20.4 | 70.3 | 171.0 | 33.1 |
+| `sort_take_select` | 38.3 | 16.3 | 20.2 | 70.7 | 170.5 | 33.2 |
+| `sum_aggregate` | 30.2 | 2.1 | 2.1 | 53.9 | 153.3 | 13.5 |
+| `sum_where` | 32.8 | 4.2 | 4.3 | 63.4 | 154.2 | 20.5 |
+| `take_count` | 3.6 | 0.2 | 0.4 | 2.9 | 2.7 | 0.5 |
 | `take_count_filtered` | 1.1 | 0.2 | 0.2 | 1.3 | 1.1 | 0.3 |
 | `take_sum_aggregate` | 0.8 | 0.1 | 0.1 | 0.6 | 0.5 | 0.1 |
 | `take_where_count` | 0.9 | 0.1 | 0.1 | 0.7 | 0.6 | 0.2 |
-| `take_while_match` | 7.8 | 2.4 | 2.4 | 30.1 | 76.2 | 16.4 |
-| `to_array_filter` | 71.1 | 11.8 | 11.7 | 71.3 | 164.3 | 28.9 |
-| `to_table` | — | — | — | — | — | 32.5 |
-| `to_table_staged` | — | — | — | — | — | 68.3 |
-| `where_join_count` | 41.5 | 28.8 | 40.9 | 132.1 | 167.0 | — |
-| `zip_count_pred` | 39.5 | 15.8 | — | 318.5 | 320.2 | — |
-| `zip_dot_product` | 47.2 | 12.6 | 10.8 | 312.7 | 318.6 | — |
-| `zip_dot_product_3arg` | 47.1 | 12.7 | — | 312.8 | 317.5 | — |
-| `zip_reverse_to_array` | — | 31.4 | — | 348.8 | 352.2 | — |
+| `take_while_match` | 7.8 | 2.4 | 2.4 | 30.1 | 75.7 | 16.4 |
+| `to_array_filter` | 70.3 | 11.8 | 11.8 | 70.9 | 163.7 | 29.0 |
+| `to_table` | — | 18.7 | 144.0 | 118.2 | 144.3 | 32.2 |
+| `to_table_staged` | — | 54.8 | 56.8 | 144.8 | 166.8 | 69.0 |
+| `where_join_count` | 41.2 | 29.1 | 41.8 | 131.7 | 167.5 | 46.8 |
+| `zip_count_pred` | 39.4 | 15.9 | — | 317.3 | 319.1 | — |
+| `zip_dot_product` | 46.6 | 12.7 | 10.6 | 314.0 | 316.5 | — |
+| `zip_dot_product_3arg` | 46.8 | 12.8 | — | 313.0 | 316.7 | — |
+| `zip_reverse_to_array` | — | 31.7 | — | 349.3 | 351.4 | — |
 
 ## JIT
 
 | Benchmark | SQL (m1) | Array (m3f) | Decs (m4) | XML fold (m5f) | JSON fold (m6f) | Table fold (m7) |
 |---|---:|---:|---:|---:|---:|---:|
-| `aggregate_match` | 35.1 | 0.3 | 0.6 | 22.8 | 26.2 | 13.5 |
-| `all_match` | 27.9 | 0.3 | 0.2 | 17.5 | 25.3 | 13.6 |
+| `aggregate_match` | 35.0 | 0.3 | 0.7 | 29.8 | 27.2 | 13.5 |
+| `all_match` | 27.9 | 0.3 | 0.2 | 18.8 | 26.2 | 13.5 |
 | `any_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `average_aggregate` | 30.5 | 1.0 | 3.5 | 17.4 | 24.7 | 13.5 |
-| `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 | 17.1 |
-| `bare_order_where` | 186.2 | 34.1 | 35.0 | 104.9 | 52.8 | 78.8 |
-| `chained_select_collapse` | — | 1.1 | 1.1 | 20.6 | 33.5 | 14.0 |
-| `chained_where` | 36.9 | 0.6 | 0.8 | 34.7 | 31.3 | 17.8 |
-| `contains_match` | 0.0 | 0.2 | 0.1 | 17.5 | 8.9 | 4.7 |
-| `count_aggregate` | 29.7 | 0.3 | 0.6 | 17.5 | 25.5 | 13.4 |
-| `cross_join` | 5965.9 | 731.0 | — | 833.2 | 770.0 | — |
+| `average_aggregate` | 30.2 | 1.0 | 3.5 | 18.8 | 25.7 | 13.5 |
+| `bare_last` | — | 0.4 | 0.0 | 0.0 | 0.0 | 17.2 |
+| `bare_order_where` | 185.1 | 34.2 | 35.0 | 105.5 | 53.0 | 78.8 |
+| `chained_select_collapse` | — | 1.1 | 1.1 | 20.6 | 33.9 | 14.0 |
+| `chained_where` | 36.9 | 0.6 | 0.8 | 36.6 | 32.1 | 17.8 |
+| `contains_match` | 0.0 | 0.2 | 0.1 | 17.5 | 9.4 | 4.7 |
+| `count_aggregate` | 29.5 | 0.3 | 0.6 | 29.5 | 26.4 | 13.5 |
+| `cross_join` | 5991.6 | 734.4 | — | 834.6 | 771.2 | — |
 | `decs_count_bare_pred` | — | — | 0.6 | — | — | — |
-| `distinct_by_count` | 41.7 | 1.1 | 1.1 | 20.6 | 33.5 | 14.0 |
-| `distinct_by_order_take` | 239.3 | 1.7 | 2.6 | 46.3 | 38.8 | 30.1 |
-| `distinct_by_order_to_array` | 240.2 | 1.7 | 2.7 | 46.4 | 38.7 | 30.3 |
-| `distinct_count` | 41.6 | 1.1 | 1.1 | 20.6 | 33.5 | 14.0 |
-| `distinct_count_pred` | 251.7 | 1.1 | 1.3 | 37.7 | 43.8 | 14.0 |
+| `distinct_by_count` | 42.1 | 1.1 | 1.1 | 20.6 | 33.9 | 14.1 |
+| `distinct_by_order_take` | 249.6 | 1.7 | 2.6 | 45.2 | 39.0 | 30.3 |
+| `distinct_by_order_to_array` | 252.5 | 1.7 | 2.7 | 45.5 | 38.9 | 30.2 |
+| `distinct_count` | 41.7 | 1.1 | 1.1 | 20.6 | 33.7 | 14.1 |
+| `distinct_count_pred` | 265.8 | 1.1 | 1.3 | 37.8 | 43.6 | 14.0 |
 | `distinct_take` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `element_at_match` | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 |
 | `first_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
 | `first_or_default_match` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
-| `groupby_average` | 171.5 | 1.5 | 1.9 | 35.5 | 45.5 | — |
-| `groupby_count` | 142.1 | 1.3 | 1.5 | 20.6 | 33.8 | 42.7 |
-| `groupby_first` | 251.9 | 1.3 | 2.3 | 20.6 | 34.4 | — |
-| `groupby_having_count` | 141.3 | 1.3 | 1.5 | 20.6 | 33.9 | — |
-| `groupby_having_hidden_sum` | 176.9 | 1.5 | 1.7 | 35.5 | 45.2 | — |
-| `groupby_having_post_where` | 171.1 | 1.4 | 1.9 | 35.5 | 44.1 | — |
-| `groupby_max` | 173.4 | 1.5 | 1.9 | 35.5 | 45.8 | — |
-| `groupby_min` | 172.8 | 1.5 | 1.8 | 35.6 | 45.8 | — |
-| `groupby_multi_reducer` | 190.2 | 1.6 | 1.9 | 35.8 | 46.1 | — |
-| `groupby_select_order` | 170.9 | 1.4 | 1.9 | 35.4 | 44.3 | — |
-| `groupby_select_sum` | 200.0 | 2.8 | 3.2 | 31.8 | 39.9 | — |
-| `groupby_sum` | 170.9 | 1.4 | 1.6 | 35.5 | 44.3 | 51.2 |
-| `groupby_where_count` | 76.3 | 0.9 | 1.3 | 35.6 | 41.8 | — |
-| `groupby_where_sum` | 87.6 | 0.9 | 1.3 | 35.6 | 41.9 | — |
-| `join_count` | 38.2 | 10.9 | 11.8 | 42.6 | 71.5 | 32.2 |
-| `join_groupby_count` | 156.9 | 17.6 | 19.5 | 68.3 | 89.8 | — |
-| `join_groupby_to_array` | 189.8 | 17.5 | 19.4 | 79.3 | 36.1 | — |
-| `join_probe` | — | — | — | — | — | 24.0 |
-| `join_probe_build` | — | — | — | — | — | 38.1 |
-| `join_select` | 94.0 | 19.6 | 21.7 | 73.8 | 95.2 | — |
-| `join_where_count` | 39.8 | 18.9 | 20.8 | 63.5 | 78.3 | 37.8 |
-| `last_match` | 0.0 | 0.5 | 1.4 | 18.2 | 25.9 | 22.9 |
-| `long_count_aggregate` | 29.2 | 0.3 | 0.6 | 17.5 | 25.5 | 13.4 |
-| `max_aggregate` | 31.0 | 0.3 | 0.5 | 17.4 | 27.1 | 13.4 |
-| `min_aggregate` | 31.1 | 0.3 | 0.5 | 17.4 | 27.0 | 13.5 |
-| `order_by_multi_key` | 250.0 | 53.1 | 54.7 | 123.6 | 71.3 | 129.4 |
-| `order_distinct_take` | 138.1 | 1.1 | 75.3 | 20.9 | 35.7 | 14.0 |
-| `order_reverse_normalized` | 38.5 | 0.7 | 1.3 | 22.0 | 27.7 | — |
-| `order_take_desc` | 38.2 | 0.7 | 1.3 | 22.0 | 27.5 | 17.8 |
+| `groupby_average` | 177.2 | 1.6 | 1.9 | 37.2 | 45.6 | 51.9 |
+| `groupby_count` | 145.8 | 1.3 | 1.5 | 20.6 | 34.1 | 43.9 |
+| `groupby_first` | 265.0 | 1.3 | 2.3 | 20.7 | 34.6 | 43.7 |
+| `groupby_having_count` | 144.6 | 1.3 | 1.5 | 20.7 | 34.1 | 46.7 |
+| `groupby_having_hidden_sum` | 180.4 | 1.5 | 1.7 | 37.0 | 45.4 | 55.0 |
+| `groupby_having_post_where` | 177.4 | 1.4 | 2.0 | 37.0 | 44.2 | 51.4 |
+| `groupby_max` | 179.1 | 1.5 | 1.9 | 37.1 | 46.0 | 52.0 |
+| `groupby_min` | 179.2 | 1.5 | 1.8 | 37.0 | 46.1 | 52.4 |
+| `groupby_multi_reducer` | 195.2 | 1.6 | 2.0 | 37.1 | 45.9 | 61.3 |
+| `groupby_select_order` | 176.3 | 1.4 | 1.9 | 37.0 | 44.4 | 51.4 |
+| `groupby_select_sum` | 205.9 | 2.8 | 3.2 | 33.2 | 39.7 | 73.0 |
+| `groupby_sum` | 175.9 | 1.4 | 1.6 | 37.0 | 44.5 | 51.9 |
+| `groupby_where_count` | 76.5 | 0.9 | 1.3 | 37.2 | 41.9 | 52.2 |
+| `groupby_where_sum` | 87.7 | 0.9 | 1.3 | 36.9 | 42.0 | 56.1 |
+| `join_count` | 38.7 | 11.0 | 11.7 | 40.9 | 71.4 | 31.8 |
+| `join_groupby_count` | 160.1 | 17.4 | 19.7 | 66.4 | 90.1 | 72.9 |
+| `join_groupby_to_array` | 194.1 | 17.9 | 19.8 | 78.4 | 36.1 | 81.1 |
+| `join_probe` | — | — | — | — | — | 24.2 |
+| `join_probe_build` | — | — | — | — | — | 39.8 |
+| `join_select` | 94.0 | 19.7 | 21.8 | 72.2 | 94.4 | 70.1 |
+| `join_where_count` | 39.6 | 19.3 | 20.6 | 63.2 | 78.2 | 38.0 |
+| `last_match` | 0.0 | 0.5 | 1.4 | 19.6 | 26.9 | 22.8 |
+| `long_count_aggregate` | 29.8 | 0.3 | 0.6 | 29.4 | 26.5 | 13.8 |
+| `max_aggregate` | 31.0 | 0.3 | 0.5 | 29.8 | 27.9 | 13.5 |
+| `min_aggregate` | 31.2 | 0.3 | 0.5 | 29.8 | 27.7 | 13.5 |
+| `order_by_multi_key` | 251.0 | 54.8 | 54.8 | 124.4 | 71.8 | 129.5 |
+| `order_distinct_take` | 142.6 | 1.1 | 75.8 | 21.0 | 35.8 | 14.0 |
+| `order_reverse_normalized` | 38.7 | 0.7 | 1.4 | 19.8 | 28.6 | 17.8 |
+| `order_take_desc` | 38.6 | 0.7 | 1.3 | 19.7 | 28.4 | 17.8 |
 | `point_lookup` | — | — | — | — | — | 0.0 |
 | `point_lookup_scan` | — | — | — | — | — | 6.0 |
-| `reverse_distinct_by` | 295.7 | 1.5 | 3.2 | 20.5 | 34.4 | — |
-| `reverse_take` | 0.0 | 0.0 | 0.0 | 0.0 | 3.8 | 26.9 |
-| `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.9 | — |
-| `select_count` | 0.1 | 0.0 | 0.0 | 67.0 | 0.0 | 0.0 |
-| `select_many` | — | 62.5 | — | — | — | — |
-| `select_where` | 110.7 | 4.1 | 5.3 | 74.8 | 22.1 | 27.9 |
-| `select_where_count` | 32.6 | 0.3 | 0.6 | 17.4 | 26.3 | 13.4 |
-| `select_where_order_take` | 36.7 | 0.7 | 1.3 | 18.4 | 27.3 | 23.1 |
-| `select_where_sum` | 37.2 | 0.4 | 0.6 | 17.4 | 25.6 | 13.4 |
-| `single_match` | 0.0 | 0.4 | 1.1 | 46.2 | 22.3 | 17.3 |
-| `skip_take` | 0.3 | 0.0 | 0.0 | 1.3 | 0.3 | 0.2 |
-| `skip_while_match` | 3.4 | 0.4 | 0.4 | 46.0 | 21.8 | 13.2 |
-| `sort_first` | 38.4 | 0.4 | 1.3 | 17.4 | 26.7 | 17.2 |
-| `sort_take` | 38.6 | 0.7 | 1.3 | 22.0 | 27.9 | 17.8 |
-| `sort_take_select` | 38.3 | 0.7 | 1.3 | 21.9 | 27.7 | 17.8 |
-| `sum_aggregate` | 30.6 | 0.3 | 0.1 | 17.7 | 24.9 | 13.5 |
-| `sum_where` | 33.0 | 0.3 | 0.6 | 17.4 | 26.3 | 13.4 |
-| `take_count` | 1.9 | 0.1 | 0.1 | 1.2 | 0.3 | 0.2 |
-| `take_count_filtered` | 1.1 | 0.0 | 0.0 | 0.4 | 0.1 | 0.1 |
+| `reverse_distinct_by` | 297.0 | 1.6 | 3.1 | 20.6 | 34.6 | 18.8 |
+| `reverse_take` | 0.0 | 0.0 | 0.0 | 0.0 | 3.8 | 27.0 |
+| `reverse_take_select` | 0.0 | 0.0 | 0.0 | 0.0 | 3.9 | 27.1 |
+| `select_count` | 0.1 | 0.0 | 0.0 | 68.1 | 0.0 | 0.0 |
+| `select_many` | — | 62.7 | — | — | — | — |
+| `select_where` | 108.3 | 4.1 | 5.3 | 75.4 | 23.1 | 28.2 |
+| `select_where_count` | 32.9 | 0.3 | 0.6 | 29.9 | 27.2 | 13.5 |
+| `select_where_order_take` | 37.0 | 0.7 | 1.4 | 19.8 | 27.9 | 23.3 |
+| `select_where_sum` | 37.4 | 0.4 | 0.6 | 20.4 | 26.2 | 13.4 |
+| `single_match` | 0.0 | 0.4 | 1.1 | 46.1 | 23.2 | 17.4 |
+| `skip_take` | 0.3 | 0.0 | 0.0 | 1.3 | 0.2 | 0.2 |
+| `skip_while_match` | 3.5 | 0.4 | 0.4 | 46.0 | 22.2 | 13.3 |
+| `sort_first` | 38.3 | 0.4 | 1.3 | 18.9 | 27.5 | 17.3 |
+| `sort_take` | 38.3 | 0.7 | 1.4 | 19.7 | 28.5 | 17.8 |
+| `sort_take_select` | 38.4 | 0.7 | 1.3 | 19.8 | 28.4 | 17.7 |
+| `sum_aggregate` | 30.4 | 0.3 | 0.1 | 23.3 | 25.6 | 13.5 |
+| `sum_where` | 33.1 | 0.3 | 0.6 | 29.5 | 27.1 | 13.5 |
+| `take_count` | 1.8 | 0.1 | 0.1 | 1.2 | 0.2 | 0.3 |
+| `take_count_filtered` | 1.1 | 0.0 | 0.0 | 0.5 | 0.1 | 0.2 |
 | `take_sum_aggregate` | 0.8 | 0.0 | 0.0 | 0.2 | 0.0 | 0.1 |
-| `take_where_count` | 0.9 | 0.0 | 0.0 | 0.2 | 0.0 | 0.1 |
-| `take_while_match` | 7.8 | 0.2 | 0.3 | 17.4 | 8.9 | 13.4 |
-| `to_array_filter` | 48.9 | 3.2 | 3.3 | 20.8 | 35.5 | 20.2 |
-| `to_table` | — | — | — | — | — | 28.8 |
-| `to_table_staged` | — | — | — | — | — | 41.6 |
-| `where_join_count` | 41.3 | 5.7 | 6.8 | 48.8 | 41.8 | — |
-| `zip_count_pred` | 39.8 | 0.1 | — | 115.3 | 33.8 | — |
-| `zip_dot_product` | 47.3 | 0.1 | 0.1 | 115.4 | 33.8 | — |
-| `zip_dot_product_3arg` | 47.1 | 0.1 | — | 115.3 | 33.7 | — |
-| `zip_reverse_to_array` | — | 4.5 | — | 127.0 | 51.4 | — |
+| `take_where_count` | 0.9 | 0.0 | 0.0 | 0.3 | 0.0 | 0.1 |
+| `take_while_match` | 7.8 | 0.2 | 0.3 | 17.3 | 9.3 | 13.5 |
+| `to_array_filter` | 48.5 | 3.3 | 3.4 | 22.2 | 35.4 | 20.3 |
+| `to_table` | — | 14.1 | 37.4 | 49.7 | 54.3 | 29.2 |
+| `to_table_staged` | — | 25.8 | 26.1 | 53.5 | 64.1 | 42.1 |
+| `where_join_count` | 39.6 | 5.8 | 6.8 | 47.7 | 42.1 | 26.9 |
+| `zip_count_pred` | 39.2 | 0.1 | — | 112.6 | 34.2 | — |
+| `zip_dot_product` | 46.9 | 0.1 | 0.1 | 112.4 | 34.1 | — |
+| `zip_dot_product_3arg` | 46.9 | 0.1 | — | 112.4 | 34.1 | — |
+| `zip_reverse_to_array` | — | 4.6 | — | 123.5 | 51.8 | — |
 <!-- BENCH:TABLES END -->
 
 ## Missing lanes (the `—` cells)
@@ -220,10 +220,10 @@ Each empty cell's reason is also in the bench `.das` file's comment; SQL gaps ar
 - **`reverse_distinct_by` m4 / m5f** — array uses the backward-index walk; non-array sources fuse the forward keep-last splice (decs 27.6/5.0, XML 74.5/22.2); SQL uses MAX(pk).
 - **`order_distinct_take` m4 vs m3f** — `unique_key` hashes workhorse keys directly (array `int`) but string-interpolates structs (decs `DecsBrand`); the gap is per-element string hashing, not decs-walk. `distinct_by_count` is the key-based variant (m4 parity).
 - **`zip_reverse_to_array` / `zip_*` SQL / Decs** — `reverse` has no SQL order key; zip is not relational / not expressible over one archetype walk. By design. (XML/JSON zip lanes are lit, partially fused.)
-- **m7 absent families** — `zip_*` / `cross_join` (lockstep over an unordered slot walk is meaningless), `select_many` (flat fixture, no nested array field), `order_reverse_normalized` / `reverse_take_select` / `reverse_distinct_by` (no backward slot walk; `reverse_take` is kept as the single deferral marker), the group-by tail beyond `groupby_count`/`groupby_sum` (table group_by fusion is a named deferred edge — see `LINQ_TO_TABLE.md`; the two marker cells track the tier-2 cost) plus the join-composition lanes (`join_select` / `where_join_count` would fuse today but aren't instantiated; `join_groupby_*` needs the deferred group_by), `decs_count_bare_pred` (decs-only).
+- **m7 absent families** — `zip_*` / `cross_join` (lockstep pairing over an unordered slot walk is meaningless), `select_many` (flat fixture, no nested array field; array-only), and `decs_count_bare_pred` (decs-only). Everything else in the m7 column is instantiated, but read the `groupby_*` / `join_groupby_*` / reverse-family cells as the **tier-2 cascade cost**, not a fused emit: table group_by fusion and a backward slot walk are named deferred edges (see `LINQ_TO_TABLE.md`), so those cells are the numbers a fix would improve.
 - **`point_lookup` / `point_lookup_scan` non-m7** — m7-only pair: only a table source has a key to probe (`where(kv.key == X)` + terminator → `key_exists` / `tab?[X]`, O(1)); the `_scan` twin forces the same query through the walk (compound `&&` predicate declines the probe) to show the gap. Other sources have no analog by design.
 - **`join_probe` / `join_probe_build` non-m7** — m7-only A/B pair: a table srcB joined on its bare key probes the user's table per lead row (no internal join hash, no build loop); the `_build` twin feeds the identical rows pre-materialized to a kv array, forcing the hashed build. Other sources have no keyed-srcB analog by design.
-- **`to_table` / `to_table_staged` non-m7** — m7-only A/B pair for the `to_table()` sink: the fused insert-loop lands the kv chain straight in the result table (reserve from O(1) length); the `_staged` twin materializes the same projection to an array first, then converts via the consuming builtin `to_table_move` — the shape every chain had before the sink arm. The sink itself works over any direct-loop source (the array lane fuses it too); only the bench pair is table-scoped.
+- **`to_table` / `to_table_staged` SQL** — `to_table` isn't an SQL terminator (`_sql` pass-through has no table sink). All in-memory sources are instantiated: array / XML / JSON / table fuse the insert-loop sink (`_staged` is the materialize-then-`to_table_move` shape every chain had before the sink arm); decs declines by design (explicit guard in its loop_or_count lane), so its `to_table` cell is the full tier-2 cascade — currently slower than its `_staged` twin, which fuses the array materialization first. That gap is the motivating number for a future decs sink hook.
 
 ## Accepted floors
 
diff --git a/benchmarks/sql/table.das b/benchmarks/sql/table.das
index d0afb6557..0a28d7afd 100644
--- a/benchmarks/sql/table.das
+++ b/benchmarks/sql/table.das
@@ -260,6 +260,21 @@ def first_or_default_match_m7(b : B?) {
     }
 }
 
+[benchmark]
+def groupby_average_m7(b : B?) {
+    b |> run("groupby_average", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._group_by(_.value.brand)
+                            ._select((Brand = _._0,
+                                      AvgPrice = _._1 |> select($(c : CarKV) => c.value.price) |> average()))
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
 [benchmark]
 def groupby_count_m7(b : B?) {
     b |> run("groupby_count", N) {
@@ -274,6 +289,145 @@ def groupby_count_m7(b : B?) {
     }
 }
 
+[benchmark]
+def groupby_first_m7(b : B?) {
+    b |> run("groupby_first", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._group_by(_.value.brand)
+                            ._select((Brand = _._0,
+                                      FirstCar = _._1 |> first()))
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def groupby_having_count_m7(b : B?) {
+    b |> run("groupby_having_count", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._group_by(_.value.brand)
+                            ._having(_._1 |> length >= 5)
+                            ._select((Brand = _._0, N = _._1 |> length))
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def groupby_having_hidden_sum_m7(b : B?) {
+    b |> run("groupby_having_hidden_sum", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._group_by(_.value.brand)
+                            ._having(_._1 |> select($(c : CarKV) => c.value.price) |> sum > 50000)
+                            ._select((Brand = _._0, N = _._1 |> length))
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def groupby_having_post_where_m7(b : B?) {
+    b |> run("groupby_having_post_where", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._group_by(_.value.brand)
+                            ._select((Brand = _._0,
+                                      Total = _._1 |> select($(c : CarKV) => c.value.price) |> sum()))
+                            ._where(_.Total > 9000000)
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def groupby_max_m7(b : B?) {
+    b |> run("groupby_max", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._group_by(_.value.brand)
+                            ._select((Brand = _._0,
+                                      MaxPrice = _._1 |> select($(c : CarKV) => c.value.price) |> max()))
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def groupby_min_m7(b : B?) {
+    b |> run("groupby_min", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._group_by(_.value.brand)
+                            ._select((Brand = _._0,
+                                      MinPrice = _._1 |> select($(c : CarKV) => c.value.price) |> min()))
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def groupby_multi_reducer_m7(b : B?) {
+    b |> run("groupby_multi_reducer", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._group_by(_.value.brand)
+                            ._select((Brand = _._0,
+                                      N = _._1 |> length,
+                                      TotalPrice = _._1 |> select($(c : CarKV) => c.value.price) |> sum(),
+                                      MaxPrice   = _._1 |> select($(c : CarKV) => c.value.price) |> max()))
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def groupby_select_order_m7(b : B?) {
+    b |> run("groupby_select_order", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._group_by(_.value.brand)
+                            ._select((Brand = _._0,
+                                      Total = _._1 |> select($(c : CarKV) => c.value.price) |> sum()))
+                            ._order_by(_.Total)
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def groupby_select_sum_m7(b : B?) {
+    b |> run("groupby_select_sum", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._select(_.value.price)
+                            ._group_by(_ % 100)
+                            ._select((K = _._0, S = _._1 |> sum()))
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
 [benchmark]
 def groupby_sum_m7(b : B?) {
     b |> run("groupby_sum", N) {
@@ -289,6 +443,37 @@ def groupby_sum_m7(b : B?) {
     }
 }
 
+[benchmark]
+def groupby_where_count_m7(b : B?) {
+    b |> run("groupby_where_count", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._where(_.value.price > 500)
+                            ._group_by(_.value.brand)
+                            ._select((Brand = _._0, N = _._1 |> length))
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def groupby_where_sum_m7(b : B?) {
+    b |> run("groupby_where_sum", N) {
+        let groups <- _fold(unsafe(each_kv(g_t))
+                            ._where(_.value.price > 500)
+                            ._group_by(_.value.brand)
+                            ._select((Brand = _._0,
+                                      TotalPrice = _._1 |> select($(c : CarKV) => c.value.price) |> sum()))
+                            .to_array())
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
 [benchmark]
 def join_count_m7(b : B?) {
     b |> run("join_count", N) {
@@ -303,6 +488,37 @@ def join_count_m7(b : B?) {
     }
 }
 
+[benchmark]
+def join_groupby_count_m7(b : B?) {
+    b |> run("join_groupby_count", N) {
+        let groups <- _fold(unsafe(each_kv(g_t)) |> _join(g_dealers,
+                                                          $(c : CarKV, d : Dealer) => c.value.dealer_id == d.id,
+                                                          $(c : CarKV, d : Dealer) => (Brand = c.value.brand, DealerId = d.id))
+                                                 |> _group_by(_.Brand)
+                                                 |> _select((Brand = _._0, N = _._1 |> count())))
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def join_groupby_to_array_m7(b : B?) {
+    b |> run("join_groupby_to_array", N) {
+        let groups <- _fold(unsafe(each_kv(g_t)) |> _join(g_dealers,
+                                                          $(c : CarKV, d : Dealer) => c.value.dealer_id == d.id,
+                                                          $(c : CarKV, _d : Dealer) => (Brand = c.value.brand, Price = c.value.price))
+                                                 |> _group_by(_.Brand)
+                                                 |> _select((Brand = _._0,
+                                                             Total = _._1 |> select($(t : tuple<Brand : int; Price : int>) => t.Price) |> sum())))
+        b |> accept(groups)
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
 [benchmark]
 def join_probe_m7(b : B?) {
     // srcB is a table joined on its bare key → fused key probe, no internal join hash
@@ -335,6 +551,20 @@ def join_probe_build_m7(b : B?) {
     }
 }
 
+[benchmark]
+def join_select_m7(b : B?) {
+    b |> run("join_select", N) {
+        let rows <- _fold(unsafe(each_kv(g_t)) |> _join(g_dealers,
+                                                        $(c : CarKV, d : Dealer) => c.value.dealer_id == d.id,
+                                                        $(c : CarKV, d : Dealer) => (CarName = c.value.name, DealerId = d.id))
+                                               |> _select(_.CarName))
+        b |> accept(rows)
+        if (empty(rows)) {
+            b->failNow()
+        }
+    }
+}
+
 [benchmark]
 def join_where_count_m7(b : B?) {
     b |> run("join_where_count", N) {
@@ -420,6 +650,19 @@ def order_distinct_take_m7(b : B?) {
     }
 }
 
+[benchmark]
+def order_reverse_normalized_m7(b : B?) {
+    b |> run("order_reverse_normalized", N) {
+        unsafe {
+            let rows <- _fold(each_kv(g_t)._order_by(_.value.price).reverse().take(10).to_array())
+            b |> accept(rows)
+            if (empty(rows)) {
+                b->failNow()
+            }
+        }
+    }
+}
+
 [benchmark]
 def order_take_desc_m7(b : B?) {
     b |> run("order_take_desc", N) {
@@ -457,6 +700,19 @@ def point_lookup_scan_m7(b : B?) {
     }
 }
 
+[benchmark]
+def reverse_distinct_by_m7(b : B?) {
+    b |> run("reverse_distinct_by", N) {
+        unsafe {
+            let rows <- _fold(each_kv(g_t).reverse()._distinct_by(_.value.brand).to_array())
+            b |> accept(rows)
+            if (empty(rows)) {
+                b->failNow()
+            }
+        }
+    }
+}
+
 [benchmark]
 def reverse_take_m7(b : B?) {
     b |> run("reverse_take", N) {
@@ -470,6 +726,19 @@ def reverse_take_m7(b : B?) {
     }
 }
 
+[benchmark]
+def reverse_take_select_m7(b : B?) {
+    b |> run("reverse_take_select", N) {
+        unsafe {
+            let rows <- _fold(each_kv(g_t).reverse().take(10)._select(_.value.name).to_array())
+            b |> accept(rows)
+            if (empty(rows)) {
+                b->failNow()
+            }
+        }
+    }
+}
+
 [benchmark]
 def select_count_m7(b : B?) {
     b |> run("select_count", N) {
@@ -714,3 +983,18 @@ def to_table_staged_m7(b : B?) {
         delete tab
     }
 }
+
+[benchmark]
+def where_join_count_m7(b : B?) {
+    b |> run("where_join_count", N) {
+        let c = _fold(unsafe(each_kv(g_t)) |> _where(_.value.price > 500)
+                                           |> _join(g_dealers,
+                                                    $(c : CarKV, d : Dealer) => c.value.dealer_id == d.id,
+                                                    $(c : CarKV, d : Dealer) => (CarPrice = c.value.price, DealerId = d.id))
+                                           |> count())
+        b |> accept(c)
+        if (c == 0) {
+            b->failNow()
+        }
+    }
+}
diff --git a/benchmarks/sql/xml.das b/benchmarks/sql/xml.das
index b8efcdcfb..fed22b0ea 100644
--- a/benchmarks/sql/xml.das
+++ b/benchmarks/sql/xml.das
@@ -904,6 +904,33 @@ def to_array_filter_m5f(b : B?) {
         }
 }
 
+[benchmark]
+def to_table_m5f(b : B?) {
+    // fused insert-loop sink: the chain lands straight in the result table
+    b |> run("to_table", N) {
+        var tab <- _fold(unsafe(from_xml_node(g_root, type<Car>)) |> _select((_.id => _.price)) |> to_table())
+        b |> accept(length(tab))
+        if (empty(tab)) {
+            b->failNow()
+        }
+        delete tab
+    }
+}
+
+[benchmark]
+def to_table_staged_m5f(b : B?) {
+    // staged baseline: materialize the kv tuples to an array, then convert
+    b |> run("to_table_staged", N) {
+        var rows <- _fold(unsafe(from_xml_node(g_root, type<Car>)) |> _select((_.id => _.price)) |> to_array())
+        var tab <- to_table_move(rows)
+        b |> accept(length(tab))
+        if (empty(tab)) {
+            b->failNow()
+        }
+        delete tab
+    }
+}
+
 [benchmark]
 def where_join_count_m5f(b : B?) {
         b |> run("where_join_count", N) {

From 901b014e0a12bbbfd2b6d73562000b07fb763852 Mon Sep 17 00:00:00 2001
From: Boris Batkin <bbatkin@gmail.com>
Date: Thu, 11 Jun 2026 10:01:59 -0700
Subject: [PATCH 11/11] docs: declare U+2261/U+21D2/U+00D7 for the LaTeX build
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The arc's linq_fold_patterns.rst additions use ≡ / ⇒ / × in prose, and pdflatex
halts on undeclared Unicode (the CI docs job failed on U+2261). conf.py's preamble
is the documented place for these declarations. Verified locally: sphinx -b latex,
then pdflatex -halt-on-error, pass 1 completes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 doc/source/conf.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/source/conf.py b/doc/source/conf.py
index c27f78786..addfc9b45 100644
--- a/doc/source/conf.py
+++ b/doc/source/conf.py
@@ -268,8 +268,11 @@
 \DeclareUnicodeCharacter{2194}{\ensuremath{\leftrightarrow}}
 \DeclareUnicodeCharacter{2195}{\ensuremath{\updownarrow}}
 \DeclareUnicodeCharacter{2260}{\ensuremath{\neq}}
+\DeclareUnicodeCharacter{2261}{\ensuremath{\equiv}}
 \DeclareUnicodeCharacter{2264}{\ensuremath{\leq}}
 \DeclareUnicodeCharacter{2265}{\ensuremath{\geq}}
+\DeclareUnicodeCharacter{21D2}{\ensuremath{\Rightarrow}}
+\DeclareUnicodeCharacter{00D7}{\ensuremath{\times}}
 ''',
 
 # Latex figure (float) alignment