jit: inline table slot walk for keys/values for-loop sources #3100

Merged: borisbat merged 3 commits into master from bbatkin/jit-table-walk on Jun 11, 2026

@borisbat

What

For-loops over keys(tab) / values(tab) (including every fused linq kv chain emitted by _fold's TableAdapter) previously compiled, under JIT, to a heap-allocated C++ TableIterator plus a first/next call per element per lane — the m7 JIT column sat at roughly INTERP level. This PR compiles them to an inline slot walk instead:

  • jit_table_lock once at first(), header loaded once (CAPACITY / SIZE / HASHES / KEYS-or-DATA),
  • scan ctrl[slot] > CTRL_TOMBSTONE: workhorse keys are open-addressed at every capacity since #3025 (table: non-string keys open-addressed; full inline JIT find/at for workhorse + string keys), so the walk is a flat 1-byte control scan with no regime branch,
  • keys lanes copy the slot key out (past-end read guarded exactly like the C++ iterator); values lanes bind a pointer into the data block (ref loop var),
  • close() re-checks the data base against its origin (catches modified-during-iteration on shared/hopeless tables that bypass the lock — "table was modified during iteration"), then jit_table_unlock.
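
The four steps above can be sketched in plain C++. This is only an illustration under stated assumptions: `ToyTable`, `walk_table`, and the numeric values of `CTRL_EMPTY` / `CTRL_TOMBSTONE` are hypothetical stand-ins, not the real daslang table layout or the code the JIT emits. It shows the shape of the walk: lock once, load the header once, scan the control bytes, copy keys out, bind values by reference, then re-check the data base and unlock at close.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical control-byte encoding; the real constants live in the runtime.
constexpr uint8_t CTRL_EMPTY = 0;
constexpr uint8_t CTRL_TOMBSTONE = 1;  // any larger value marks a live slot

// Toy stand-in for a workhorse-keyed table (not the real Table layout).
struct ToyTable {
    std::vector<uint8_t> ctrl;  // one control byte per slot
    std::vector<int32_t> keys;  // slot keys
    std::vector<int32_t> data;  // slot values
    int lock = 0;               // nonzero while an iteration is active
};

// Visits every live slot: the key is passed by value, the value by reference.
// The visitor returns false to break out early; close still runs afterwards.
template <typename Visit>
void walk_table(ToyTable & tab, Visit visit) {
    tab.lock++;                                  // first(): lock once
    const size_t capacity = tab.ctrl.size();     // header loaded once
    const uint8_t * ctrl = tab.ctrl.data();
    const int32_t * keys = tab.keys.data();
    int32_t * data = tab.data.data();
    const int32_t * origin = data;               // remembered for close()
    for (size_t slot = 0; slot != capacity; ++slot) {
        if (ctrl[slot] > CTRL_TOMBSTONE) {       // flat scan: live slot
            int32_t key = keys[slot];            // keys lane: copy out
            int32_t & value = data[slot];        // values lane: bind by ref
            if (!visit(key, value)) break;
        }
    }
    // close(): re-check the data base first, then unlock.
    const bool moved = tab.data.data() != origin;
    tab.lock--;
    if (moved) throw std::runtime_error("table was modified during iteration");
}
```

The toy lock is only a counter and does not block mutation, which is the situation the close-time check exists for: a table that bypasses the lock can still be caught when its data block has moved.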

The source call is suppressed via skipCall (same mechanism as count()), so no iterator is ever allocated. String / non-workhorse keys keep the generic C++ iterator — they have per-slot liveness regimes (tableLiveSlot) the flat scan can't honor.

Detection note: the daslib keys/values generics instantiate into the compiling module under compiler-generated names (builtin + backtick-mangled), so the match is on the name prefix, not name == "keys" && module == "$" — the latter never fires for generic instances.
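
A minimal sketch of the prefix-based detection, combined with the workhorse-key gate from the paragraph above. Everything here is hypothetical: `classify_source`, `WalkLane`, and the exact prefix strings are illustrative, with the prefix shape taken from the builtin`keys`<hash> form mentioned in the commit message and the values prefix assumed by analogy. The real check lives in the daslang JIT, not in C++.

```cpp
#include <string>

enum class WalkLane { None, Keys, Values };

// Assumed spellings of the compiler-generated prefixes (illustrative only).
static const std::string KEYS_PREFIX = "builtin`keys`";
static const std::string VALUES_PREFIX = "builtin`values`";

// True when `name` begins with `prefix`.
static bool has_prefix(const std::string & name, const std::string & prefix) {
    return name.compare(0, prefix.size(), prefix) == 0;
}

// Decides whether a for-loop source call can take the inline slot walk.
// Non-workhorse keys (strings and others) always keep the generic iterator.
WalkLane classify_source(const std::string & name, bool workhorse_key) {
    if (!workhorse_key) return WalkLane::None;
    if (has_prefix(name, KEYS_PREFIX)) return WalkLane::Keys;
    if (has_prefix(name, VALUES_PREFIX)) return WalkLane::Values;
    return WalkLane::None;  // includes the plain name "keys", which instances never keep
}
```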

LLVM_JIT_CODEGEN_VERSION bumped 0x25 → 0x26 to invalidate cached dlls.
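
Why a version bump invalidates cached DLLs can be illustrated with a small sketch. The `CachedArtifact` header and `can_reuse` function are assumptions made for this example; the PR does not describe how the cache actually stores or compares the version, only that bumping it forces a rebuild.

```cpp
#include <cstdint>

// Value after this PR; before it the constant was 0x25.
constexpr uint32_t LLVM_JIT_CODEGEN_VERSION = 0x26;

// Hypothetical header stamped into each cached artifact when it is built.
struct CachedArtifact {
    uint32_t codegen_version;  // codegen version that produced the artifact
    uint64_t source_hash;      // hash of the source it was compiled from
};

// A cached artifact is reusable only if both the source and the codegen match.
bool can_reuse(const CachedArtifact & cached, uint64_t source_hash) {
    return cached.codegen_version == LLVM_JIT_CODEGEN_VERSION
        && cached.source_hash == source_hash;
}
```

Under this scheme an artifact built at 0x25 is rejected even when the source is unchanged, so code compiled with the old iterator-based lowering is never loaded against the new runtime glue.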

Numbers

benchmarks/sql m7 (table) JIT column, ns/op, before → after (results.md re-swept in this PR):

| family | before | after |
|---|---|---|
| count_aggregate / sum_aggregate | 13.5 | 7.3 |
| chained_where | 17.8 | 10.4 |
| select_where | 28.2 | 17.9 |
| last_match | 22.8 | 12.1 |
| join_probe | 24.2 | 16.7 |
| reverse_take | 27.0 | 19.3 |
| point_lookup_scan | 6.0 | 3.0 |

groupby_* stays ~44 — its cost is the tier-2 group cascade, not source iteration. INTERP matrix flat within noise (interpreter path untouched).
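
For readers who want the speedup ratios rather than raw ns/op, the arithmetic is before divided by after. The helper below is a small illustration using only the figures from the table above; the `Row`, `m7_rows`, and `speedup` names are made up for this example.

```cpp
#include <cstdio>
#include <vector>

struct Row {
    const char * family;
    double before_ns;  // JIT ns/op before this PR
    double after_ns;   // JIT ns/op after this PR
};

// Speedup factor: how many times faster the "after" column is.
inline double speedup(const Row & r) { return r.before_ns / r.after_ns; }

// The m7 JIT rows exactly as reported in the table.
std::vector<Row> m7_rows() {
    return {
        {"count_aggregate / sum_aggregate", 13.5, 7.3},
        {"chained_where", 17.8, 10.4},
        {"select_where", 28.2, 17.9},
        {"last_match", 22.8, 12.1},
        {"join_probe", 24.2, 16.7},
        {"reverse_take", 27.0, 19.3},
        {"point_lookup_scan", 6.0, 3.0},
    };
}

// Prints one line per family with its speedup factor.
void print_speedups() {
    for (const Row & r : m7_rows()) {
        std::printf("%-32s %.2fx\n", r.family, speedup(r));
    }
}
```

The ratios run from roughly 1.4x (reverse_take) to 2.0x (point_lookup_scan).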

Tests

New tests/jit_tests/table_walk.das (8/8 INTERP + JIT, with is_jit_function firing asserts): kv zip sums, keys-only / values-only lanes, by-ref value mutation, tombstones (erase mid-table), break → unlock → re-mutate, string-key fallback, int64 keys, insert-during-iteration panic.

Gates

  • full INTERP 10992 tests / 0 failed, full AOT 10304 / 0 failed
  • JIT: jit_tests 295, linq 1962, language 1054 — all pass
  • exe-build smoke (-exe link of utils/lint) — ok
  • CI lint 0 issues on changed files; formatter clean; das2rst no-op

🤖 Generated with Claude Code

borisbat and others added 3 commits June 11, 2026 10:32
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A for-loop over keys(tab)/values(tab) (incl. the fused linq kv zips) compiled
to a heap-allocated C++ TableIterator + first/next call per element per lane.
Workhorse-keyed tables are open-addressed at every capacity (#3025), so the
walk is a flat ctrl-byte scan — now emitted inline: lock once, scan
ctrl[slot] > CTRL_TOMBSTONE, keys copy the slot key out (past-end guarded,
like the C++ iterator), values bind a pointer into the data block, close
re-checks the data base (modified-during-iteration on shared/hopeless
tables that bypass the lock) and unlocks. String / non-workhorse keys keep
the generic iterator (different liveness regimes).

Detection: the daslib generics instantiate into the compiling module as
builtin`keys`<hash> — matched by that compiler-generated prefix (the
plain-name + module-$ check never fired; instances don't keep either).
The skipped source call never allocates an iterator, mirroring count().

Glue: jit_table_lock/unlock (module_jit.cpp wrapping builtin_table_lock/
unlock; engine mapping + DAS_API symbol for the exe/dll paths).
LLVM_JIT_CODEGEN_VERSION 0x25 -> 0x26.

m7 JIT spot numbers (ns/elem): count/sum/max_aggregate 13.4 -> 7.3,
chained_where 17.8 -> 10.4, join_count 33 -> 25.2, join_probe 24 -> 16.6,
groupby_count ~160 -> 44.1, reverse_take ~70 -> 19.3, point_lookup_scan
6.0 -> 3.0, last_match -> 12.0. Full sweep + results.md refresh after the
table-arc PR (#3099) merges and this branch rebases onto it.

Gates: JIT tests/linq 1962/1962, tests/language 1054/1054, jit_tests +
decs + json green, exe-build smoke links (the #3025 dll-glob lesson), new
tests/jit_tests/table_walk.das 8/8 INTERP+JIT (incl. is_jit_function
firing checks, tombstones, by-ref values, break-unlock, locked-iteration
panic, string-key fallback). CI lint clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
results.md regenerated (2026-06-11, INTERP+JIT matrices). INTERP flat within
noise; JIT m7 halves on walk-dominated families (chained_where 17.8->10.4,
count_aggregate 13.5->7.3, join_probe 24.2->16.7, last_match 22.8->12.1,
point_lookup_scan 6.0->3.0, select_where 28.2->17.9 ns/op). groupby stays ~44
(tier-2 group cascade dominates, source walk was never the bottleneck there).
One prose line added to the m7 bullet: the JIT column is now fused codegen end
to end.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 11, 2026 18:01

Copilot AI left a comment

Pull request overview

This PR optimizes the LLVM JIT lowering of for loops whose sources are keys(tab) / values(tab) for workhorse-keyed tables by eliminating the heap-allocated TableIterator and replacing it with an inlined open-addressed slot walk (ctrl-byte scan) guarded by table lock/unlock semantics. This targets a major JIT performance gap where fused table-fold lanes were previously bottlenecked by per-element C++ iterator calls.

Changes:

  • Add an inline JIT slot-walk implementation for keys(tab) / values(tab) iterator sources (workhorse keys only), including lock/unlock and modified-during-iteration detection at close.
  • Expose JIT runtime entrypoints for table lock/unlock and bump LLVM_JIT_CODEGEN_VERSION to invalidate cached JIT DLLs.
  • Add a dedicated JIT test suite for table-walk behavior and update benchmark result notes.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| tests/jit_tests/table_walk.das | Adds JIT+INTERP tests covering the new inline slot-walk behavior (keys/values, tombstones, break/unlock, mutation panic, fallbacks). |
| src/builtin/module_jit.cpp | Exposes JIT-accessible table lock/unlock function pointers for LLVM JIT codegen. |
| modules/dasLLVM/daslib/llvm_jit.das | Implements the inline table ctrl-byte scan iterator path for keys/values loop sources and integrates it into for-loop lowering/close. |
| modules/dasLLVM/daslib/llvm_jit_common.das | Registers and maps the new jit_table_lock / jit_table_unlock helper functions in the JIT module. |
| modules/dasLLVM/daslib/llvm_jit_run.das | Bumps LLVM_JIT_CODEGEN_VERSION to invalidate cached JIT artifacts. |
| include/daScript/simulate/aot_builtin_jit.h | Declares new das_get_jit_table_lock/unlock entrypoints for AOT/JIT integration. |
| benchmarks/sql/results.md | Updates benchmark commentary and refreshed numbers reflecting the improved JIT table-fold lane. |

Comment on lines +649 to +650
void *das_get_jit_table_lock() { return (void *)&builtin_table_lock; }
void *das_get_jit_table_unlock() { return (void *)&builtin_table_unlock; }
@borisbat borisbat merged commit f6695af into master Jun 11, 2026
33 checks passed
pull Bot pushed a commit to forksnd/daScript that referenced this pull request Jun 12, 2026
each_kv/keys/values |> group_by chains declined at can_group_by_source
(TableAdapter inherited the base false) and cascaded to tier-2: materialize
a kv array, re-enter the array lane. The group_by splice pattern is fully
adapter-generic, so enabling tables is two overrides mirroring DecsAdapter:

- can_group_by() -> true
- build_group_by_adapter() -> fresh TableAdapter (clone tabExpr, fresh
  srcName, same elemType/lane). The upstream_join arm returns null:
  join |> group_by over a table lead stays on tier-2 (named deferred edge
  in LINQ_TO_TABLE.md; the stage-5 terminator-path join is untouched).

plan_group_by_core hands the bucket-fill body to wrap_source_loop, so the
kv usage-pruner sees the whole accumulation (key expr + reducer updates +
upstream where/select segments) — a group key over kv.value.brand walks
values(tab) alone. The non-copyable-values gate and the -const elemType
scrub are inherited from adapter construction.

m7 groupby_* (ns/op): INTERP 144-201 -> 30-50 (count 163->31, ~5x);
JIT 44-73 -> 8.4-11 (count 43.5->8.4, rides GaijinEntertainment#3100's inline slot walk).
join_groupby_* unchanged (cascade). INTERP matrix flat elsewhere.

Tests: 8 new group_by sub-tests in test_linq_table_source.das (67/67
INTERP+JIT; AOT-compiled in test_aot) — kv/keys/values lanes, upstream
segments, having + trailing where + order, count terminator, empty table,
fused-vs-tier-2 agreement.

Gates: full INTERP 11001/0, full AOT 10313/0, JIT linq 1971 + jit_tests
295, CI lint 0, Sphinx latex+html -W clean + pdflatex pass 1, formatter
clean. results.md re-swept; linq_fold_patterns.rst table row + new
group_by pattern row; LINQ_TO_TABLE.md stage 7 status + findings.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>