jit: inline table slot walk for keys/values for-loop sources #3100
Merged
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A for-loop over keys(tab)/values(tab) (incl. the fused linq kv zips) compiled to a heap-allocated C++ TableIterator + first/next call per element per lane. Workhorse-keyed tables are open-addressed at every capacity (#3025), so the walk is a flat ctrl-byte scan, now emitted inline: lock once, scan ctrl[slot] > CTRL_TOMBSTONE, keys copy the slot key out (past-end guarded, like the C++ iterator), values bind a pointer into the data block, close re-checks the data base (modified-during-iteration on shared/hopeless tables that bypass the lock) and unlocks. String / non-workhorse keys keep the generic iterator (different liveness regimes).

Detection: the daslib generics instantiate into the compiling module as builtin`keys`<hash>, matched by that compiler-generated prefix (the plain-name + module-$ check never fired; instances don't keep either). The skipped source call never allocates an iterator, mirroring count().

Glue: jit_table_lock/unlock (module_jit.cpp wrapping builtin_table_lock/unlock; engine mapping + DAS_API symbol for the exe/dll paths). LLVM_JIT_CODEGEN_VERSION 0x25 -> 0x26.

m7 JIT spot numbers (ns/elem):
- count/sum/max_aggregate 13.4 -> 7.3
- chained_where 17.8 -> 10.4
- join_count 33 -> 25.2
- join_probe 24 -> 16.6
- groupby_count ~160 -> 44.1
- reverse_take ~70 -> 19.3
- point_lookup_scan 6.0 -> 3.0
- last_match -> 12.0

Full sweep + results.md refresh after the table-arc PR (#3099) merges and this branch rebases onto it.

Gates: JIT tests/linq 1962/1962, tests/language 1054/1054, jit_tests + decs + json green, exe-build smoke links (the #3025 dll-glob lesson), new tests/jit_tests/table_walk.das 8/8 INTERP+JIT (incl. is_jit_function firing checks, tombstones, by-ref values, break-unlock, locked-iteration panic, string-key fallback). CI lint clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
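The flat ctrl-byte scan described in this commit message can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: the `FlatTable` layout, the numeric values of the control bytes, and the `walk_keys` / `walk_values` helpers are hypothetical and do not reproduce the daScript table header or the code the JIT emits. The sketch only shows the shape of the loop: one byte compare per slot, keys copied out, values bound by reference into the data block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed control-byte encoding: values at or below CTRL_TOMBSTONE mean
// "empty" or "erased"; anything above marks a live slot.
constexpr std::uint8_t CTRL_EMPTY = 0;
constexpr std::uint8_t CTRL_TOMBSTONE = 1;
constexpr std::uint8_t CTRL_LIVE = 2;

// Hypothetical flat table: three parallel arrays indexed by slot.
struct FlatTable {
    std::vector<std::uint8_t> ctrl;  // one control byte per slot
    std::vector<std::int32_t> keys;  // workhorse (plain value) keys
    std::vector<std::int64_t> data;  // value block
};

// Keys lane: copy the key out of every live slot.
template <typename Fn>
void walk_keys(const FlatTable &t, Fn &&fn) {
    const std::size_t capacity = t.ctrl.size();  // read once, before the loop
    for (std::size_t slot = 0; slot < capacity; ++slot) {
        if (t.ctrl[slot] > CTRL_TOMBSTONE) {
            const std::int32_t key = t.keys[slot];  // copy, not a reference
            fn(key);
        }
    }
}

// Values lane: hand out a reference into the data block for every live slot.
template <typename Fn>
void walk_values(FlatTable &t, Fn &&fn) {
    const std::size_t capacity = t.ctrl.size();
    std::int64_t *data = t.data.data();
    for (std::size_t slot = 0; slot < capacity; ++slot) {
        if (t.ctrl[slot] > CTRL_TOMBSTONE) fn(data[slot]);
    }
}

// Usage: slots 1 (empty) and 3 (erased) are skipped by both lanes.
inline std::int64_t demo_walk() {
    FlatTable t;
    t.ctrl = {CTRL_LIVE, CTRL_EMPTY, CTRL_LIVE, CTRL_TOMBSTONE, CTRL_LIVE};
    t.keys = {10, 0, 20, 99, 30};
    t.data = {1, 0, 2, 77, 3};
    walk_values(t, [](std::int64_t &v) { v *= 10; });  // by-ref mutation
    std::int64_t sum = 0;
    walk_keys(t, [&](std::int32_t k) { sum += k; });     // 10 + 20 + 30
    walk_values(t, [&](std::int64_t &v) { sum += v; });  // 10 + 20 + 30
    assert(t.data[3] == 77);  // the erased slot was never touched
    return sum;
}
```

Because a single comparison decides liveness, the loop body has no per-slot branch on storage regime, which is the property the commit relies on for workhorse keys.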
results.md regenerated (2026-06-11, INTERP+JIT matrices). INTERP flat within noise; JIT m7 halves on walk-dominated families (chained_where 17.8->10.4, count_aggregate 13.5->7.3, join_probe 24.2->16.7, last_match 22.8->12.1, point_lookup_scan 6.0->3.0, select_where 28.2->17.9 ns/op). groupby stays ~44 (tier-2 group cascade dominates, source walk was never the bottleneck there). One prose line added to the m7 bullet: the JIT column is now fused codegen end to end.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pull request overview
This PR optimizes the LLVM JIT lowering of for loops whose sources are keys(tab) / values(tab) for workhorse-keyed tables by eliminating the heap-allocated TableIterator and replacing it with an inlined open-addressed slot walk (ctrl-byte scan) guarded by table lock/unlock semantics. This targets a major JIT performance gap where fused table-fold lanes were previously bottlenecked by per-element C++ iterator calls.
Changes:
- Add an inline JIT slot-walk implementation for `keys(tab)`/`values(tab)` iterator sources (workhorse keys only), including lock/unlock and modified-during-iteration detection at close.
- Expose JIT runtime entrypoints for table lock/unlock and bump `LLVM_JIT_CODEGEN_VERSION` to invalidate cached JIT DLLs.
- Add a dedicated JIT test suite for table-walk behavior and update benchmark result notes.
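The lock/unlock and close-time detection named in the first change can be sketched as follows. This is a minimal model under stated assumptions, not the PR's code: `Table`, `table_lock`, `table_unlock`, `table_insert` and `guarded_walk` are hypothetical stand-ins, a `shared` flag models the tables that bypass the lock, and a forced `std::vector` reallocation models a rehash moving the data base. Only the message "table was modified during iteration" is taken from the PR description; the locked-insert message is invented for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical table model: a value block plus a lock counter.
struct Table {
    std::vector<std::int64_t> data;
    std::uint32_t lock = 0;
    bool shared = false;  // assumed: shared tables skip the lock entirely
};

void table_lock(Table &t) { if (!t.shared) ++t.lock; }
void table_unlock(Table &t) {
    assert(t.shared || t.lock > 0);  // catches unbalanced unlocks
    if (!t.shared) --t.lock;
}

// Insertion refuses locked tables and always reallocates, modelling a rehash.
void table_insert(Table &t, std::int64_t v) {
    if (t.lock) throw std::runtime_error("table is locked");  // invented text
    t.data.reserve(t.data.capacity() + 1);  // forces a new data base
    t.data.push_back(v);
}

// RAII guard so that break and error paths still unlock.
struct LockGuard {
    Table &t;
    explicit LockGuard(Table &tt) : t(tt) { table_lock(t); }
    ~LockGuard() { table_unlock(t); }
};

// body returns false to break out of the loop early.
template <typename Fn>
void guarded_walk(Table &t, Fn &&body) {
    LockGuard guard(t);                          // lock once
    const std::int64_t *origin = t.data.data();  // data base at first()
    const std::size_t n = t.data.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (!body(t.data[i])) break;
    }
    if (t.data.data() != origin)                 // close: re-check the base
        throw std::runtime_error("table was modified during iteration");
}

inline int demo_guard() {
    int passed = 0;
    Table a; a.data = {1, 2, 3};
    try { guarded_walk(a, [&](std::int64_t) { table_insert(a, 4); return true; }); }
    catch (const std::runtime_error &) { if (a.lock == 0 && a.data.size() == 3) passed |= 1; }
    Table b; b.shared = true; b.data = {1, 2, 3};
    try { guarded_walk(b, [&](std::int64_t) { table_insert(b, 4); return false; }); }
    catch (const std::runtime_error &e) {
        if (std::string(e.what()) == "table was modified during iteration") passed |= 2;
    }
    guarded_walk(a, [](std::int64_t) { return false; });  // break, then unlock
    table_insert(a, 4);                                   // re-mutate succeeds
    if (a.data.size() == 4) passed |= 4;
    return passed;
}
```

The three cases in `demo_guard` loosely mirror the behaviors the test suite is said to cover: a locked-iteration failure, a modification caught only at close on a lock-bypassing table, and break followed by unlock and re-mutation.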
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/jit_tests/table_walk.das | Adds JIT+INTERP tests covering the new inline slot-walk behavior (keys/values, tombstones, break/unlock, mutation panic, fallbacks). |
| src/builtin/module_jit.cpp | Exposes JIT-accessible table lock/unlock function pointers for LLVM JIT codegen. |
| modules/dasLLVM/daslib/llvm_jit.das | Implements the inline table ctrl-byte scan iterator path for keys/values loop sources and integrates it into for-loop lowering/close. |
| modules/dasLLVM/daslib/llvm_jit_common.das | Registers/mappings for new jit_table_lock / jit_table_unlock helper functions in the JIT module. |
| modules/dasLLVM/daslib/llvm_jit_run.das | Bumps LLVM_JIT_CODEGEN_VERSION to invalidate cached JIT artifacts. |
| include/daScript/simulate/aot_builtin_jit.h | Declares new das_get_jit_table_lock/unlock entrypoints for AOT/JIT integration. |
| benchmarks/sql/results.md | Updates benchmark commentary and refreshed numbers reflecting the improved JIT table-fold lane. |
Comment on lines +649 to +650

    void *das_get_jit_table_lock() { return (void *)&builtin_table_lock; }
    void *das_get_jit_table_unlock() { return (void *)&builtin_table_unlock; }
pull Bot pushed a commit to forksnd/daScript that referenced this pull request on Jun 12, 2026
each_kv/keys/values |> group_by chains declined at can_group_by_source (TableAdapter inherited the base false) and cascaded to tier-2: materialize a kv array, re-enter the array lane.

The group_by splice pattern is fully adapter-generic, so enabling tables is two overrides mirroring DecsAdapter:
- can_group_by() -> true
- build_group_by_adapter() -> fresh TableAdapter (clone tabExpr, fresh srcName, same elemType/lane). The upstream_join arm returns null: join |> group_by over a table lead stays on tier-2 (named deferred edge in LINQ_TO_TABLE.md; the stage-5 terminator-path join is untouched).

plan_group_by_core hands the bucket-fill body to wrap_source_loop, so the kv usage-pruner sees the whole accumulation (key expr + reducer updates + upstream where/select segments); a group key over kv.value.brand walks values(tab) alone. The non-copyable-values gate and the -const elemType scrub are inherited from adapter construction.

m7 groupby_* (ns/op): INTERP 144-201 -> 30-50 (count 163->31, ~5x); JIT 44-73 -> 8.4-11 (count 43.5->8.4, rides GaijinEntertainment#3100's inline slot walk). join_groupby_* unchanged (cascade). INTERP matrix flat elsewhere.

Tests: 8 new group_by sub-tests in test_linq_table_source.das (67/67 INTERP+JIT; AOT-compiled in test_aot): kv/keys/values lanes, upstream segments, having + trailing where + order, count terminator, empty table, fused-vs-tier-2 agreement.

Gates: full INTERP 11001/0, full AOT 10313/0, JIT linq 1971 + jit_tests 295, CI lint 0, Sphinx latex+html -W clean + pdflatex pass 1, formatter clean.

results.md re-swept; linq_fold_patterns.rst table row + new group_by pattern row; LINQ_TO_TABLE.md stage 7 status + findings.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
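The two-override pattern in this commit message can be pictured with a rough C++ analogue. The real adapters live in daslib and are written in daScript; the `SourceAdapter` base class, the `GroupByPlan` enum, the `plan_group_by` function and the way the fresh source name is formed are all hypothetical here. The sketch only shows how a base-class default of "false" sends a source to tier-2, and how overriding two hooks moves it to the fused lane while the join arm still declines.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base: by default a source cannot feed a fused group_by.
struct SourceAdapter {
    virtual ~SourceAdapter() = default;
    virtual bool can_group_by() const { return false; }
    virtual std::unique_ptr<SourceAdapter>
    build_group_by_adapter(bool /*upstream_join*/) const { return nullptr; }
};

// Table source: opts in and builds a fresh adapter for the bucket-fill loop.
struct TableAdapter : SourceAdapter {
    std::string tabExpr;
    std::string srcName;
    std::string elemType;
    int lane = 0;

    bool can_group_by() const override { return true; }

    std::unique_ptr<SourceAdapter>
    build_group_by_adapter(bool upstream_join) const override {
        if (upstream_join) return nullptr;  // join |> group_by stays on tier-2
        auto fresh = std::make_unique<TableAdapter>();
        fresh->tabExpr = tabExpr;           // clone the table expression
        fresh->srcName = srcName + "_gb";   // assumed fresh-name scheme
        fresh->elemType = elemType;         // same element type
        fresh->lane = lane;                 // same lane
        return std::unique_ptr<SourceAdapter>(std::move(fresh));
    }
};

enum class GroupByPlan { Fused, Tier2 };

// Hypothetical planner: fuse only when the adapter opts in and can be built.
inline GroupByPlan plan_group_by(const SourceAdapter &src, bool upstream_join) {
    if (!src.can_group_by()) return GroupByPlan::Tier2;
    return src.build_group_by_adapter(upstream_join) ? GroupByPlan::Fused
                                                     : GroupByPlan::Tier2;
}

inline void demo_group_by_plan() {
    TableAdapter tab;
    tab.tabExpr = "tab";
    tab.srcName = "src";
    assert(plan_group_by(tab, false) == GroupByPlan::Fused);
    assert(plan_group_by(tab, true) == GroupByPlan::Tier2);
    SourceAdapter base;
    assert(plan_group_by(base, false) == GroupByPlan::Tier2);
}
```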
What

For-loops over `keys(tab)` / `values(tab)` (including every fused linq kv chain emitted by `_fold`'s `TableAdapter`) previously compiled, under JIT, to a heap-allocated C++ `TableIterator` plus a `first`/`next` call per element per lane, so the m7 JIT column sat at roughly INTERP level. This PR compiles them to an inline slot walk instead:

- `jit_table_lock` once at `first()`, header loaded once (CAPACITY / SIZE / HASHES / KEYS-or-DATA),
- `ctrl[slot] > CTRL_TOMBSTONE`: workhorse keys are open-addressed at every capacity since #3025 (table: non-string keys open-addressed; full inline JIT find/at for workhorse + string keys), so the walk is a flat 1-byte control scan with no regime branch,
- `close()` re-checks the data base against its origin (catches modified-during-iteration on shared/hopeless tables that bypass the lock, `"table was modified during iteration"`), then `jit_table_unlock`.

The source call is suppressed via `skipCall` (same mechanism as `count()`), so no iterator is ever allocated. String / non-workhorse keys keep the generic C++ iterator: they have per-slot liveness regimes (`tableLiveSlot`) the flat scan can't honor.

Detection note: the daslib `keys`/`values` generics instantiate into the compiling module under compiler-generated names (`builtin` + backtick-mangled), so the match is on the name prefix, not `name == "keys" && module == "$"`; the latter never fires for generic instances.

`LLVM_JIT_CODEGEN_VERSION` bumped 0x25 → 0x26 to invalidate cached dlls.

Numbers

benchmarks/sql m7 (table) JIT column, ns/op, before → after (results.md re-swept in this PR):

- `count_aggregate` / `sum_aggregate`
- `chained_where`
- `select_where`
- `last_match`
- `join_probe`
- `reverse_take`
- `point_lookup_scan`

`groupby_*` stays ~44: its cost is the tier-2 group cascade, not source iteration. INTERP matrix flat within noise (interpreter path untouched).

Tests

New `tests/jit_tests/table_walk.das` (8/8 INTERP + JIT, with `is_jit_function` firing asserts): kv zip sums, keys-only / values-only lanes, by-ref value mutation, tombstones (erase mid-table), break → unlock → re-mutate, string-key fallback, int64 keys, insert-during-iteration panic.

Gates

`-exe` link of utils/lint): ok

🤖 Generated with Claude Code
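The detection note under "What" (match on the compiler-generated name prefix rather than on the plain function name) can be sketched in a few lines. The `builtin`keys`` prefix follows the form quoted in the first commit message; the corresponding `builtin`values`` prefix, the hash suffixes in the examples, and the `TableSource` / `classify_table_source` names are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <string_view>

enum class TableSource { None, Keys, Values };

// True when name begins with prefix (C++17-friendly starts_with).
inline bool has_prefix(std::string_view name, std::string_view prefix) {
    return name.substr(0, prefix.size()) == prefix;
}

// Classify a call target by its mangled-name prefix. A generic instance no
// longer carries the plain name "keys", so an exact-name test would miss it.
inline TableSource classify_table_source(std::string_view name) {
    constexpr std::string_view keysPrefix = "builtin`keys`";
    constexpr std::string_view valuesPrefix = "builtin`values`";  // assumed
    if (has_prefix(name, keysPrefix)) return TableSource::Keys;
    if (has_prefix(name, valuesPrefix)) return TableSource::Values;
    return TableSource::None;
}

inline void demo_detection() {
    // The hash suffixes below are made up for the example.
    assert(classify_table_source("builtin`keys`1a2b3c") == TableSource::Keys);
    assert(classify_table_source("builtin`values`9f00") == TableSource::Values);
    // The plain name is what the earlier exact-match check looked for.
    assert(classify_table_source("keys") == TableSource::None);
}
```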