perf: fully tail-recursive CPS executor with line tracking off heap #156
Merged
davydog187 merged 2 commits into main on Feb 27, 2026
Target A: Remove current_line/current_source from %State{} and thread
line as the 8th parameter to do_execute. A :source_line instruction now
updates a local variable only — no State struct allocation on the heap.
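
A minimal sketch of the Target A idea, assuming a toy instruction set. `LineDemo`, `run/2`, the `:incr` and `:fail` instructions, and the map-based state are hypothetical stand-ins, not the project's actual executor. It only illustrates how a `:source_line` instruction can rebind a function argument instead of producing a new state value:

```elixir
defmodule LineDemo do
  # Hypothetical toy executor: `line` is an ordinary function argument,
  # so :source_line rebinds it instead of building a new state value.
  def run(instructions, state), do: do_execute(instructions, state, 0)

  # Out of instructions: return the state and the last line seen.
  defp do_execute([], state, line), do: {state, line}

  # :source_line changes only the local `line`; `state` passes through untouched.
  defp do_execute([{:source_line, n} | rest], state, _line),
    do: do_execute(rest, state, n)

  # An instruction that really does change state (here, a counter).
  defp do_execute([:incr | rest], state, line),
    do: do_execute(rest, %{state | count: state.count + 1}, line)

  # Errors can still report the current line because it is in scope.
  defp do_execute([:fail | _rest], _state, line),
    do: {:error, "runtime error at line #{line}"}
end
```

With these assumptions, `LineDemo.run([{:source_line, 3}, :incr, {:source_line, 4}, :incr], %{count: 0})` returns `{%{count: 2}, 4}`: the state is rebuilt only by the two `:incr` instructions, never by `:source_line`.
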
Target B: Convert do_execute to a fully tail-recursive CPS loop with
two new parameters: cont (continuation stack) and frames (call frame
stack). Key changes:
- :call for Lua closures pushes a frame onto `frames` and tail-calls
  the callee, so Erlang stack depth is now O(1) regardless of Lua
  recursion depth.
- :test/:test_and/:test_or push `rest` onto `cont` instead of
  recursing. Eliminates non-tail calls for every if/else branch, and
  removes the O(N) list concat (++) in test_and/test_or.
- All loop instructions (while/repeat/numeric_for/generic_for) use
  synthetic CPS continuation entries so break and return work correctly
  at any nesting depth without Erlang stack growth.
- :break scans `cont` for a {:loop_exit, _} marker instead of
  returning a {:break, regs, state} sentinel tuple.
- :return/:return_vararg delegate to new do_frame_return/6 which
  restores caller context from a frame entry.
- Native function calls handled inline via continue_after_call/11.
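
The `cont` and `frames` mechanics above can be sketched with a toy interpreter. Everything here is an assumption made for illustration: the module name `CpsDemo`, the single-integer accumulator standing in for registers, and the `{:add, n}`, `{:test_pos, then, else}`, `{:call, body}` instruction shapes are not the project's real instruction set. What it does show is that every clause ends in a tail call, with pending work kept in two explicit lists rather than on the Erlang stack:

```elixir
defmodule CpsDemo do
  # Hypothetical toy CPS executor: exec(instructions, acc, cont, frames).
  # `cont` holds instruction lists to resume after a branch body;
  # `frames` holds the caller's {rest, cont} for each active call.
  def run(instructions, acc \\ 0), do: exec(instructions, acc, [], [])

  # Current block finished: resume the most recently saved continuation.
  defp exec([], acc, [next | cont], frames), do: exec(next, acc, cont, frames)
  # Callee ran off the end of its body: treat it as an implicit return.
  defp exec([], acc, [], [_ | _] = frames), do: exec([:return], acc, [], frames)
  # Nothing left anywhere: the program is done.
  defp exec([], acc, [], []), do: acc

  defp exec([{:add, n} | rest], acc, cont, frames),
    do: exec(rest, acc + n, cont, frames)

  # Branch: push `rest` onto `cont` and tail-call into the chosen body.
  defp exec([{:test_pos, then_body, else_body} | rest], acc, cont, frames) do
    body = if acc > 0, do: then_body, else: else_body
    exec(body, acc, [rest | cont], frames)
  end

  # Call: save the caller's context as a frame, start the callee with an empty cont.
  defp exec([{:call, body} | rest], acc, cont, frames),
    do: exec(body, acc, [], [{rest, cont} | frames])

  # Return: discard the callee's cont and restore the caller's context.
  defp exec([:return | _], acc, _cont, [{rest, cont} | frames]),
    do: exec(rest, acc, cont, frames)

  # Return at top level ends the program.
  defp exec([:return | _], acc, _cont, []), do: acc
end
```

For example, `CpsDemo.run([{:add, 1}, {:call, [{:test_pos, [{:add, 10}, :return, {:add, 100}], [{:add, -5}]}, {:add, 1000}]}, {:add, 2}])` returns `13`: the `:return` inside the branch drops the callee's pending continuation (skipping both `{:add, 100}` and `{:add, 1000}`) and resumes the caller at `{:add, 2}`.
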
Expected outcome: memory/iter ~2-3x lower (eliminating ~3-4 State
allocations per :source_line and all intermediate register tuples held
by Erlang frames), Erlang stack depth O(1) instead of O(call depth).
All 1,273 tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Target A: Removes `current_line` and `current_source` from `%State{}`. These are now threaded as a plain integer `line` parameter through `do_execute/8`. A `:source_line` instruction no longer allocates a new State struct; it updates a local variable only.

Target B: Converts `do_execute` to a fully tail-recursive CPS loop with two new parameters, `cont` (continuation stack) and `frames` (call frame stack):

- `:call` for Lua closures pushes a frame onto `frames` and tail-calls the callee. Erlang stack depth is O(1) regardless of Lua recursion depth.
- `:test`/`:test_and`/`:test_or` push `rest` onto `cont` instead of recursing into the body. This also eliminates the O(N) `++` list concat in `test_and`/`test_or`.
- Loop instructions (while/repeat/numeric_for/generic_for) use synthetic CPS continuation entries (`{:cps_while_test, ...}`, `{:cps_while_body, ...}`, etc.) so `break` and `return` work correctly at any nesting depth.
- `:break` scans `cont` for a `{:loop_exit, _}` marker; there is no more `{:break, regs, state}` sentinel tuple.
- `do_frame_return/6` restores caller context (registers, upvalues, proto, cont) from a saved frame on function return.

Test plan

- `break` inside `if` inside `while` exits the correct loop (via `{:loop_exit, _}` marker in `cont`)
- `return f()` in tail-call position (`result_count == -1`) chains through `do_frame_return`
- `mix run benchmarks/fibonacci.exs` to confirm memory reduction vs baseline (8.07 GB → expected < 2.5 GB)

🤖 Generated with Claude Code
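
As an illustration of the first test-plan item, here is a hedged sketch of how a `break` could locate its loop by scanning the continuation stack. `BreakDemo` and `unwind_to_loop_exit/1` are hypothetical names, and the payloads carried by `{:loop_exit, _}` and `{:cps_while_body, _}` are assumed for the example; only the marker names come from the description above.

```elixir
defmodule BreakDemo do
  # Hypothetical helper: drop continuation entries until the nearest
  # {:loop_exit, rest} marker, then hand back the code to resume and
  # the continuation stack that remains below the marker.
  def unwind_to_loop_exit([{:loop_exit, rest} | cont]), do: {:ok, rest, cont}
  def unwind_to_loop_exit([_entry | cont]), do: unwind_to_loop_exit(cont)

  # No enclosing loop on the stack (assumed to be an error in this sketch).
  def unwind_to_loop_exit([]), do: :error
end

# `break` inside `if` inside an inner `while` inside an outer `while`:
cont = [
  [:after_if],                    # pushed by the `if` (the `rest` after the test)
  {:cps_while_body, :inner},      # inner loop bookkeeping (assumed shape)
  {:loop_exit, [:after_inner]},   # where the inner loop resumes on exit
  {:cps_while_body, :outer},
  {:loop_exit, [:after_outer]}
]

{:ok, resume, remaining} = BreakDemo.unwind_to_loop_exit(cont)
```

Under these assumptions `resume` is `[:after_inner]` and `remaining` still contains the outer loop's entries, so only the innermost loop is exited no matter how many `if` continuations sit above its marker.
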