perf: fully tail-recursive CPS executor with line tracking off heap by davydog187 · Pull Request #156 · tv-labs/lua

davydog187 · 2026-02-27T17:52:58Z

Summary

Target A: Removes current_line and current_source from %State{}. These are now threaded as a plain integer line parameter through do_execute/8. A :source_line instruction no longer allocates a new State struct — it updates a local variable only.
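To illustrate Target A, here is a minimal sketch of threading the line number as a plain function argument instead of storing it in the state struct. The module name, the toy instruction set, and the single-field State struct are all hypothetical; the real do_execute takes more parameters and handles many more instructions.

```elixir
defmodule LineThreadingSketch do
  # Hypothetical miniature state; the real %State{} has many more fields.
  defmodule State do
    defstruct globals: %{}
  end

  # Executes a toy instruction list. `line` is an ordinary integer argument,
  # so :source_line only rebinds a local and never rebuilds the struct.
  def run(instructions, state, line \\ 0)

  def run([], state, line), do: {:ok, state, line}

  # No struct update here: the new line is simply passed to the next call.
  def run([{:source_line, new_line} | rest], state, _line) do
    run(rest, state, new_line)
  end

  # Only instructions that really change state allocate a new struct.
  def run([{:set_global, name, value} | rest], state, line) do
    run(rest, %State{state | globals: Map.put(state.globals, name, value)}, line)
  end

  # The threaded line is still available for error reporting.
  def run([{:error, message} | _rest], _state, line) do
    {:error, "line #{line}: #{message}"}
  end
end
```

In this sketch, running `[{:source_line, 7}, {:error, "boom"}]` yields `{:error, "line 7: boom"}`, so the line information remains available to error paths even though it never lives in the struct.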
Target B: Converts do_execute to a fully tail-recursive CPS loop with two new parameters — cont (continuation stack) and frames (call frame stack):
- :call for Lua closures pushes a frame onto frames and tail-calls the callee. Erlang stack depth is O(1) regardless of Lua recursion depth.
- :test / :test_and / :test_or push rest onto cont instead of recursing into the body. Also eliminates the O(N) ++ list concat in test_and/test_or.
- All loop instructions use synthetic CPS continuation entries ({:cps_while_test, ...}, {:cps_while_body, ...}, etc.) so break and return work correctly at any nesting depth.
- :break scans cont for a {:loop_exit, _} marker — no more {:break, regs, state} sentinel tuple.
- New do_frame_return/6 restores caller context (registers, upvalues, proto, cont) from a saved frame on function return.
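The continuation-stack idea in the list above can be sketched roughly as follows. This is a toy interpreter under stated assumptions, not the PR's code: the module name, the :emit, :test and :repeat instructions, the precomputed boolean condition, and the {:cps_repeat, ...} entry are invented for illustration. Only the {:loop_exit, _} marker and the practice of pushing rest onto cont come from the description.

```elixir
defmodule CpsSketch do
  # run(instructions, cont, out): every recursive call is in tail position.
  def run(instructions, cont \\ [], out \\ [])

  # Current instruction list exhausted: resume from the continuation stack.
  def run([], [], out), do: Enum.reverse(out)
  def run([], [{:loop_exit, rest} | cont], out), do: run(rest, cont, out)
  def run([], [{:cps_repeat, 0, _body} | cont], out), do: run([], cont, out)

  def run([], [{:cps_repeat, n, body} | cont], out),
    do: run(body, [{:cps_repeat, n - 1, body} | cont], out)

  def run([], [rest | cont], out) when is_list(rest), do: run(rest, cont, out)

  def run([{:emit, value} | rest], cont, out), do: run(rest, cont, [value | out])

  # Conditional: push the instructions after the `if` onto cont, then run the body.
  def run([{:test, true, body} | rest], cont, out), do: run(body, [rest | cont], out)
  def run([{:test, false, _body} | rest], cont, out), do: run(rest, cont, out)

  # Loop: a synthetic entry drives iteration; the marker records where to resume.
  def run([{:repeat, n, body} | rest], cont, out),
    do: run([], [{:cps_repeat, n, body}, {:loop_exit, rest} | cont], out)

  # Break: drop continuation entries up to the nearest loop exit marker.
  def run([:break | _rest], cont, out) do
    {rest, outer} = unwind_to_loop_exit(cont)
    run(rest, outer, out)
  end

  # Raises FunctionClauseError if :break is used outside any loop.
  defp unwind_to_loop_exit([{:loop_exit, rest} | outer]), do: {rest, outer}
  defp unwind_to_loop_exit([_entry | cont]), do: unwind_to_loop_exit(cont)
end
```

Because a break only searches cont for the nearest marker, it exits the innermost loop no matter how many conditional bodies sit between the break and the loop, and nothing has to be returned up the Erlang stack to signal it.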

Test plan

- All 1,273 existing tests pass with 0 failures
- break inside if inside while exits the correct loop (via {:loop_exit, _} marker in cont)
- Recursive functions (factorial, fibonacci) return correct results
- return f() in tail-call position (result_count == -1) chains through do_frame_return
- Multi-return, vararg, closures, and pcall are all covered by the existing test suite
- Run mix run benchmarks/fibonacci.exs to confirm memory reduction vs baseline (8.07 GB → expected < 2.5 GB)

🤖 Generated with Claude Code

Target A: Remove current_line/current_source from %State{} and thread line as the 8th parameter to do_execute. A :source_line instruction now updates a local variable only, with no State struct allocation on the heap.

Target B: Convert do_execute to a fully tail-recursive CPS loop with two new parameters: cont (continuation stack) and frames (call frame stack).

Key changes:
- :call for Lua closures pushes a frame onto `frames` and tail-calls the callee. Erlang stack depth is now O(1) regardless of Lua recursion depth.
- :test/:test_and/:test_or push `rest` onto `cont` instead of recursing. This eliminates non-tail calls for every if/else branch and removes the O(N) list concat (++) in test_and/test_or.
- All loop instructions (while/repeat/numeric_for/generic_for) use synthetic CPS continuation entries, so break and return work correctly at any nesting depth without Erlang stack growth.
- :break scans `cont` for a {:loop_exit, _} marker instead of returning a {:break, regs, state} sentinel tuple.
- :return/:return_vararg delegate to the new do_frame_return/6, which restores caller context from a frame entry.
- Native function calls are handled inline via continue_after_call/11.

Expected outcome: memory/iter ~2-3x lower (eliminating ~3-4 State allocations per :source_line and all intermediate register tuples held by Erlang frames), Erlang stack depth O(1) instead of O(call depth).

All 1,273 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
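As a rough illustration of the frame-stack mechanism described in this commit message, the sketch below runs a non-tail-recursive toy function with an explicit frames list, so the interpreter loop itself only ever makes tail calls. Everything here is hypothetical: the module name, the four toy instructions, and the {rest, n} frame layout are assumptions made for the example. Per the description, the PR's frames also carry registers, upvalues, proto, and cont.

```elixir
defmodule FrameSketch do
  # protos: %{name => instruction list}; frames: list of {caller_rest, caller_n}.
  def call(protos, name, n), do: run(Map.fetch!(protos, name), n, 0, [], protos)

  # Falling off the end of a body acts as an implicit return.
  defp run([], _n, result, frames, protos), do: frame_return(result, frames, protos)

  defp run([{:return_if_zero, value} | _rest], 0, _result, frames, protos),
    do: frame_return(value, frames, protos)

  defp run([{:return_if_zero, _value} | rest], n, result, frames, protos),
    do: run(rest, n, result, frames, protos)

  # Call: save the caller's remaining instructions and local, then tail-call the callee.
  defp run([{:call, name, {:n_minus, k}} | rest], n, result, frames, protos),
    do: run(Map.fetch!(protos, name), n - k, result, [{rest, n} | frames], protos)

  defp run([:add_n_to_result | rest], n, result, frames, protos),
    do: run(rest, n, result + n, frames, protos)

  defp run([:return | _rest], _n, result, frames, protos),
    do: frame_return(result, frames, protos)

  # Return: with no saved frame, the program is done; otherwise restore the caller.
  defp frame_return(result, [], _protos), do: result

  defp frame_return(result, [{rest, n} | frames], protos),
    do: run(rest, n, result, frames, protos)
end
```

With `protos = %{sum: [{:return_if_zero, 0}, {:call, :sum, {:n_minus, 1}}, :add_n_to_result, :return]}`, the call `FrameSketch.call(protos, :sum, 3)` returns 6. Deep recursion grows the frames list on the heap rather than the Erlang call stack, which is the trade the commit message describes.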

davydog187 · 2026-02-27T17:54:06Z

 MIX_ENV=benchmark mix run benchmarks/fibonacci.exs                       
  Compiling 3 files (.ex)                                                  
  Compiling 1 file (.ex)                                                   
  Generated lua app                                                    
  Operating System: macOS                                              
  CPU Information: Apple M1 Pro                                        
  Number of Available Cores: 8                                         
  Available memory: 16 GB                                              
  Elixir 1.19.4                                                        
  Erlang 27.3.4.6                                                      
  JIT enabled: true                                                    
                                                                       
  Benchmark suite executing with the following configuration:          
  warmup: 2 s                                                          
  time: 10 s                                                           
  memory time: 1 s                                                     
  reduction time: 0 ns                                                 
  parallel: 1                                                          
  inputs: none specified                                               
  Estimated total run time: 52 s                                       
  Excluding outliers: false                                            
                                                                       
  Benchmarking C Lua (luaport) ...                                     
  Benchmarking lua (chunk) ...                                         
  Benchmarking lua (eval) ...                                          
  Benchmarking luerl ...                                               
  Calculating statistics...                                            
  Formatting results...                                                
                                                                       
  Name                      ips        average  deviation         median         99th %
  C Lua (luaport)        147.30      0.00679 s    ±49.92%      0.00651 s       0.0107 s
  luerl                    0.86         1.16 s    ±12.05%         1.07 s         1.40 s
  lua (eval)               0.68         1.47 s    ±10.79%         1.40 s         1.81 s
  lua (chunk)              0.65         1.54 s    ±23.37%         1.36 s         2.35 s
                                                                       
  Comparison:                                                          
  C Lua (luaport)        147.30                                        
  luerl                    0.86 - 170.61x slower +1.15 s               
  lua (eval)               0.68 - 216.51x slower +1.46 s               
  lua (chunk)              0.65 - 226.91x slower +1.53 s               
                                                                       
  Memory usage statistics:                                             
                                                                       
  Name               Memory usage                                      
  C Lua (luaport)      0.00000 GB                                      
  luerl                   2.45 GB - 15689106.52x memory usage +2.45 GB 
  lua (eval)              7.78 GB - 49713723.62x memory usage +7.78 GB 
  lua (chunk)             7.78 GB - 49712813.33x memory usage +7.78 GB 
                                                                       
  **All measurements for memory usage were the same**

fix

67f9c6f

davydog187 merged commit 13b2964 into main Feb 27, 2026
2 checks passed

davydog187 deleted the perf/cps-executor branch February 27, 2026 18:33

davydog187 merged 2 commits into main from perf/cps-executor
