
L1 Compiler Deep Dive

opencode-agent[bot] edited this page May 9, 2026 · 1 revision

The L1 compiler is a fast, non-optimizing JIT that translates bytecode to native x86 code using a delayed-emission strategy within basic blocks.

Overview

JNode's L1 compiler (org.jnode.vm.x86.compiler.l1a.X86Level1ACompiler) is the default entry-point compiler. It prioritizes compilation speed over code quality by:

  • Emulating the JVM operand stack with a VirtualStack that holds values in registers as long as possible
  • Directly translating bytecode operations without an IR
  • Using simple greedy register allocation
  • Applying lightweight optimizations via OptimizingBytecodeVisitor (inlining, store-load elimination, constant folding)

Key Components

Class                      Role
X86Level1ACompiler         Entry point; creates X86BytecodeVisitor wrapped in OptimizingBytecodeVisitor
X86BytecodeVisitor         Main translator; handles all bytecode opcodes
VirtualStack               Delayed-emission stack holding Item instances
Item                       Abstract base for stack values (CONSTANT, GPR, XMM, FPUSTACK, STACK, LOCAL)
WordItem                   Single-register items (INT, FLOAT, REFERENCE)
DoubleWordItem             Two-register items (LONG, DOUBLE) on 32-bit; single register on 64-bit
X86RegisterPool            GPR and XMM register pools with priority-based allocation
OptimizingBytecodeVisitor  Lightweight optimizations: inlining, store-load elimination, constant folding

VirtualStack: Delayed Emission

The VirtualStack (VirtualStack.java:35) is the core abstraction. Rather than immediately emitting instructions for every bytecode, values are held as Item objects on a virtual stack:

// Values stay in registers until absolutely necessary;
// pop/push only move Item descriptors, no native code is emitted here.
Item pop() { tos--; return stack[tos]; }
void push(Item item) { stack[tos++] = item; }

This allows:

  • Keeping values in GPR registers across multiple operations
  • Avoiding redundant loads/stores
  • Letting the compiler optimize operand ordering
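The mechanism can be sketched as follows. This is a minimal illustration of the delayed-emission idea, not JNode's actual classes: bytecode handlers push Item descriptors onto the virtual stack for free, and native instructions are only "emitted" (counted here) when a value must materialize in a register.

```java
public class DelayedEmission {
    enum Kind { CONSTANT, GPR, STACK }

    static class Item {
        Kind kind;
        Item(Kind k) { kind = k; }
    }

    int emitted = 0;               // native instructions "emitted" so far
    Item[] vstack = new Item[16];  // the virtual operand stack
    int tos = 0;

    void push(Item it) { vstack[tos++] = it; }   // no code emitted
    Item pop() { return vstack[--tos]; }         // no code emitted

    // Force an item into a register: only non-GPR items cost a load.
    Item load(Item it) {
        if (it.kind != Kind.GPR) { emitted++; it.kind = Kind.GPR; }
        return it;
    }

    // Translate iadd: at most two loads and one ADD, zero push/pop traffic.
    void iadd() {
        Item b = load(pop());
        Item a = load(pop());
        emitted++;                  // ADD a, b
        push(a);                    // result stays in a's register
    }
}
```

Translating `iconst; iconst; iadd` this way costs three instructions instead of the push/pop sequence a naive stack-machine translation would produce.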

Item State Machine

Each value transitions through states tracked by Item.Kind:

CONSTANT ──load()──> GPR ──push()──> STACK
                      ↑                │
                      └────load()──────┘

Key transition methods:

  • load(ec): Bring value into a GPR (requests from pool, spills if necessary)
  • push(ec): Emit code to push value onto the actual stack (releases GPR)
  • spill(ec, reg): Spill a GPR to make room (transfers to STACK kind)
  • release(ec): Free resources without emission
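The transitions above can be sketched as a small state machine. The method names follow the list; the internals are deliberately simplified and are not JNode's actual implementation.

```java
public class ItemSketch {
    enum Kind { CONSTANT, GPR, STACK, RELEASED }

    Kind kind;
    ItemSketch(Kind initial) { kind = initial; }

    // load(): bring the value into a GPR, emitting a load if needed.
    void load() {
        if (kind == Kind.CONSTANT || kind == Kind.STACK) kind = Kind.GPR;
    }

    // push(): emit a push onto the real stack and give up the register.
    void push() {
        kind = Kind.STACK;
    }

    // spill(): same destination as push(), but initiated by the allocator
    // when another item needs this register.
    void spill() {
        if (kind == Kind.GPR) kind = Kind.STACK;
    }

    // release(): drop the value without emitting any code.
    void release() {
        kind = Kind.RELEASED;
    }
}
```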

Register Usage

GPR Pools

32-bit (GPRs32):

Allocation priority (tried first to last): EBX, ESI, ECX, EDX, EAX

Reserved: EDI (statics pointer)

64-bit (GPRs64):

Allocation priority (tried first to last): R15, R14, R13, R12, R10, R9, R8, RSI, RBX, RCX, RDX, RAX

Reserved: RDI (statics), R12 (VmProcessor)
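Priority-based allocation can be sketched as below, using the 32-bit register names listed above. The request/spill mechanics are simplified and the class is illustrative, not JNode's X86RegisterPool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GprPoolSketch {
    // Insertion order encodes allocation priority: EBX is tried first,
    // EAX last, keeping EAX free for its return-value/implicit-operand roles.
    private final Map<String, Boolean> inUse = new LinkedHashMap<>();

    public GprPoolSketch() {
        for (String r : new String[] {"EBX", "ESI", "ECX", "EDX", "EAX"})
            inUse.put(r, false);
        // EDI is reserved for the statics pointer and never enters the pool.
    }

    // Return the highest-priority free register, or null if all are taken
    // (the real compiler would spill an item to the stack instead).
    public String request() {
        for (Map.Entry<String, Boolean> e : inUse.entrySet())
            if (!e.getValue()) { e.setValue(true); return e.getKey(); }
        return null;
    }

    public void release(String reg) { inUse.put(reg, false); }
}
```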

XMM Pools

32-bit (XMMs32): XMM0–XMM7
64-bit (XMMs64): XMM0–XMM15
Both support FLOAT and DOUBLE.

Special Register Contracts

Register   Usage
EAX/RAX    Return value (int/reference); implicit operand for mul/div
ECX        Shift amount for ishl, lshr, etc.
EDX:EAX    32-bit long return value (EDX holds the high word)
EDI        Statics pointer (reserved)
R12        Current VmProcessor pointer (reserved)

Long/Double Handling

32-bit mode: LONG and DOUBLE occupy two registers (LSB + MSB pair). DoubleWordItem tracks both:

final X86Register.GPR32 getLsbRegister(EmitterContext ec) { return lsb; }
final X86Register.GPR32 getMsbRegister(EmitterContext ec) { return msb; }

64-bit mode: LONG and DOUBLE use a single GPR64 (e.g., RAX):

final X86Register.GPR64 getRegister(EmitterContext ec) { return reg; }

Long return values: 32-bit mode uses the EDX:EAX pair, 64-bit mode uses RAX.
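The split a DoubleWordItem tracks on 32-bit targets is plain bit arithmetic; the helper below (illustrative, not part of JNode) shows the LSB/MSB decomposition and its recombination.

```java
public class LongSplit {
    public static int lsb(long v) { return (int) v; }          // low 32 bits
    public static int msb(long v) { return (int) (v >>> 32); } // high 32 bits

    public static long combine(int msb, int lsb) {
        // Mask the LSB so its sign bit does not smear into the high word.
        return ((long) msb << 32) | (lsb & 0xFFFFFFFFL);
    }
}
```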

Optimization Passes

The L1 compiler has no IR and no graph-coloring allocator. Optimizations are limited:

  1. Register exploitation: Delayed emission keeps values in registers
  2. Constant folding: ioperation() folds IntItem constants at compile time
  3. Operand ordering: prepareForOperation() loads operands to favor register-first ordering
  4. Store-load elimination: OptimizingBytecodeVisitor emits dup when load immediately follows store to same local
  5. Method inlining: OptimizingBytecodeVisitor inlines small, final/private/static methods (max 32 bytecode instructions, no exception handlers)
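The constant-folding path (pass 2) can be sketched as follows. Names are illustrative, not JNode's exact API: when both operands of an iadd are compile-time integer constants, the handler computes the result immediately and emits no native code.

```java
public class FoldSketch {
    sealed interface Item permits IntConst, InGpr {}
    record IntConst(int value) implements Item {}
    record InGpr(String reg) implements Item {}

    // iadd handler: fold two constants, otherwise fall back to emitting ADD.
    public static Item iadd(Item a, Item b) {
        if (a instanceof IntConst ca && b instanceof IntConst cb)
            return new IntConst(ca.value() + cb.value()); // folded at compile time
        return new InGpr("EAX"); // stand-in for "emit ADD, result in a register"
    }
}
```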

Inlining conditions (canInline()):

!method.isNative() && !method.isAbstract() && !method.isSynchronized()
&& (method.isFinal() || method.isPrivate() || method.isStatic())
&& !declClass.isMagicType() && declClass.isAlwaysInitialized()
&& bc.getNoExceptionHandlers() == 0
&& (method.hasInlinePragma() || (inlineDepth < MAX_INLINE_DEPTH && bc.getLength() <= SIZE_LIMIT))

Gotchas

  1. Basic block boundaries: The vstack is reset at each basic block start. All pending items are flushed to the real stack. This limits register utilization across branches.

  2. Aliasing restriction: Modifying a value that is still on the vstack is forbidden. When storing to a local, loadLocal() pins any aliased stack items into registers.

  3. ECX for shifts: Non-constant shift amounts must be in ECX. ishift() explicitly requests ECX.

  4. EAX/RAX for returns: Return values must be in specific registers. wreturn() and dwreturn() explicitly move values if needed.

  5. FPU stack discipline: FPU operations use a separate 8-slot FPU stack (FPUStack). Items must be on top before fxch or fstp. Incorrect ordering causes undefined behavior.

  6. 64-bit constant limitations: 64-bit instructions still take 32-bit immediates. Loading large constants requires MOV to register first.

  7. Operand stack validation: checkOperandStack mode verifies items are popped in LIFO order, catching stack corruption early.
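The LIFO check in gotcha 7 amounts to a shadow stack. A minimal sketch of the idea (assumed structure, not JNode's checkOperandStack code): record every pushed item and fail fast when a pop returns anything other than the most recently pushed one.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackChecker {
    private final Deque<Object> shadow = new ArrayDeque<>();

    public void onPush(Object item) { shadow.push(item); }

    // Throws if the popped item is not the one most recently pushed.
    public void onPop(Object item) {
        Object expected = shadow.pop();
        if (expected != item)
            throw new IllegalStateException("operand stack corruption: expected "
                    + expected + " but popped " + item);
    }
}
```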
