
L1 Compiler Deep Dive

opencode-agent[bot] edited this page May 9, 2026 · 1 revision

The L1 compiler is a fast, non-optimizing JIT that translates bytecode to native x86 code using a delayed-emission strategy within basic blocks.

Overview

JNode's L1 compiler (org.jnode.vm.x86.compiler.l1a.X86Level1ACompiler) is the default entry-point compiler. It prioritizes compilation speed over code quality by:

  • Emulating the JVM operand stack with a VirtualStack that holds values in registers as long as possible
  • Directly translating bytecode operations without an IR
  • Using simple greedy register allocation
  • Applying lightweight optimizations via OptimizingBytecodeVisitor (inlining, store-load elimination, constant folding)

Key Components

Class                      Role
X86Level1ACompiler         Entry point; creates X86BytecodeVisitor wrapped in OptimizingBytecodeVisitor
X86BytecodeVisitor         Main translator; handles all bytecode opcodes
VirtualStack               Delayed-emission stack holding Item instances
Item                       Abstract base for stack values (CONSTANT, GPR, XMM, FPUSTACK, STACK, LOCAL)
WordItem                   Single-register items (INT, FLOAT, REFERENCE)
DoubleWordItem             Two-register items (LONG, DOUBLE) on 32-bit; single register on 64-bit
X86RegisterPool            GPR and XMM register pools with priority-based allocation
OptimizingBytecodeVisitor  Lightweight optimizations: inlining, store-load elimination, constant folding

VirtualStack: Delayed Emission

The VirtualStack (VirtualStack.java:35) is the core abstraction. Rather than immediately emitting instructions for every bytecode, values are held as Item objects on a virtual stack:

// Values stay in registers until absolutely necessary;
// pop/push only move Item descriptors, no native code is emitted here.
Item pop() { tos--; return stack[tos]; }
void push(Item item) { stack[tos++] = item; }

This allows:

  • Keeping values in GPR registers across multiple operations
  • Avoiding redundant loads/stores
  • Letting the compiler optimize operand ordering
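The mechanism can be sketched as follows. This is a minimal illustration of the delayed-emission idea, not JNode's actual classes: bytecode handlers push Item descriptors onto the virtual stack for free, and native instructions are only "emitted" (counted here) when a value must materialize in a register.

```java
public class DelayedEmission {
    enum Kind { CONSTANT, GPR, STACK }

    static class Item {
        Kind kind;
        Item(Kind k) { kind = k; }
    }

    int emitted = 0;               // native instructions "emitted" so far
    Item[] vstack = new Item[16];  // the virtual operand stack
    int tos = 0;

    void push(Item it) { vstack[tos++] = it; }   // no code emitted
    Item pop() { return vstack[--tos]; }         // no code emitted

    // Force an item into a register: only non-GPR items cost a load.
    Item load(Item it) {
        if (it.kind != Kind.GPR) { emitted++; it.kind = Kind.GPR; }
        return it;
    }

    // Translate iadd: at most two loads and one ADD, zero push/pop traffic.
    void iadd() {
        Item b = load(pop());
        Item a = load(pop());
        emitted++;                  // ADD a, b
        push(a);                    // result stays in a's register
    }
}
```

Translating `iconst; iconst; iadd` this way costs three instructions instead of the push/pop sequence a naive stack-machine translation would produce.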

Item State Machine

Each value transitions through states tracked by Item.Kind:

CONSTANT ──load()──> GPR ──push()──> STACK
                      ↑                │
                      └────load()──────┘

Key transition methods:

  • load(ec): Bring value into a GPR (requests from pool, spills if necessary)
  • push(ec): Emit code to push value onto the actual stack (releases GPR)
  • spill(ec, reg): Spill a GPR to make room (transfers to STACK kind)
  • release(ec): Free resources without emission
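The transitions above can be sketched as a small state machine. The method names follow the list; the internals are deliberately simplified and are not JNode's actual implementation.

```java
public class ItemSketch {
    enum Kind { CONSTANT, GPR, STACK, RELEASED }

    Kind kind;
    ItemSketch(Kind initial) { kind = initial; }

    // load(): bring the value into a GPR, emitting a load if needed.
    void load() {
        if (kind == Kind.CONSTANT || kind == Kind.STACK) kind = Kind.GPR;
    }

    // push(): emit a push onto the real stack and give up the register.
    void push() {
        kind = Kind.STACK;
    }

    // spill(): same destination as push(), but initiated by the allocator
    // when another item needs this register.
    void spill() {
        if (kind == Kind.GPR) kind = Kind.STACK;
    }

    // release(): drop the value without emitting any code.
    void release() {
        kind = Kind.RELEASED;
    }
}
```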

Register Usage

GPR Pools

32-bit (GPRs32):

Allocation priority (tried first to last): EBX, ESI, ECX, EDX, EAX

Reserved: EDI (statics pointer)

64-bit (GPRs64):

Allocation priority (tried first to last): R15, R14, R13, R12, R10, R9, R8, RSI, RBX, RCX, RDX, RAX

Reserved: RDI (statics), R12 (VmProcessor)
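Priority-based allocation can be sketched as below, using the 32-bit register names listed above. The request/spill mechanics are simplified and the class is illustrative, not JNode's X86RegisterPool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GprPoolSketch {
    // Insertion order encodes allocation priority: EBX is tried first,
    // EAX last, keeping EAX free for its return-value/implicit-operand roles.
    private final Map<String, Boolean> inUse = new LinkedHashMap<>();

    public GprPoolSketch() {
        for (String r : new String[] {"EBX", "ESI", "ECX", "EDX", "EAX"})
            inUse.put(r, false);
        // EDI is reserved for the statics pointer and never enters the pool.
    }

    // Return the highest-priority free register, or null if all are taken
    // (the real compiler would spill an item to the stack instead).
    public String request() {
        for (Map.Entry<String, Boolean> e : inUse.entrySet())
            if (!e.getValue()) { e.setValue(true); return e.getKey(); }
        return null;
    }

    public void release(String reg) { inUse.put(reg, false); }
}
```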

XMM Pools

32-bit (XMMs32): XMM0–XMM7
64-bit (XMMs64): XMM0–XMM15
Both support FLOAT and DOUBLE.

Special Register Contracts

Register   Usage
EAX/RAX    Return value (int/reference); implicit operand for mul/div
ECX        Shift amount for ishl, lshr, etc.
EDX:EAX    32-bit long return value (EDX holds the high word)
EDI        Statics pointer (reserved)
R12        Current VmProcessor pointer (reserved)

Long/Double Handling

32-bit mode: LONG and DOUBLE occupy two registers (LSB + MSB pair). DoubleWordItem tracks both:

final X86Register.GPR32 getLsbRegister(EmitterContext ec) { return lsb; }
final X86Register.GPR32 getMsbRegister(EmitterContext ec) { return msb; }

64-bit mode: LONG and DOUBLE use a single GPR64 (e.g., RAX):

final X86Register.GPR64 getRegister(EmitterContext ec) { return reg; }

Long return values: 32-bit mode uses the EDX:EAX pair, 64-bit mode uses RAX.
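The split a DoubleWordItem tracks on 32-bit targets is plain bit arithmetic; the helper below (illustrative, not part of JNode) shows the LSB/MSB decomposition and its recombination.

```java
public class LongSplit {
    public static int lsb(long v) { return (int) v; }          // low 32 bits
    public static int msb(long v) { return (int) (v >>> 32); } // high 32 bits

    public static long combine(int msb, int lsb) {
        // Mask the LSB so its sign bit does not smear into the high word.
        return ((long) msb << 32) | (lsb & 0xFFFFFFFFL);
    }
}
```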

Optimization Passes

The L1 compiler has no IR and no graph-coloring allocator. Optimizations are limited:

  1. Register exploitation: Delayed emission keeps values in registers
  2. Constant folding: ioperation() folds IntItem constants at compile time
  3. Operand ordering: prepareForOperation() loads operands to favor register-first ordering
  4. Store-load elimination: OptimizingBytecodeVisitor emits dup when load immediately follows store to same local
  5. Method inlining: OptimizingBytecodeVisitor inlines small, final/private/static methods (max 32 bytecode instructions, no exception handlers)
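The constant-folding path (pass 2) can be sketched as follows. Names are illustrative, not JNode's exact API: when both operands of an iadd are compile-time integer constants, the handler computes the result immediately and emits no native code.

```java
public class FoldSketch {
    sealed interface Item permits IntConst, InGpr {}
    record IntConst(int value) implements Item {}
    record InGpr(String reg) implements Item {}

    // iadd handler: fold two constants, otherwise fall back to emitting ADD.
    public static Item iadd(Item a, Item b) {
        if (a instanceof IntConst ca && b instanceof IntConst cb)
            return new IntConst(ca.value() + cb.value()); // folded at compile time
        return new InGpr("EAX"); // stand-in for "emit ADD, result in a register"
    }
}
```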

Inlining conditions (canInline()):

!method.isNative() && !method.isAbstract() && !method.isSynchronized()
&& (method.isFinal() || method.isPrivate() || method.isStatic())
&& !declClass.isMagicType() && declClass.isAlwaysInitialized()
&& bc.getNoExceptionHandlers() == 0
&& (method.hasInlinePragma() || (inlineDepth < MAX_INLINE_DEPTH && bc.getLength() <= SIZE_LIMIT))

Gotchas

  1. Basic block boundaries: The vstack is reset at each basic block start. All pending items are flushed to the real stack. This limits register utilization across branches.

  2. Aliasing restriction: Modifying a value that is still on the vstack is forbidden. When storing to a local, loadLocal() pins any aliased stack items into registers.

  3. ECX for shifts: Non-constant shift amounts must be in ECX. ishift() explicitly requests ECX.

  4. EAX/RAX for returns: Return values must be in specific registers. wreturn() and dwreturn() explicitly move values if needed.

  5. FPU stack discipline: FPU operations use a separate 8-slot FPU stack (FPUStack). Items must be on top before fxch or fstp. Incorrect ordering causes undefined behavior.

  6. 64-bit constant limitations: 64-bit instructions still take 32-bit immediates. Loading large constants requires MOV to register first.

  7. Operand stack validation: checkOperandStack mode verifies items are popped in LIFO order, catching stack corruption early.
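The LIFO check in gotcha 7 amounts to a shadow stack. A minimal sketch of the idea (assumed structure, not JNode's checkOperandStack code): record every pushed item and fail fast when a pop returns anything other than the most recently pushed one.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackChecker {
    private final Deque<Object> shadow = new ArrayDeque<>();

    public void onPush(Object item) { shadow.push(item); }

    // Throws if the popped item is not the one most recently pushed.
    public void onPop(Object item) {
        Object expected = shadow.pop();
        if (expected != item)
            throw new IllegalStateException("operand stack corruption: expected "
                    + expected + " but popped " + item);
    }
}
```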
