-
Notifications
You must be signed in to change notification settings - Fork 0
L1 Compiler Deep Dive
The L1 compiler is a fast, non-optimizing JIT that translates bytecode to native x86 code using a delayed-emission strategy within basic blocks.
JNode's L1 compiler (org.jnode.vm.x86.compiler.l1a.X86Level1ACompiler) is the default entry-point compiler. It prioritizes compilation speed over code quality by:
- Emulating the JVM operand stack with a
VirtualStackthat holds values in registers as long as possible - Directly translating bytecode operations without an IR
- Using simple greedy register allocation
- Applying lightweight optimizations via
OptimizingBytecodeVisitor(inlining, store-load elimination, constant folding)
| Class | Role |
|---|---|
X86Level1ACompiler |
Entry point; creates X86BytecodeVisitor wrapped in OptimizingBytecodeVisitor
|
X86BytecodeVisitor |
Main translator; handles all bytecode opcodes |
VirtualStack |
Delayed-emission stack holding Item instances |
Item |
Abstract base for stack values (constant, GPR, XMM, FPUSTACK, STACK, LOCAL) |
WordItem |
Single-register items (INT, FLOAT, REFERENCE) |
DoubleWordItem |
Two-register items (LONG, DOUBLE) in 32-bit; single-register in 64-bit |
X86RegisterPool |
GPR and XMM register pools with priority-based allocation |
OptimizingBytecodeVisitor |
Lightweight optimizations: inlining, store-load, constant folding |
The VirtualStack (VirtualStack.java:35) is the core abstraction. Rather than immediately emitting instructions for every bytecode, values are held as Item objects on a virtual stack:
// Values stay in registers until absolutely necessary
Item pop() { tos--; return stack[tos]; }
void push(Item item) { stack[tos++] = item; }This allows:
- Keeping values in GPR registers across multiple operations
- Avoiding redundant loads/stores
- The compiler to optimize operand ordering
Each value transitions through states tracked by Item.Kind:
CONSTANT ──load()──> GPR ──push()──> STACK
↑
└──load()───┘
Key transition methods:
-
load(ec): Bring value into a GPR (requests from pool, spills if necessary) -
push(ec): Emit code to push value onto the actual stack (releases GPR) -
spill(ec, reg): Spill a GPR to make room (transfers to STACK kind) -
release(ec): Free resources without emission
32-bit (GPRs32):
Priority (high→low): EBX < ESI < ECX < EDX < EAX
Reserved: EDI (statics pointer)
64-bit (GPRs64):
Priority (high→low): R15 < R14 < R13 < R12 < R10 < R9 < R8 < RSI < RBX < RCX < RDX < RAX
Reserved: RDI (statics), R12 (VmProcessor)
32-bit (XMMs32): XMM0–XMM7
64-bit (XMMs64): XMM0–XMM15
Both support FLOAT and DOUBLE.
| Register | Usage |
|---|---|
| EAX/RAX | Returns (int/ref), implicit for add/sub operands |
| ECX | Shift amount for ishl, lshr
|
| EDX:EAXX | 32-bit long return value |
| EDI | Points to statics (reserved) |
| R12 | Points to current VmProcessor (reserved) |
32-bit mode: LONG and DOUBLE occupy two registers (LSB + MSB pair). DoubleWordItem tracks both:
final X86Register.GPR32 getLsbRegister(EmitterContext ec) { return lsb; }
final X86Register.GPR32 getMsbRegister(EmitterContext ec) { return msb; }64-bit mode: LONG and DOUBLE use a single GPR64 (e.g., RAX):
final X86Register.GPR64 getRegister(EmitterContext ec) { return reg; }Long return values: 32-bit uses EAX:EDX, 64-bit uses RAX.
The L1 compiler has no IR and no graph-coloring allocator. Optimizations are limited:
- Register exploitation: Delayed emission keeps values in registers
-
Constant folding:
ioperation()foldsIntItemconstants at compile time -
Operand ordering:
prepareForOperation()loads operands to favor register-first ordering -
Store-load elimination:
OptimizingBytecodeVisitoremitsdupwhen load immediately follows store to same local -
Method inlining:
OptimizingBytecodeVisitorinlines small, final/private/static methods (max 32 bytecode instructions, no exception handlers)
Inlining conditions (canInline()):
!method.isNative() && !method.isAbstract() && !method.isSynchronized()
&& (method.isFinal() || method.isPrivate() || method.isStatic())
&& !declClass.isMagicType() && declClass.isAlwaysInitialized()
&& bc.getNoExceptionHandlers() == 0
&& (method.hasInlinePragma() || (inlineDepth < MAX_INLINE_DEPTH && bc.getLength() <= SIZE_LIMIT))-
Basic block boundaries: The vstack is reset at each basic block start. All pending items are flushed to the real stack. This limits register utilization across branches.
-
Aliasing restriction: Modifying a value that is still on the vstack is forbidden. When storing to a local,
loadLocal()pins any aliased stack items into registers. -
ECX for shifts: Non-constant shift amounts must be in ECX.
ishift()explicitly requests ECX. -
EAX/RAX for returns: Return values must be in specific registers.
wreturn()anddwreturn()explicitly move values if needed. -
FPU stack discipline: FPU operations use a separate 8-slot FPU stack (
FPUStack). Items must be on top beforefxchorfstp. Incorrect ordering causes undefined behavior. -
64-bit constant limitations: 64-bit instructions still take 32-bit immediates. Loading large constants requires MOV to register first.
-
Operand stack validation:
checkOperandStackmode verifies items are popped in LIFO order, catching stack corruption early.
- JIT-Compilers — High-level compiler pipeline overview
- Stack-Frame-Layout — x86 stack frame structure
- VM-Magic — Magic annotations that interact with JIT