-
Notifications
You must be signed in to change notification settings - Fork 0
L2 Compiler Deep Dive
The L2 compiler is JNode's SSA-based, optimizing JIT compiler for x86. It translates bytecode to a quad-based IR, applies optimization passes, performs linear-scan register allocation, and emits native x86 code.
The L2 compiler (org.jnode.vm.x86.compiler.l2) is the tier-2, optimizing compiler in JNode's JIT pipeline. While the L1 compiler translates bytecodes directly to native code with a virtual stack, the L2 compiler builds a full Intermediate Representation (IR), converts it to Static Single Assignment (SSA) form, runs multiple optimization passes, allocates registers via linear scan, and finally emits optimized x86 machine code. It is used for hot, performance-critical methods.
The pipeline lives entirely under core/src/core/org/jnode/vm/:
| Package | Role |
|---|---|
org.jnode.vm.compiler.ir |
Architecture-independent IR (quads, CFG, SSA) |
org.jnode.vm.x86.compiler.l2 |
x86-specific code generation and register pool |
org.jnode.assembler.x86 |
Low-level x86 byte encoder |
IRControlFlowGraph<T> (core/src/core/org/jnode/vm/compiler/ir/IRControlFlowGraph.java) is the top-level IR container. It holds an array of IRBasicBlock objects representing the method's control flow graph. It manages SSA construction/destruction, dominance computation, liveness analysis, and the optimization loop.
IRBasicBlock<T> represents a single basic block — a straight-line sequence of quads with no internal branches. It maintains:
- List of
Quadinstructions - Predecessor and successor block references
- Immediate dominator (
idominator) - Dominance frontier and set of dominated blocks
Quads are JNode's three-address IR instructions. Each quad carries an address (its quad index within the block), a bytecode address, and a flag for dead-code elimination.
| Quad Class | Purpose |
|---|---|
BinaryQuad<T> |
Binary ops: IADD, ISUB, IMUL, IDIV, FADD, etc. |
UnaryQuad<T> |
Unary ops: INEG, I2L, F2I, etc. |
ConstantRefAssignQuad<T> |
Load a constant literal |
VariableRefAssignQuad<T> |
Copy between variables |
PhiAssignQuad<T> |
Phi function (SSA merge) |
ConditionalBranchQuad<T> |
If-goto |
UnconditionalBranchQuad<T> |
Goto |
TableswitchQuad<T> |
TABLESWITCH bytecode |
LookupswitchQuad<T> |
LOOKUPSWITCH bytecode |
CallQuad, StaticCallQuad, VirtualCallQuad
|
Method calls |
ReturnQuad |
Method return |
The SSA form is constructed by IRControlFlowGraph.constructSSA(), which calls:
-
computeDominance()— Computes dominators, immediate dominators, dominance frontiers, and dominated-blocks sets for all basic blocks. -
placePhiFunctions()— Iterates variables defined in blocks with multiple predecessors; insertsPhiAssignQuadat the dominance frontier of those blocks. -
renameVariables(startBlock)— Depth-first traversal that renames each variable use to the current top of the SSA renaming stack, pushing a new SSA version on each assignment. Each variable thus gets a unique SSA version at each program point.
deconstrucSSA() removes phi functions and replaces each PhiAssignQuad with a sequence of VariableRefAssignQuad moves — one from each control flow predecessor. The move from the earliest block is selected as the canonical assignment for the phi LHS.
The full optimization sequence in X86Level2Compiler.doCompile():
1. cfg.constructSSA()
2. cfg.optimize() // Pass 2: per-quad constant folding, copy propagation
3. cfg.removeUnusedVars() // Remove variables with zero uses
4. cfg.deconstrucSSA() // Convert phi functions to moves
5. cfg.removeDefUseChains() // Copy propagation for single-use variables
6. cfg.fixupAddresses() // Reassign sequential addresses after SSA destruction
BinaryQuad.doPass2() checks if both operands are Constant. If so, it computes the result at compile time and replaces the quad with a ConstantRefAssignQuad. This runs on every binary op (add, sub, mul, div, and, or, xor, etc.).
Variable.simplify() is called on each referenced operand. It propagates copy assignments so that later quads use the original source instead of a chain of copies.
Quad.computeLiveness() performs a backward dataflow pass computing live variables at each quad. After SSA deconstruction, cfg.optimize(liveVariables) marks assignments dead if the LHS variable is not in the live set. Dead quads are skipped during code generation.
LiveRange (core/src/core/org/jnode/vm/compiler/ir/LiveRange.java) contains:
- The
Variablebeing allocated -
assignAddress— quad index where the variable is first defined -
lastUseAddress— quad index of the last use
Live ranges are built by iterating over quads in program order and tracking variable definitions and uses from the liveness analysis.
LinearScanAllocator.allocate() executes the classic linear-scan algorithm:
-
Sort all live ranges by
assignAddress(ascending). -
For each range in order:
- Call
expireOldRanges()— release registers for active ranges whoselastUseAddressis before the currentassignAddress. - Call
request(type)to get an available register fromX86RegisterPool. - If a register is available: assign it as a
RegisterLocation, add the range to the active list sorted by end address. - If no register: call
spillRange().
- Call
spillRange() uses an "spill the longest-lived" heuristic:
- LONG and DOUBLE types are always spilled (they require consecutive register pairs in 32-bit mode).
- If the active list is empty, spill the current range to the stack.
- Otherwise, find the active range with the latest
lastUseAddress(longest-lived).- If it ends after the current range: swap their locations (the current range gets the register, the longest-lived range is spilled).
- If it ends before or at the current range: spill the current range.
-
RegisterLocation<T>— holds an x86 register (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP, or XMM registers for floats). -
StackLocation<T>— holds a stack frame displacement[EBP + offset]. -
TopStackLocation<T>— for values at the JVM operand stack top.
X86CodeGenerator extends GenericX86CodeGenerator<X86Register> and dispatches each quad to an x86 instruction sequence via addressing-mode selection. Key classes:
| Class | Role |
|---|---|
X86Level2Compiler |
Entry point — orchestrates CFG building, SSA, optimization, allocation, codegen |
X86CodeGenerator |
x86-specific quad-to-bytes translation |
GenericX86CodeGenerator<X86Register> |
Architecture-generic codegen with register/displacement addressing |
FPX86CodeGenerator |
Floating-point instruction generation |
X86StackFrame |
Manages prologue/epilogue, local variable offsets |
X86RegisterPool |
x86 register availability tracking |
The stack frame layout follows the standard x86 convention: [EBP + offset] for locals, [EBP] for saved EBP, [EBP + 4] for return address, and [EBP - offset] for spilled locals.
- Phi function placement: Phi functions are placed at the dominance frontier, not at merge points directly. This requires accurate dominance computation first.
- LONG/DOUBLE spilling: These types cannot share a register pair across a spill boundary, so they are always assigned to stack locations in 32-bit mode.
- SSA deconstruction ordering: After phi removal, the move from the earliest predecessor block is selected as the canonical assignment. This is intentional — it preserves the value from the entry path when control flow first reaches the phi block.
-
Dead code vs. unreachable code:
removeUnusedVars()removes variables (SSA versions) with zero uses, but does not remove unreachable basic blocks. Unreachable blocks are detected by liveness analysis in later passes. -
Register pool constraints:
X86RegisterPoolmanages caller-saved registers only. Callee-saved registers (EBX, ESI, EDI, EBP) are handled by the stack frame prologue/epilogue.
- JIT-Compilers — High-level overview of the L1/L2 pipeline
- L1-Compiler-Deep-Dive — The fast, non-optimizing compiler with VirtualStack
- Stack-Frame-Layout — x86 stack frame structure used during code generation
- VM-Magic — Compiler interaction with unboxed types
-
Code-Conventions —
@Inlineand@NoOptCompilePragmaannotations