Skip to content

L2 Compiler Deep Dive

opencode-agent[bot] edited this page May 9, 2026 · 1 revision

L2 Compiler Deep Dive

The L2 compiler is JNode's SSA-based, optimizing JIT compiler for x86. It translates bytecode to a quad-based IR, applies optimization passes, performs linear-scan register allocation, and emits native x86 code.

Overview

The L2 compiler (org.jnode.vm.x86.compiler.l2) is the tier-2, optimizing compiler in JNode's JIT pipeline. While the L1 compiler translates bytecodes directly to native code with a virtual stack, the L2 compiler builds a full Intermediate Representation (IR), converts it to Static Single Assignment (SSA) form, runs multiple optimization passes, allocates registers via linear scan, and finally emits optimized x86 machine code. It is used for hot, performance-critical methods.

The pipeline lives entirely under core/src/core/org/jnode/vm/:

Package Role
org.jnode.vm.compiler.ir Architecture-independent IR (quads, CFG, SSA)
org.jnode.vm.x86.compiler.l2 x86-specific code generation and register pool
org.jnode.assembler.x86 Low-level x86 byte encoder

IR Structure

Control Flow Graph

IRControlFlowGraph<T> (core/src/core/org/jnode/vm/compiler/ir/IRControlFlowGraph.java) is the top-level IR container. It holds an array of IRBasicBlock objects representing the method's control flow graph. It manages SSA construction/destruction, dominance computation, liveness analysis, and the optimization loop.

IRBasicBlock<T> represents a single basic block — a straight-line sequence of quads with no internal branches. It maintains:

  • List of Quad instructions
  • Predecessor and successor block references
  • Immediate dominator (idominator)
  • Dominance frontier and set of dominated blocks

Quads

Quads are JNode's three-address IR instructions. Each quad carries an address (its quad index within the block), a bytecode address, and a flag for dead-code elimination.

Quad Class Purpose
BinaryQuad<T> Binary ops: IADD, ISUB, IMUL, IDIV, FADD, etc.
UnaryQuad<T> Unary ops: INEG, I2L, F2I, etc.
ConstantRefAssignQuad<T> Load a constant literal
VariableRefAssignQuad<T> Copy between variables
PhiAssignQuad<T> Phi function (SSA merge)
ConditionalBranchQuad<T> If-goto
UnconditionalBranchQuad<T> Goto
TableswitchQuad<T> TABLESWITCH bytecode
LookupswitchQuad<T> LOOKUPSWITCH bytecode
CallQuad, StaticCallQuad, VirtualCallQuad Method calls
ReturnQuad Method return

SSA Construction

The SSA form is constructed by IRControlFlowGraph.constructSSA(), which calls:

  1. computeDominance() — Computes dominators, immediate dominators, dominance frontiers, and dominated-blocks sets for all basic blocks.
  2. placePhiFunctions() — Iterates variables defined in blocks with multiple predecessors; inserts PhiAssignQuad at the dominance frontier of those blocks.
  3. renameVariables(startBlock) — Depth-first traversal that renames each variable use to the current top of the SSA renaming stack, pushing a new SSA version on each assignment. Each variable thus gets a unique SSA version at each program point.

SSA Deconstruction

deconstrucSSA() removes phi functions and replaces each PhiAssignQuad with a sequence of VariableRefAssignQuad moves — one from each control flow predecessor. The move from the earliest block is selected as the canonical assignment for the phi LHS.

Optimization Pipeline

The full optimization sequence in X86Level2Compiler.doCompile():

1. cfg.constructSSA()
2. cfg.optimize()            // Pass 2: per-quad constant folding, copy propagation
3. cfg.removeUnusedVars()   // Remove variables with zero uses
4. cfg.deconstrucSSA()       // Convert phi functions to moves
5. cfg.removeDefUseChains() // Copy propagation for single-use variables
6. cfg.fixupAddresses()      // Reassign sequential addresses after SSA destruction

Constant Folding

BinaryQuad.doPass2() checks if both operands are Constant. If so, it computes the result at compile time and replaces the quad with a ConstantRefAssignQuad. This runs on every binary op (add, sub, mul, div, and, or, xor, etc.).

Copy Propagation

Variable.simplify() is called on each referenced operand. It propagates copy assignments so that later quads use the original source instead of a chain of copies.

Dead Code Elimination

Quad.computeLiveness() performs a backward dataflow pass computing live variables at each quad. After SSA deconstruction, cfg.optimize(liveVariables) marks assignments dead if the LHS variable is not in the live set. Dead quads are skipped during code generation.

Register Allocation

Live Ranges

LiveRange (core/src/core/org/jnode/vm/compiler/ir/LiveRange.java) contains:

  • The Variable being allocated
  • assignAddress — quad index where the variable is first defined
  • lastUseAddress — quad index of the last use

Live ranges are built by iterating over quads in program order and tracking variable definitions and uses from the liveness analysis.

Linear Scan Algorithm

LinearScanAllocator.allocate() executes the classic linear-scan algorithm:

  1. Sort all live ranges by assignAddress (ascending).
  2. For each range in order:
    • Call expireOldRanges() — release registers for active ranges whose lastUseAddress is before the current assignAddress.
    • Call request(type) to get an available register from X86RegisterPool.
    • If a register is available: assign it as a RegisterLocation, add the range to the active list sorted by end address.
    • If no register: call spillRange().

Spilling Strategy

spillRange() uses an "spill the longest-lived" heuristic:

  • LONG and DOUBLE types are always spilled (they require consecutive register pairs in 32-bit mode).
  • If the active list is empty, spill the current range to the stack.
  • Otherwise, find the active range with the latest lastUseAddress (longest-lived).
    • If it ends after the current range: swap their locations (the current range gets the register, the longest-lived range is spilled).
    • If it ends before or at the current range: spill the current range.

Locations

  • RegisterLocation<T> — holds an x86 register (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP, or XMM registers for floats).
  • StackLocation<T> — holds a stack frame displacement [EBP + offset].
  • TopStackLocation<T> — for values at the JVM operand stack top.

Code Generation

X86CodeGenerator extends GenericX86CodeGenerator<X86Register> and dispatches each quad to an x86 instruction sequence via addressing-mode selection. Key classes:

Class Role
X86Level2Compiler Entry point — orchestrates CFG building, SSA, optimization, allocation, codegen
X86CodeGenerator x86-specific quad-to-bytes translation
GenericX86CodeGenerator<X86Register> Architecture-generic codegen with register/displacement addressing
FPX86CodeGenerator Floating-point instruction generation
X86StackFrame Manages prologue/epilogue, local variable offsets
X86RegisterPool x86 register availability tracking

The stack frame layout follows the standard x86 convention: [EBP + offset] for locals, [EBP] for saved EBP, [EBP + 4] for return address, and [EBP - offset] for spilled locals.

Gotchas

  • Phi function placement: Phi functions are placed at the dominance frontier, not at merge points directly. This requires accurate dominance computation first.
  • LONG/DOUBLE spilling: These types cannot share a register pair across a spill boundary, so they are always assigned to stack locations in 32-bit mode.
  • SSA deconstruction ordering: After phi removal, the move from the earliest predecessor block is selected as the canonical assignment. This is intentional — it preserves the value from the entry path when control flow first reaches the phi block.
  • Dead code vs. unreachable code: removeUnusedVars() removes variables (SSA versions) with zero uses, but does not remove unreachable basic blocks. Unreachable blocks are detected by liveness analysis in later passes.
  • Register pool constraints: X86RegisterPool manages caller-saved registers only. Callee-saved registers (EBX, ESI, EDI, EBP) are handled by the stack frame prologue/epilogue.

Related Pages

Clone this wiki locally