JNasm Instruction Encoding

opencode-agent[bot] edited this page May 10, 2026 · 1 revision

How JNasm translates x86 assembly mnemonics into machine code bytes using addressing modes, operand encoding, and ModRM/SIB bytes.

Overview

JNasm's instruction encoding system takes parsed assembly instructions (mnemonic + operands) and emits the corresponding x86 machine code bytes. The encoding lives primarily in X86Core.java, which contains one emitXXX() method per instruction family (emitADD, emitMOV, emitCALL, etc.). Each emit method selects the correct encoding variant based on the addressing mode of its operands: register-register, register-immediate, register-memory, memory-immediate, and so on.

X86Core does not write bytes itself. Its stream.writeXXX() calls delegate byte emission to X86Assembler, which produces the low-level bytes: opcode bytes, ModR/M bytes, SIB bytes, displacement bytes, and immediate bytes.

Key Components

| Class / File | Role |
| --- | --- |
| builder/src/builder/org/jnode/jnasm/assembler/x86/X86Core.java | Main encoding logic: emitXXX() methods for each instruction family |
| builder/src/builder/org/jnode/jnasm/assembler/x86/AbstractX86Module.java | Addressing mode classification (R_ADDR, RR_ADDR, RE_ADDR, etc.) and operand extraction helpers |
| builder/src/builder/org/jnode/jnasm/assembler/Instruction.java | Instruction metadata: mnemonic, operands, prefixes (LOCK, REP, FS), size info |
| builder/src/builder/org/jnode/jnasm/assembler/Address.java | Memory operand representation: base register, index register, scale, displacement, segment |
| builder/src/builder/org/jnode/jnasm/assembler/Register.java | Register operand representation (name field) |
| builder/src/builder/org/jnode/jnasm/assembler/InstructionUtils.java | Static reflection-based mapping from mnemonic names to integer instruction IDs |
| core/src/core/org/jnode/assembler/x86/X86Assembler.java | Low-level byte emission: writeADD, writeMOV, writeCALL, ModR/M encoding, etc. |
| core/src/core/org/jnode/assembler/x86/X86Constants.java | x86 constants: register IDs, flag values, operand size constants (BITS8, BITS16, BITS32) |

How It Works

Instruction ID System

X86Core uses an integer-based instruction ID system to dispatch from mnemonic to emit method. Each instruction family gets a unique _ISN constant:

public static final int ADC_ISN = 0;
public static final int ADD_ISN = ADC_ISN + 1;
public static final int ALIGN_ISN = ADD_ISN + 1;
// ... ~100 more

InstructionUtils.getInstructionMap(X86Core.class) uses reflection to scan all *_ISN fields and build a Map<String, Integer> from lowercase mnemonic to ID. This map is used in the emit() dispatch:

public boolean emit(String mnemonic, List<Object> operands, int operandSize, Instruction instruction) {
    Integer key = INSTRUCTION_MAP.get(mnemonic);
    if (key == null) return false;
    switch (key) {
        case ADD_ISN:
            emitADD();
            break;
        // ...
    }
    return true;
}
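
The reflection step can be pictured with the following minimal sketch. It is not the InstructionUtils source: IsnConstantsDemo, IsnMapSketch and buildInstructionMap are hypothetical names, and the only behaviour taken from the description above is "scan static *_ISN fields and key them by lowercase mnemonic".

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for X86Core: only the *_ISN constants matter here.
final class IsnConstantsDemo {
    public static final int ADC_ISN = 0;
    public static final int ADD_ISN = ADC_ISN + 1;
    public static final int ALIGN_ISN = ADD_ISN + 1;
    public static final int UNRELATED = 99; // ignored: no _ISN suffix

    private IsnConstantsDemo() { }
}

final class IsnMapSketch {
    private static final String SUFFIX = "_ISN";

    /** Scans the static int fields of a class and maps "add" -> ADD_ISN, and so on. */
    static Map<String, Integer> buildInstructionMap(Class<?> type) {
        Map<String, Integer> map = new HashMap<>();
        for (Field f : type.getDeclaredFields()) {
            String name = f.getName();
            boolean isStaticInt = Modifier.isStatic(f.getModifiers()) && f.getType() == int.class;
            if (!isStaticInt || !name.endsWith(SUFFIX)) {
                continue;
            }
            // Strip the suffix and lowercase the rest to get the mnemonic.
            String mnemonic = name.substring(0, name.length() - SUFFIX.length())
                    .toLowerCase(Locale.ROOT);
            try {
                map.put(mnemonic, f.getInt(null)); // null receiver: static field
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read " + name, e);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = buildInstructionMap(IsnConstantsDemo.class);
        System.out.println(map.get("add"));   // 1
        System.out.println(map.get("bogus")); // null: emit() would return false here
    }
}
```

Building the map once and switching on the resulting int keeps the dispatch in emit() a plain switch over compile-time constants.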

Addressing Mode Classification

The core of instruction encoding is AbstractX86Module.getAddressingMode(int maxArgs), which classifies each operand as one of eight argument types and packs the results into a single integer, one octal digit (DISP = 3 bits) per operand:

static final int NUL_ARG = 0;  // no argument
static final int CON_ARG = 1;  // constant/immediate
static final int REG_ARG = 2;  // register
static final int REL_ARG = 3;  // relative (register-indirect)
static final int ABS_ARG = 4;  // absolute (displacement-only)
static final int SCL_ARG = 5;  // scaled (base+index)
static final int ZSC_ARG = 6;  // simple scaled (index only)
static final int SEG_ARG = 7;  // segment register

Each operand type occupies 3 bits (DISP = 3), with the first operand in the lowest bits. Multi-operand addressing modes are composite integers built by shifting and OR-ing the argument types:

// Octal values read right to left: the first operand is the lowest digit.
static final int RR_ADDR = REG_ARG | REG_ARG << DISP;           // 0o22 = register, register
static final int RC_ADDR = REG_ARG | CON_ARG << DISP;           // 0o12 = register, constant
static final int RE_ADDR = REG_ARG | REL_ARG << DISP;           // 0o32 = register, [base+disp]
static final int ER_ADDR = REL_ARG | REG_ARG << DISP;           // 0o23 = [base+disp], register
static final int EC_ADDR = REL_ARG | CON_ARG << DISP;           // 0o13 = [base+disp], constant
static final int AC_ADDR = ABS_ARG | CON_ARG << DISP;           // 0o14 = [disp], constant
static final int RRRC_ADDR = REG_ARG | REG_ARG << DISP | REG_ARG << 2 * DISP | CON_ARG << 3 * DISP; // 0o1222 = register, register, register, constant
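
The packing arithmetic can be checked with a small stand-alone sketch. The class AddressingModeArithmetic and its helpers compose and kindAt are hypothetical names introduced only for illustration; the constants copy the values listed above.

```java
final class AddressingModeArithmetic {
    static final int DISP = 3;
    static final int CON_ARG = 1, REG_ARG = 2, REL_ARG = 3;

    /** Packs argument kinds into one int, first operand in the lowest 3 bits. */
    static int compose(int... kinds) {
        int mode = 0;
        for (int i = 0; i < kinds.length; i++) {
            mode |= kinds[i] << (DISP * i);
        }
        return mode;
    }

    /** Extracts the kind of operand i from a packed mode. */
    static int kindAt(int mode, int i) {
        return (mode >> (DISP * i)) & 0b111;
    }

    public static void main(String[] args) {
        int rc = compose(REG_ARG, CON_ARG);             // register, constant
        System.out.println(Integer.toOctalString(rc));  // 12
        int er = compose(REL_ARG, REG_ARG);             // [base+disp], register
        System.out.println(Integer.toOctalString(er));  // 23
        System.out.println(kindAt(er, 0) == REL_ARG);   // true
    }
}
```

Because the first operand sits in the lowest digit, the octal form reads in reverse operand order: "register, constant" is 12 in octal, not 21.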

The getAddressingMode() method iterates operands, classifies each by type, and shifts into position:

int getAddressingMode(int maxArgs) {
    int ret = N_ADDR; // 0 (no argument)
    for (int i = 0; i < maxArgs; i++) {
        Object o = operands.get(i);
        if (o instanceof Integer)      ret |= CON_ARG << DISP * i;
        else if (o instanceof Register) ret |= REG_ARG << DISP * i;
        else if (o instanceof Address) {
            Address ind = (Address) o;
            if (ind.segment)         ret |= SEG_ARG << DISP * i;
            else if (ind.reg != null && ind.sreg != null) ret |= SCL_ARG << DISP * i; // [base+index*scale]
            else if (ind.reg != null) ret |= REL_ARG << DISP * i; // [base+disp]
            else if (ind.sreg != null) ret |= ZSC_ARG << DISP * i; // [index*scale+disp]
            else ret |= ABS_ARG << DISP * i; // [disp]
        }
        args[i] = o;
    }
    return ret;
}

The return value is a switch-case selector in each emitXXX() method, dispatching to the appropriate stream.writeXXX() call.
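
To see which selector a given operand list produces, here is a self-contained approximation of the classification. Reg and Mem are hypothetical stand-ins for jnasm's Register and Address classes, the segment case is left out, and the bounds check on the operand count is an assumption added for the sketch.

```java
import java.util.Arrays;
import java.util.List;

final class ClassifySketch {
    static final int DISP = 3;
    static final int CON_ARG = 1, REG_ARG = 2, REL_ARG = 3, ABS_ARG = 4, SCL_ARG = 5, ZSC_ARG = 6;

    // Hypothetical stand-in for a register operand.
    static final class Reg {
        final String name;
        Reg(String name) { this.name = name; }
    }

    // Hypothetical stand-in for a memory operand; null means "not present".
    static final class Mem {
        final String base;
        final String index;
        final int disp;
        Mem(String base, String index, int disp) {
            this.base = base;
            this.index = index;
            this.disp = disp;
        }
    }

    /** Maps one operand object to its 3-bit argument kind. */
    static int kindOf(Object o) {
        if (o instanceof Integer) return CON_ARG;
        if (o instanceof Reg) return REG_ARG;
        if (o instanceof Mem) {
            Mem m = (Mem) o;
            if (m.base != null && m.index != null) return SCL_ARG; // [base+index*scale]
            if (m.base != null) return REL_ARG;                    // [base+disp]
            if (m.index != null) return ZSC_ARG;                   // [index*scale+disp]
            return ABS_ARG;                                        // [disp]
        }
        throw new IllegalArgumentException("unsupported operand: " + o);
    }

    /** Packs the kinds of the first maxArgs operands, first operand lowest. */
    static int addressingMode(List<?> operands, int maxArgs) {
        int mode = 0;
        int n = Math.min(maxArgs, operands.size());
        for (int i = 0; i < n; i++) {
            mode |= kindOf(operands.get(i)) << (DISP * i);
        }
        return mode;
    }

    public static void main(String[] args) {
        // add eax, 5
        System.out.println(Integer.toOctalString(
                addressingMode(Arrays.asList(new Reg("eax"), 5), 2)));                      // 12
        // add [ebp+8], eax
        System.out.println(Integer.toOctalString(
                addressingMode(Arrays.asList(new Mem("ebp", null, 8), new Reg("eax")), 2))); // 23
    }
}
```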

Emit Method Pattern

Each instruction family follows the same pattern. Here's emitADD:

private void emitADD() {
    int addr = getAddressingMode(2);
    switch (addr) {
        case RR_ADDR:                          // reg, reg
            stream.writeADD(getReg(0), getReg(1));
            break;
        case RC_ADDR:                          // reg, immediate
            stream.writeADD(getReg(0), getInt(1));
            break;
        case RE_ADDR:                          // reg, [base+disp]
            Address ind = getAddress(1);
            stream.writeADD(getReg(0), getRegister(ind.getImg()), ind.disp);
            break;
        case RA_ADDR:                          // reg, [disp]
            // Fetch the address operand here: ind is only assigned in the RE_ADDR case.
            stream.writeADD_MEM(getReg(0), getAddress(1).disp);
            break;
        case ER_ADDR:                          // [base+disp], reg
            ind = getAddress(0);
            stream.writeADD(getRegister(ind.getImg()), ind.disp, getReg(1));
            break;
        case EC_ADDR:                          // [base+disp], immediate
            ind = getAddress(0);
            stream.writeADD(operandSize, getRegister(ind.getImg()), ind.disp, getInt(1));
            break;
        case AC_ADDR:                          // [disp], immediate
            stream.writeADD(operandSize, getAddress(0).disp, getInt(1));
            break;
        case GC_ADDR:                          // [seg:disp], immediate
            ind = getAddress(0);
            stream.writeADD(operandSize, (SR) X86Register.getRegister(ind.getImg()), ind.disp, getInt(1));
            break;
        default:
            reportAddressingError(ADD_ISN, addr);
    }
}

Operand Size

The operandSize parameter (set from the [BITS 32] or [BITS 64] directive or size overrides) controls the default operand width. Some instructions use it explicitly:

// EC_ADDR encoding for ADD uses operandSize for the immediate width
stream.writeADD(operandSize, getRegister(ind.getImg()), ind.disp, getInt(1));
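
Why the size has to be passed explicitly becomes clearer when looking at the bytes. The sketch below hand-encodes ADD [disp32], imm for 32-bit code using the standard x86 opcodes (80 /0 ib, 81 /0 id, and the 66 operand-size prefix). AddAbsImmSketch is a hypothetical class, not the X86Assembler implementation; the numeric values of BITS8/BITS16/BITS32 are assumed, and the short 83 /0 ib form that an assembler may prefer for small immediates is ignored.

```java
import java.io.ByteArrayOutputStream;

final class AddAbsImmSketch {
    // Assumed to mirror X86Constants.BITS8/BITS16/BITS32; the values are a guess.
    static final int BITS8 = 8, BITS16 = 16, BITS32 = 32;

    /** Encodes ADD [disp32], imm for 32-bit mode code. */
    static byte[] encode(int operandSize, int disp, int imm) {
        if (operandSize != BITS8 && operandSize != BITS16 && operandSize != BITS32) {
            throw new IllegalArgumentException("unsupported operand size: " + operandSize);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (operandSize == BITS16) {
            out.write(0x66);                           // operand-size override prefix
        }
        out.write(operandSize == BITS8 ? 0x80 : 0x81); // ADD r/m, imm opcode group
        out.write(0x05);                               // ModR/M: mod=00, reg=/0 (ADD), rm=101 ([disp32])
        writeLittleEndian(out, disp, 4);
        writeLittleEndian(out, imm, operandSize / 8);  // immediate width follows operand size
        return out.toByteArray();
    }

    static void writeLittleEndian(ByteArrayOutputStream out, int value, int bytes) {
        for (int i = 0; i < bytes; i++) {
            out.write((value >>> (8 * i)) & 0xFF);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(BITS8, 0x1000, 1)));  // 80 05 00 10 00 00 01
        System.out.println(hex(encode(BITS16, 0x1000, 1))); // 66 81 05 00 10 00 00 01 00
        System.out.println(hex(encode(BITS32, 0x1000, 1))); // 81 05 00 10 00 00 01 00 00 00
    }
}
```

The same source text, add [0x1000], 1, yields three different byte sequences, so with no register operand to infer the width from, the operand size must come from elsewhere.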

When the source register is narrower than the default operand size, X86Core narrows the operand size to match before emitting, as in this memory-destination MOV:

int oSize = operandSize;
if (oSize > getReg(1).getSize()) {
    oSize = getReg(1).getSize(); // use smaller register's size
}
stream.writeMOV(oSize, getRegister(ind.getImg()), ind.disp, getReg(1));

Prefixes

Instruction.java encodes prefix state as bit flags:

public static final int LOCK_PREFIX = 1;
public static final int REP_PREFIX = 2;
public static final int FS_PREFIX = 4;

These are passed through to stream.writeXXX() calls which emit the appropriate prefix bytes before the instruction encoding.
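
For reference, the three flags correspond to well-known x86 prefix bytes: LOCK is F0, REP is F3, and the FS segment override is 64. A minimal sketch of turning a flag word into prefix bytes follows; PrefixSketch and prefixBytes are hypothetical names, and the emission order is an assumption rather than something taken from X86Assembler.

```java
import java.io.ByteArrayOutputStream;

final class PrefixSketch {
    // Same bit values as the Instruction constants shown above.
    static final int LOCK_PREFIX = 1;
    static final int REP_PREFIX = 2;
    static final int FS_PREFIX = 4;

    /** Returns the prefix bytes for a combination of prefix flags. */
    static byte[] prefixBytes(int flags) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if ((flags & LOCK_PREFIX) != 0) out.write(0xF0); // LOCK
        if ((flags & REP_PREFIX) != 0) out.write(0xF3);  // REP / REPE
        if ((flags & FS_PREFIX) != 0) out.write(0x64);   // FS segment override
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = prefixBytes(LOCK_PREFIX | FS_PREFIX);
        System.out.printf("%02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF); // F0 64
    }
}
```

Using one bit per prefix lets a single int carry any combination of prefixes.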

Jump and Call Encoding

Calls and jumps use label resolution. Identifier operands are looked up in the labels map:

Label lab = labels.get(id);
lab = (lab == null) ? new Label(id) : lab;
stream.writeCALL(lab);

Forward references are resolved in the two-pass assembly: pass 1 collects label addresses, pass 2 emits the relative offsets.
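
The two-pass idea can be sketched independently of jnasm's Label class. The toy assembler below understands only "name:", "nop" and "call name"; TwoPassCallSketch and its line format are invented for the example. It relies on the standard near-call encoding E8 rel32, where rel32 is measured from the end of the 5-byte call instruction.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TwoPassCallSketch {
    /** Assembles a toy program made of "name:", "nop" and "call name" lines. */
    static byte[] assemble(List<String> lines) {
        // Pass 1: walk the program and record the address of every label.
        Map<String, Integer> labels = new HashMap<>();
        int pc = 0;
        for (String line : lines) {
            if (line.endsWith(":")) {
                labels.put(line.substring(0, line.length() - 1), pc);
            } else {
                pc += sizeOf(line);
            }
        }
        // Pass 2: emit bytes; every label address is known now, forward or backward.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String line : lines) {
            if (line.endsWith(":")) {
                continue;
            }
            if (line.equals("nop")) {
                out.write(0x90);
                continue;
            }
            String target = line.substring("call ".length());
            Integer address = labels.get(target);
            if (address == null) {
                throw new IllegalArgumentException("undefined label: " + target);
            }
            int rel = address - (out.size() + 5); // relative to the next instruction
            out.write(0xE8);                      // CALL rel32
            for (int i = 0; i < 4; i++) {
                out.write((rel >>> (8 * i)) & 0xFF);
            }
        }
        return out.toByteArray();
    }

    static int sizeOf(String line) {
        if (line.equals("nop")) return 1;
        if (line.startsWith("call ")) return 5;
        throw new IllegalArgumentException("unknown instruction: " + line);
    }

    public static void main(String[] args) {
        byte[] code = assemble(Arrays.asList("call end", "nop", "end:", "call end"));
        StringBuilder sb = new StringBuilder();
        for (byte b : code) sb.append(String.format("%02X ", b & 0xFF));
        System.out.println(sb.toString().trim()); // E8 01 00 00 00 90 E8 FB FF FF FF
    }
}
```

The first call is a forward reference (offset +1, skipping the nop); the second targets its own address and therefore gets offset -5.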

ModRM Encoding

The ModR/M byte itself is built by X86Assembler.writeADD() and similar methods, which live outside jnasm in core/src/core/org/jnode/assembler/x86/X86Assembler.java. X86Core passes the register operands, memory base, displacement, and scale; X86Assembler then builds the ModR/M and SIB bytes according to Intel encoding rules.
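
As a rough illustration of those rules, a ModR/M byte packs three fields: mod (2 bits), reg (3 bits) and rm (3 bits). The sketch below is not the X86Assembler code; ModRmSketch and its methods are hypothetical, and it covers only two simple 32-bit forms of ADD r/m32, r32 (opcode 01) without any SIB byte.

```java
final class ModRmSketch {
    // Register numbers as defined by the x86 instruction encoding.
    static final int EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7;

    /** Packs mod (bits 7-6), reg (bits 5-3) and rm (bits 2-0) into one byte value. */
    static int modRm(int mod, int reg, int rm) {
        return ((mod & 0b11) << 6) | ((reg & 0b111) << 3) | (rm & 0b111);
    }

    /** ADD dst, src for two 32-bit registers: opcode 01, mod=11 (register direct). */
    static byte[] addRegReg(int dst, int src) {
        return new byte[] {0x01, (byte) modRm(0b11, src, dst)};
    }

    /** ADD [base + disp8], src: opcode 01, mod=01 (8-bit displacement follows). */
    static byte[] addMemReg(int base, int disp8, int src) {
        if (base == ESP) {
            // rm=100 means "SIB byte follows", which this sketch does not cover.
            throw new IllegalArgumentException("ESP as base requires a SIB byte");
        }
        if (disp8 < -128 || disp8 > 127) {
            throw new IllegalArgumentException("displacement does not fit in 8 bits");
        }
        return new byte[] {0x01, (byte) modRm(0b01, src, base), (byte) disp8};
    }

    public static void main(String[] args) {
        byte[] rr = addRegReg(EAX, EBX);    // add eax, ebx
        System.out.printf("%02X %02X%n", rr[0] & 0xFF, rr[1] & 0xFF);                   // 01 D8
        byte[] mr = addMemReg(EBP, 8, EAX); // add [ebp+8], eax
        System.out.printf("%02X %02X %02X%n", mr[0] & 0xFF, mr[1] & 0xFF, mr[2] & 0xFF); // 01 45 08
    }
}
```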

Gotchas

  1. Register class casting: When both GPR and segment register (SR) operands are possible, explicit casts are needed: (SR) X86Register.getRegister(ind.getImg()).

  2. Displacement decoding: The Address.disp field carries the displacement. Memory operands with no register (ABS_ARG) encode as [disp] — the displacement is accessed via getAddress(0).disp, not via getReg().

  3. Addressing mode validation: Not all addressing modes are valid for every instruction. Unsupported combinations throw IllegalArgumentException via reportAddressingError(), which decodes the mode bits back to argument type names.

  4. Explicit operand sizing: Encodings with no register operand, such as AC_ADDR (absolute memory + immediate), cannot infer the width from a register. They take operandSize as the first parameter to distinguish byte, word, and dword immediates.

  5. Jump type: Instruction.getJumpType() determines whether a call or jump with an address operand is encoded as near or far.

  6. GPR vs MMX/FPU split: Register operands are handled by getRegister() which returns X86Register.GPR. MMX and FPU registers use separate methods (getRegisterMMX(), getRegisterFPU()) defined in AbstractX86Module.
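
The mode decoding mentioned in gotcha 3 amounts to peeling 3-bit digits off the selector. A minimal sketch, with hypothetical names (ModeNamesSketch, describe) and an invented wording for the argument type names:

```java
final class ModeNamesSketch {
    static final int DISP = 3;

    // Indexed by argument kind (NUL_ARG = 0 ... SEG_ARG = 7); the labels are illustrative.
    private static final String[] KIND_NAMES = {
        "none", "constant", "register", "relative", "absolute", "scaled", "simple scaled", "segment"
    };

    /** Turns a packed addressing mode into a readable operand list, first operand first. */
    static String describe(int mode) {
        if (mode == 0) {
            return "none";
        }
        StringBuilder sb = new StringBuilder();
        for (int m = mode; m != 0; m >>>= DISP) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(KIND_NAMES[m & 0b111]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(012)); // register, constant
        System.out.println(describe(023)); // relative, register
    }
}
```

A message built this way tells the user which operand combination was rejected without requiring them to read octal selectors.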

Related Pages

  • JNasm-Assembler - JNasm overview, preprocessing, two-pass architecture, and build integration
  • JNasm-Assembler-Design - Design notes on instruction encoding and addressing modes
  • Build-System - How BootImageBuilder invokes JNasm during boot image construction
  • Assembly-Files - The .asm source files that JNasm assembles (kernel.asm, vm.asm, etc.)
  • L1-Compiler-Deep-Dive - How the JIT compiler emits x86 instructions (contrast with JNasm's static assembly)
  • Stack-Frame-Layout - How compiled code lays out x86 stack frames (uses emitted instructions)
