v1: harden self-host codegen and string interpolation lowering #133
Merged
…r corruption
In the SafeRegAlloc BINOP path, ADD with an IMM_INT operand was lowered as safeLoadVal(IMM_INT) → movRI(scratch, imm) → addRR(dr, scratch). Loading the immediate into a scratch register left it open to corruption in large functions (suspected cause: scratch register aliasing or a slot collision for the immediate value). Fix: emit LEA directly. LEA encodes the immediate as a displacement in the instruction bytes, so no scratch register is needed. It is also more compact (LEA is 4-8 bytes vs 14 for MOV+IMM64+ADD). Reduces the S4 binary from 706K to 646K.
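To make the LEA substitution concrete, here is a minimal sketch of how `lea dst, [src + imm]` can be encoded for x86-64. It is not the compiler's emitter: the function name `encode_lea_add_imm`, the register numbering (0 = rax … 15 = r15), and the rejection of immediates outside the signed 32-bit range are assumptions made for illustration.

```python
import struct

def encode_lea_add_imm(dst: int, src: int, imm: int) -> bytes:
    """Encode `lea dst, [src + imm]` (64-bit). Hypothetical helper, not the PR's code.

    dst and src are hardware register numbers 0..15 (assumed numbering).
    """
    # LEA can only carry a signed 8- or 32-bit displacement (assumption: the
    # caller falls back to another lowering for wider immediates).
    if not -(1 << 31) <= imm < (1 << 31):
        raise ValueError("immediate does not fit in a 32-bit displacement")

    # REX.W selects 64-bit operands; REX.R / REX.B extend dst / src to r8..r15.
    rex = 0x48 | ((dst >> 3) << 2) | (src >> 3)

    if -128 <= imm <= 127:
        mod, disp = 0b01, struct.pack("<b", imm)   # 1-byte displacement
    else:
        mod, disp = 0b10, struct.pack("<i", imm)   # 4-byte displacement

    modrm = (mod << 6) | ((dst & 7) << 3) | (src & 7)
    out = bytes([rex, 0x8D, modrm])

    # A base of rsp or r12 (low bits 100) requires a SIB byte.
    if (src & 7) == 4:
        out += b"\x24"
    return out + disp
```

With this encoding the instruction is 4 bytes for a small displacement and at most 8 bytes (32-bit displacement plus SIB), which matches the 4-8 byte range quoted in the commit message. One behavioral difference to keep in mind: LEA does not update the flags register, whereas ADD does.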
…h register
Extends the LEA pattern: CMP with an IMM_INT operand now uses cmpRI (direct immediate encoding) instead of safeLoadVal(IMM) → scratch → cmpRR, and SUB with an IMM_INT operand uses subRI in the same way. This eliminates scratch register usage for immediates in comparison and subtraction, preventing potential corruption in large functions. S4 binary: 642K (was 646K after the LEA change, 706K originally).
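The direct immediate forms can be sketched the same way. The snippet below encodes the standard x86-64 `CMP r/m64, imm` and `SUB r/m64, imm` instructions (opcodes `83 /n ib` and `81 /n id`). The names `cmp_ri` and `sub_ri` only mirror the cmpRI and subRI helpers mentioned in the commit; their real signatures are not shown in this PR, so treat everything here as an assumption.

```python
import struct

def _alu_ri(ext: int, reg: int, imm: int) -> bytes:
    """Encode a 64-bit ALU op on a register with an immediate.

    ext is the /digit opcode extension placed in ModRM.reg:
    5 selects SUB, 7 selects CMP. reg is a register number 0..15 (assumed).
    """
    if not -(1 << 31) <= imm < (1 << 31):
        raise ValueError("immediate does not fit in 32 bits")
    rex = 0x48 | (reg >> 3)                      # REX.W, plus REX.B for r8..r15
    modrm = (0b11 << 6) | (ext << 3) | (reg & 7)  # mod=11: register operand
    if -128 <= imm <= 127:
        # Opcode 0x83 takes a sign-extended 8-bit immediate.
        return bytes([rex, 0x83, modrm]) + struct.pack("<b", imm)
    # Opcode 0x81 takes a sign-extended 32-bit immediate.
    return bytes([rex, 0x81, modrm]) + struct.pack("<i", imm)

def cmp_ri(reg: int, imm: int) -> bytes:
    """Hypothetical stand-in for cmpRI: `cmp reg, imm`."""
    return _alu_ri(7, reg, imm)

def sub_ri(reg: int, imm: int) -> bytes:
    """Hypothetical stand-in for subRI: `sub reg, imm`."""
    return _alu_ri(5, reg, imm)
```

Either form is 4 or 7 bytes and never touches a second register, which is the property the commit relies on.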
…nterpolation
isIntegerExpr was missing IDENT and FIELD expression kinds,
causing integer variables (parameters, fields) in ${} string
interpolation to bypass __arimo_i64_to_str conversion.
Integer values were passed directly to __arimo_strcat as string
pointers, causing strlen(strcat_result) to crash on small integers
(e.g., kind=1 appearing as pointer 0x1).
Fix: check varClassOf for IDENT and inferClass for FIELD,
returning true for Integer and Boolean types.
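As an illustration of the fix described above, the following toy model shows the shape of the check. The `Expr` class, the kind strings, and the `lower_interpolation_part` function are hypothetical stand-ins, and handling of integer literals is assumed rather than confirmed. Only the names varClassOf, inferClass, isIntegerExpr, and __arimo_i64_to_str come from the commit message.

```python
from dataclasses import dataclass

@dataclass
class Expr:
    """Hypothetical expression node; the real AST is not shown in the PR."""
    kind: str        # "INT_LIT", "STR_LIT", "IDENT", or "FIELD"
    name: str = ""   # variable or field name, or literal text

# Classes whose values must be converted before string concatenation.
INTEGER_LIKE = {"Integer", "Boolean"}

def is_integer_expr(expr, var_class_of, infer_class) -> bool:
    """Model of isIntegerExpr after the fix."""
    if expr.kind == "INT_LIT":        # assumed to have been handled already
        return True
    if expr.kind == "IDENT":          # newly covered: parameters and locals
        return var_class_of(expr.name) in INTEGER_LIKE
    if expr.kind == "FIELD":          # newly covered: field accesses
        return infer_class(expr) in INTEGER_LIKE
    return False

def lower_interpolation_part(expr, var_class_of, infer_class) -> str:
    """Return the operand that would be handed to __arimo_strcat."""
    if is_integer_expr(expr, var_class_of, infer_class):
        # Integers are converted first, so strcat never sees a raw value
        # such as 1 where it expects a string pointer.
        return f"__arimo_i64_to_str({expr.name})"
    return expr.name
```

Before the fix, an expression like `${kind}` fell through to the final `return False` branch and the raw integer reached the concatenation call unchanged.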
Replace complex pre-counting + write-loop i64_to_str with simpler fixed-buffer approach. Writes digits right-to-left into 32-byte buffer, returns pointer to first digit. Eliminates pre-count loop and reduces IR instruction count and frame slots.
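The fixed-buffer approach can be modeled as follows. This is a sketch of the technique, not the generated IR: the PR does not say how negative numbers or the terminator are handled, so the sign handling and the reserved NUL byte below are assumptions, and the returned slice stands in for the "pointer to first digit".

```python
def i64_to_str(value: int) -> bytes:
    """Model of a right-to-left integer-to-string conversion in a fixed buffer."""
    buf = bytearray(32)       # fixed 32-byte buffer, as in the commit message
    end = len(buf) - 1        # last byte stays 0 as a NUL terminator (assumed)
    pos = end

    negative = value < 0
    magnitude = -value if negative else value

    # Emit at least one digit so that 0 becomes "0", moving right to left.
    while True:
        pos -= 1
        buf[pos] = ord("0") + magnitude % 10
        magnitude //= 10
        if magnitude == 0:
            break

    if negative:              # sign handling is an assumption
        pos -= 1
        buf[pos] = ord("-")

    # In native code this would be the address buf + pos.
    return bytes(buf[pos:end])
```

The longest 64-bit value, -9223372036854775808, needs 20 characters plus a terminator, so a 32-byte buffer always suffices and no digit pre-count is needed.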
Summary
Hardens SafeRegAlloc codegen against scratch-register corruption and fixes string interpolation integer detection. Full bootstrap chain reaches S4 with 0 unresolved labels.
Commits
fix(codegen): use LEA for ADD with immediate
fix(codegen): use immediate encodings for CMP and SUB
fix(ir): detect integer IDENT and FIELD in isIntegerExpr (${kind} now routes integers through i64_to_str)
fix(ir): simplify generateI64ToStr
Validation
Known Remaining Blocker
S4 hello still crashes during parsing with strlen(0). The crash changed from strlen(1) to strlen(0) after f17db1d — confirming the raw integer→strcat bug is fixed. Remaining issue: SafeRegAlloc codegen corruption in Parser__eat string interpolation call chain (i64_to_str→strcat→strlen cascade, one value becomes NULL). This only manifests in large functions with complex control flow.
🤖 Generated with Claude Code