A PHP-to-native compiler written in Rust. Compiles a static subset of PHP to ARM64 assembly, producing standalone macOS Mach-O binaries. No interpreter, no VM, no runtime dependencies.
cargo build # dev build
cargo build --release # optimized build
cargo run -- file.php # compile a PHP fileThe compiler outputs a native binary next to the source file (e.g., file.php → file).
Every feature must have tests before it's considered done. The test suite is the primary quality gate.
cargo test # run all tests (slow — ~5-6 min due to as+ld per codegen test)
cargo test -- --include-ignored # run ALL tests including those requiring external libs
cargo test --test codegen_tests # run only end-to-end tests
cargo test test_fizzbuzz # run a specific testLinux target-specific regressions can be checked through the Docker scripts in
scripts/. They support both full suites and normal cargo test filters:
./scripts/test-linux-x86_64.sh # run all tests on Linux x86_64
./scripts/test-linux-arm64.sh # run all tests on Linux ARM64
./scripts/test-linux-x86_64.sh iterable # run tests matching a filter
./scripts/test-linux-arm64.sh test_my_feature # run tests matching a filter
./scripts/test-linux-x86_64.sh --rebuild # rebuild the Docker image first
./scripts/test-linux-arm64.sh --rebuild # rebuild the Docker image firstSome tests are marked #[ignore] because they require external libraries (e.g., SDL2) not available in CI. Before committing, always run cargo test -- --include-ignored locally to verify nothing is broken — including ignored tests.
The full test suite is slow because each codegen test spawns as + ld + runs the binary. To avoid waiting several minutes on every change:
- While developing a feature: run only the tests for that feature (
cargo test test_my_feature) - When the feature is complete: run the full suite once (
cargo test) to check for regressions - PHP cross-check: opt-in via
ELEPHC_PHP_CHECK=1 cargo test— verifies output matches PHP interpreter
| File | What it tests | How |
|---|---|---|
tests/lexer_tests.rs |
Tokenization | Asserts token sequences from source strings |
tests/parser_tests.rs |
AST construction | Asserts AST node structure and operator precedence |
tests/codegen_tests.rs |
Full pipeline (end-to-end) | Compiles PHP → binary, runs it, asserts stdout |
tests/error_tests.rs |
Error reporting | Asserts that invalid programs produce the right error messages |
- New language construct (keyword, operator, statement): needs lexer, parser, codegen, AND error tests
- New operator: needs a Pratt parser binding power test verifying precedence relative to adjacent operators
- New statement type: needs at least one codegen test showing correct output, one test for edge cases (empty body, nested), and one error test for malformed syntax
- New built-in function: needs codegen tests for normal use and error test for wrong argument count/types
- Bug fix: must include a regression test that would have caught the bug
- Every feature also needs an example in
examples/. If an existing example can showcase the new feature naturally, update it. Otherwise, create a newexamples/<name>/main.phpwith its own.gitignore(containing*.s,*.o,main). Examples should be small, readable programs that demonstrate real use cases — not just test cases.
Codegen tests compile inline PHP source and assert stdout:
#[test]
fn test_my_feature() {
let out = compile_and_run("<?php echo 1 + 2;");
assert_eq!(out, "3");
}Each test runs in an isolated temp directory. Tests run in parallel — the compile_and_run helper handles isolation automatically.
PHP source → Lexer (tokens) → Parser (AST) → Resolver (include/require) → NameResolver (namespace/use/FQN canonicalization) → Type Checker → Codegen (ARM64 asm) → as + ld → binary
| Module | Entry point | Responsibility |
|---|---|---|
src/lexer/ |
tokenize() |
Source → Vec<(Token, Span)> |
src/parser/ |
parse() |
Tokens → Program (Vec of Stmt). Pratt parser for expressions |
src/resolver.rs |
resolve() |
Resolves include/require by inlining referenced files. Runs before namespace/name canonicalization |
src/name_resolver.rs |
resolve() |
Applies namespace/use rules, rewrites references to canonical fully-qualified names, and flattens namespace-only AST nodes before type checking |
src/types/ |
check() |
Type checking, returns CheckResult with TypeEnv, function/class/interface/FFI metadata, and the internal Mixed type for heterogeneous assoc-array values |
src/codegen/ |
generate() |
AST → ARM64 assembly string. Top-level orchestration lives in mod.rs, while most lowering lives under expr/, stmt/, and runtime/ |
src/errors/ |
report() |
Error formatting with line:col |
src/span.rs |
Span |
Source position (line, col) attached to all AST nodes |
src/codegen/expr.rsis mainly a dispatcher; most expression lowering now lives in focused helpers undersrc/codegen/expr/src/codegen/stmt.rsis mainly a dispatcher; most statement lowering now lives in focused helpers undersrc/codegen/stmt/src/codegen/runtime/mod.rsemits runtime code (__rt_*routines)src/codegen/runtime/data.rsemits runtime.data/.bsssymbols and metadata tablessrc/codegen/context.rscarries variable layout, ownership state, class metadata, and FFI metadata through codegen
- Add token to
src/lexer/token.rs - Add scanning logic to
src/lexer/scan.rs - Add
BinOpvariant tosrc/parser/ast.rs - Add one line to
infix_bp()insrc/parser/expr.rs(the Pratt parser binding power table) - Add type checking in
src/types/checker/mod.rs - Add ARM64 codegen in the relevant file under
src/codegen/expr/(and only touchsrc/codegen/expr.rsif the dispatcher must learn about a new helper path) - Add tests in all 4 test files
- Add
StmtKindvariant tosrc/parser/ast.rs - Add parser logic in
src/parser/stmt.rs - Add type checking in
src/types/checker/mod.rs - Add codegen in the relevant file under
src/codegen/stmt/(and only touchsrc/codegen/stmt.rsif the dispatcher must learn about a new helper path) - If it introduces variables, update
collect_local_varsinsrc/codegen/functions.rs - Add tests
- Add the function to
src/types/checker/builtins/catalog.rs. This is mandatory: it drives PHP-style case-insensitive builtin lookup, namespace fallback, redeclaration checks, andname_resolverbehavior. - Confirm
function_exists("...")recognizes the function. The implementation delegates to the canonical catalog; do not add a second builtin-name table without keeping it in lockstep. - Add type signature handling in the appropriate
src/types/checker/builtins/<category>.rsfile (argument count, types, return type). - Add the codegen return type to
src/codegen/functions/types.rswhen local type inference needs to know the result type before the builtin is emitted. - Create a new file in
src/codegen/builtins/<category>/(e.g.,strings/my_func.rs). - Add
mod my_func;plus dispatcher wiring in the category'smod.rs. - If the function needs a runtime routine, create it under
src/codegen/runtime/<category>/. - Add module/re-export wiring in the relevant
runtime/<category>/mod.rs, then call it from the runtime emitter orchestration. - Add codegen and error tests.
Do not stop after wiring only the checker and codegen dispatcher. A builtin is not complete until the catalog, function_exists(), case-insensitive lookup, and namespace fallback all see it. New builtins should include at least one case-insensitive or namespaced call test when the feature is PHP-visible.
Leaf builtin/runtime files contain exactly one emitter function. Keep dispatcher/re-export files (mod.rs) as orchestration-only files, and keep runtime data emission in src/codegen/runtime/data.rs.
As a general rule, aim to keep source files under 500 lines of code. This is a maintainability guideline, not a blind numeric rule.
The real goal is to avoid files that become hard to reason about because they mix multiple responsibilities. In practice:
- Dispatcher/orchestration files (
mod.rs, top-level drivers, large checker/codegen coordinators) should stay slim. If they grow large, split them aggressively. - Multi-responsibility files should be split once they start accumulating unrelated concerns, even if the line count is not yet extreme.
- Leaf files that implement one cohesive feature are allowed to exceed 500 lines when splitting them would create artificial fragmentation.
Examples of files that may reasonably stay above the soft limit:
- a single runtime emitter implementing one substantial builtin or runtime routine
- a single compiler pass file that is still clearly about one feature and one code path
- a self-contained parser/lowering/runtime leaf where splitting would only spread one mental model across several tiny files
Examples of files that should usually be split:
- a file that mixes dispatch, validation, data collection, and post-processing
- a file that contains several unrelated builtins or runtime helpers
- a file that acts as a “miscellaneous bucket” for code that did not get a home
So the policy is:
- treat 500 LOC as a warning sign
- treat mixed responsibilities as the real trigger for refactoring
- do not split a file that owns one coherent feature just to satisfy the number
In short: prefer cohesion over mechanical line-count compliance. A 650-line mono-feature leaf is acceptable; a 350-line multi-purpose orchestrator is already a refactor candidate.
- Integers: result in
x0 - Floats: result in
d0 - Strings: pointer in
x1, length inx2 - Function args:
x0-x7(int = 1 reg, string = 2 regs),d0-d7(floats) - Return value: same as expression result (
x0,d0, orx1/x2) - Stack frame:
x29= frame pointer,x30= link register, locals at negative offsets fromx29 - ABI helpers:
src/codegen/abi.rscentralizes load/store/write per type - Labels: use
ctx.next_label("prefix")— global counter prevents collisions across functions - Mixed values:
PhpType::Mixedis an internal boxed runtime shape used for heterogeneous associative-array values; codegen/runtime must preserve the boxed cell contract instead of treating it like a plain scalar
Every emitter.instruction(...) call MUST have an inline // comment explaining what the ARM64 instruction does. This is mandatory — the codebase is educational and every assembly line must be understandable by someone learning how compilers work.
Rules:
- Every instruction line gets a comment. No exceptions. If you add a new
emitter.instruction(...), it must have a// comment. - Alignment:
//starts at column 81. Pad with spaces so the//is at the 81st character position (1-indexed). If the code itself is >= 80 characters, add exactly one space before//. - Block comments before related groups. Use
// -- description --on a standalone line before a block of related instructions (e.g.,// -- set up stack frame --,// -- copy bytes from source --). - Comments explain intent, not mnemonics. Write "store argc from OS" not "store x0 to memory". The reader can see the instruction — explain why it's there.
Example of correct formatting:
// -- set up stack frame --
emitter.instruction("sub sp, sp, #32"); // allocate 32 bytes on the stack
emitter.instruction("stp x29, x30, [sp, #16]"); // save frame pointer and return address
emitter.instruction("add x29, sp, #16"); // set new frame pointer
// -- convert integer to string and write to stdout --
emitter.instruction("bl __rt_itoa"); // convert x0 to decimal string → x1=ptr, x2=len
emitter.instruction("mov x0, #1"); // fd = stdout
emitter.instruction("mov x16, #4"); // syscall 4 = sys_write
emitter.instruction("svc #0x80"); // invoke macOS kernelTo verify alignment, run:
python3 -c "
with open('path/to/file.rs') as f:
for i, line in enumerate(f, 1):
if 'emitter.instruction' in line and '//' in line:
pos = line.rstrip().index('//')
if pos != 80 and len(line[:pos].rstrip()) < 80:
print(f'Line {i}: // at col {pos+1}')
"Each example lives in examples/<name>/main.php with its own .gitignore. To run:
cargo run -- examples/fizzbuzz/main.php
./examples/fizzbuzz/mainPHP-derived syntax must be 100% compatible with PHP. When elephc implements a PHP construct (variables, operators, keywords, built-ins), it must behave identically to PHP. This means:
- Variable names, keywords, operators, and built-in function names must match PHP exactly
- Superglobals (
$argc,$argv) must use PHP's syntax (e.g.,$argv[0], notargv(0)) - Operator precedence and associativity must match PHP
- String escape sequences must match PHP behavior
- Built-in function signatures must match PHP (argument count, order, types)
When in doubt, test with php -r '...' to verify behavior.
elephc also provides compiler-specific extensions beyond standard PHP (e.g., ptr, extern, buffer<T>, packed class). These features have no PHP equivalent and are not expected to run under the PHP interpreter. They are clearly distinguishable from PHP syntax and exist to enable use cases (FFI, game development, low-level memory access) that PHP cannot address.
The docs/ directory is the project's complete documentation, organized into three sections:
docs/
├── README.md # Main index
├── php/ # PHP syntax (standard PHP features)
│ ├── types.md
│ ├── operators.md
│ ├── control-structures.md
│ ├── functions.md
│ ├── strings.md
│ ├── arrays.md
│ ├── math.md
│ ├── classes.md
│ ├── namespaces.md
│ └── system-and-io.md
├── beyond-php/ # Compiler extensions (not valid PHP)
│ ├── pointers.md
│ ├── buffers.md
│ ├── packed-classes.md
│ ├── extern.md
│ └── ifdef.md
└── internals/ # Compiler internals
├── what-is-a-compiler.md
├── how-elephc-works.md
├── the-lexer.md
├── the-parser.md
├── the-type-checker.md
├── the-codegen.md
├── the-runtime.md
├── memory-model.md
├── architecture.md
├── arm64-assembly.md
└── arm64-instructions.md
All docs files are Markdown with YAML frontmatter compatible with Astro content collections. Every .md file must have this frontmatter format:
---
title: "Page Title"
description: "One-line description of the page."
sidebar:
order: N
---titlereplaces the# Heading— do not add a top-level# Titlein the body (Astro renders it from frontmatter)sidebar.ordercontrols page ordering within its section- No navigation links (
[← Back],Next:, etc.) — Astro handles navigation - Use standard Markdown (CommonMark). No custom shortcodes or Astro components inside docs
Documentation must be kept up to date. When adding a new feature:
- PHP syntax feature (operator, built-in, statement, etc.) → update the relevant page in
docs/php/. Add the function signature, parameters, return type, and a short example. - Compiler extension (pointer, buffer, extern, ifdef) → update the relevant page in
docs/beyond-php/. - Compiler internals change (pipeline, codegen, runtime) → update the relevant page in
docs/internals/. - If a feature was previously listed as "not supported", remove that note.
- If there are known incompatibilities with PHP, document them in
docs/php/types.md(incompatibilities section). - Update
docs/README.mdindex if adding a new page.
ROADMAP.md tracks all planned and completed work, organized by version.
- Never remove completed items from a version section. Mark them as
[x]and leave them under the version they belong to. This preserves the history of what was delivered in each release. - New work items go under the appropriate future version.
- When all items in a version are completed, the version is considered done — do not move items elsewhere.
- No
Co-Authored-Bylines in commits - Use commit message prefixes such as
feat:,fix:,chore:,docs:,refactor:, ortest: - Keep commit messages concise
- Run
cargo testbefore committing — all tests must pass - Zero compiler warnings policy (
cargo buildmust be clean) - Never run
cargo fmtin this repo. Use targeted manual edits only; global formatting creates noisy churn here.