Compiler Guidelines

This file exists to keep the compiler implementation coherent over time. Peeper shares some concepts with Rust. But it doesn't mean Peeper is a direct copy of Rust.

The goal is not to copy Rust or any other compiler blindly. The goal is to build a compiler that is:

correct
easy to extend
easy to reason about
organized around clear phase boundaries

1. Core Rule

Do not cargo-cult architecture.

When reusing ideas from Rust, Zig or any other compiler:

copy the idea only if it fits this language
simplify when the full design is not needed yet
avoid premature generalization
keep room for later semantic and codegen phases

If a design is copied, there must be a concrete reason for it.

Do not ship placeholder architecture for core compiler subsystems.

parsing, manifests, module resolution, dependency loading, semantic analysis, IR, and codegen must be designed for real compiler growth
avoid “basic for now” implementations that will need to be thrown away once the compiler handles real projects
when a subsystem is important, implement the correct boundary and data model first, even if some features inside that boundary remain incomplete
small scope is acceptable; toy structure is not

2. Project Priorities

In order:

correctness
clarity of semantics
maintainability
diagnostic quality
performance

Do not trade away 1-4 for micro-optimizations.

For critical infrastructure, prefer durability over short-term speed of implementation.

a smaller but structurally correct subsystem is acceptable
a quick implementation that bakes in the wrong model is not

3. Language Context To Preserve

These decisions are already part of the language design and should not drift accidentally:

:: is used for imported names, enum variants, and static members
imports are package-root-relative, never relative
core is reserved for the standard library
dependency imports are resolved by manifest alias on the first path segment
a non-aliased import binds the last path segment in source, for example import "util/build" binds build
Zig-style literals are used: .{ ... }
methods are declared outside types using attached-method syntax with receivers.
defer and panic are part of the core control-flow model
builtin functions are declared in global.peep within the core library source tree
stdlib source modules are declared within the core library source tree
external declarations use #[extern(\"...\")] and may omit a body. extern can contain the external linking function name as parameter or keep empty for default behavior.
error unions are explicit value-level control flow and are not exceptions

If implementation changes conflict with this, update the language spec first.

4. Architecture Rules

The compiler should be split into clear layers.

lexer: tokenization only
parser: syntax only
context: module table, dependency graph, caches, shared compiler state
pipeline: phase orchestration
hir: all HIR data structures and HIR-local transforms
mir: MIR and MIR-local transforms
semantics: name resolution, type checking, ownership checks

No package should mix all of these concerns.

HIR-specific rule:

do not split HIR model, HIR generation, and HIR lowering across unrelated top-level packages
keep HIR-related code under hir
if HIR grows, prefer subpackages under hir/... over creating parallel top-level hir* packages again

5. Parser Rules

The parser should build syntax, not interpretation.

keep it single-responsibility
keep it readable without helper noise
avoid wrapper functions whose only job is to rename token checks
recovery should exist, but not at the cost of unreadable control flow

Parser code should answer:

what syntactic form is being parsed
what tokens are consumed
what AST node is produced

If a function makes that hard to see, rewrite it.

6. AST Rules

AST nodes represent source structure, not semantic conclusions.

avoid embedding semantic state in frontend AST nodes
keep locations on all user-visible nodes
choose names that match syntax, not later implementation details

Examples:

NumberLit is better than IntLit if the lexer accepts non-integer numerics
ImportDecl should exist if imports are part of module syntax

7. Context And Pipeline Rules

Use a central compiler context for shared state.

The context owns:

modules
file/import mapping
dependency graph
source cache
diagnostics
incremental state

Module identity inside the compiler must be stronger than raw source import text.

use origin-qualified module keys internally
do not let local, stdlib, and dependency modules collide in one string namespace
reject relative imports at resolution time

Manifest and package-resolution logic should stay outside context and pipeline.

manifest parsing belongs in a dedicated package
project/workspace loading may feed config into the compiler context
fetching/downloading is a separate concern from parsing and import-graph construction

The pipeline owns:

phase order
loading modules
traversing imports
cache reuse decisions
cycle detection

Do not hide pipeline behavior inside parser or lexer code.

7.1 Phase Responsibilities

Phase ownership must stay explicit.

collector
- create module scopes
- collect top-level names
- collect method sets by receiver
- no name lookup beyond local declaration registration
resolver
- resolve imports
- resolve local and nested names
- resolve :: paths
- enforce cross-module visibility for imported names
- bind named-type members that are syntactically addressable without type inference, such as enum variants and static fields
- bind labeled control-flow targets
- no type inference or data-flow reasoning
typechecker
- type expressions and statements
- validate assignment compatibility
- validate return statement value types against function result types
- perform call-site method lookup once receiver types are known
- validate pointer, optional, and error-union rules
- do local, type-directed checks only
- do not do full path-sensitive return completeness here
usage analysis
- run as a final whole-compilation pass after ownership on clean modules
- warn about unused imports
- warn about unused private module symbols such as private functions, types, and module bindings
- warn about unused locals and parameters where the language wants those diagnostics
- treat public API surface differently from private/internal declarations
- do not own type layout or backend-specific liveness decisions
ownership analysis
- run after MIR generation on typed, normalized IR
- validate moves, borrow freezing, and borrow escape rules
- own use-after-move diagnostics and ownership-state transitions
- consume CFG liveness once CFG exists instead of recursing purely over syntax
- use MIR temp/place provenance instead of re-reading frontend syntax
- handle path-sensitive and flow-sensitive reasoning
- stay separate from the basic typechecker
HIR
- preserve typed semantics while removing parser-only surface noise
- represent control constructs in a form suitable for CFG construction
HIR lowering
- desugar frontend sugar
- normalize constructs before CFG
CFG analysis
- build per-function control-flow graphs
- do reachability and return-path analysis
- decide whether non-void functions return on all paths
- handle nested conditionals and loops through graph analysis, not ad hoc AST recursion
- own unreachable-code diagnostics once CFG exists
- be unwind-aware once panic / defer lowering is implemented
- distinguish normal edges from panic-unwind cleanup edges
const evaluation
- fold and propagate only after types are known
- may inform CFG simplification, but should not replace CFG-based correctness checks
layout
- compute physical size, alignment, and field offsets
- may reorder struct fields physically for packing/alignment if the language permits it
- must not mutate semantic/source field order used by parser, typechecker, HIR, ownership, or diagnostics
- must produce a stable mapping from semantic field index to physical slot/offset
- must own ABI-facing layout decisions, not the typechecker or MIR builder

Return-path analysis belongs to CFG analysis, not the basic typechecker.

Reason:

it is fundamentally a control-flow problem
nested loops and branching make AST-local reasoning brittle
CFG gives one place to handle if, switch, loops, break, continue, and labels coherently

Ownership and borrow checking should not be treated as part of the basic typechecker contract.

Reason:

it needs flow-sensitive state transitions
it interacts with control flow, reinitialization, and escapes
keeping it separate makes the typechecker simpler and keeps ownership logic aligned with later CFG/data-flow work

7.2 Unwind And Error Model

Do not conflate panic with E!T.

E!T is explicit value-level control flow
!! lowers to ordinary control flow
panic is non-local control flow
defer runs on both normal exit and panic unwind
recover is not current surface syntax; if reintroduced later it must fit the unwind/cleanup model cleanly

This implies:

CFG and MIR must eventually support cleanup/unwind edges
MIR terminators must remain extensible; do not hard-code the assumption that only normal branch/jump/return exist forever
cleanup execution order must be explicit in IR, not reconstructed from source text late in codegen

Do not fake panic semantics by lowering it to an ordinary call and hoping codegen reconstructs unwind behavior later.

7.3 Semantic Order vs Physical Layout

Semantic field order and physical field layout are different concepts.

semantic order is the declared/source order
semantic field indices in HIR/MIR should remain stable for diagnostics, ownership, and language semantics
physical layout is computed later by a dedicated layout phase

If field reordering is allowed for packing/alignment:

keep semantic field index stable
compute a separate semantic-index -> physical-slot/offset mapping
do not rewrite HIR/MIR/ownership to use physical order directly

This avoids breaking:

dumps
diagnostics
source mapping
ownership/partial-move reasoning on fields
cross-phase reproducibility

8. Incremental And Cache Rules

Incremental behavior must be explicit and testable.

cache keys must be obvious
module invalidation rules must be local and understandable
unchanged modules should be reusable
dependency changes must invalidate correctly

Do not build a “smart cache” that nobody can reason about.

9. Diagnostics Rules

Diagnostics are a product feature, not a side effect.

every syntax error should point to a concrete span
messages should say what is wrong and what was expected
error codes should remain stable once exposed
avoid generic “unexpected token” if a better message is easy to give

Prefer a smaller number of accurate diagnostics over a flood of cascading noise.

10. Testing Rules

Every non-trivial frontend change should come with tests.

Minimum expectations:

lexer tests for new token forms
parser tests for new syntax forms
pipeline tests for multi-file behavior
regression tests for previously broken cases

Do not rely on one example file as proof that the compiler works.

11. Change Discipline

Before changing code, ask:

Is this syntax, parsing, semantics, or orchestration?
Which package should own it?
Will this decision still make sense after type checking exists?
Am I encoding a shortcut that will become a bug later?

If the answer to 4 is yes, do not do it.

12. Anti-Patterns

Avoid these:

helper layers that only rename obvious operations
parser code that depends on semantic facts
token kinds that encode typechecker assumptions
architecture copied from another compiler without current need
one-file “temporary” logic that becomes permanent
hidden ownership rules
silent implicit behavior that the language spec does not justify

13. Preferred Style

Prefer:

small focused packages
direct names
explicit data flow
phase-local responsibilities
tests next to the behavior they protect

If a simpler design solves the problem cleanly and preserves the right long-term structure, prefer it over ornamental complexity. Do not confuse “simple” with “throwaway” or “underdesigned”.

14. Workflow Rule

When implementing new compiler work:

check this file
decide which layer owns the change
choose a design that will survive analyzer/codegen stages
implement the smallest structurally correct version
add tests
only then generalize if there is real pressure

Extras

Migration to import syntax instead of

import "core:io";
import "github.com/raysan5/raylib/ui";

import core::io; 
import raylib::ui; // can alias inline with `as`

config file will contain dependencies

[dependencies]
raylib = { url = "https://github.com/raysan5/raylib", version = "4.0.0" }
local_project = { path = "../local_project" }

lockfile will contain the transitive dependencies and hashes of the dependencies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Compiler Guidelines

1. Core Rule

2. Project Priorities

3. Language Context To Preserve

4. Architecture Rules

5. Parser Rules

6. AST Rules

7. Context And Pipeline Rules

7.1 Phase Responsibilities

7.2 Unwind And Error Model

7.3 Semantic Order vs Physical Layout

8. Incremental And Cache Rules

9. Diagnostics Rules

10. Testing Rules

11. Change Discipline

12. Anti-Patterns

13. Preferred Style

14. Workflow Rule

Extras

Uh oh!

FilesExpand file tree

COMPILER_GUIDELINES.md

Latest commit

History

COMPILER_GUIDELINES.md

File metadata and controls

Compiler Guidelines

1. Core Rule

2. Project Priorities

3. Language Context To Preserve

4. Architecture Rules

5. Parser Rules

6. AST Rules

7. Context And Pipeline Rules

7.1 Phase Responsibilities

7.2 Unwind And Error Model

7.3 Semantic Order vs Physical Layout

8. Incremental And Cache Rules

9. Diagnostics Rules

10. Testing Rules

11. Change Discipline

12. Anti-Patterns

13. Preferred Style

14. Workflow Rule

Extras