explicit modeling compliance refactor #32

Merged: gabewillen merged 133 commits into main from compliance/explicit-modeling-wip on Mar 6, 2026
@gabewillen

Contributor

Summary

  • harden SML/agent/compliance docs against runtime control-flow shortcuts
  • add model audit inventory at docs/model.audit.md
  • refactor first target (gbnf/rule_parser/lexer) to explicit guard/state-driven modeling
  • refresh quality gate timing snapshot

Completed in this PR so far

  1. docs policy + checklist hardening
  2. lexer re-architecture (first item from audit)

Validation

  • scripts/quality_gates.sh passed

Next planned steps

  • continue down audit list one component at a time, one commit per component refactor

@gabewillen gabewillen marked this pull request as ready for review March 5, 2026 23:46
Copilot AI review requested due to automatic review settings March 5, 2026 23:46
@gabewillen gabewillen changed the title from "WIP: explicit modeling compliance refactor" to "explicit modeling compliance refactor" Mar 5, 2026

Copilot AI left a comment

Pull request overview

This PR is a large-scale refactor to eliminate dependencies on deprecated EMEL_OK/EMEL_ERR_* C macros throughout the codebase, replacing them with typed, domain-specific error enums and explicit guard-per-error-code state machine transitions. It also adds compliance documentation and a new GBNF lexer architecture.

Changes:

  • Removes EMEL_OK/EMEL_ERR_* macros from emel.h and replaces all usages in source, tests, and tools with typed error codes (emel::error::cast(...), error_code(error::none), etc.)
  • Refactors state machine transition tables across many components (graph, assembler, allocator, tokenizer, batch/planner, etc.) from binary phase_ok/phase_failed guards to per-error-code explicit guards
  • Adds SML compliance rules, AGENTS.md agent instructions, and documentation updates for the new lexer architecture and guard patterns
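
To make the second change concrete, here is a minimal sketch of the difference between a binary failure guard and per-error-code guards. It is not taken from the PR and does not use the project's state machine library: `error_code`, `context`, the guard struct names, the `state` values, and `next_state` are all hypothetical names chosen for illustration, and the if-chain only stands in for rows of a declarative transition table.

```cpp
#include <cstdint>

// Hypothetical typed error enum standing in for the removed EMEL_OK /
// EMEL_ERR_* macros. The names and values are illustrative assumptions.
enum class error_code : std::uint8_t { none, invalid_argument, out_of_memory };

// Hypothetical context carrying the outcome of the phase that just ran.
struct context {
  error_code err = error_code::none;
};

// Binary style: a single guard collapses every failure into one branch, so
// the transition table cannot tell the failures apart.
struct phase_failed {
  constexpr bool operator()(const context& c) const noexcept {
    return c.err != error_code::none;
  }
};

// Explicit style: one guard per error code, so each outcome gets its own
// row (and its own target state) in the transition table.
struct phase_ok {
  constexpr bool operator()(const context& c) const noexcept {
    return c.err == error_code::none;
  }
};
struct phase_invalid_argument {
  constexpr bool operator()(const context& c) const noexcept {
    return c.err == error_code::invalid_argument;
  }
};
struct phase_out_of_memory {
  constexpr bool operator()(const context& c) const noexcept {
    return c.err == error_code::out_of_memory;
  }
};

// Hypothetical target states, one per outcome.
enum class state { done, rejected, exhausted, unexpected };

// Stand-in for a transition table: each line reads as "guard -> target".
constexpr state next_state(const context& c) noexcept {
  if (phase_ok{}(c)) return state::done;
  if (phase_invalid_argument{}(c)) return state::rejected;
  if (phase_out_of_memory{}(c)) return state::exhausted;
  return state::unexpected;  // reached only if a new code lacks a guard
}

// Compile-time checks that every error code selects a distinct target.
static_assert(next_state(context{error_code::none}) == state::done, "");
static_assert(next_state(context{error_code::invalid_argument}) == state::rejected, "");
static_assert(next_state(context{error_code::out_of_memory}) == state::exhausted, "");
static_assert(phase_failed{}(context{error_code::out_of_memory}), "");
```

With the binary guard, `invalid_argument` and `out_of_memory` would both take the same transition. With one guard per code, adding a new error code without a matching row shows up as the `unexpected` state rather than being silently absorbed into a generic failure path.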

Reviewed changes

Copilot reviewed 140 out of 266 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `include/emel/emel.h` | Removes deprecated `EMEL_*` macros, adds `emel_error_type` typedef |
| `src/emel/text/tokenizer/errors.hpp` | Migrates error enum from `EMEL_*` macros to typed bit-field values |
| `src/emel/text/tokenizer/guards.hpp` | Replaces `phase_ok`/`phase_failed` with per-error-code guard structs |
| `src/emel/text/tokenizer/sm.hpp` | Updates transition table to use per-error-code guards |
| `src/emel/text/encoders/errors.hpp` | Replaces `EMEL_*` macros with literal integer values |
| `src/emel/text/formatter/format.hpp` | Introduces local error enum, removes `EMEL_*` dependency |
| `src/emel/graph/guards.hpp` | Replaces `compute_phase_ok`/`compute_phase_failed` with per-error guards |
| `src/emel/graph/allocator/guards.hpp` | Replaces `phase_ok`/`phase_failed` with per-error allocation guards |
| `src/emel/graph/assembler/guards.hpp` | Replaces `reserve_phase_*`/`assemble_phase_*` with per-error guards |
| `src/emel/graph/processor/guards.hpp` | Replaces `phase_ok`/`phase_failed` with per-error execution guards |
| `src/emel/batch/planner/guards.hpp` | Adds `planning_failed_with_error`/`planning_failed_without_error` |
| `src/emel/batch/planner/modes/simple/guards.hpp` | Adds capacity/step-size guards as lambdas |
| `src/emel/batch/planner/modes/sequential/guards.hpp` | Mirrors simple mode: adds capacity/step-size guards |
| `src/emel/batch/planner/sm.hpp` | Removes `plan_failed` intermediate state, routes directly to `done` |
| `src/emel/gbnf/rule_parser/lexer/detail.hpp` | New file: lexer detail helpers (word-char, skip-layout, scan) |
| `src/emel/gbnf/rule_parser/nonterm_parser/sm.hpp` | Splits deciding into lookup-exec/decision states |
| `src/emel/text/jinja/parser/guards.hpp` | Replaces `phase_ok`/`phase_failed` with explicit error guards |
| `src/emel/generator/guards.hpp` | Renames `phase_ok`/`phase_failed` to typed error guards |
| `src/emel/memory/kv/sm.hpp` | Expands `allocate_slots` validation into multi-stage decision states |
| `tools/paritychecker/tokenizer_parity_common.cpp` | Replaces `EMEL_OK`/`EMEL_ERR_*` with named tokenizer-scoped constants |
| `tools/bench/**` | Replaces `EMEL_OK`/`EMEL_ERR_*` with local `bench_error` typed constants |
| `tests/**` | Replaces `EMEL_OK`/`EMEL_ERR_*` checks with typed error-code comparisons |
| `docs/rules/sml.rules.md` | Adds rules 16–19 banning runtime control-flow emulation patterns |
| `docs/compliance-checklist.md` | Adds anti-shortcut checklist items |
| `AGENTS.md` | Adds agent instructions banning loop-based branch emulation |
| `scripts/quality_gates.sh` | Raises timeout, reduces bench iteration counts, adds `generate_docs` step |
| `snapshots/quality_gates/timing.txt` | Updates timing snapshot |

Comments suppressed due to low confidence (4)

src/emel/text/encoders/sm.hpp:1

  • This comment in the design doc block now uses the full C++ call expression rather than a human-readable description, making it harder to read as documentation. Consider replacing it with a plain-language description such as `- invalid requests or capacity errors -> error code for invalid_argument`, or a short symbolic reference like `error::code::invalid_argument`.

src/emel/batch/planner/modes/simple/guards.hpp:1

  • The required_step_count free function in simple/guards.hpp and the minimum_step_count free function in sequential/guards.hpp are identical in logic (both compute full_chunks + has_remainder from effective_step_size and n_tokens). This duplication should be extracted to a shared detail helper to avoid divergence.

src/emel/text/encoders/errors.hpp:1

  • Both the to_emel and from_emel lookup tables use raw integer literals as magic numbers. These numeric values must stay in sync with the code enum definition immediately above. An out-of-sync edit would silently produce wrong mappings. Consider using static_cast<int32_t>(code::ok) etc., or a static_assert verifying the mapping, to make the relationship explicit and compiler-checked.

src/emel/text/encoders/errors.hpp:1

  • Both the to_emel and from_emel lookup tables use raw integer literals as magic numbers. These numeric values must stay in sync with the code enum definition immediately above. An out-of-sync edit would silently produce wrong mappings. Consider using static_cast<int32_t>(code::ok) etc., or a static_assert verifying the mapping, to make the relationship explicit and compiler-checked.
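
The last three suggestions above can be sketched in a few lines. This is an illustration only, not the project's code: the `planner_detail` namespace, the function signatures, the `code` enumerators and their values, and the table names are assumptions. Only the names `required_step_count`, `minimum_step_count`, `to_emel`, `from_emel`, and the `full_chunks + has_remainder` formula come from the review text.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// --- Shared step-count helper (hypothetical namespace and signature) ---
namespace planner_detail {
// Ceiling division written as full_chunks + has_remainder.
// Assumes step_size > 0; a guard is expected to reject zero beforehand.
constexpr std::int32_t step_count(std::int32_t n_tokens,
                                  std::int32_t step_size) noexcept {
  const std::int32_t full_chunks = n_tokens / step_size;
  const std::int32_t has_remainder = (n_tokens % step_size) != 0 ? 1 : 0;
  return full_chunks + has_remainder;
}
}  // namespace planner_detail

// Both mode-specific names forward to the one helper, so they cannot diverge.
constexpr std::int32_t required_step_count(std::int32_t n_tokens,
                                           std::int32_t step_size) noexcept {
  return planner_detail::step_count(n_tokens, step_size);
}
constexpr std::int32_t minimum_step_count(std::int32_t n_tokens,
                                          std::int32_t step_size) noexcept {
  return planner_detail::step_count(n_tokens, step_size);
}

// --- Compiler-checked mapping tables (enumerators are hypothetical) ---
enum class code : std::int32_t { ok = 0, invalid_argument = 1, model_invalid = 2 };
constexpr std::size_t code_count = 3;

// Entries are derived from the enum instead of being written as literals.
constexpr std::array<std::int32_t, code_count> to_emel_table = {
    static_cast<std::int32_t>(code::ok),
    static_cast<std::int32_t>(code::invalid_argument),
    static_cast<std::int32_t>(code::model_invalid),
};
constexpr std::array<code, code_count> from_emel_table = {
    code::ok, code::invalid_argument, code::model_invalid};

constexpr std::int32_t to_emel(code c) noexcept {
  return to_emel_table[static_cast<std::size_t>(c)];
}
// Assumes 0 <= v < code_count; range checking is omitted in this sketch.
constexpr code from_emel(std::int32_t v) noexcept {
  return from_emel_table[static_cast<std::size_t>(v)];
}

// A reordered or mistyped table entry now fails to compile.
static_assert(from_emel(to_emel(code::ok)) == code::ok, "");
static_assert(from_emel(to_emel(code::invalid_argument)) == code::invalid_argument, "");
static_assert(from_emel(to_emel(code::model_invalid)) == code::model_invalid, "");
static_assert(required_step_count(10, 4) == 3, "two full chunks plus a remainder");
```

The round-trip `static_assert`s are one way to get the compiler-checked relationship the review asks for. Deriving the table entries with `static_cast` covers the other half of the suggestion.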

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fb29612c37

Comment thread include/emel/emel.h
@gabewillen gabewillen merged commit 389bef7 into main Mar 6, 2026
4 checks passed
@gabewillen gabewillen deleted the compliance/explicit-modeling-wip branch March 6, 2026 00:03