A sprag is a one-way clutch — it lets a mechanism turn forward but locks against reverse motion. That's the invariant: architecture only moves forward, never rots back.
Enforce human-authored architectural invariants as a gate on every change, ratcheted against a baseline, so AI-built codebases don't silently rot (the k10s failure mode: https://blog.k10s.dev/im-going-back-to-writing-code-by-hand/). Mechanical + deterministic (no model), so no oracle-quality ceiling and no "who-verifies-the-verifier" problem.
And — the part nothing else does — when an AI agent is told "make the gate pass," it can't cheat by
silencing the gate. node demo-threat-model.mjs watches it try, and fail:
told "make the gate pass", the agent tries to silence the gate instead of fixing it:
🛡 BLOCKED raise the limit (maxLines 20 → 50)
🛡 BLOCKED re-baseline (0 → 1, grandfather the debt)
🛡 BLOCKED delete the rule
🛡 BLOCKED downgrade severity (block → warn)
🛡 BLOCKED stage-then-revert the relaxed config
🛡 BLOCKED kill the analysis engine
Every bypass blocked; the gate can only be passed honestly. ✅
The only two ways to green: fix the code, or loosen it on the record (ARCH_ALLOW_RELAX=1,
which still prints what it loosened). Full bypass catalogue + honest limits: THREAT-MODEL.md.
Every quality gate has two silent-failure modes — and an AI agent told "make the gate pass" reaches for both. sprag is the only gate built to close them:
- It can't die quietly. If the analysis engine can't load (not installed, wrong-platform binary, ABI mismatch), sprag fails closed — errors, exit 2 — instead of scoring 0 and passing everything. A no-op gate is the worst possible failure for a gate, so a dead engine must be loud, not invisible.
- It can't be relaxed to pass. The meta-ratchet gates sprag's own config + baseline against a git ref: raising a max, dropping a rule, downgrading severity, or re-baselining upward is blocked. Loosening is forward-only and visible: a deliberate, reviewed `ARCH_ALLOW_RELAX=1` that still prints every relaxation. A Betterer snapshot can be `--update`d, an ESLint rule disabled, a baseline rewritten; sprag's ruleset can only move forward, or the change shows up blocked.
The ratchet itself is well-trodden (ArchUnit, Betterer, Sonar "Clean as You Code", Semgrep
--baseline-commit) — sprag stands on that. What's uncontested, and what matters most when the author
of the code is also trying to get past the gate, is gating the gate against the two ways it goes
no-op: by accident (dead engine) and on purpose (relaxed config).
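The fail-closed half can be pictured with a minimal sketch. It is not sprag's code: `runGate`, `loadEngine` and the result shape are hypothetical names chosen for illustration, and only the exit codes (0 pass, 2 dead engine, 3 blocked) come from this README.

```javascript
// Hypothetical sketch: loadEngine and the result shape are illustrative, not sprag's API.
const EXIT_PASS = 0;
const EXIT_ENGINE_FAILURE = 2; // exit code this README gives for a dead engine
const EXIT_BLOCKED = 3;        // exit code this README gives for a rot block

function runGate(loadEngine, source) {
  let engine;
  try {
    engine = loadEngine();
  } catch (err) {
    // Fail closed: an engine that cannot load is an error, never a score of 0.
    return { exitCode: EXIT_ENGINE_FAILURE, message: `engine failed to load: ${err.message}` };
  }
  const violations = engine.countViolations(source);
  return violations > 0
    ? { exitCode: EXIT_BLOCKED, message: `${violations} violation(s)` }
    : { exitCode: EXIT_PASS, message: 'ok' };
}

// Two stand-in engines: one that cannot load, one that counts a toy pattern.
const deadEngine = () => { throw new Error('wrong-platform binary'); };
const liveEngine = () => ({ countViolations: (src) => (src.match(/TODO/g) || []).length });

const deadResult = runGate(deadEngine, 'anything');
const cleanResult = runGate(liveEngine, 'const x = 1;');
console.log(deadResult.exitCode, cleanResult.exitCode);
```

The design point is that the load failure has its own exit code, so a wrapper script cannot mistake "nothing was checked" for "nothing was wrong".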
See THREAT-MODEL.md for the full bypass catalogue and a runnable, self-verifying
proof — node demo-threat-model.mjs watches an agent try every shortcut and sprag block each one. The
idea, in essay form: The gate that can't be weakened.
npm i -g @johnpwarren.dev/sprag # provides the `sprag` command
npx @johnpwarren.dev/sprag init <src-dir>
# Or straight from GitHub (no build step either way):
# npm i -g github:johnpatrickwarren-oss/sprag

Or clone and run directly: git clone https://github.com/johnpatrickwarren-oss/sprag && cd sprag && npm install && node arch.mjs <cmd>.
One unified CLI (arch, or node arch.mjs <cmd>):
npm install # ast-grep engines
node arch.mjs init <src-dir> # scaffold generic invariants + baseline (lang auto-detected)
node arch.mjs check <src-dir> --invariants arch-invariants.json --baseline-in arch-invariants.baseline.json
node arch.mjs install-hook <repo-dir> <src-rel> arch-invariants.json # pre-commit gate (blocks new debt vs HEAD)
node arch.mjs loop <src-dir> --fixer "<your builder cmd>" # AI-loop feedback gate
node arch.mjs trend <repo> <src-rel> --invariants arch-invariants.json # debt trend over history

init scaffolds the generic, no-tuning checks (god-files, god-functions, coupling fan-in); add
project-specific tenets from library/tenets.json or any raw ast-grep rule via the ast_grep_rule
check kind. The ratchet means you don't need perfect thresholds up front: start from your current
state, and the gate refuses to make it worse.
node arch-gate.mjs sample --baseline # record the clean sample's metrics as the accepted baseline
node arch-gate.mjs sample # check -> PASS (exit 0)
node test-arch-gate.mjs # self-contained proof: passes clean, blocks 3 rot diffs (exit 0)

Exit codes: 0 pass · 3 blocked (rot) · 64 usage.
invariants.json declares human-authored invariants; arch-gate.mjs computes each metric over the
codebase and blocks on (a) an absolute max breach or (b) a ratchet regression vs
baseline.json (never-get-worse). Three invariants here map to the k10s tenets:
| Invariant | Check | k10s tenet |
|---|---|---|
| model-not-god-object | Model struct field count (max 8, ratchet) | god object (1 & 2) |
| no-positional-rows | magic integer-index access (ra[3]) count (max 0) | positional fragility (4) |
| bounded-dispatch | switch m.view case count (ratchet) | central per-view dispatch (2) |
The ratchet is the key idea: it blocks the 13th Model field, not the 30th — catching the
trend as it's introduced, which is what would have flagged k10s at commit ~20 instead of at
collapse (commit 234). Demo: adding one Model field is blocked at 6→7, before the absolute max.
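The two blocking conditions (absolute max, ratchet regression) can be sketched in a few lines. This is a simplified illustration, not the code in arch-gate.mjs: the `{ id, max, ratchet }` invariant shape and the baseline map are assumptions made for the example.

```javascript
// Hypothetical shapes: { id, max, ratchet } and the baseline map are illustrative.
function evaluateInvariant(invariant, metric, baseline) {
  const reasons = [];
  // (a) absolute max breach
  if (typeof invariant.max === 'number' && metric > invariant.max) {
    reasons.push(`absolute max breached: ${metric} > ${invariant.max}`);
  }
  // (b) ratchet regression: the metric got worse than the accepted baseline
  const accepted = baseline[invariant.id];
  if (invariant.ratchet && typeof accepted === 'number' && metric > accepted) {
    reasons.push(`ratchet regression: ${accepted} -> ${metric}`);
  }
  return { id: invariant.id, blocked: reasons.length > 0, reasons };
}

const modelFields = { id: 'model-not-god-object', max: 8, ratchet: true };
const baseline = { 'model-not-god-object': 6 };

const unchanged = evaluateInvariant(modelFields, 6, baseline);
const oneMoreField = evaluateInvariant(modelFields, 7, baseline);
console.log(unchanged.blocked, oneMoreField.blocked, oneMoreField.reasons);
```

With a baseline of 6 and a max of 8, the seventh field is blocked by the ratchet alone, which mirrors the 6→7 demo: the regression is caught while the metric is still under the absolute limit.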
Two integration points turn the checker into an enforcement gate:
Pre-commit hook — blocks a commit that makes architecture worse than HEAD (ratchet vs the last commit), so rot can't land:
./install-hook.sh <repo-dir> <src-rel-path> # writes <repo>/.git/hooks/pre-commit
node test-precommit.mjs # proof: rot commit blocked + doesn't land; clean allowed

The hook computes the baseline from HEAD on each commit (no manual baseline upkeep). Bypass
(discouraged) is git commit --no-verify.
AI-loop feedback gate — runs the gate; on a block, feeds the violation back to a pluggable fixer and re-checks, until it passes or escalates:
node arch-loop.mjs <dir> --fixer "<cmd>" [--max-iters 3]
node test-arch-loop.mjs # proof: converges on a good fixer, escalates on a stuck one

The fixer is a stub in tests; in real use it's a claude session (cwd = the code dir, reads
ARCH_GATE_FEEDBACK / .arch-feedback.md) — reusing prototypes/verification-harness's
watchdog-wrapped session runner. On escalation the loop does NOT relax the invariant — a fixer
that can't satisfy the gate means the change is genuinely incompatible with the architecture, which
is a human call (the inverse of the behavioral harness's phantom problem: here a stuck loop means
the code is wrong, not the invariant).
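The control flow of that loop can be sketched as follows. This is an illustration under assumptions, not arch-loop.mjs itself: `runLoop`, `checkGate`, `fixer` and the toy `makeRepo` are hypothetical, and the exact meaning of `--max-iters` in sprag may differ from the "maximum fix attempts" used here.

```javascript
// Hypothetical sketch: checkGate and fixer are injected stand-ins, not sprag's API.
function runLoop(checkGate, fixer, maxIters = 3) {
  let result = checkGate();
  let fixes = 0;
  while (!result.pass && fixes < maxIters) {
    fixer(result.feedback); // hand the violation text to the builder
    fixes += 1;
    result = checkGate();   // re-check against the SAME invariant
  }
  return result.pass
    ? { status: 'passed', fixes }
    : { status: 'escalated', fixes, feedback: result.feedback };
}

// Toy codebase: one function-length metric that a fixer can reduce.
function makeRepo(lines) {
  const repo = { lines };
  repo.check = () => (repo.lines <= 20
    ? { pass: true }
    : { pass: false, feedback: `max_function_lines: ${repo.lines} > 20` });
  return repo;
}

const good = makeRepo(35);
const converged = runLoop(good.check, () => { good.lines -= 10; });

const stuck = makeRepo(35);
const escalated = runLoop(stuck.check, () => { /* fixer makes no progress */ });
console.log(converged.status, converged.fixes, escalated.status);
```

Note what the loop never does: the limit of 20 is not an input the loop can change, so a stuck fixer ends in escalation rather than in a quieter gate.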
A per-occurrence check is suppressed on a line carrying // anchor:allow <invariant-id>: <reason>.
The instance is not counted as a violation but is reported in a "Suppressions" section — so
the escape hatch is visible and auditable, never silent. (A gate with no escape hatch gets disabled
wholesale; one with untracked suppression rots silently — this is the middle path.) Metric/ratchet
invariants are instead "suppressed" by deliberately re-recording the baseline.
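A minimal sketch of how such a suppression comment could be honored while staying visible. It is a text-level illustration, not sprag's implementation: `partitionFindings`, the finding shape and the regex are assumptions made for the example.

```javascript
// Hypothetical sketch: a text-level scan; names and result shape are illustrative.
const ALLOW = /\/\/\s*anchor:allow\s+([\w-]+):\s*(.+)$/;

function partitionFindings(sourceLines, findings) {
  const violations = [];
  const suppressions = [];
  for (const finding of findings) {
    const match = ALLOW.exec(sourceLines[finding.line - 1] || '');
    // Only a comment naming THIS invariant suppresses the finding.
    if (match && match[1] === finding.invariant) {
      suppressions.push({ ...finding, reason: match[2].trim() });
    } else {
      violations.push(finding);
    }
  }
  return { violations, suppressions };
}

const source = [
  'n := legacyRow[2] // anchor:allow no-positional-rows: legacy CSV import, tracked in #123',
  'm := row[3]',
];
const findings = [
  { invariant: 'no-positional-rows', line: 1 },
  { invariant: 'no-positional-rows', line: 2 },
];
const report = partitionFindings(source, findings);
console.log(report.violations.length, report.suppressions[0].reason);
```

The suppressed instance moves to a separate list instead of disappearing, which is what lets a report print it under "Suppressions" with its stated reason.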
n := legacyRow[2] // anchor:allow no-positional-rows: legacy CSV import, tracked in #123

node arch-trend.mjs <repo> <src-rel> [--last N=20] [--json]
node test-arch-trend.mjs # proof: surfaces a Model growing 6->10 over commits + flags the max breach

Walks git history, computes each invariant's metric at every commit, prints the trend, and flags where each invariant first breaches its max. This makes accumulating rot visible early; the article's author only discovered it at collapse because velocity hid the trend.
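Once the per-commit metrics are collected, the trend and first-breach logic is small. The sketch below assumes the history is already available as an oldest-first array; the commit ids and function names are hypothetical, and the git walk itself is left out.

```javascript
// Hypothetical sketch: history is an oldest-first array of { commit, value }.
function firstBreach(history, max) {
  return history.find((point) => point.value > max) || null;
}

function formatTrend(history) {
  return history.map((point) => point.value).join(' -> ');
}

// Illustrative data: a Model growing from 6 to 10 fields against a max of 8.
const modelFieldHistory = [
  { commit: 'c1', value: 6 },
  { commit: 'c2', value: 7 },
  { commit: 'c3', value: 8 },
  { commit: 'c4', value: 9 },
  { commit: 'c5', value: 10 },
];

const trend = formatTrend(modelFieldHistory);
const breach = firstBreach(modelFieldHistory, 8);
console.log(trend, breach);
```

Reporting the first breaching commit, rather than only the current value, is what points a reviewer at the change where the drift crossed the line.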
Checks are computed by a per-invariant engine (engine + lang fields):
- `ast-grep`: real AST via `@ast-grep/napi` (+ `@ast-grep/lang-go`, `@ast-grep/lang-python`): Go, TypeScript/JS, and Python. Adding a language is a small parser adapter, not new gate logic.
- `heuristic` (default): lightweight text/brace parsing of Go-flavored source, no deps (fallback).
Built-in check kinds: struct_field_count, switch_case_count, magic_index_count, forbid_pattern,
oversized_files, max_function_lines, max_complexity, module_fanin, scope_diff, forbid_path,
time_bomb_tests, require_tests, dependency_count, unlocked_dependencies, secret_scan,
config_relaxations (see Beyond structure below). For anything bespoke, ast_grep_rule takes a raw
ast-grep rule object (matched on the top-level dir) and ast_grep_tree matches the same rule
recursively, per file, over the whole tree — so a project can encode its own architectural rules in
JSON, on a nested src/, with no code changes.
One check is behavioral, not structural: golden_outputs runs human-declared commands and
diffs their output against committed golden files — catching the "AI refactor silently changed
behavior" failure with a model-free oracle (the approved output). It executes the code, so unlike
every check above it is opt-in / out-of-band (CI / pre-merge, like arch mutate), not the
per-commit hot path, and the commands must be deterministic. Record goldens with ARCH_RECORD_GOLDEN=1;
the committed golden diff is the auditable approval. It's the first behavioral rung — deeper behavior
(property-based, metamorphic) is future work, and a model-based oracle is deliberately out of scope to
keep sprag deterministic and unweakenable.
Raw line count (max_function_lines, oversized_files) is a cheap proxy, and the specific number is a
convention, not a law — a long-but-flat function is fine; a short, deeply-branched one is not. max_complexity
is the less-arbitrary signal: it approximates cyclomatic complexity (1 + decision points + short-circuit
&&/||) per function from the same AST parse — flagging branchy functions (the ones that are genuinely
hard to follow and test; >~10 is the McCabe/NIST anchor), not merely long ones. Same zero-token, deterministic
cost as max_function_lines.
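To make the formula concrete, here is a rough text-level stand-in. It is deliberately not the AST-based computation described above: a regex also counts keywords inside strings and comments and it ignores ternaries, so treat `approxComplexity` as a hypothetical illustration of "1 + decision points + short-circuit operators", not as sprag's metric.

```javascript
// Hypothetical text-level approximation; the real check works on the AST.
// Counts branch keywords plus short-circuit operators (ternaries are ignored here).
const DECISION = /\b(?:if|for|while|case|catch)\b|&&|\|\|/g;

function approxComplexity(functionSource) {
  const matches = functionSource.match(DECISION);
  return 1 + (matches ? matches.length : 0);
}

const classify = `
function classify(n, strict) {
  if (n < 0 || Number.isNaN(n)) return 'invalid';
  if (n === 0) return 'zero';
  for (const limit of [10, 100]) {
    if (n < limit && strict) return 'small';
  }
  return 'large';
}`;

// 3 x if, 1 x for, 1 x ||, 1 x && gives 6 decision points, so the score is 7.
const score = approxComplexity(classify);
console.log(score);
```

The function is only eight lines long, yet it already scores 7, which is the sense in which complexity and length measure different things.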
Recommended pairing (what arch init scaffolds): max_complexity is the PRIMARY function gate (~12), max_function_lines is a coarse BACKSTOP set high (~150). They are different axes and neither subsumes the other: a short, branchy function (a 40-line, complexity-46 classifier) is invisible to any length rule, while a long, flat function (a 200-line data table or sequential builder with few branches) is invisible to complexity. Run both, but set the length bound high so it only catches genuinely-huge-but-flat functions and doesn't fight the complexity gate by flagging clean ~100-line functions. (Length alone is a trap: decomposing by length crushes the worst-complexity tail but tends to redistribute branches into more moderate functions rather than remove them — complexity is what actually tracks test/maintenance cost.)
What keeps any threshold from being tyrannical is the design, not the number:
you author the limit for your codebase, the ratchet enforces "never worse" rather than a magic absolute,
and a legitimate overrun is recorded with an auditable suppression (// anchor:allow <id>: <reason>) — visible,
not silent. The gate surfaces candidates for judgment, not verdicts.
The checks below go beyond size — layering / dependency-direction and test discipline — classes the metric checks are blind to (learned by refactoring real repos where the rot lived there):
- `require_tests { dirs:[...] }`: the deterministic shadow of TDD: flags source modules under `dirs` with no corresponding test (base-name match, layout-agnostic: `foo.ts` ↔ `foo.test.ts` / `foo_test.go` / `test_foo.py`). Can't prove test-first, but enforces TDD's durable outcome ("no untested code ships") as a ratchet (grandfather today's untested, block NEW). Excludes barrel `index.*` (override via `exclude`). Suppression-aware.
- `forbid_path { dirs:[...], path:'<regex>' }`: flags files under `dirs` that reference a forbidden path in code (imports / fs reads, not comment citations). Encodes a dependency-direction invariant, e.g. "product (`test/`, `src/`) must not read process state (`coordination/`)". Catches the product-depends-on-process smell.
- `time_bomb_tests { dirs:['test'] }`: flags tests that invoke git against a frozen reference (a pinned commit SHA, `git diff <ref>..HEAD`, `--name-only` anti-scope diffs, `git show <sha>` byte-identity). These can only rot (once HEAD moves past the round they fail regardless of product correctness), so the discipline belongs in a round-aware gate (see `anti-scope-gate.sh`), not the permanent suite. The signal requires both a git invocation and a frozen-ref marker, so a product SHA-256 hash test that never touches git is not falsely flagged.
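The base-name matching behind `require_tests` can be sketched like this. It is a simplified illustration and not sprag's code: `stem`, `testedName` and `untestedModules` are hypothetical helpers, only the three naming conventions listed above are recognized, and the file paths are made up.

```javascript
// Hypothetical sketch of the base-name match; path handling is simplified.
function stem(path) {
  const file = path.split('/').pop();
  const dot = file.indexOf('.');
  return dot === -1 ? file : file.slice(0, dot);
}

// Returns the module name a test file covers, or null if it is not a test file.
function testedName(path) {
  const file = path.split('/').pop();
  let m;
  if ((m = /^(.+)\.test\.[^.]+$/.exec(file))) return m[1]; // foo.test.ts
  if ((m = /^(.+)_test\.[^.]+$/.exec(file))) return m[1];  // foo_test.go
  if ((m = /^test_(.+)\.[^.]+$/.exec(file))) return m[1];  // test_foo.py
  return null;
}

function untestedModules(paths) {
  const covered = new Set();
  const sources = [];
  for (const path of paths) {
    const name = testedName(path);
    if (name !== null) covered.add(name);
    else if (stem(path) !== 'index') sources.push(path); // barrel index.* is excluded
  }
  return sources.filter((path) => !covered.has(stem(path)));
}

const files = [
  'src/foo.ts', 'test/foo.test.ts',
  'src/parser.go', 'src/parser_test.go',
  'src/bar.py', 'tests/test_bar.py',
  'src/index.ts', 'src/orphan.ts',
];
const untested = untestedModules(files);
console.log(untested);
```

Because the match is on base name only, the test can live beside the source or in a separate tree; the count of untested modules is then what the ratchet holds steady.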
Architectural rot is one failure mode of AI-built code, not the only one. These deterministic,
model-free, offline checks cover the rest — each as a ratchet or an absolute max: 0, all
suppression-aware:
- Supply chain. `dependency_count { manifest, include }` ratchets the declared dependency surface (npm / `go.mod` / `requirements.txt`) so it can't grow without a deliberate re-baseline. `unlocked_dependencies { manifest, lockfile, allow }` flags a dep declared but absent from the lockfile: the offline fingerprint of a hallucinated / slopsquatted package, caught with no registry call.
- Type integrity (`ast_grep_tree`, recursive). `no-new-any` (`any` type), `no-non-null-assertion` (the `x!` operator), `no-ts-ignore` (`@ts-ignore` / `@ts-nocheck` directives; not `@ts-expect-error`, which self-removes). The #1 ways AI silences the type system instead of fixing it.
- Secrets. `secret_scan { dirs }` flags an inlined credential (provider key shapes + private-key blocks + a guarded generic `secret="…"` rule that excludes env refs / placeholders / low-entropy). Tracked files only: a gitignored `.env` is correctly invisible. High-precision by design (a `max:0` gate can't cry wolf), so it's the always-on floor, not a replacement for a dedicated scanner.
- The gate's own integrity. Two failure modes a gate has that nothing else watches:
  - Fail closed. If the ast-grep engine can't load (not installed, wrong-platform binary, ABI mismatch) the gate errors (exit 2) instead of silently scoring 0 and passing everything; a no-op gate is the worst possible failure for a gate.
  - No silent relaxation (`config_relaxations { invariants, baseline, against, from }`, the meta-ratchet). The config + baseline may only move forward (stricter) vs a git ref. It blocks every way an agent can make a violation vanish without fixing the code: a raised `max`, dropped rule, downgraded severity, raised baseline, a removed baseline floor for a still-active rule, or a grown exemption list (adding the hallucinated dep to `allow`, the new dir to `allowed`, the untested module to `exclude`). Because the rule lives in the set it guards, deleting it counts too. In a pre-commit hook, `from: "index"` checks what's actually being committed (the staged config), closing the stage-a-relaxation-then-revert-the-working-file trick. A deliberate, reviewed loosening goes through `ARCH_ALLOW_RELAX=1`, which is still printed, never silent. Together with fail-closed these stop the gate becoming a no-op either by accident (dead engine) or on purpose (relaxed config).
sprag enforces all of the above on itself (invariants.harness.json, run over the whole repo by
the dogfood test on every npm test).
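The meta-ratchet from the list above amounts to a diff between two versions of the gate's own config. The sketch below is an illustration under assumptions, not `config_relaxations` itself: the `{ rules, baseline }` shape, the rule id and the function names are invented for the example, and exemption lists are left out.

```javascript
// Hypothetical config shape, invented for illustration:
//   { rules: { <id>: { max, severity } }, baseline: { <id>: number } }
const SEVERITY_RANK = { warn: 1, block: 2 };

function findRelaxations(before, after) {
  const relaxed = [];
  for (const [id, oldRule] of Object.entries(before.rules)) {
    const newRule = after.rules[id];
    if (!newRule) {
      relaxed.push(`${id}: rule deleted`);
      continue;
    }
    // A missing limit or floor is treated as "no limit" (Infinity).
    if ((newRule.max ?? Infinity) > (oldRule.max ?? Infinity)) {
      relaxed.push(`${id}: max raised ${oldRule.max} -> ${newRule.max}`);
    }
    if (SEVERITY_RANK[newRule.severity] < SEVERITY_RANK[oldRule.severity]) {
      relaxed.push(`${id}: severity ${oldRule.severity} -> ${newRule.severity}`);
    }
    if ((after.baseline[id] ?? Infinity) > (before.baseline[id] ?? Infinity)) {
      relaxed.push(`${id}: baseline raised ${before.baseline[id]} -> ${after.baseline[id]}`);
    }
  }
  return relaxed;
}

function metaRatchet(before, after, allowRelax = false) {
  const relaxations = findRelaxations(before, after);
  // Relaxations are always reported; the override only changes the verdict.
  return { relaxations, blocked: relaxations.length > 0 && !allowRelax };
}

const committed = {
  rules: { 'god-function': { max: 20, severity: 'block' } },
  baseline: { 'god-function': 0 },
};
const attempts = {
  raiseLimit: { rules: { 'god-function': { max: 50, severity: 'block' } }, baseline: { 'god-function': 0 } },
  rebaseline: { rules: { 'god-function': { max: 20, severity: 'block' } }, baseline: { 'god-function': 1 } },
  deleteRule: { rules: {}, baseline: {} },
  downgrade: { rules: { 'god-function': { max: 20, severity: 'warn' } }, baseline: { 'god-function': 0 } },
};
for (const [name, config] of Object.entries(attempts)) {
  console.log(name, metaRatchet(committed, config).blocked);
}
```

Passing `allowRelax` as `true` stands in for `ARCH_ALLOW_RELAX=1`: the verdict flips to not blocked, but the list of relaxations is still returned so it can be printed.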
library/tenets.json ships the k10s 5 tenets (T1–T5), 2 layering/test-rot invariants (L1–L2),
the complexity gate (Q1), the supply-chain pair (S1–S2), type-strictness (TS1–TS3), secrets
(SEC1) and the meta-ratchet (M1) as ready-to-enable invariants. Copy the ones you want into your
invariants.json and tune. See
library/README.md.
sprag enforces the deterministic half of quality. The behavioral half — the disciplines that
fight a strong model's defaults (cold-eye review, spec-first contract, …) — lives in its companion
Anchor, whose DISCIPLINES.md is the pairing for
this gate. sprag deliberately carries no behavioral-methodology doc of its own — that would just be a
second copy that drifts.
What sprag does ship is library/working-with-the-gate.md: the gate-coupled habits — author the
invariants first, write the test with the code (require_tests enforces it; arch property validates a
behavioral one), and run the gate before done (don't --no-verify; loosen only on the record). arch init drops it as arch-gate-usage.md; reference it from your CLAUDE.md (@arch-gate-usage.md).
Requiring tests can devolve into test theater — more tests that don't catch more bugs (the classic "2× tests, no better results"). The count-independent answer is mutation testing: flip an operator (&&→||, >=→>, true→false…), re-run your suite, and see if a test fails. A mutant that survives is a bug your tests can't catch — a real gap that line-count and even line-coverage are blind to.
arch mutate <dir> --test "npm test" --since main # mutate only files changed vs main (incremental)
arch mutate <dir> --test "node --test test/*.test.mjs" --all --threshold 70 # full baseline run
arch mutate . --all --test "npm test" --exclude 'corpus/**,test-*.mjs' # skip fixtures + non-.test. suites

It mutates changed source files only by default (git diff), runs your test command per mutant, and gates on the kill rate. It is deterministic (zero model tokens) but heavy (mutants × suite runtime, not offset by having fewer tests). So it's opt-in and out-of-band: run it in CI / nightly / pre-merge, not as the per-commit gate. The cheap AST checks (complexity, require_tests, god-files) stay on the hot path; mutate is the periodic audit that the tests you do have are worth keeping.
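A toy version of the kill-rate computation shows why a passing suite can still be weak. This is a sketch under assumptions, not `arch mutate`: it mutates a one-line source string by plain text substitution and evaluates it in-process with `new Function`, whereas the real tool runs your test command per mutant. All names are hypothetical.

```javascript
// Hypothetical sketch: text-level operator flips on a one-line function source.
const MUTATIONS = [['&&', '||'], ['>=', '>'], ['true', 'false']];

function generateMutants(source) {
  const mutants = [];
  for (const [from, to] of MUTATIONS) {
    let index = source.indexOf(from);
    while (index !== -1) {
      // One mutant per occurrence: exactly one operator flipped.
      mutants.push(source.slice(0, index) + to + source.slice(index + from.length));
      index = source.indexOf(from, index + from.length);
    }
  }
  return mutants;
}

function killRate(source, suite) {
  const mutants = generateMutants(source);
  let killed = 0;
  for (const mutant of mutants) {
    let passed;
    try {
      passed = suite(new Function(`return (${mutant});`)());
    } catch {
      passed = false; // a crashing mutant counts as killed
    }
    if (!passed) killed += 1;
  }
  return { total: mutants.length, killed, rate: mutants.length ? killed / mutants.length : 1 };
}

const canVote = '(age, registered) => age >= 18 && registered';

// Both suites pass on the original code; only one exercises the boundaries.
const weakSuite = (fn) => fn(30, true) === true;
const strongSuite = (fn) =>
  fn(30, true) === true && fn(18, true) === true &&
  fn(30, false) === false && fn(10, true) === false;

const weak = killRate(canVote, weakSuite);
const strong = killRate(canVote, strongSuite);
console.log(weak, strong);
```

The weak suite lets both mutants survive while the strong one kills both, even though each suite is green on the unmutated code; that gap is what a kill-rate threshold gates on.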
Test files are auto-skipped when they use the .test./.spec. convention; use --exclude <globs> for anything the heuristic misses — in-repo fixtures (deliberately-broken code used as test inputs) and tests named another way (test-*.mjs, etc.). Globs are matched against the repo-relative path (* = within a segment, ** = across). Mutating a fixture or a test file measures nothing, so excluding them keeps the score honest.
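The stated glob semantics (`*` within a path segment, `**` across segments) can be sketched as a small glob-to-regex conversion. This is an illustration, not sprag's matcher: it assumes each pattern must match the whole repo-relative path, and the example paths are made up.

```javascript
// Hypothetical sketch of the stated glob semantics; not sprag's matcher.
function globToRegExp(glob) {
  let pattern = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*' && glob[i + 1] === '*') {
      pattern += '.*';     // ** crosses directory separators
      i++;
    } else if (ch === '*') {
      pattern += '[^/]*';  // * stays within one path segment
    } else {
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  // Assumption: the pattern is anchored to the full repo-relative path.
  return new RegExp(`^${pattern}$`);
}

function isExcluded(path, globs) {
  return globs.some((glob) => globToRegExp(glob).test(path));
}

const globs = 'corpus/**,test-*.mjs'.split(',');
console.log(
  isExcluded('corpus/go/bad/model.go', globs),
  isExcluded('test-arch-gate.mjs', globs),
  isExcluded('src/test-helper.mjs', globs),
);
```

Under the anchoring assumption, `test-*.mjs` only matches files at the repo root; a nested suite would need a pattern such as `**/test-*.mjs`.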
Wired into CI (.github/workflows/mutate.yml): every PR runs an incremental mutate over just the source it changed (gating — new code must ship tests that kill its mutants), and a weekly schedule runs the full baseline as a report. This repo dogfoods it.
Rightsizing tests: don't gate on count or a coverage-% target (both reward theater). The amount of testing a function needs is bounded by its cyclomatic complexity (max_complexity caps it → caps the tests needed); require_tests ensures presence; arch mutate confirms the tests that exist actually catch bugs. More tests is never the goal — bug-catching tests are.
- Mechanical + deterministic (no model) → no "who-verifies-the-verifier" problem; invariants are human-authored (the article's Tenet 1).
- Real AST on Go, TypeScript/JS, and Python via ast-grep (`@ast-grep/napi` + `@ast-grep/lang-go` + `@ast-grep/lang-python`); the heuristic Go engine remains a no-dep fallback. More languages = a parser adapter, not new gate logic.
- Generic, no-tuning checks (work on any repo): `oversized_files` (God file), `max_function_lines` (God function), `module_fanin` (a module imported by too many files: the k10s "everything depends on the God object" coupling smell).
- Remaining (design §12): more generic metrics; richer `scope_diff`; broader real-repo trials.
npm test → 14 self-contained suites, covering: gate+ratchet+scope, pre-commit hook, AI-loop
(converge/escalate), debt-trend, the generic God-file/God-function/fan-in checks, the custom
ast_grep_rule DSL, init scaffolding, real-AST on TypeScript / Go (incl. goroutine-mutation) /
Python, scope-dirs, and a dogfood suite that runs the gate on its own source (the tool has no God
files/functions/hubs by its own checks).