diff --git a/.erpaval/INDEX.md b/.erpaval/INDEX.md
index 854ed17b..a98e4817 100644
--- a/.erpaval/INDEX.md
+++ b/.erpaval/INDEX.md
@@ -3,6 +3,10 @@
Compound-extracted lessons and EARS specs from prior autonomous
development sessions. Solutions are reusable; specs are per-feature.
+## Roadmap (durable — read FIRST before planning any milestone)
+
+- [v1.0 roadmap](ROADMAP.md) — M1→M7 dependency graph, 5 hard rails, 10 validation constraints, target package layout, language + scanner coverage. If in-conversation scope disagrees with this file, this file wins.
+
## Solutions (architecture patterns + conventions)
- [SCIP replaces LSP for code-graph oracle edges](solutions/architecture-patterns/scip-replaces-lsp.md) — one-shot indexers beat stateful LSP clients for compiler-grade graph edges.
@@ -18,6 +22,22 @@ development sessions. Solutions are reusable; specs are per-feature.
- [llms-txt config strings quietly anchor doc accuracy](solutions/conventions/llms-txt-as-ground-truth.md) — in a Starlight site with `starlight-llms-txt`, `astro.config.mjs` is more load-bearing than prose READMEs; audit it first in doc-sync sweeps.
- [tsconfig project references go stale on package removal](solutions/conventions/tsconfig-project-references-stale-on-package-removal.md) — root tsconfig `references` drift is invisible until a root-scoped tsc invocation hits; clean up in the same commit as the package delete.
- [Astro NODE_ENV in CI — set it at script scope, not step scope](solutions/conventions/astro-node-env-in-ci-script-scope.md) — mise-action + pnpm + astro chain loses CI-level NODE_ENV overrides; hard-code in package.json `build` script.
+- [Verify npm package canonicality via the upstream repo README install command](solutions/conventions/npm-package-canonicality-via-upstream-readme.md) — `chonkie-ts` was a 2.6 kB squatter; the upstream README pointed to `@chonkiejs/core`. Apply when bare/`-ts`/`@scoped` namesakes coexist.
+- [Add typed kind-filtered enumeration to IGraphStore once 3+ packages need it](solutions/architecture-patterns/storage-list-nodes-over-scattered-sql.md) — `listNodes()` collapses N raw-SQL call sites into one typed rehydration; cross-adapter parity test catches schema drift.
+- [Lift pure helpers to the deepest shared workspace dependency to break future cycles](solutions/architecture-patterns/lift-pure-functions-to-shared-dep-to-break-cycles.md) — `mcp → pack → mcp` was averted by lifting `classifyDependencies` into `@opencodehub/analysis` (the LCA dep). 30-LOC mechanical chore commit.
+- [Worktree isolation — pin pwd at task start and exclude worktrees from biome v2](solutions/best-practices/worktree-isolation-pwd-pin-and-biome-exclusion.md) — gitignore is not enough for biome v2; scope to `packages/` or add `experimentalScannerIgnores`. Always `pwd && git rev-parse --show-toplevel` at task start.
+- [Resolve milestone-old spec drifts inline with the implementing commit](solutions/best-practices/spec-drift-amend-inline-with-implementing-commit.md) — amend spec wording in the same commit that implements the resolution; record drifts with `recommend` in explore-delta so Gate 0 is a confirmation, not a fresh debate.
+- [Segregate graph-only and tabular-only stores at the interface boundary](solutions/architecture-patterns/igraphstore-itemporalstore-segregation.md) — when one type extends multiple sub-interfaces and a concrete implementor can't honestly satisfy all, segregate at the interface, not the class. `IGraphStore` + `ITemporalStore` + `openStore()` composition factory.
+- [Replace raw-SQL escape hatches with typed finders on the storage interface](solutions/architecture-patterns/typed-finders-replace-raw-sql-in-consumers.md) — 108 raw-SQL sites collapse into 15 named finders. Adapters internalize dialect; consumers stay backend-agnostic. Liskov-clean parity harness via public-method rebuilder.
+- [Parallel Act subagents on a shared git tree — interleaving + cherry-pick discipline](solutions/best-practices/parallel-act-subagents-with-shared-git-tree.md) — verify branch state, spawn on non-overlapping packages, watch for stale dist + phantom test counts, watch the test-fixup tail.
+- [Squash-merge masks pre-existing repo-wide debt](solutions/best-practices/squash-merge-masks-pre-existing-debt.md) — first action on a fresh branch from main is `mise run check` BEFORE starting work; lint rules / transitive deps / cross-package test assertions drift across squash boundaries even when per-commit gating was green inside the prior PR.
+- [No spec-coordinate leakage into source](solutions/best-practices/no-spec-coordinate-leakage-into-source.md) — ERPAVal `AC-*`, `M-*`, `W-*`, `CL-*` prefixes belong in commits, PR bodies, ADR refs sections — NOT in JSDoc, inline comments, CLI flag help, MCP tool descriptions, or test names. Sweep `rg -n "AC-[A-Z]-[0-9]" packages/` before every PR-open; LLM clients pick up the leakage and start citing it back.
+- [release: published events need PAT or inline](solutions/conventions/release-published-event-needs-pat-or-inline.md) — release-please-action with default `GITHUB_TOKEN` does NOT fire downstream `release: [published]` workflows; inline asset-attach in `release-please.yml` gated on `steps.release.outputs.release_created`. Fixed AC-D-4; sbom.yml has same latent bug for follow-on.
+- [Dogfood pre-push hook catches CLI spec drift on first push](solutions/best-practices/dogfood-prepush-hook-caught-cli-spec-mismatch.md) — the first `git push` of the commit that adds a self-targeting pre-push hook is where spec/CLI-flag mismatches and "missing index" foot-guns surface. Pattern: SKIP-with-message shape from `pack-determinism-audit.sh` for any gate that depends on a derived artifact.
+- [Cherry-pick verified bug fixes from a sibling testbed clone](solutions/best-practices/cherry-pick-from-sibling-testbed.md) — when a post-filter sibling has authored fix commits with file:line repro coordinates, fetch the sibling and cherry-pick directly; preserves authorship, halves review surface, defeats re-author drift.
+- [Bench dashboard ↔ acceptance script banner-text parity](solutions/architecture-patterns/bench-dashboard-acceptance-script-parity.md) — when a dashboard parses banners by exact-string match, the two artifacts must be edited together; add a roster-shape test that pulls the banner list from the script directly. Surfaced as Bug #4 (only 9 of 17 gates rendering) in the 2026-05-10 smoke campaign.
+- [Test env hermeticity for backend-precedence libraries](solutions/conventions/test-env-hermeticity-for-backend-precedence.md) — when an SDK picks a backend by env presence, tests must scope-stash every key in the chain via prefix glob, not just the one they assert on. Surfaced as Bug #7 in the 2026-05-10 smoke campaign (the `CODEHUB_EMBEDDING_*` chain).
+- [Parallel docs subagents over-scrub ADR coordinates](solutions/best-practices/parallel-docs-subagent-overscrubs-adrs.md) — PR #74's carve-out for ADR text isn't visible in the durable lesson; brief docs subagents explicitly that `docs/adr/*` retains spec coordinates.
## Specs
diff --git a/.erpaval/ROADMAP.md b/.erpaval/ROADMAP.md
new file mode 100644
index 00000000..41a4b5a9
--- /dev/null
+++ b/.erpaval/ROADMAP.md
@@ -0,0 +1,219 @@
+# OpenCodeHub v1.0 Roadmap
+
+**Source**: `https://dw5vh8cb4iz6i.cloudfront.net/artifacts/och-roadmap/opencodehub-roadmap-2026-05-05.html` (CloudFront signed URL, expires 2026-05-05).
+**Extracted**: 2026-05-05.
+**Owner**: Laith Al-Saadoon (sole user — rip-and-replace latitude).
+
+This is the durable roadmap reference. If it conflicts with in-conversation scope, this file wins. Durable by design — committed to survive context compaction.
+
+## Product thesis
+
+OpenCodeHub is a personal, local-first, self-hosted OSS code-intelligence hub exposing deterministic cross-repo symbol graphs and SARIF findings through stdio MCP and CLI only. Two-surface product per brainstorm 013:
+
+- **Surface 1 — laptop artifact factory (P0)**: Claude Code plugin over stdio MCP. `codehub-document`, `codehub-pr-description`, `codehub-onboarding`, `codehub-contract-map`. Visible, immediate wedge.
+- **Surface 2 — CI action surface (P1, deferred)**: OSS GH Actions + GitLab templates shelling `codehub` CLI. Structural, slower wedge. Waits on surface-1 adoption.
+
+## Five hard rails (non-negotiable)
+
+1. Self-hosted OSS only — no hosted / managed / SaaS / OCH-operated tier.
+2. Stdio MCP only — no remote / HTTP MCP.
+3. No agent SDK — no Python / TS / claude-hooks / framework adapters.
+4. No LLM in query path — index-time summarizer is the sole exception (persisted, citation-validated, opt-in `--llm`).
+5. No web UI / eval-server / IDE plugin / LSP / model fine-tuning.
+
+## Milestone dependency graph
+
+```
+M1 → M2 → (M3 ∥ M4) → (M5 ∥ M6) → M7
+```
+
+Sequenced by dependency only. No calendar estimates.
+
+## M1 — Stabilize (COMPLETE)
+
+14 commits on `feat/v1-m1-m2`, landed via PR #53 squash-merge `4431b53`. PASS-WITH-CONCERNS.
+
+| Task | Scope | Commits |
+|------|-------|---------|
+| T-M1-1 | Dirty-tree guard on analyze fast-path | `d3fa11b`, `b5e7068`, `fcdd9c9` |
+| T-M1-2 | Real incremental via `loadPreviousGraph` snapshot; graphHash byte-identity preserved | `7b100fd`, `cca3c34`, `7ebe4eb` |
+| T-M1-3 | `EmbeddingHashCacheAdapter` 3-tier content-hash skip; `--force` re-embeds | `3cfb0cf`, `cca3c34`, `8576f53` |
+| T-M1-4 | SARIF symbol-level `FOUND_IN` edges via enclosing-symbol lookup | (in T-M1-2 block) |
+| T-M1-5 | Delete 5 canned MCP prompts; skills replace | `73d1375`, `b95cc90`, `a6a210f` |
+
+**Open concerns** (non-blocking):
+- **C1**: `stringArrayField []→NULL` round-trip asymmetry at `analyze.ts:722-730` + `duckdb-adapter.ts:1353-1359` can drift `canonicalJson` hashes. Tracked, pre-M3 cleanup.
+
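C1's failure mode can be reproduced in miniature. The sketch below is a hypothetical stand-in for the repo's `canonicalJson`, assuming it sorts object keys and emits no whitespace (its actual rules are not shown here). It shows that a row written with `[]` and read back as `NULL` serializes to different bytes, and that byte difference is exactly the kind of drift that changes a hash.

```typescript
// Hypothetical stand-in for the repo's canonicalJson: sorted object keys, no whitespace.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map((v) => canonicalJson(v)).join(",") + "]";
  const obj = value as { [key: string]: Json };
  const keys = Object.keys(obj).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalJson(obj[k])).join(",") + "}";
}

// C1 in miniature: the same logical row before and after a [] -> NULL round trip.
const written = canonicalJson({ id: "fn:a", tags: [] });
const readBack = canonicalJson({ tags: null, id: "fn:a" });
// written is '{"id":"fn:a","tags":[]}' and readBack is '{"id":"fn:a","tags":null}'.
```

Key order does not matter, because keys are sorted. The `[]` versus `null` distinction still does, so the adapter has to round-trip empty arrays faithfully.
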
+## M2 — Repo split + package surgery (COMPLETE)
+
+14 commits on `feat/v1-m1-m2`, landed via PR #53.
+
+| Task | Scope | Commits |
+|------|-------|---------|
+| T-M2-1 | Extract `packages/eval` + `packages/gym` + `bench/` → `opencodehub-testbed` repo | `53d9b88`, `f6f5f68`, `6d5bc2c` |
+| T-M2-2 | Remove `codehub eval-server` HTTP surface | `60b2982`, `1a1ff05` |
+| T-M2-3 | Remove `packages/docs` Starlight + `pages.yml`; retain `docs/adr/` | `690ca5e`, `d95df3c` |
+| T-M2-4 | `@opencodehub/policy` v1 (3 rule types: `blast_radius_max`, `license_allowlist`, `ownership_required`); wire into `verdict` | `f25b196`, `9890e17`, `d8bfd15`, `4732396` |
+| T-M2-5 | Extract `@opencodehub/wiki` workspace package; compat shim in analysis | `6fcc2f0`, `c538f2d`, `dd624ca` |
+
+## M3 — LadybugDB phase-1 (PENDING, parallel with M4)
+
+Replace recursive-CTE traversals with polymorphic rel-table-per-edge schema (**corrected 2026-05-05** — the v1 roadmap proposed a single rel-table with a `type` column; LadybugDB docs recommend one named rel table per edge kind with multiple `FROM/TO` pairs for columnar predicate pushdown). Current OCH edge-kind count is **23** (post-M2 additions `FOUND_IN`, `DEPENDS_ON`, `OWNED_BY`, `WRAPS`, `QUERIES`, `REFERENCES`, `ACCESSES`), not 21 as originally estimated.
+
+LadybugDB is the community successor to Kuzu (following Apple's acquisition of Kuzu). It is pre-1.0, with ABI breaks every few months. **Current npm package: `@ladybugdb/core@0.16.1`** (released 2026-05-04, one day before roadmap review). GitNexus pins 0.15.2. Source-level naming uses `GraphDbStore` / `graphdb-adapter.ts` / `graphdb-pool.ts` to stay within `scripts/check-banned-strings.sh` limits — the `ladybug` and `kuzu` literals are rejected in tracked source files; the `@ladybugdb/core` dep in `package.json` is permitted under package-scope precedent.
+
+| Task | Scope | Dependency | Test gate |
+|------|-------|-----------|-----------|
+| T-M3-1 | Implement `LbugStore` behind `IGraphStore` seam, gated by `CODEHUB_STORE=lbug` | M2 | graphHash parity suite |
+| T-M3-2 | Pool-adapter lifted from GitNexus `pool-adapter.ts` (612 LOC); LadybugDB `.query()` segfaults on concurrent calls | M3-1 | Concurrent query test |
+| T-M3-3 | Single `CodeRelation` rel-table + per-kind DDL replaces ~60-column polymorphic nodes table | M3-2 | MATCH pattern tests |
+| T-M3-4 | graphHash parity test suite — advance iff `DuckStore.graphHash === LbugStore.graphHash` on corpus | M3-3 | CI gate: byte-identical hash |
+| T-M3-5 | Convert `sql` MCP tool output to `cypher` (dual-emit during phase 1, drop `sql` at M7) | M3-4 | MCP tool signature tests |
+| T-M3-6 | ADR documenting swap rationale + 3-phase plan | M3-5 | Documentation reviewed |
+
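T-M3-2 exists because concurrent `.query()` calls crash the native binding. The GitNexus pool adapter is not reproduced here. As a hedged illustration of the core idea only, a promise chain can allow just one query in flight at a time. `QueryConnection`, `SerialQueue`, and `SerializedConnection` are hypothetical names, not the LadybugDB API.

```typescript
// Hypothetical minimal connection shape; not the real LadybugDB binding API.
interface QueryConnection {
  query(statement: string): Promise<unknown[]>;
}

// Runs tasks strictly one after another, in submission order.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(() => task());
    // Swallow failures on the chain only, so one rejected query does not block later ones.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// Callers may fire queries concurrently; the native call underneath never overlaps.
class SerializedConnection implements QueryConnection {
  private readonly queue = new SerialQueue();
  private readonly inner: QueryConnection;

  constructor(inner: QueryConnection) {
    this.inner = inner;
  }

  query(statement: string): Promise<unknown[]> {
    return this.queue.run(() => this.inner.query(statement));
  }
}
```

Serializing every call trades throughput for safety. A real pool would presumably keep several connections and apply this discipline per connection.
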
+**Fallbacks**: DuckDB remains legacy through M7. Apache AGE on Postgres 18 is survivability fallback if LadybugDB breaks beyond repair (documented, not implemented until M7).
+
+## M4 — Language expansion (PENDING, parallel with M3)
+
+| Task | Scope | Notes |
+|------|-------|-------|
+| T-M4-1 | `scip-clang` adapter | Needs `compile_commands.json`, 2 GB RAM/core guard |
+| T-M4-2 | `scip-ruby` adapter | Sorbet install workflow |
+| T-M4-3 | `scip-dotnet` adapter | — |
+| T-M4-4 | Kotlin promotion (distinct from Java) | `scip-kotlin` v0.6.0 via `scip-java` |
+| T-M4-5 | COBOL regex hot path | ~1 ms/file; `copybook`, `CICS`, `PARAGRAPH`, `PERFORM` extraction |
+| T-M4-6 | COBOL ProLeap v4.0.0 backend | ANTLR4/JVM Java subprocess, `--allow-build-scripts` gated. tree-sitter-cobol (v0.1.1, 2023-02-01 — no newer tagged release) remains unreliable. **ProLeap is NOT published to Maven Central** (`search.maven.org` returns 0; last GitHub Release v2.4.0 from 2018); M4-6 must `git clone + mvn install` OR ship a prebuilt JAR under `vendor/proleap/`. ProLeap does not ship a CLI — need a small Java `main` wrapper. |
+| T-M4-7 | Framework detection 5-stage pipeline | New `@opencodehub/frameworks` package. No OSS drop-in; custom curated-registry. |
+
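The T-M4-5 regex hot path can be pictured as a single pass over source lines. This simplified sketch assumes fixed-format COBOL (sequence area in columns 1-6, indicator in column 7, Area A starting at column 8), uppercase-insensitive keywords, and no continuation lines. `CobolFacts`, `extractCobolFacts`, and the regexes are illustrative, not the planned implementation, and inline forms such as `PERFORM n TIMES` are not handled.

```typescript
interface CobolFacts {
  paragraphs: string[];
  performs: string[];
  copybooks: string[];
  cicsCommands: string[];
}

const PARAGRAPH = /^[A-Z0-9][A-Z0-9-]*\.$/; // a single word ending in '.'
const PERFORM = /\bPERFORM\s+([A-Z0-9][A-Z0-9-]*)/g;
const COPY = /\bCOPY\s+([A-Z0-9][A-Z0-9-]*)/g;
const EXEC_CICS = /\bEXEC\s+CICS\s+([A-Z]+)/g;
// Words that follow PERFORM in an inline perform rather than naming a paragraph.
const INLINE_PERFORM = ["UNTIL", "VARYING", "WITH", "TEST"];

// Collects the first capture group of every match of a global regex.
function captures(re: RegExp, text: string): string[] {
  const out: string[] = [];
  re.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) out.push(m[1]);
  return out;
}

function extractCobolFacts(source: string): CobolFacts {
  const facts: CobolFacts = { paragraphs: [], performs: [], copybooks: [], cicsCommands: [] };
  for (const raw of source.split(/\r?\n/)) {
    // Skip lines with no code area and comment lines ('*' or '/' in column 7).
    if (raw.length < 8 || raw[6] === "*" || raw[6] === "/") continue;
    const code = raw.slice(7, 72).toUpperCase(); // columns 8-72
    const trimmed = code.trim();
    // A paragraph header starts in Area A, so column 8 is not blank.
    if (code[0] !== " " && PARAGRAPH.test(trimmed)) facts.paragraphs.push(trimmed.slice(0, -1));
    for (const target of captures(PERFORM, code)) {
      if (INLINE_PERFORM.indexOf(target) < 0) facts.performs.push(target);
    }
    facts.copybooks.push(...captures(COPY, code));
    facts.cicsCommands.push(...captures(EXEC_CICS, code));
  }
  return facts;
}
```
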
+**Framework detection stages** (each emits `{framework, version?, confidence, evidence[]}`):
+1. Manifest presence (`package.json`, `pyproject.toml`, `pom.xml`, `Gemfile`, `go.mod`, `Cargo.toml`)
+2. Lockfile + exact versions (semver-aware, curated registry)
+3. Config AST (`astro.config.mjs`, `next.config.js`, `vite.config.ts`, `spring.factories`)
+4. Folder convention (`app/`, `pages/`, `src/main/java/`, `config/routes.rb`)
+5. Import / SCIP usage patterns (`import fastapi`, `from django.db`, `@SpringBootApplication`)
+
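Because every stage emits the same record shape, the stages can compose as independent functions whose outputs are merged per framework. This minimal sketch assumes a hypothetical in-memory `RepoSnapshot`, a 0-1 confidence scale, and a max-confidence merge rule. Only two of the five stages are shown, and the package-to-framework table is illustrative, not the curated registry.

```typescript
interface Detection {
  readonly framework: string;
  readonly version?: string;
  readonly confidence: number; // assumed 0..1 scale
  readonly evidence: readonly string[];
}

// Hypothetical view of the repository being scanned.
interface RepoSnapshot {
  readonly files: readonly string[];
  readText(path: string): string | undefined;
}

type Stage = (repo: RepoSnapshot) => Detection[];

// Illustrative npm-package-to-framework entries (not the curated registry).
const NPM_FRAMEWORKS: { readonly [pkg: string]: string } = { astro: "Astro", next: "Next.js", express: "Express" };

// Stage 1 (manifest presence): read declared dependencies from package.json.
const manifestStage: Stage = (repo) => {
  const text = repo.readText("package.json");
  if (text === undefined) return [];
  const deps: { [name: string]: string } = JSON.parse(text).dependencies ?? {};
  return Object.keys(deps)
    .filter((name) => Object.prototype.hasOwnProperty.call(NPM_FRAMEWORKS, name))
    .map((name) => ({
      framework: NPM_FRAMEWORKS[name],
      version: deps[name], // a range here; stage 2 would pin the exact lockfile version
      confidence: 0.6,
      evidence: [`package.json: dependencies.${name}`],
    }));
};

// Stage 4 (folder convention): Rails keeps its router at config/routes.rb.
const folderStage: Stage = (repo) =>
  repo.files.indexOf("config/routes.rb") >= 0
    ? [{ framework: "Rails", confidence: 0.5, evidence: ["path: config/routes.rb"] }]
    : [];

// Merge per framework: highest confidence, first known version, all evidence.
function detectFrameworks(repo: RepoSnapshot, stages: readonly Stage[]): Detection[] {
  const merged = new Map<string, Detection>();
  for (const stage of stages) {
    for (const d of stage(repo)) {
      const prev = merged.get(d.framework);
      merged.set(
        d.framework,
        prev === undefined
          ? d
          : {
              framework: d.framework,
              version: prev.version ?? d.version,
              confidence: Math.max(prev.confidence, d.confidence),
              evidence: [...prev.evidence, ...d.evidence],
            },
      );
    }
  }
  // Sort by name so output is deterministic regardless of stage order.
  return [...merged.values()].sort((a, b) => (a.framework < b.framework ? -1 : a.framework > b.framework ? 1 : 0));
}
```
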
+## M5 — Deterministic code-packs (PENDING, parallel with M6)
+
+Depends on M4.
+
+| Task | Scope |
+|------|-------|
+| T-M5-1 | `@opencodehub/pack` package with 9-item BOM contract |
+| T-M5-2 | PageRank extraction from `scip-ingest/materialize.ts` dead code → `analysis/page-rank.ts` |
+| T-M5-3 | `codehub code-pack` CLI subcommand + MCP tool |
+| T-M5-4 | Byte-identity determinism test suite |
+| T-M5-5 | `codehub-code-pack` SKILL.md |
+
+**9-item code-pack BOM** (byte-identical given same commit, tokenizer, budget):
+1. `manifest.json` — pack_hash, commit SHA, tokenizer ID, schema version, counts
+2. PageRank-ranked symbol skeleton
+3. File tree with framework labels
+4. Dependency graph / lockfile slice (exact versions)
+5. Top-N AST-chunked files with byte offsets
+6. SCIP-grounded cross-refs (community clusters + call graph)
+7. Optional embeddings sidecar (`.parquet`)
+8. Salient docstrings / SARIF findings by severity + rule
+9. LICENSES / NOTICES + README.md + full determinism contract
+
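Item 2 and T-M5-2 need PageRank to produce the same ranking on every run. Below is a minimal power-iteration sketch. It assumes a damping factor of 0.85, a fixed 50 iterations, uniform redistribution of dangling-node mass, and an id-ascending tiebreak. These parameters and the function names are assumptions, not the code being extracted from `scip-ingest/materialize.ts`. Sorting the edges first fixes the floating-point summation order, so the scores do not depend on the order in which edges arrive.

```typescript
interface SymbolEdge {
  readonly from: string;
  readonly to: string;
}

function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

function pageRank(ids: readonly string[], edges: readonly SymbolEdge[], damping = 0.85, iterations = 50): Map<string, number> {
  const n = ids.length;
  const index = new Map<string, number>();
  ids.forEach((id, i) => index.set(id, i));
  // Drop edges to unknown ids, then fix edge order so summation order is input-independent.
  const sorted = edges
    .filter((e) => index.has(e.from) && index.has(e.to))
    .sort((a, b) => compareStrings(a.from, b.from) || compareStrings(a.to, b.to));
  const outDegree = new Array<number>(n).fill(0);
  for (const e of sorted) outDegree[index.get(e.from)!]++;
  let rank = new Array<number>(n).fill(1 / n);
  for (let iter = 0; iter < iterations; iter++) {
    // Rank held by nodes with no out-edges is spread evenly over all nodes.
    let dangling = 0;
    for (let i = 0; i < n; i++) if (outDegree[i] === 0) dangling += rank[i];
    const next = new Array<number>(n).fill((1 - damping) / n + (damping * dangling) / n);
    for (const e of sorted) {
      const s = index.get(e.from)!;
      next[index.get(e.to)!] += (damping * rank[s]) / outDegree[s];
    }
    rank = next;
  }
  const scores = new Map<string, number>();
  ids.forEach((id, i) => scores.set(id, rank[i]));
  return scores;
}

// Highest score first; equal scores fall back to id order so ties are stable.
function rankedSkeleton(ids: readonly string[], edges: readonly SymbolEdge[]): string[] {
  const scores = pageRank(ids, edges);
  return [...ids].sort((a, b) => scores.get(b)! - scores.get(a)! || compareStrings(a, b));
}
```
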
+## M6 — Cross-repo federation (PENDING, parallel with M5)
+
+Depends on M5.
+
+| Task | Scope |
+|------|-------|
+| T-M6-1 | First-class `Repo` entity in graph |
+| T-M6-2 | `group_list`, `group_status`, `group_contracts`, `group_query` MCP tools |
+| T-M6-3 | `codehub-contract-map` skill (group-only, Mermaid consumer → producer) |
+| T-M6-4 | Cross-repo link graph in `codehub-document --group` |
+| T-M6-5 | `AMBIGUOUS_REPO` sentinel when ≥ 2 repos indexed without explicit `repo:` |
+
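T-M6-5's rule is small enough to state as code. This hedged sketch assumes a hypothetical `resolveRepo` helper and result shape. Only the `AMBIGUOUS_REPO` sentinel name comes from the roadmap; `UNKNOWN_REPO` and `NO_REPOS` are invented here to make the function total.

```typescript
type RepoResolution =
  | { readonly kind: "ok"; readonly repo: string }
  | {
      readonly kind: "error";
      readonly code: "AMBIGUOUS_REPO" | "UNKNOWN_REPO" | "NO_REPOS";
      readonly candidates: readonly string[];
    };

function resolveRepo(indexed: readonly string[], requested?: string): RepoResolution {
  const candidates = [...indexed].sort();
  if (requested !== undefined) {
    return candidates.includes(requested)
      ? { kind: "ok", repo: requested }
      : { kind: "error", code: "UNKNOWN_REPO", candidates };
  }
  if (candidates.length === 1) return { kind: "ok", repo: candidates[0] };
  if (candidates.length === 0) return { kind: "error", code: "NO_REPOS", candidates };
  // Two or more repos indexed and no explicit repo: refuse to guess.
  return { kind: "error", code: "AMBIGUOUS_REPO", candidates };
}
```

Returning the sorted candidate list lets a skill or CLI caller show the user which `repo:` values are valid.
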
+## M7 — LadybugDB default, DuckDB legacy (PENDING)
+
+Depends on M3 + M6.
+
+| Task | Scope |
+|------|-------|
+| T-M7-1 | Flip default backend to `CODEHUB_STORE=lbug` |
+| T-M7-2 | Retain DuckDB only for temporal analytics |
+| T-M7-3 | Drop dual-emit `sql|cypher` → `cypher`-only |
+| T-M7-4 | Final graphHash parity audit across testbed corpus |
+| T-M7-5 | Apache AGE / Postgres 18 escape hatch documented (not implemented) |
+
+## Target package layout at end of roadmap
+
+**Core (11 packages, ~400 files from ~970)**:
+- `@opencodehub/cli` — `codehub` binary, 22+ subcommands (adds `verdict`, `code-pack`)
+- `@opencodehub/mcp` — stdio MCP (29+ tools, 0 prompts)
+- `@opencodehub/analysis` — request-time queries (PageRank, blast, impact)
+- `@opencodehub/ingestion` — scan + materialize pipeline
+- `@opencodehub/scip-ingest` — SCIP proto parsing
+- `@opencodehub/storage` — `IGraphStore` + `DuckStore` + `LbugStore`
+- `@opencodehub/embed` (née embedder) — transformers.js default + HTTP endpoint
+- `@opencodehub/summarizer` — Bedrock Haiku 4.5, index-time only
+- `@opencodehub/sarif` — SARIF 2.1.0 schemas + baseline diff
+- `@opencodehub/scanners` — 20-scanner orchestrator
+- `@opencodehub/core-types` — shared types
+
+**New (4 packages)**:
+- `@opencodehub/frameworks` — 5-stage framework detection
+- `@opencodehub/pack` — deterministic code-pack generator
+- `@opencodehub/policy` — `opencodehub.policy.yaml` + evaluator (M2 shipped)
+- `@opencodehub/wiki` — deterministic wiki (M2 shipped)
+
+## Language coverage targets at v1.0
+
+| Language | Tree-sitter | SCIP | Frameworks | Status |
+|----------|-------------|------|-----------|--------|
+| TypeScript / JavaScript | ✅ | scip-typescript 0.4.0 | Next.js, Nest, Astro, Remix, Vite, Express | Active |
+| Python | ✅ | scip-python | FastAPI, Django, Flask, LangChain, Pydantic | Active |
+| Go | ✅ | scip-go 0.2.4 | stdlib, Gin, Echo | Active |
+| Java | ✅ | scip-java 0.12.3 | Spring Boot, Micronaut, Gradle, Maven | Active |
+| Scala | ✅ | scip-java 0.12.3 | Play, Akka | Active (via java) |
+| Kotlin | ✅ | scip-kotlin 0.6.0 | Ktor, Android | M4 promotion |
+| Ruby | ✅ | scip-ruby 0.4.7 | Rails, Sinatra | M4 |
+| C / C++ | ✅ | scip-clang 0.4.0 | CMake, Conan | M4 |
+| C# / .NET | ✅ | scip-dotnet | ASP.NET, EF Core | M4 |
+| Rust | ✅ | Gap | cargo, Axum, Tokio | Tree-sitter only; SCIP blocked |
+| Swift | ✅ | Gap | SwiftUI, Vapor | Tree-sitter only |
+| COBOL | ❌ | None | CICS, IMS, JCL | Regex hot path + ProLeap v4 (gated) |
+
+## Scanner pipeline (20 scanners at v1.0)
+
+SARIF 2.1.0 ingestion + baseline diff + `codehub verdict` CI exit codes + `ci-init` workflow generation.
+
+- **SAST**: Semgrep, CodeQL, Bandit (Py), Brakeman (Rb), GoSec, detect-secrets
+- **SCA / license**: OSV-Scanner, internal `license_audit`, CycloneDX/SBOM
+- **Type**: tsc, pyright, mypy, ruff-type
+- **Lint**: Biome, ruff, golangci-lint, clippy
+- **Fingerprinting**: `opencodehub/v1` via `{rule_id, symbol_id, hash(snippet)}` for stable baseline diff across formatters
+
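The fingerprint idea can be sketched as follows. The sketch assumes whitespace stripping as the formatter-insensitive normalization and 32-bit FNV-1a as a stand-in for the unspecified `hash(snippet)`. `fingerprintV1` and `newFindings` are hypothetical names. In SARIF 2.1.0, a value like this would likely be stored in a result's `partialFingerprints` object under the `opencodehub/v1` key.

```typescript
interface FindingKey {
  readonly ruleId: string;
  readonly symbolId: string;
  readonly snippet: string;
}

// 32-bit FNV-1a over UTF-16 code units, as 8 hex chars (stand-in for the real hash).
function fnv1a32(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619 via shifts, so the product stays exact mod 2^32.
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Assumed normalization: drop all whitespace so reformatting does not change the hash.
function normalizeSnippet(snippet: string): string {
  return snippet.replace(/\s+/g, "");
}

// {rule_id, symbol_id, hash(snippet)} joined with "|" (the separator is an assumption).
function fingerprintV1(key: FindingKey): string {
  return [key.ruleId, key.symbolId, fnv1a32(normalizeSnippet(key.snippet))].join("|");
}

// Baseline diff: a finding is new if its fingerprint is absent from the baseline.
function newFindings(current: readonly FindingKey[], baseline: readonly FindingKey[]): FindingKey[] {
  const known = new Set(baseline.map((f) => fingerprintV1(f)));
  return current.filter((f) => !known.has(fingerprintV1(f)));
}
```

Keying on the enclosing symbol rather than a line number is what lets a finding survive unrelated edits higher up in the file.
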
+## Validation constraints (every milestone must satisfy all 10)
+
+| # | Constraint | Check |
+|---|-----------|-------|
+| 1 | Stdio MCP + CLI only; no HTTP surfaces | `rg -n 'express\|fastify\|http.createServer' packages/ → 0` |
+| 2 | No LLM in query path | No `@aws-sdk/client-bedrock-runtime` outside `packages/summarizer/` |
+| 3 | Narrative / LLM features ship as skills | `plugins/opencodehub/skills/*/SKILL.md` exists per narrative tool |
+| 4 | Fixtures / evals / gyms in testbed repo | absent from core post-M2 |
+| 5 | `mise run check` exit 0 | per commit |
+| 6 | `graphHash` byte-identical full vs incremental | CI gate |
+| 7 | Deterministic code-pack | same commit + tokenizer + budget → same bytes |
+| 8 | No time estimates | sequenced by dependency graph only |
+| 9 | SARIF 2.1.0 conformance | Zod passthrough + sarif-sdk spec tests |
+| 10 | 20-scanner pipeline coverage | scanner registry enumerated |
+
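Constraints 1 and 2 are mechanical enough to check from file contents alone. This rough sketch assumes the checker receives a hypothetical list of `{ path, text }` records for tracked files. The patterns mirror the table's checks (with the `.` in `http.createServer` escaped), and `checkRails` and the record types are made up for illustration.

```typescript
interface SourceFile {
  readonly path: string; // repo-relative, forward slashes
  readonly text: string;
}

interface Violation {
  readonly constraint: 1 | 2;
  readonly path: string;
  readonly match: string;
}

// Constraint 1: no HTTP server surfaces under packages/.
const HTTP_SURFACE = /express|fastify|http\.createServer/;
// Constraint 2: the Bedrock runtime client may only appear under packages/summarizer/.
const BEDROCK_CLIENT = "@aws-sdk/client-bedrock-runtime";

function checkRails(files: readonly SourceFile[]): Violation[] {
  const out: Violation[] = [];
  for (const f of files) {
    if (f.path.startsWith("packages/")) {
      const http = HTTP_SURFACE.exec(f.text);
      if (http) out.push({ constraint: 1, path: f.path, match: http[0] });
    }
    if (!f.path.startsWith("packages/summarizer/") && f.text.includes(BEDROCK_CLIENT)) {
      out.push({ constraint: 2, path: f.path, match: BEDROCK_CLIENT });
    }
  }
  return out;
}
```
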
+## Explicitly rejected (no exceptions)
+
+- Hosted / managed / SaaS tier
+- Remote / HTTP MCP server
+- Agent SDK (Python, TS, claude-hooks, framework adapters)
+- `grounding_pack` MCP compositor
+- OpenCodeHub-branded coding agent
+- LLM-based PR review
+- Hosted review UI (GitHub Checks + PR comments only)
+- IDE plugin / LSP
+- Model fine-tuning
+
+## Rip-and-replace latitude
+
+1 active user. Roadmap explicitly sanctions rip-and-replace where it produces a better shape. No breaking-change budget to preserve beyond the graphHash byte-identity invariant and the MCP tool contract (tools may be renamed/replaced as long as the skill layer is updated in the same change).
diff --git a/.erpaval/debt.md b/.erpaval/debt.md
index d3bc2ceb..edd01525 100644
--- a/.erpaval/debt.md
+++ b/.erpaval/debt.md
@@ -288,11 +288,13 @@ architecture pages + mermaid rendering.
`sarif`, `scip-ingest`, `search` — all have code-level docs but no
README.
-2. **`.gitmodules` thiserror pin comment.** The sweep reconciled
- `packages/gym/corpus/rust/README.md` and `corpus/repos/README.md` on
- `thiserror@2.0.17`. `.gitmodules` line 19 still says
- `pin: v2.0.0 tag`. One-line fix — the subagent's write was denied,
- deferred to the user.
+2. **`.gitmodules` thiserror pin comment.** **Status: CLOSED-STALE** —
+ `git show HEAD:.gitmodules` returns "fatal: path .gitmodules does
+ not exist in HEAD"; the file was removed when `packages/gym` moved
+ to `opencodehub-testbed` (commit 378f79f). The submodule set lives
+ in the testbed repo now; any thiserror-pin reconciliation belongs
+ over there, not here. Closed as stale by AC-C-7 (Track C, v1
+ finalize, 2026-05-09).
3. **Dead eval-harness fallback.** `packages/eval/src/opencodehub_eval/
test_parametrized.py:167-175` has tool-still-unregistered fallback
diff --git a/.erpaval/solutions/architecture-patterns/bench-dashboard-acceptance-script-parity.md b/.erpaval/solutions/architecture-patterns/bench-dashboard-acceptance-script-parity.md
new file mode 100644
index 00000000..caac01ff
--- /dev/null
+++ b/.erpaval/solutions/architecture-patterns/bench-dashboard-acceptance-script-parity.md
@@ -0,0 +1,66 @@
+---
+name: A dashboard that parses banner-text from a script must mirror the script's banners verbatim
+description: Bench/dashboard tools that index gates/jobs by exact-title match against a script's banner output drift silently when the script grows new gates — both files must be edited together
+type: architecture-patterns
+---
+
+`packages/cli/src/commands/bench.ts` indexes gate rows by exact-string
+match against `scripts/acceptance.sh` banners (`N/17: <title>`). When
+the script grew from 9 to 17 gates and changed a few existing banner
+titles ("graphHash determinism" → "determinism (double-run graphHash)"),
+the dashboard didn't follow. Result: 8 gates never advanced past
+"pending" and, once the stream ended, were stamped "skipped — script crashed"
+by the crash-fallback path; another 3 displayed under stale titles.
+Operators saw 9 of 17 gates, with confusing detail strings.
+
+The original code shape:
+
+```ts
+export interface GateRow {
+  title: string;
+  // other fields elided
+}
+
+export const MVP_GATES: readonly { id: string; title: string }[] = [
+  { id: "install", title: "pnpm install --frozen-lockfile" },
+  // ... 8 more, with stale titles
+];
+
+let currentGateIdx = -1;
+
+export function applyLine(rows: GateRow[], rawLine: string): void {
+  const line = rawLine.trim();
+  const banner = /^\d+\/\d+:\s+(.*)$/.exec(line);
+  if (banner) {
+    // Exact match: any banner title missing from the roster is silently dropped.
+    const idx = rows.findIndex((r) => r.title === banner[1]);
+    if (idx >= 0) currentGateIdx = idx;
+  }
+}
+```
+
+**Why:** the dashboard is a thin presenter over the script's stdout. Any
+banner text not in `MVP_GATES` is silently dropped. There is no compile-
+time signal — the build is green, the unit tests are green, only the
+runtime UX degrades. The same gap also caught `[SKIP]` markers: the
+original `applyLine` matched `[PASS]`/`[FAIL]` but not `[SKIP]`, so
+gracefully-degrading gates rendered as "skipped — script crashed" via
+the crash-fallback path with a misleading detail string.
+
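The marker half of the fix can be sketched as a parser that recognizes all three markers, so a legitimate skip never reaches the crash-fallback path. This assumes markers appear as a bracketed token at the start of a line (for example `[SKIP] reason`). The actual `scripts/acceptance.sh` output format, the `GateStatus` values, and both function names are assumptions.

```typescript
type GateStatus = "pending" | "pass" | "fail" | "skip";

interface GateRow {
  readonly title: string;
  status: GateStatus;
  detail?: string;
}

const MARKER = /^\[(PASS|FAIL|SKIP)\]\s*(.*)$/;
const STATUS_BY_MARKER: { readonly [marker: string]: GateStatus } = { PASS: "pass", FAIL: "fail", SKIP: "skip" };

// Applies a marker line to the current gate; returns false if the line carries no marker.
function applyMarker(row: GateRow, rawLine: string): boolean {
  const m = MARKER.exec(rawLine.trim());
  if (!m) return false;
  row.status = STATUS_BY_MARKER[m[1]];
  row.detail = m[2] || undefined;
  return true;
}

// Crash fallback: only gates that never received any marker are flagged after the stream ends.
function finalizeAfterStream(rows: GateRow[]): void {
  for (const row of rows) {
    if (row.status === "pending") {
      row.status = "skip";
      row.detail = "script crashed (no marker received)";
    }
  }
}
```
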
+**How to apply:**
+
+1. **Treat banner titles as a contract** between the script and the
+ dashboard. Edit both files in the same commit.
+2. **Add a roster-shape test.** Assert `MVP_GATES.length === 17` AND
+ `MVP_GATES.map(g => g.title)` matches the banner sequence the script
+ emits. The test pulls the banner list from the script directly with
+ `grep -oE '^echo "[0-9]+/\$\{TOTAL_GATES\}: (.+)"$' scripts/acceptance.sh`
+ so the assertion follows the source of truth.
+3. **Match every marker the script emits.** If the script emits `[PASS]`,
+ `[FAIL]`, AND `[SKIP]`, the parser must handle all three. The
+ crash-fallback path must NOT fire for legitimate skips.
+4. **Order matters when the index is the listr2 row.** `MVP_GATES` order must
+ match script execution order — the dashboard advances rows by index
+ as banners stream in.
+
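Item 2 can also be written in TypeScript as a pure comparison between the script text and `MVP_GATES`. The banner regex assumes lines of the form `echo "N/${TOTAL_GATES}: <title>"`, matching the grep in item 2. `bannerTitles` and `rosterDrift` are hypothetical helpers, and a real test would first read `scripts/acceptance.sh` from disk.

```typescript
// Pulls banner titles, in order, from lines like: echo "3/${TOTAL_GATES}: build"
function bannerTitles(scriptText: string): string[] {
  const re = /^[ \t]*echo "\d+\/\$\{TOTAL_GATES\}: (.+)"[ \t]*$/gm;
  const titles: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(scriptText)) !== null) titles.push(m[1]);
  return titles;
}

// Describes every position where the dashboard roster and the script disagree.
function rosterDrift(gates: readonly { title: string }[], scriptText: string): string[] {
  const banners = bannerTitles(scriptText);
  const problems: string[] = [];
  if (gates.length !== banners.length) {
    problems.push(`roster has ${gates.length} gates, script emits ${banners.length} banners`);
  }
  const n = Math.max(gates.length, banners.length);
  for (let i = 0; i < n; i++) {
    const expected = banners[i];
    const actual = gates[i]?.title;
    if (expected !== actual) problems.push(`gate ${i + 1}: roster "${actual}" vs script "${expected}"`);
  }
  return problems;
}
```

A test would then assert `rosterDrift(MVP_GATES, script)` is empty. Because the failure message names each mismatched position, the stale-title case is visible without running `codehub bench`.
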
+Anti-pattern: a "we'll keep them in sync manually" comment without an
+enforcement test. The 9-gate / 17-gate drift sat in `main` undetected
+because no CI surface failed when the script grew. Surfacing it
+required an operator to run `codehub bench` and notice the visual
+mismatch.
+
+Cross-link: the `dogfood-prepush-hook-caught-cli-spec-mismatch` durable
+lesson covers a related pattern — the dogfood pre-push hook on this
+exact PR was where this bug was first surfaced (Bug #4 in
+UPSTREAM_BUGS.md, 2026-05-10 smoke).
diff --git a/.erpaval/solutions/architecture-patterns/igraphstore-itemporalstore-segregation.md b/.erpaval/solutions/architecture-patterns/igraphstore-itemporalstore-segregation.md
new file mode 100644
index 00000000..21ce2069
--- /dev/null
+++ b/.erpaval/solutions/architecture-patterns/igraphstore-itemporalstore-segregation.md
@@ -0,0 +1,62 @@
+---
+title: Segregate graph-only and tabular-only stores at the interface boundary
+tags: [interface-segregation, liskov, storage, multi-backend, igraphstore]
+session: session-33f24f
+---
+
+## Context
+
+`IGraphStore` originally extended `CochangeStore + SymbolSummaryStore` and
+exposed `query(sql, params)`. `GraphDbStore` (LadybugDB) couldn't honestly
+satisfy `lookupCochangesForFile` — it threw `NotImplementedError` on six
+methods. The "obvious" fix was to *implement* cochanges on the graph
+adapter. The clean fix was to *delete* those signatures from the graph
+interface entirely.
+
+After AC-A-1 (split) + AC-A-3 (residue cleanup): `IGraphStore` is graph-only
+(Cypher dialect or none). `ITemporalStore` is tabular-only (SQL `exec()` +
+cochanges + symbol summaries). `openStore({path, backend}) -> {graph,
+temporal, close, describe}` composes both. DuckDB-only deployments share
+one connection between views via structural typing — no class split. LadybugDB
+deployments open `graph.lbug` + `temporal.duckdb` as siblings.
+
+## Lesson
+
+When one type extends multiple sub-interfaces and a concrete implementor
+can't honestly satisfy all of them, segregate at the interface boundary,
+NOT at the class. The concrete that DOES satisfy both stays as one class
+implementing both interfaces (structural typing); the concrete that only
+satisfies one drops the other entirely from its `implements` list.
+
+Procedure:
+
+1. Name the two cohesive interfaces — pick the responsibility, not the
+ storage technology. Here: graph operations vs tabular operations.
+2. Add a composition factory (`openStore`) that returns BOTH views in one
+ envelope. Callers needing both take the envelope; callers needing one
+ take the narrow interface.
+3. Delete the cross-cutting methods from the narrow interface entirely.
+ Concrete adapters that don't implement them no longer need to throw
+ `NotImplementedError`.
+4. Test contract for community adapters: only the narrow interface, with a
+ conformance suite that any implementor imports + runs.
+
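The procedure above, reduced to a type-level skeleton. Apart from `lookupCochangesForFile` and the `openStore` envelope fields, every member name is a hypothetical placeholder, and the in-memory classes stand in for the DuckDB and LadybugDB adapters. The point is that the graph-only class never mentions temporal methods, while the dual-capable class satisfies both interfaces structurally.

```typescript
interface IGraphStore {
  countNodes(): Promise<number>;
}

interface ITemporalStore {
  lookupCochangesForFile(path: string): Promise<readonly string[]>;
}

// Dual-capable backend (DuckDB's role): one object serves both views.
class TabularAndGraphStore implements IGraphStore, ITemporalStore {
  async countNodes(): Promise<number> {
    return 0;
  }
  async lookupCochangesForFile(path: string): Promise<readonly string[]> {
    return [path + ".test.ts"];
  }
}

// Graph-only backend (LadybugDB's role): no temporal methods, no NotImplementedError stubs.
class GraphOnlyStore implements IGraphStore {
  async countNodes(): Promise<number> {
    return 0;
  }
}

interface StoreEnvelope {
  readonly graph: IGraphStore;
  readonly temporal: ITemporalStore;
  close(): Promise<void>;
  describe(): string;
}

function openStore(opts: { path: string; backend: "duckdb" | "graphdb" }): StoreEnvelope {
  const temporal = new TabularAndGraphStore();
  const close = async (): Promise<void> => {};
  if (opts.backend === "duckdb") {
    // One connection, two views.
    return { graph: temporal, temporal, close, describe: () => "duckdb (graph + temporal)" };
  }
  // Graph backend with a sibling tabular store for temporal data.
  return { graph: new GraphOnlyStore(), temporal, close, describe: () => "graphdb + duckdb (temporal)" };
}
```
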
+## Why this matters
+
+This pattern lets community contributors plug in adapters without
+re-implementing concerns that don't belong on their backend. An AGE /
+Memgraph / Neo4j / Neptune author implements `IGraphStore` only —
+DuckDB stays as the temporal backend on every deployment. Plugging one
+in takes two files: an `IGraphStore` implementation and a test that calls
+`assertIGraphStoreConformance`. The pattern beats the alternative ("one mega-interface,
+each adapter throws NotImplementedError on what it can't do") on type
+honesty, conformance verifiability, and Liskov compliance.
+
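A conformance helper in that spirit might look like the following. This is a hedged sketch: the real `assertIGraphStoreConformance` lives in `packages/storage/src/test-utils/conformance.ts`, and its checks are not reproduced here. `MiniGraphStore`, its three methods, and both checks are hypothetical.

```typescript
interface GraphNode {
  readonly id: string;
  readonly kind: string;
}

// Hypothetical slice of a graph-store contract.
interface MiniGraphStore {
  upsertNodes(nodes: readonly GraphNode[]): Promise<void>;
  getNode(id: string): Promise<GraphNode | undefined>;
  countNodes(): Promise<number>;
}

// Runs the same checks against any implementor; rejects on the first violated rule.
async function assertMiniGraphStoreConformance(makeStore: () => Promise<MiniGraphStore>): Promise<void> {
  const store = await makeStore();
  const node = { id: "fn:a", kind: "Function" };
  await store.upsertNodes([node]);
  await store.upsertNodes([node]);
  if ((await store.countNodes()) !== 1) throw new Error("upsertNodes must be idempotent");
  if ((await store.getNode("missing")) !== undefined) throw new Error("getNode must return undefined for unknown ids");
  const got = await store.getNode("fn:a");
  if (got?.kind !== "Function") throw new Error("getNode must round-trip stored fields");
}

// A tiny in-memory implementor, the kind a community adapter author would test.
class InMemoryGraphStore implements MiniGraphStore {
  private readonly nodes = new Map<string, GraphNode>();

  async upsertNodes(nodes: readonly GraphNode[]): Promise<void> {
    for (const n of nodes) this.nodes.set(n.id, n);
  }

  async getNode(id: string): Promise<GraphNode | undefined> {
    return this.nodes.get(id);
  }

  async countNodes(): Promise<number> {
    return this.nodes.size;
  }
}
```

An adapter's own test would then be a single call, for example `await assertMiniGraphStoreConformance(async () => new MyAdapter())`, where `MyAdapter` stands in for the contributor's class.
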
+## Example
+
+- `packages/storage/src/interface.ts` — split into IGraphStore + ITemporalStore.
+- `packages/storage/src/index.ts` — openStore factory composes views.
+- `packages/storage/src/graphdb-adapter.ts` — implements IGraphStore only.
+- `packages/storage/src/duckdb-adapter.ts` — implements both via structural typing.
+- `packages/storage/src/test-utils/conformance.ts` (AC-A-11) — pre-baked test
+ suite that any IGraphStore implementor imports.
diff --git a/.erpaval/solutions/architecture-patterns/lift-pure-functions-to-shared-dep-to-break-cycles.md b/.erpaval/solutions/architecture-patterns/lift-pure-functions-to-shared-dep-to-break-cycles.md
new file mode 100644
index 00000000..f1176fe9
--- /dev/null
+++ b/.erpaval/solutions/architecture-patterns/lift-pure-functions-to-shared-dep-to-break-cycles.md
@@ -0,0 +1,48 @@
+---
+title: Lift pure helpers to the deepest shared workspace dependency to break future cycles
+tags: [monorepo, dependency-graph, refactoring, workspace-cycles]
+session: session-e1d819
+---
+
+## Context
+
+`classifyDependencies` (license tier classification, ~30 LOC pure
+function) lived in `packages/mcp/src/tools/license-audit.ts`.
+`packages/pack/src/licenses.ts` (M5-5 BOM body) needed it. But
+`@opencodehub/mcp` already depends on `@opencodehub/pack` via the
+`pack_codebase` MCP tool wrapper — a `pack → mcp` import would create
+a `mcp → pack → mcp` cycle. T-W2-3 (commit 9d8d570) lifted the function
+into `@opencodehub/analysis`, which both `mcp` and `pack` already depend
+on, in a single mechanical chore commit.
+
+## Lesson
+
+When a pure helper in package A is needed by package B, and a `B → A`
+import would create a cycle, lift the helper to the **deepest shared
+dependency** in the workspace dep graph (the LCA in package-import
+terms). Procedure:
+
+1. Identify the LCA package by walking up imports from both A and B
+ (`pnpm why @opencodehub/<pkg>` or visual inspection of
+ `package.json` workspace deps).
+2. Move the function + supporting types **byte-identical** — preserve
+ every comment, signature, regex (in this case `COPYLEFT_PATTERN
+ = /^(GPL|AGPL|SSPL|EUPL|CPAL|OSL|RPL)/`).
+3. Re-export from the destination package's barrel (`index.ts`) at the
+ alphabetically-correct position to match existing convention.
+4. Replace local impl in package A with `import { fn } from "@org/lca"`.
+ Do **not** retain a re-export shim — direct imports are cleaner and
+ prevent future "should I import from A or LCA?" drift.
+5. Move tests to the LCA package; keep the original package's test if
+ it covers integration via the imported symbol.
+6. Commit scope: `chore(<scope>):` (cross-package symbol moves are
+ chores, not features).
+
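Steps 1-4 rest on two graph facts: adding an import `from → to` closes a cycle exactly when `from` is already reachable from `to`, and the lift target is a shared dependency that no other shared dependency depends on. A minimal sketch over a hypothetical adjacency map follows. The function names are made up, and the workspace in the usage example is simplified (the `core-types` edge is assumed).

```typescript
// Workspace dependency graph: package -> packages it depends on directly.
type DepGraph = { readonly [pkg: string]: readonly string[] };

// Everything reachable from `pkg` through dependency edges (excludes `pkg` when the graph is a DAG).
function transitiveDeps(graph: DepGraph, pkg: string): Set<string> {
  const seen = new Set<string>();
  const stack = [...(graph[pkg] ?? [])];
  while (stack.length > 0) {
    const next = stack.pop()!;
    if (seen.has(next)) continue;
    seen.add(next);
    stack.push(...(graph[next] ?? []));
  }
  return seen;
}

// Adding the import from -> to closes a cycle iff `from` is already reachable from `to`.
function wouldCreateCycle(graph: DepGraph, from: string, to: string): boolean {
  return from === to || transitiveDeps(graph, to).has(from);
}

// LCA-style lift targets: shared dependencies of a and b that no other shared dependency depends on.
function liftTargets(graph: DepGraph, a: string, b: string): string[] {
  const depsB = transitiveDeps(graph, b);
  const shared = [...transitiveDeps(graph, a)].filter((p) => depsB.has(p));
  return shared.filter((p) => !shared.some((q) => q !== p && transitiveDeps(graph, q).has(p))).sort();
}

// Simplified workspace from this lesson; the core-types edge is an assumption.
const workspace: DepGraph = {
  "@opencodehub/mcp": ["@opencodehub/pack", "@opencodehub/analysis"],
  "@opencodehub/pack": ["@opencodehub/analysis"],
  "@opencodehub/analysis": ["@opencodehub/core-types"],
  "@opencodehub/core-types": [],
};
```

On this graph, `wouldCreateCycle(workspace, "@opencodehub/pack", "@opencodehub/mcp")` is true, and `liftTargets` for `mcp` and `pack` picks `@opencodehub/analysis` rather than the deeper `core-types`.
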
+## Why
+
+The alternative — path-importing from `packages/<pkg>/src/...` or
+hardcoding a `.js` import — works but cements the cycle, blocks future
+tree-shaking, and creates two ways to call the same function. Lifting
+to the LCA preserves the dep graph as a DAG and gives every future
+consumer one canonical import path. The 30-LOC mechanical lift takes
+~1 hour and unblocks the downstream feature with zero behavior change.
diff --git a/.erpaval/solutions/architecture-patterns/storage-list-nodes-over-scattered-sql.md b/.erpaval/solutions/architecture-patterns/storage-list-nodes-over-scattered-sql.md
new file mode 100644
index 00000000..f3d18e9d
--- /dev/null
+++ b/.erpaval/solutions/architecture-patterns/storage-list-nodes-over-scattered-sql.md
@@ -0,0 +1,56 @@
+---
+title: Add typed kind-filtered enumeration to IGraphStore once 3+ packages need it
+tags: [storage, graph-store, api-design, typed-rehydration]
+session: session-e1d819
+---
+
+## Context
+
+Spec 005 originally called for `IGraphStore.listNodes()`. Implementation
+diverged into raw SQL (`SELECT id, kind, ... FROM nodes WHERE kind = ?`)
+scattered across `packages/mcp/src/tools/{scan,project-profile,
+dependencies,verdict}.ts`. M5 BOM bodies (skeleton, file-tree, deps,
+xrefs) were about to add four more raw-SQL call sites in
+`packages/pack/`. T-W2-2 lifted the abstraction back into
+`packages/storage/src/interface.ts` (commit 018c253).
+
+## Lesson
+
+When ≥ 3 packages need typed kind-filtered node enumeration from a
+polymorphic graph store, add the method to the storage interface
+instead of duplicating SQL. The shape that worked here:
+
+```ts
+// packages/storage/src/interface.ts
+listNodes(opts?: {
+ readonly kinds?: readonly string[]; // undefined → all; [] → []
+ readonly limit?: number;
+ readonly offset?: number;
+}): Promise<GraphNode[]>; // typed discriminated union
+```
+
+Implementation requirements:
+
+- Both adapters must rehydrate to the **typed** `GraphNode` discriminated
+ union — not `Record`. This forces every column-to-field
+ mapping to be reversed once, in the adapter, instead of duplicated in
+ each consumer (`packages/storage/src/duckdb-adapter.ts:rowToGraphNode`,
+ `packages/storage/src/graphdb-adapter.ts:recordToGraphNode`).
+- `ORDER BY id ASC` at the SQL layer + JS-side lex-stable tiebreak — this
+ is what gives cross-adapter byte-identical output (parity test in
+ `graphdb-adapter.test.ts`).
+- Empty `kinds: []` short-circuits **before** opening any native binding
+ pool; this preserves the pure-JS contract for never-opened stores.
+- Additive interface change: every existing `implements IGraphStore`
+ fake (4 found in this repo: `analysis/test-utils.ts`, `wiki/index.test.ts`,
+ `search/bm25.test.ts`, `search/hybrid.test.ts`) needs a no-op or
+ in-memory `listNodes` to typecheck.
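
A minimal in-memory `listNodes`, of the kind a test fake might carry, makes the contract concrete. It assumes a two-kind `GraphNode` union and uses `kind` as the tiebreak for equal ids; both are illustrative assumptions, not the adapters' actual code.

```typescript
// Hypothetical two-kind union; the real GraphNode has many more kinds and fields.
type GraphNode =
  | { readonly kind: "File"; readonly id: string; readonly path: string }
  | { readonly kind: "Function"; readonly id: string; readonly name: string };

interface ListNodesOptions {
  readonly kinds?: readonly string[]; // undefined → all kinds; [] → no nodes
  readonly limit?: number;
  readonly offset?: number;
}

// Plain code-unit string comparison (what `<` does), not localeCompare.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

/** In-memory fake following the same filter, order, and paging rules. */
class InMemoryGraphStore {
  private readonly nodes: readonly GraphNode[];

  constructor(nodes: readonly GraphNode[]) {
    this.nodes = nodes;
  }

  async listNodes(opts: ListNodesOptions = {}): Promise<GraphNode[]> {
    // Empty kinds short-circuits before any scan (a real adapter skips opening its pool).
    if (opts.kinds !== undefined && opts.kinds.length === 0) return [];
    const wanted = opts.kinds === undefined ? undefined : new Set(opts.kinds);
    const matches = this.nodes.filter((n) => wanted === undefined || wanted.has(n.kind));
    // ORDER BY id ASC, then kind as an (assumed) lexicographic tiebreak.
    matches.sort((a, b) => compareStrings(a.id, b.id) || compareStrings(a.kind, b.kind));
    const offset = opts.offset ?? 0;
    const end = opts.limit === undefined ? undefined : offset + opts.limit;
    return matches.slice(offset, end);
  }
}
```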
+
+## Why
+
+Scattered SQL ages badly: every new column on the polymorphic `nodes`
+table forces N consumers to update; per-kind rehydration drifts; tests
+silently miss new fields. A typed `listNodes` collapses N rehydration
+implementations to one and turns "did the consumer remember to read
+`languageStats`?" into a compile error. The 25-test cross-adapter parity
+suite added here is the canary for future schema additions.
diff --git a/.erpaval/solutions/architecture-patterns/tree-sitter-wasms-catalog-incompat.md b/.erpaval/solutions/architecture-patterns/tree-sitter-wasms-catalog-incompat.md
new file mode 100644
index 00000000..038bed38
--- /dev/null
+++ b/.erpaval/solutions/architecture-patterns/tree-sitter-wasms-catalog-incompat.md
@@ -0,0 +1,70 @@
+---
+title: tree-sitter-wasms catalog package is unusable with web-tree-sitter 0.26+
+tags: [tree-sitter, web-tree-sitter, wasm, dylink, parser-runtime, ingestion]
+first_applied: 2026-05-08
+repos: [opencodehub]
+---
+
+## The pattern
+
+When a tree-sitter grammar npm package doesn't ship a `.wasm` alongside
+its `.node` binding (kotlin `fwcd/tree-sitter-kotlin`, swift
+`alex-pinkus/tree-sitter-swift`, dart `UserNobody14/tree-sitter-dart`),
+the obvious workaround is the shared catalog package
+`tree-sitter-wasms` which pre-builds `.wasm` for ~40 grammars in one
+place.
+
+**Do not reach for `tree-sitter-wasms@0.1.13` with
+`web-tree-sitter@0.26+`. It won't load.**
+
+## Why
+
+`tree-sitter-wasms@0.1.13` (npm latest as of 2026-05-08) built its
+`.wasm` artifacts with `tree-sitter-cli@0.20.8`, which emits the
+legacy `dylink` custom section (6 bytes). `web-tree-sitter@0.26+`
+hard-requires the standardized `dylink.0` section name (8 bytes) and
+throws `Error: need the dylink section to be first` at
+`Language.load(path)`.
+
+Byte-level verification:
+
+```
+$ xxd -l 32 node_modules/tree-sitter-python/tree-sitter-python.wasm
+00000000: 0061 736d 0100 0000 0011 0864 796c 696e .asm.......dylin
+00000010: 6b2e 3001 0694 c41a 0407 0001 2908 6001 k.0.........).`.
+
+$ xxd -l 32 node_modules/tree-sitter-wasms/out/tree-sitter-kotlin.wasm
+00000000: 0061 736d 0100 0000 000f 0664 796c 696e .asm.......dylin
+00000010: 6ba8 87ee 0104 0200 0001 2908 6001 7f00 k.........).`.
+```
+
+The 11 per-grammar packages that DO ship their own `.wasm` (python,
+typescript, javascript, go, rust, java, csharp, c, cpp, ruby, php)
+were built with current tree-sitter-cli and use `dylink.0` — those
+load cleanly.
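
The same check can be scripted instead of eyeballed with `xxd`. This sketch relies only on the WebAssembly binary layout (4-byte magic, 4-byte version, then sections as an id byte plus a LEB128 size, with custom sections opening on a LEB128-length name); the function names are hypothetical.

```typescript
/** Decode an unsigned LEB128 integer at `offset`; returns the value and the next offset. */
function readULeb128(bytes: Uint8Array, offset: number): { value: number; next: number } {
  let value = 0;
  let shift = 0;
  let pos = offset;
  for (;;) {
    if (pos >= bytes.length) throw new Error("truncated LEB128");
    const byte = bytes[pos]!;
    pos += 1;
    value += (byte & 0x7f) * 2 ** shift; // multiply instead of << to avoid 32-bit overflow
    if ((byte & 0x80) === 0) return { value, next: pos };
    shift += 7;
  }
}

/**
 * Name of the module's first section if it is a custom section (id 0),
 * e.g. "dylink.0" (standard) or "dylink" (legacy tree-sitter-cli 0.20.8 output).
 */
function firstCustomSectionName(wasm: Uint8Array): string | undefined {
  const magic = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
  if (wasm.length < 9 || magic.some((b, i) => wasm[i] !== b)) {
    throw new Error("not a WebAssembly module");
  }
  if (wasm[8] !== 0x00) return undefined; // first section is not a custom section
  const size = readULeb128(wasm, 9); // section payload size (skipped)
  const nameLen = readULeb128(wasm, size.next);
  const end = Math.min(nameLen.next + nameLen.value, wasm.length);
  let name = "";
  for (let i = nameLen.next; i < end; i++) {
    name += String.fromCharCode(wasm[i]!); // dylink section names are ASCII
  }
  return name;
}

/** True when the first section is the standardized "dylink.0" custom section. */
function usesStandardDylink(wasm: Uint8Array): boolean {
  return firstCustomSectionName(wasm) === "dylink.0";
}
```

A Node `Buffer` returned by `readFileSync` is a `Uint8Array`, so a `.wasm` file can be passed in directly.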
+
+## Do this instead
+
+Build your own `.wasm` blobs from the exact grammar sources your
+package.json pins and commit them to the repo. See the opencodehub
+implementation:
+
+- `scripts/build-vendor-wasms.sh` — reproducible build via
+ tree-sitter CLI + docker/podman/finch/local emcc
+- `packages/ingestion/vendor/wasms/{kotlin,swift,dart}.wasm` — committed
+ artifacts (8.1 MB total)
+- `packages/ingestion/src/parse/wasm-fallback.ts` —
+ `resolveGrammarWasmPath` falls back to `vendor/wasms/` for these 3
+ languages when per-grammar `.wasm` isn't present
+
+Zero grammar-version drift (built from same source as native), zero
+install-time emscripten requirement (artifacts committed), zero CI-time
+build (fast install everywhere).
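
A sketch of the fallback order described above, assuming the `tree-sitter-<lang>/tree-sitter-<lang>.wasm` naming seen in the `xxd` dump (not every grammar package necessarily follows it). `resolveGrammarWasmPathSketch` is a hypothetical stand-in, not the actual `resolveGrammarWasmPath` from `wasm-fallback.ts`; the `exists` parameter is injectable so the lookup can be tested without touching disk.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Languages whose grammar packages ship no .wasm, so a committed copy is used.
const VENDORED_LANGUAGES: ReadonlySet<string> = new Set(["kotlin", "swift", "dart"]);

/**
 * Hypothetical resolver: prefer the grammar package's own .wasm, then fall
 * back to <vendorDir>/<lang>.wasm for vendored languages only.
 */
function resolveGrammarWasmPathSketch(
  lang: string,
  nodeModulesDir: string,
  vendorDir: string,
  exists: (path: string) => boolean = existsSync,
): string | undefined {
  const perGrammar = join(nodeModulesDir, `tree-sitter-${lang}`, `tree-sitter-${lang}.wasm`);
  if (exists(perGrammar)) return perGrammar;
  if (VENDORED_LANGUAGES.has(lang)) {
    const vendored = join(vendorDir, `${lang}.wasm`);
    if (exists(vendored)) return vendored;
  }
  return undefined; // no usable .wasm found
}
```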
+
+## Related
+
+- ADR 0013 (`docs/adr/0013-parse-runtime-wasm-default.md`) records the
+ full WASM-default decision.
+- Upstream publish blocker that forced the whole reshuffle:
+ [tree-sitter/node-tree-sitter#276](https://github.com/tree-sitter/node-tree-sitter/issues/276)
+ (Node 24 ABI break fix blocked on npm OIDC publish issue since 2025-06).
diff --git a/.erpaval/solutions/architecture-patterns/typed-finders-replace-raw-sql-in-consumers.md b/.erpaval/solutions/architecture-patterns/typed-finders-replace-raw-sql-in-consumers.md
new file mode 100644
index 00000000..385148a3
--- /dev/null
+++ b/.erpaval/solutions/architecture-patterns/typed-finders-replace-raw-sql-in-consumers.md
@@ -0,0 +1,68 @@
+---
+title: Replace raw-SQL escape hatches with typed finders on the storage interface
+tags: [service-layer, dialect-leak, typed-finders, dry, igraphstore]
+session: session-33f24f
+---
+
+## Context
+
+108 raw-SQL call sites lived outside `packages/storage/`: 46 in mcp/, 27
+in analysis/, 17 in cli/, 12 in wiki/, 4 in pack/, 2 in search/. Each
+called `store.query("SELECT ... FROM nodes WHERE ...")`. After
+`IGraphStore` split graph-only (no SQL), every one of those was a
+silent breakage waiting to fire when the default backend flipped.
+
+The clean fix wasn't `s/IGraphStore/DuckDbStore/` everywhere — that
+preserves the abstraction leak. It was **a 13-finder service layer**
+on the interface: `listNodesByKind`, `listEdges`, `listEdgesByType`,
+`listFindings`, `listDependencies`, `listRoutes`, `getRepoNode`,
+`countNodesByKind`, `countEdgesByType`, `traverseAncestors`,
+`traverseDescendants`, `listEmbeddings`, `listConsumerProducerEdges`,
+plus 2 specialized (`listNodesByEntryPoint`, `listNodesByName`).
+
+Each adapter (DuckDB, GraphDb, future AGE/Memgraph/Neo4j/Neptune)
+internalizes the dialect. Consumers call `store.listFindings({severity:
+"error"})`. The 108 sites collapse into 15 named finders. SQL strings
+never leave the adapter.
+
+## Lesson
+
+When raw-SQL escape hatches sprawl across a codebase, the migration
+target is not the "right" type pin — it's the right service-layer API.
+Pattern:
+
+1. Audit raw call sites. Group by query shape. The grouping IS the
+ finder set.
+2. Add finders to the interface. Each finder is the SMALLEST coherent
+ abstraction that covers a recurring query shape.
+3. Implement on every adapter, internalizing the dialect. Guarantee
+ deterministic order: `ORDER BY id ASC` for nodes; `(from_id, to_id, type)`
+ for edges.
+4. Migrate consumers one package at a time: one agent per package,
+ following the write protocol for each AC.
+5. Test contract: round-trip parity via a Liskov rebuilder that uses
+ ONLY public methods (no raw SQL/Cypher). Any new adapter slots in.
+
+## Why this matters
+
+Raw SQL in consumers is a leaky abstraction that fires the day the
+default backend changes. Replacing it with typed finders:
+
+- Makes the architecture honest at compile time, not runtime.
+- Lets community adapters slot in without rewriting consumers.
+- The 15-finder set is a SOLID-I balance — small enough to be coherent,
+ large enough to cover every read pattern.
+- The Liskov-clean parity harness (`rebuildFromStore` using only public
+ methods) means a third-party adapter proves conformance by passing
+ the suite. No coupling to either flagship adapter.
+
+## Example
+
+- `packages/storage/src/interface.ts:144-215` — 15 finder signatures.
+- `packages/storage/src/duckdb-adapter.ts`, `graphdb-adapter.ts` — 13 finder
+ impls each, dialect internalized.
+- `packages/storage/src/test-utils/parity-harness.ts` — `rebuildFromStore`
+ uses only `listNodes` + `listEdges`.
+- `packages/storage/src/test-utils/conformance.ts` —
+ `assertIGraphStoreConformance(name, factory)` for community adapters.
+- 108 migration sites across analysis/mcp/pack/wiki/search/cli — see
+ commits `efa673c` through `e4131b3` on `feat/v1-finalize-track-a`.
diff --git a/.erpaval/solutions/best-practices/cherry-pick-from-sibling-testbed.md b/.erpaval/solutions/best-practices/cherry-pick-from-sibling-testbed.md
new file mode 100644
index 00000000..afd63246
--- /dev/null
+++ b/.erpaval/solutions/best-practices/cherry-pick-from-sibling-testbed.md
@@ -0,0 +1,52 @@
+---
+name: Cherry-pick verified bug fixes from a sibling testbed clone
+description: When a sibling/post-filter checkout has authored fix commits with file:line repro coordinates, fetch the sibling and cherry-pick directly — no need to re-author or re-test on upstream
+type: best-practices
+---
+
+When you maintain a "post-filter testbed" sibling repo for smoke / dogfood
+campaigns and you've already authored fix commits there with verified
+repros, do not re-write the fixes on upstream. Fetch the sibling as a
+local remote and cherry-pick.
+
+**Why:** The fix has already been authored, repro'd, verified. Re-authoring
+on upstream loses authorship metadata, doubles review surface, and
+introduces drift between what was fixed and what landed. Re-testing
+re-validates the same green path. The cherry-pick is provably equivalent
+when the file:line coordinates in the fix message match upstream HEAD.
+
+**How to apply:**
+
+1. **Verify file:line parity first.** Each fix in the testbed report
+ should cite file paths and line numbers; grep upstream to confirm the
+ same lines exist there. In the OCH 2026-05-10 campaign (Bug #2),
+ `packages/cli/src/commands/scan.ts:162-171` was identical in testbed
+ and upstream, so a direct cherry-pick worked.
+2. **Fetch the sibling as a path remote.** No need to register it
+ permanently. One-shot:
+ ```bash
+ git fetch /efs/lalsaado/workplace/opencodehub.post-filter --no-tags
+ ```
+ `FETCH_HEAD` now points at the sibling's HEAD; commits referenced by
+ short-hash become resolvable.
+3. **Cherry-pick in severity order.** HIGH first, MEDIUM next, LOW last.
+ Each pick is one commit. Do not squash them into an "umbrella fix"
+ commit; separate picks preserve blame and let the PR reviewer see one
+ self-contained fix per scope.
+4. **Re-verify after each pick** with the package-scoped check:
+ `pnpm -F @opencodehub/ test` plus any smoke script the fix
+ targets (`bash scripts/smoke-mcp.sh`, `node ... doctor`, etc.).
+5. **Prefer one PR for the bundle** when the fixes are small and
+ thematically related (a "v1 upstream bug sweep") — reviewer context
+ stays coherent. Split only if the bundle exceeds reviewability.
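
Steps 2 to 4 can be planned mechanically. This sketch only builds the command list (it runs nothing); the `TestbedFix` shape, the testbed path, and the package names are hypothetical inputs you would take from the testbed report.

```typescript
type Severity = "HIGH" | "MEDIUM" | "LOW";

// Hypothetical shape of one entry in the testbed report.
interface TestbedFix {
  readonly sha: string; // short hash, resolvable after fetching the sibling
  readonly severity: Severity;
  readonly pkg: string; // package to re-test after the pick, e.g. "@scope/name"
}

const SEVERITY_RANK: Readonly<Record<Severity, number>> = { HIGH: 0, MEDIUM: 1, LOW: 2 };

/**
 * Build (but do not run) the command sequence: fetch the sibling once, then one
 * cherry-pick per fix in severity order, each followed by its package-scoped test.
 */
function planCherryPicks(testbedPath: string, fixes: readonly TestbedFix[]): string[] {
  // Array.prototype.sort is stable, so report order is kept within a severity tier.
  const ordered = [...fixes].sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
  const steps = [`git fetch ${testbedPath} --no-tags`];
  for (const fix of ordered) {
    steps.push(`git cherry-pick ${fix.sha}`, `pnpm -F ${fix.pkg} test`);
  }
  return steps;
}
```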
+
+Anti-pattern: re-authoring the fix on upstream and citing the testbed
+commit in the body. That loses the original commit's authorship and
+makes blame point at the re-author for code that was thought-through
+elsewhere. If you need to adapt the fix to upstream divergence, do that
+as a follow-up commit on top of the cherry-pick, not a rewrite.
+
+Related: this pairs naturally with the durable lesson "Squash-merge
+masks pre-existing repo-wide debt" — run `mise run check` on upstream
+BEFORE the cherry-pick to baseline-clean, so any test regression after
+the pick is unambiguously attributable to the picked fix.
diff --git a/.erpaval/solutions/best-practices/dogfood-prepush-hook-caught-cli-spec-mismatch.md b/.erpaval/solutions/best-practices/dogfood-prepush-hook-caught-cli-spec-mismatch.md
new file mode 100644
index 00000000..8ac97ae7
--- /dev/null
+++ b/.erpaval/solutions/best-practices/dogfood-prepush-hook-caught-cli-spec-mismatch.md
@@ -0,0 +1,64 @@
+---
+name: A dogfood pre-push hook catches CLI-spec mismatches on the first push
+description: When you wire a CLI you own into your own pre-push hook, the hook becomes a tight feedback loop — the first push of the AC that adds the hook will surface any drift between the spec's invocation and the actual CLI surface, before CI sees it
+type: knowledge
+tags: [dogfood, lefthook, pre-push, ci-hooks, verdict, codehub, fast-feedback]
+session: session-85faf1
+ac: AC-D-5
+---
+
+## Context
+
+Track D's AC-D-5 added a pre-push lefthook job:
+
+```yaml
+- name: verdict
+ run: "{pnpm} codehub verdict --base origin/main --head HEAD --exit-code"
+```
+
+The hook copied that exact invocation from the spec text, where `--exit-code` was a load-bearing flag. The hook fired on the first `git push -u origin feat/v1-finalize-track-d` and immediately failed:
+
+```
+error: unknown option '--exit-code'
+```
+
+`codehub verdict --help` confirmed the flag does not exist. The source shows that `verdict` already exits non-zero on a `block` tier by default, because it sets `process.exitCode` itself. The spec was wrong about the flag.
+
+A second push surfaced a second bug: `codehub verdict` requires a graph index at `.codehub/graph.duckdb` or `graph.lbug`, and a fresh dev clone has neither. The hook hard-blocked the push instead of degrading gracefully.
+
+Both fixes landed as `fix(ci):` follow-up commits BEFORE the PR opened, on the same branch, in the same session.
+
+## Lesson
+
+When you wire your own CLI into your own pre-push hook, the hook is a self-test. The first push of the AC that adds the hook is where you discover:
+
+1. **Whether the flags the spec named are actually wired in the CLI.** Spec drift between EARS requirements and the runtime tool is silent until something runs the tool — and a pre-push hook runs it on every push by definition.
+
+2. **Whether the hook degrades gracefully on every state of the developer's working tree.** A hook that hard-blocks pushes from a freshly-cloned repo (no `.codehub/` index yet) is a foot-gun even if it works correctly on a fully-set-up box.
+
+The fix template for the second one is the same as `scripts/pack-determinism-audit.sh`'s SKIP shape:
+
+```yaml
+run: |
+ if [ -f .codehub/graph.duckdb ] || [ -f .codehub/graph.lbug ]; then
+ {pnpm} codehub verdict --base origin/main --head HEAD
+ else
+ echo "verdict skipped: no .codehub/ index — run 'mise run och:self-analyze' first"
+ fi
+```
+
+## How to apply
+
+- Always test a new pre-push hook by pushing the very commit that adds it. The first push is the truth-teller.
+- Pattern: every dogfood gate that depends on a derived artifact (index, build output, cache) should mirror `scripts/pack-determinism-audit.sh`'s SKIP-with-message shape on absence — never hard-block a push for an artifact the developer hasn't been told to build.
+- When a spec quotes a CLI invocation, sanity-check it against ` --help` before trusting it. Specs lag CLIs; CLIs are the source of truth.
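
The `--help` sanity check can be automated crudely: extract the long flags from the quoted invocation and report any that the help text never mentions. This is a heuristic sketch with hypothetical function names; the help text used to exercise it is an illustrative stand-in, not real `codehub verdict --help` output.

```typescript
/** Long flags (e.g. "--exit-code") in a quoted invocation, without "=value" suffixes. */
function longFlagsIn(invocation: string): string[] {
  return invocation
    .split(/\s+/)
    .filter((token) => token.startsWith("--"))
    .map((token) => token.split("=")[0]!);
}

/**
 * Flags used by the invocation that the CLI's --help text never mentions.
 * Heuristic: any "--name" substring anywhere in the help text counts as known.
 */
function findUnknownFlags(invocation: string, helpText: string): string[] {
  const mentioned: string[] = helpText.match(/--[a-z][a-z0-9-]*/g) ?? [];
  const known = new Set(mentioned);
  return longFlagsIn(invocation).filter((flag) => !known.has(flag));
}
```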
+
+## Why this matters
+
+The spec contract for AC-D-5 was D1-E-4: "lefthook pre-push MUST run `codehub verdict --base origin/main --head HEAD --exit-code`." That clause was wrong about the flag, and a non-dogfooded hook would have left the bug to CI on the next push, or the next dev's first push, or — worst case — a release-please run. Tight feedback caught it in 30 seconds at the cost of one fixup commit.
+
+## References
+
+- Implementation: PR #75 commits `4cf07a8` (initial), `55dc684` (drop `--exit-code`), `044ef43` (graceful-degrade guard).
+- CLI shape: `packages/cli/src/commands/verdict.ts:42-65,140-145`. The non-zero exit on a `block` tier is the default; no flag is needed.
+- Skip-pattern reference: `scripts/pack-determinism-audit.sh` lines 30-44.
diff --git a/.erpaval/solutions/best-practices/finch-as-docker-shim.md b/.erpaval/solutions/best-practices/finch-as-docker-shim.md
new file mode 100644
index 00000000..e0258bd0
--- /dev/null
+++ b/.erpaval/solutions/best-practices/finch-as-docker-shim.md
@@ -0,0 +1,51 @@
+---
+title: Use finch as a drop-in docker via PATH shim on Amazon AL2023 devboxes
+tags: [finch, docker, al2023, containers, emscripten, tree-sitter-cli]
+first_applied: 2026-05-08
+repos: [opencodehub]
+---
+
+## The pattern
+
+CLIs that shell out to `docker` (like `tree-sitter build --wasm -d`,
+which runs `docker run emscripten/emsdk ...`) don't know about Amazon
+Finch. AL2023 devboxes typically have finch installed via
+`/usr/bin/sudo finch ...` (aliased in zsh) but no `docker` on PATH. The
+tool errors out with "You must have either emcc, docker, or podman on
+your PATH".
+
+Workaround: a 3-line shell shim.
+
+## Fix
+
+```bash
+cat > /tmp/docker-shim.sh <<'EOF'
+#!/usr/bin/env bash
+exec sudo HOME=/home/$USER DOCKER_CONFIG=/home/$USER/.docker finch "$@"
+EOF
+chmod +x /tmp/docker-shim.sh
+mkdir -p /tmp/docker-bin && ln -sf /tmp/docker-shim.sh /tmp/docker-bin/docker
+
+PATH=/tmp/docker-bin:$PATH
+```
+
+Verified against `tree-sitter build --wasm -d` — finch pulled
+`docker.io/emscripten/emsdk:3.1.64` (30 s), built kotlin/swift/dart
+WASM grammars (~1 min each), output byte-identical to what a native
+docker install would produce.
+
+## Caveats
+
+- Volume mounts work (`finch run -v /path:/path`).
+- The `sudo HOME=... DOCKER_CONFIG=...` wrapping matches Amazon's
+ standard finch alias — without it, finch writes container state to
+ `/root/` and breaks cache reuse.
+- Warnings like `unsupported volume option "Z"` are harmless (SELinux
+ label option that finch/nerdctl ignores).
+
+## When to reach for this
+
+One-off container needs where installing Docker Desktop or podman is
+more overhead than the task justifies, e.g. pre-building WASM artifacts
+to commit, running a one-shot emsdk compile, or testing something in an
+`emscripten/emsdk`-style official image.
diff --git a/.erpaval/solutions/best-practices/no-spec-coordinate-leakage-into-source.md b/.erpaval/solutions/best-practices/no-spec-coordinate-leakage-into-source.md
new file mode 100644
index 00000000..d5bf17be
--- /dev/null
+++ b/.erpaval/solutions/best-practices/no-spec-coordinate-leakage-into-source.md
@@ -0,0 +1,82 @@
+---
+name: ERPAVal spec coordinates (CL-*, AC-*, M-*, W-*) MUST NOT leak into production source or comments
+description: Specifier prefixes from EARS specs and the ERPAVal classifier vocabulary are session-local bookkeeping; production code, comments, JSDoc, and CLI/MCP option descriptions must not reference them
+type: feedback
+---
+
+ERPAVal specs use a structured vocabulary — `AC-A-1`, `AC-C-3`, `M3-1`,
+`W-A-2`, `E-C-3`, `CL-VALIDATE`, `S-A-2`, `architecture-revised.md
+§AC-A-7` — to coordinate work across the orchestrator and Act
+subagents. These prefixes are useful inside ERPAVal artifacts:
+`.erpaval/specs/`, `.erpaval/sessions//`, ADR validation tables,
+commit messages, PR bodies. They are NOT useful in production source.
+
+Observed leakage on Track C cleanup (2026-05-09): the orchestrator and
+multiple Act subagents seeded `AC-C-3:`, `AC-C-2:`, `AC-A-1:`, `AC-A-6c:`,
+`AC-A-9:` into JSDoc, inline comments, MCP tool option descriptions
+(visible to every MCP client), and CLI flag help (visible to every
+`codehub query --help` user). Counts after Wave C.1 + my Wave C.2 first
+pass: ~45 source references to AC-A-* (legacy from Track A — already
+on main via PR #71), 14 source references to AC-C-* introduced this
+session before sweep.
+
+**Why:** session-local coordinates rot. Six months after the AC graduates
+into a release, the spec packet is in `.erpaval/sessions/session-/`
+which is gitignored — readers of the source can't follow the citation.
+The MCP option description "Bypass the embedder fingerprint check
+(AC-C-3)." leaks ERPAVal vocabulary into the MCP tool surface, which
+LLM clients then pick up and start citing back; the leakage compounds.
+
+**How to apply:**
+
+- **Source comments / JSDoc:** name the underlying invariant, behavior,
+ or contract. "Refuse when the persisted embedder modelId differs from
+ the current one" is forever; "AC-C-3 refusal" is until the AC merges
+ and then forgets itself.
+- **Variable names, function names, type fields:** never carry the prefix.
+ `forceBackendMismatch` (good) not `acC3ForceBackendMismatch` (never).
+- **CLI help / MCP descriptions / tool descriptions:** describe the
+ user-visible contract. The user does not know what an AC is. Strip.
+- **ADR text:** ADRs MAY cite AC-* coordinates because the ADR is the
+ permanent home of the decision rationale and links to the spec packet.
+ But cite once, in a "References" section, not inline throughout the
+ decision body.
+- **Commit messages and PR descriptions:** AC citations are great here.
+ Reviewers grep for them; release-please may include them in the
+ changelog.
+- **Test names and fixture names:** prefer the behavior under test
+ ("graphHash parity: medium-with-empty-keywords ([] vs absent)") over
+ the AC ("AC-C-2: graphHash..."). The behavior survives renames; AC
+ numbers don't.
+- **Sweep before commit.** Run `rg -n "AC-[A-Z]-[0-9]" packages/ scripts/`
+ against your branch before PR-open. Anything that hits is a
+ candidate for rephrase. If the comment NEEDS to cite the AC, use a
+ short reference at the end like "(AC-C-5)" rather than leading with
+ it.
+- **Sweep scope is `packages/` and `scripts/`, NOT `docs/adr/*`.** PR #74
+ (`f09d804`) carved out `docs/adr/*` as the explicit place where
+ coordinates ARE permanent decision rationale. A docs-refresh subagent
+ that sees the sweep regex without the scope qualifier will scrub
+ ADRs by default — DO NOT. Brief docs subagents explicitly that ADR
+ text retains coordinates. See the
+ `parallel-docs-subagent-overscrubs-adrs.md` lesson for the failure
+ mode.
+- **The test fakes are the trap.** When a Wave subagent edits a test
+ fake, it tends to add `// AC-XXX: stubs ...` because it's writing
+ the comment WITH the AC packet open in front of it. Sweep test files
+ the same way as source files.
+
+**Why it's worth a hook:** the leakage is mechanical and silent. A
+PostToolUse hook on Edit/Write that scans the diff for `^[\s*/]*AC-[A-Z]-[0-9]+`
+in `packages/**` (excluding `.erpaval/`, `.md` ADRs, and commit-message
+files) and either blocks the write or appends a stderr advisory would
+catch every recurrence at the source. Until that hook exists, the
+discipline is on the orchestrator + reviewer.
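
The core check of such a hook is small. This sketch applies the `rg` pattern from the sweep rule line by line and honors the scope rules above (`packages/` and `scripts/` in scope; `.erpaval/` and `docs/adr/` out of scope). How the hook receives the file path and content is not shown and would depend on the hook runner; the function names are hypothetical.

```typescript
// The sweep pattern from the rule above, applied per line.
const SPEC_COORDINATE = /AC-[A-Z]-[0-9]/;

/** In scope: packages/ and scripts/ only. */
function inSweepScope(path: string): boolean {
  // docs/adr/* and .erpaval/* fall outside these prefixes, so they are never swept.
  return path.startsWith("packages/") || path.startsWith("scripts/");
}

interface Leak {
  readonly path: string;
  readonly line: number; // 1-based
  readonly text: string;
}

/** Every in-scope line that carries a spec coordinate, to block on or report. */
function findSpecCoordinateLeaks(path: string, content: string): Leak[] {
  if (!inSweepScope(path)) return [];
  const leaks: Leak[] = [];
  content.split("\n").forEach((text, index) => {
    if (SPEC_COORDINATE.test(text)) leaks.push({ path, line: index + 1, text: text.trim() });
  });
  return leaks;
}
```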
+
+**Carry-forward debt:** Track A merged with extensive `AC-A-*`
+references throughout `packages/storage/`, `packages/mcp/`, and
+`packages/cli/`. They are on main and any Track-after-A branch picks
+them up. A standalone `chore(repo): scrub spec coordinates from
+source` cleanup PR is the right venue — not Track C, not Track D.
+That PR can ship in its own session because the cleanup is mechanical
+and reviewable in one window.
diff --git a/.erpaval/solutions/best-practices/parallel-act-subagents-with-shared-git-tree.md b/.erpaval/solutions/best-practices/parallel-act-subagents-with-shared-git-tree.md
new file mode 100644
index 00000000..fbfd6f1f
--- /dev/null
+++ b/.erpaval/solutions/best-practices/parallel-act-subagents-with-shared-git-tree.md
@@ -0,0 +1,90 @@
+---
+title: Parallel Act subagents on a shared git tree — interleaving + cherry-pick discipline
+tags: [erpaval, act-phase, worktrees, subagents, parallelism, cherry-pick]
+session: session-33f24f
+---
+
+## Context
+
+Track A of v1-finalize ran 13 ACs. Most ACs spawned a dedicated Act
+subagent on an isolated worktree (`isolation: worktree`). Four recurring
+behaviors emerged:
+
+1. **Worktrees that branched off `main` instead of `feat/v1-finalize-track-a`.**
+ Several agents reported "fast-forwarded to feat/v1-finalize-track-a
+ before starting" — the worktree harness defaults the new branch off
+ the orchestrator's CURRENT HEAD, but if the orchestrator hasn't
+ pushed track-a, the harness picked up `origin/main` instead. Fix:
+ the agent's first action is `pwd && git rev-parse --show-toplevel
+ && git log --oneline -10` to verify expected commits are in the
+ chain. If missing, `git fetch && git merge --ff-only feat/v1-finalize-track-a`.
+ Document in the packet's Work log.
+
+2. **Worktree commits landing on the parent branch directly.** Several
+ agents committed to the worktree's local branch but their changes
+ appeared on `feat/v1-finalize-track-a` because the git dir is shared
+ across worktrees. The orchestrator's cherry-pick became a no-op
+ (commit already in branch); next cherry-pick of a NEW commit worked
+ normally. Net effect: orchestrator must verify branch state before
+ AND after each agent completion, not assume cherry-pick is required.
+
+3. **Concurrent worktrees on overlapping packages.** Two agents both
+ editing `packages/storage/` produced merge friction even when their
+ files didn't overlap because lefthook + biome lock root state. Fix:
+ spawn parallel agents on NON-OVERLAPPING package boundaries.
+ `mcp/` parallel with `storage/` is fine; `mcp/` parallel with
+ `analysis/` is fine; two agents on `storage/` is not.
+
+4. **Stale dist + test reports.** `pnpm -r test` runs `node --test
+ ./dist/**/*.test.js`. Type-only changes update `.ts` but leave
+ `.js` stale. After every interface-touching commit, rebuild
+ (`pnpm -r build`) before trusting test counts. Several agents
+ reported phantom failure counts that resolved on rebuild.
+
+## Lesson
+
+For ERPAVal Act phase with parallel subagents on a shared git tree:
+
+1. **Each Act subagent's first action is to verify branch state.**
+ Document `git log --oneline -10` in the Work log. If branched off
+ `main` instead of the feature branch, fast-forward before editing.
+
+2. **Spawn parallel agents on non-overlapping package boundaries.**
+ Worktree isolation does NOT prevent biome / lefthook root-config
+ conflicts. Don't spawn 2+ agents on the same package.
+
+3. **The orchestrator's cherry-pick may be a no-op.** Verify branch
+ HEAD post-completion via `git log --oneline -3 HEAD`. If the agent's
+ reported SHA is already at HEAD, the cherry-pick is redundant — log
+ it and move on.
+
+4. **Rebuild before trusting test counts after interface changes.**
+ `pnpm -r build && pnpm -r test`. Stale `dist/` produces phantom
+ failures.
+
+5. **Watch the test-fixup tail.** When production migrates to a new
+ interface (e.g. typed finders), per-test FakeStore mocks need
+ migration too. The packet that does the production migration should
+ either (a) hoist a shared fake to `/src/test-utils.ts` or
+ (b) explicitly defer test-fixup as a follow-on packet. Don't let
+ it slip silently — the rebuild surfaces 50+ failing tests at once.
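
The branch-state check in item 1 can be reduced to a pure function over `git log --oneline` output. A sketch, with a hypothetical `missingCommits` name: an empty result means the expected commits are in the chain; anything else suggests the worktree branched off `main` and needs the `git merge --ff-only` step before editing.

```typescript
/**
 * Given `git log --oneline` output from the worktree, return the expected
 * commits that do not appear in it.
 */
function missingCommits(onelineLog: string, expectedShas: readonly string[]): string[] {
  const present = onelineLog
    .split("\n")
    .map((line) => line.trim().split(/\s+/)[0] ?? "") // first token is the short hash
    .filter((sha) => sha.length > 0);
  // Short hashes can differ in length, so match by prefix in either direction.
  return expectedShas.filter(
    (want) => !present.some((have) => have.startsWith(want) || want.startsWith(have)),
  );
}
```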
+
+## Why this matters
+
+Track A landed 25 commits across 13 ACs in one session via parallel
+subagents. The patterns above are what kept the hash-parity invariant
+green per-commit and prevented two-week debug sessions on phantom
+failures. Future multi-AC tracks (Track C debt sweep, Track D dogfood
+polish) inherit these.
+
+## Example
+
+- `feat/v1-finalize-track-a` HEAD `894d477` — 25 commits, all green.
+- Two agents on storage/ in parallel produced the AC-A-3 / AC-A-7
+ sequencing fix that landed cleanly.
+- Mass mcp test-fixup (`a2718d4f4bf486a57`) was a deferred follow-on
+ packet because AC-A-6c's per-AC scope didn't include the 17-file
+ test mass migration. Right call — the deferred packet had a clean
+ scope and landed in one commit (`d67f115`).
+- Phantom 79-failure count appeared on first AC-A-6c rebuild;
+ resolved on full repo `pnpm -r build`.
diff --git a/.erpaval/solutions/best-practices/parallel-docs-subagent-overscrubs-adrs.md b/.erpaval/solutions/best-practices/parallel-docs-subagent-overscrubs-adrs.md
new file mode 100644
index 00000000..7cd307f6
--- /dev/null
+++ b/.erpaval/solutions/best-practices/parallel-docs-subagent-overscrubs-adrs.md
@@ -0,0 +1,61 @@
+---
+name: Parallel docs-refresh subagents must be told that ADR text is the carve-out where spec coordinates ARE allowed
+description: When a docs-refresh subagent inherits the "no spec-coordinate leakage" rule from durable lessons, it will scrub ADR text by default — but PR #74 carved out docs/adr/* as the place where coordinates ARE the durable rationale; brief explicitly
+type: best-practices
+---
+
+OCH PR #74 (`f09d804 chore(repo): scrub ERPAVal spec coordinates from
+source`) explicitly retained spec coordinates in `docs/adr/*` as
+"permanent decision rationale". The durable lesson
+`no-spec-coordinate-leakage-into-source.md` documents the scrub but
+does NOT crisply state the carve-out. When a parallel docs-refresh
+subagent reads the durable lesson and is told "no spec-coordinate
+leakage", it scrubs ADRs too — undoing PR #74's deliberate carve-out.
+
+Observed in OCH session 6c091d (2026-05-10 v1 upstream bug sweep): the
+docs-refresh subagent stripped `AC-A-1`, `AC-A-2`, `AC-A-6 a/b/c/d`,
+`AC-A-7`, `AC-A-9`, `AC-A-11` from ADR 0013-m7 and `AC-C-3`, `AC-C-5`,
+`E-C-3`, `W-A-2` from ADR 0014. Required a follow-up
+`docs(docs): restore ADR-permanent spec coordinates per PR #74 policy`
+commit on the same branch.
+
+**Why:** the durable lesson's scope says "production source, JSDoc,
+inline comments, CLI flag help, MCP tool option descriptions, test
+names" — but the ADR carve-out lives only in PR #74's body. Subagents
+read the lesson, not the PR archive. The carve-out is invisible to a
+fresh agent.
+
+**How to apply:**
+
+1. **Brief docs subagents explicitly.** When seeding a docs-refresh
+ subagent prompt, include both rules:
+ - "No spec-coordinate prefixes in production source (per durable
+ lesson)."
+ - "ADR text is the carve-out: spec coordinates in `docs/adr/*` are
+ intentional permanent rationale per PR #74. Do NOT scrub them
+ there."
+2. **Update the lesson itself.** Edit
+ `solutions/best-practices/no-spec-coordinate-leakage-into-source.md`
+ to add a "Scope" section that names `docs/adr/*` as the carve-out,
+ so future subagents reading the lesson see the constraint without
+ needing PR archaeology.
+3. **Sweep with a scope-aware regex.** When auditing leakage, exclude
+ `docs/adr/*` from the sweep:
+ `rg -n 'AC-[A-Z]-[0-9]' packages/ scripts/`
+ not
+ `rg -n 'AC-[A-Z]-[0-9]'` (which would falsely flag ADRs).
+4. **The reverse case is also valid.** `docs/adr/0014-*` originally
+ listed `.erpaval/specs/...` and `.erpaval/sessions/...` as
+ References — those paths are gitignored and rot once the packet
+ graduates. Replacing them with code-path citations IS correct, even
+ in ADR text. The carve-out is for spec-coordinate prefixes, not for
+ pointers to gitignored paths.
+
+Anti-pattern: writing a generic "scrub spec coords everywhere" rule and
+then being surprised when ADR rationale gets vacuumed. The leakage rule
+exists to prevent rot; ADR rationale doesn't rot because the ADR is
+the rationale.
+
+Cross-link:
+[no-spec-coordinate-leakage-into-source](no-spec-coordinate-leakage-into-source.md) — the original rule.
+PR #74 (`f09d804`) — the carve-out's authoritative source.
diff --git a/.erpaval/solutions/best-practices/pnpm-install-on-efs.md b/.erpaval/solutions/best-practices/pnpm-install-on-efs.md
new file mode 100644
index 00000000..5893de0d
--- /dev/null
+++ b/.erpaval/solutions/best-practices/pnpm-install-on-efs.md
@@ -0,0 +1,68 @@
+---
+title: pnpm install hangs on Amazon EFS-mounted workdir without store-dir + UV_USE_IO_URING=0
+tags: [pnpm, efs, nfs, al2023, devbox, install-performance]
+first_applied: 2026-05-08
+repos: [opencodehub]
+---
+
+## The pattern
+
+`pnpm install` on an EFS-mounted working directory (typical Amazon
+devbox setup where home is local but the source tree is under `/efs`)
+will hang for 4-8 minutes with zero stdout, then eventually complete.
+Two stacked causes:
+
+1. **pnpm CAS store lands on EFS by default.** `pnpm store path` will
+ show something like `/efs//.pnpm-store/v10` when your HOME
+ resolves through EFS. Every CAS lookup becomes a ~22 ms NFS
+ round-trip (vs ~200 µs on local EBS/XFS) — a 100× latency gap.
+ With 800+ packages × dozens of files each, install is O(N) in NFS
+ stat/create syscalls.
+2. **AL2023 kernel `io_uring` cleanup bug**
+   ([amazonlinux/amazon-linux-2023#856](https://github.com/amazonlinux/amazon-linux-2023/issues/856))
+ causes Node processes to appear hung during cleanup. Symptom:
+ pnpm's progress output stops emitting; process shows 1% CPU; then
+ minutes later a flurry of "Progress: resolved X, reused Y" lines
+ pops out at once.
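+
+A back-of-envelope model shows how the per-operation latencies listed
+under Sources become minutes on EFS versus seconds on local disk. The
+model makes two assumptions. Each package file costs one serialized
+create, and a package holds about 30 files, an illustrative reading of
+"dozens". Real installs issue several syscalls per file and overlap some
+of them, so treat the output as an order-of-magnitude estimate only.
+
+```typescript
+// Per-op latencies: the bonnie++ file-create figures listed under Sources.
+const EFS_CREATE_US = 22_516;
+const EBS_CREATE_US = 218;
+
+// Serialized model: total file operations × per-operation latency.
+function estimateInstallSeconds(packages: number, filesPerPackage: number, perOpMicros: number): number {
+  return (packages * filesPerPackage * perOpMicros) / 1_000_000;
+}
+
+const efs = estimateInstallSeconds(800, 30, EFS_CREATE_US); // 30 files/package is an assumption
+const ebs = estimateInstallSeconds(800, 30, EBS_CREATE_US);
+console.log(`EFS ≈ ${(efs / 60).toFixed(1)} min, EBS ≈ ${ebs.toFixed(1)} s`);
+console.log(`latency ratio ≈ ${(EFS_CREATE_US / EBS_CREATE_US).toFixed(0)}×`);
+```
+
+With these inputs the model lands near 9 minutes on EFS and 5 seconds on
+local disk. That is the same order as the observed 8+ minutes and ~5 seconds.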
+
+## Fix
+
+**User-global `~/.npmrc`** (not committed to the repo — team members
+on other hosts may want different tunings):
+
+```
+store-dir=/home//.local/share/pnpm-store
+package-import-method=hardlink
+```
+
+**Shell env** for installing (add to `~/.zshrc` permanently until AL2023
+backports the kernel fix):
+
+```bash
+export UV_USE_IO_URING=0
+```
+
+If you're applying this change on an EFS workdir with an existing
+`node_modules/`, pnpm will refuse to rebuild it without TTY — use
+`CI=true pnpm install --no-frozen-lockfile` the first time so pnpm
+can purge the old modules dir and repopulate from the new store
+location. After the first warm install, subsequent installs hardlink
+from local XFS and finish in ~5 seconds.
+
+## Verification
+
+- Before: `pnpm install` → 8+ minutes, mostly silent
+- After: `pnpm install --prefer-offline` → 4.6 seconds
+
+Check that the store moved: `pnpm store path` should no longer return
+an `/efs/...` path.
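+
+A path prefix is only a proxy, because a symlinked or bind-mounted HOME
+can hide NFS behind a local-looking path. A slightly stronger check is
+sketched here for Linux only. It resolves the store path and looks up
+the owning filesystem in `/proc/mounts`, whose fields are device, mount
+point, fstype, and so on. EFS mounts report `nfs4`. Both function names
+are hypothetical. The parser also ignores the octal escapes that
+`/proc/mounts` uses for spaces in paths.
+
+```typescript
+import { execFileSync } from "node:child_process";
+import { readFileSync, realpathSync } from "node:fs";
+
+// Filesystem type of the longest mount point that contains `path`.
+function fsTypeFor(path: string, mountsText: string): string | undefined {
+  let best: { mount: string; type: string } | undefined;
+  for (const line of mountsText.split("\n")) {
+    const [, mount, type] = line.split(" ");
+    if (!mount || !type) continue;
+    const covers = mount === "/" || path === mount || path.startsWith(`${mount}/`);
+    if (covers && (!best || mount.length > best.mount.length)) best = { mount, type };
+  }
+  return best?.type;
+}
+
+// Live check. realpathSync throws if the store directory does not exist yet.
+function pnpmStoreFsType(): string | undefined {
+  const store = execFileSync("pnpm", ["store", "path"], { encoding: "utf8" }).trim();
+  return fsTypeFor(realpathSync(store), readFileSync("/proc/mounts", "utf8"));
+}
+```
+
+After the fix, `pnpmStoreFsType()` should report the local filesystem
+(for example `xfs`) rather than `nfs` or `nfs4`.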
+
+## Sources
+
+- pnpm FAQ — cross-filesystem store falls back to copy, not hardlink
+- pnpm settings reference — `store-dir`, `package-import-method`,
+ `virtual-store-dir`
+- kdgregory blog, "EFS Performance Take 3" — bonnie++ file-create
+ latency EFS 22,516 µs vs EBS 218 µs
+- [amazonlinux/amazon-linux-2023#856](https://github.com/amazonlinux/amazon-linux-2023/issues/856)
+ — `UV_USE_IO_URING=0` workaround for io_uring hang
diff --git a/.erpaval/solutions/best-practices/spec-drift-amend-inline-with-implementing-commit.md b/.erpaval/solutions/best-practices/spec-drift-amend-inline-with-implementing-commit.md
new file mode 100644
index 00000000..8ca2ab81
--- /dev/null
+++ b/.erpaval/solutions/best-practices/spec-drift-amend-inline-with-implementing-commit.md
@@ -0,0 +1,50 @@
+---
+title: Resolve milestone-old spec drifts inline with the implementing commit, not as a separate fix
+tags: [spec-discipline, drift-resolution, commit-hygiene, ears]
+session: session-e1d819
+---
+
+## Context
+
+Spec 005 was authored before Wave 1 commits ratified its M5/M6 surface.
+By the time Wave 2 started, four drifts existed (explore-delta.yaml
+`drifts.drift_1..4`):
+
+- drift_1: spec named `chonkie-ts@^0.3.0`; impl had `chonkie@^0.3.0`
+ (and ultimately `@chonkiejs/core@^0.0.9` was correct)
+- drift_2: spec called for `IGraphStore.listNodes()`; method didn't exist
+- drift_3: spec said "extend AGENTS.md with `choices[]`"; that already shipped
+- drift_4: spec said "reuse license_audit MCP logic"; that path cycled
+
+All four were resolved at Gate 0 by amending the spec wording inline as
+part of the commit that implemented the fix (e.g., 77f37c3 amended
+AC-M5-1 wording while switching the chonkie package; 9d8d570 amended
+AC-M5-5 wording while lifting `classifyDependencies`).
+
+## Lesson
+
+When a spec drift is ≥ 1 milestone old and the implementation has already
+committed to a different reality, **amend the spec inline as part of the
+implementing commit**. Do not separate spec-fix from implementation:
+
+1. Catch drifts during the explore-delta pass (or Gate 0 of the next
+ wave). List them with `where / what / reason / action_options /
+ recommend` keys in `explore-delta.yaml` so the orchestrator confirms
+ the resolution before Plan.
+2. The implementing commit message body cites the spec line being
+ amended ("Amends spec 005 AC-M5-5: reads `chonkie` → `@chonkiejs/core`").
+3. The diff includes both the code change AND the spec edit. Reviewers
+ see the drift resolved and ratified in one atomic step.
+4. Never carry an open drift across milestones. Either accept-and-amend
+ or revert-to-spec — the only forbidden state is "spec says X, code
+ does Y, no decision recorded".
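+
+The entry shape from step 1 is regular enough for a mechanical Gate 0
+check. This minimal sketch assumes the YAML is already parsed into plain
+objects. The `DriftEntry` type and both helpers are hypothetical. The
+check flags incomplete entries and any `recommend` that is not one of
+the entry's own `action_options`. It also extracts the amended
+coordinates from a commit body written in the step 2 format.
+
+```typescript
+// Hypothetical shape of one entry under `drifts:` in explore-delta.yaml.
+interface DriftEntry {
+  where: string;
+  what: string;
+  reason: string;
+  action_options: string[];
+  recommend: string;
+}
+
+// Every drift must be fully described and recommend one of its own options,
+// so Gate 0 reduces to confirming each recommendation.
+function gateZeroProblems(drifts: Record<string, DriftEntry>): string[] {
+  const problems: string[] = [];
+  for (const [id, d] of Object.entries(drifts)) {
+    for (const key of ["where", "what", "reason", "recommend"] as const) {
+      if (!d[key]?.trim()) problems.push(`${id}: missing ${key}`);
+    }
+    if (d.action_options.length === 0) {
+      problems.push(`${id}: no action_options`);
+    } else if (!d.action_options.includes(d.recommend)) {
+      problems.push(`${id}: recommend "${d.recommend}" is not one of action_options`);
+    }
+  }
+  return problems;
+}
+
+// "Amends spec 005 AC-M5-5: ..." -> ["005 AC-M5-5"]
+function amendedSpecCoords(commitBody: string): string[] {
+  return Array.from(commitBody.matchAll(/Amends spec (\d+) (AC-[A-Z0-9]+-\d+)/g), (m) => `${m[1]} ${m[2]}`);
+}
+```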
+
+## Why
+
+Separate "spec-fix" commits are decoupled from the reasoning that justified
+the change; future readers see a spec edit with no obvious driver.
+Inline amendment ratifies the drift at the point of decision, keeps the
+spec executable, and prevents Plan from re-litigating settled choices.
+The four-drift batch in this session resolved cleanly because every
+drift had an `action_options` block with a `recommend`, so Gate 0 was
+a four-line confirmation rather than a fresh design discussion.
diff --git a/.erpaval/solutions/best-practices/squash-merge-masks-pre-existing-debt.md b/.erpaval/solutions/best-practices/squash-merge-masks-pre-existing-debt.md
new file mode 100644
index 00000000..fd3e54d5
--- /dev/null
+++ b/.erpaval/solutions/best-practices/squash-merge-masks-pre-existing-debt.md
@@ -0,0 +1,64 @@
+---
+name: Squash-merge can mask pre-existing repo-wide debt that per-commit gating did not surface
+description: A multi-commit feature track whose per-commit `mise run check` was green can still leave the post-squash main failing because lint-rule, transitive-dep, or test-sequence interactions only manifest at the merge boundary
+type: feedback
+---
+
+A long-running feature branch lands as one squash commit on main. Per-commit
+`mise run check` was clean across all 26 of the branch's commits AND on the
+final pre-merge HEAD. The next branch cut from main hits `mise run check` and
+gets a non-zero exit on rules the previous branch never tripped.
+
+This was observed on 2026-05-09: Track A merged via squash from
+`feat/v1-finalize-track-a` (commit 81f9855). Track B cut a fresh branch from
+that main, ran `mise run check`, and immediately failed on 6 biome v2 lint
+errors (`noNonNullAssertion` in `derive.test.ts`, `noConsole` +
+`noTemplateCurlyInString` in `sagemaker-embedder.parity.test.ts`) plus 3
+"unused suppression" warnings on stale `biome-ignore lint/correctness/useYield`
+comments. None of these errors were in Track A's diff; all of them existed on
+main before Track A landed.
+
+**Why it happens:**
+
+1. **The effective lint rule set can change mid-branch.** Track A
+ bumped a transitive dep that pulled in newer biome rules (or relaxed a
+ `useYield` rule that retroactively flagged old suppressions as unused).
+ Per-commit gating inside Track A had the *old* rule set during early
+ commits and the *new* rule set during late commits — but each individual
+ commit's check ran against its own rule set, so each was self-consistent.
+ The post-squash main has the LATEST rule set against the WHOLE tree,
+ exposing lint debt that no individual commit owned.
+2. **Test-sequence interactions across packages.** A new polyglot scanner
+ (detect-secrets) triggered cli `selectScanners` test failures because
+ `selectScanners` consumed `ALL_SPECS` whose order changed. Catalog tests
+ in `packages/scanners/` updated their assertions; cli tests did not, and
+ the cross-package coupling was invisible inside Track B's package-level
+ diff.
+3. **Squash commits drop the bisect granularity** that would have
+   localised the rule-set change to a specific commit.
+
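+
+Cause 2 is cheaper to prevent when cross-package tests derive the
+polyglot part of their expectations from the catalog and compare the
+results as sets. In this sketch, `ScannerSpec`, `ALL_SPECS`,
+`selectScanners`, and every scanner id except `detect-secrets` are
+hypothetical stand-ins. The real catalog and CLI signatures are not
+shown in this note.
+
+```typescript
+interface ScannerSpec {
+  id: string;
+  languages: string[]; // "*" marks a polyglot scanner
+}
+
+const ALL_SPECS: ScannerSpec[] = [
+  { id: "ts-scanner", languages: ["typescript"] },
+  { id: "detect-secrets", languages: ["*"] }, // a new polyglot entry shifts the order
+  { id: "py-scanner", languages: ["python"] },
+];
+
+function selectScanners(language: string): string[] {
+  return ALL_SPECS.filter((s) => s.languages.includes("*") || s.languages.includes(language)).map((s) => s.id);
+}
+
+// Expected ids = the language-specific ids the test cares about + every polyglot id,
+// so adding a polyglot catalog entry updates the expectation automatically.
+function expectedFor(languageSpecificIds: string[]): string[] {
+  const polyglot = ALL_SPECS.filter((s) => s.languages.includes("*")).map((s) => s.id);
+  return [...languageSpecificIds, ...polyglot];
+}
+
+// Set equality: same size and every expected id present, so catalog order is irrelevant.
+function sameMembers(actual: string[], expected: string[]): boolean {
+  const a = new Set(actual);
+  const e = new Set(expected);
+  return a.size === e.size && expected.every((id) => a.has(id));
+}
+
+console.log(sameMembers(selectScanners("python"), expectedFor(["py-scanner"]))); // true
+```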
+**Why:** v1.0 finalize ships as four sequential PRs (A → C → B → D per
+`pr-split-analysis.md`). Each branch cuts from the prior squash. If each
+branch only validates its own diff, debt accumulates across the merge
+boundary and the team loses the per-commit U1/U6 invariant guarantee at the
+PR-graph level even though it holds inside each PR.
+
+**How to apply:**
+
+- **First action on a fresh branch from main**: run `mise run check` BEFORE
+ starting work, not at the end. If it fails, fix it in commit 1 of the new
+ branch with a clear "main-debt sweep" message; mention which prior PR's
+ squash exposed it.
+- When deleting a `biome-ignore` comment that biome v2 reports as "unused
+ suppression", verify the underlying rule actually no longer fires (run the
+ empty-pattern code through biome locally) — don't just delete the
+ suppression and hope.
+- When adding a new polyglot P1 catalog entry that flows through
+ `ALL_SPECS`, search every test file (not just `*/catalog.test.ts`) that
+ asserts a specific scanner-id list — `cli/src/commands/scan.test.ts`'s
+ `selectScanners` is the recurrent miss.
+- For the next finalize PR (Track C, Track D), expect the same pattern:
+ cut from the prior squash, immediately run `mise run check`, sweep first.
+- The compound version of this rule belongs upstream of ERPAVal: a `mise`
+ task `mise run check:branch-start` could codify the sweep so it isn't
+ optional.
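+
+A `check:branch-start` task could wrap a small script like the one
+sketched here, under several assumptions. Neither the task nor the
+script exists yet, and `origin/main` is assumed to be fetched. The
+command runner is injectable, so the decision logic can be tested
+without `mise` or `git` installed. The default runner captures stdout,
+so `mise` output is not streamed in this version.
+
+```typescript
+import { spawnSync } from "node:child_process";
+
+type Runner = (cmd: string, args: string[]) => { status: number | null; stdout: string };
+
+const realRunner: Runner = (cmd, args) => {
+  const r = spawnSync(cmd, args, { encoding: "utf8", stdio: ["ignore", "pipe", "inherit"] });
+  return { status: r.status, stdout: r.stdout ?? "" };
+};
+
+// Gate the fresh branch before any work lands. A failure means main already
+// carries debt, so commit 1 should be a "main-debt sweep".
+function branchStartCheck(run: Runner = realRunner): { ok: boolean; message: string } {
+  const check = run("mise", ["run", "check"]);
+  if (check.status === 0) return { ok: true, message: "main is clean; start work" };
+  // A null status (e.g. mise not on PATH) is also treated as a failure.
+  const head = run("git", ["log", "-1", "--format=%h %s", "origin/main"]).stdout.trim();
+  return {
+    ok: false,
+    message: `main fails \`mise run check\`; open with a main-debt sweep commit and cite the squash that exposed it (origin/main: ${head || "unknown"})`,
+  };
+}
+```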
diff --git a/.erpaval/solutions/best-practices/worktree-isolation-pwd-pin-and-biome-exclusion.md b/.erpaval/solutions/best-practices/worktree-isolation-pwd-pin-and-biome-exclusion.md
new file mode 100644
index 00000000..f054bf26
--- /dev/null
+++ b/.erpaval/solutions/best-practices/worktree-isolation-pwd-pin-and-biome-exclusion.md
@@ -0,0 +1,59 @@
+---
+title: Worktree isolation — pin pwd at task start and exclude worktrees from biome v2
+tags: [worktrees, biome, lefthook, ci, agent-isolation]
+session: session-e1d819
+---
+
+## Context
+
+Two distinct worktree pitfalls hit M5 Wave 2:
+
+1. T-W2-3 was provisioned as `isolation: worktree` but the agent edited
+ files in the main repo before catching that its worktree base was at
+ `ed3950f` (M3/M4) instead of `feat/v1-m5-m6` HEAD `86e295b`. Recovery
+ required `git stash` + `git stash pop`.
+2. Validation `mise run check` failed at the `lint` step because biome v2
+ recursively traversed `.claude/worktrees/agent-*/biome.json` files and
+ detected 10 nested `"root": true` configs — even though the worktrees
+ are gitignored. Scoped lint (`pnpm exec biome check packages/`) exits 0.
+
+## Lesson
+
+**At every worktree task start, byte-pin location and base SHA**:
+
+```bash
+pwd # confirm worktree path, not main
+git rev-parse --show-toplevel # toplevel matches pwd
+git rev-parse HEAD # matches expected base SHA
+git status # confirm clean tree
+```
+
+If any of these mismatch the task packet's expected state, halt and
+re-provision. Editing in the wrong tree wastes the isolation guarantee.
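+
+The same four checks can run as one pre-flight that compares the
+observed state with the task packet. This is a minimal sketch. The
+`PinExpectation` fields are hypothetical because this note does not
+define the packet format, and `baseSha` may be abbreviated. Every git
+call uses `git -C` rather than relying on the shell's cwd.
+
+```typescript
+import { execFileSync } from "node:child_process";
+import { realpathSync } from "node:fs";
+
+interface PinExpectation {
+  worktreePath: string; // where the task packet says the agent works
+  baseSha: string; // expected HEAD at task start (full or abbreviated)
+}
+
+interface Observed {
+  pwd: string;
+  toplevel: string;
+  head: string;
+  porcelain: string; // `git status --porcelain`; empty means clean
+}
+
+function observe(dir: string): Observed {
+  const git = (...args: string[]) => execFileSync("git", ["-C", dir, ...args], { encoding: "utf8" }).trim();
+  return {
+    pwd: realpathSync(dir),
+    toplevel: git("rev-parse", "--show-toplevel"),
+    head: git("rev-parse", "HEAD"),
+    porcelain: git("status", "--porcelain"),
+  };
+}
+
+// Empty result = safe to start editing; anything else = halt and re-provision.
+function pinMismatches(obs: Observed, exp: PinExpectation): string[] {
+  const out: string[] = [];
+  if (obs.pwd !== exp.worktreePath) out.push(`pwd ${obs.pwd} is not ${exp.worktreePath}`);
+  if (obs.toplevel !== obs.pwd) out.push(`toplevel ${obs.toplevel} is not pwd`);
+  if (!obs.head.startsWith(exp.baseSha)) out.push(`HEAD ${obs.head.slice(0, 7)} is not base ${exp.baseSha}`);
+  if (obs.porcelain !== "") out.push("working tree is not clean");
+  return out;
+}
+```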
+
+**Biome v2 traverses gitignored worktrees by default.** `gitignore`
+alone is **not** sufficient. Two viable fixes:
+
+- (a) Scope CI/lefthook biome invocations to tracked source paths:
+ `pnpm exec biome check packages/ scripts/` (not bare `.`). This is
+ the workaround used in this session.
+- (b) Add an explicit exclusion in `biome.json`:
+ `"files": { "experimentalScannerIgnores": ["**/.claude/worktrees/**"] }`.
+ This is the durable fix; ship it the next time `biome.json` is touched.
+
+Inside a worktree, prefer `git -C ` for git ops over `cd
+ && git ...` — the harness's per-bash-call cwd reset makes
+`-C` the only reliable form across multi-step sequences.
+
+## Why
+
+Worktrees buy you parallel-agent isolation only if the agent actually
+operates inside its own tree. A wrong-pwd edit breaks the cherry-pick
+contract and pollutes the main branch with WIP. Pinning pwd takes 4
+bash calls and costs nothing.
+
+Biome v2's "scan everything" default treats `.claude/worktrees/` as
+ordinary source. The gitignore-is-enough assumption (true for git, npm,
+pnpm) does not extend to biome v2. Either scope the invocation or add
+the explicit exclusion — but document the choice so the next contributor
+with sibling worktrees doesn't burn an hour on a phantom CI failure.
diff --git a/.erpaval/solutions/conventions/npm-package-canonicality-via-upstream-readme.md b/.erpaval/solutions/conventions/npm-package-canonicality-via-upstream-readme.md
new file mode 100644
index 00000000..05a55d81
--- /dev/null
+++ b/.erpaval/solutions/conventions/npm-package-canonicality-via-upstream-readme.md
@@ -0,0 +1,46 @@
+---
+title: Verify npm package canonicality via the upstream repo README install command
+tags: [npm, supply-chain, dependency-pinning, squatters]
+session: session-e1d819
+---
+
+## Context
+
+M5 Wave 1 wired `chonkie@^0.3.0` into `packages/pack/package.json` after
+a 2026-05-05 research yaml. Reality: the npm namespace is split across
+three plausible names — `chonkie-ts` (PolyerAI squatter, v0.0.1, 2.6 kB,
+abandoned), the bare `chonkie` (chonkie-inc-owned but undocumented for
+TS callers), and the canonical TS port `@chonkiejs/core@^0.0.9`. Only
+the upstream `chonkie-inc/chonkiejs` README install command disambiguates.
+T-W2-5 retracted to `@chonkiejs/core` after grounding (commit 77f37c3:
+`chore(pack): switch chonkie dep to @chonkiejs/core@^0.0.9`).
+
+## Lesson
+
+Before pinning any npm dep — especially for an emergent library — open
+the upstream repository's README and copy the literal `npm install` /
+`pnpm add` line. The npm registry has stale squatters and unsuffixed
+namesakes that look canonical but aren't. The upstream README is the
+only authoritative source for "which package name does the maintainer
+actually ship to". Apply this rule when:
+
+- The package shows up in research yaml without a verified install command.
+- A `-ts` / `-js` suffixed variant exists alongside the bare name.
+- npm-side metadata (last publish, weekly downloads, deps) looks thin.
+
+Concrete checks for a candidate dep:
+
+1. Pull the repo README and grep for `npm install` / `pnpm add` / `yarn add`.
+2. Cross-check the package.json `name` in the upstream repo against the
+ pinned name.
+3. If the bare name and a scoped `@org/pkg` name both exist, prefer the
+ scoped name unless the README install line says otherwise.
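+
+Checks 1 and 2 can be scripted once the README text is in hand. This
+minimal sketch uses hypothetical helper names and leaves fetching the
+README to the caller. It collects package names from `npm install`,
+`npm i`, `pnpm add`, and `yarn add` lines. It skips flags and strips
+version specifiers while keeping the `@scope/` prefix.
+
+```typescript
+const INSTALL_CMD = /\b(?:npm[ \t]+(?:install|i)|pnpm[ \t]+(?:add|install|i)|yarn[ \t]+add)[ \t]+([^\n`&;|#]+)/g;
+
+function installTargets(readme: string): string[] {
+  const names = new Set<string>();
+  for (const m of readme.matchAll(INSTALL_CMD)) {
+    for (const tok of m[1].trim().split(/\s+/)) {
+      if (!tok || tok.startsWith("-")) continue; // flags such as -D or --save-dev
+      const at = tok.lastIndexOf("@");
+      names.add(at > 0 ? tok.slice(0, at) : tok); // "@scope/pkg@^1" -> "@scope/pkg"
+    }
+  }
+  return Array.from(names);
+}
+
+// Check 2: is the pinned name one the maintainer actually tells users to install?
+function pinnedNameIsCanonical(pinned: string, readme: string): boolean {
+  return installTargets(readme).includes(pinned);
+}
+```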
+
+## Why
+
+npm name-squatting is undefended; the registry has no concept of
+"canonical port". The upstream maintainer's README is the only source
+of truth that survives organization renames, scope migrations, and
+abandoned forks. The check is cheap (one README fetch) and prevents
+shipping a 2.6 kB stub or an undocumented unsuffixed namesake to
+production.
diff --git a/.erpaval/solutions/conventions/release-published-event-needs-pat-or-inline.md b/.erpaval/solutions/conventions/release-published-event-needs-pat-or-inline.md
new file mode 100644
index 00000000..ae18bb4c
--- /dev/null
+++ b/.erpaval/solutions/conventions/release-published-event-needs-pat-or-inline.md
@@ -0,0 +1,68 @@
+---
+name: release-published events from default GITHUB_TOKEN do not fire downstream workflows
+description: A workflow listening on `release: [published]` will not run automatically when release-please-action creates the release with the default GITHUB_TOKEN — inline the asset-attach in release-please.yml instead, gated on `steps.release.outputs.release_created`
+type: knowledge
+tags: [github-actions, release-please, release-published, github-token, sbom, code-pack, ci]
+session: session-85faf1
+ac: AC-D-4
+---
+
+## Context
+
+Track D's AC-D-4 needed to attach a `codehub code-pack` artifact to every GitHub release. The spec offered two options: (a) extend `release-please.yml`, or (b) ship a separate `code-pack-release.yml` listening on `release: [published]`. Existing `sbom.yml` already uses option (b). Option (b) seemed cleaner — workflow-per-concern.
+
+Research surfaced a critical GitHub Actions safety rule documented in both the release-please-action README and the GitHub Actions docs:
+
+> When you use the repository's `GITHUB_TOKEN` to perform tasks, events triggered by the `GITHUB_TOKEN` will not create a new workflow run.
+
+Implication: when `googleapis/release-please-action@v5` runs with the default `GITHUB_TOKEN` (which it does by default — no PAT configured) and creates a release, that release's `published` event does NOT fire any other workflow. The downstream workflow only runs on:
+
+- a manual UI publish,
+- `workflow_dispatch:`, or
+- `gh release create` invoked by a real user / PAT-authenticated automation.
+
+This means option (b) silently never runs in normal automated releases. The sbom.yml in this repo was working only by accident — every published release was a manual `workflow_dispatch:` or UI-triggered run, never the natural release-please flow.
+
+## Lesson
+
+When attaching artifacts to a release that release-please publishes:
+
+1. **Inline the asset-attach steps in `release-please.yml`**, gated on `steps.release.outputs.release_created`. This is the pattern the upstream release-please-action README recommends. Example:
+
+ ```yaml
+ - uses: googleapis/release-please-action@v5
+ id: release
+ with: {...}
+
+ - if: ${{ steps.release.outputs.release_created }}
+ uses: actions/checkout@v6
+ with: { fetch-depth: 0 }
+
+ - if: ${{ steps.release.outputs.release_created }}
+ run:
+
+ - if: ${{ steps.release.outputs.release_created }}
+     env:
+       GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ run: gh release upload "${{ steps.release.outputs.tag_name }}" artifact.tar.gz --clobber
+ ```
+
+2. **The alternative is a `repo`-scoped Personal Access Token** (`RELEASE_PLEASE_PAT`) passed to `release-please-action`. The PR open / release create runs under the PAT's identity, and the resulting `release: published` event then fires downstream workflows. This adds secret-management cost but lets you keep one workflow per concern.
+
+3. **Audit existing `release: [published]` workflows in any repo using release-please-action with default GITHUB_TOKEN.** They are silent no-ops in the natural release flow. In this repo, `sbom.yml` is one such workflow and is flagged for a follow-on PR.
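+
+The step 3 audit can start from a text scan of `.github/workflows/`.
+This is a heuristic sketch, not a YAML parser, and the helper names are
+hypothetical. It flags files that contain a bare `release:` key together
+with a `published` type, in either the `[published]` or the
+`- published` form. A job that happens to be named `release:` can
+produce a false positive, so read each hit by hand.
+
+```typescript
+import { readdirSync, readFileSync } from "node:fs";
+import { join } from "node:path";
+
+function listensOnReleasePublished(yamlText: string): boolean {
+  const hasReleaseKey = /^\s*release:\s*$/m.test(yamlText);
+  const hasPublished = /types:\s*\[[^\]]*\bpublished\b[^\]]*\]|^\s*-\s*published\s*$/m.test(yamlText);
+  return hasReleaseKey && hasPublished;
+}
+
+// Candidates that stay silent when release-please publishes with the default GITHUB_TOKEN.
+function auditReleaseListeners(repoRoot: string): string[] {
+  const dir = join(repoRoot, ".github", "workflows");
+  return readdirSync(dir)
+    .filter((f) => f.endsWith(".yml") || f.endsWith(".yaml"))
+    .filter((f) => listensOnReleasePublished(readFileSync(join(dir, f), "utf8")))
+    .sort();
+}
+```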
+
+## Why this matters
+
+The bug is silent — every release looks fine until someone notices the release page is missing the artifact. The first symptom is usually a customer asking "where's the SBOM?" months after the release. Detection costs more than the fix.
+
+For Track D, inlining was a one-step pattern shift. The alternative would have shipped a release flow that attaches the code-pack artifact only when a release is published manually, which is exactly the failure mode this track was meant to prevent.
+
+## Carry-forward
+
+- Migrate `sbom.yml` to the same inline pattern (1-line workflow change). Out of scope for Track D; flagged as adjacent debt in the PR.
+- When future tracks add new release artifacts, default to the inline pattern.
+
+## References
+
+- Research artifact: `.erpaval/sessions/session-85faf1/research-track-d.md§7`
+- Implementation: PR #75 commit `1ab82a6` (`.github/workflows/release-please.yml`)
+- GitHub docs:
diff --git a/.erpaval/solutions/conventions/test-env-hermeticity-for-backend-precedence.md b/.erpaval/solutions/conventions/test-env-hermeticity-for-backend-precedence.md
new file mode 100644
index 00000000..38e172e9
--- /dev/null
+++ b/.erpaval/solutions/conventions/test-env-hermeticity-for-backend-precedence.md
@@ -0,0 +1,71 @@
+---
+name: Tests for backend-precedence libraries must wipe all env keys in the precedence chain, not just the one they assert
+description: When an SDK picks a backend by env presence (CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT, CODEHUB_EMBEDDING_URL, ...), tests of "backend X is picked when only X's env is set" must scope-stash every key in the chain, not only the local one
+type: conventions
+---
+
+`packages/embedder/src/http-embedder.test.ts:441,458` asserted that
+`tryOpenHttpEmbedder` returns `null` when its specific env var is unset.
+The test only stashed `CODEHUB_HOME`. With
+`CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT` exported in the operator's shell,
+the higher-precedence SageMaker backend short-circuited, the assertion
+flipped, and the test failed — but only on the specific dev box where
+the operator was working through SageMaker integration.
+
+The fix: a `sanitizeEmbeddingEnv()` helper that snapshots and wipes
+every `CODEHUB_EMBEDDING_*` key plus `CODEHUB_HOME`, restored on
+teardown via `beforeEach`/`afterEach`:
+
+```ts
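+function sanitizeEmbeddingEnv(): () => void {
+  const isChainKey = (k: string) => k.startsWith("CODEHUB_EMBEDDING_") || k === "CODEHUB_HOME";
+  const saved: Record<string, string | undefined> = {};
+  for (const k of Object.keys(process.env)) {
+    if (isChainKey(k)) {
+      saved[k] = process.env[k];
+      delete process.env[k];
+    }
+  }
+  return () => {
+    // Drop chain keys the test itself set, then restore the originals.
+    for (const k of Object.keys(process.env)) {
+      if (isChainKey(k)) delete process.env[k];
+    }
+    for (const [k, v] of Object.entries(saved)) {
+      if (v === undefined) delete process.env[k];
+      else process.env[k] = v;
+    }
+  };
+}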
+```
+
+**Why:** the backend-precedence pattern is a chain — env-X-set → backend-X,
+else env-Y-set → backend-Y, else fallback. A test that asserts about
+backend Y must explicitly clear backend-X's env, otherwise the assertion
+silently tests the wrong code path under any operator who happens to
+have backend-X configured. The failure does not reproduce on a clean
+laptop but fires on any dev box with the higher-precedence env exported.
+This is exactly the env-leak class that bedevils CI-vs-local divergence
+debugging.
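+
+A tiny model of such a chain makes the failure concrete. `pickBackend`
+is hypothetical and simplified: it checks only the SageMaker endpoint,
+the HTTP URL, and `CODEHUB_HOME`, in that order. It is not the real
+embedder code, and the env values are placeholders.
+
+```typescript
+type Backend = "sagemaker" | "http" | "local" | null;
+type Env = Record<string, string | undefined>;
+
+// Simplified precedence chain: the first configured backend wins.
+function pickBackend(env: Env): Backend {
+  if (env.CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT) return "sagemaker";
+  if (env.CODEHUB_EMBEDDING_URL) return "http";
+  if (env.CODEHUB_HOME) return "local";
+  return null;
+}
+
+// The operator's shell exports the higher-precedence SageMaker key.
+const operatorShell: Env = { CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT: "placeholder-endpoint", CODEHUB_HOME: "/tmp/codehub" };
+
+// Single-key stash: only CODEHUB_HOME is removed.
+const singleKeyStash: Env = Object.fromEntries(Object.entries(operatorShell).filter(([k]) => k !== "CODEHUB_HOME"));
+
+// Prefix-glob stash: every key in the chain is removed.
+const hermetic: Env = Object.fromEntries(
+  Object.entries(operatorShell).filter(([k]) => !k.startsWith("CODEHUB_EMBEDDING_") && k !== "CODEHUB_HOME"),
+);
+
+console.log(pickBackend(singleKeyStash)); // sagemaker, although the test expects null
+console.log(pickBackend(hermetic)); // null
+```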
+
+**How to apply:**
+
+1. **Identify the precedence chain.** For OCH embedder:
+ `CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT` → `CODEHUB_EMBEDDING_URL` →
+ `CODEHUB_EMBEDDING_*` (HTTP options) → `CODEHUB_HOME` (local ONNX).
+ Any test that asserts about backend selection must wipe the entire
+ chain, not just one key.
+2. **Stash with a prefix glob, not a fixed key list.** `Object.keys`
+ filtered by `startsWith("CODEHUB_EMBEDDING_")` catches keys added
+ later (e.g. a future `CODEHUB_EMBEDDING_AZURE_*`) without revisiting
+ every test.
+3. **Wire it as `beforeEach`/`afterEach`, not per-case try/finally.**
+ Easier to audit; harder to forget on the next case.
+4. **Apply defensively to sibling describe blocks.** Even cases that
+ don't care about the env can be poisoned by stale state from a prior
+ test that mutated `process.env`. Hermetic test suites don't pay a
+ cost for being defensive.
+
+Anti-pattern: per-case `originalKey = process.env[KEY]; ... finally
+process.env[KEY] = originalKey` for a single key. The single-key save
+worked when there was one env var; with a chain, every test that misses
+a sibling key in the chain becomes flaky on operator boxes.
+
+Cross-link: pairs with the existing `sagemaker-embedder-backend.md`
+durable lesson. That one covers the SDK-side dynamic-import + soft-fail
+pattern; this one covers the test-side env-hermeticity pattern that the
+SDK-side pattern requires.
diff --git a/.erpaval/specs/004-m3-m4/spec.md b/.erpaval/specs/004-m3-m4/spec.md
new file mode 100644
index 00000000..aee41f06
--- /dev/null
+++ b/.erpaval/specs/004-m3-m4/spec.md
@@ -0,0 +1,271 @@
+# EARS Spec 004 — M3 LadybugDB phase-1 + M4 Language expansion
+
+**Session**: session-a591fa · **Branch**: `feat/v1-m3-m4` · **Parent roadmap**: `.erpaval/ROADMAP.md` §M3 + §M4
+
+## Context (Explore + Research consolidated)
+
+### M3 — LadybugDB phase-1
+
+- `IGraphStore` seam at `packages/storage/src/interface.ts:11-64` is already the abstraction point. No shape change needed.
+- `graphHash` is computed in `packages/core-types/src/graph-hash.ts:20-45` from the **in-memory `KnowledgeGraph`**, never from store rows. Parity test: `graph → LbugStore → rebuildGraphFromStore → graphHash === original`. Template exists at `packages/storage/src/duckdb-adapter.test.ts:89,206-229`.
+- **Current edge-kind count is 23** (`duckdb-adapter.ts:71-96`) — roadmap's "21 types" is stale; OCH has drifted past with `FOUND_IN`, `DEPENDS_ON`, `OWNED_BY`, `WRAPS`, `QUERIES`, `REFERENCES`, `ACCESSES`. OCH uses `PROCESS_STEP` where GitNexus uses `STEP_IN_PROCESS` (banned literal).
+- **LadybugDB pattern correction** (supersedes roadmap L58): idiomatic LadybugDB uses **polymorphic rel tables — one named rel table per edge kind, each with multiple `FROM/TO` pairs**. NOT a single `CodeRelation` rel table with a `type` property column — that defeats columnar predicate pushdown. Research URL: `docs.ladybugdb.com/cypher/data-definition/create-table`.
+- **npm package**: `@ladybugdb/core@^0.16.1` (latest as of 2026-05-04). GitNexus pins 0.15.2. `lbug@0.14.3` is a stale mirror — ignore.
+- **Concurrency**: one process-wide `READ_WRITE` `Database` + pool of `Connection` objects. GitNexus's `pool-adapter.ts` (611 LOC) is user-space wrapper, not library convention — worth lifting but re-audit for current (v0.16) behavior vs v0.15.
+- **Banned literals**: `kuzu`, `ladybug`, `STEP_IN_PROCESS`, `duckpgq` are banned in tracked source by `scripts/check-banned-strings.sh`. `@ladybugdb/core` in `package.json` is allowed (not a banned form). `.erpaval/` is excluded from the scan. The `LbugStore` class name and file paths `lbug-adapter.ts` / `lbug-pool.ts` use the "lbug" token which triggers the banned literal. **Resolution**: rename everything to `GraphDbStore` / `graphdb-adapter.ts` / `graphdb-pool.ts` at the source level; keep `@ladybugdb/core` as the dep name (the package scope is exempt by precedent).
+
+### M4 — Language expansion + COBOL + framework detection
+
+- 5 live SCIP adapters in `packages/scip-ingest/src/runners/index.ts:18` as a string union `"typescript" | "python" | "go" | "rust" | "java"`. No provider-registry abstraction. Adding `clang | ruby | dotnet | kotlin` = extend union + add `buildCommand` cases.
+- **No scip-* binary downloads**: `codehub setup` only handles embeddings weights + plugin. New adapters assume binaries on `$PATH` (returns `kind: "missing"` on ENOENT). M4 must add `scip-downloader.ts` mirroring `embedder-downloader.ts` (sha256 pin + atomic rename).
+- 15 tree-sitter grammars in `grammar-registry.ts:36-52`, compile-time-enforced via `satisfies` on `LanguageId`. **No regex-provider escape hatch**; COBOL T-M4-5 cannot reuse the registry without introducing one.
+- 23-framework catalog at `frameworks-catalog.ts:437`, inline in `packages/ingestion`. Emits `{name, category, confidence: "deterministic"|"heuristic"|"composite", signals[], variant?, version?, parentName?}` — roadmap asks for numeric `confidence` + `evidence[]`. Plan must choose: **keep current discriminator** (string tag) + rename `signals` → `evidence` (cheaper), or go numeric (bigger change, arguable utility for 1 user).
+- **5 detection stages coverage**: manifest ✅, lockfile ❌ (ignored today), config-AST ❌ (exact-match only, no parse), folder-convention partial, import/SCIP ❌.
+- **No JVM subprocess prior art** — ProLeap v4 (T-M4-6) is greenfield. Grep empty for `java -jar`, `spawn.*java`, `jbang`. Needs new package + JRE probe.
+- **ProLeap NOT on Maven Central** — `search.maven.org` returns `numFound: 0` for `io.github.uwol:proleap-cobol-parser`; latest GitHub Release is v2.4.0 (2018). M4-6 must `git clone + mvn install` into a vendored JAR OR ship a prebuilt JAR under `vendor/proleap/`.
+- **tree-sitter-cobol published releases dead** (last tagged v0.1.1, 2023-02-01 per GitHub Releases API). Commit activity on default branch through 2025 but no tagged release. COBOL strategy stays as roadmap spec'd: regex hot path primary + ProLeap deep-parse gated.
+- **`--allow-build-scripts`** is internal `RunIndexerOptions` boolean at `runners/index.ts:25` — never surfaced at CLI. T-M4-6 needs CLI flag + plumbing.
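+
+The `embedder-downloader.ts` pattern that `scip-downloader.ts` should
+mirror (sha256 pin, then atomic rename) can be sketched as shown here.
+The sketch assumes the bytes are already downloaded, so per-platform URL
+selection is out of scope. It also assumes the pin is a hex sha256
+string. `installVerified` and `alreadyInstalled` are hypothetical names,
+not the existing module's API. The temp file sits next to the
+destination because `rename` is atomic only within one filesystem.
+
+```typescript
+import { createHash, randomBytes } from "node:crypto";
+import { existsSync, mkdirSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
+import { dirname, join } from "node:path";
+
+const sha256Hex = (bytes: Buffer): string => createHash("sha256").update(bytes).digest("hex");
+
+// Verify against the pin, then publish atomically: readers of `dest` see the
+// old binary or the complete new one, never a partial write.
+function installVerified(bytes: Buffer, pinnedSha256: string, dest: string): void {
+  const actual = sha256Hex(bytes);
+  if (actual !== pinnedSha256.toLowerCase()) {
+    throw new Error(`sha256 mismatch for ${dest}: expected ${pinnedSha256}, got ${actual}`);
+  }
+  const dir = dirname(dest);
+  mkdirSync(dir, { recursive: true });
+  const tmp = join(dir, `.${randomBytes(6).toString("hex")}.partial`);
+  try {
+    writeFileSync(tmp, bytes, { mode: 0o755 }); // executable bit for the indexer binary
+    renameSync(tmp, dest);
+  } catch (err) {
+    rmSync(tmp, { force: true });
+    throw err;
+  }
+}
+
+// Lets a repeated setup run skip the download when the pinned bytes are already present.
+function alreadyInstalled(dest: string, pinnedSha256: string): boolean {
+  return existsSync(dest) && sha256Hex(readFileSync(dest)) === pinnedSha256.toLowerCase();
+}
+```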
+
+### Banned-string sensitivities
+
+- `kuzu`, `ladybug`, `STEP_IN_PROCESS` are guardrail-banned in tracked source.
+- Source-level naming: `GraphDbStore` / `graphdb-adapter.ts` / `graphdb-pool.ts` (not `LbugStore`).
+- `@ladybugdb/core` in `package.json`: the precedent is that `@opencodehub/*` scoped packages with banned substrings are allowed when the scope identifier is the whole token. Verify by running `bash scripts/check-banned-strings.sh` after adding the dep; if it flags, add an allowlist exclusion for `package.json` (`pnpm-lock.yaml` is already excluded).
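+
+Modeling the allowlist rule shows what it does and does not cover. This
+is not `scripts/check-banned-strings.sh` but a hypothetical sketch built
+on stated assumptions. Matching is a case-insensitive substring test,
+and the patterns come from U2. `.erpaval/` and `pnpm-lock.yaml` are
+skipped. Inside `package.json`, only the exact `@ladybugdb/core` token
+is exempt.
+
+```typescript
+const BANNED = ["kuzu", "ladybug", "STEP_IN_PROCESS", "heuristicLabel", "codeprobe", "STEP_IN_FLOW"];
+// Whole-token exemptions, keyed by file basename.
+const ALLOWED_TOKENS: Record<string, string[]> = { "package.json": ["@ladybugdb/core"] };
+
+function isSkipped(path: string): boolean {
+  return path.startsWith(".erpaval/") || path.endsWith("pnpm-lock.yaml");
+}
+
+function bannedHits(path: string, text: string): string[] {
+  if (isSkipped(path)) return [];
+  const base = path.split("/").pop() ?? path;
+  let scanned = text;
+  // Remove exempt tokens first so only stray occurrences remain.
+  for (const token of ALLOWED_TOKENS[base] ?? []) scanned = scanned.split(token).join("");
+  const lower = scanned.toLowerCase();
+  return BANNED.filter((b) => lower.includes(b.toLowerCase()));
+}
+
+console.log(bannedHits("packages/storage/package.json", '"@ladybugdb/core": "^0.16.1"')); // []
+console.log(bannedHits("packages/storage/src/graphdb-adapter.ts", 'await import("@ladybugdb/core")')); // [ 'ladybug' ]
+```
+
+Under these assumptions, a lazy `import("@ladybugdb/core")` in adapter
+source is still flagged. Confirm this against the real script before
+AC-M3-1 lands.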
+
+## Ubiquitous requirements
+
+- **U1**: The v1.0 roadmap's graphHash byte-identity invariant MUST hold across both stores — `graph → DuckDbStore → rebuildGraphFromStore → graphHash` and `graph → GraphDbStore → rebuildGraphFromStore → graphHash` MUST be equal.
+- **U2**: Tracked source files MUST NOT introduce the banned literals `kuzu`, `ladybug`, `STEP_IN_PROCESS`, `heuristicLabel`, `codeprobe`, or `STEP_IN_FLOW`. `bash scripts/check-banned-strings.sh` MUST exit 0 post-commit.
+- **U3**: `mise run check` MUST exit 0 after every commit.
+- **U4**: Every new package MUST carry `@opencodehub/` naming, Apache-2.0 license, `type: module`, `tsc --noEmit` clean.
+- **U5**: No LLM calls in any M3/M4 path outside the existing `@opencodehub/summarizer` package.
+
+## M3 — Event-driven requirements
+
+- **E-M3-1**: When `CODEHUB_STORE=lbug` is set, `analyze`, `query`, `context`, `impact`, and `sql` CLI/MCP surfaces MUST route through `GraphDbStore` instead of `DuckDbStore`.
+- **E-M3-2**: When the `sql` MCP tool receives a `cypher` input field, it MUST evaluate as read-only Cypher against `GraphDbStore`. Write operations (`CREATE`, `DELETE`, `SET`, `MERGE`) MUST be rejected by `cypher-guard.ts` (mirror of `sql-guard.ts`).
+- **E-M3-3**: When both `sql` and `cypher` inputs are provided to the `sql` MCP tool, the tool MUST reject the call with a clear "choose one" message.
+
+## M3 — State-driven requirements
+
+- **S-M3-1**: While `CODEHUB_STORE` is unset or `=duck`, `DuckDbStore` remains the default; `GraphDbStore` is not loaded.
+- **S-M3-2**: While `@ladybugdb/core` is absent (unreachable import — should not happen because it's a hard dep, but CI platforms without prebuilt binaries will surface this), `GraphDbStore.open()` MUST fail with a clear "`@ladybugdb/core` native binding unavailable on this platform; use `CODEHUB_STORE=duck`" message — not a bare module-not-found stack trace.
+- **S-M3-3**: While a `GraphDbStore` database file exists from a prior `@ladybugdb/core` version (ABI mismatch), `open()` MUST emit a runbook hint pointing at the re-analyze path (`codehub analyze --force`), not silently truncate.
+
+## M3 — Unwanted-behavior requirements
+
+- **W-M3-1**: `GraphDbStore` MUST NOT call `conn.query()` concurrently against a single `Connection` — the pool adapter enforces one-query-per-connection at a time.
+- **W-M3-2**: Cypher write operations (`CREATE`, `DELETE`, `SET`, `MERGE`, `REMOVE`) MUST NOT pass the `cypher-guard.ts` read-only check. The `sql` MCP tool stays read-only regardless of store backend.
+- **W-M3-3**: The M3 phase-1 MUST NOT flip the default backend to `lbug`. That is T-M7-1.
+
+## M3 — Acceptance criteria
+
+### AC-M3-1: GraphDbStore scaffolding
+
+- [ ] `packages/storage/src/graphdb-adapter.ts` — `GraphDbStore implements IGraphStore`, constructor takes path, lazy-imports `@ladybugdb/core`
+- [ ] `packages/storage/src/graphdb-schema.ts` — DDL translator; per-kind `CREATE NODE TABLE` + one polymorphic rel table per edge kind
+- [ ] `packages/storage/src/graphdb-pool.ts` — lifted from GitNexus `pool-adapter.ts` (611 LOC), renamed, internals audited for v0.16 API compatibility
+- [ ] `packages/storage/src/index.ts` — export `GraphDbStore`; add `openStore(opts)` factory reading `CODEHUB_STORE`
+- [ ] `packages/storage/package.json` — add `@ladybugdb/core: ^0.16.1` as hard dep (direct dependency, not optional peer — user-approved 2026-05-05)
+- [ ] Banned-strings gate passes (no `kuzu`/`ladybug` in source)
+- [P]
+- **Dependencies**: none
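+
+The `openStore(opts)` selection rule (S-M3-1) and the binding-failure
+message (S-M3-2) reduce to two small functions. This sketch uses
+hypothetical names, and the real factory's options and return type are
+not specified here. Treating an unknown `CODEHUB_STORE` value as an
+error is an assumption. The importer is injected so the failure path can
+be exercised without the native binding.
+
+```typescript
+type StoreKind = "duck" | "lbug";
+
+// S-M3-1: unset or "duck" keeps DuckDbStore; unknown values fail loudly (assumption).
+function resolveStoreKind(env: Record<string, string | undefined>): StoreKind {
+  const raw = env.CODEHUB_STORE?.trim();
+  if (!raw || raw === "duck") return "duck";
+  if (raw === "lbug") return "lbug";
+  throw new Error(`CODEHUB_STORE must be "duck" or "lbug", got "${raw}"`);
+}
+
+// S-M3-2: replace a bare module-not-found stack with an actionable message.
+async function loadGraphDbBinding(importer: () => Promise<unknown>): Promise<unknown> {
+  try {
+    return await importer();
+  } catch (cause) {
+    const err = new Error("`@ladybugdb/core` native binding unavailable on this platform; use `CODEHUB_STORE=duck`");
+    throw Object.assign(err, { cause }); // keep the original failure for debugging
+  }
+}
+```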
+
+### AC-M3-2: Pool adapter + concurrency tests
+
+- [ ] `graphdb-pool.ts` integration test: 100 concurrent reads against one Database do not segfault or deadlock
+- [ ] Checkout/checkin queue semantics preserved from GitNexus pool (`MAX_CONNS_PER_REPO=8`, 15s waiter timeout, 30s query timeout, 60s idle sweep)
+- [ ] Timeout propagates into `IGraphStore.query()` `timeoutMs` correctly
+- **Dependencies**: AC-M3-1
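+
+The checkout/checkin contract above can be sketched independently of
+the database binding. It has three parts: one in-flight query per
+`Connection`, a bounded pool, and waiters that time out. `ConnPool` is
+hypothetical and is not GitNexus's `pool-adapter.ts`. Limits are
+constructor arguments, so the 8-connection and 15s-waiter values can be
+passed in. The query timeout and idle sweep are omitted.
+
+```typescript
+type Waiter<C> = { resolve: (c: C) => void; timer: ReturnType<typeof setTimeout> };
+
+// Bounded pool: each connection is held by exactly one caller at a time (W-M3-1).
+class ConnPool<C> {
+  private idle: C[] = [];
+  private inUse = 0;
+  private waiters: Waiter<C>[] = [];
+  private readonly create: () => C;
+  private readonly maxConns: number;
+  private readonly waiterTimeoutMs: number;
+
+  constructor(create: () => C, maxConns = 8, waiterTimeoutMs = 15_000) {
+    this.create = create;
+    this.maxConns = maxConns;
+    this.waiterTimeoutMs = waiterTimeoutMs;
+  }
+
+  acquire(): Promise<C> {
+    // Reuse an idle connection, or open a new one while under the cap.
+    const c = this.idle.pop() ?? (this.inUse < this.maxConns ? this.create() : undefined);
+    if (c !== undefined) {
+      this.inUse++;
+      return Promise.resolve(c);
+    }
+    return new Promise<C>((resolve, reject) => {
+      const waiter: Waiter<C> = {
+        resolve,
+        timer: setTimeout(() => {
+          this.waiters = this.waiters.filter((w) => w !== waiter);
+          reject(new Error(`pool: no connection within ${this.waiterTimeoutMs} ms`));
+        }, this.waiterTimeoutMs),
+      };
+      this.waiters.push(waiter);
+    });
+  }
+
+  release(c: C): void {
+    const next = this.waiters.shift();
+    if (next) {
+      clearTimeout(next.timer);
+      next.resolve(c); // hand off directly; inUse is unchanged
+      return;
+    }
+    this.inUse--;
+    this.idle.push(c);
+  }
+
+  async withConn<T>(fn: (c: C) => Promise<T>): Promise<T> {
+    const c = await this.acquire();
+    try {
+      return await fn(c);
+    } finally {
+      this.release(c); // always check the connection back in
+    }
+  }
+
+  stats(): { idle: number; inUse: number; waiting: number } {
+    return { idle: this.idle.length, inUse: this.inUse, waiting: this.waiters.length };
+  }
+}
+```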
+
+### AC-M3-3: Schema translation + round-trip
+
+- [ ] All 23 edge kinds from `duckdb-adapter.ts:71-96` have corresponding rel tables in `graphdb-schema.ts`
+- [ ] `PROCESS_STEP` (OCH-native, not the banned `STEP_IN_PROCESS`) maps to a rel table named `ProcessStep` (or similar — no banned literal)
+- [ ] `bulkLoad(graph, "replace")` + `rebuildGraphFromStore(graphdbStore)` round-trip produces a graph with identical nodes, edges, and properties as the input
+- **Dependencies**: AC-M3-1
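+
+The per-kind rel-table layout from the M3 context and the
+`PROCESS_STEP` naming rule can be sketched as a DDL generator. The
+node-table names and endpoint pairs are illustrative, not OCH's schema.
+The emitted `CREATE REL TABLE Name(FROM A TO B, FROM C TO D)` form
+follows the multi-pair syntax described on the data-definition page
+cited above. Confirm it against the pinned `@ladybugdb/core` version
+before relying on it.
+
+```typescript
+// Edge kind -> allowed (from, to) node-table pairs. Illustrative subset only.
+type EdgeEndpoints = Record<string, Array<[string, string]>>;
+
+// PROCESS_STEP -> ProcessStep, DEPENDS_ON -> DependsOn.
+function relTableName(kind: string): string {
+  return kind
+    .toLowerCase()
+    .split("_")
+    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
+    .join("");
+}
+
+// One named rel table per edge kind, listing every FROM/TO pair, instead of a
+// single generic rel table with a `type` column.
+function relTableDdl(edges: EdgeEndpoints): string[] {
+  return Object.keys(edges)
+    .sort()
+    .map((kind) => {
+      const pairs = edges[kind].map(([from, to]) => `FROM ${from} TO ${to}`).join(", ");
+      return `CREATE REL TABLE ${relTableName(kind)}(${pairs});`;
+    });
+}
+
+console.log(relTableDdl({ PROCESS_STEP: [["Process", "Function"], ["Process", "Method"]], CALLS: [["Function", "Function"]] }).join("\n"));
+// CREATE REL TABLE Calls(FROM Function TO Function);
+// CREATE REL TABLE ProcessStep(FROM Process TO Function, FROM Process TO Method);
+```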
+
+### AC-M3-4: graphHash parity gate (CI)
+
+- [ ] New file `packages/storage/src/graph-hash-parity.test.ts`
+- [ ] Against 3 fixture graphs (small, medium, large) assert `duckHash === graphdbHash`
+- [ ] Wired into `mise run check`
+- [ ] Test runs in <30s so it stays in the hot validate path
+- **Dependencies**: AC-M3-3
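+
+AC-M3-4 only holds if the hash ignores row order and property order, because the two backends need not return rows identically. The sketch below shows one canonicalization under that assumption. `sketchGraphHash`, `GraphNode`, and `GraphEdge` are hypothetical stand-ins for the repo's existing graph types and hash function.
+
+```typescript
+import { createHash } from "node:crypto";
+
+// Hypothetical minimal graph shapes for the sketch.
+interface GraphNode { id: string; kind: string; props: Record<string, unknown> }
+interface GraphEdge { from: string; to: string; kind: string; props: Record<string, unknown> }
+
+// JSON-like serialization with object keys sorted, so property order is irrelevant.
+function canonical(value: unknown): string {
+  if (Array.isArray(value)) return `[${value.map((v) => canonical(v)).join(",")}]`;
+  if (value !== null && typeof value === "object") {
+    const obj = value as Record<string, unknown>;
+    const fields = Object.keys(obj)
+      .sort()
+      .map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
+    return `{${fields.join(",")}}`;
+  }
+  return JSON.stringify(value) ?? "null"; // undefined has no JSON form
+}
+
+// Sort serialized rows so the order each backend returns them in does not matter.
+export function sketchGraphHash(nodes: GraphNode[], edges: GraphEdge[]): string {
+  const nodeRows = nodes.map((n) => canonical(n)).sort();
+  const edgeRows = edges.map((e) => canonical(e)).sort();
+  return createHash("sha256")
+    .update(canonical({ nodes: nodeRows, edges: edgeRows }))
+    .digest("hex");
+}
+```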
+
+### AC-M3-5: sql MCP tool dual-emit (sql | cypher)
+
+- [ ] `packages/mcp/src/tools/sql.ts` accepts optional `cypher` input field
+- [ ] `packages/storage/src/cypher-guard.ts` mirrors `sql-guard.ts` — allows `MATCH`, `RETURN`, `WITH`, `WHERE`, `ORDER BY`, `LIMIT`, `SKIP`, `UNWIND`, `CALL READ_ONLY_PROCEDURES`; rejects writes
+- [ ] When `CODEHUB_STORE=duck`, `cypher` input returns "cypher unavailable without `CODEHUB_STORE=lbug`"
+- [ ] Timeout path shared between sql + cypher branches
+- **Dependencies**: AC-M3-4
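+
+Below is a deliberately simple sketch of the read-only check that `cypher-guard.ts` could perform. The write-clause list and the caller-supplied procedure allowlist are assumptions; confirm the clause names against the LadybugDB Cypher reference. A production guard would tokenize the query rather than scan it with regular expressions. `assertReadOnlyCypher` is a hypothetical name.
+
+```typescript
+// Hypothetical write-clause list for the sketch.
+const WRITE_CLAUSES = ["CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP", "ALTER", "COPY", "LOAD", "INSTALL", "ATTACH"];
+
+// Blank out string literals and comments in one left-to-right pass so keywords
+// inside them (e.g. n.name = 'create') are not mistaken for clauses.
+const LITERALS_AND_COMMENTS = /'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|\/\/[^\n]*|\/\*[\s\S]*?\*\//g;
+
+export function assertReadOnlyCypher(query: string, readOnlyProcedures: Iterable<string> = []): void {
+  const upper = query.replace(LITERALS_AND_COMMENTS, " ").toUpperCase();
+  for (const kw of WRITE_CLAUSES) {
+    if (new RegExp(`\\b${kw}\\b`).test(upper)) {
+      throw new Error(`cypher guard: write clause ${kw} is not allowed`);
+    }
+  }
+  // CALL is only allowed for procedures on the caller-supplied read-only allowlist.
+  const allowed = new Set(Array.from(readOnlyProcedures, (p) => p.toUpperCase()));
+  const callRe = /\bCALL\s+([A-Z_][A-Z0-9_.]*)/g;
+  let m: RegExpExecArray | null;
+  while ((m = callRe.exec(upper)) !== null) {
+    const proc = m[1] ?? "";
+    if (!allowed.has(proc)) {
+      throw new Error(`cypher guard: CALL ${proc} is not on the read-only allowlist`);
+    }
+  }
+}
+```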
+
+### AC-M3-6: ADR — LadybugDB swap rationale
+
+- [ ] `docs/adr/NNNN-ladybugdb-graph-store.md` (NNNN is the next free number in the existing ADR sequence)
+- [ ] Documents the 3-phase plan (M3 opt-in → M7 default → DuckDB legacy-only), polymorphic rel-table-per-kind decision, pool adapter rationale, banned-literal renaming strategy, Apache AGE fallback
+- [ ] Does NOT contain banned literals outside the banned-strings allowlist scope
+- **Dependencies**: AC-M3-5
+
+## M4 — Event-driven requirements
+
+- **E-M4-1**: When `codehub analyze` runs on a repo containing `*.c`/`*.cpp`/`*.h`, it MUST invoke `scip-clang` if the binary is on `$PATH` or was installed via `codehub setup --scip=clang`.
+- **E-M4-2**: When the user invokes `codehub setup --scip=`, the CLI MUST download the platform-specific binary, verify its sha256 against the pinned hash, and install into `~/.codehub/bin/` (or equivalent).
+- **E-M4-3**: When `codehub analyze` encounters COBOL files (`.cbl`, `.cob`, `.cpy`), it MUST run the regex hot path (T-M4-5) unconditionally, and MUST run the ProLeap deep-parse (T-M4-6) only when `--allow-build-scripts=proleap` is passed.
+- **E-M4-4**: When the 5-stage framework-detection pipeline emits a detection, the result MUST include `{name, version?, confidence, evidence[]}` where `confidence` is one of the discriminator strings (`"deterministic"|"heuristic"|"composite"`) AND `evidence[]` lists the stage(s) that produced the signal.
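+
+The sha256 step in E-M4-2 can be sketched with Node's built-in `crypto` and `fs` streams. `sha256File` and `verifyPinnedSha256` are hypothetical names. Neither the storage location of the pinned hashes nor the final move into `~/.codehub/bin/` is shown.
+
+```typescript
+import { createHash } from "node:crypto";
+import { createReadStream } from "node:fs";
+
+// Stream the file through sha256 so large binaries are not read into memory at once.
+export async function sha256File(path: string): Promise<string> {
+  const hash = createHash("sha256");
+  for await (const chunk of createReadStream(path)) {
+    hash.update(chunk as Buffer);
+  }
+  return hash.digest("hex");
+}
+
+// Reject the download unless it matches the pinned hash (compared case-insensitively).
+export async function verifyPinnedSha256(path: string, pinnedHex: string): Promise<void> {
+  const actual = await sha256File(path);
+  if (actual !== pinnedHex.toLowerCase()) {
+    throw new Error(`sha256 mismatch for ${path}: expected ${pinnedHex}, got ${actual}`);
+  }
+}
+```
+
+Verifying before the binary is moved into place means a corrupted or tampered download never lands in `~/.codehub/bin/`.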
+
+## M4 — State-driven requirements
+
+- **S-M4-1**: While a SCIP adapter's binary is not installed, `codehub analyze` MUST skip that language cleanly (not crash) and emit a setup hint.
+- **S-M4-2**: While `java --version` fails or reports < 17, `codehub analyze --allow-build-scripts=proleap` MUST refuse to run and emit a clear install hint for JRE 17+.
+- **S-M4-3**: While the ProLeap JAR is not vendored under `vendor/proleap/proleap-cobol-parser-.jar`, `codehub analyze --allow-build-scripts=proleap` MUST fail with the specific missing-jar path.
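+
+The S-M4-2 preflight could look like the sketch below. It assumes that JDK 9+ `java --version` output begins with a vendor token followed by the version, for example `openjdk 17.0.9 2023-10-17`. JDK 8 rejects `--version` entirely, which falls into the "fails" branch. The sketch reads stdout and falls back to stderr in case a runtime prints elsewhere. `parseJavaMajor` and `checkJre17` are hypothetical names.
+
+```typescript
+import { spawnSync } from "node:child_process";
+
+// Extract the major version from the first line, e.g. "openjdk 17.0.9 ..." -> 17.
+export function parseJavaMajor(output: string): number | null {
+  const firstLine = output.split(/\r?\n/, 1)[0] ?? "";
+  const m = /^\S+\s+(\d+)(?:\.\d+)*/.exec(firstLine.trim());
+  return m ? Number(m[1]) : null;
+}
+
+// Missing binary, non-zero exit, unparsable output, or major < 17 all refuse to run.
+export function checkJre17(): { ok: true; major: number } | { ok: false; hint: string } {
+  const res = spawnSync("java", ["--version"], { encoding: "utf8" });
+  const major = res.status === 0 ? parseJavaMajor(res.stdout || res.stderr) : null;
+  if (major !== null && major >= 17) return { ok: true, major };
+  return { ok: false, hint: "--allow-build-scripts=proleap needs a Java 17+ runtime (JRE 17 or newer) on PATH" };
+}
+```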
+
+## M4 — Unwanted-behavior requirements
+
+- **W-M4-1**: The COBOL ProLeap path MUST NOT run by default — only when the user explicitly passes `--allow-build-scripts=proleap`. This protects against unexpected JVM subprocess spawns.
+- **W-M4-2**: The 5-stage framework-detection pipeline MUST NOT call out to the network, an LLM, or any other service. It is purely local: file-system and AST inspection only.
+- **W-M4-3**: SCIP adapters MUST NOT download binaries at analyze time. All downloads happen via `codehub setup`.
+- **W-M4-4**: The framework-catalog MUST NOT double-trigger when both manifest and lockfile signals fire (the composite already handles this — do not regress).
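+
+A hypothetical shape for the E-M4-4 detection record follows, along with one way to honor W-M4-4 by merging repeated hits per framework name. The merge rule is an assumption, not the existing composite logic: distinct evidence from more than one stage upgrades `confidence` to `"composite"`. The stage labels used in testing it are also illustrative.
+
+```typescript
+export type Confidence = "deterministic" | "heuristic" | "composite";
+
+export interface Detection {
+  name: string;
+  version?: string;
+  confidence: Confidence;
+  evidence: string[]; // stage(s) that produced the signal
+}
+
+// Hypothetical merge: one record per framework name, evidence unioned in first-seen order.
+export function mergeDetections(raw: readonly Detection[]): Detection[] {
+  const byName = new Map<string, Detection>();
+  for (const d of raw) {
+    const prev = byName.get(d.name);
+    if (prev === undefined) {
+      byName.set(d.name, { ...d, evidence: [...d.evidence] });
+      continue;
+    }
+    const evidence = Array.from(new Set([...prev.evidence, ...d.evidence]));
+    const merged: Detection = {
+      name: d.name,
+      // Agreement between distinct stages is what makes a detection composite.
+      confidence: evidence.length > 1 ? "composite" : prev.confidence,
+      evidence,
+    };
+    const version = prev.version ?? d.version;
+    if (version !== undefined) merged.version = version;
+    byName.set(d.name, merged);
+  }
+  return Array.from(byName.values());
+}
+```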
+
+## M4 — Acceptance criteria
+
+### AC-M4-1: scip-clang adapter
+
+- [ ] Add `"clang"` to `IndexerKind` union in `packages/scip-ingest/src/runners/index.ts`
+- [ ] `buildCommand("clang", opts)` → `scip-clang index --output ` from project root with `compile_commands.json` preflight check
+- [ ] `scip-clang` version pin: v0.4.0 (2026-02-23), binary URL pattern `github.com/sourcegraph/scip-clang/releases/download/v0.4.0/scip-clang-x86_64-{linux|darwin}`
+- [ ] Tests: mock-binary invocation, missing-binary skip path, `compile_commands.json` missing → specific error
+- [P]
+- **Dependencies**: AC-M4-0 (downloader — see below)
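+
+The sketch below covers two AC-M4-1 checks that do not depend on scip-clang's own CLI. The first builds the pinned v0.4.0 download URL from the pattern above. The second is the `compile_commands.json` preflight. The helper names are hypothetical. Only x86_64 Linux and macOS are covered, because those are the only assets the pattern lists.
+
+```typescript
+import { existsSync } from "node:fs";
+import { join } from "node:path";
+
+const SCIP_CLANG_VERSION = "v0.4.0";
+
+// Pass process.platform / process.arch; only x86_64 linux/darwin assets are pinned.
+export function scipClangUrl(platform: string, arch: string): string {
+  if (arch !== "x64" || (platform !== "linux" && platform !== "darwin")) {
+    throw new Error(`scip-clang ${SCIP_CLANG_VERSION}: no pinned binary for ${platform}/${arch}`);
+  }
+  return `https://github.com/sourcegraph/scip-clang/releases/download/${SCIP_CLANG_VERSION}/scip-clang-x86_64-${platform}`;
+}
+
+// Preflight: fail with a specific error before spawning scip-clang at all.
+export function preflightCompileCommands(projectRoot: string): string {
+  const compdb = join(projectRoot, "compile_commands.json");
+  if (!existsSync(compdb)) {
+    throw new Error(
+      `scip-clang needs ${compdb}; generate it with your build system ` +
+        `(for CMake, -DCMAKE_EXPORT_COMPILE_COMMANDS=ON) and re-run codehub analyze`,
+    );
+  }
+  return compdb;
+}
+```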
+
+### AC-M4-2: scip-ruby adapter
+
+- [ ] Add `"ruby"` to `IndexerKind` union
+- [ ] `buildCommand("ruby")` → `scip-ruby --index-file ` (verify invocation against scip-ruby v0.4.7 docs)
+- [ ] Pin: v0.4.7 (2024-11-07), multi-arch: linux-x64, linux-arm64, darwin-x64, darwin-arm64
+- [P]
+- **Dependencies**: AC-M4-0
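+
+Host-to-target selection for the four scip-ruby v0.4.7 targets in AC-M4-2 is sketched below. The target labels come from the list above. The release asset filenames are not specified here, so the sketch stops at choosing a target rather than guessing a URL. `scipRubyTarget` is a hypothetical name.
+
+```typescript
+export type ScipRubyTarget = "linux-x64" | "linux-arm64" | "darwin-x64" | "darwin-arm64";
+
+// Pass process.platform / process.arch; null means there is no pinned binary for this host.
+export function scipRubyTarget(platform: string, arch: string): ScipRubyTarget | null {
+  if ((platform === "linux" || platform === "darwin") && (arch === "x64" || arch === "arm64")) {
+    return `${platform}-${arch}` as ScipRubyTarget;
+  }
+  return null; // caller should skip Ruby cleanly and print a setup hint
+}
+```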
+
+### AC-M4-3: scip-dotnet adapter
+
+- [ ] Add `"dotnet"` to `IndexerKind` union
+- [ ] `buildCommand("dotnet")` → `scip-dotnet index -o