diff --git a/.erpaval/INDEX.md b/.erpaval/INDEX.md index dac938fd..a98e4817 100644 --- a/.erpaval/INDEX.md +++ b/.erpaval/INDEX.md @@ -34,6 +34,10 @@ development sessions. Solutions are reusable; specs are per-feature. - [No spec-coordinate leakage into source](solutions/best-practices/no-spec-coordinate-leakage-into-source.md) — ERPAVal `AC-*`, `M-*`, `W-*`, `CL-*` prefixes belong in commits, PR bodies, ADR refs sections — NOT in JSDoc, inline comments, CLI flag help, MCP tool descriptions, or test names. Sweep `rg -n "AC-[A-Z]-[0-9]" packages/` before every PR-open; LLM clients pick up the leakage and start citing it back. - [release: published events need PAT or inline](solutions/conventions/release-published-event-needs-pat-or-inline.md) — release-please-action with default `GITHUB_TOKEN` does NOT fire downstream `release: [published]` workflows; inline asset-attach in `release-please.yml` gated on `steps.release.outputs.release_created`. Fixed AC-D-4; sbom.yml has same latent bug for follow-on. - [Dogfood pre-push hook catches CLI spec drift on first push](solutions/best-practices/dogfood-prepush-hook-caught-cli-spec-mismatch.md) — the first `git push` of the commit that adds a self-targeting pre-push hook is where spec/CLI-flag mismatches and "missing index" foot-guns surface. Pattern: SKIP-with-message shape from `pack-determinism-audit.sh` for any gate that depends on a derived artifact. +- [Cherry-pick verified bug fixes from a sibling testbed clone](solutions/best-practices/cherry-pick-from-sibling-testbed.md) — when a post-filter sibling has authored fix commits with file:line repro coordinates, fetch the sibling and cherry-pick directly; preserves authorship, halves review surface, defeats re-author drift. +- [Bench dashboard ↔ acceptance script banner-text parity](solutions/architecture-patterns/bench-dashboard-acceptance-script-parity.md) — when a dashboard parses banners by exact-string match, the two artifacts must be edited together; add a roster-shape test that pulls the banner list from the script directly. Surfaced 9-of-17 gates rendering by Bug #4 in 2026-05-10 smoke campaign. +- [Test env hermeticity for backend-precedence libraries](solutions/conventions/test-env-hermeticity-for-backend-precedence.md) — when an SDK picks a backend by env presence, tests must scope-stash every key in the chain via prefix glob, not just the one they assert on. Per Bug #7 in 2026-05-10 smoke: `CODEHUB_EMBEDDING_*` chain. +- [Parallel docs subagents over-scrub ADR coordinates](solutions/best-practices/parallel-docs-subagent-overscrubs-adrs.md) — PR #74's carve-out for ADR text isn't visible in the durable lesson; brief docs subagents explicitly that `docs/adr/*` retains spec coordinates. ## Specs diff --git a/.erpaval/solutions/architecture-patterns/bench-dashboard-acceptance-script-parity.md b/.erpaval/solutions/architecture-patterns/bench-dashboard-acceptance-script-parity.md new file mode 100644 index 00000000..caac01ff --- /dev/null +++ b/.erpaval/solutions/architecture-patterns/bench-dashboard-acceptance-script-parity.md @@ -0,0 +1,66 @@ +--- +name: A dashboard that parses banner-text from a script must mirror the script's banners verbatim +description: Bench/dashboard tools that index gates/jobs by exact-title match against a script's banner output drift silently when the script grows new gates — both files must be edited together +type: architecture-patterns +--- + +`packages/cli/src/commands/bench.ts` indexes gate rows by exact-string +match against `scripts/acceptance.sh` banners (`N/17: `). When +the script grew from 9 to 17 gates and changed a few existing banner +titles ("graphHash determinism" → "determinism (double-run graphHash)"), +the dashboard didn't follow. Result: 8 gates never advance past +"pending" and post-stream get stamped "skipped — script crashed" by the +crash-fallback path; another 3 displayed under stale titles. Operators +saw 9/17 gates with confusing detail strings. + +The original code shape: + +```ts +export const MVP_GATES: readonly { id: string; title: string }[] = [ + { id: "install", title: "pnpm install --frozen-lockfile" }, + // ... 8 more, with stale titles +]; + +export function applyLine(rows: GateRow[], rawLine: string): void { + const banner = /^\d+\/\d+:\s+(.*)$/.exec(line); + if (banner) { + const idx = rows.findIndex((r) => r.title === banner[1]); // exact match + if (idx >= 0) currentGateIdx = idx; + } +} +``` + +**Why:** the dashboard is a thin presenter over the script's stdout. Any +banner text not in `MVP_GATES` is silently dropped. There is no compile- +time signal — the build is green, the unit tests are green, only the +runtime UX degrades. The same gap also caught `[SKIP]` markers: the +original `applyLine` matched `[PASS]`/`[FAIL]` but not `[SKIP]`, so +gracefully-degrading gates rendered as "skipped — script crashed" via +the crash-fallback path with a misleading detail string. + +**How to apply:** + +1. **Treat banner titles as a contract** between the script and the + dashboard. Edit both files in the same commit. +2. **Add a roster-shape test.** Assert `MVP_GATES.length === 17` AND + `MVP_GATES.map(g => g.title)` matches the banner sequence the script + emits. The test pulls the banner list from the script directly with + `grep -oE '^echo "\d+/\${TOTAL_GATES}: (.+)"$' scripts/acceptance.sh` + so the assertion follows the source of truth. +3. **Match every marker the script emits.** If the script emits `[PASS]`, + `[FAIL]`, AND `[SKIP]`, the parser must handle all three. The + crash-fallback path must NOT fire for legitimate skips. +4. **Order matters when index = listr2 row.** `MVP_GATES` order must + match script execution order — the dashboard advances rows by index + as banners stream in. + +Anti-pattern: a "we'll keep them in sync manually" comment without an +enforcement test. The 9-gate / 17-gate drift sat in `main` undetected +because no CI surface failed when the script grew. Surfacing it +required an operator to run `codehub bench` and notice the visual +mismatch. + +Cross-link: the `dogfood-prepush-hook-caught-cli-spec-mismatch` durable +lesson covers a related pattern — the dogfood pre-push hook on this +exact PR was where this bug was first surfaced (Bug #4 in +UPSTREAM_BUGS.md, 2026-05-10 smoke). diff --git a/.erpaval/solutions/best-practices/cherry-pick-from-sibling-testbed.md b/.erpaval/solutions/best-practices/cherry-pick-from-sibling-testbed.md new file mode 100644 index 00000000..afd63246 --- /dev/null +++ b/.erpaval/solutions/best-practices/cherry-pick-from-sibling-testbed.md @@ -0,0 +1,52 @@ +--- +name: Cherry-pick verified bug fixes from a sibling testbed clone +description: When a sibling/post-filter checkout has authored fix commits with file:line repro coordinates, fetch the sibling and cherry-pick directly — no need to re-author or re-test on upstream +type: best-practices +--- + +When you maintain a "post-filter testbed" sibling repo for smoke / dogfood +campaigns and you've already authored fix commits there with verified +repros, do not re-write the fixes on upstream. Fetch the sibling as a +local remote and cherry-pick. + +**Why:** The fix has already been authored, repro'd, verified. Re-authoring +on upstream loses authorship metadata, doubles review surface, and +introduces drift between what was fixed and what landed. Re-testing +re-validates the same green path. The cherry-pick is provably equivalent +when the file:line coordinates in the fix message match upstream HEAD. + +**How to apply:** + +1. **Verify file:line parity first.** Each fix in the testbed report + should cite file paths and line numbers; quickly grep upstream to + confirm the same lines exist there. Per Bug #2 in OCH 2026-05-10 + campaign: `packages/cli/src/commands/scan.ts:162-171` was identical in + testbed and upstream — direct cherry-pick worked. +2. **Fetch the sibling as a path remote.** No need to register it + permanently. One-shot: + ```bash + git fetch /efs/lalsaado/workplace/opencodehub.post-filter --no-tags + ``` + `FETCH_HEAD` now points at the sibling's HEAD; commits referenced by + short-hash become resolvable. +3. **Cherry-pick in severity order.** HIGH first, MEDIUM next, LOW last. + Each pick is one commit; do not squash them into a "umbrella fix" + commit — preserves blame and lets the PR reviewer see one + self-contained fix per scope. +4. **Re-verify after each pick** with the package-scoped check: + `pnpm -F @opencodehub/<pkg> test` plus any smoke script the fix + targets (`bash scripts/smoke-mcp.sh`, `node ... doctor`, etc.). +5. **Prefer one PR for the bundle** when the fixes are small and + thematically related (a "v1 upstream bug sweep") — reviewer context + stays coherent. Split only if the bundle exceeds reviewability. + +Anti-pattern: re-authoring the fix on upstream and citing the testbed +commit in the body. That loses the original commit's authorship and +makes blame point at the re-author for code that was thought-through +elsewhere. If you need to adapt the fix to upstream divergence, do that +as a follow-up commit on top of the cherry-pick, not a rewrite. + +Related: this pairs naturally with the durable lesson "Squash-merge +masks pre-existing repo-wide debt" — run `mise run check` on upstream +BEFORE the cherry-pick to baseline-clean, so any test regression after +the pick is unambiguously attributable to the picked fix. diff --git a/.erpaval/solutions/best-practices/no-spec-coordinate-leakage-into-source.md b/.erpaval/solutions/best-practices/no-spec-coordinate-leakage-into-source.md index d60b5c7a..d5bf17be 100644 --- a/.erpaval/solutions/best-practices/no-spec-coordinate-leakage-into-source.md +++ b/.erpaval/solutions/best-practices/no-spec-coordinate-leakage-into-source.md @@ -48,11 +48,19 @@ LLM clients then pick up and start citing back; the leakage compounds. ("graphHash parity: medium-with-empty-keywords ([] vs absent)") over the AC ("AC-C-2: graphHash..."). The behavior survives renames; AC numbers don't. -- **Sweep before commit.** Run `rg -n "AC-[A-Z]-[0-9]" packages/` +- **Sweep before commit.** Run `rg -n "AC-[A-Z]-[0-9]" packages/ scripts/` against your branch before PR-open. Anything that hits is a candidate for rephrase. If the comment NEEDS to cite the AC, use a short reference at the end like "(AC-C-5)" rather than leading with it. +- **Sweep scope is `packages/` and `scripts/`, NOT `docs/adr/*`.** PR #74 + (`f09d804`) carved out `docs/adr/*` as the explicit place where + coordinates ARE permanent decision rationale. A docs-refresh subagent + that sees the sweep regex without the scope qualifier will scrub + ADRs by default — DO NOT. Brief docs subagents explicitly that ADR + text retains coordinates. See the + `parallel-docs-subagent-overscrubs-adrs.md` lesson for the failure + mode. - **The test fakes are the trap.** When a Wave subagent edits a test fake, it tends to add `// AC-XXX: stubs ...` because it's writing the comment WITH the AC packet open in front of it. Sweep test files diff --git a/.erpaval/solutions/best-practices/parallel-docs-subagent-overscrubs-adrs.md b/.erpaval/solutions/best-practices/parallel-docs-subagent-overscrubs-adrs.md new file mode 100644 index 00000000..7cd307f6 --- /dev/null +++ b/.erpaval/solutions/best-practices/parallel-docs-subagent-overscrubs-adrs.md @@ -0,0 +1,61 @@ +--- +name: Parallel docs-refresh subagents must be told that ADR text is the carve-out where spec coordinates ARE allowed +description: When a docs-refresh subagent inherits the "no spec-coordinate leakage" rule from durable lessons, it will scrub ADR text by default — but PR #74 carved out docs/adr/* as the place where coordinates ARE the durable rationale; brief explicitly +type: best-practices +--- + +OCH PR #74 (`f09d804 chore(repo): scrub ERPAVal spec coordinates from +source`) explicitly retained spec coordinates in `docs/adr/*` as +"permanent decision rationale". The durable lesson +`no-spec-coordinate-leakage-into-source.md` documents the scrub but +does NOT crisply state the carve-out. When a parallel docs-refresh +subagent reads the durable lesson and is told "no spec-coordinate +leakage", it scrubs ADRs too — undoing PR #74's deliberate carve-out. + +Observed in OCH session 6c091d (2026-05-10 v1 upstream bug sweep): the +docs-refresh subagent stripped `AC-A-1`, `AC-A-2`, `AC-A-6 a/b/c/d`, +`AC-A-7`, `AC-A-9`, `AC-A-11` from ADR 0013-m7 and `AC-C-3`, `AC-C-5`, +`E-C-3`, `W-A-2` from ADR 0014. Required a follow-up +`docs(docs): restore ADR-permanent spec coordinates per PR #74 policy` +commit on the same branch. + +**Why:** the durable lesson's scope says "production source, JSDoc, +inline comments, CLI flag help, MCP tool option descriptions, test +names" — but the ADR carve-out lives only in PR #74's body. Subagents +read the lesson, not the PR archive. The carve-out is invisible to a +fresh agent. + +**How to apply:** + +1. **Brief docs subagents explicitly.** When seeding a docs-refresh + subagent prompt, include both rules: + - "No spec-coordinate prefixes in production source (per durable + lesson)." + - "ADR text is the carve-out: spec coordinates in `docs/adr/*` are + intentional permanent rationale per PR #74. Do NOT scrub them + there." +2. **Update the lesson itself.** Edit + `solutions/best-practices/no-spec-coordinate-leakage-into-source.md` + to add a "Scope" section that names `docs/adr/*` as the carve-out, + so future subagents reading the lesson see the constraint without + needing PR archaeology. +3. **Sweep with a scope-aware regex.** When auditing leakage, exclude + `docs/adr/*` from the sweep: + `rg -n 'AC-[A-Z]-[0-9]' packages/ scripts/` + not + `rg -n 'AC-[A-Z]-[0-9]'` (which would falsely flag ADRs). +4. **The reverse case is also valid.** `docs/adr/0014-*` originally + listed `.erpaval/specs/...` and `.erpaval/sessions/...` as + References — those paths are gitignored and rot once the packet + graduates. Replacing them with code-path citations IS correct, even + in ADR text. The carve-out is for spec-coordinate prefixes, not for + pointers to gitignored paths. + +Anti-pattern: writing a generic "scrub spec coords everywhere" rule and +then surprised when ADR rationale gets vacuumed. The leakage rule +exists to prevent rot; ADR rationale doesn't rot because the ADR is +the rationale. + +Cross-link: +[no-spec-coordinate-leakage-into-source](no-spec-coordinate-leakage-into-source.md) — the original rule. +PR #74 (`f09d804`) — the carve-out's authoritative source. diff --git a/.erpaval/solutions/conventions/test-env-hermeticity-for-backend-precedence.md b/.erpaval/solutions/conventions/test-env-hermeticity-for-backend-precedence.md new file mode 100644 index 00000000..38e172e9 --- /dev/null +++ b/.erpaval/solutions/conventions/test-env-hermeticity-for-backend-precedence.md @@ -0,0 +1,71 @@ +--- +name: Tests for backend-precedence libraries must wipe all env keys in the precedence chain, not just the one they assert +description: When an SDK picks a backend by env presence (CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT, CODEHUB_EMBEDDING_URL, ...), tests of "backend X is picked when only X's env is set" must scope-stash every key in the chain, not only the local one +type: conventions +--- + +`packages/embedder/src/http-embedder.test.ts:441,458` asserted that +`tryOpenHttpEmbedder` returns `null` when its specific env var is unset. +The test only stashed `CODEHUB_HOME`. With +`CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT` exported in the operator's shell, +the higher-precedence SageMaker backend short-circuited, the assertion +flipped, and the test failed — but only on the specific dev box where +the operator was working through SageMaker integration. + +The fix: a `sanitizeEmbeddingEnv()` helper that snapshots and wipes +every `CODEHUB_EMBEDDING_*` key plus `CODEHUB_HOME`, restored on +teardown via `beforeEach`/`afterEach`: + +```ts +function sanitizeEmbeddingEnv() { + const saved: Record<string, string | undefined> = {}; + for (const k of Object.keys(process.env)) { + if (k.startsWith("CODEHUB_EMBEDDING_") || k === "CODEHUB_HOME") { + saved[k] = process.env[k]; + delete process.env[k]; + } + } + return () => { + for (const [k, v] of Object.entries(saved)) { + if (v === undefined) delete process.env[k]; else process.env[k] = v; + } + }; +} +``` + +**Why:** the backend-precedence pattern is a chain — env-X-set → backend-X, +else env-Y-set → backend-Y, else fallback. A test that asserts about +backend Y must explicitly clear backend-X's env, otherwise the assertion +silently tests the wrong code path under any operator who happens to +have backend-X configured. The failure is non-reproducible on a clean +laptop, fires on a dev box with the higher-precedence env exported. +This is exactly the env-leak class that bedevils CI-vs-local divergence +debugging. + +**How to apply:** + +1. **Identify the precedence chain.** For OCH embedder: + `CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT` → `CODEHUB_EMBEDDING_URL` → + `CODEHUB_EMBEDDING_*` (HTTP options) → `CODEHUB_HOME` (local ONNX). + Any test that asserts about backend selection must wipe the entire + chain, not just one key. +2. **Stash with a prefix glob, not a fixed key list.** `Object.keys` + filtered by `startsWith("CODEHUB_EMBEDDING_")` catches keys added + later (e.g. a future `CODEHUB_EMBEDDING_AZURE_*`) without revisiting + every test. +3. **Wire it as `beforeEach`/`afterEach`, not per-case try/finally.** + Easier to audit; harder to forget on the next case. +4. **Apply defensively to sibling describe blocks.** Even cases that + don't care about the env can be poisoned by stale state from a prior + test that mutated `process.env`. Hermetic test suites don't pay a + cost for being defensive. + +Anti-pattern: per-case `originalKey = process.env[KEY]; ... finally +process.env[KEY] = originalKey` for a single key. The single-key save +worked when there was one env var; with a chain, every test that misses +a sibling key in the chain becomes flaky on operator boxes. + +Cross-link: pairs with the existing `sagemaker-embedder-backend.md` +durable lesson — that one covers the SDK-side dynamic-import + soft-fail +pattern; this one covers the test-side env-hermeticity pattern that +that pattern requires. diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 7e833474..0a32504e 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -90,3 +90,34 @@ jobs: --onlyAllow 'Apache-2.0;MIT;BSD-2-Clause;BSD-3-Clause;ISC;CC0-1.0;BlueOak-1.0.0;0BSD' --excludePrivatePackages --production + + # GitHub code-scanning advanced-setup is configured to look for an `osv` + # job in `ci.yml`. The standalone `osv.yml` workflow does the same scan, + # but the configured pointer lives here, so we mirror it: install + # osv-scanner, emit SARIF, upload to code-scanning, then fail the run on + # vulnerabilities. The standalone workflow remains for the weekly cron. + osv: + runs-on: ubuntu-latest + permissions: + contents: read + security-events: write + steps: + - uses: actions/checkout@v6 + - name: Install osv-scanner + run: | + curl -sL -o /tmp/osv-scanner \ + https://github.com/google/osv-scanner/releases/download/v2.3.8/osv-scanner_linux_amd64 + chmod +x /tmp/osv-scanner + - name: Scan pnpm-lock.yaml (SARIF output) + run: | + /tmp/osv-scanner scan source \ + --lockfile=pnpm-lock.yaml \ + --format=sarif \ + --output=osv.sarif || true + - uses: github/codeql-action/upload-sarif@v4 + if: always() + with: + sarif_file: osv.sarif + category: osv-scanner + - name: Fail on vulnerabilities + run: /tmp/osv-scanner scan source --lockfile=pnpm-lock.yaml diff --git a/.github/workflows/osv.yml b/.github/workflows/osv.yml index d58d10f9..dc7195d3 100644 --- a/.github/workflows/osv.yml +++ b/.github/workflows/osv.yml @@ -1,10 +1,12 @@ name: OSV-Scanner +# The push/pull_request runs were moved to `ci.yml`'s `osv` job so the +# GitHub code-scanning advanced-setup configuration (which is pinned to +# `ci.yml:osv`) finds them. This standalone workflow keeps the weekly +# cron + manual dispatch path so OSV advisory-data updates are picked +# up between PRs. on: - push: - branches: [main] - pull_request: - branches: [main] + workflow_dispatch: schedule: - cron: "33 5 * * 2"