Merged
21 commits
0b2c452
chore(ci): add `pack` to commitlint scope-enum for M5
theagenticguy May 6, 2026
332e595
feat(pack): scaffold @opencodehub/pack workspace (AC-M5-1)
theagenticguy May 6, 2026
f737fce
refactor(analysis): lift PageRank from scip-ingest (AC-M5-2)
theagenticguy May 6, 2026
ea75e93
feat(core-types): first-class RepoNode in graph (AC-M6-1)
theagenticguy May 6, 2026
9f4c25c
feat(mcp): structured AMBIGUOUS_REPO with choices[] + repo_uri alias …
theagenticguy May 6, 2026
4455457
feat(pack): BOM manifest + packHash helper (AC-M5-3)
theagenticguy May 6, 2026
d9d2875
feat(mcp): group_* tools emit repo_uri additively (AC-M6-4)
theagenticguy May 6, 2026
51431e5
feat(analysis): group_cross_repo_links MCP tool + v2 docmeta spec (AC…
theagenticguy May 6, 2026
f6af735
chore(pack): switch chonkie dep to @chonkiejs/core@^0.0.9
theagenticguy May 7, 2026
4cc60dd
docs(repo): close M6 — ADR 0012 + AMBIGUOUS_REPO cross-refs + 2-repo …
theagenticguy May 7, 2026
f96040b
chore(analysis): lift classifyDependencies from mcp
theagenticguy May 7, 2026
7aaf473
feat(storage): add IGraphStore.listNodes() across DuckStore + GraphDb…
theagenticguy May 7, 2026
36e1199
feat(pack): BOM items 2-4 — skeleton + file-tree + deps (AC-M5-4)
theagenticguy May 7, 2026
79e1139
feat(pack): BOM items 5-9 + generatePack assembly (AC-M5-5)
theagenticguy May 7, 2026
88fc835
feat(pack): Parquet embeddings sidecar (AC-M5-6)
theagenticguy May 7, 2026
2fc5e4d
feat(cli): codehub code-pack subcommand + pack_codebase via @opencode…
theagenticguy May 7, 2026
f8919ab
feat(plugin): codehub-code-pack skill (AC-M5-9)
theagenticguy May 7, 2026
08dba12
test(pack): byte-identity determinism suite + audit script (AC-M5-8)
theagenticguy May 8, 2026
4580ba2
docs(docs): compound lessons — session-e1d819 durable knowledge
theagenticguy May 8, 2026
82d4d42
chore(deps): regenerate pnpm-lock.yaml after rebase onto main
theagenticguy May 8, 2026
ea76e54
fix(pack): drop stat/read race in writeEmbeddingsSidecar
theagenticguy May 8, 2026
8 changes: 5 additions & 3 deletions .erpaval/INDEX.md
@@ -22,9 +22,11 @@ development sessions. Solutions are reusable; specs are per-feature.
- [llms-txt config strings quietly anchor doc accuracy](solutions/conventions/llms-txt-as-ground-truth.md) — in a Starlight site with `starlight-llms-txt`, `astro.config.mjs` is more load-bearing than prose READMEs; audit it first in doc-sync sweeps.
- [tsconfig project references go stale on package removal](solutions/conventions/tsconfig-project-references-stale-on-package-removal.md) — root tsconfig `references` drift is invisible until a root-scoped tsc invocation hits; clean up in the same commit as the package delete.
- [Astro NODE_ENV in CI — set it at script scope, not step scope](solutions/conventions/astro-node-env-in-ci-script-scope.md) — mise-action + pnpm + astro chain loses CI-level NODE_ENV overrides; hard-code in package.json `build` script.
- [tree-sitter-wasms catalog is unusable with web-tree-sitter 0.26+](solutions/architecture-patterns/tree-sitter-wasms-catalog-incompat.md) — 0.1.13 artifacts use legacy `dylink` section, web-tree-sitter hard-requires `dylink.0`. Build your own WASMs and commit them.
- [pnpm install hangs on EFS workdir](solutions/best-practices/pnpm-install-on-efs.md) — 8+ min → 4.6s with `store-dir=/home/...` in `~/.npmrc` + `UV_USE_IO_URING=0`. Two stacked causes: cross-fs store and AL2023 io_uring bug.
- [Finch as docker shim via PATH for CLIs that shell out to `docker`](solutions/best-practices/finch-as-docker-shim.md) — 3-line shim unlocks `tree-sitter build --wasm -d` and similar tools on Amazon AL2023 devboxes.
- [Verify npm package canonicality via the upstream repo README install command](solutions/conventions/npm-package-canonicality-via-upstream-readme.md) — `chonkie-ts` was a 2.6 kB squatter; the upstream README pointed to `@chonkiejs/core`. Apply when bare/`-ts`/`@scoped` namesakes coexist.
- [Add typed kind-filtered enumeration to IGraphStore once 3+ packages need it](solutions/architecture-patterns/storage-list-nodes-over-scattered-sql.md) — `listNodes()` collapses N raw-SQL call sites into one typed rehydration; cross-adapter parity test catches schema drift.
- [Lift pure helpers to the deepest shared workspace dependency to break future cycles](solutions/architecture-patterns/lift-pure-functions-to-shared-dep-to-break-cycles.md) — `mcp → pack → mcp` was averted by lifting `classifyDependencies` into `@opencodehub/analysis` (the LCA dep). 30-LOC mechanical chore commit.
- [Worktree isolation — pin pwd at task start and exclude worktrees from biome v2](solutions/best-practices/worktree-isolation-pwd-pin-and-biome-exclusion.md) — gitignore is not enough for biome v2; scope to `packages/` or add `experimentalScannerIgnores`. Always `pwd && git rev-parse --show-toplevel` at task start.
- [Resolve milestone-old spec drifts inline with the implementing commit](solutions/best-practices/spec-drift-amend-inline-with-implementing-commit.md) — amend spec wording in the same commit that implements the resolution; record drifts with `recommend` in explore-delta so Gate 0 is a confirmation, not a fresh debate.

## Specs

@@ -0,0 +1,48 @@
---
title: Lift pure helpers to the deepest shared workspace dependency to break future cycles
tags: [monorepo, dependency-graph, refactoring, workspace-cycles]
session: session-e1d819
---

## Context

`classifyDependencies` (license tier classification, ~30 LOC pure
function) lived in `packages/mcp/src/tools/license-audit.ts`.
`packages/pack/src/licenses.ts` (M5-5 BOM body) needed it. But
`@opencodehub/mcp` already depends on `@opencodehub/pack` via the
`pack_codebase` MCP tool wrapper — a `pack → mcp` import would create
a `mcp → pack → mcp` cycle. T-W2-3 (commit f96040b) lifted the function
into `@opencodehub/analysis`, which both `mcp` and `pack` already depend
on, in a single mechanical chore commit.

## Lesson

When a pure helper in package A is needed by package B, and a `B → A`
import would create a cycle, lift the helper to the **deepest shared
dependency** in the workspace dep graph (the lowest common ancestor, or
LCA, in package-import terms). Procedure:

1. Identify the LCA package by walking up imports from both A and B
(`pnpm why @opencodehub/<dep>` or visual inspection of
`package.json` workspace deps).
2. Move the function + supporting types **byte-identical** — preserve
every comment, signature, regex (in this case `COPYLEFT_PATTERN
= /^(GPL|AGPL|SSPL|EUPL|CPAL|OSL|RPL)/`).
3. Re-export from the destination package's barrel (`index.ts`) at the
alphabetically-correct position to match existing convention.
4. Replace local impl in package A with `import { fn } from "@org/lca"`.
Do **not** retain a re-export shim — direct imports are cleaner and
prevent future "should I import from A or LCA?" drift.
5. Move tests to the LCA package; keep the original package's test if
it covers integration via the imported symbol.
6. Commit scope: `chore(<lca-pkg>):` (cross-package symbol moves are
chores, not features).
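
Step 1 can be sketched in code. This is a minimal illustration, not tooling from this repo: `liftTargets` and the `workspace` map are hypothetical, the `analysis → core-types` edge is assumed for the example, and "deepest shared dependency" is read here as "a shared dependency that no other shared dependency depends on".

```ts
// Workspace dependency graph: package name -> direct workspace deps.
type DepGraph = Readonly<Record<string, readonly string[]>>;

// Collect every package reachable from `pkg` by following dependency edges.
function transitiveDeps(graph: DepGraph, pkg: string): Set<string> {
  const seen = new Set<string>();
  const stack: string[] = [...(graph[pkg] ?? [])];
  while (stack.length > 0) {
    const next = stack.pop() as string;
    if (seen.has(next)) continue;
    seen.add(next);
    stack.push(...(graph[next] ?? []));
  }
  return seen;
}

// Shared dependencies of `a` and `b` that are not themselves pulled in
// by another shared dependency (the candidates closest to both consumers).
function liftTargets(graph: DepGraph, a: string, b: string): string[] {
  const depsA = transitiveDeps(graph, a);
  const depsB = transitiveDeps(graph, b);
  const shared = Array.from(depsA).filter(
    (dep) => depsB.has(dep) && dep !== a && dep !== b,
  );
  return shared
    .filter(
      (candidate) =>
        !shared.some(
          (other) =>
            other !== candidate && transitiveDeps(graph, other).has(candidate),
        ),
    )
    .sort();
}

// Hypothetical graph shaped like the case above (short names, scope omitted).
const workspace: DepGraph = {
  mcp: ["pack", "analysis"],
  pack: ["analysis"],
  analysis: ["core-types"], // assumed edge, for illustration only
  "core-types": [],
};

console.log(liftTargets(workspace, "mcp", "pack"));
```

For this graph the only candidate is `analysis`, which matches the destination chosen in the Context section. With a real workspace the map would come from each package's `package.json` workspace deps.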

## Why

The alternative — path-importing from `packages/<pkg>/src/...` or
hardcoding a `.js` import — works but cements the cycle, blocks future
tree-shaking, and creates two ways to call the same function. Lifting
to the LCA preserves the dep graph as a DAG and gives every future
consumer one canonical import path. The 30-LOC mechanical lift takes
~1 hour and unblocks the downstream feature with zero behavior change.
@@ -0,0 +1,56 @@
---
title: Add typed kind-filtered enumeration to IGraphStore once 3+ packages need it
tags: [storage, graph-store, api-design, typed-rehydration]
session: session-e1d819
---

## Context

Spec 005 originally called for `IGraphStore.listNodes()`. Implementation
diverged into raw SQL (`SELECT id, kind, ... FROM nodes WHERE kind = ?`)
scattered across `packages/mcp/src/tools/{scan,project-profile,
dependencies,verdict}.ts`. M5 BOM bodies (skeleton, file-tree, deps,
xrefs) were about to add four more raw-SQL call sites in
`packages/pack/`. T-W2-2 lifted the abstraction back into
`packages/storage/src/interface.ts` (commit 7aaf473).

## Lesson

When ≥ 3 packages need typed kind-filtered node enumeration from a
polymorphic graph store, add the method to the storage interface
instead of duplicating SQL. The shape that worked here:

```ts
// packages/storage/src/interface.ts
listNodes(opts?: {
  readonly kinds?: readonly string[]; // undefined → all; [] → []
  readonly limit?: number;
  readonly offset?: number;
}): Promise<readonly GraphNode[]>; // typed discriminated union
```

Implementation requirements:

- Both adapters must rehydrate to the **typed** `GraphNode` discriminated
union — not `Record<string, unknown>`. This forces every column-to-field
mapping to be reversed once, in the adapter, instead of duplicated in
each consumer (`packages/storage/src/duckdb-adapter.ts:rowToGraphNode`,
`packages/storage/src/graphdb-adapter.ts:recordToGraphNode`).
- `ORDER BY id ASC` at the SQL layer + JS-side lex-stable tiebreak — this
is what gives cross-adapter byte-identical output (parity test in
`graphdb-adapter.test.ts`).
- Empty `kinds: []` short-circuits **before** opening any native binding
pool; this preserves the pure-JS contract for never-opened stores.
- Additive interface change: every existing `implements IGraphStore`
fake (4 found in this repo: `analysis/test-utils.ts`, `wiki/index.test.ts`,
`search/bm25.test.ts`, `search/hybrid.test.ts`) needs a no-op or
in-memory `listNodes` to typecheck.
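
For the test fakes, an in-memory `listNodes` could look roughly like the sketch below. It is an assumption-laden illustration: `GraphNode` here is a simplified two-field stand-in for the real discriminated union, and `selectNodes`, `InMemoryGraphStore`, and the kind names are hypothetical.

```ts
// Simplified stand-in for the real GraphNode union (assumption).
interface GraphNode {
  readonly id: string;
  readonly kind: string;
}

interface ListNodesOptions {
  readonly kinds?: readonly string[]; // undefined → all; [] → []
  readonly limit?: number;
  readonly offset?: number;
}

// Pure selection logic: filter by kind, order by id ascending, then page.
function selectNodes(
  nodes: readonly GraphNode[],
  opts: ListNodesOptions = {},
): readonly GraphNode[] {
  const { kinds, limit, offset = 0 } = opts;
  // An empty kinds list short-circuits before any other work is done.
  if (kinds !== undefined && kinds.length === 0) return [];
  const wanted = kinds === undefined ? undefined : new Set(kinds);
  const sorted = nodes
    .filter((node) => wanted === undefined || wanted.has(node.kind))
    // Code-unit comparison keeps the order independent of locale settings.
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  const end = limit === undefined ? undefined : offset + limit;
  return sorted.slice(offset, end);
}

// Fake store exposing the same method shape as the interface fragment above.
class InMemoryGraphStore {
  private readonly nodes: readonly GraphNode[];

  constructor(nodes: readonly GraphNode[]) {
    this.nodes = nodes;
  }

  async listNodes(opts?: ListNodesOptions): Promise<readonly GraphNode[]> {
    return selectNodes(this.nodes, opts);
  }
}

const sampleNodes: readonly GraphNode[] = [
  { id: "b", kind: "File" },
  { id: "a", kind: "Repo" },
  { id: "c", kind: "File" },
];

console.log(selectNodes(sampleNodes, { kinds: ["File"], limit: 1, offset: 1 }));
```

Keeping the selection logic in a pure function means a fake and a parity test can share one definition of the ordering and paging rules.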

## Why

Scattered SQL ages badly: every new column on the polymorphic `nodes`
table forces N consumers to update; per-kind rehydration drifts; tests
silently miss new fields. A typed `listNodes` collapses N rehydration
implementations to one and turns "did the consumer remember to read
`languageStats`?" into a compile error. The 25-test cross-adapter parity
suite added here is the canary for future schema additions.
@@ -0,0 +1,50 @@
---
title: Resolve milestone-old spec drifts inline with the implementing commit, not as a separate fix
tags: [spec-discipline, drift-resolution, commit-hygiene, ears]
session: session-e1d819
---

## Context

Spec 005 was authored before Wave 1 commits ratified its M5/M6 surface.
By the time Wave 2 started, four drifts existed (explore-delta.yaml
`drifts.drift_1..4`):

- drift_1: spec named `chonkie-ts@^0.3.0`; impl had `chonkie@^0.3.0`
(and ultimately `@chonkiejs/core@^0.0.9` was correct)
- drift_2: spec called for `IGraphStore.listNodes()`; method didn't exist
- drift_3: spec said "extend AGENTS.md with `choices[]`"; that already shipped
- drift_4: spec said "reuse license_audit MCP logic"; that path cycled

All four were resolved at Gate 0 by amending the spec wording inline as
part of the commit that implemented the fix (e.g., f6af735 amended
AC-M5-1 wording while switching the chonkie package; f96040b amended
AC-M5-5 wording while lifting `classifyDependencies`).

## Lesson

When a spec drift is ≥ 1 milestone old and the implementation has already
committed to a different reality, **amend the spec inline as part of the
implementing commit**. Do not separate spec-fix from implementation:

1. Catch drifts during the explore-delta pass (or Gate 0 of the next
wave). List them with `where / what / reason / action_options /
recommend` keys in `explore-delta.yaml` so the orchestrator confirms
the resolution before Plan.
2. The implementing commit message body cites the spec line being
amended ("Amends spec 005 AC-M5-1: reads `chonkie` → `@chonkiejs/core`").
3. The diff includes both the code change AND the spec edit. Reviewers
see the drift resolved and ratified in one atomic step.
4. Never carry an open drift across milestones. Either accept-and-amend
or revert-to-spec — the only forbidden state is "spec says X, code
does Y, no decision recorded".
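
The "no decision recorded" check in step 4 lends itself to a small gate. The sketch below is hypothetical: only the key names come from the list above, while the value types, the `undecidedDrifts` helper, the sample entries, and the rule that `recommend` must be one of `action_options` are assumptions.

```ts
// Assumed shape of one entry under `drifts` in explore-delta.yaml,
// after the YAML has been parsed into plain objects.
interface DriftEntry {
  readonly where: string;
  readonly what: string;
  readonly reason: string;
  readonly action_options: readonly string[];
  readonly recommend?: string;
}

type DriftMap = Readonly<Record<string, DriftEntry>>;

// Return the ids of drifts that have no usable recommendation yet.
function undecidedDrifts(drifts: DriftMap): string[] {
  return Object.keys(drifts)
    .filter((id) => {
      const drift = drifts[id];
      if (drift === undefined || drift.recommend === undefined) return true;
      // Assumption: a recommendation must name one of the listed options.
      return !drift.action_options.includes(drift.recommend);
    })
    .sort();
}

// Hypothetical entries, not copied from the session's explore-delta.yaml.
const sampleDrifts: DriftMap = {
  drift_a: {
    where: "spec 005",
    what: "spec names a method that does not exist",
    reason: "implementation diverged into raw SQL",
    action_options: ["accept-and-amend", "revert-to-spec"],
    recommend: "accept-and-amend",
  },
  drift_b: {
    where: "spec 005",
    what: "spec names a different package",
    reason: "package name changed after research",
    action_options: ["accept-and-amend", "revert-to-spec"],
  },
};

console.log(undecidedDrifts(sampleDrifts));
```

An empty result is the state the lesson asks for before Plan: every drift has a recorded choice, so Gate 0 only needs to confirm it.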

## Why

Separate "spec-fix" commits decouple the spec edit from the reasoning
that justified it; future readers see a spec change with no obvious driver.
Inline amendment ratifies the drift at the point of decision, keeps the
spec executable, and prevents Plan from re-litigating settled choices.
The four-drift batch in this session resolved cleanly because every
drift had an `action_options` block with a `recommend`, so Gate 0 was
a four-line confirmation rather than a fresh design discussion.
@@ -0,0 +1,59 @@
---
title: Worktree isolation — pin pwd at task start and exclude worktrees from biome v2
tags: [worktrees, biome, lefthook, ci, agent-isolation]
session: session-e1d819
---

## Context

Two distinct worktree pitfalls hit M5 Wave 2:

1. T-W2-3 was provisioned as `isolation: worktree` but the agent edited
files in the main repo before catching that its worktree base was at
`ed3950f` (M3/M4) instead of `feat/v1-m5-m6` HEAD `86e295b`. Recovery
required `git stash` + `git stash pop`.
2. Validation `mise run check` failed at the `lint` step because biome v2
recursively traversed `.claude/worktrees/agent-*/biome.json` files and
detected 10 nested `"root": true` configs — even though the worktrees
are gitignored. Scoped lint (`pnpm exec biome check packages/`) exits 0.

## Lesson

**At every worktree task start, pin the location and base SHA**:

```bash
pwd # confirm worktree path, not main
git rev-parse --show-toplevel # toplevel matches pwd
git rev-parse HEAD # matches expected base SHA
git status # confirm clean tree
```

If any of these mismatch the task packet's expected state, halt and
re-provision. Editing in the wrong tree wastes the isolation guarantee.

**Biome v2 traverses gitignored worktrees by default.** `gitignore`
alone is **not** sufficient. Two viable fixes:

- (a) Scope CI/lefthook biome invocations to tracked source paths:
`pnpm exec biome check packages/ scripts/` (not bare `.`). This is
the workaround used in this session.
- (b) Add an explicit exclusion in `biome.json`:
`"files": { "experimentalScannerIgnores": ["**/.claude/worktrees/**"] }`.
This is the durable fix; ship it the next time `biome.json` is touched.

Inside a worktree, prefer `git -C <worktree>` for git ops over `cd
<worktree> && git ...` — the harness's per-bash-call cwd reset makes
`-C` the only reliable form across multi-step sequences.
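
The same preflight can be expressed as a function that always goes through `git -C`. This is a sketch under stated assumptions: `preflight`, `GitRunner`, and `ExpectedState` are hypothetical names, and the git invocation is injected so the check itself stays free of process and filesystem access.

```ts
// Runs `git <args>` and returns stdout. Injected so it can be faked in tests.
type GitRunner = (args: readonly string[]) => string;

// What the task packet says the worktree should look like (assumed shape).
interface ExpectedState {
  readonly worktreePath: string;
  readonly baseSha: string; // full or abbreviated SHA
}

// Returns a list of mismatches; an empty list means it is safe to start.
function preflight(expected: ExpectedState, run: GitRunner): string[] {
  // Every call is pinned to the worktree with -C, never with a prior `cd`.
  const git = (...args: string[]): string =>
    run(["-C", expected.worktreePath, ...args]).trim();

  const problems: string[] = [];

  const toplevel = git("rev-parse", "--show-toplevel");
  if (toplevel !== expected.worktreePath) {
    problems.push(`toplevel is ${toplevel}, expected ${expected.worktreePath}`);
  }

  const head = git("rev-parse", "HEAD");
  if (!head.startsWith(expected.baseSha)) {
    problems.push(`HEAD is ${head}, expected base ${expected.baseSha}`);
  }

  // --porcelain prints nothing when the tree is clean.
  if (git("status", "--porcelain") !== "") {
    problems.push("working tree is not clean");
  }

  return problems;
}
```

In a Node script the runner could be wired as `(args) => execFileSync("git", [...args], { encoding: "utf8" })` from `node:child_process`. A non-empty result is the halt-and-re-provision signal described above.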

## Why

Worktrees buy you parallel-agent isolation only if the agent actually
operates inside its own tree. A wrong-pwd edit breaks the cherry-pick
contract and pollutes the main branch with WIP. Pinning pwd takes 4
bash calls and costs nothing.

Biome v2's "scan everything" default treats `.claude/worktrees/` as
ordinary source. The gitignore-is-enough assumption (true for git, npm,
pnpm) does not extend to biome v2. Either scope the invocation or add
the explicit exclusion — but document the choice so the next contributor
with sibling worktrees doesn't burn an hour on a phantom CI failure.
@@ -0,0 +1,46 @@
---
title: Verify npm package canonicality via the upstream repo README install command
tags: [npm, supply-chain, dependency-pinning, squatters]
session: session-e1d819
---

## Context

M5 Wave 1 wired `chonkie@^0.3.0` into `packages/pack/package.json` after
a 2026-05-05 research yaml. Reality: the npm namespace is split across
three plausible names — `chonkie-ts` (PolyerAI squatter, v0.0.1, 2.6 kB,
abandoned), the bare `chonkie` (chonkie-inc-owned but undocumented for
TS callers), and the canonical TS port `@chonkiejs/core@^0.0.9`. Only
the upstream `chonkie-inc/chonkiejs` README install command disambiguates.
T-W2-5 switched to `@chonkiejs/core` after checking that README (commit f6af735:
`chore(pack): switch chonkie dep to @chonkiejs/core@^0.0.9`).

## Lesson

Before pinning any npm dep — especially for an emergent library — open
the upstream repository's README and copy the literal `npm install` /
`pnpm add` line. The npm registry has stale squatters and unsuffixed
namesakes that look canonical but aren't. The upstream README is the
only authoritative source for "which package name does the maintainer
actually ship to". Apply this rule when:

- The package shows up in research yaml without a verified install command.
- A `-ts` / `-js` suffixed variant exists alongside the bare name.
- npm-side metadata (last publish, weekly downloads, deps) looks thin.

Concrete checks for a candidate dep:

1. Pull the repo README and grep for `npm install` / `pnpm add` / `yarn add`.
2. Cross-check the package.json `name` in the upstream repo against the
pinned name.
3. If the bare name and a scoped `@org/pkg` name both exist, prefer the
scoped name unless the README install line says otherwise.
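
Check 1 is easy to automate once the README text is in hand. The sketch below is illustrative only: `installTargets` and `isCanonical` are hypothetical helpers, the sample README string is made up for the example, and fetching the README is left out.

```ts
// Extract package names from `npm install` / `npm i` / `pnpm add` / `yarn add`
// lines in README text. Flags are skipped and version suffixes are dropped.
function installTargets(readme: string): string[] {
  const pattern =
    /\b(?:npm\s+(?:install|i)|pnpm\s+add|yarn\s+add)\s+([^\n`]+)/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(readme)) !== null) {
    for (const token of (match[1] ?? "").trim().split(/\s+/)) {
      if (token === "" || token.startsWith("-")) continue; // e.g. -D, --save-dev
      // A leading "@" is a scope; only a later "@" starts a version range.
      const at = token.lastIndexOf("@");
      const name = at > 0 ? token.slice(0, at) : token;
      if (!names.includes(name)) names.push(name);
    }
  }
  return names.sort();
}

// True when the pinned name appears in one of the README install commands.
function isCanonical(pinnedName: string, readme: string): boolean {
  return installTargets(readme).includes(pinnedName);
}

// Made-up README excerpt for illustration, not the real upstream text.
const sampleReadme = [
  "## Installation",
  "",
  "    npm install @chonkiejs/core",
  "",
  "or with pnpm:",
  "",
  "    pnpm add @chonkiejs/core@^0.0.9",
].join("\n");

console.log(installTargets(sampleReadme));
console.log(isCanonical("chonkie-ts", sampleReadme));
```

A `false` result is a prompt to stop and re-read the upstream README, not proof of squatting; checks 2 and 3 still apply.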

## Why

npm name-squatting is undefended; the registry has no concept of
"canonical port". The upstream maintainer's README is the only source
of truth that survives organization renames, scope migrations, and
abandoned forks. The check is cheap (one README fetch) and keeps a
2.6 kB stub or an undocumented unsuffixed namesake out of production.