3 changes: 3 additions & 0 deletions .erpaval/INDEX.md
@@ -22,6 +22,9 @@ development sessions. Solutions are reusable; specs are per-feature.
- [llms-txt config strings quietly anchor doc accuracy](solutions/conventions/llms-txt-as-ground-truth.md) — in a Starlight site with `starlight-llms-txt`, `astro.config.mjs` is more load-bearing than prose READMEs; audit it first in doc-sync sweeps.
- [tsconfig project references go stale on package removal](solutions/conventions/tsconfig-project-references-stale-on-package-removal.md) — root tsconfig `references` drift is invisible until a root-scoped tsc invocation hits; clean up in the same commit as the package delete.
- [Astro NODE_ENV in CI — set it at script scope, not step scope](solutions/conventions/astro-node-env-in-ci-script-scope.md) — mise-action + pnpm + astro chain loses CI-level NODE_ENV overrides; hard-code in package.json `build` script.
- [tree-sitter-wasms catalog is unusable with web-tree-sitter 0.26+](solutions/architecture-patterns/tree-sitter-wasms-catalog-incompat.md) — 0.1.13 artifacts use legacy `dylink` section, web-tree-sitter hard-requires `dylink.0`. Build your own WASMs and commit them.
- [pnpm install hangs on EFS workdir](solutions/best-practices/pnpm-install-on-efs.md) — 8+ min → 4.6s with `store-dir=/home/...` in `~/.npmrc` + `UV_USE_IO_URING=0`. Two stacked causes: cross-fs store and AL2023 io_uring bug.
- [Finch as docker shim via PATH for CLIs that shell out to `docker`](solutions/best-practices/finch-as-docker-shim.md) — 3-line shim unlocks `tree-sitter build --wasm -d` and similar tools on Amazon AL2023 devboxes.

## Specs

@@ -0,0 +1,70 @@
---
title: tree-sitter-wasms catalog package is unusable with web-tree-sitter 0.26+
tags: [tree-sitter, web-tree-sitter, wasm, dylink, parser-runtime, ingestion]
first_applied: 2026-05-08
repos: [opencodehub]
---

## The pattern

When a tree-sitter grammar npm package doesn't ship a `.wasm` alongside
its `.node` binding (kotlin `fwcd/tree-sitter-kotlin`, swift
`alex-pinkus/tree-sitter-swift`, dart `UserNobody14/tree-sitter-dart`),
the obvious workaround is the shared catalog package
`tree-sitter-wasms` which pre-builds `.wasm` for ~40 grammars in one
place.

**Do not reach for `tree-sitter-wasms@0.1.13` with
`web-tree-sitter@0.26+`. It won't load.**

## Why

`tree-sitter-wasms@0.1.13` (npm latest as of 2026-05-08) built its
`.wasm` artifacts with `tree-sitter-cli@0.20.8`, which emits the
legacy custom section named `dylink` (a 6-byte name). `web-tree-sitter@0.26+`
hard-requires the standardized `dylink.0` section name (8 bytes) and
throws `Error: need the dylink section to be first` at
`Language.load(path)`.

Byte-level verification:

```
$ xxd -l 32 node_modules/tree-sitter-python/tree-sitter-python.wasm
00000000: 0061 736d 0100 0000 0011 0864 796c 696e .asm.......dylin
00000010: 6b2e 3001 0694 c41a 0407 0001 2908 6001 k.0.........).`.

$ xxd -l 32 node_modules/tree-sitter-wasms/out/tree-sitter-kotlin.wasm
00000000: 0061 736d 0100 0000 000f 0664 796c 696e .asm.......dylin
00000010: 6ba8 87ee 0104 0200 0001 2908 6001 7f00 k.........).`.
```

The 11 per-grammar packages that DO ship their own `.wasm` (python,
typescript, javascript, go, rust, java, csharp, c, cpp, ruby, php)
were built with current tree-sitter-cli and use `dylink.0` — those
load cleanly.
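To repeat the byte-level check without `xxd`, the first section of a
`.wasm` file can be inspected directly. The sketch below is a
hypothetical helper (`firstCustomSectionName` is not part of
`web-tree-sitter` or opencodehub). It relies only on the WebAssembly
binary layout: an 8-byte header, then a section id byte (0 = custom),
a LEB128 payload size, and a LEB128-prefixed section name.

```typescript
// Hypothetical helper, not part of web-tree-sitter or opencodehub.
// Returns the name of the first section in a WebAssembly binary when that
// section is a custom section (id 0); returns null for any other section.
function firstCustomSectionName(bytes: Uint8Array): string | null {
  const magic = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
  if (bytes.length < 9 || magic.some((b, i) => bytes[i] !== b)) {
    throw new Error("not a WebAssembly binary");
  }
  let offset = 8; // skip 4-byte magic + 4-byte version

  // Decode one unsigned LEB128 integer (at most 5 bytes for a u32).
  const readU32 = (): number => {
    let result = 0;
    for (let shift = 0; shift < 35; shift += 7) {
      if (offset >= bytes.length) throw new Error("truncated LEB128");
      const byte = bytes[offset++];
      result += (byte & 0x7f) * 2 ** shift; // multiply to avoid int32 overflow
      if ((byte & 0x80) === 0) return result;
    }
    throw new Error("LEB128 value too long for u32");
  };

  const sectionId = bytes[offset++];
  if (sectionId !== 0) return null; // first section is not a custom section
  readU32(); // section payload size, not needed here
  const nameLength = readU32();
  if (offset + nameLength > bytes.length) {
    throw new Error("truncated section name");
  }
  let name = "";
  for (let i = offset; i < offset + nameLength; i++) {
    name += String.fromCharCode(bytes[i]);
  }
  return name;
}
```

In practice you would pass it the contents of a file read with
`fs.readFileSync`; a result of `"dylink"` flags a legacy artifact that
`web-tree-sitter@0.26+` will reject, while `"dylink.0"` matches the
per-grammar packages above.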

## Do this instead

Build your own `.wasm` blobs from the exact grammar sources your
package.json pins and commit them to the repo. See the opencodehub
implementation:

- `scripts/build-vendor-wasms.sh` — reproducible build via
tree-sitter CLI + docker/podman/finch/local emcc
- `packages/ingestion/vendor/wasms/{kotlin,swift,dart}.wasm` — committed
artifacts (8.1 MB total)
- `packages/ingestion/src/parse/wasm-fallback.ts` —
`resolveGrammarWasmPath` falls back to `vendor/wasms/` for these 3
languages when per-grammar `.wasm` isn't present

Zero grammar-version drift (built from same source as native), zero
install-time emscripten requirement (artifacts committed), zero CI-time
build (fast install everywhere).
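A minimal sketch of the fallback order described above, assuming
per-grammar packages lay out their blob as
`tree-sitter-<lang>/tree-sitter-<lang>.wasm` (as the python package in
the `xxd` example does; some packages use other file names).
`resolveWasmPathSketch` and its parameters are hypothetical; the real
`resolveGrammarWasmPath` in `wasm-fallback.ts` may differ. The existence
check is injected so the logic can run without touching disk.

```typescript
// Hypothetical sketch: per-grammar .wasm first, vendored .wasm second.
// Not the actual opencodehub implementation.
type FileExists = (path: string) => boolean;

// Languages whose npm packages ship no .wasm (per this note).
const VENDORED_LANGS = new Set(["kotlin", "swift", "dart"]);

function resolveWasmPathSketch(
  lang: string,
  nodeModulesDir: string,
  vendorDir: string,
  exists: FileExists,
): string {
  // 1. Prefer the .wasm shipped inside the grammar's own npm package.
  const perGrammar = `${nodeModulesDir}/tree-sitter-${lang}/tree-sitter-${lang}.wasm`;
  if (exists(perGrammar)) return perGrammar;

  // 2. Fall back to the committed artifact, only for the vendored languages.
  if (VENDORED_LANGS.has(lang)) {
    const vendored = `${vendorDir}/${lang}.wasm`;
    if (exists(vendored)) return vendored;
  }
  throw new Error(`no .wasm available for grammar "${lang}"`);
}
```

In real code `exists` would typically be `fs.existsSync`, and the
directories would be resolved relative to the installed package rather
than hard-coded.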

## Related

- ADR 0013 (`docs/adr/0013-parse-runtime-wasm-default.md`) records the
full WASM-default decision.
- Upstream publish blocker that forced the whole reshuffle:
[tree-sitter/node-tree-sitter#276](https://github.com/tree-sitter/node-tree-sitter/issues/276)
(Node 24 ABI break fix blocked on npm OIDC publish issue since 2025-06).
51 changes: 51 additions & 0 deletions .erpaval/solutions/best-practices/finch-as-docker-shim.md
@@ -0,0 +1,51 @@
---
title: Use finch as a drop-in docker via PATH shim on Amazon AL2023 devboxes
tags: [finch, docker, al2023, containers, emscripten, tree-sitter-cli]
first_applied: 2026-05-08
repos: [opencodehub]
---

## The pattern

CLIs that shell out to `docker` (like `tree-sitter build --wasm -d`,
which runs `docker run emscripten/emsdk ...`) don't know about Amazon
Finch. AL2023 devboxes typically invoke finch through a zsh alias for
`/usr/bin/sudo finch ...` and have no `docker` binary on PATH, so the
tool errors out with "You must have either emcc, docker, or podman on
your PATH".

Workaround: a 3-line shell shim.

## Fix

```bash
cat > /tmp/docker-shim.sh <<'EOF'
#!/usr/bin/env bash
exec sudo HOME=/home/$USER DOCKER_CONFIG=/home/$USER/.docker finch "$@"
EOF
chmod +x /tmp/docker-shim.sh
mkdir -p /tmp/docker-bin && ln -sf /tmp/docker-shim.sh /tmp/docker-bin/docker

PATH=/tmp/docker-bin:$PATH <your-tool-that-needs-docker>
```

Verified against `tree-sitter build --wasm -d` — finch pulled
`docker.io/emscripten/emsdk:3.1.64` (30 s), built kotlin/swift/dart
WASM grammars (~1 min each), output byte-identical to what a native
docker install would produce.
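The shim works because a command name without a slash is resolved by
scanning the `PATH` directories left to right, so prepending
`/tmp/docker-bin` makes `docker` resolve to the shim first. A minimal
sketch of that lookup; `whichSketch` is hypothetical, ignores the
executable-bit check and Windows `PATHEXT`, and takes an injected
existence check so it runs without touching disk.

```typescript
// Hypothetical sketch of POSIX-style PATH lookup (first match wins).
// Simplifications: no executable-bit check, no Windows PATHEXT handling.
type FileExists = (path: string) => boolean;

function whichSketch(command: string, pathVar: string, exists: FileExists): string | null {
  for (const dir of pathVar.split(":")) {
    // POSIX treats an empty entry as the current directory; skipped here for simplicity.
    if (dir === "") continue;
    const candidate = `${dir}/${command}`;
    if (exists(candidate)) return candidate;
  }
  return null;
}
```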

## Caveats

- Volume mounts work: `finch run -v /path:/path` behaves as it does under docker.
- The `sudo HOME=... DOCKER_CONFIG=...` wrapping matches Amazon's
standard finch alias — without it, finch writes container state to
`/root/` and breaks cache reuse.
- Warnings like `unsupported volume option "Z"` are harmless (SELinux
label option that finch/nerdctl ignores).

## When to reach for this

One-off container needs where installing Docker Desktop or podman would
be heavier than the task justifies: for example, pre-building WASM
artifacts to commit, running a one-shot emsdk compile, or testing
something in an official image such as `emscripten/emsdk`.
68 changes: 68 additions & 0 deletions .erpaval/solutions/best-practices/pnpm-install-on-efs.md
@@ -0,0 +1,68 @@
---
title: pnpm install hangs on Amazon EFS-mounted workdir without store-dir + UV_USE_IO_URING=0
tags: [pnpm, efs, nfs, al2023, devbox, install-performance]
first_applied: 2026-05-08
repos: [opencodehub]
---

## The pattern

`pnpm install` on an EFS-mounted working directory (typical Amazon
devbox setup where home is local but the source tree is under `/efs`)
will hang for 4-8 minutes with zero stdout, then eventually complete.
Two stacked causes:

1. **pnpm CAS store lands on EFS by default.** `pnpm store path` will
show something like `/efs/<user>/.pnpm-store/v10` when your HOME
resolves through EFS. Every CAS lookup becomes a ~22 ms NFS
round-trip (vs ~200 µs on local EBS/XFS) — a 100× latency gap.
With 800+ packages × dozens of files each, install is O(N) in NFS
stat/create syscalls.
2. **AL2023 kernel `io_uring` cleanup bug**
([amazonlinux/amazon-linux-2023#856](https://github.com/amazonlinux/amazon-linux-2023/issues/856))
causes Node processes to appear hung during cleanup. Symptom:
pnpm's progress output stops emitting; process shows 1% CPU; then
minutes later a flurry of "Progress: resolved X, reused Y" lines
pops out at once.
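A back-of-envelope check of cause 1, using the bonnie++ file-create
latencies from the sources below and assuming (hypothetically) 800
packages × 30 files each, with every file create paid serially. Real
installs overlap I/O and do other work, so treat this as a rough serial
estimate, not a model:

```latex
\begin{aligned}
N &= 800 \times 30 = 24{,}000 \ \text{file creates} \\
t_{\text{EFS}} &\approx 24{,}000 \times 22{,}516\,\mu\text{s} \approx 540\,\text{s} \approx 9\,\text{min} \\
t_{\text{EBS}} &\approx 24{,}000 \times 218\,\mu\text{s} \approx 5.2\,\text{s} \\
t_{\text{EFS}} / t_{\text{EBS}} &= 22{,}516 / 218 \approx 103
\end{aligned}
```

Under those assumptions the estimate lands in the same order of
magnitude as the before/after timings reported below.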

## Fix

**User-global `~/.npmrc`** (not committed to the repo — team members
on other hosts may want different tunings):

```
store-dir=/home/<user>/.local/share/pnpm-store
package-import-method=hardlink
```

**Shell env** for installing (add it to `~/.zshrc` and keep it there
until AL2023 backports the kernel fix):

```bash
export UV_USE_IO_URING=0
```

If you apply this change on an EFS workdir with an existing
`node_modules/`, pnpm will refuse to rebuild it without a TTY. Use
`CI=true pnpm install --no-frozen-lockfile` the first time so pnpm
can purge the old modules dir and repopulate it from the new store
location. After the first warm install, subsequent installs hardlink
from local XFS and finish in ~5 seconds.

## Verification

- Before: `pnpm install` → 8+ minutes, mostly silent
- After: `pnpm install --prefer-offline` → 4.6 seconds

Check that the store moved: `pnpm store path` should no longer return
an `/efs/...` path.
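Hard links only work within a single filesystem (the pnpm FAQ entry
listed under Sources covers the cross-filesystem fallback), so it can be
worth confirming whether a given `store-dir` and workdir share one. A
minimal sketch comparing `fs.statSync(...).dev` device IDs;
`sameFilesystem` is a hypothetical helper name, and both paths must
already exist.

```typescript
import { statSync } from "node:fs";

// Hypothetical helper: true when both existing paths live on the same
// device (filesystem), a precondition for hard links between them.
function sameFilesystem(pathA: string, pathB: string): boolean {
  return statSync(pathA).dev === statSync(pathB).dev;
}
```

For example, `sameFilesystem(storePath, process.cwd())`, with
`storePath` taken from `pnpm store path`, returns `false` when the store
is on local disk and the workdir is on EFS.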

## Sources

- pnpm FAQ — cross-filesystem store falls back to copy, not hardlink
- pnpm settings reference — `store-dir`, `package-import-method`,
`virtual-store-dir`
- kdgregory blog, "EFS Performance Take 3" — bonnie++ file-create
latency EFS 22,516 µs vs EBS 218 µs
- [amazonlinux/amazon-linux-2023#856](https://github.com/amazonlinux/amazon-linux-2023/issues/856)
— `UV_USE_IO_URING=0` workaround for io_uring hang
25 changes: 18 additions & 7 deletions .github/workflows/ci.yml
@@ -33,24 +33,35 @@ jobs:
- run: pnpm -r exec tsc --noEmit

test:
# Node 24 temporarily dropped from matrix: tree-sitter@0.25.0 fails to
# compile against Node 24's V8 ABI. Upstream fix landed in node-tree-sitter
# git tag v0.25.1 but is blocked on an npm OIDC publish issue
# (tree-sitter/node-tree-sitter#268, #276). Re-add `24` to the matrix once
# 0.25.1+ lands on npm. Types stay on @types/node@24.x so we surface any
# type-level Node 24 breakage early.
# Node 22 = native-opt-in path (OCH_NATIVE_PARSER=1); Node 24 = WASM default
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
node-version: [22, 24]
runs-on: ${{ matrix.os }}
env:
MISE_NODE_VERSION: ${{ matrix.node-version }}
steps:
- uses: actions/checkout@v6
- uses: jdx/mise-action@v4
- name: Ensure node-gyp is available for native tree-sitter build
if: matrix.node-version == 22
run: npm i -g node-gyp
- run: pnpm install --frozen-lockfile
# Node 22: let native tree-sitter grammars postinstall (scripts enabled)
# so the OCH_NATIVE_PARSER=1 test path has working N-API bindings.
# Node 24: skip postinstall — native grammars can't build against the
# Node 24 V8 ABI yet (tree-sitter/node-tree-sitter#276). WASM default
# doesn't need the N-API addons on disk.
- name: Install deps (Node 22, with postinstall)
if: matrix.node-version == 22
run: pnpm install --frozen-lockfile
- name: Install deps (Node 24, ignore-scripts)
if: matrix.node-version == 24
run: pnpm install --frozen-lockfile --ignore-scripts
- run: pnpm -r test
env:
OCH_NATIVE_PARSER: ${{ matrix.node-version == 22 && '1' || '' }}

sarif-validate:
runs-on: ubuntu-latest
20 changes: 20 additions & 0 deletions CLAUDE.md
@@ -39,3 +39,23 @@ This repo ships a Claude Code plugin at `plugins/opencodehub/` — it
provides `/probe`, `/verdict`, `/owners`, `/audit-deps`, `/rename` slash
commands plus a `code-analyst` subagent and 10 skills. Install via
`codehub init` (writes `.mcp.json` + links the plugin).

## Parse runtime — WASM default, native opt-in

`@opencodehub/ingestion` defaults to the `web-tree-sitter` (WASM) runtime
on both Node 22 and Node 24. To opt into the faster native `tree-sitter`
N-API addon on Node 22 dev boxes, set `OCH_NATIVE_PARSER=1` or pass
`--native-parser` to the `codehub` CLI. Native is not supported on
Node 24 until `node-tree-sitter@0.25.1` lands on npm
(tree-sitter/node-tree-sitter#276).
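The selection rule above can be sketched as a pure function. This is a
hypothetical illustration: `selectParseRuntime` and its inputs are not
the real ingestion API, only the value `"1"` is treated as opting in,
and falling back to WASM with a warning when native is requested but not
loadable is an assumption of this sketch.

```typescript
// Hypothetical sketch of the WASM-default / native-opt-in rule.
type ParseRuntime = "wasm" | "native";

interface RuntimeInputs {
  env: Record<string, string | undefined>; // e.g. process.env
  argv: readonly string[]; // CLI arguments
  nativeAvailable: boolean; // result of probing the N-API addon
}

function selectParseRuntime({ env, argv, nativeAvailable }: RuntimeInputs): {
  runtime: ParseRuntime;
  warning?: string;
} {
  const requested =
    env.OCH_NATIVE_PARSER === "1" || argv.indexOf("--native-parser") !== -1;
  if (!requested) return { runtime: "wasm" }; // default on Node 22 and Node 24
  if (nativeAvailable) return { runtime: "native" };
  // Assumption: degrade to WASM instead of failing hard.
  return {
    runtime: "wasm",
    warning: "native parser requested but unavailable; using WASM",
  };
}
```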

Kotlin, Swift, and Dart grammars use `.wasm` blobs vendored at
`packages/ingestion/vendor/wasms/` (built from the same grammar sources
pinned in `package.json`). Rebuild via `bash scripts/build-vendor-wasms.sh`
after bumping any of those grammars — requires docker, podman, finch
(aliased as docker), or a local emcc install.

The complexity phase (`packages/ingestion/src/pipeline/phases/complexity.ts`)
still uses native tree-sitter for cyclomatic-complexity metrics. On Node 24
or Node 22 without the opt-in, complexity extraction degrades with a
one-shot stderr warning; all other parsing continues via WASM.
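The one-shot degradation described above is a common pattern: warn on
the first call, then stay silent and keep returning `undefined`. A
minimal sketch, assuming a hypothetical `makeComplexityPhase` factory
with an injectable `warn` sink (the real phase writes to stderr); it is
not the code in `complexity.ts`.

```typescript
// Hypothetical sketch of "degrade with a one-shot warning".
type Warn = (message: string) => void;
type ComplexityFn = (source: string) => number;

function makeComplexityPhase(native: ComplexityFn | undefined, warn: Warn) {
  let warned = false;
  return (source: string): number | undefined => {
    if (native === undefined) {
      if (!warned) {
        warned = true; // only the first call emits the warning
        warn("complexity: native tree-sitter unavailable; skipping metrics");
      }
      return undefined; // the rest of the pipeline carries on
    }
    return native(source);
  };
}
```

Wired to stderr, this would look like
`makeComplexityPhase(undefined, (m) => process.stderr.write(m + "\n"))`.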
113 changes: 113 additions & 0 deletions docs/adr/0013-parse-runtime-wasm-default.md
@@ -0,0 +1,113 @@
# ADR 0013 — Parse runtime: WASM default, native opt-in

- Status: **Accepted** — 2026-05-08.
- Authors: Laith Al-Saadoon + Claude.
- Branch: `feat/node24-wasm-default`.
- Closes: GitHub issues #19 (`@types/node` 20→24), #23 (Node 24 CI matrix).
- Interacts with: the Dependabot unified bump PR #69 (merged 2026-05-08).

## Context

`@opencodehub/ingestion` used the native `tree-sitter` N-API addon as
the default parse runtime with a `web-tree-sitter` WASM fallback behind
an `OCH_WASM_ONLY=1` opt-in. Adding Node 24 to CI was blocked on an
upstream issue: `node-tree-sitter` 0.25.1 fixes the Node 24 ABI break
but the maintainers' npm OIDC publish has been failing since 2025-06
(tree-sitter/node-tree-sitter#276, still open as of 2026-05-08). We had
no visibility into an ETA.

Three downstream questions fell out:

1. How do we get Node 24 into CI without waiting on the publish?
2. Do we keep native as a supported path for Node 22 developer speed,
or drop it entirely?
3. What do we do about kotlin, swift, dart — the 3 grammar packages
whose npm tarballs ship only `.node` addons with no `.wasm` asset?

## Decision

**WASM is now the default parse runtime on both Node 22 and Node 24.
Native is an opt-in second path controlled by `OCH_NATIVE_PARSER=1` or
the `--native-parser` CLI flag.**

### Rationale for each question

**(Q1) Node 24.** WASM has no native ABI dependency, so it works on
Node 24 immediately. The CI `test` job now runs a `[ubuntu, macos,
windows] × [22, 24]` matrix (6 cells). Node 22 rows set
`OCH_NATIVE_PARSER=1` to exercise the native path; Node 24 rows leave
the env unset to exercise WASM. Both paths are tested every PR.
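The per-cell behavior, as wired in `ci.yml` (Node 22 installs with
postinstall scripts and sets `OCH_NATIVE_PARSER=1`; Node 24 installs
with `--ignore-scripts` and leaves it empty), can be sketched as a pure
function. `ciCellConfig` is hypothetical; the workflow YAML is
authoritative and expresses the env value as
`matrix.node-version == 22 && '1' || ''`.

```typescript
// Hypothetical mirror of the CI test-job matrix logic.
type Os = "ubuntu-latest" | "macos-latest" | "windows-latest";
type NodeMajor = 22 | 24;

interface CellConfig {
  installArgs: string[]; // arguments passed to pnpm
  ochNativeParser: "1" | ""; // value of OCH_NATIVE_PARSER for `pnpm -r test`
}

function ciCellConfig(node: NodeMajor): CellConfig {
  const native = node === 22; // only Node 22 can build the N-API addons
  return {
    installArgs: native
      ? ["install", "--frozen-lockfile"]
      : ["install", "--frozen-lockfile", "--ignore-scripts"],
    ochNativeParser: native ? "1" : "",
  };
}

const OSES: Os[] = ["ubuntu-latest", "macos-latest", "windows-latest"];
const NODES: NodeMajor[] = [22, 24];

// 3 OSes × 2 Node versions = 6 cells.
const cells: Array<{ os: Os; node: NodeMajor } & CellConfig> = [];
for (const os of OSES) {
  for (const node of NODES) {
    cells.push({ os, node, ...ciCellConfig(node) });
  }
}
```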

**(Q2) Native stays.** Native parsing is measurably faster than WASM
for large-repo indexing. On Node 22, developers still get that speed
via the opt-in. We did not drop the 13 `tree-sitter-<lang>` npm deps
from `packages/ingestion/package.json` — they remain installable, just
not default. `isNativeAvailable()` still probes them at runtime.
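A runtime probe in the spirit of `isNativeAvailable()` can be as simple
as attempting to load the addon and catching the failure. A minimal
sketch using Node's `createRequire`; `canLoadModule` is a hypothetical
name, and the real probe may check more (for example, each grammar
package).

```typescript
import { createRequire } from "node:module";
import { join } from "node:path";

// createRequire needs an absolute path to anchor resolution; the file
// itself does not have to exist, so resolve relative to the cwd.
const requireFromCwd = createRequire(join(process.cwd(), "probe.js"));

// Hypothetical probe: true if `name` resolves and loads without throwing.
// A missing package and an ABI-mismatched .node addon both throw here.
function canLoadModule(name: string): boolean {
  try {
    requireFromCwd(name);
    return true;
  } catch {
    return false;
  }
}
```

A call such as `canLoadModule("tree-sitter")` would return `false` on a
Node 24 install done with `--ignore-scripts`, since the addon was never
built.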

**(Q3) Kotlin / Swift / Dart.** Their npm packages ship only native
`.node` bindings. The obvious workaround — the `tree-sitter-wasms`
catalog package — is unusable: its 0.1.13 artifacts were built with
`tree-sitter-cli` 0.20.x, which emits the legacy `dylink` custom
section. `web-tree-sitter` 0.26+ hard-rejects anything that's not the
standardized `dylink.0` section. We verified this at the byte level
(python grammar ships `dylink.0`; tree-sitter-wasms ships `dylink` and
throws at load). So we build our own `.wasm` blobs once, from the
exact grammar sources we pin, and commit them to
`packages/ingestion/vendor/wasms/`. The build script at
`scripts/build-vendor-wasms.sh` reproduces the build via docker /
podman / finch / local emsdk and takes ~3 minutes end-to-end. Zero
grammar-version drift between native and WASM paths.

## Consequences

- **Node 24 is a first-class CI target.** Issue #23 closed.
- **Native-parser dispatch is explicit.** `parse-worker.ts` logs which
runtime it picked at worker startup; neither path is silent anymore.
- **Parity test covers all 14 tree-sitter languages** (was 3). The suite
skips cleanly when `isNativeAvailable()` returns false so Node 24 CI
runs it as a no-op; on Node 22 + `OCH_NATIVE_PARSER=1` it asserts
byte-identical ParseCapture output across runtimes.
- **Complexity phase has a documented degradation.** The cyclomatic-
complexity phase at `packages/ingestion/src/pipeline/phases/complexity.ts`
has an independent `requireFn("tree-sitter")` path that cannot use
WASM. When native is unavailable, it emits a one-shot stderr warning
and returns `undefined`; all other parsing continues. Upgrading this
to WASM is a follow-up (the current `ts-morph`-backed implementation
depends on native AST walking).
- **`vendor/wasms/` adds 8.1 MB to the repo.** Acceptable vs the
alternative (emsdk at install time on every dev box + CI runner).
- **Grammar bumps now require a WASM rebuild.** When we bump
`tree-sitter-kotlin` / `tree-sitter-swift` / `tree-sitter-dart` in
`package.json`, the `vendor/wasms/*.wasm` files must be rebuilt via
the committed script and re-committed. The parity test will catch
forgotten rebuilds on the Node 22 + opt-in CI row.
- **Old flag removed without deprecation shim.** `OCH_WASM_ONLY` is
gone; the M5 `--wasm-only` CLI flag becomes `--native-parser` (inverse
meaning). This was a fresh flag from the M5 release with zero
external consumers.

## Alternatives considered

- **Drop native entirely** — rejected; local dev speed still matters.
- **Pin to an older `web-tree-sitter`** that accepted legacy dylink —
rejected; pins us to an unmaintained line and doesn't solve future
per-grammar packages shipping `dylink.0`.
- **Use `tree-sitter-wasms` catalog as-is** — investigated, it doesn't
load. Documented above.
- **Build `.wasm` at install time via a postinstall** — requires emsdk
or docker on every developer machine; CI cache strategy becomes a
headache across the OS × Node matrix. Pre-committing the artifacts
is simpler, faster, more deterministic.
- **Ship kotlin / swift / dart as native-only** (WASM default for the
other 13) — considered after `tree-sitter-wasms` was ruled out.
Rejected because Amazon-internal Finch is available on dev boxes and
the build worked in one shot, making the extra 8.1 MB of vendored
wasms the cleaner long-term answer.

## References

- GitHub issue: tree-sitter/node-tree-sitter#276 (publish blocker,
still open 2026-05-08)
- Lesson: `.erpaval/solutions/architecture-patterns/parse-runtime-wasm-default.md`
(written post-merge)
- Session trace: `.erpaval/sessions/session-b4fcc7/`
4 changes: 3 additions & 1 deletion package.json
@@ -54,7 +54,9 @@
"tmp@<0.2.4": "0.2.4",
"dompurify@<3.4.0": "3.4.0",
"hono@<4.12.16": "4.12.16",
"ip-address@<10.1.1": "10.1.1"
"ip-address@<10.1.1": "10.1.1",
"fast-uri@<3.1.2": "3.1.2",
"fast-xml-builder@<1.1.7": "1.1.7"
},
"onlyBuiltDependencies": [
"@duckdb/node-api",