
feat(ingestion): WASM-default parse runtime + Node 24 CI matrix#70

Merged
theagenticguy merged 1 commit into main from feat/node24-wasm-default
May 8, 2026

Conversation

@theagenticguy
Owner

Summary

Flips @opencodehub/ingestion from native-default to WASM-default parser so Node 24 becomes a first-class CI target. Native tree-sitter stays fully supported as an opt-in via OCH_NATIVE_PARSER=1 / --native-parser for Node 22 dev speed.

The upstream blocker for native on Node 24 is tree-sitter/node-tree-sitter#276: the 0.25.1 fix was merged upstream, but its npm release has been blocked by an OIDC publish misconfiguration since mid-2025. Rather than wait indefinitely, the WASM path (which has no native ABI dependency) becomes the default.

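What this closes

  • Closes #19: @types/node was already bumped to 25.x in a prior PR; the real ask behind #19 was "get Node 24 runnable in CI", delivered here.
  • Closes #23: Node 24 in the test matrix, unblocked.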

What changed

Runtime dispatch

  • packages/ingestion/src/parse/parse-worker.ts — inverted: WASM runs by default; native is opt-in. Startup warning emits for both runtimes so runtime choice is always logged.
  • packages/cli/src/index.ts: --wasm-only → --native-parser (inverse meaning).
  • packages/ingestion/src/parse/parse-worker.test.ts (new) — 5 unit tests covering all dispatch branches.
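
For readers who want the dispatch rule in one place, here is a minimal sketch of the inverted default. It is not the code in parse-worker.ts: the names `ParseRuntime`, `DispatchInput`, and `selectParseRuntime` are hypothetical, the notice strings are invented, and falling back to WASM when native is requested but unavailable is an assumption. Only `OCH_NATIVE_PARSER=1` and `--native-parser` come from this PR.

```typescript
// Hypothetical sketch of the WASM-default dispatch; names are illustrative only.
type ParseRuntime = "wasm" | "native";

interface DispatchInput {
  env: Record<string, string | undefined>; // e.g. a copy of process.env
  nativeParserFlag: boolean;               // true when --native-parser was passed
  nativeAvailable: boolean;                // whether the native binding could be loaded
}

interface DispatchResult {
  runtime: ParseRuntime;
  notice: string; // logged at startup for both runtimes
}

function selectParseRuntime(input: DispatchInput): DispatchResult {
  const optedIn =
    input.nativeParserFlag || input.env["OCH_NATIVE_PARSER"] === "1";

  if (optedIn && input.nativeAvailable) {
    return { runtime: "native", notice: "parse runtime: native (opt-in)" };
  }
  if (optedIn) {
    // Assumption: an unavailable native binding degrades to WASM rather than failing.
    return {
      runtime: "wasm",
      notice: "parse runtime: WASM (native requested but unavailable)",
    };
  }
  return { runtime: "wasm", notice: "parse runtime: WASM (default)" };
}
```

Passing the environment and flag in as plain values keeps every branch testable without touching process state, which is one way a small unit-test file can cover all dispatch branches.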

Grammar resolution

  • packages/ingestion/src/parse/wasm-fallback.ts — two-stage cascade in resolveGrammarWasmPath:
    1. Per-grammar package lookup (11 languages that ship .wasm alongside .node)
    2. Vendored-WASM fallback for kotlin/swift/dart at packages/ingestion/vendor/wasms/
  • PHP mapping now uses tree-sitter-php_only.wasm to match the native loader's mod.php_only choice at grammar-registry.ts:253 (previously a silent native-vs-WASM divergence).
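
The two-stage cascade can be sketched as follows. This is an illustration under stated assumptions, not the body of resolveGrammarWasmPath: the function name `resolveGrammarWasmPathSketch`, the injected `packageDir` and `exists` helpers, and the `tree-sitter-<language>` package and file naming are all hypothetical. The vendored directory, the three vendored languages, and the `tree-sitter-php_only.wasm` file name are taken from this PR.

```typescript
// Hypothetical sketch of a per-package lookup followed by a vendored fallback.
type Exists = (path: string) => boolean;
type PackageDir = (pkg: string) => string | undefined;

const VENDORED_LANGUAGES = new Set(["kotlin", "swift", "dart"]);

function resolveGrammarWasmPathSketch(
  language: string,
  packageDir: PackageDir, // assumed helper: returns the installed package root, if any
  exists: Exists,         // assumed helper: file-existence check
  vendorDir = "packages/ingestion/vendor/wasms",
): string | undefined {
  // Stage 1: look for a .wasm shipped inside the grammar's own package.
  // PHP is special-cased so the WASM path matches the native php_only grammar.
  const wasmName =
    language === "php"
      ? "tree-sitter-php_only.wasm"
      : `tree-sitter-${language}.wasm`;
  const pkgRoot = packageDir(`tree-sitter-${language}`);
  if (pkgRoot !== undefined) {
    const candidate = `${pkgRoot}/${wasmName}`;
    if (exists(candidate)) return candidate;
  }

  // Stage 2: fall back to the vendored artifacts for the three grammars
  // whose packages do not ship a .wasm.
  if (VENDORED_LANGUAGES.has(language)) {
    const vendored = `${vendorDir}/${language}.wasm`;
    if (exists(vendored)) return vendored;
  }

  return undefined; // no WASM grammar found for this language
}
```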

Vendored WASM artifacts (new)

  • packages/ingestion/vendor/wasms/{kotlin,swift,dart}.wasm — 8.1 MB total, built from the exact grammar sources pinned in package.json (zero drift).
  • packages/ingestion/vendor/wasms/{README,LICENSES}.md — build provenance + MIT attribution for the three upstream grammars.
  • scripts/build-vendor-wasms.sh — reproducible rebuild via docker / podman / finch (as docker shim) / local emcc + tree-sitter-cli.

Why not the tree-sitter-wasms npm catalog?

Investigated and rejected: its 0.1.13 artifacts were built with tree-sitter-cli@0.20.x and ship the legacy dylink custom section (6 bytes). web-tree-sitter@0.26+ hard-requires the standardized dylink.0 (8 bytes) and throws Error: need the dylink section to be first. See ADR 0013 for byte-level verification and .erpaval/solutions/architecture-patterns/tree-sitter-wasms-catalog-incompat.md for the durable lesson.
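
The 6-byte versus 8-byte distinction is the length of the custom section's name ("dylink" versus "dylink.0"). A rough way to check which one an artifact carries is to read the name of the first section in the binary. The sketch below is illustrative and is not the verification recorded in ADR 0013; the function names are hypothetical, and it assumes the name is plain ASCII.

```typescript
// Hypothetical sketch: classify a .wasm by the name of its first custom section.
function readU32Leb(bytes: Uint8Array, offset: number): { value: number; next: number } {
  let value = 0;
  let shift = 0;
  let pos = offset;
  while (true) {
    if (pos >= bytes.length) throw new Error("truncated LEB128 value");
    const byte = bytes[pos++];
    value |= (byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) break;
    shift += 7;
    if (shift > 28) throw new Error("LEB128 value too long for u32");
  }
  return { value: value >>> 0, next: pos };
}

function firstCustomSectionName(wasm: Uint8Array): string | undefined {
  const magic = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
  if (wasm.length < 8 || magic.some((b, i) => wasm[i] !== b)) {
    throw new Error("not a WebAssembly binary");
  }
  let pos = 8; // skip the 4-byte magic and 4-byte version
  if (pos >= wasm.length) return undefined;
  const sectionId = wasm[pos++];
  const size = readU32Leb(wasm, pos); // section payload size (unused beyond skipping)
  if (sectionId !== 0) return undefined; // first section is not a custom section
  const nameLen = readU32Leb(wasm, size.next);
  const nameBytes = wasm.subarray(nameLen.next, nameLen.next + nameLen.value);
  // Assumption: section names of interest are ASCII.
  return Array.from(nameBytes, (b) => String.fromCharCode(b)).join("");
}

function classifyDylink(wasm: Uint8Array): "dylink.0" | "legacy-dylink" | "none" {
  const name = firstCustomSectionName(wasm);
  if (name === "dylink.0") return "dylink.0";
  if (name === "dylink") return "legacy-dylink";
  return "none";
}
```

An artifact classified as "legacy-dylink" here would be a candidate for a rebuild with a newer toolchain rather than use with a loader that expects dylink.0.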

Complexity phase degradation

  • packages/ingestion/src/pipeline/phases/complexity.ts — the cyclomatic-complexity phase has an independent native-only requireFn("tree-sitter") path that cannot use WASM. Now emits a one-shot stderr warning when native is unavailable instead of silently returning undefined.
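
A one-shot warning of this kind can be sketched with a small closure. This is not the code in complexity.ts; `makeWarnOnce` and `loadNativeOrWarn` are hypothetical names, the warning text is invented, and the loader is injected so the sketch does not depend on the real `requireFn`.

```typescript
// Hypothetical sketch: warn once when an optional native dependency is missing.
function makeWarnOnce(write: (msg: string) => void): (msg: string) => void {
  let warned = false;
  return (msg) => {
    if (warned) return; // later calls stay silent
    warned = true;
    write(msg);
  };
}

function loadNativeOrWarn<T>(
  load: () => T,               // e.g. a wrapper around requiring the native binding
  warn: (msg: string) => void, // typically the function returned by makeWarnOnce
): T | undefined {
  try {
    return load();
  } catch {
    warn("complexity phase skipped: native tree-sitter is unavailable");
    return undefined; // callers still see undefined, but no longer silently
  }
}
```

In a real phase the `write` callback would most likely be something like `(m) => process.stderr.write(m + "\n")`, so the message reaches stderr exactly once per process.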

Parity test expansion

  • packages/ingestion/src/parse/wasm-parity.test.ts — extended from 3 to 14 tree-sitter languages (COBOL stays out, it's regex-only). Hard assert.ok(isNativeAvailable()) softened to per-test skip so the suite runs clean on Node 24 CI as a no-op.
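
The shift from a hard assertion to a per-test skip can be illustrated by computing a skip reason per language and handing it to the test runner. The sketch below is hypothetical (`ParityCase`, `planParityCases`, and the reason text are not from the PR) and leaves the runner out so it stays self-contained.

```typescript
// Hypothetical sketch: plan parity cases with a per-test skip reason instead of
// failing the whole suite when the native binding is missing.
interface ParityCase {
  language: string;
  skip: string | false; // false = run the test; string = skip with this reason
}

function planParityCases(
  languages: readonly string[],
  nativeAvailable: boolean,
): ParityCase[] {
  const reason: string | false = nativeAvailable
    ? false
    : "native tree-sitter unavailable; nothing to compare WASM output against";
  return languages.map((language) => ({ language, skip: reason }));
}

// With node:test, each case could then be registered roughly like this
// (the `skip` option accepts a boolean or a reason string):
//
//   for (const c of planParityCases(langs, isNativeAvailable())) {
//     test(`parity: ${c.language}`, { skip: c.skip }, () => { /* compare outputs */ });
//   }
```

Skipping per test keeps the suite's exit code at 0 on a runtime without native bindings while still reporting why each comparison did not run.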

CI matrix

  • .github/workflows/ci.yml: test job now runs [ubuntu, macos, windows] × [22, 24] via MISE_NODE_VERSION. Node 22 rows install with scripts + set OCH_NATIVE_PARSER=1 (exercises native path); Node 24 rows use --ignore-scripts + leave env unset (exercises WASM default).

Security fixes (surfaced by OSV rescan)

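Two pre-existing transitive vulns closed in passing via pnpm.overrides:

  • fast-xml-builder@<1.1.7 → 1.1.7 (GHSA-5wm8-gmm8-39j9 CVSS 8.7, GHSA-45c6-75p6-83cc CVSS 6.1), transitive via @aws-sdk/core
  • fast-uri@<3.1.2 → 3.1.2 (GHSA-v39h-62p7-jpjc CVSS 7.5, GHSA-q3j6-qgpj-74h6 CVSS 7.5), transitive via ajv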

Both were present on main before this PR; added overrides because the rescan caught them and it costs 2 lines to close.

Docs + lessons

  • docs/adr/0013-parse-runtime-wasm-default.md — architectural decision record
  • CLAUDE.md — new section documenting OCH_NATIVE_PARSER, vendored WASMs, complexity.ts caveat
  • .erpaval/solutions/ — three Compound lessons (WASM catalog incompat, pnpm-on-EFS, finch as docker shim)
  • .erpaval/INDEX.md — updated pointers

Test plan

  • pnpm -r clean && pnpm -r build && pnpm -r exec tsc --noEmit && pnpm -r test — green on Node 22 WASM default (572/572 ingestion, 225/225 CLI, all other packages green)
  • OCH_NATIVE_PARSER=1 pnpm --filter @opencodehub/ingestion test — green (regression gate, 572/572)
  • pnpm run lint && pnpm run banned-strings — green
  • Simulated native-missing: all 15 parity iterations skip cleanly with descriptive reason, suite exit 0 (proves Node 24 CI won't fail on the parity test)
  • osv-scanner scan source --lockfile=pnpm-lock.yaml: No issues found
  • L2 code review (opus): 4 findings surfaced, 3 fixed in-branch (comment precision, MIT attribution, Node 24 CI --ignore-scripts), 1 accepted (ruby #match? predicate coverage gap, non-blocker)
  • CI matrix all-green on push — verify before merge

Session trace

.erpaval/sessions/session-b4fcc7/ — full classifier trace, explore + research packets, per-AC work logs, validation verdict, extracted lessons.

Flips @opencodehub/ingestion from native-default to WASM-default parser
so Node 24 becomes a first-class CI target. Native tree-sitter stays
fully supported as an opt-in via OCH_NATIVE_PARSER=1 / --native-parser
for Node 22 dev speed.

Key changes:
- parse-worker.ts: inverted dispatch; WASM runs by default, native is
  opt-in, startup warning emits for both runtimes
- wasm-fallback.ts: two-stage cascade in resolveGrammarWasmPath — per-
  grammar .wasm for 11 langs, vendored .wasm for kotlin/swift/dart
- Vendor kotlin/swift/dart WASMs at packages/ingestion/vendor/wasms/
  (built from pinned grammar sources, zero drift; 8.1 MB total)
- scripts/build-vendor-wasms.sh: reproducible rebuild via finch/docker/
  podman/emcc + tree-sitter-cli
- PHP mapping now uses tree-sitter-php_only.wasm (matches native loader)
- complexity.ts: one-shot degradation warning when native unavailable
- wasm-parity.test.ts: extended from 3 to 14 languages; native-prereq
  softened to test.skip so Node 24 CI runs clean
- ci.yml test job: 3 OS × 2 Node (22, 24) via MISE_NODE_VERSION; Node 22
  rows exercise native via OCH_NATIVE_PARSER=1, Node 24 rows exercise
  WASM default with --ignore-scripts
- CLAUDE.md + ADR 0013 document the runtime + flag story
- Transitive CVE fixes surfaced by OSV: fast-xml-builder → 1.1.7,
  fast-uri → 3.1.2 (pre-existing on main, closed in passing)

Closes #19, #23.

Tests: 572/572 ingestion default (WASM), 572/572 with OCH_NATIVE_PARSER=1,
all other workspace packages green. OSV: No issues found.
@theagenticguy theagenticguy enabled auto-merge (squash) May 8, 2026 20:52
@theagenticguy theagenticguy merged commit 704fd67 into main May 8, 2026
17 checks passed
@theagenticguy theagenticguy deleted the feat/node24-wasm-default branch May 8, 2026 21:36
theagenticguy added a commit that referenced this pull request May 8, 2026
Rebase onto main brought in PR #70's transitive-CVE overrides
(fast-xml-builder@1.1.7, fast-uri@3.1.2, hono@4.12.16, ip-address@10.1.1).
Regenerating the lockfile pulls those in alongside the M5/M6 pack deps.

No source changes — build + typecheck + tests + banned-strings all green
locally before push.
theagenticguy added a commit that referenced this pull request May 10, 2026
## Summary

Flips `@opencodehub/ingestion` from **native-default** to
**WASM-default** parser so Node 24 becomes a first-class CI target.
Native `tree-sitter` stays fully supported as an opt-in via
`OCH_NATIVE_PARSER=1` / `--native-parser` for Node 22 dev speed.

The upstream blocker for native on Node 24 is
[`tree-sitter/node-tree-sitter#276`](tree-sitter/node-tree-sitter#276):
the 0.25.1 fix was merged upstream, but its npm release has been blocked
by an OIDC publish misconfiguration since mid-2025. Rather than wait
indefinitely, the WASM path (which has no native ABI dependency) becomes
the default.

### What this closes

- **Closes #19** — `@types/node` was already bumped to 25.x in a prior
PR; the real ask behind #19 was "get Node 24 runnable in CI" — delivered
here.
- **Closes #23** — Node 24 in the `test` matrix, unblocked.

## What changed

### Runtime dispatch
- `packages/ingestion/src/parse/parse-worker.ts` — inverted: WASM runs
by default; native is opt-in. Startup warning emits for **both**
runtimes so runtime choice is always logged.
- `packages/cli/src/index.ts` — `--wasm-only` → `--native-parser`
(inverse meaning).
- `packages/ingestion/src/parse/parse-worker.test.ts` (new) — 5 unit
tests covering all dispatch branches.

### Grammar resolution
- `packages/ingestion/src/parse/wasm-fallback.ts` — two-stage cascade in
`resolveGrammarWasmPath`:
1. Per-grammar package lookup (11 languages that ship `.wasm` alongside
`.node`)
2. Vendored-WASM fallback for kotlin/swift/dart at
`packages/ingestion/vendor/wasms/`
- PHP mapping now uses `tree-sitter-php_only.wasm` to match the native
loader's `mod.php_only` choice at `grammar-registry.ts:253` (previously
a silent native-vs-WASM divergence).

### Vendored WASM artifacts (new)
- `packages/ingestion/vendor/wasms/{kotlin,swift,dart}.wasm` — 8.1 MB
total, built from the exact grammar sources pinned in `package.json`
(zero drift).
- `packages/ingestion/vendor/wasms/{README,LICENSES}.md` — build
provenance + MIT attribution for the three upstream grammars.
- `scripts/build-vendor-wasms.sh` — reproducible rebuild via docker /
podman / finch (as docker shim) / local emcc + tree-sitter-cli.

### Why not the `tree-sitter-wasms` npm catalog?
Investigated and **rejected**: its 0.1.13 artifacts were built with
`tree-sitter-cli@0.20.x` and ship the legacy `dylink` custom section (6
bytes). `web-tree-sitter@0.26+` hard-requires the standardized
`dylink.0` (8 bytes) and throws `Error: need the dylink section to be
first`. See ADR 0013 for byte-level verification and
`.erpaval/solutions/architecture-patterns/tree-sitter-wasms-catalog-incompat.md`
for the durable lesson.

### Complexity phase degradation
- `packages/ingestion/src/pipeline/phases/complexity.ts` — the
cyclomatic-complexity phase has an independent native-only
`requireFn("tree-sitter")` path that cannot use WASM. Now emits a
one-shot stderr warning when native is unavailable instead of silently
returning `undefined`.

### Parity test expansion
- `packages/ingestion/src/parse/wasm-parity.test.ts` — extended from 3
to 14 tree-sitter languages (COBOL stays out, it's regex-only). Hard
`assert.ok(isNativeAvailable())` softened to per-test `skip` so the
suite runs clean on Node 24 CI as a no-op.

### CI matrix
- `.github/workflows/ci.yml` — `test` job now runs `[ubuntu, macos,
windows] × [22, 24]` via `MISE_NODE_VERSION`. Node 22 rows install with
scripts + set `OCH_NATIVE_PARSER=1` (exercises native path); Node 24
rows use `--ignore-scripts` + leave env unset (exercises WASM default).

### Security fixes (surfaced by OSV rescan)
Two pre-existing transitive vulns closed in passing via
`pnpm.overrides`:
- `fast-xml-builder@<1.1.7 → 1.1.7` (GHSA-5wm8-gmm8-39j9 CVSS 8.7,
GHSA-45c6-75p6-83cc CVSS 6.1) — transitive via `@aws-sdk/core`
- `fast-uri@<3.1.2 → 3.1.2` (GHSA-v39h-62p7-jpjc CVSS 7.5,
GHSA-q3j6-qgpj-74h6 CVSS 7.5) — transitive via `ajv`

Both were present on `main` before this PR; added overrides because the
rescan caught them and it costs 2 lines to close.

### Docs + lessons
- `docs/adr/0013-parse-runtime-wasm-default.md` — architectural decision
record
- `CLAUDE.md` — new section documenting `OCH_NATIVE_PARSER`, vendored
WASMs, complexity.ts caveat
- `.erpaval/solutions/` — three Compound lessons (WASM catalog incompat,
pnpm-on-EFS, finch as docker shim)
- `.erpaval/INDEX.md` — updated pointers

## Test plan

- [x] `pnpm -r clean && pnpm -r build && pnpm -r exec tsc --noEmit &&
pnpm -r test` — green on Node 22 WASM default (572/572 ingestion,
225/225 CLI, all other packages green)
- [x] `OCH_NATIVE_PARSER=1 pnpm --filter @opencodehub/ingestion test` —
green (regression gate, 572/572)
- [x] `pnpm run lint && pnpm run banned-strings` — green
- [x] Simulated native-missing: all 15 parity iterations skip cleanly
with descriptive reason, suite exit 0 (proves Node 24 CI won't fail on
the parity test)
- [x] `osv-scanner scan source --lockfile=pnpm-lock.yaml` — **No issues
found**
- [x] L2 code review (opus): 4 findings surfaced, 3 fixed in-branch
(comment precision, MIT attribution, Node 24 CI `--ignore-scripts`), 1
accepted (ruby `#match?` predicate coverage gap, non-blocker)
- [ ] CI matrix all-green on push — verify before merge

## Session trace

`.erpaval/sessions/session-b4fcc7/` — full classifier trace, explore +
research packets, per-AC work logs, validation verdict, extracted
lessons.
Development

Successfully merging this pull request may close these issues.

ci: add Node 24 to test matrix once tree-sitter@0.25.1 lands on npm
deps: bump @types/node 20 → 24 (Node 24 LTS)

1 participant