feat(ingestion): WASM-default parse runtime + Node 24 CI matrix #70
Merged
Flips @opencodehub/ingestion from native-default to WASM-default parser so Node 24 becomes a first-class CI target. Native tree-sitter stays fully supported as an opt-in via OCH_NATIVE_PARSER=1 / --native-parser for Node 22 dev speed.

Key changes:
- parse-worker.ts: inverted dispatch; WASM runs by default, native is opt-in, startup warning emits for both runtimes
- wasm-fallback.ts: two-stage cascade in resolveGrammarWasmPath (per-grammar .wasm for 11 langs, vendored .wasm for kotlin/swift/dart)
- Vendor kotlin/swift/dart WASMs at packages/ingestion/vendor/wasms/ (built from pinned grammar sources, zero drift; 8.1 MB total)
- scripts/build-vendor-wasms.sh: reproducible rebuild via finch/docker/podman/emcc + tree-sitter-cli
- PHP mapping now uses tree-sitter-php_only.wasm (matches native loader)
- complexity.ts: one-shot degradation warning when native unavailable
- wasm-parity.test.ts: extended from 3 to 14 languages; native-prereq softened to test.skip so Node 24 CI runs clean
- ci.yml test job: 3 OS × 2 Node (22, 24) via MISE_NODE_VERSION; Node 22 rows exercise native via OCH_NATIVE_PARSER=1, Node 24 rows exercise WASM default with --ignore-scripts
- CLAUDE.md + ADR 0013 document the runtime + flag story
- Transitive CVE fixes surfaced by OSV: fast-xml-builder → 1.1.7, fast-uri → 3.1.2 (pre-existing on main, closed in passing)

Closes #19, #23.

Tests: 572/572 ingestion default (WASM), 572/572 with OCH_NATIVE_PARSER=1, all other workspace packages green. OSV: No issues found.
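
The inverted dispatch described in this message can be illustrated with a minimal sketch. Everything here is hypothetical: the function and type names, the warning strings, and in particular the behavior when native is requested but unavailable are assumptions, not the contents of `parse-worker.ts`. Only the `OCH_NATIVE_PARSER=1` variable, the `--native-parser` flag, and the "warn for both runtimes" rule come from the PR text.

```typescript
// Hypothetical sketch: names and fallback behavior are assumptions,
// not the actual contents of parse-worker.ts.
type ParseRuntime = "wasm" | "native";

interface DispatchInput {
  /** Value of the --native-parser CLI flag, if the caller parsed one. */
  nativeParserFlag?: boolean;
  /** Environment variables; in Node this would be process.env. */
  env: Record<string, string | undefined>;
  /** Whether the native tree-sitter binding could be loaded. */
  nativeAvailable: boolean;
}

interface DispatchResult {
  runtime: ParseRuntime;
  /** Startup message, produced for both runtimes so the choice is always logged. */
  warning: string;
}

function selectParseRuntime(input: DispatchInput): DispatchResult {
  const nativeRequested =
    input.nativeParserFlag === true || input.env["OCH_NATIVE_PARSER"] === "1";

  // WASM is the default: it is used whenever native was not explicitly requested.
  if (!nativeRequested) {
    return { runtime: "wasm", warning: "[parse] using WASM runtime (default)" };
  }
  if (input.nativeAvailable) {
    return { runtime: "native", warning: "[parse] using native runtime (opt-in)" };
  }
  // Assumption: a native request that cannot be honored degrades to WASM
  // rather than failing; the real worker may behave differently.
  return {
    runtime: "wasm",
    warning: "[parse] native runtime requested but unavailable; using WASM",
  };
}
```

Passing the environment and availability in as arguments, rather than reading globals inside the function, keeps every branch easy to cover in a unit test.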
theagenticguy added a commit that referenced this pull request on May 8, 2026
Rebase onto main brought in PR #70's transitive-CVE overrides (fast-xml-builder@1.1.7, fast-uri@3.1.2, hono@4.12.16, ip-address@10.1.1). Regenerating the lockfile pulls those in alongside the M5/M6 pack deps. No source changes — build + typecheck + tests + banned-strings all green locally before push.
theagenticguy added a commit that referenced this pull request on May 10, 2026
## Summary

Flips `@opencodehub/ingestion` from **native-default** to **WASM-default** parser so Node 24 becomes a first-class CI target. Native `tree-sitter` stays fully supported as an opt-in via `OCH_NATIVE_PARSER=1` / `--native-parser` for Node 22 dev speed.

The upstream blocker for native on Node 24 is [`tree-sitter/node-tree-sitter#276`](tree-sitter/node-tree-sitter#276): the 0.25.1 fix merged upstream but has been blocked on an npm OIDC publish misconfiguration since mid-2025. Rather than wait indefinitely, the WASM path (which has no native ABI dependency) becomes the default.

### What this closes

- **Closes #19**: `@types/node` was already bumped to 25.x in a prior PR; the real ask behind #19 was "get Node 24 runnable in CI", which is delivered here.
- **Closes #23**: Node 24 in the `test` matrix, unblocked.

## What changed

### Runtime dispatch

- `packages/ingestion/src/parse/parse-worker.ts`: dispatch inverted; WASM runs by default and native is opt-in. Startup warning emits for **both** runtimes so runtime choice is always logged.
- `packages/cli/src/index.ts`: `--wasm-only` → `--native-parser` (inverse meaning).
- `packages/ingestion/src/parse/parse-worker.test.ts` (new): 5 unit tests covering all dispatch branches.

### Grammar resolution

- `packages/ingestion/src/parse/wasm-fallback.ts`: two-stage cascade in `resolveGrammarWasmPath`:
  1. Per-grammar package lookup (11 languages that ship `.wasm` alongside `.node`)
  2. Vendored-WASM fallback for kotlin/swift/dart at `packages/ingestion/vendor/wasms/`
- PHP mapping now uses `tree-sitter-php_only.wasm` to match the native loader's `mod.php_only` choice at `grammar-registry.ts:253` (previously a silent native-vs-WASM divergence).

### Vendored WASM artifacts (new)

- `packages/ingestion/vendor/wasms/{kotlin,swift,dart}.wasm`: 8.1 MB total, built from the exact grammar sources pinned in `package.json` (zero drift).
- `packages/ingestion/vendor/wasms/{README,LICENSES}.md`: build provenance + MIT attribution for the three upstream grammars.
- `scripts/build-vendor-wasms.sh`: reproducible rebuild via docker / podman / finch (as docker shim) / local emcc + tree-sitter-cli.

### Why not the `tree-sitter-wasms` npm catalog?

Investigated and **rejected**: its 0.1.13 artifacts were built with `tree-sitter-cli@0.20.x` and ship the legacy `dylink` custom section (6 bytes). `web-tree-sitter@0.26+` hard-requires the standardized `dylink.0` (8 bytes) and throws `Error: need the dylink section to be first`. See ADR 0013 for byte-level verification and `.erpaval/solutions/architecture-patterns/tree-sitter-wasms-catalog-incompat.md` for the durable lesson.

### Complexity phase degradation

- `packages/ingestion/src/pipeline/phases/complexity.ts`: the cyclomatic-complexity phase has an independent native-only `requireFn("tree-sitter")` path that cannot use WASM. Now emits a one-shot stderr warning when native is unavailable instead of silently returning `undefined`.

### Parity test expansion

- `packages/ingestion/src/parse/wasm-parity.test.ts`: extended from 3 to 14 tree-sitter languages (COBOL stays out, it's regex-only). Hard `assert.ok(isNativeAvailable())` softened to per-test `skip` so the suite runs clean on Node 24 CI as a no-op.

### CI matrix

- `.github/workflows/ci.yml`: `test` job now runs `[ubuntu, macos, windows] × [22, 24]` via `MISE_NODE_VERSION`. Node 22 rows install with scripts + set `OCH_NATIVE_PARSER=1` (exercises native path); Node 24 rows use `--ignore-scripts` + leave env unset (exercises WASM default).

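
The `dylink` versus `dylink.0` distinction from the catalog section above can be checked directly on a `.wasm` file. The following is a minimal sketch, not the verification procedure from ADR 0013: the function names are hypothetical, and it only relies on the WebAssembly binary layout (an 8-byte header, then sections made of an id byte, a LEB128 size, and, for custom sections with id 0, a length-prefixed name). The section names `dylink` and `dylink.0` are 6 and 8 bytes long, matching the byte counts quoted above.

```typescript
// Sketch only: function names are hypothetical. It inspects the first section
// of a WebAssembly binary and reports which dylink flavor, if any, it carries.
type DylinkFlavor = "dylink.0" | "dylink" | "none";

/** Decodes an unsigned LEB128 integer (at most 5 bytes) starting at `offset`. */
function readVarUint32(bytes: Uint8Array, offset: number): { value: number; next: number } {
  let value = 0;
  let pos = offset;
  for (let shift = 0; shift < 35; shift += 7) {
    if (pos >= bytes.length) throw new Error("truncated LEB128 value");
    const byte = bytes[pos] ?? 0;
    pos += 1;
    value += (byte & 0x7f) * 2 ** shift;
    if ((byte & 0x80) === 0) return { value, next: pos };
  }
  throw new Error("LEB128 value too long for u32");
}

function detectDylinkFlavor(bytes: Uint8Array): DylinkFlavor {
  const magic = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
  if (bytes.length < 8 || !magic.every((b, i) => bytes[i] === b)) {
    throw new Error("not a WebAssembly binary");
  }
  let pos = 8; // skip magic (4 bytes) and version (4 bytes)
  // Only a custom section (id 0) in first position can be a dylink section.
  if (pos >= bytes.length || bytes[pos] !== 0) return "none";
  pos += 1;
  const sectionSize = readVarUint32(bytes, pos); // read only to advance past it
  const nameLength = readVarUint32(bytes, sectionSize.next);
  let name = "";
  for (let i = 0; i < nameLength.value; i += 1) {
    name += String.fromCharCode(bytes[nameLength.next + i] ?? 0);
  }
  if (name === "dylink.0") return "dylink.0";
  if (name === "dylink") return "dylink";
  return "none";
}
```

In Node, the bytes could come from `fs.readFileSync(path)`, since a `Buffer` is a `Uint8Array`. A result of `"dylink"` would flag an artifact built with the legacy section.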
### Security fixes (surfaced by OSV rescan)

Two pre-existing transitive vulns closed in passing via `pnpm.overrides`:

- `fast-xml-builder@<1.1.7 → 1.1.7` (GHSA-5wm8-gmm8-39j9 CVSS 8.7, GHSA-45c6-75p6-83cc CVSS 6.1), transitive via `@aws-sdk/core`
- `fast-uri@<3.1.2 → 3.1.2` (GHSA-v39h-62p7-jpjc CVSS 7.5, GHSA-q3j6-qgpj-74h6 CVSS 7.5), transitive via `ajv`

Both were present on `main` before this PR; added overrides because the rescan caught them and it costs 2 lines to close.

### Docs + lessons

- `docs/adr/0013-parse-runtime-wasm-default.md`: architectural decision record
- `CLAUDE.md`: new section documenting `OCH_NATIVE_PARSER`, vendored WASMs, complexity.ts caveat
- `.erpaval/solutions/`: three Compound lessons (WASM catalog incompat, pnpm-on-EFS, finch as docker shim)
- `.erpaval/INDEX.md`: updated pointers

## Test plan

- [x] `pnpm -r clean && pnpm -r build && pnpm -r exec tsc --noEmit && pnpm -r test`: green on Node 22 WASM default (572/572 ingestion, 225/225 CLI, all other packages green)
- [x] `OCH_NATIVE_PARSER=1 pnpm --filter @opencodehub/ingestion test`: green (regression gate, 572/572)
- [x] `pnpm run lint && pnpm run banned-strings`: green
- [x] Simulated native-missing: all 15 parity iterations skip cleanly with descriptive reason, suite exit 0 (proves Node 24 CI won't fail on the parity test)
- [x] `osv-scanner scan source --lockfile=pnpm-lock.yaml`: **No issues found**
- [x] L2 code review (opus): 4 findings surfaced, 3 fixed in-branch (comment precision, MIT attribution, Node 24 CI `--ignore-scripts`), 1 accepted (ruby `#match?` predicate coverage gap, non-blocker)
- [ ] CI matrix all-green on push (verify before merge)

## Session trace

`.erpaval/sessions/session-b4fcc7/`: full classifier trace, explore + research packets, per-AC work logs, validation verdict, extracted lessons.
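
For readers who want a concrete picture of the two-stage cascade listed under "Grammar resolution", a rough sketch follows. It is not the real `resolveGrammarWasmPath`: the signature, the injected `exists` probe, and the `<language>.wasm` naming inside the vendor directory are assumptions made for illustration (the PR only states that the vendored files are `{kotlin,swift,dart}.wasm` under `packages/ingestion/vendor/wasms/`).

```typescript
// Hypothetical sketch of a two-stage lookup; the real resolveGrammarWasmPath
// in wasm-fallback.ts may differ in signature and path layout.
interface GrammarWasmLookup {
  /** Candidate .wasm path inside the grammar's npm package, if it ships one. */
  packageWasmPath?: string;
  /** Directory holding vendored artifacts, e.g. packages/ingestion/vendor/wasms. */
  vendorDir: string;
  /** File-existence probe; in Node this could be fs.existsSync. */
  exists: (path: string) => boolean;
}

function resolveGrammarWasmPathSketch(
  language: string,
  lookup: GrammarWasmLookup,
): string | null {
  // Stage 1: prefer the .wasm shipped by the per-grammar package.
  if (lookup.packageWasmPath !== undefined && lookup.exists(lookup.packageWasmPath)) {
    return lookup.packageWasmPath;
  }
  // Stage 2: fall back to the vendored artifact named after the language.
  const vendored = `${lookup.vendorDir}/${language}.wasm`;
  return lookup.exists(vendored) ? vendored : null;
}
```

Returning `null` when both stages miss leaves the caller to decide whether a missing grammar is an error or a skipped language.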