perf(embeddings): cross-node batching + worker pool #33

Merged
theagenticguy merged 1 commit into main from feat/embeddings-parallel-batching
Apr 27, 2026

Conversation

@theagenticguy
Owner

Summary

  • Refactors the embeddings phase from one-embedding-per-node-per-await into two stages: a job-collection pass that walks symbol/file/community tiers in canonical order producing {text, emitRow} records, and a dispatch loop that fires workers × batchSize embeds concurrently per wave and scatters vectors back into the row buffer.
  • Adds a Piscina pool of independent OnnxEmbedder workers (packages/ingestion/src/pipeline/phases/embedder-{worker,pool}.ts). Each worker holds its own ONNX session; the pool is exposed behind an Embedder-shaped facade so the phase doesn't branch. A main-thread canary OnnxEmbedder opens first so EmbedderNotSetupError keeps its class identity across the structured-clone boundary.
  • New flags: --embeddings-workers <n|auto> and --embeddings-batch-size <n> (defaults: 1 and 32 — unchanged single-threaded behaviour out of the box).
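
The two stages described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only `embedBatch`, `text`, and `emitRow` are names taken from the description, while the `Embedder` and `EmbedJob` interfaces, `toBatches`, and `dispatch` are hypothetical. A plain object stands in for the Piscina-backed facade.

```typescript
// Hypothetical shapes; the PR only names `embedBatch`, `text`, and `emitRow`.
interface Embedder {
  embedBatch(texts: string[]): Promise<number[][]>;
}

interface EmbedJob {
  text: string;
  // Writes the finished vector into the row buffer.
  emitRow(vector: number[]): void;
}

// Split the collected jobs into fixed-size batches, preserving order.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Dispatch in waves of `workers` batches. Each wave embeds up to
// workers × batchSize texts concurrently, then scatters the vectors
// back to their jobs in the original collection order.
async function dispatch(
  jobs: EmbedJob[],
  embedder: Embedder,
  workers: number,
  batchSize: number,
): Promise<void> {
  const batches = toBatches(jobs, batchSize);
  for (let w = 0; w < batches.length; w += workers) {
    const wave = batches.slice(w, w + workers);
    const results = await Promise.all(
      wave.map((batch) => embedder.embedBatch(batch.map((job) => job.text))),
    );
    wave.forEach((batch, b) => {
      batch.forEach((job, i) => job.emitRow(results[b][i]));
    });
  }
}
```

Because `Promise.all` returns results in input order, the scatter step does not depend on which batch finishes first.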

Motivation

Real-world codehub analyze --embeddings --force --granularity symbol,file,community on a ~1,922-file AWS codebase sat at 95% CPU for 7+ minutes before the refactor. The phase was awaiting embedBatch() per node inside a single-threaded ONNX session (intraOpNumThreads: 1, graphOptimizationLevel: "disabled" — required for the graphHash determinism contract), so there was no concurrency anywhere in the stack.

Determinism

The graphHash / embeddingsHash contract is preserved:

  • Canonical tier ordering (symbol → file → community) is unchanged.
  • Rows are still sorted by (granularity, nodeId, chunkIndex) before hashing.
  • openOnnxEmbedder()'s deterministic knobs are intact per worker, so the vector produced for a given input does not depend on which worker ran it.
  • New regression test asserts embeddingsHash at batchSize=1 equals embeddingsHash at batchSize=32.
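
To illustrate why sorting before hashing makes the result independent of dispatch order, here is a small sketch under stated assumptions. The `EmbeddingRow` shape, `compareRows`, and `hashRows` are hypothetical. The 32-bit FNV-1a-style hash is a stand-in, since the PR does not show how `embeddingsHash` is computed. Ranking `granularity` by the canonical tier order is also an assumption.

```typescript
type Granularity = "symbol" | "file" | "community";

// Hypothetical row shape built from the sort key named in the PR.
interface EmbeddingRow {
  granularity: Granularity;
  nodeId: string;
  chunkIndex: number;
  vector: number[];
}

// Assumption: granularity sorts in the canonical tier order.
const TIER_RANK: Record<Granularity, number> = { symbol: 0, file: 1, community: 2 };

// Order rows by (granularity, nodeId, chunkIndex).
function compareRows(a: EmbeddingRow, b: EmbeddingRow): number {
  if (a.granularity !== b.granularity) {
    return TIER_RANK[a.granularity] - TIER_RANK[b.granularity];
  }
  if (a.nodeId !== b.nodeId) return a.nodeId < b.nodeId ? -1 : 1;
  return a.chunkIndex - b.chunkIndex;
}

// Stand-in hash (FNV-1a style over UTF-16 code units), not the project's real one.
function fnv1a(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// Sort a copy of the rows, then hash a canonical serialization of them.
function hashRows(rows: EmbeddingRow[]): string {
  const sorted = [...rows].sort(compareRows);
  const canonical = sorted.map((r) => [r.granularity, r.nodeId, r.chunkIndex, r.vector]);
  return fnv1a(JSON.stringify(canonical));
}
```

With this arrangement, rows can arrive in any order from the dispatch loop and the hash still matches, provided each input maps to the same vector.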

Expected speedup

On an M-series laptop with --embeddings-workers auto --embeddings-batch-size 32, the 7-minute AWSQuickWork run should drop to roughly 1–2 minutes. --embeddings-int8 cuts that further.

Test plan

  • pnpm build — clean
  • pnpm --filter @opencodehub/ingestion test — 576/576 pass
  • New test in embeddings.test.ts: batchSize=1 and batchSize=32 produce byte-identical embeddingsHash
  • codehub analyze --help surfaces --embeddings-workers and --embeddings-batch-size
  • End-to-end: run codehub analyze AWSQuickWork --embeddings --force --granularity symbol,file,community --embeddings-workers auto and confirm wall time drop + identical embeddingsHash vs a single-threaded control run

@theagenticguy theagenticguy force-pushed the feat/embeddings-parallel-batching branch from 282cbce to 9c762e8 on April 27, 2026 14:41
The embeddings phase was pegged to one embedding per node per await,
behind a single-threaded ONNX session — an AWSQuickWork run sat at 95%
CPU for 7+ minutes on 1,922 files.

Refactor into two stages: walk tiers once to collect (text, emitRow)
jobs in canonical order, then dispatch in fixed-size batches across a
configurable Piscina pool of OnnxEmbedder workers. Each wave fires
workers × batchSize embeds concurrently and scatters vectors back into
the row buffer. Row ordering and the embeddingsHash contract are
preserved — confirmed by a new test that asserts byte-identical hashes
across batchSize=1 vs 32.

- New flags: --embeddings-workers <n|auto>, --embeddings-batch-size <n>.
- A main-thread canary OnnxEmbedder opens before the pool so
  EmbedderNotSetupError keeps its class identity across the
  structured-clone boundary.
- HTTP backend unaffected (pool flag ignored when endpoint is set).
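
The flag handling described in this commit message might be resolved along the following lines. This is only a sketch. `resolveWorkerCount`, `PoolOptions`, and the `httpEndpoint` field are hypothetical names, and the meaning of `auto` (all cores but one, with a minimum of one) is an assumption, since the PR does not say how `auto` is computed.

```typescript
// Hypothetical option bag; none of these field names come from the PR.
interface PoolOptions {
  workersFlag: string; // raw value of --embeddings-workers, e.g. "4" or "auto"
  httpEndpoint?: string; // set when the HTTP backend is in use
  cpuCount: number; // injected so the sketch needs no platform imports
}

function resolveWorkerCount(opts: PoolOptions): number {
  // HTTP backend: the pool flag is ignored, so no local workers are started.
  if (opts.httpEndpoint) return 0;
  if (opts.workersFlag === "auto") {
    // Assumption: "auto" leaves one core free for the main thread.
    return Math.max(1, opts.cpuCount - 1);
  }
  const n = Number(opts.workersFlag);
  if (!Number.isInteger(n) || n < 1) {
    throw new Error(
      `--embeddings-workers expects a positive integer or "auto", got "${opts.workersFlag}"`,
    );
  }
  return n;
}
```

Passing the CPU count in as a parameter keeps the function pure and easy to test; a caller would supply the value from the host at startup.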
@theagenticguy theagenticguy force-pushed the feat/embeddings-parallel-batching branch from 9c762e8 to 8bbf5b8 on April 27, 2026 15:20
@theagenticguy theagenticguy enabled auto-merge (squash) April 27, 2026 15:30
@theagenticguy theagenticguy disabled auto-merge April 27, 2026 15:30
@theagenticguy theagenticguy merged commit f8454b5 into main Apr 27, 2026
19 checks passed
@theagenticguy theagenticguy deleted the feat/embeddings-parallel-batching branch April 27, 2026 15:30
@github-actions github-actions Bot mentioned this pull request Apr 27, 2026
theagenticguy added a commit that referenced this pull request May 1, 2026
This was referenced May 1, 2026
@github-actions github-actions Bot mentioned this pull request May 11, 2026
theagenticguy added a commit that referenced this pull request May 12, 2026
🤖 Automated release via release-please
---


<details><summary>analysis: 0.1.1</summary>

## [0.1.1](analysis-v0.1.0...analysis-v0.1.1) (2026-05-12)


### Features

* initial public release of opencodehub v0.1.1
([3f23006](3f23006))
* M7 LadybugDB default + IGraphStore abstraction hardening (Track A)
([#71](#71))
([0175113](0175113))


### Documentation

* **repo:** pre-publish npm readiness — READMEs, GOVERNANCE, CODEOWNERS,
package metadata
([dd10f72](dd10f72))


### Refactoring

* consolidate repo-local dir references on META_DIR_NAME
([ce4b63d](ce4b63d))


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/core-types bumped to 0.2.0
    * @opencodehub/sarif bumped to 0.1.1
    * @opencodehub/storage bumped to 0.1.1
</details>

<details><summary>cli: 0.2.0</summary>

## [0.2.0](cli-v0.1.0...cli-v0.2.0) (2026-05-12)


### ⚠ BREAKING CHANGES

* replace LSP oracle with SCIP indexers (TS/Py/Go/Rust/Java)
([#32](#32))

### Features

* artifact factory + codehub init + CI UX fixes
([#38](#38))
([d6ffafa](d6ffafa))
* **cli:** add --granularity flag to analyze for hierarchical embeddings
([defa9b6](defa9b6))
* **cli:** add --strict-detectors flag + ts-morph optional dep
([329f5c3](329f5c3))
* **cli:** add exact-name resolver and disambiguation flags to context
([7f279a9](7f279a9))
* **cli:** flip query hybrid-by-default with --bm25-only +
--rerank-top-k
([3e924b5](3e924b5))
* detect-secrets as 20th scanner (Track B)
([#72](#72))
([8fbdd61](8fbdd61))
* **embedder:** replace Arctic Embed XS with gte-modernbert-base
([#31](#31))
([1214071](1214071))
* **ingestion:** WASM fallback via web-tree-sitter + --wasm-only flag
([cecb401](cecb401))
* initial public release of opencodehub v0.1.1
([3f23006](3f23006))
* M7 LadybugDB default + IGraphStore abstraction hardening (Track A)
([#71](#71))
([0175113](0175113))
* **mcp,cli:** join symbol summaries into query results (P04 surface)
([3d73b65](3d73b65))
* replace LSP oracle with SCIP indexers (TS/Py/Go/Rust/Java)
([#32](#32))
([1cceb24](1cceb24))
* **scanners:** persist partialFingerprint, baselineState,
suppressedJson
([fb4585d](fb4585d))
* **search:** add filter-aware zoom retrieval across hierarchical tiers
([5ab80c4](5ab80c4))
* v1 finalize Track C — debt sweep (7 ACs)
([#73](#73))
([06d2bb1](06d2bb1))


### Bug Fixes

* **cli:** accurate doctor native-binding + int8 weights checks
([fb569f9](fb569f9))
* **storage:** wire @ladybugdb/core binding, fix lbug open() guards,
upgrade pnpm v10→v11
([#93](#93))
([78d6a85](78d6a85))


### Performance

* **embeddings:** cross-node batching + worker pool
([#33](#33))
([acb59d0](acb59d0))


### Documentation

* **repo:** pre-publish npm readiness — READMEs, GOVERNANCE, CODEOWNERS,
package metadata
([dd10f72](dd10f72))


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/analysis bumped to 0.1.1
    * @opencodehub/core-types bumped to 0.2.0
    * @opencodehub/embedder bumped to 0.1.1
    * @opencodehub/ingestion bumped to 0.2.0
    * @opencodehub/mcp bumped to 0.2.0
    * @opencodehub/sarif bumped to 0.1.1
    * @opencodehub/scanners bumped to 0.1.1
    * @opencodehub/search bumped to 0.1.1
    * @opencodehub/storage bumped to 0.1.1
</details>

<details><summary>core-types: 0.2.0</summary>

## [0.2.0](core-types-v0.1.0...core-types-v0.2.0) (2026-05-12)


### ⚠ BREAKING CHANGES

* replace LSP oracle with SCIP indexers (TS/Py/Go/Rust/Java)
([#32](#32))

### Features

* **core-types:** scaffold v1.1 node-shape extensions for planned
packets
([e17a4b5](e17a4b5))
* initial public release of opencodehub v0.1.1
([3f23006](3f23006))
* M7 LadybugDB default + IGraphStore abstraction hardening (Track A)
([#71](#71))
([0175113](0175113))
* replace LSP oracle with SCIP indexers (TS/Py/Go/Rust/Java)
([#32](#32))
([1cceb24](1cceb24))
* **storage:** populate reserved complexity, coverage, deadness columns
([c81e4c3](c81e4c3))
* v1 finalize Track C — debt sweep (7 ACs)
([#73](#73))
([06d2bb1](06d2bb1))


### Documentation

* **repo:** pre-publish npm readiness — READMEs, GOVERNANCE, CODEOWNERS,
package metadata
([dd10f72](dd10f72))


### Refactoring

* **core-types:** centralize LanguageId in core-types
([4c33fc7](4c33fc7))
</details>

<details><summary>embedder: 0.1.1</summary>

## [0.1.1](embedder-v0.1.0...embedder-v0.1.1) (2026-05-12)


### Features

* detect-secrets as 20th scanner (Track B)
([#72](#72))
([8fbdd61](8fbdd61))
* **embedder:** add SageMaker backend for remote embeddings
([9b5c53d](9b5c53d))
* **embedder:** replace Arctic Embed XS with gte-modernbert-base
([#31](#31))
([1214071](1214071))
* initial public release of opencodehub v0.1.1
([3f23006](3f23006))
* v1 finalize Track C — debt sweep (7 ACs)
([#73](#73))
([06d2bb1](06d2bb1))


### Documentation

* **repo:** pre-publish npm readiness — READMEs, GOVERNANCE, CODEOWNERS,
package metadata
([dd10f72](dd10f72))


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/core-types bumped to 0.2.0
</details>

<details><summary>ingestion: 0.2.0</summary>

## [0.2.0](ingestion-v0.1.0...ingestion-v0.2.0) (2026-05-12)


### ⚠ BREAKING CHANGES

* replace LSP oracle with SCIP indexers (TS/Py/Go/Rust/Java)
([#32](#32))

### Features

* **cli:** add --strict-detectors flag + ts-morph optional dep
([329f5c3](329f5c3))
* **embedder:** add SageMaker backend for remote embeddings
([9b5c53d](9b5c53d))
* **embedder:** replace Arctic Embed XS with gte-modernbert-base
([#31](#31))
([1214071](1214071))
* **ingestion:** [@doc](https://github.com/doc) captures + description
field populated
([d63dfa6](d63dfa6))
* **ingestion:** add receiver resolver + detector precision (P06)
([431f428](431f428))
* **ingestion:** add top-20 framework detection catalog and dispatcher
([02f4864](02f4864))
* **ingestion:** capture MCP tool inputSchema as canonical JSON
([9872710](9872710))
* **ingestion:** emit CodeElement stubs for external imports
([49eefe7](49eefe7))
* **ingestion:** emit file-level and community-level embeddings
([09a117f](09a117f))
* **ingestion:** FastAPI, Spring, NestJS, Rails route detectors
([62bebfb](62bebfb))
* **ingestion:** Go IMPLEMENTS method-set resolver + C++20 import
([85c60f9](85c60f9))
* **ingestion:** nested .gitignore with layered negation
([40b5286](40b5286))
* **ingestion:** populate DependencyNode license from manifest
([f947194](f947194))
* **ingestion:** provider-driven complexity + Halstead volume
([5e1379a](5e1379a))
* **ingestion:** soft-fail summarize on credential errors, thread
summaryModel
([d90eb38](d90eb38))
* **ingestion:** WASM fallback via web-tree-sitter + --wasm-only flag
([cecb401](cecb401))
* **ingestion:** wire framework catalog into profile phase
([d491401](d491401))
* initial public release of opencodehub v0.1.1
([3f23006](3f23006))
* replace LSP oracle with SCIP indexers (TS/Py/Go/Rust/Java)
([#32](#32))
([1cceb24](1cceb24))
* v1 finalize Track C — debt sweep (7 ACs)
([#73](#73))
([06d2bb1](06d2bb1))


### Bug Fixes

* **ingestion:** enumerate git submodule paths in the scan phase
([d290d04](d290d04))
* **ingestion:** skip submodule paths in the ownership blame pass
([e28f3e6](e28f3e6))
* **scip-ingest:** resolve caller/callee correctly for SCIP edges
([c15f928](c15f928))


### Performance

* **embeddings:** cross-node batching + worker pool
([#33](#33))
([acb59d0](acb59d0))


### Documentation

* **repo:** pre-publish npm readiness — READMEs, GOVERNANCE, CODEOWNERS,
package metadata
([dd10f72](dd10f72))


### Refactoring

* consolidate repo-local dir references on META_DIR_NAME
([ce4b63d](ce4b63d))
* **core-types:** centralize LanguageId in core-types
([4c33fc7](4c33fc7))


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/analysis bumped to 0.1.1
    * @opencodehub/core-types bumped to 0.2.0
    * @opencodehub/embedder bumped to 0.1.1
    * @opencodehub/storage bumped to 0.1.1
</details>

<details><summary>mcp: 0.2.0</summary>

## [0.2.0](mcp-v0.1.0...mcp-v0.2.0) (2026-05-12)


### ⚠ BREAKING CHANGES

* replace LSP oracle with SCIP indexers (TS/Py/Go/Rust/Java)
([#32](#32))

### Features

* **embedder:** replace Arctic Embed XS with gte-modernbert-base
([#31](#31))
([1214071](1214071))
* initial public release of opencodehub v0.1.1
([3f23006](3f23006))
* M7 LadybugDB default + IGraphStore abstraction hardening (Track A)
([#71](#71))
([0175113](0175113))
* **mcp,cli:** join symbol summaries into query results (P04 surface)
([3d73b65](3d73b65))
* **mcp:** short-circuit list_findings_delta via stored baselineState
([4d9c187](4d9c187))
* **mcp:** surface structured FrameworkDetection in project_profile tool
([15fb309](15fb309))
* replace LSP oracle with SCIP indexers (TS/Py/Go/Rust/Java)
([#32](#32))
([1cceb24](1cceb24))
* **search:** add filter-aware zoom retrieval across hierarchical tiers
([5ab80c4](5ab80c4))
* v1 finalize Track C — debt sweep (7 ACs)
([#73](#73))
([06d2bb1](06d2bb1))


### Documentation

* **repo:** pre-publish npm readiness — READMEs, GOVERNANCE, CODEOWNERS,
package metadata
([dd10f72](dd10f72))


### Refactoring

* **mcp:** consume shared tryOpenEmbedder + embeddingsPopulated from
@opencodehub/search
([54f00de](54f00de))


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/analysis bumped to 0.1.1
    * @opencodehub/core-types bumped to 0.2.0
    * @opencodehub/embedder bumped to 0.1.1
    * @opencodehub/sarif bumped to 0.1.1
    * @opencodehub/scanners bumped to 0.1.1
    * @opencodehub/search bumped to 0.1.1
    * @opencodehub/storage bumped to 0.1.1
</details>

<details><summary>sarif: 0.1.1</summary>

## [0.1.1](sarif-v0.1.0...sarif-v0.1.1) (2026-05-12)


### Features

* initial public release of opencodehub v0.1.1
([3f23006](3f23006))


### Documentation

* **repo:** pre-publish npm readiness — READMEs, GOVERNANCE, CODEOWNERS,
package metadata
([dd10f72](dd10f72))
</details>

<details><summary>scanners: 0.1.1</summary>

## [0.1.1](scanners-v0.1.0...scanners-v0.1.1) (2026-05-12)


### Features

* detect-secrets as 20th scanner (Track B)
([#72](#72))
([8fbdd61](8fbdd61))
* **embedder:** replace Arctic Embed XS with gte-modernbert-base
([#31](#31))
([1214071](1214071))
* initial public release of opencodehub v0.1.1
([3f23006](3f23006))
* v1 finalize Track C — debt sweep (7 ACs)
([#73](#73))
([06d2bb1](06d2bb1))


### Documentation

* **repo:** pre-publish npm readiness — READMEs, GOVERNANCE, CODEOWNERS,
package metadata
([dd10f72](dd10f72))


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/sarif bumped to 0.1.1
</details>

<details><summary>search: 0.1.1</summary>

## [0.1.1](search-v0.1.0...search-v0.1.1) (2026-05-12)


### Features

* detect-secrets as 20th scanner (Track B)
([#72](#72))
([8fbdd61](8fbdd61))
* **embedder:** replace Arctic Embed XS with gte-modernbert-base
([#31](#31))
([1214071](1214071))
* initial public release of opencodehub v0.1.1
([3f23006](3f23006))
* M7 LadybugDB default + IGraphStore abstraction hardening (Track A)
([#71](#71))
([0175113](0175113))
* **search:** add filter-aware zoom retrieval across hierarchical tiers
([5ab80c4](5ab80c4))
* **search:** extract tryOpenEmbedder + embeddingsPopulated, demote
NullEmbedder throw
([c4cc680](c4cc680))


### Documentation

* **repo:** pre-publish npm readiness — READMEs, GOVERNANCE, CODEOWNERS,
package metadata
([dd10f72](dd10f72))


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/core-types bumped to 0.2.0
    * @opencodehub/storage bumped to 0.1.1
</details>

<details><summary>storage: 0.1.1</summary>

## [0.1.1](storage-v0.1.0...storage-v0.1.1) (2026-05-12)


### Features

* **embedder:** replace Arctic Embed XS with gte-modernbert-base
([#31](#31))
([1214071](1214071))
* **ingestion:** emit file-level and community-level embeddings
([09a117f](09a117f))
* initial public release of opencodehub v0.1.1
([3f23006](3f23006))
* M7 LadybugDB default + IGraphStore abstraction hardening (Track A)
([#71](#71))
([0175113](0175113))
* **mcp:** short-circuit list_findings_delta via stored baselineState
([4d9c187](4d9c187))
* **search:** add filter-aware zoom retrieval across hierarchical tiers
([5ab80c4](5ab80c4))
* **storage:** add granularity column to embeddings for hierarchical
retrieval
([b5bd5f8](b5bd5f8))
* **storage:** add summary fields to SearchResult and batch lookup
helper
([4944a56](4944a56))
* **storage:** persist structured FrameworkDetection in frameworks_json
([75423fe](75423fe))
* **storage:** populate reserved complexity, coverage, deadness columns
([c81e4c3](c81e4c3))
* v1 finalize Track C — debt sweep (7 ACs)
([#73](#73))
([06d2bb1](06d2bb1))


### Bug Fixes

* **storage:** wire @ladybugdb/core binding, fix lbug open() guards,
upgrade pnpm v10→v11
([#93](#93))
([78d6a85](78d6a85))


### Documentation

* **repo:** pre-publish npm readiness — READMEs, GOVERNANCE, CODEOWNERS,
package metadata
([dd10f72](dd10f72))


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @opencodehub/core-types bumped to 0.2.0
</details>

<details><summary>root: 0.2.0</summary>

##
[0.2.0](root-v0.1.1...root-v0.2.0)
(2026-05-12)


### ⚠ BREAKING CHANGES

* **release:** footers in the commit log.
* replace LSP oracle with SCIP indexers (TS/Py/Go/Rust/Java)
([#32](#32))

### Features

* artifact factory + codehub init + CI UX fixes
([#38](#38))
([d6ffafa](d6ffafa))
* cleanups
([bf1536e](bf1536e))
* **cli:** add --granularity flag to analyze for hierarchical embeddings
([defa9b6](defa9b6))
* **cli:** add --strict-detectors flag + ts-morph optional dep
([329f5c3](329f5c3))
* **cli:** add exact-name resolver and disambiguation flags to context
([7f279a9](7f279a9))
* **cli:** flip query hybrid-by-default with --bm25-only +
--rerank-top-k
([3e924b5](3e924b5))
* **core-types:** scaffold v1.1 node-shape extensions for planned
packets
([e17a4b5](e17a4b5))
* detect-secrets as 20th scanner (Track B)
([#72](#72))
([8fbdd61](8fbdd61))
* **embedder:** add SageMaker backend for remote embeddings
([9b5c53d](9b5c53d))
* **embedder:** replace Arctic Embed XS with gte-modernbert-base
([#31](#31))
([1214071](1214071))
* **gym:** add rust-spike trigger benchmark
([43c26d3](43c26d3))
* **ingestion:** [@doc](https://github.com/doc) captures + description
field populated
([d63dfa6](d63dfa6))
* **ingestion:** add receiver resolver + detector precision (P06)
([431f428](431f428))
* **ingestion:** add top-20 framework detection catalog and dispatcher
([02f4864](02f4864))
* **ingestion:** capture MCP tool inputSchema as canonical JSON
([9872710](9872710))
* **ingestion:** emit CodeElement stubs for external imports
([49eefe7](49eefe7))
* **ingestion:** emit file-level and community-level embeddings
([09a117f](09a117f))
* **ingestion:** FastAPI, Spring, NestJS, Rails route detectors
([62bebfb](62bebfb))
* **ingestion:** Go IMPLEMENTS method-set resolver + C++20 import
([85c60f9](85c60f9))
* **ingestion:** nested .gitignore with layered negation
([40b5286](40b5286))
* **ingestion:** populate DependencyNode license from manifest
([f947194](f947194))
* **ingestion:** provider-driven complexity + Halstead volume
([5e1379a](5e1379a))
* **ingestion:** soft-fail summarize on credential errors, thread
summaryModel
([d90eb38](d90eb38))
* **ingestion:** WASM fallback via web-tree-sitter + --wasm-only flag
([cecb401](cecb401))
* **ingestion:** wire framework catalog into profile phase
([d491401](d491401))
* initial public release of opencodehub v0.1.1
([3f23006](3f23006))
* M7 LadybugDB default + IGraphStore abstraction hardening (Track A)
([#71](#71))
([0175113](0175113))
* **mcp,cli:** join symbol summaries into query results (P04 surface)
([3d73b65](3d73b65))
* **mcp:** short-circuit list_findings_delta via stored baselineState
([4d9c187](4d9c187))
* **mcp:** surface structured FrameworkDetection in project_profile tool
([15fb309](15fb309))
* replace LSP oracle with SCIP indexers (TS/Py/Go/Rust/Java)
([#32](#32))
([1cceb24](1cceb24))
* **scanners:** persist partialFingerprint, baselineState,
suppressedJson
([fb4585d](fb4585d))
* **search:** add filter-aware zoom retrieval across hierarchical tiers
([5ab80c4](5ab80c4))
* **search:** extract tryOpenEmbedder + embeddingsPopulated, demote
NullEmbedder throw
([c4cc680](c4cc680))
* **storage:** add granularity column to embeddings for hierarchical
retrieval
([b5bd5f8](b5bd5f8))
* **storage:** add summary fields to SearchResult and batch lookup
helper
([4944a56](4944a56))
* **storage:** persist structured FrameworkDetection in frameworks_json
([75423fe](75423fe))
* **storage:** populate reserved complexity, coverage, deadness columns
([c81e4c3](c81e4c3))
* v1 finalize Track C — debt sweep (7 ACs)
([#73](#73))
([06d2bb1](06d2bb1))
* v1 finalize Track D — dogfood polish (6 ACs)
([#75](#75))
([e9da048](e9da048))


### Bug Fixes

* **ci:** pin gopls@v0.18.1 for Go 1.23 + add pnpm build-script
allowlist
([c78b31d](c78b31d))
* **cli:** accurate doctor native-binding + int8 weights checks
([fb569f9](fb569f9))
* **deps:** bump minimatch override to 9.0.7 (GHSA-23c5/-7r86)
([7f6e2ae](7f6e2ae))
* **deps:** pin brace-expansion/minimatch/picomatch to patched versions
([5a7d1e0](5a7d1e0))
* **deps:** refresh pnpm-lock.yaml with ts-morph optional dep from P06
([0dfee11](0dfee11))
* **docs:** rename agents/*.md to .mdx so JSX components render
([#89](#89))
([d2d8bc7](d2d8bc7))
* **gym:** update corpus test waiver ID to window.desktop after PR
[#38](#38) rename
([933b5f2](933b5f2))
* **ingestion:** enumerate git submodule paths in the scan phase
([d290d04](d290d04))
* **ingestion:** skip submodule paths in the ownership blame pass
([e28f3e6](e28f3e6))
* **repo:** replace stale lsp-oracle tsconfig reference with scip-ingest
([0ce5e29](0ce5e29))
* **scip-ingest:** resolve caller/callee correctly for SCIP edges
([c15f928](c15f928))
* **storage:** wire @ladybugdb/core binding, fix lbug open() guards,
upgrade pnpm v10→v11
([#93](#93))
([78d6a85](78d6a85))


### Performance

* **embeddings:** cross-node batching + worker pool
([#33](#33))
([acb59d0](acb59d0))


### Documentation

* add SPECS, USECASE, and OBJECTIVES docs
([f3120de](f3120de))
* **adr:** record hierarchical embeddings decision (0004)
([6d28631](6d28631))
* **adr:** update 0002 with P09 Phase 1 measurements
([92b9a1c](92b9a1c))
* clean-slate v1 — drop migration prose, milestone framing, 0.x caveats
([#90](#90))
([af88fbc](af88fbc))
* compound — durable lessons from docs site revival
([#88](#88))
([95642f0](95642f0))
* compound — durable lessons from v1 upstream bug sweep
([#77](#77))
([60eef57](60eef57))
* deep refresh + sync + new architecture pages
([3693ddd](3693ddd))
* **repo:** durable lesson — set NODE_ENV at script scope for astro in
CI
([18c159b](18c159b))
* **repo:** durable lesson — stale tsconfig project references
([ea67d7a](ea67d7a))
* **repo:** EARS 006 spec — v1 finalize (M7 + constraint-10 + debt +
dogfood)
([67198e3](67198e3))
* **repo:** pre-publish npm readiness — READMEs, GOVERNANCE, CODEOWNERS,
package metadata
([dd10f72](dd10f72))
* restore Starlight site + refresh for v1 + agent-friendly USAGE section
([#87](#87))
([d9b2b30](d9b2b30))
* **site:** add Astro Starlight docs site + GitHub Pages deploy
([#34](#34))
([5ce0191](5ce0191))
* **site:** add llms.txt + Copy-as-Markdown + Open-in-ChatGPT/Claude
([#36](#36))
([149ba4e](149ba4e))
* **site:** inject LLM-nav banner + 'See also' footer into every .md
([#37](#37))
([77190a5](77190a5))
* strip legacy stanzas + capture session lessons
([85f6881](85f6881))


### Refactoring

* consolidate repo-local dir references on META_DIR_NAME
([ce4b63d](ce4b63d))
* **core-types:** centralize LanguageId in core-types
([4c33fc7](4c33fc7))
* **mcp:** consume shared tryOpenEmbedder + embeddingsPopulated from
@opencodehub/search
([54f00de](54f00de))
* **plugin:** file-level packet skeletons for codehub-document
([40a09c8](40a09c8))


### CI

* **release:** keep 0.x semver — breaking changes bump minor, feats bump
patch
([a6ee4bf](a6ee4bf))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Laith Al-Saadoon <alsaadoonlaith@gmail.com>
Co-authored-by: Laith Al-Saadoon <9553966+theagenticguy@users.noreply.github.com>