Project objectives inferred from the repo: what success looks like, where the quality bar sits, and what is deliberately out of scope.
-
Answer graph questions in one MCP call. AI coding agents need callers, callees, processes, and blast radius before they write a diff. Grep is textual. Language servers work one file at a time. Embeddings lose precision. OpenCodeHub answers all four from a persisted graph, and the 28-tool MCP surface is the primary product.
-
Stay Apache-2.0 across every runtime dependency. Each transitive dependency sits on the permissive allowlist: Apache-2.0, MIT, BSD, ISC, CC0, BlueOak, 0BSD. CI enforces it on every push. Scanners with incompatible licenses never link into the host. hadolint (GPL) and tflint (MPL/BUSL) run as subprocesses instead. The README frames this as a fork-and-embed posture.
-
Keep the index local, offline, and deterministic.
codehub analyze --offlineopens zero sockets. ThegraphHashis byte-identical whether you run a full or an incremental index at the same commit. Acceptance gate 6 checks that invariant on every run. -
Cover 15 languages; deepen five with SCIP. Fourteen parse through tree-sitter, and a regex provider handles fixed-format COBOL. Five (TypeScript, Python, Go, Rust, Java) also run a native SCIP indexer. Heuristic call edges miss cross-module resolution. So the
scip-indexphase indexes each language once, and theconfidence-demotephase reconciles heuristic edges against compiler-grade ones. A sibling testbed repo holds the gym that gates per-language F1 against SCIP baselines.
-
Gate every eval and gym run in three layers. The testbed checks an absolute F1 floor, a relative F1 delta, and per-case non-regression. Acceptance gate 9 requires at least 40 of 49 Python-eval cases to pass. A soft regression fails the run.
-
Fail CI on any non-zero exit.
pnpm run checkchains lint, typecheck, test, and banned-strings, stopping at the first failure. CI runs the same chain plus OSV, CodeQL, Scorecard, SARIF schema validation, commitlint, and the license allowlist. The banned-strings sweep enforces clean-room IP hygiene. -
Keep MCP responses structured, versioned, and self-describing. Every tool returns a
next_stepsarray. When the index lags HEAD, it adds a_meta["codehub/staleness"]envelope. Failures carry a typed code such asAMBIGUOUS_REPO, never a free-form string. That shape is what makes the tools safe inside automated agent loops.
-
Do not operate a server or SaaS. DuckDB is embedded. The MCP server is a stdio process. ADR 0001 rejects any engine that would need a daemon. The product ships as a CLI plus an MCP server, nothing hosted.
-
Do not port to Rust before it is needed. ADR 0002 measured p95 single-file incremental analysis at 195-250ms on the 100-file fixture, well under the 1s gate. It projects a cold 100k-LOC analyze at 3 to 5 seconds. Rust is deferred to v2.1 or later. The re-trigger conditions are tied to measured latency on real user repos.
- Get from clone to a working agent in one sequence. A user runs
mise install, thenpnpm install --frozen-lockfile, thencodehub setup, thencodehub analyze. After that, their agent can callimpact,detect_changes, andcontexton a real repo. Acceptance gates 1-8 confirm this happy path. Gates 10-15 confirm the opt-in paths for embeddings, scanners, SARIF validation, license audit, and verdict. The README promises this flow.scripts/acceptance.shverifies it.