chore: security alerts + supply-chain release hardening #78
Merged
- git.ts: replace `^\+\+\+\s+(?:b\/)?(.+)$` regex with non-regex
startsWith + slice scan so `+++\t\t\t...` lines cannot trigger
polynomial backtracking.
- http-patterns.ts:normalizeHttpPath: replace `\?.*$` and `\/+$`
with deterministic indexOf/charCodeAt loops.
- http-patterns.ts:PY_ROUTE_DECORATOR_RE: cap the path and methods
literals at 256 chars; the unbounded `+` quantifier is what made
the regex slow on `@A.route("!",methods=[\\...`.
Behaviour preserved: same set of matched paths, same hunk parser
contract. Existing analysis tests (127) still pass.
Fixes alerts #41 #119 #120 from CodeQL.
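A minimal sketch of the non-regex `+++` header scan described above. This is not the code from git.ts: the function name `parsePlusHeader` is hypothetical, and treating only space and tab as whitespace is an assumption (the original `\s` also covers other whitespace characters).

```typescript
// Hypothetical helper: extracts the path from a unified-diff "+++" header
// with one forward pass, so there is nothing for a regex engine to backtrack over.
export function parsePlusHeader(line: string): string | null {
  if (!line.startsWith("+++")) return null;

  // Skip the run of whitespace after "+++" (assumption: space or tab only).
  let i = 3;
  while (i < line.length) {
    const code = line.charCodeAt(i);
    if (code !== 32 && code !== 9) break;
    i++;
  }
  if (i === 3) return null; // at least one whitespace character is required

  let rest = line.slice(i);
  // Drop the optional "b/" prefix, but only if something remains after it.
  if (rest.startsWith("b/") && rest.length > 2) rest = rest.slice(2);
  return rest.length > 0 ? rest : null;
}
```

Each character is inspected at most once, so the cost is linear in the line length even for headers padded with many tabs. Whitespace-only headers return `null` here, which is a choice made for the sketch.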
The `cfg.endpointUrl.replace(/\/+$/, "")` call trimmed trailing slashes via a regex that runs polynomial-time on inputs with many `/` characters. Replace with a character-by-character loop using `charCodeAt` — same result, deterministic worst case. Fixes alert #121 from CodeQL.
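The loop-based trim can be sketched as follows. The names `trimTrailingSlashes` and `stripQuery` are hypothetical; the second helper illustrates the `indexOf` approach mentioned for `normalizeHttpPath` and assumes the input contains no newlines.

```typescript
const SLASH = 47; // char code of "/"

// Walks backwards from the end and cuts once; linear in the input length.
export function trimTrailingSlashes(value: string): string {
  let end = value.length;
  while (end > 0 && value.charCodeAt(end - 1) === SLASH) end--;
  return end === value.length ? value : value.slice(0, end);
}

// Drops everything from the first "?" onward using a single indexOf.
export function stripQuery(path: string): string {
  const q = path.indexOf("?");
  return q === -1 ? path : path.slice(0, q);
}
```

For example, `trimTrailingSlashes(stripQuery("/api/users///?page=2"))` yields `/api/users`.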
The yarn.lock entry regex `^"?([^"\s@][^"\s]*)@[^"\n]*"?:\s*$` had an inner char class `[^"\s]*` that overlapped with the trailing `@` delimiter, so an input like `!@@@@@@@@@@` would let the regex backtrack across every `@` looking for a match. Tighten the inner class to `[^"\s@]*` so the engine commits to the first `@` it sees. Behaviour is unchanged for valid yarn.lock entries — the original regex already forbade `@` in the package-name leading character, and unscoped names never contain `@` mid-string. Fixes alert #180 from CodeQL.
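To illustrate the claim that valid entries behave the same, here is a small comparison of the two patterns quoted above. The helper `entryName` and the sample lines are assumptions made for the example, not taken from the repository.

```typescript
// Patterns copied from the commit message: original vs tightened inner class.
export const ORIGINAL_ENTRY_RE = /^"?([^"\s@][^"\s]*)@[^"\n]*"?:\s*$/;
export const TIGHTENED_ENTRY_RE = /^"?([^"\s@][^"\s@]*)@[^"\n]*"?:\s*$/;

// Hypothetical helper: returns the captured package name, or null on no match.
export function entryName(re: RegExp, line: string): string | null {
  const match = re.exec(line);
  return match === null ? null : match[1] ?? null;
}

// Sample header lines (assumed shapes of yarn.lock entries).
export const SAMPLE_LINES = [
  "lodash@^4.17.21:",
  '"lodash@^4.17.21, lodash@^4.17.0":',
  "not a header line",
];

// Both patterns should agree on every sample line.
export const agreeOnSamples = SAMPLE_LINES.every(
  (line) => entryName(ORIGINAL_ENTRY_RE, line) === entryName(TIGHTENED_ENTRY_RE, line),
);
```

With the tightened class the name capture can only end at the first `@`, so an input such as `!@@@@@@@@@@` fails after one pass instead of retrying every `@` as a candidate delimiter.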
Resolves the Scorecard `Pinned-Dependencies` MEDIUM alerts by replacing every `uses: <action>@<tag>` reference with a SHA-pinned form plus a trailing comment carrying the original tag for human readability. The trailing comment is also what Dependabot rewrites on weekly SHA bumps.
Tag-to-SHA mapping (resolved via `gh api /repos/<owner>/<repo>/commits/<tag>`):
- actions/checkout@v6 -> de0fac2e4500dabe0009e67214ff5f5447ce83dd
- actions/upload-artifact@v7 -> 043fb46d1a93c77aae656e7c1c64a875d1fc6a0a
- jdx/mise-action@v4 -> 1648a7812b9aeae629881980618f079932869151
- github/codeql-action/* @v4 -> 68bde559dea0fdcac2102bfdf6230c5f70eb485e
- ossf/scorecard-action@v2.4.3 -> 4eaacf0543bb3f2c246792bd56e8cdeffafb205a
Files touched: ci.yml, codeql.yml, commitlint.yml, och-self-scan.yml, osv.yml, scorecard.yml, semgrep.yml.
release-please.yml is being rewritten in parallel by the release-hardening track and already carries SHA pins as part of that rewrite.
Resolves Scorecard `Token-Permissions` HIGH alerts by demoting the top-level workflow scope to `contents: read` and lifting the write-scopes onto the single job that needs them. CodeQL's analyze job keeps `security-events: write` for the SARIF upload; semgrep's job keeps the same plus `contents: read`. Same effective permissions, but any unrelated step in either workflow now runs read-only.
Files: codeql.yml, semgrep.yml.
Out of scope here:
- sbom.yml — file removed in the parallel release-hardening track (SBOM generation moved into the new release.yml).
- release-please.yml — rewritten in the parallel release-hardening track with the same hoist already applied.
…ign signing
Single tag-triggered workflow that anchors every job to the released commit SHA. Listens on `release: published`, `workflow_call`, and `workflow_dispatch` so it works with default GITHUB_TOKEN (via inline workflow_call from release-please.yml), with a PAT-driven release-please publish, and as a manual hotfix path.
Each release ships:
- opencodehub-pack.tar.gz (deterministic 100k-token code-pack BOM)
- SBOM.cdx.json (CycloneDX 1.5)
- och-scan.sarif (OCH self-scan at the released SHA)
- *.sig.bundle (cosign keyless Sigstore bundles for each blob)
Top-level permissions are read-only; per-job grants escalate where strictly required (id-token: write for OIDC -> Fulcio + SLSA, contents: write for release uploads, security-events: write for SARIF upload).
npm-publish job is gated by OCH_NPM_PUBLISH_ENABLED repo variable so the dry-run scaffolding stays inert until packages flip to public.
All third-party actions pinned to commit SHAs with version comments; the SLSA generator reusable workflow is the single tag-pinned exception (the SLSA project's trust model relies on the tag).
…tion
Adds the release-time-only checks that don't belong in everyday CI:
- npm-audit at high+ severity
- pnpm lockfile integrity (--frozen-lockfile --ignore-scripts)
- detect-secrets full sweep against .secrets.baseline
- license allowlist re-assertion
Each job is gated `if: startsWith(github.head_ref, 'release-please--')` so non-release PRs are no-ops. The aggregator job (`pre-release-gate`) runs `if: always()` and treats skipped dependencies as pass — so the required-status-check name resolves uniformly on every PR while actually gating only release-please PRs.
Configure branch protection on main to require the `Pre-release gate (aggregate)` job. Documented in docs/RELEASE.md.
Re-apply the analysis-package changes from 050acd7 that were lost when c47286d (the parallel ci-pinning track) committed an old tree snapshot.
- git.ts: replace the `+++` header regex with non-regex startsWith + slice scan so polynomial backtracking on tab-padded diff headers is impossible.
- http-patterns.ts:normalizeHttpPath: replace `\?.*$` and `\/+$` with deterministic indexOf/charCodeAt loops.
- http-patterns.ts:PY_ROUTE_DECORATOR_RE: cap path and methods literals at 256 chars to bound regex work.
Behaviour preserved; existing analysis tests (127) still pass.
Fixes alerts #41 #119 #120 from CodeQL.
Two changes wired together:
1. release-please.yml hands off to release.yml via uses / workflow_call after `release_created` is true, instead of inlining the artifact pipeline. This sidesteps the GITHUB_TOKEN downstream-event suppression rule (default token does NOT fire downstream `release: published` events). The inline call works regardless of token type.
2. sbom.yml retired. SBOM generation now lives in release.yml's `build` job alongside the code-pack, so SBOM + code-pack + scan output share a single anchored SHA and are co-signed in lockstep. Eliminates the drift class where SBOM and code-pack could reference different commits.
The split surface is now:
- push:main -> release-please.yml (open/update PR, cut tag)
- pull_request -> pre-release-gate.yml (block merge if scans fail)
- workflow_call -> release.yml (inline post-tag pipeline)
- release:published -> release.yml (PAT-driven flow + manual)
- workflow_dispatch -> release.yml (operator hotfix path)
Documents the trigger model (push -> release-please-action -> PR -> gate -> merge -> tag -> release.yml builds + signs), the artifacts that ship with each release, downstream-consumer cosign + SLSA verification commands, the manual hotfix override path, and the environment configuration the pipeline expects (no long-lived secrets except GITHUB_TOKEN; cosign keyless uses OIDC; SLSA generator uses the same).
Calls out two operator-facing decisions:
- Optional `RELEASE_PLEASE_PAT` if you prefer one-workflow-per-concern over the workflow_call inline path.
- Optional `production-release` environment for a manual approval gate before any artifact is built / signed / attached.
Includes the verification recipe for slsa-verifier and cosign with worked examples of `--certificate-identity` for both the direct release.yml entry point and the release-please.yml workflow_call entry point.
- pipeline/phases/scan.ts: replace path-based `fs.stat` + `fs.readFile`
with a single `fs.open` handle, then `handle.stat()` and
`handle.readFile()`. Operations now share one file descriptor —
closes the TOCTOU window flagged by js/file-system-race.
- extract/tool-detector.ts:relaxedToJson: insert a `\\` -> `\\\\`
escape pass before escaping `"` so JS literals containing a lone
backslash (e.g. `'foo\"bar'`) no longer produce malformed JSON.
- extract/property-access.ts: drop the redundant `A-Za-z` ranges
inside `[A-Za-z_$\w]` lookbehinds — `\w` already covers them and
the overlap was tripping js/overly-large-range. Use `[\w$]` instead.
- pipeline/phases/markdown.test.ts: replace
`.includes("example.com")` with a strict
`new URL(...).hostname === "example.com"` check so a crafted
`example.com.evil.test` host could not slip past the assertion
(js/incomplete-url-substring-sanitization).
Existing 607 ingestion tests still pass.
Fixes alerts #38 #39 #40 #44 #131 from CodeQL.
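The single-handle pattern from the scan.ts item can be sketched as below. The function name `readIfSmallFile`, the `maxBytes` limit, and the `demo` helper are hypothetical; only `open`, `handle.stat()`, and `handle.readFile()` come from the description above.

```typescript
import { mkdtemp, open, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Stat and read go through the same file handle, so both operations see
// the same underlying file even if the path is swapped in between.
export async function readIfSmallFile(path: string, maxBytes: number): Promise<Buffer | null> {
  const handle = await open(path, "r");
  try {
    const info = await handle.stat();
    if (!info.isFile() || info.size > maxBytes) return null;
    return await handle.readFile();
  } finally {
    await handle.close(); // always release the descriptor
  }
}

// Usage sketch: write a temp file, then read it back through the handle.
export async function demo(): Promise<string | null> {
  const dir = await mkdtemp(join(tmpdir(), "scan-sketch-"));
  const file = join(dir, "sample.txt");
  await writeFile(file, "hello");
  const body = await readIfSmallFile(file, 1024);
  return body === null ? null : body.toString("utf8");
}
```

With path-based `fs.stat` followed by `fs.readFile`, the path is resolved twice, and the file it names can change between the two calls. Opening once removes that second resolution.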
- doctor.ts:registryPathCheck — drop the `access` probe and branch on `ENOENT` from the `readFile` itself, so the missing-file warn path and the read share one syscall.
- setup.test.ts — replace the `stat` then `readFile` pair with a single `readFile`; existence is inferred from a non-empty body.
Both paths previously opened a TOCTOU window between the existence check and the read (js/file-system-race).
Existing 236 cli tests still pass.
Fixes alerts #42 #43 from CodeQL.
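A minimal sketch of branching on `ENOENT` from the read itself, assuming a hypothetical helper named `readIfPresent` (the real `registryPathCheck` is not shown in this thread).

```typescript
import { readFile } from "node:fs/promises";

// Returns the file body, or null when the file does not exist.
// There is no separate existence probe, so nothing can change between
// a check and the read.
export async function readIfPresent(path: string): Promise<string | null> {
  try {
    return await readFile(path, "utf8");
  } catch (err) {
    const code =
      typeof err === "object" && err !== null && "code" in err
        ? (err as { code?: unknown }).code
        : undefined;
    if (code === "ENOENT") return null; // missing file: caller can warn
    throw err; // anything else (permissions, I/O) is still an error
  }
}
```

A caller can treat `null` as the "registry missing" warning path and any thrown error as a hard failure.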
…mitters
- resources/repos.ts:yaml — escape `\` -> `\\` before escaping `"`, so a literal backslash in a registry value cannot pair with the appended `\"` to produce a malformed YAML escape.
- tools/sql.ts:formatCell — escape `\` -> `\\` before escaping `|`, so a pre-existing backslash in a SQL cell value cannot combine with the appended `\|` to break the markdown table escape (e.g. `foo\|bar` rendering as `foo\` + literal pipe).
Both paths previously triggered js/incomplete-sanitization.
Existing 167 mcp tests still pass.
Fixes alerts #36 #37 from CodeQL.
`escapePipe` previously only escaped `|` for markdown table cells, which meant a value like `foo\|bar` (literal backslash followed by pipe) became `foo\\|bar` — a `\\` escape (rendered as `\`) followed by an unescaped pipe, breaking the table layout. Escape `\` -> `\\` first, then `|` -> `\|`, so pre-existing backslashes survive intact as literal `\` and the pipe stays escaped. Fixes alert #176 from CodeQL.
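The ordering issue can be reproduced with a small sketch. All three function names are hypothetical, and `countUnescapedPipes` assumes a renderer in which a backslash escapes the character that follows it.

```typescript
// Old behaviour (sketch): only the pipe is escaped.
export function escapePipeOnly(value: string): string {
  return value.replace(/\|/g, "\\|");
}

// Fixed order (sketch): backslashes first, then pipes.
export function escapeCell(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/\|/g, "\\|");
}

// Counts pipes a renderer would treat as cell separators, assuming a
// backslash consumes the next character as an escape.
export function countUnescapedPipes(cell: string): number {
  let count = 0;
  for (let i = 0; i < cell.length; i++) {
    const ch = cell.charAt(i);
    if (ch === "\\") {
      i++; // skip the escaped character
      continue;
    }
    if (ch === "|") count++;
  }
  return count;
}
```

For the input `foo\|bar`, `escapePipeOnly` leaves one pipe that still acts as a separator, while `escapeCell` leaves none.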
The chained `replace(/\\'/g, "'").replace(/\\/g, "\\\\").replace(/"/g, '\\"')`
approach incorrectly doubled valid JS escapes like `\n` and `\t`,
turning a JS source `'foo\nbar'` into a literal `\n` in the JSON
output instead of a newline character.
Replace with `jsSingleQuotedToJsonInner`, a character-walking pass
that:
- drops the JS-only `\'` escape,
- passes JSON-recognized escapes (`\"`, `\\`, `\/`, `\b`, `\f`,
`\n`, `\r`, `\t`, `\uXXXX`) through unchanged,
- escapes a bare `"` to `\"`,
- doubles any other lone `\` so the literal backslash survives the
JSON parser.
Adds a regression test covering `\\`, `\n`, and `\"` inputs.
This refines the alert #131 (js/incomplete-sanitization) fix from
a16dcee — same defect class, more accurate fix.
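A rough sketch of a character-walking pass that follows the rules listed above. It is written from the description only and is not the project's `jsSingleQuotedToJsonInner`; the name `singleQuotedInnerToJson` is hypothetical, and raw control characters inside the literal are not handled.

```typescript
// Single-character escapes that JSON accepts after a backslash.
const JSON_SIMPLE_ESCAPES = new Set(['"', "\\", "/", "b", "f", "n", "r", "t"]);

// Converts the inside of a JS single-quoted literal into the inside of a
// JSON string, one character at a time.
export function singleQuotedInnerToJson(inner: string): string {
  let out = "";
  for (let i = 0; i < inner.length; i++) {
    const ch = inner.charAt(i);
    if (ch === '"') {
      out += '\\"'; // bare double quote must be escaped for JSON
      continue;
    }
    if (ch !== "\\") {
      out += ch;
      continue;
    }
    const next = inner.charAt(i + 1); // "" when the backslash is last
    if (next === "'") {
      out += "'"; // drop the JS-only \' escape
      i++;
    } else if (JSON_SIMPLE_ESCAPES.has(next)) {
      out += "\\" + next; // already valid in JSON: pass through
      i++;
    } else if (next === "u" && /^[0-9a-fA-F]{4}$/.test(inner.slice(i + 2, i + 6))) {
      out += inner.slice(i, i + 6); // \uXXXX passes through unchanged
      i += 5;
    } else {
      out += "\\\\"; // lone backslash: double it so it survives JSON.parse
    }
  }
  return out;
}
```

Wrapping the result in double quotes and passing it to `JSON.parse` gives a quick check: the JS source `foo\nbar` comes back as a string containing a real newline, which the chained `replace` approach did not produce.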
theagenticguy
added a commit
that referenced
this pull request
May 10, 2026
…section (#87)

## Summary

The OpenCodeHub Starlight docs site was deleted in PR #53 (May 4, commit `4431b53`) under T-M2-3 with the explicit promise to spin it up as `theagenticguy/opencodehub-docs`. That separate repo was never created. The site at https://theagenticguy.github.io/opencodehub/ has been serving the May 1 snapshot ever since — 28-tool / DuckDB-default / Node 20 / 14-language prose, missing every milestone since (M3-M7, Track A-D, parse-runtime flip, 20-scanner inventory, supply-chain hardening).

This PR restores `packages/docs/` + `.github/workflows/pages.yml` from `4431b53^`, refreshes every page against v1 reality, adds a deep agent-friendly `agents/` section, ships a machine-readable tool catalog, hardens the workflow, and lifts `LadybugDB` out of the banned-strings policy now that it's a first-class product name.

Three deep specialists ran in parallel after the bulk-restore, with one polish pass at the end.

## What's in here

### Restoration (`f801f1a`)

56 files restored from history. Build clean out of the box: 47 pages, links valid, Pagefind index, llm-nav banners.

### Content refresh (8 commits, `00a0fce` → `c0376d8`)

- **Start here** — install (Node 22 or 24, mise, `codehub init`), quick-start (first MCP call), what-is-opencodehub, codehub-init, first-query — all v1.
- **MCP** — `mcp/overview.md` reframes 29 tools across five families (exploration, group/federation, scan/findings/verdict, HTTP/routing, meta). `mcp/tools.md` rewritten as full per-tool catalog with when-to-use / when-not-to-use / signature / example. `mcp/resources.md` + `mcp/prompts.md` updated.
- **Reference** — `cli.md` verified against `packages/cli/src/index.ts` shape; `configuration.md` env-var inventory + `AMBIGUOUS_REPO` envelope + `EMBEDDER_MISMATCH` from ADR 0014; `languages.md` 15-language table; `error-codes.md` current set.
- **Architecture** — overview, monorepo-map (17 packages, dropped eval/gym, added cobol-proleap/frameworks/pack/policy/wiki), embeddings (3-backend precedence), parsing-and-resolution (WASM-default + native opt-in), determinism (graphHash invariant), scanners-and-sarif (20-scanner inventory), scip-reconciliation, supply-chain, adrs (0001-0014 index).
- **New architecture pages** — `storage-backend.md` (LadybugDB + DuckDB segregation, IGraphStore/ITemporalStore, community-adapter escape hatch); `cross-repo-federation.md` (repo-as-typed-node, AMBIGUOUS_REPO, group_* tools); `lessons.md` (pointer to `.erpaval/solutions/`).
- **New guides** — `migrating-from-duckdb.md` (three migration paths).
- **Index hero** — splash with three CTAs (Install / Use / Develop) using Starlight `<Card>` / `<CardGrid>` — no marketing tiles.
- **Sidebar IA** — Start here · Agents · MCP · Reference · Guides · Architecture · Skills · Contributing.
- **astro.config llms-txt** — `description` + `details` rewritten with current 29-tool / 15-language / LadybugDB-default reality (per the durable lesson `llms-txt-as-ground-truth.md`).

### Tool catalog as data (`b112b67`)

`packages/docs/public/tool-catalog.json` — machine-readable canonical catalog of all 29 tools. Schema: `{ tools: [{ name, family, description, when_to_use, when_not_to_use, signature_sketch, example }] }`. Agents can `fetch('https://theagenticguy.github.io/opencodehub/tool-catalog.json')`.

### Agents section (4 commits, `4e55203` → `3547b74`)

A new `packages/docs/src/content/docs/agents/` section, 14 pages, dedicated to AI-coding-agent discovery + usage:

- `agents/index.md` — section landing with 90-second setup + 5-editor card grid.
- `agents/why-mcp.md` — what an agent can't see without the graph; three failure modes; four MCP tool families.
- `agents/install.md` — generic install for any MCP-speaking agent: prereqs, `mise run cli:link`, `codehub init` (writes `.mcp.json` + plugin link), `codehub analyze`, `codehub doctor`, per-editor handoff.
- `agents/editors/claude-code.md` — deepest editor page: `.mcp.json` shape, 5 slash commands, `code-analyst` subagent, all 11 skills tabled, `hooks.json`.
- `agents/editors/cursor.md` — `.cursor/mcp.json` (project + global), absolute-path fallback, verification.
- `agents/editors/codex.md` — `~/.codex/config.toml` + CLI helper, stdio-only caveat.
- `agents/editors/windsurf.md` — `~/.codeium/windsurf/mcp_config.json`, restart caveat.
- `agents/editors/opencode.md` — `opencode.json` with the differing key shape (`mcp` vs `mcpServers`, `command: [...]`, `environment` vs `env`).
- `agents/tool-decision-matrix.md` — 21-row single-repo intent → tool table with anti-pattern column, plus 5-row group-mode table and a "When to chain" section.
- `agents/idiomatic-prompts.md` — 5 paste-ready prompts (rename audit / auth-flow surfacing / HTTP contract reconstruction / findings-vs-baseline / onboarding) with target editor + expected tool calls + expected output.
- `agents/discovery-and-resources.md` — site URL, `/llms.txt`, `/llms-full.txt`, `/llms-small.txt`, `/tool-catalog.json`, `AGENTS.md`, `CLAUDE.md`, registries.
- `agents/registries.md` — Official MCP Registry (`server.json` shape), Smithery (`smithery.yaml` shape), Glama, awesome-mcp-servers, aggregator directories.
- `agents/llms-txt-cheatsheet.md` — picking guidance for the three core bundles + custom sets.

### Banned-strings policy (`d8dddb2`)

Removed `ladybug` and `kuzu` from `BANNED_LITERALS` in `scripts/check-banned-strings.sh`. LadybugDB is the default graph backend (M7) and a first-class product name in docs. The original ban dated from when the project was still deciding which graph engine to vendor; that decision shipped. `kuzu` is retained as historical lineage in cross-link prose ("the open-source successor to the pre-1.0 Kuzu codebase") which already lives in ADR 0011.

### Pages workflow hardening (`c54231d`)

- `actions/checkout@v6` → `@de0fac2e...` (v6.0.2)
- `jdx/mise-action@v4` → `@c37c9329...` (v2.4.4)
- `actions/upload-pages-artifact@v5` → `@fc324d35...`
- `actions/deploy-pages@v5` → `@cd2ce8fc...`

Top-level `permissions: contents: read`; write scopes (`pages: write` + `id-token: write`) granted only on the `deploy` job. Resolves the same Token-Permissions HIGH pattern fixed in PR #78 for the other 4 workflows.

### LadybugDB polish (`3c7166b`)

38 prose substitutions across 13 files: replace awkward "the graph-database backend" workarounds with plain "LadybugDB" now that the literal is allowed. `@ladybugdb/core` (npm package) and `graph.lbug` (file extension) preserved.

## Validation

- `mise run check` exit 0 — 1,339 tests across 8 packages (lint + typecheck + test + banned-strings + verdict)
- `pnpm -F @opencodehub/docs build` — **64 pages built, all internal links valid**, Pagefind index ok, llm-nav banners patch all 63 .md files
- `actionlint .github/workflows/*.yml` — clean
- `bash scripts/check-banned-strings.sh` — PASS
- `rg 'AC-[A-Z]-[0-9]|T-M[0-9]+-[0-9]+|W-[A-Z]-[0-9]+|S-[A-Z]-[0-9]+|E-[A-Z]-[0-9]+|CL-[A-Z]+|architecture-revised\.md' packages/docs/src/` — zero hits
- Marketing-words sweep (`effortless`, `leverage`, `synergy`, `world-class`, `blazing-fast`, `cutting-edge`) — zero hits in docs prose

## Test plan

- [ ] CI green on `docs/site-restore-v1`
- [ ] After merge, the Pages workflow at `.github/workflows/pages.yml` triggers on first push to `main` (paths-filter on `packages/docs/**`)
- [ ] Deployed site at https://theagenticguy.github.io/opencodehub/ replaces the May 1 snapshot
- [ ] Manual verification: visit /agents/, /mcp/tools/, /tool-catalog.json
- [ ] Manual verification: `/llms.txt`, `/llms-full.txt`, `/llms-small.txt` all resolve and contain "29 tools" / "LadybugDB" / "WASM" facts

## Out of scope

- Submission to `skills.sh`, the official MCP Registry, Smithery, awesome-mcp-servers — research file at `.erpaval/sessions/session-05809d/research-skills-sh.md` and `.erpaval/sessions/session-05809d/research-agent-docs.md` capture the exact shape; PR-able as separate follow-ups.
- Importing `.erpaval/solutions/**.md` as a Starlight content collection — investigated, deemed not worth shipping (lessons audience is the agent at edit-time, not docs readers; some lesson titles include literals the docs build's other guardrails reject). The `architecture/lessons.md` stub points readers at the directory.
theagenticguy
added a commit
that referenced
this pull request
May 10, 2026
## Summary

Three deep specialists ran in parallel to: (1) close every CodeQL HIGH alert plus low-hanging mediums, (2) resolve every Scorecard HIGH `Token-Permissions` and MEDIUM `Pinned-Dependencies` alert, and (3) build a production-grade release pipeline with SLSA L3 provenance, cosign keyless signing, gated pre-release scans, and an operator runbook.

## Code-scanning alerts resolved

### CodeQL HIGH (15) — `packages/`

| Class | Count | Pattern |
|---|---|---|
| `js/polynomial-redos` | 6 | regex tightened or replaced with deterministic `startsWith` / `charCodeAt` scans; ReDoS-prone alternations bounded |
| `js/incomplete-sanitization` | 4 | escape `\\` BEFORE adding new backslashes from quote/pipe/SQL-LIKE escapes |
| `js/file-system-race` | 3 | TOCTOU closed by collapsing `stat`+`read` into a single fd handle |
| `js/redos` (exponential) | 1 | tightened SCIP descriptor regex char class |
| `js/incomplete-url-substring-sanitization` | 1 | `new URL().hostname` in test fixture |

### CodeQL MEDIUM (low-hanging, 6) — `packages/scip-ingest/src/runners/index.ts` + `property-access.ts`

- 3× `js/shell-command-*` — explicit `shell: false` in spawn calls; absolute-path resolution before exec
- 1× `js/indirect-command-line-injection`, 1× `js/shell-command-injection-from-environment` — same fix
- 2× `js/overly-large-range` — drop redundant `A-Z` from `[A-Za-z\\w]`

### Scorecard HIGH — Token-Permissions (4)

- `codeql.yml`, `semgrep.yml`, `release-please.yml`, `sbom.yml` (sbom.yml has been retired; SBOM now lives in `release.yml` with proper job-scoped permissions).

Top-level hoisted to `contents: read`; write scopes granted only on the job that needs them.

### Scorecard MEDIUM — Pinned-Dependencies (39)

Every `uses:` across all 9 workflows pinned to a 40-char SHA with a trailing `# vX.Y.Z` comment. Dependabot updated to group all `github-actions` SHA bumps into a single weekly PR. `npm i -g node-gyp` pinned to `node-gyp@12.3.0`.

Single documented exception: `slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.1.0` is intentionally tag-pinned per the SLSA project's trust model (the trusted-builder protocol verifies the tag boundary; SHA-pinning short-circuits SLSA's provenance chain). Documented inline + in `RELEASE.md`.

## Release pipeline (new)

### `.github/workflows/release.yml` (new, 351 lines)

Triggered by `release: [published]` and via `workflow_call` from `release-please.yml`. Job graph:

1. **`build`** — pnpm install + `pnpm -r build` + CycloneDX SBOM 1.5 + OCH analyze + OCH code-pack on the released SHA
2. **`scan`** — OCH self-scan + SARIF upload to code-scanning under category `release`
3. **`sign`** — cosign keyless (Sigstore OIDC) signs every artifact (SBOM, code-pack, attestations) and emits `.sig.bundle` files
4. **`provenance`** — SLSA L3 generator-generic-slsa3 reusable workflow emits `slsa.intoto.jsonl`
5. **`upload`** — attaches everything to the GitHub release in lockstep
6. **`publish`** (gated, off by default) — `vars.OCH_NPM_PUBLISH_ENABLED == 'true'` operator switch for future npm publish of `@opencodehub/cli` and `@opencodehub/mcp`

### `.github/workflows/pre-release-gate.yml` (new)

Triggers on the release-please PR (`head_ref: ^release-please--`). Adds release-time scans not in the regular CI (npm-audit at high+, lockfile integrity, detect-secrets full sweep, license re-assert) plus an `if: always()` aggregator suitable as a required status check before the release PR can merge.

### `.github/workflows/release-please.yml` (refactor)

Reduced to: run `release-please-action` → on `release_created`, hand off via `workflow_call` to `release.yml`. Sidesteps the durable-lesson finding that default `GITHUB_TOKEN` does not fire downstream `release: [published]` events. `release.yml` also listens on `release: published` so PAT-driven and manual `gh release create` flows still trigger the same pipeline.

### `.github/workflows/sbom.yml` (deleted)

Consolidated into `release.yml`'s `build` job — SBOM, code-pack, and SARIF now share one anchored SHA.

### `docs/RELEASE.md` (new, 271 lines)

Operator runbook: trigger model · asset inventory · cosign + slsa-verifier verification commands · manual hotfix path · environment configuration (cosign keyless requires only OIDC — no secrets to provision).

## Validation

- `mise run check` exit 0 (lint + typecheck + 1,339 tests across 8 packages + banned-strings + verdict)
- `bash scripts/smoke-mcp.sh` → PASS (29 tools)
- `actionlint .github/workflows/*.yml` clean
- All 11 YAML files (`workflows/` + `dependabot.yml`) parse via PyYAML
- `rg 'AC-[A-Z]-[0-9]' packages/ scripts/` empty (zero spec-coordinate leakage)
- Per-package test counts: analysis 127, cli 236, embedder 79+1skip, frameworks 86, ingestion 607, mcp 167, scip-ingest 58, wiki 15

## Test plan

- [ ] CI green on `chore/security-and-release-hardening`
- [ ] CodeQL re-scan on PR shows 15 HIGH alerts fixed (CodeQL re-runs automatically; existing alerts auto-close on the next push to main)
- [ ] Scorecard re-scan shows TokenPermissions HIGH alerts cleared (next weekly cron)
- [ ] After merge, the next `release-please-action` PR exercises the new `pre-release-gate.yml`
- [ ] After the next tag, manually verify cosign + slsa-verifier per `docs/RELEASE.md` instructions

## Out of scope

- Scorecard `BinaryArtifactsID` (3) — vendored Tree-sitter `.wasm` blobs are intentional and reproducibly built (`scripts/build-vendor-wasms.sh`); no fix.
- Scorecard `MaintainedID` / `CodeReviewID` / `VulnerabilitiesID` / `CIIBestPracticesID` / `FuzzingID` / `SASTID` — repo-meta signals, not code fixes; release-grade workflow + Scorecard re-run will improve these naturally.
- CodeQL `note` severity (2 alerts) — out of scope per request.
## Summary

Three deep specialists ran in parallel to: (1) close every CodeQL HIGH alert plus low-hanging mediums, (2) resolve every Scorecard HIGH `Token-Permissions` and MEDIUM `Pinned-Dependencies` alert, and (3) build a production-grade release pipeline with SLSA L3 provenance, cosign keyless signing, gated pre-release scans, and an operator runbook.

## Code-scanning alerts resolved
### CodeQL HIGH (15) in `packages/`

| Rule | Fix |
| --- | --- |
| `js/polynomial-redos` | Regexes replaced with `startsWith`/`charCodeAt` scans; ReDoS-prone alternations bounded |
| `js/incomplete-sanitization` | Escape `\\` BEFORE adding new backslashes from quote/pipe/SQL-LIKE escapes |
| `js/file-system-race` | `stat` + `read` collapsed into a single fd handle |
| `js/redos` (exponential) | |
| `js/incomplete-url-substring-sanitization` | `new URL().hostname` in test fixture |

### CodeQL MEDIUM (low-hanging, 6) in `packages/scip-ingest/src/runners/index.ts` + `property-access.ts`

- `js/shell-command-*`: explicit `shell: false` in spawn calls; absolute-path resolution before exec
- `js/indirect-command-line-injection`, 1× `js/shell-command-injection-from-environment`: same fix
- `js/overly-large-range`: drop redundant `A-Z` from `[A-Za-z\\w]`

### Scorecard HIGH: Token-Permissions (4)
`codeql.yml`, `semgrep.yml`, `release-please.yml`, `sbom.yml` (`sbom.yml` has been retired; SBOM now lives in `release.yml` with proper job-scoped permissions). Top-level permissions hoisted to `contents: read`; write scopes granted only on the job that needs them.

### Scorecard MEDIUM: Pinned-Dependencies (39)

Every `uses:` across all 9 workflows is pinned to a 40-char SHA with a trailing `# vX.Y.Z` comment. Dependabot updated to group all `github-actions` SHA bumps into a single weekly PR. `npm i -g node-gyp` pinned to `node-gyp@12.3.0`.

Single documented exception: `slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.1.0` is intentionally tag-pinned per the SLSA project's trust model (the trusted-builder protocol verifies the tag boundary; SHA-pinning short-circuits SLSA's provenance chain). Documented inline + in `RELEASE.md`.

## Release pipeline (new)
### `.github/workflows/release.yml` (new, 351 lines)

Triggered by `release: [published]` and via `workflow_call` from `release-please.yml`. Job graph:

- `build`: pnpm install + `pnpm -r build` + CycloneDX SBOM 1.5 + OCH analyze + OCH code-pack on the released SHA
- `scan`: OCH self-scan + SARIF upload to code-scanning under category `release`
- `sign`: cosign keyless (Sigstore OIDC) signs every artifact (SBOM, code-pack, attestations) and emits `.sig.bundle` files
- `provenance`: SLSA L3 `generator-generic-slsa3` reusable workflow emits `slsa.intoto.jsonl`
- `upload`: attaches everything to the GitHub release in lockstep
- `publish` (gated, off by default): `vars.OCH_NPM_PUBLISH_ENABLED == 'true'` operator switch for future npm publish of `@opencodehub/cli` and `@opencodehub/mcp`

### `.github/workflows/pre-release-gate.yml` (new)

Triggers on the release-please PR (`head_ref: ^release-please--`). Adds release-time scans not in the regular CI (npm-audit at high+, lockfile integrity, detect-secrets full sweep, license re-assert) plus an `if: always()` aggregator suitable as a required status check before the release PR can merge.

### `.github/workflows/release-please.yml` (refactor)

Reduced to: run `release-please-action` → on `release_created`, hand off via `workflow_call` to `release.yml`. Sidesteps the durable-lesson finding that the default `GITHUB_TOKEN` does not fire downstream `release: [published]` events. `release.yml` also listens on `release: published` so PAT-driven and manual `gh release create` flows still trigger the same pipeline.

### `.github/workflows/sbom.yml` (deleted)

Consolidated into `release.yml`'s `build` job: SBOM, code-pack, and SARIF now share one anchored SHA.

### `docs/RELEASE.md` (new, 271 lines)

Operator runbook: trigger model · asset inventory · cosign + slsa-verifier verification commands · manual hotfix path · environment configuration (cosign keyless requires only OIDC; no secrets to provision).
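The `js/polynomial-redos` fixes listed under the CodeQL HIGH alerts swap anchored regexes for plain character scans. The following is a minimal sketch of that idea, not the code from this PR: `trimTrailingSlashes` and `stripQuery` are hypothetical helper names, and the equivalence with the regex forms is assumed only for single-line input.

```typescript
// Illustrative sketch only: these helper names are hypothetical,
// not the functions changed in this PR.
const SLASH = 47; // char code of "/"

// Trim trailing "/" characters with a single backward scan.
// Same result as input.replace(/\/+$/, ""), but linear in the worst case.
function trimTrailingSlashes(input: string): string {
  let end = input.length;
  while (end > 0 && input.charCodeAt(end - 1) === SLASH) {
    end -= 1;
  }
  return input.slice(0, end);
}

// Drop the query string: everything from the first "?" onward.
// Matches input.replace(/\?.*$/, "") for single-line input.
function stripQuery(input: string): string {
  const queryStart = input.indexOf("?");
  return queryStart === -1 ? input : input.slice(0, queryStart);
}

// Usage: normalize a path, then feed the kind of input that makes the
// regex form retry from every "/" (a long slash run followed by a non-slash).
const normalized = trimTrailingSlashes(stripQuery("/api/users//?page=2"));
const adversarial = trimTrailingSlashes("/".repeat(100000) + "x");
console.log(normalized, adversarial.length); // "/api/users" 100001
```

The backward scan inspects each trailing character at most once, so the cost stays proportional to the input length no matter how many `/` characters the input contains.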
## Validation

- `mise run check` exit 0 (lint + typecheck + 1,339 tests across 8 packages + banned-strings + verdict)
- `bash scripts/smoke-mcp.sh` → PASS (29 tools)
- `actionlint .github/workflows/*.yml` clean
- `workflows/` + `dependabot.yml` parse via PyYAML
- `rg 'AC-[A-Z]-[0-9]' packages/ scripts/` empty (zero spec-coordinate leakage)

## Test plan
- `chore/security-and-release-hardening`
- `release-please-action` PR exercises the new `pre-release-gate.yml`
- `docs/RELEASE.md` instructions

## Out of scope
- `BinaryArtifactsID` (3): vendored Tree-sitter `.wasm` blobs are intentional and reproducibly built (`scripts/build-vendor-wasms.sh`); no fix.
- `MaintainedID` / `CodeReviewID` / `VulnerabilitiesID` / `CIIBestPracticesID` / `FuzzingID` / `SASTID`: repo-meta signals, not code fixes; the release-grade workflow + Scorecard re-run will improve these naturally.
- `note` severity (2 alerts): out of scope per request.