chore: security alerts + supply-chain release hardening by theagenticguy · Pull Request #78 · theagenticguy/opencodehub

theagenticguy · 2026-05-10T17:35:24Z

Summary

Three deep specialists ran in parallel to: (1) close every CodeQL HIGH alert plus low-hanging mediums, (2) resolve every Scorecard HIGH Token-Permissions and MEDIUM Pinned-Dependencies alert, and (3) build a production-grade release pipeline with SLSA L3 provenance, cosign keyless signing, gated pre-release scans, and an operator runbook.

Code-scanning alerts resolved

CodeQL HIGH (15) — `packages/`

| Class | Count | Pattern |
|---|---|---|
| `js/polynomial-redos` | 6 | regex tightened or replaced with deterministic `startsWith` / `charCodeAt` scans; ReDoS-prone alternations bounded |
| `js/incomplete-sanitization` | 4 | escape `\\` BEFORE adding new backslashes from quote/pipe/SQL-LIKE escapes |
| `js/file-system-race` | 3 | TOCTOU closed by collapsing `stat`+`read` into a single fd handle |
| `js/redos` (exponential) | 1 | tightened SCIP descriptor regex char class |
| `js/incomplete-url-substring-sanitization` | 1 | `new URL().hostname` in test fixture |

CodeQL MEDIUM (low-hanging, 6) — `packages/scip-ingest/src/runners/index.ts` + `property-access.ts`

- 3× `js/shell-command-*`: explicit `shell: false` in spawn calls; absolute-path resolution before exec
- 1× `js/indirect-command-line-injection`, 1× `js/shell-command-injection-from-environment`: same fix
- 2× `js/overly-large-range`: drop redundant `A-Z` from `[A-Za-z\\w]`

Scorecard HIGH — Token-Permissions (4)

codeql.yml, semgrep.yml, release-please.yml, sbom.yml (sbom.yml has been retired; SBOM now lives in release.yml with proper job-scoped permissions). Top-level hoisted to contents: read; write scopes granted only on the job that needs them.

Scorecard MEDIUM — Pinned-Dependencies (39)

Every uses: across all 9 workflows pinned to a 40-char SHA with a trailing # vX.Y.Z comment. Dependabot updated to group all github-actions SHA bumps into a single weekly PR. npm i -g node-gyp pinned to node-gyp@12.3.0.

Single documented exception: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.1.0 is intentionally tag-pinned per the SLSA project's trust model (the trusted-builder protocol verifies the tag boundary; SHA-pinning short-circuits SLSA's provenance chain). Documented inline + in RELEASE.md.

Release pipeline (new)

`.github/workflows/release.yml` (new, 351 lines)

Triggered by release: [published] and via workflow_call from release-please.yml. Job graph:

1. `build`: pnpm install + `pnpm -r build` + CycloneDX SBOM 1.5 + OCH analyze + OCH code-pack on the released SHA
2. `scan`: OCH self-scan + SARIF upload to code-scanning under category `release`
3. `sign`: cosign keyless (Sigstore OIDC) signs every artifact (SBOM, code-pack, attestations) and emits `.sig.bundle` files
4. `provenance`: SLSA L3 generator-generic-slsa3 reusable workflow emits `slsa.intoto.jsonl`
5. `upload`: attaches everything to the GitHub release in lockstep
6. `publish` (gated, off by default): `vars.OCH_NPM_PUBLISH_ENABLED == 'true'` operator switch for future npm publish of `@opencodehub/cli` and `@opencodehub/mcp`

`.github/workflows/pre-release-gate.yml` (new)

Triggers on the release-please PR (head_ref: ^release-please--). Adds release-time scans not in the regular CI (npm-audit at high+, lockfile integrity, detect-secrets full sweep, license re-assert) plus an if: always() aggregator suitable as a required status check before the release PR can merge.

`.github/workflows/release-please.yml` (refactor)

Reduced to: run release-please-action → on release_created, hand off via workflow_call to release.yml. Sidesteps the durable-lesson finding that default GITHUB_TOKEN does not fire downstream release: [published] events. release.yml also listens on release: published so PAT-driven and manual gh release create flows still trigger the same pipeline.

`.github/workflows/sbom.yml` (deleted)

Consolidated into release.yml's build job — SBOM, code-pack, and SARIF now share one anchored SHA.

`docs/RELEASE.md` (new, 271 lines)

Operator runbook: trigger model · asset inventory · cosign + slsa-verifier verification commands · manual hotfix path · environment configuration (cosign keyless requires only OIDC — no secrets to provision).

Validation

mise run check exit 0 (lint + typecheck + 1,339 tests across 8 packages + banned-strings + verdict)
bash scripts/smoke-mcp.sh → PASS (29 tools)
actionlint .github/workflows/*.yml clean
All 11 YAML files (workflows/ + dependabot.yml) parse via PyYAML
rg 'AC-[A-Z]-[0-9]' packages/ scripts/ empty (zero spec-coordinate leakage)
Per-package test counts: analysis 127, cli 236, embedder 79+1skip, frameworks 86, ingestion 607, mcp 167, scip-ingest 58, wiki 15

Test plan

CI green on chore/security-and-release-hardening
CodeQL re-scan on PR shows 15 HIGH alerts fixed (CodeQL re-runs automatically; existing alerts auto-close on the next push to main)
Scorecard re-scan shows TokenPermissions HIGH alerts cleared (next weekly cron)
After merge, the next release-please-action PR exercises the new pre-release-gate.yml
After the next tag, manually verify cosign + slsa-verifier per docs/RELEASE.md instructions

Out of scope

Scorecard BinaryArtifactsID (3) — vendored Tree-sitter .wasm blobs are intentional and reproducibly built (scripts/build-vendor-wasms.sh); no fix.
Scorecard MaintainedID / CodeReviewID / VulnerabilitiesID / CIIBestPracticesID / FuzzingID / SASTID — repo-meta signals, not code fixes; release-grade workflow + Scorecard re-run will improve these naturally.
CodeQL note severity (2 alerts) — out of scope per request.

- git.ts: replace `^\+\+\+\s+(?:b\/)?(.+)$` regex with non-regex startsWith + slice scan so `+++\t\t\t...` lines cannot trigger polynomial backtracking.
- http-patterns.ts:normalizeHttpPath: replace `\?.*$` and `\/+$` with deterministic indexOf/charCodeAt loops.
- http-patterns.ts:PY_ROUTE_DECORATOR_RE: cap the path and methods literals at 256 chars; the unbounded `+` quantifier is what made the regex slow on `@A.route("!",methods=[\\...`.

Behaviour preserved: same set of matched paths, same hunk parser contract. Existing analysis tests (127) still pass.

Fixes alerts #41 #119 #120 from CodeQL.
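The `git.ts` change above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `parsePlusHeader` is hypothetical, it treats only space and tab as whitespace (narrower than the original `\s+`), and it does not reproduce the regex's backtracking results on degenerate inputs such as `+++ b/`.

```typescript
// Hypothetical helper: extracts the path from a unified-diff "+++" header
// without a regex, so the cost is linear in the line length.
function parsePlusHeader(line: string): string | null {
  if (!line.startsWith("+++")) return null;
  let i = 3;
  // Skip the whitespace run after "+++" (assumption: space or tab only).
  while (i < line.length) {
    const c = line.charCodeAt(i);
    if (c !== 0x20 && c !== 0x09) break;
    i++;
  }
  if (i === 3) return null; // at least one whitespace character is required
  let path = line.slice(i);
  if (path.startsWith("b/")) path = path.slice(2); // optional "b/" prefix
  return path.length > 0 ? path : null;
}
```

Each character is inspected at most once, so a header padded with thousands of tabs is rejected in a single pass.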

The `cfg.endpointUrl.replace(/\/+$/, "")` call trimmed trailing slashes via a regex that runs polynomial-time on inputs with many `/` characters. Replace with a character-by-character loop using `charCodeAt` — same result, deterministic worst case. Fixes alert #121 from CodeQL.
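A loop of the kind described might look like the sketch below. The function name `trimTrailingSlashes` is an assumption for illustration; only the technique (scan backwards with `charCodeAt`, then slice once) comes from the commit message.

```typescript
// Hypothetical helper: removes every trailing "/" in one backwards scan.
function trimTrailingSlashes(value: string): string {
  let end = value.length;
  // 0x2f is the char code of "/".
  while (end > 0 && value.charCodeAt(end - 1) === 0x2f) {
    end--;
  }
  return value.slice(0, end);
}
```

The scan touches each trailing slash exactly once and never retries from another start position, which is what bounds the worst case.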

The yarn.lock entry regex `^"?([^"\s@][^"\s]*)@[^"\n]*"?:\s*$` had an inner char class `[^"\s]*` that overlapped with the trailing `@` delimiter, so an input like `!@@@@@@@@@@` would let the regex backtrack across every `@` looking for a match. Tighten the inner class to `[^"\s@]*` so the engine commits to the first `@` it sees. Behaviour is unchanged for valid yarn.lock entries — the original regex already forbade `@` in the package-name leading character, and unscoped names never contain `@` mid-string. Fixes alert #180 from CodeQL.
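For illustration, the tightened pattern can be exercised as below. The regex is the one quoted above with the inner class changed to `[^"\s@]*`; the wrapper name `yarnEntryName` is hypothetical and not taken from the codebase.

```typescript
// Tightened pattern from the commit message: the name group excludes "@",
// so the engine commits to the first "@" it sees.
const YARN_ENTRY_RE = /^"?([^"\s@][^"\s@]*)@[^"\n]*"?:\s*$/;

// Hypothetical wrapper: returns the package name of a yarn.lock entry header,
// or null when the line is not an entry header.
function yarnEntryName(line: string): string | null {
  const m = YARN_ENTRY_RE.exec(line);
  if (m === null) return null;
  const name = m[1];
  return name === undefined ? null : name;
}
```

With this class, an input such as `!@@@@@@@@@@` fails after one attempt at the first `@` instead of retrying the split at every later `@`.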

@v4

Resolves the Scorecard `Pinned-Dependencies` MEDIUM alerts by replacing every `uses: <action>@<tag>` reference with a SHA-pinned form plus a trailing comment carrying the original tag for human readability. The trailing comment is also what Dependabot rewrites on weekly SHA bumps.

Tag-to-SHA mapping (resolved via `gh api /repos/<owner>/<repo>/commits/<tag>`):

- actions/checkout@v6 -> de0fac2e4500dabe0009e67214ff5f5447ce83dd
- actions/upload-artifact@v7 -> 043fb46d1a93c77aae656e7c1c64a875d1fc6a0a
- jdx/mise-action@v4 -> 1648a7812b9aeae629881980618f079932869151
- github/codeql-action/* @v4 -> 68bde559dea0fdcac2102bfdf6230c5f70eb485e
- ossf/scorecard-action@v2.4.3 -> 4eaacf0543bb3f2c246792bd56e8cdeffafb205a

Files touched: ci.yml, codeql.yml, commitlint.yml, och-self-scan.yml, osv.yml, scorecard.yml, semgrep.yml.

release-please.yml is being rewritten in parallel by the release-hardening track and already carries SHA pins as part of that rewrite.
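As a rough way to see what "SHA-pinned" means for a single `uses:` reference, the sketch below checks whether the part after `@` is a full 40-character lowercase hex SHA. The function `isShaPinned` is hypothetical and is not how Scorecard itself evaluates workflows.

```typescript
// Hypothetical check: true when the ref after "@" is a full 40-character
// lowercase hex commit SHA rather than a tag or branch name.
function isShaPinned(usesRef: string): boolean {
  const at = usesRef.lastIndexOf("@");
  if (at < 0) return false;
  const rev = usesRef.slice(at + 1);
  if (rev.length !== 40) return false;
  for (let i = 0; i < rev.length; i++) {
    const c = rev.charCodeAt(i);
    const isDigit = c >= 0x30 && c <= 0x39; // "0".."9"
    const isHexLower = c >= 0x61 && c <= 0x66; // "a".."f"
    if (!isDigit && !isHexLower) return false;
  }
  return true;
}
```

Under this check, `actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd` counts as pinned, while `actions/checkout@v6` does not.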

Resolves Scorecard `Token-Permissions` HIGH alerts by demoting the top-level workflow scope to `contents: read` and lifting the write-scopes onto the single job that needs them. CodeQL's analyze job keeps `security-events: write` for the SARIF upload; semgrep's job keeps the same plus `contents: read`. Same effective permissions, but any unrelated step in either workflow now runs read-only.

Files: codeql.yml, semgrep.yml.

Out of scope here:

- sbom.yml: file removed in the parallel release-hardening track (SBOM generation moved into the new release.yml).
- release-please.yml: rewritten in the parallel release-hardening track with the same hoist already applied.

…ign signing

Single tag-triggered workflow that anchors every job to the released commit SHA. Listens on `release: published`, `workflow_call`, and `workflow_dispatch` so it works with default GITHUB_TOKEN (via inline workflow_call from release-please.yml), with a PAT-driven release-please publish, and as a manual hotfix path.

Each release ships:

- opencodehub-pack.tar.gz (deterministic 100k-token code-pack BOM)
- SBOM.cdx.json (CycloneDX 1.5)
- och-scan.sarif (OCH self-scan at the released SHA)
- *.sig.bundle (cosign keyless Sigstore bundles for each blob)

Top-level permissions are read-only; per-job grants escalate where strictly required (id-token: write for OIDC -> Fulcio + SLSA, contents: write for release uploads, security-events: write for SARIF upload).

npm-publish job is gated by OCH_NPM_PUBLISH_ENABLED repo variable so the dry-run scaffolding stays inert until packages flip to public.

All third-party actions pinned to commit SHAs with version comments; the SLSA generator reusable workflow is the single tag-pinned exception (the SLSA project's trust model relies on the tag).

…tion

Adds the release-time-only checks that don't belong in everyday CI:

- npm-audit at high+ severity
- pnpm lockfile integrity (--frozen-lockfile --ignore-scripts)
- detect-secrets full sweep against .secrets.baseline
- license allowlist re-assertion

Each job is gated `if: startsWith(github.head_ref, 'release-please--')` so non-release PRs are no-ops. The aggregator job (`pre-release-gate`) runs `if: always()` and treats skipped dependencies as pass, so the required-status-check name resolves uniformly on every PR while actually gating only release-please PRs.

Configure branch protection on main to require the `Pre-release gate (aggregate)` job. Documented in docs/RELEASE.md.
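The aggregator's "skipped counts as pass" rule can be modelled in a few lines. This is only a model of the decision, written under the assumption that each dependency reports one of the four job results GitHub Actions exposes (`success`, `failure`, `cancelled`, `skipped`); the function name `gatePasses` and the job names in the usage note are hypothetical.

```typescript
type JobResult = "success" | "failure" | "cancelled" | "skipped";

// Models the aggregator decision: a skipped dependency counts as a pass,
// and anything other than success or skipped fails the gate.
function gatePasses(results: Record<string, JobResult>): boolean {
  for (const name of Object.keys(results)) {
    const result = results[name];
    if (result !== "success" && result !== "skipped") return false;
  }
  return true;
}
```

On an ordinary PR every dependency is skipped, so `gatePasses({ "npm-audit": "skipped", "lockfile": "skipped" })` is true; on a release-please PR a single `failure` or `cancelled` result turns the gate red.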

Re-apply the analysis-package changes from 050acd7 that were lost when c47286d (the parallel ci-pinning track) committed an old tree snapshot.

- git.ts: replace the `+++` header regex with non-regex startsWith + slice scan so polynomial backtracking on tab-padded diff headers is impossible.
- http-patterns.ts:normalizeHttpPath: replace `\?.*$` and `\/+$` with deterministic indexOf/charCodeAt loops.
- http-patterns.ts:PY_ROUTE_DECORATOR_RE: cap path and methods literals at 256 chars to bound regex work.

Behaviour preserved; existing analysis tests (127) still pass.

Fixes alerts #41 #119 #120 from CodeQL.

Two changes wired together:

1. release-please.yml hands off to release.yml via uses / workflow_call after `release_created` is true, instead of inlining the artifact pipeline. This sidesteps the GITHUB_TOKEN downstream-event suppression rule (default token does NOT fire downstream `release: published` events). The inline call works regardless of token type.
2. sbom.yml retired. SBOM generation now lives in release.yml's `build` job alongside the code-pack, so SBOM + code-pack + scan output share a single anchored SHA and are co-signed in lockstep. Eliminates the drift class where SBOM and code-pack could reference different commits.

The split surface is now:

- push:main -> release-please.yml (open/update PR, cut tag)
- pull_request -> pre-release-gate.yml (block merge if scans fail)
- workflow_call -> release.yml (inline post-tag pipeline)
- release:published -> release.yml (PAT-driven flow + manual)
- workflow_dispatch -> release.yml (operator hotfix path)

Documents the trigger model (push -> release-please-action -> PR -> gate -> merge -> tag -> release.yml builds + signs), the artifacts that ship with each release, downstream-consumer cosign + SLSA verification commands, the manual hotfix override path, and the environment configuration the pipeline expects (no long-lived secrets except GITHUB_TOKEN; cosign keyless uses OIDC; SLSA generator uses the same).

Calls out two operator-facing decisions:

- Optional `RELEASE_PLEASE_PAT` if you prefer one-workflow-per-concern over the workflow_call inline path.
- Optional `production-release` environment for a manual approval gate before any artifact is built / signed / attached.

Includes the verification recipe for slsa-verifier and cosign with worked examples of `--certificate-identity` for both the direct release.yml entry point and the release-please.yml workflow_call entry point.

- pipeline/phases/scan.ts: replace path-based `fs.stat` + `fs.readFile` with a single `fs.open` handle, then `handle.stat()` and `handle.readFile()`. Operations now share one file descriptor, which closes the TOCTOU window flagged by js/file-system-race.
- extract/tool-detector.ts:relaxedToJson: insert a `\\` -> `\\\\` escape pass before escaping `"` so JS literals containing a lone backslash (e.g. `'foo\"bar'`) no longer produce malformed JSON.
- extract/property-access.ts: drop the redundant `A-Za-z` ranges inside `[A-Za-z_$\w]` lookbehinds; `\w` already covers them and the overlap was tripping js/overly-large-range. Use `[\w$]` instead.
- pipeline/phases/markdown.test.ts: replace `.includes("example.com")` with a strict `new URL(...).hostname === "example.com"` check so a crafted `example.com.evil.test` host could not slip past the assertion (js/incomplete-url-substring-sanitization).

Existing 607 ingestion tests still pass.

Fixes alerts #38 #39 #40 #44 #131 from CodeQL.
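The hostname fix in the last item can be illustrated with a small sketch. The helper name `isExampleDotCom` is hypothetical; the only technique taken from the commit is comparing the parsed `hostname` instead of searching for a substring.

```typescript
// Hypothetical helper: parses the URL and compares the hostname exactly,
// instead of checking whether the raw string contains "example.com".
function isExampleDotCom(raw: string): boolean {
  try {
    return new URL(raw).hostname === "example.com";
  } catch (_err) {
    return false; // not a parseable absolute URL
  }
}
```

A string such as `https://example.com.evil.test/x` contains the substring `example.com` but has the hostname `example.com.evil.test`, so the exact comparison rejects it.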

- doctor.ts:registryPathCheck: drop the `access` probe and branch on `ENOENT` from the `readFile` itself, so the missing-file warn path and the read share one syscall.
- setup.test.ts: replace the `stat` then `readFile` pair with a single `readFile`; existence is inferred from a non-empty body.

Both paths previously opened a TOCTOU window between the existence check and the read (js/file-system-race). Existing 236 cli tests still pass.

Fixes alerts #42 #43 from CodeQL.

…mitters

- resources/repos.ts:yaml: escape `\` -> `\\` before escaping `"`, so a literal backslash in a registry value cannot pair with the appended `\"` to produce a malformed YAML escape.
- tools/sql.ts:formatCell: escape `\` -> `\\` before escaping `|`, so a pre-existing backslash in a SQL cell value cannot combine with the appended `\|` to break the markdown table escape (e.g. `foo\|bar` rendering as `foo\` + literal pipe).

Both paths previously triggered js/incomplete-sanitization. Existing 167 mcp tests still pass.

Fixes alerts #36 #37 from CodeQL.

`escapePipe` previously only escaped `|` for markdown table cells, which meant a value like `foo\|bar` (literal backslash followed by pipe) became `foo\\|bar` — a `\\` escape (rendered as `\`) followed by an unescaped pipe, breaking the table layout. Escape `\` -> `\\` first, then `|` -> `\|`, so pre-existing backslashes survive intact as literal `\` and the pipe stays escaped. Fixes alert #176 from CodeQL.
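The ordering issue is easy to see side by side. In the sketch below, `escapePipeNaive` stands in for the old behaviour and `escapePipe` for the fixed ordering; both bodies are reconstructions from the description above, not code copied from the repository.

```typescript
// Old behaviour as described: only the pipe is escaped.
function escapePipeNaive(value: string): string {
  return value.replace(/\|/g, "\\|");
}

// Fixed ordering as described: double existing backslashes first,
// then escape the pipe.
function escapePipe(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/\|/g, "\\|");
}
```

For the input `foo\|bar`, the naive version yields `foo\\|bar` (an escaped backslash followed by a bare pipe), while the fixed version yields `foo\\\|bar` (an escaped backslash followed by an escaped pipe).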

The chained `replace(/\\'/g, "'").replace(/\\/g, "\\\\").replace(/"/g, '\\"')` approach incorrectly doubled valid JS escapes like `\n` and `\t`, turning a JS source `'foo\nbar'` into a literal `\n` in the JSON output instead of a newline character.

Replace with `jsSingleQuotedToJsonInner`, a character-walking pass that:

- drops the JS-only `\'` escape,
- passes JSON-recognized escapes (`\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t`, `\uXXXX`) through unchanged,
- escapes a bare `"` to `\"`,
- doubles any other lone `\` so the literal backslash survives the JSON parser.

Adds a regression test covering `\\`, `\n`, and `\"` inputs.

This refines the alert #131 (js/incomplete-sanitization) fix from a16dcee: same defect class, more accurate fix.
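A character-walking pass following the four rules above could look like this. The function name matches the one in the commit message, but the body is a reconstruction from the description and should be read as a sketch; in particular it makes no attempt to translate JS-only escapes such as `\x41`, which fall under the "double the backslash" rule here.

```typescript
const JSON_SIMPLE_ESCAPES = "\"\\/bfnrt";

// Sketch: converts the inside of a JS single-quoted literal into the inside
// of a JSON string, one character at a time.
function jsSingleQuotedToJsonInner(inner: string): string {
  let out = "";
  for (let i = 0; i < inner.length; i++) {
    const ch = inner.charAt(i);
    if (ch === "\"") { out += "\\\""; continue; } // bare quote -> \"
    if (ch !== "\\") { out += ch; continue; }     // ordinary character
    const next = inner.charAt(i + 1);             // "" at end of input
    if (next === "'") {
      out += "'";                                 // drop JS-only \' escape
      i++;
    } else if (next !== "" && JSON_SIMPLE_ESCAPES.indexOf(next) !== -1) {
      out += "\\" + next;                         // JSON escape, unchanged
      i++;
    } else if (next === "u" && /^[0-9a-fA-F]{4}$/.test(inner.slice(i + 2, i + 6))) {
      out += inner.slice(i, i + 6);               // \uXXXX, unchanged
      i += 5;
    } else {
      out += "\\\\";                              // lone backslash, doubled
    }
  }
  return out;
}
```

Wrapping the result in double quotes and passing it to `JSON.parse` gives back the string the JS literal denoted, so the source text `foo\nbar` parses to a value containing a real newline.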

…section (#87)

## Summary

The OpenCodeHub Starlight docs site was deleted in PR #53 (May 4, commit `4431b53`) under T-M2-3 with the explicit promise to spin it up as `theagenticguy/opencodehub-docs`. That separate repo was never created. The site at https://theagenticguy.github.io/opencodehub/ has been serving the May 1 snapshot ever since: 28-tool / DuckDB-default / Node 20 / 14-language prose, missing every milestone since (M3-M7, Track A-D, parse-runtime flip, 20-scanner inventory, supply-chain hardening).

This PR restores `packages/docs/` + `.github/workflows/pages.yml` from `4431b53^`, refreshes every page against v1 reality, adds a deep agent-friendly `agents/` section, ships a machine-readable tool catalog, hardens the workflow, and lifts `LadybugDB` out of the banned-strings policy now that it's a first-class product name.

Three deep specialists ran in parallel after the bulk-restore, with one polish pass at the end.

## What's in here

### Restoration (`f801f1a`)

56 files restored from history. Build clean out of the box: 47 pages, links valid, Pagefind index, llm-nav banners.

### Content refresh (8 commits, `00a0fce` → `c0376d8`)

- **Start here**: install (Node 22 or 24, mise, `codehub init`), quick-start (first MCP call), what-is-opencodehub, codehub-init, first-query, all v1.
- **MCP**: `mcp/overview.md` reframes 29 tools across five families (exploration, group/federation, scan/findings/verdict, HTTP/routing, meta). `mcp/tools.md` rewritten as full per-tool catalog with when-to-use / when-not-to-use / signature / example. `mcp/resources.md` + `mcp/prompts.md` updated.
- **Reference**: `cli.md` verified against `packages/cli/src/index.ts` shape; `configuration.md` env-var inventory + `AMBIGUOUS_REPO` envelope + `EMBEDDER_MISMATCH` from ADR 0014; `languages.md` 15-language table; `error-codes.md` current set.
- **Architecture**: overview, monorepo-map (17 packages, dropped eval/gym, added cobol-proleap/frameworks/pack/policy/wiki), embeddings (3-backend precedence), parsing-and-resolution (WASM-default + native opt-in), determinism (graphHash invariant), scanners-and-sarif (20-scanner inventory), scip-reconciliation, supply-chain, adrs (0001-0014 index).
- **New architecture pages**: `storage-backend.md` (LadybugDB + DuckDB segregation, IGraphStore/ITemporalStore, community-adapter escape hatch); `cross-repo-federation.md` (repo-as-typed-node, AMBIGUOUS_REPO, group_* tools); `lessons.md` (pointer to `.erpaval/solutions/`).
- **New guides**: `migrating-from-duckdb.md` (three migration paths).
- **Index hero**: splash with three CTAs (Install / Use / Develop) using Starlight `<Card>` / `<CardGrid>`, no marketing tiles.
- **Sidebar IA**: Start here · Agents · MCP · Reference · Guides · Architecture · Skills · Contributing.
- **astro.config llms-txt**: `description` + `details` rewritten with current 29-tool / 15-language / LadybugDB-default reality (per the durable lesson `llms-txt-as-ground-truth.md`).

### Tool catalog as data (`b112b67`)

`packages/docs/public/tool-catalog.json`: machine-readable canonical catalog of all 29 tools. Schema: `{ tools: [{ name, family, description, when_to_use, when_not_to_use, signature_sketch, example }] }`. Agents can `fetch('https://theagenticguy.github.io/opencodehub/tool-catalog.json')`.

### Agents section (4 commits, `4e55203` → `3547b74`)

A new `packages/docs/src/content/docs/agents/` section, 14 pages, dedicated to AI-coding-agent discovery + usage:

- `agents/index.md`: section landing with 90-second setup + 5-editor card grid.
- `agents/why-mcp.md`: what an agent can't see without the graph; three failure modes; four MCP tool families.
- `agents/install.md`: generic install for any MCP-speaking agent: prereqs, `mise run cli:link`, `codehub init` (writes `.mcp.json` + plugin link), `codehub analyze`, `codehub doctor`, per-editor handoff.
- `agents/editors/claude-code.md`: deepest editor page: `.mcp.json` shape, 5 slash commands, `code-analyst` subagent, all 11 skills tabled, `hooks.json`.
- `agents/editors/cursor.md`: `.cursor/mcp.json` (project + global), absolute-path fallback, verification.
- `agents/editors/codex.md`: `~/.codex/config.toml` + CLI helper, stdio-only caveat.
- `agents/editors/windsurf.md`: `~/.codeium/windsurf/mcp_config.json`, restart caveat.
- `agents/editors/opencode.md`: `opencode.json` with the differing key shape (`mcp` vs `mcpServers`, `command: [...]`, `environment` vs `env`).
- `agents/tool-decision-matrix.md`: 21-row single-repo intent → tool table with anti-pattern column, plus 5-row group-mode table and a "When to chain" section.
- `agents/idiomatic-prompts.md`: 5 paste-ready prompts (rename audit / auth-flow surfacing / HTTP contract reconstruction / findings-vs-baseline / onboarding) with target editor + expected tool calls + expected output.
- `agents/discovery-and-resources.md`: site URL, `/llms.txt`, `/llms-full.txt`, `/llms-small.txt`, `/tool-catalog.json`, `AGENTS.md`, `CLAUDE.md`, registries.
- `agents/registries.md`: Official MCP Registry (`server.json` shape), Smithery (`smithery.yaml` shape), Glama, awesome-mcp-servers, aggregator directories.
- `agents/llms-txt-cheatsheet.md`: picking guidance for the three core bundles + custom sets.

### Banned-strings policy (`d8dddb2`)

Removed `ladybug` and `kuzu` from `BANNED_LITERALS` in `scripts/check-banned-strings.sh`. LadybugDB is the default graph backend (M7) and a first-class product name in docs. The original ban dated from when the project was still deciding which graph engine to vendor; that decision shipped. `kuzu` is retained as historical lineage in cross-link prose ("the open-source successor to the pre-1.0 Kuzu codebase") which already lives in ADR 0011.

### Pages workflow hardening (`c54231d`)

- `actions/checkout@v6` → `@de0fac2e...` (v6.0.2)
- `jdx/mise-action@v4` → `@c37c9329...` (v2.4.4)
- `actions/upload-pages-artifact@v5` → `@fc324d35...`
- `actions/deploy-pages@v5` → `@cd2ce8fc...`

Top-level `permissions: contents: read`; write scopes (`pages: write` + `id-token: write`) granted only on the `deploy` job. Resolves the same Token-Permissions HIGH pattern fixed in PR #78 for the other 4 workflows.

### LadybugDB polish (`3c7166b`)

38 prose substitutions across 13 files: replace awkward "the graph-database backend" workarounds with plain "LadybugDB" now that the literal is allowed. `@ladybugdb/core` (npm package) and `graph.lbug` (file extension) preserved.

## Validation

- `mise run check` exit 0: 1,339 tests across 8 packages (lint + typecheck + test + banned-strings + verdict)
- `pnpm -F @opencodehub/docs build`: **64 pages built, all internal links valid**, Pagefind index ok, llm-nav banners patch all 63 .md files
- `actionlint .github/workflows/*.yml`: clean
- `bash scripts/check-banned-strings.sh`: PASS
- `rg 'AC-[A-Z]-[0-9]|T-M[0-9]+-[0-9]+|W-[A-Z]-[0-9]+|S-[A-Z]-[0-9]+|E-[A-Z]-[0-9]+|CL-[A-Z]+|architecture-revised\.md' packages/docs/src/`: zero hits
- Marketing-words sweep (`effortless`, `leverage`, `synergy`, `world-class`, `blazing-fast`, `cutting-edge`): zero hits in docs prose

## Test plan

- [ ] CI green on `docs/site-restore-v1`
- [ ] After merge, the Pages workflow at `.github/workflows/pages.yml` triggers on first push to `main` (paths-filter on `packages/docs/**`)
- [ ] Deployed site at https://theagenticguy.github.io/opencodehub/ replaces the May 1 snapshot
- [ ] Manual verification: visit /agents/, /mcp/tools/, /tool-catalog.json
- [ ] Manual verification: `/llms.txt`, `/llms-full.txt`, `/llms-small.txt` all resolve and contain "29 tools" / "LadybugDB" / "WASM" facts

## Out of scope

- Submission to `skills.sh`, the official MCP Registry, Smithery, awesome-mcp-servers: research file at `.erpaval/sessions/session-05809d/research-skills-sh.md` and `.erpaval/sessions/session-05809d/research-agent-docs.md` capture the exact shape; PR-able as separate follow-ups.
- Importing `.erpaval/solutions/**.md` as a Starlight content collection: investigated, deemed not worth shipping (lessons audience is the agent at edit-time, not docs readers; some lesson titles include literals the docs build's other guardrails reject). The `architecture/lessons.md` stub points readers at the directory.

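The tool-catalog schema quoted in the commit message above lends itself to a typed consumer. The sketch below assumes every field is a plain string, which the commit message does not state, and uses made-up tool names in the sample; `toolsByFamily` is a hypothetical helper, not part of the published catalog.

```typescript
// Field names come from the schema in the commit message; the string types
// are an assumption.
interface ToolEntry {
  name: string;
  family: string;
  description: string;
  when_to_use: string;
  when_not_to_use: string;
  signature_sketch: string;
  example: string;
}

interface ToolCatalog {
  tools: ToolEntry[];
}

// Hypothetical helper: groups tool names by family, keeping catalog order.
function toolsByFamily(catalog: ToolCatalog): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  for (const tool of catalog.tools) {
    const bucket = out[tool.family] ?? (out[tool.family] = []);
    bucket.push(tool.name);
  }
  return out;
}

// Sample data with invented tool names, for illustration only.
const sampleCatalog: ToolCatalog = {
  tools: [
    { name: "example_query", family: "exploration", description: "", when_to_use: "", when_not_to_use: "", signature_sketch: "", example: "" },
    { name: "example_group_query", family: "group/federation", description: "", when_to_use: "", when_not_to_use: "", signature_sketch: "", example: "" },
    { name: "example_context", family: "exploration", description: "", when_to_use: "", when_not_to_use: "", signature_sketch: "", example: "" },
  ],
};
```

An agent that has fetched and parsed `tool-catalog.json` could pass the result to `toolsByFamily` to build a per-family index before deciding which tool to call.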

…section (#87) ## Summary The OpenCodeHub Starlight docs site was deleted in PR #53 (May 4, commit `ecc86a3`) under T-M2-3 with the explicit promise to spin it up as `theagenticguy/opencodehub-docs`. That separate repo was never created. The site at https://theagenticguy.github.io/opencodehub/ has been serving the May 1 snapshot ever since — 28-tool / DuckDB-default / Node 20 / 14-language prose, missing every milestone since (M3-M7, Track A-D, parse-runtime flip, 20-scanner inventory, supply-chain hardening). This PR restores `packages/docs/` + `.github/workflows/pages.yml` from `ecc86a3^`, refreshes every page against v1 reality, adds a deep agent-friendly `agents/` section, ships a machine-readable tool catalog, hardens the workflow, and lifts `LadybugDB` out of the banned-strings policy now that it's a first-class product name. Three deep specialists ran in parallel after the bulk-restore, with one polish pass at the end. ## What's in here ### Restoration (`d393ecf`) 56 files restored from history. Build clean out of the box: 47 pages, links valid, Pagefind index, llm-nav banners. ### Content refresh (8 commits, `3148769` → `1eb333d`) - **Start here** — install (Node 22 or 24, mise, `codehub init`), quick-start (first MCP call), what-is-opencodehub, codehub-init, first-query — all v1. - **MCP** — `mcp/overview.md` reframes 29 tools across five families (exploration, group/federation, scan/findings/verdict, HTTP/routing, meta). `mcp/tools.md` rewritten as full per-tool catalog with when-to-use / when-not-to-use / signature / example. `mcp/resources.md` + `mcp/prompts.md` updated. - **Reference** — `cli.md` verified against `packages/cli/src/index.ts` shape; `configuration.md` env-var inventory + `AMBIGUOUS_REPO` envelope + `EMBEDDER_MISMATCH` from ADR 0014; `languages.md` 15-language table; `error-codes.md` current set. 
- **Architecture** — overview, monorepo-map (17 packages, dropped eval/gym, added cobol-proleap/frameworks/pack/policy/wiki), embeddings (3-backend precedence), parsing-and-resolution (WASM-default + native opt-in), determinism (graphHash invariant), scanners-and-sarif (20-scanner inventory), scip-reconciliation, supply-chain, adrs (0001-0014 index). - **New architecture pages** — `storage-backend.md` (LadybugDB + DuckDB segregation, IGraphStore/ITemporalStore, community-adapter escape hatch); `cross-repo-federation.md` (repo-as-typed-node, AMBIGUOUS_REPO, group_* tools); `lessons.md` (pointer to `.erpaval/solutions/`). - **New guides** — `migrating-from-duckdb.md` (three migration paths). - **Index hero** — splash with three CTAs (Install / Use / Develop) using Starlight `<Card>` / `<CardGrid>` — no marketing tiles. - **Sidebar IA** — Start here · Agents · MCP · Reference · Guides · Architecture · Skills · Contributing. - **astro.config llms-txt** — `description` + `details` rewritten with current 29-tool / 15-language / LadybugDB-default reality (per the durable lesson `llms-txt-as-ground-truth.md`). ### Tool catalog as data (`b3aed17`) `packages/docs/public/tool-catalog.json` — machine-readable canonical catalog of all 29 tools. Schema: `{ tools: [{ name, family, description, when_to_use, when_not_to_use, signature_sketch, example }] }`. Agents can `fetch('https://theagenticguy.github.io/opencodehub/tool-catalog.json')`. ### Agents section (4 commits, `c771f40` → `473cb82`) A new `packages/docs/src/content/docs/agents/` section, 14 pages, dedicated to AI-coding-agent discovery + usage: - `agents/index.md` — section landing with 90-second setup + 5-editor card grid. - `agents/why-mcp.md` — what an agent can't see without the graph; three failure modes; four MCP tool families. 
- `agents/install.md` — generic install for any MCP-speaking agent: prereqs, `mise run cli:link`, `codehub init` (writes `.mcp.json` + plugin link), `codehub analyze`, `codehub doctor`, per-editor handoff.
- `agents/editors/claude-code.md` — deepest editor page: `.mcp.json` shape, 5 slash commands, `code-analyst` subagent, all 11 skills tabled, `hooks.json`.
- `agents/editors/cursor.md` — `.cursor/mcp.json` (project + global), absolute-path fallback, verification.
- `agents/editors/codex.md` — `~/.codex/config.toml` + CLI helper, stdio-only caveat.
- `agents/editors/windsurf.md` — `~/.codeium/windsurf/mcp_config.json`, restart caveat.
- `agents/editors/opencode.md` — `opencode.json` with the differing key shape (`mcp` vs `mcpServers`, `command: [...]`, `environment` vs `env`).
- `agents/tool-decision-matrix.md` — 21-row single-repo intent → tool table with anti-pattern column, plus 5-row group-mode table and a "When to chain" section.
- `agents/idiomatic-prompts.md` — 5 paste-ready prompts (rename audit / auth-flow surfacing / HTTP contract reconstruction / findings-vs-baseline / onboarding) with target editor + expected tool calls + expected output.
- `agents/discovery-and-resources.md` — site URL, `/llms.txt`, `/llms-full.txt`, `/llms-small.txt`, `/tool-catalog.json`, `AGENTS.md`, `CLAUDE.md`, registries.
- `agents/registries.md` — Official MCP Registry (`server.json` shape), Smithery (`smithery.yaml` shape), Glama, awesome-mcp-servers, aggregator directories.
- `agents/llms-txt-cheatsheet.md` — picking guidance for the three core bundles + custom sets.

### Banned-strings policy (`a85a8f4`)

Removed `ladybug` and `kuzu` from `BANNED_LITERALS` in `scripts/check-banned-strings.sh`. LadybugDB is the default graph backend (M7) and a first-class product name in docs. The original ban dated from when the project was still deciding which graph engine to vendor; that decision shipped.
`kuzu` is retained as historical lineage in cross-link prose ("the open-source successor to the pre-1.0 Kuzu codebase") which already lives in ADR 0011.

### Pages workflow hardening (`808d97f`)

- `actions/checkout@v6` → `@de0fac2e...` (v6.0.2)
- `jdx/mise-action@v4` → `@c37c9329...` (v2.4.4)
- `actions/upload-pages-artifact@v5` → `@fc324d35...`
- `actions/deploy-pages@v5` → `@cd2ce8fc...`

Top-level `permissions: contents: read`; write scopes (`pages: write` + `id-token: write`) granted only on the `deploy` job. Resolves the same Token-Permissions HIGH pattern fixed in PR #78 for the other 4 workflows.

### LadybugDB polish (`3d78ab8`)

38 prose substitutions across 13 files: replace awkward "the graph-database backend" workarounds with plain "LadybugDB" now that the literal is allowed. `@ladybugdb/core` (npm package) and `graph.lbug` (file extension) preserved.

## Validation

- `mise run check` exit 0 — 1,339 tests across 8 packages (lint + typecheck + test + banned-strings + verdict)
- `pnpm -F @opencodehub/docs build` — **64 pages built, all internal links valid**, Pagefind index ok, llm-nav banners patch all 63 .md files
- `actionlint .github/workflows/*.yml` — clean
- `bash scripts/check-banned-strings.sh` — PASS
- `rg 'AC-[A-Z]-[0-9]|T-M[0-9]+-[0-9]+|W-[A-Z]-[0-9]+|S-[A-Z]-[0-9]+|E-[A-Z]-[0-9]+|CL-[A-Z]+|architecture-revised\.md' packages/docs/src/` — zero hits
- Marketing-words sweep (`effortless`, `leverage`, `synergy`, `world-class`, `blazing-fast`, `cutting-edge`) — zero hits in docs prose

## Test plan

- [ ] CI green on `docs/site-restore-v1`
- [ ] After merge, the Pages workflow at `.github/workflows/pages.yml` triggers on first push to `main` (paths-filter on `packages/docs/**`)
- [ ] Deployed site at https://theagenticguy.github.io/opencodehub/ replaces the May 1 snapshot
- [ ] Manual verification: visit /agents/, /mcp/tools/, /tool-catalog.json
- [ ] Manual verification: `/llms.txt`, `/llms-full.txt`, `/llms-small.txt` all resolve and contain "29 tools" / "LadybugDB" / "WASM" facts

## Out of scope

- Submission to `skills.sh`, the official MCP Registry, Smithery, awesome-mcp-servers — research file at `.erpaval/sessions/session-05809d/research-skills-sh.md` and `.erpaval/sessions/session-05809d/research-agent-docs.md` capture the exact shape; PR-able as separate follow-ups.
- Importing `.erpaval/solutions/**.md` as a Starlight content collection — investigated, deemed not worth shipping (lessons audience is the agent at edit-time, not docs readers; some lesson titles include literals the docs build's other guardrails reject). The `architecture/lessons.md` stub points readers at the directory.

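The `tool-catalog.json` schema quoted under "Tool catalog as data" lends itself to a small typed loader on the agent side. The following is a minimal sketch, not the project's own code: the field names come from the schema in the description, while the assumption that every field is a string, the helper names `parseCatalog` and `namesByFamily`, and the sample entries are all hypothetical.

```typescript
// Field names follow the schema quoted in the PR description.
// Assumption: every field is a plain string (the real catalog may differ).
interface ToolEntry {
  name: string;
  family: string;
  description: string;
  when_to_use: string;
  when_not_to_use: string;
  signature_sketch: string;
  example: string;
}

interface ToolCatalog {
  tools: ToolEntry[];
}

const REQUIRED_FIELDS = [
  "name",
  "family",
  "description",
  "when_to_use",
  "when_not_to_use",
  "signature_sketch",
  "example",
] as const;

// Hypothetical helper: narrow an unknown JSON value to ToolCatalog,
// throwing a descriptive error on the first shape mismatch.
function parseCatalog(raw: unknown): ToolCatalog {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("catalog must be an object");
  }
  const tools = (raw as { tools?: unknown }).tools;
  if (!Array.isArray(tools)) {
    throw new Error("catalog.tools must be an array");
  }
  tools.forEach((tool: unknown, i: number) => {
    if (typeof tool !== "object" || tool === null) {
      throw new Error(`tools[${i}] must be an object`);
    }
    for (const field of REQUIRED_FIELDS) {
      if (typeof (tool as Record<string, unknown>)[field] !== "string") {
        throw new Error(`tools[${i}].${field} must be a string`);
      }
    }
  });
  return { tools: tools as ToolEntry[] };
}

// Hypothetical helper: group tool names by their `family` value.
function namesByFamily(catalog: ToolCatalog): Map<string, string[]> {
  const grouped = new Map<string, string[]>();
  for (const tool of catalog.tools) {
    const names = grouped.get(tool.family) ?? [];
    names.push(tool.name);
    grouped.set(tool.family, names);
  }
  return grouped;
}

// Made-up entries for illustration only; these are not real catalog rows.
const placeholder = {
  description: "placeholder",
  when_to_use: "placeholder",
  when_not_to_use: "placeholder",
  signature_sketch: "placeholder",
  example: "placeholder",
};
const sample = parseCatalog({
  tools: [
    { name: "example_query", family: "exploration", ...placeholder },
    { name: "example_context", family: "exploration", ...placeholder },
    { name: "example_scan", family: "scan", ...placeholder },
  ],
});

console.log(namesByFamily(sample));
```

In practice an agent would pass the parsed body of the published `/tool-catalog.json` response to `parseCatalog` instead of the inline sample. Validating before use means that a schema change in the catalog surfaces as an explicit error rather than as `undefined` fields later on.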
theagenticguy added 15 commits May 10, 2026 17:16

theagenticguy merged commit 82fba62 into main May 10, 2026
37 checks passed

theagenticguy deleted the chore/security-and-release-hardening branch May 10, 2026 17:42

theagenticguy mentioned this pull request May 10, 2026

docs: restore Starlight site + refresh for v1 + agent-friendly USAGE section #87

Merged

5 tasks

theagenticguy merged 15 commits into main from chore/security-and-release-hardening
