chore: security alerts + supply-chain release hardening #78

Merged
theagenticguy merged 15 commits into main from chore/security-and-release-hardening
May 10, 2026

@theagenticguy
Owner

Summary

Three deep specialists ran in parallel to: (1) close every CodeQL HIGH alert plus low-hanging mediums, (2) resolve every Scorecard HIGH Token-Permissions and MEDIUM Pinned-Dependencies alert, and (3) build a production-grade release pipeline with SLSA L3 provenance, cosign keyless signing, gated pre-release scans, and an operator runbook.

Code-scanning alerts resolved

CodeQL HIGH (15) — packages/

| Class | Count | Pattern |
|---|---|---|
| js/polynomial-redos | 6 | regex tightened or replaced with deterministic startsWith / charCodeAt scans; ReDoS-prone alternations bounded |
| js/incomplete-sanitization | 4 | escape \\ BEFORE adding new backslashes from quote/pipe/SQL-LIKE escapes |
| js/file-system-race | 3 | TOCTOU closed by collapsing stat+read into a single fd handle |
| js/redos (exponential) | 1 | tightened SCIP descriptor regex char class |
| js/incomplete-url-substring-sanitization | 1 | new URL().hostname in test fixture |

CodeQL MEDIUM (low-hanging, 6) — packages/scip-ingest/src/runners/index.ts + property-access.ts

  • 3× js/shell-command-* — explicit shell: false in spawn calls; absolute-path resolution before exec
  • 1× js/indirect-command-line-injection, 1× js/shell-command-injection-from-environment — same fix
  • 2× js/overly-large-range — drop redundant A-Z from [A-Za-z\\w]

Scorecard HIGH — Token-Permissions (4)

  • codeql.yml, semgrep.yml, release-please.yml, sbom.yml (sbom.yml has been retired; SBOM now lives in release.yml with proper job-scoped permissions). Top-level hoisted to contents: read; write scopes granted only on the job that needs them.

Scorecard MEDIUM — Pinned-Dependencies (39)

Every uses: across all 9 workflows pinned to a 40-char SHA with a trailing # vX.Y.Z comment. Dependabot updated to group all github-actions SHA bumps into a single weekly PR. npm i -g node-gyp pinned to node-gyp@12.3.0.

Single documented exception: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.1.0 is intentionally tag-pinned per the SLSA project's trust model (the trusted-builder protocol verifies the tag boundary; SHA-pinning short-circuits SLSA's provenance chain). Documented inline + in RELEASE.md.

Release pipeline (new)

.github/workflows/release.yml (new, 351 lines)

Triggered by release: [published] and via workflow_call from release-please.yml. Job graph:

  1. build — pnpm install + pnpm -r build + CycloneDX SBOM 1.5 + OCH analyze + OCH code-pack on the released SHA
  2. scan — OCH self-scan + SARIF upload to code-scanning under category release
  3. sign — cosign keyless (Sigstore OIDC) signs every artifact (SBOM, code-pack, attestations) and emits .sig.bundle files
  4. provenance — SLSA L3 generator-generic-slsa3 reusable workflow emits slsa.intoto.jsonl
  5. upload — attaches everything to the GitHub release in lockstep
  6. publish (gated, off by default) — vars.OCH_NPM_PUBLISH_ENABLED == 'true' operator switch for future npm publish of @opencodehub/cli and @opencodehub/mcp

.github/workflows/pre-release-gate.yml (new)

Triggers on the release-please PR (head_ref: ^release-please--). Adds release-time scans not in the regular CI (npm-audit at high+, lockfile integrity, detect-secrets full sweep, license re-assert) plus an if: always() aggregator suitable as a required status check before the release PR can merge.

.github/workflows/release-please.yml (refactor)

Reduced to: run release-please-action → on release_created, hand off via workflow_call to release.yml. Sidesteps the durable-lesson finding that default GITHUB_TOKEN does not fire downstream release: [published] events. release.yml also listens on release: published so PAT-driven and manual gh release create flows still trigger the same pipeline.

.github/workflows/sbom.yml (deleted)

Consolidated into release.yml's build job — SBOM, code-pack, and SARIF now share one anchored SHA.

docs/RELEASE.md (new, 271 lines)

Operator runbook: trigger model · asset inventory · cosign + slsa-verifier verification commands · manual hotfix path · environment configuration (cosign keyless requires only OIDC — no secrets to provision).

Validation

  • mise run check exit 0 (lint + typecheck + 1,339 tests across 8 packages + banned-strings + verdict)
  • bash scripts/smoke-mcp.sh → PASS (29 tools)
  • actionlint .github/workflows/*.yml clean
  • All 11 YAML files (workflows/ + dependabot.yml) parse via PyYAML
  • rg 'AC-[A-Z]-[0-9]' packages/ scripts/ empty (zero spec-coordinate leakage)
  • Per-package test counts: analysis 127, cli 236, embedder 79+1skip, frameworks 86, ingestion 607, mcp 167, scip-ingest 58, wiki 15

Test plan

  • CI green on chore/security-and-release-hardening
  • CodeQL re-scan on PR shows 15 HIGH alerts fixed (CodeQL re-runs automatically; existing alerts auto-close on the next push to main)
  • Scorecard re-scan shows Token-Permissions HIGH alerts cleared (next weekly cron)
  • After merge, the next release-please-action PR exercises the new pre-release-gate.yml
  • After the next tag, manually verify cosign + slsa-verifier per docs/RELEASE.md instructions

Out of scope

  • Scorecard BinaryArtifactsID (3) — vendored Tree-sitter .wasm blobs are intentional and reproducibly built (scripts/build-vendor-wasms.sh); no fix.
  • Scorecard MaintainedID / CodeReviewID / VulnerabilitiesID / CIIBestPracticesID / FuzzingID / SASTID — repo-meta signals, not code fixes; release-grade workflow + Scorecard re-run will improve these naturally.
  • CodeQL note severity (2 alerts) — out of scope per request.

- git.ts: replace `^\+\+\+\s+(?:b\/)?(.+)$` regex with non-regex
  startsWith + slice scan so `+++\t\t\t...` lines cannot trigger
  polynomial backtracking.
- http-patterns.ts:normalizeHttpPath: replace `\?.*$` and `\/+$`
  with deterministic indexOf/charCodeAt loops.
- http-patterns.ts:PY_ROUTE_DECORATOR_RE: cap the path and methods
  literals at 256 chars; the unbounded `+` quantifier is what made
  the regex slow on `@A.route("!",methods=[\\...`.

Behaviour preserved: same set of matched paths, same hunk parser
contract. Existing analysis tests (127) still pass.

Fixes alerts #41 #119 #120 from CodeQL.
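
The `startsWith` + slice approach described for the `+++` header can be sketched as below. This is a minimal illustration, not the actual `git.ts` code: the function name `parsePlusHeader` is hypothetical, and it treats only space and tab as whitespace, which is narrower than `\s` in the original regex.

```typescript
// Hypothetical helper, for illustration only (not the real git.ts code).
// Extracts the path from a unified-diff "+++ b/<path>" header without a regex,
// so the work is linear in the line length.
function parsePlusHeader(line: string): string | null {
  if (!line.startsWith("+++")) return null;
  let i = 3;
  // Skip the run of whitespace after "+++" (assumption: space or tab only).
  while (i < line.length) {
    const code = line.charCodeAt(i);
    if (code !== 0x20 && code !== 0x09) break;
    i++;
  }
  // Reject when no whitespace followed "+++" or nothing follows the whitespace.
  if (i === 3 || i === line.length) return null;
  let path = line.slice(i);
  // Drop the optional "b/" prefix, but only when a path remains after it.
  if (path.startsWith("b/") && path.length > 2) path = path.slice(2);
  return path;
}
```

A tab-padded line such as `+++\t\t\t` is rejected after a single forward scan, with no backtracking.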
The `cfg.endpointUrl.replace(/\/+$/, "")` call trimmed trailing
slashes via a regex that runs polynomial-time on inputs with many
`/` characters. Replace with a character-by-character loop using
`charCodeAt` — same result, deterministic worst case.

Fixes alert #121 from CodeQL.
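
A character-by-character trim of this kind could look like the following sketch. The function name `trimTrailingSlashes` is an assumption made for illustration; the commit does not name the helper.

```typescript
// Hypothetical helper name. Removes trailing "/" characters by walking
// backwards from the end, so the cost is linear in the number of slashes.
function trimTrailingSlashes(url: string): string {
  let end = url.length;
  while (end > 0 && url.charCodeAt(end - 1) === 0x2f /* "/" */) {
    end--;
  }
  return url.slice(0, end);
}
```

For inputs without a trailing slash the loop exits immediately and `slice` returns the original string contents.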
The yarn.lock entry regex `^"?([^"\s@][^"\s]*)@[^"\n]*"?:\s*$` had
an inner char class `[^"\s]*` that overlapped with the trailing
`@` delimiter, so an input like `!@@@@@@@@@@` would let the regex
backtrack across every `@` looking for a match. Tighten the inner
class to `[^"\s@]*` so the engine commits to the first `@` it sees.

Behaviour is unchanged for valid yarn.lock entries — the original
regex already forbade `@` in the package-name leading character,
and unscoped names never contain `@` mid-string.

Fixes alert #180 from CodeQL.
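
To make the tightened character class concrete, the sketch below wraps the regex quoted above in a small helper. The names `YARN_ENTRY_RE` and `yarnEntryName` are hypothetical; only the regex itself is taken from the commit message.

```typescript
// Tightened form from the commit message: the inner class excludes "@",
// so the name group stops at the first "@" instead of backtracking over it.
const YARN_ENTRY_RE = /^"?([^"\s@][^"\s@]*)@[^"\n]*"?:\s*$/;

// Hypothetical helper: returns the package name of a yarn.lock entry header,
// or null when the line is not an entry header.
function yarnEntryName(line: string): string | null {
  const match = YARN_ENTRY_RE.exec(line);
  return match?.[1] ?? null;
}
```

An adversarial line such as `!` followed by many `@` characters has only one position where the name group can end, so the match fails quickly.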
Resolves the Scorecard `Pinned-Dependencies` MEDIUM alerts by replacing
every `uses: <action>@<tag>` reference with a SHA-pinned form plus a
trailing comment carrying the original tag for human readability. The
trailing comment is also what Dependabot rewrites on weekly SHA bumps.

Tag-to-SHA mapping (resolved via `gh api /repos/<owner>/<repo>/commits/<tag>`):

  actions/checkout@v6                  -> de0fac2e4500dabe0009e67214ff5f5447ce83dd
  actions/upload-artifact@v7           -> 043fb46d1a93c77aae656e7c1c64a875d1fc6a0a
  jdx/mise-action@v4                   -> 1648a7812b9aeae629881980618f079932869151
  github/codeql-action/* @v4           -> 68bde559dea0fdcac2102bfdf6230c5f70eb485e
  ossf/scorecard-action@v2.4.3         -> 4eaacf0543bb3f2c246792bd56e8cdeffafb205a

Files touched: ci.yml, codeql.yml, commitlint.yml, och-self-scan.yml,
osv.yml, scorecard.yml, semgrep.yml. release-please.yml is being
rewritten in parallel by the release-hardening track and already
carries SHA pins as part of that rewrite.
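
The pinned form (`<action>@<40-char SHA> # <tag>`) lends itself to a mechanical check. The sketch below is a hypothetical lint helper, not part of this PR; the name `findUnpinnedUses` and the line-based parsing are assumptions, and a real check would more likely parse the YAML.

```typescript
// Matches "<owner>/<repo>[/path]@<40 lowercase hex chars> # v<version>".
const PINNED_REF = /^[^@\s]+@[0-9a-f]{40}\s+#\s*v\S+$/;

// Hypothetical lint helper: returns every `uses:` reference in a workflow
// file that is not SHA-pinned with a trailing version comment.
function findUnpinnedUses(workflowText: string): string[] {
  const offenders: string[] = [];
  for (const rawLine of workflowText.split("\n")) {
    let line = rawLine.trim();
    if (line.startsWith("- ")) line = line.slice(2).trim();
    if (!line.startsWith("uses:")) continue;
    const ref = line.slice("uses:".length).trim();
    if (ref.startsWith("./")) continue; // local actions carry no remote ref
    if (!PINNED_REF.test(ref)) offenders.push(ref);
  }
  return offenders;
}
```

A deliberately tag-pinned reference would also be reported by this sketch, so an allowlist would be needed for any documented exception.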
Resolves Scorecard `Token-Permissions` HIGH alerts by demoting the
top-level workflow scope to `contents: read` and lifting the
write-scopes onto the single job that needs them. CodeQL's analyze job
keeps `security-events: write` for the SARIF upload; semgrep's job
keeps the same plus `contents: read`. Same effective permissions, but
any unrelated step in either workflow now runs read-only.

Files: codeql.yml, semgrep.yml.

Out of scope here:
- sbom.yml — file removed in the parallel release-hardening track
  (SBOM generation moved into the new release.yml).
- release-please.yml — rewritten in the parallel release-hardening
  track with the same hoist already applied.
…ign signing

Single tag-triggered workflow that anchors every job to the released
commit SHA. Listens on `release: published`, `workflow_call`, and
`workflow_dispatch` so it works with default GITHUB_TOKEN
(via inline workflow_call from release-please.yml), with a PAT-driven
release-please publish, and as a manual hotfix path.

Each release ships:

- opencodehub-pack.tar.gz (deterministic 100k-token code-pack BOM)
- SBOM.cdx.json (CycloneDX 1.5)
- och-scan.sarif (OCH self-scan at the released SHA)
- *.sig.bundle (cosign keyless Sigstore bundles for each blob)

Top-level permissions are read-only; per-job grants escalate where
strictly required (id-token: write for OIDC -> Fulcio + SLSA, contents:
write for release uploads, security-events: write for SARIF upload).

npm-publish job is gated by OCH_NPM_PUBLISH_ENABLED repo variable so
the dry-run scaffolding stays inert until packages flip to public.

All third-party actions pinned to commit SHAs with version comments;
the SLSA generator reusable workflow is the single tag-pinned
exception (the SLSA project's trust model relies on the tag).
…tion

Adds the release-time-only checks that don't belong in everyday CI:

- npm-audit at high+ severity
- pnpm lockfile integrity (--frozen-lockfile --ignore-scripts)
- detect-secrets full sweep against .secrets.baseline
- license allowlist re-assertion

Each job is gated `if: startsWith(github.head_ref, 'release-please--')`
so non-release PRs are no-ops. The aggregator job (`pre-release-gate`)
runs `if: always()` and treats skipped dependencies as pass — so the
required-status-check name resolves uniformly on every PR while
actually gating only release-please PRs.

Configure branch protection on main to require the
`Pre-release gate (aggregate)` job. Documented in docs/RELEASE.md.
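
The aggregator rule ("skipped counts as pass") can be modelled as a small predicate. This is a sketch of the decision logic only, written in TypeScript for illustration; the workflow itself expresses it in YAML, and treating `cancelled` as a failure is an assumption not stated above.

```typescript
// Result values GitHub Actions reports for a dependency in needs.<job>.result.
type JobResult = "success" | "failure" | "cancelled" | "skipped";

// Hypothetical model of the aggregate gate: pass when every dependency either
// succeeded or was skipped (the non-release-PR case). Assumption: a cancelled
// dependency fails the gate.
function gatePasses(results: readonly JobResult[]): boolean {
  return results.every((r) => r === "success" || r === "skipped");
}
```

On a non-release PR every dependency is skipped, so the gate resolves to pass; on a release-please PR a single failing scan blocks the merge.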
Re-apply the analysis-package changes from 050acd7 that were lost
when c47286d (the parallel ci-pinning track) committed an old tree
snapshot.

- git.ts: replace the `+++` header regex with non-regex
  startsWith + slice scan so polynomial backtracking on tab-padded
  diff headers is impossible.
- http-patterns.ts:normalizeHttpPath: replace `\?.*$` and `\/+$`
  with deterministic indexOf/charCodeAt loops.
- http-patterns.ts:PY_ROUTE_DECORATOR_RE: cap path and methods
  literals at 256 chars to bound regex work.

Behaviour preserved; existing analysis tests (127) still pass.

Fixes alerts #41 #119 #120 from CodeQL.
Two changes wired together:

1. release-please.yml hands off to release.yml via uses / workflow_call
   after `release_created` is true, instead of inlining the artifact
   pipeline. This sidesteps the GITHUB_TOKEN downstream-event
   suppression rule (default token does NOT fire downstream
   `release: published` events). The inline call works regardless of
   token type.

2. sbom.yml retired. SBOM generation now lives in release.yml's `build`
   job alongside the code-pack, so SBOM + code-pack + scan output
   share a single anchored SHA and are co-signed in lockstep.
   Eliminates the drift class where SBOM and code-pack could reference
   different commits.

The split surface is now:

  push:main          -> release-please.yml   (open/update PR, cut tag)
  pull_request       -> pre-release-gate.yml (block merge if scans fail)
  workflow_call      -> release.yml          (inline post-tag pipeline)
  release:published  -> release.yml          (PAT-driven flow + manual)
  workflow_dispatch  -> release.yml          (operator hotfix path)
Documents the trigger model (push -> release-please-action -> PR ->
gate -> merge -> tag -> release.yml builds + signs), the artifacts
that ship with each release, downstream-consumer cosign + SLSA
verification commands, the manual hotfix override path, and the
environment configuration the pipeline expects (no long-lived
secrets except GITHUB_TOKEN; cosign keyless uses OIDC; SLSA generator
uses the same).

Calls out two operator-facing decisions:

- Optional `RELEASE_PLEASE_PAT` if you prefer one-workflow-per-concern
  over the workflow_call inline path.
- Optional `production-release` environment for a manual approval gate
  before any artifact is built / signed / attached.

Includes the verification recipe for slsa-verifier and cosign with
worked examples of `--certificate-identity` for both the direct
release.yml entry point and the release-please.yml workflow_call
entry point.
- pipeline/phases/scan.ts: replace path-based `fs.stat` + `fs.readFile`
  with a single `fs.open` handle, then `handle.stat()` and
  `handle.readFile()`. Operations now share one file descriptor —
  closes the TOCTOU window flagged by js/file-system-race.
- extract/tool-detector.ts:relaxedToJson: insert a `\\` -> `\\\\`
  escape pass before escaping `"` so JS literals containing a lone
  backslash (e.g. `'foo\"bar'`) no longer produce malformed JSON.
- extract/property-access.ts: drop the redundant `A-Za-z` ranges
  inside `[A-Za-z_$\w]` lookbehinds — `\w` already covers them and
  the overlap was tripping js/overly-large-range. Use `[\w$]` instead.
- pipeline/phases/markdown.test.ts: replace
  `.includes("example.com")` with a strict
  `new URL(...).hostname === "example.com"` check so a crafted
  `example.com.evil.test` host could not slip past the assertion
  (js/incomplete-url-substring-sanitization).

Existing 607 ingestion tests still pass.

Fixes alerts #38 #39 #40 #44 #131 from CodeQL.
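
The single-handle pattern from the `scan.ts` change can be sketched as follows. The function name `readIfSmall` and the size cap are hypothetical and only illustrate why sharing one file descriptor removes the race.

```typescript
import { open } from "node:fs/promises";

// Hypothetical helper: stat and read through the same file handle, so the
// file that was checked is guaranteed to be the file that is read.
async function readIfSmall(path: string, maxBytes: number): Promise<string | null> {
  const handle = await open(path, "r");
  try {
    const info = await handle.stat();
    if (!info.isFile() || info.size > maxBytes) return null;
    return await handle.readFile({ encoding: "utf8" });
  } finally {
    // Always release the descriptor, including on the early-return path.
    await handle.close();
  }
}
```

With the path-based `fs.stat` + `fs.readFile` pair, the path could be swapped between the two calls; here both operations refer to the descriptor opened once.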
- doctor.ts:registryPathCheck — drop the `access` probe and branch on
  `ENOENT` from the `readFile` itself, so the missing-file warn path
  and the read share one syscall.
- setup.test.ts — replace the `stat` then `readFile` pair with a
  single `readFile`; existence is inferred from a non-empty body.

Both paths previously opened a TOCTOU window between the existence
check and the read (js/file-system-race). Existing 236 cli tests
still pass.

Fixes alerts #42 #43 from CodeQL.
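
Branching on `ENOENT` from the read itself might look like the sketch below. The function name `readRegistry` and the result shape are assumptions made for illustration, not the actual `doctor.ts` code.

```typescript
import { readFile } from "node:fs/promises";

type RegistryRead = { status: "ok"; body: string } | { status: "missing" };

// Hypothetical helper: no separate existence probe. A missing file surfaces
// as ENOENT from readFile and is mapped to the "missing" (warn) path.
async function readRegistry(path: string): Promise<RegistryRead> {
  try {
    return { status: "ok", body: await readFile(path, "utf8") };
  } catch (err) {
    if ((err as { code?: string }).code === "ENOENT") {
      return { status: "missing" };
    }
    throw err; // permission errors and the like are not swallowed
  }
}
```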
…mitters

- resources/repos.ts:yaml — escape `\` -> `\\` before escaping `"`,
  so a literal backslash in a registry value cannot pair with the
  appended `\"` to produce a malformed YAML escape.
- tools/sql.ts:formatCell — escape `\` -> `\\` before escaping `|`,
  so a pre-existing backslash in a SQL cell value cannot combine
  with the appended `\|` to break the markdown table escape (e.g.
  `foo\|bar` rendering as `foo\` + literal pipe).

Both paths previously triggered js/incomplete-sanitization. Existing
167 mcp tests still pass.

Fixes alerts #36 #37 from CodeQL.
`escapePipe` previously only escaped `|` for markdown table cells,
which meant a value like `foo\|bar` (literal backslash followed by
pipe) became `foo\\|bar` — a `\\` escape (rendered as `\`) followed
by an unescaped pipe, breaking the table layout. Escape `\` -> `\\`
first, then `|` -> `\|`, so pre-existing backslashes survive intact
as literal `\` and the pipe stays escaped.

Fixes alert #176 from CodeQL.
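
The ordering matters because the second replacement introduces new backslashes. A minimal sketch of the corrected two-pass escape follows; it reuses the name `escapePipe` from the commit message, but the body is an illustration rather than the project's code.

```typescript
// Escape backslashes first, then pipes. Reversing the order would re-escape
// the backslashes that the pipe escape has just added.
function escapePipe(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/\|/g, "\\|");
}
```

For a value consisting of `foo`, one backslash, a pipe, and `bar`, the output contains three backslashes before the pipe: two for the literal backslash and one escaping the pipe.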
The chained `replace(/\\'/g, "'").replace(/\\/g, "\\\\").replace(/"/g, '\\"')`
approach incorrectly doubled valid JS escapes like `\n` and `\t`,
so a JS source `'foo\nbar'` came out of the JSON parser as a literal
backslash followed by `n` instead of a newline character.

Replace with `jsSingleQuotedToJsonInner`, a character-walking pass
that:
  - drops the JS-only `\'` escape,
  - passes JSON-recognized escapes (`\"`, `\\`, `\/`, `\b`, `\f`,
    `\n`, `\r`, `\t`, `\uXXXX`) through unchanged,
  - escapes a bare `"` to `\"`,
  - doubles any other lone `\` so the literal backslash survives the
    JSON parser.

Adds a regression test covering `\\`, `\n`, and `\"` inputs.
This refines the alert #131 (js/incomplete-sanitization) fix from
a16dcee — same defect class, more accurate fix.
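
The four rules listed above can be sketched as a single forward pass. The function name matches the commit message, but the body below is a reconstruction from that description and should be read as an assumption about the approach, not the shipped implementation.

```typescript
// Single-character escapes that JSON itself understands.
const JSON_ESCAPES = new Set(['"', "\\", "/", "b", "f", "n", "r", "t"]);
const HEX4 = /^[0-9a-fA-F]{4}$/;

// Sketch: converts the inner text of a JS single-quoted literal into the
// inner text of a JSON double-quoted string.
function jsSingleQuotedToJsonInner(inner: string): string {
  let out = "";
  for (let i = 0; i < inner.length; i++) {
    const ch = inner.charAt(i);
    if (ch === '"') { out += '\\"'; continue; }   // bare quote -> \"
    if (ch !== "\\") { out += ch; continue; }     // ordinary character
    const next = inner.charAt(i + 1);             // "" at end of input
    if (next === "'") {
      out += "'"; i++;                            // drop the JS-only \' escape
    } else if (JSON_ESCAPES.has(next)) {
      out += "\\" + next; i++;                    // JSON escape passes through
    } else if (next === "u" && HEX4.test(inner.slice(i + 2, i + 6))) {
      out += inner.slice(i, i + 6); i += 5;       // \uXXXX passes through
    } else {
      out += "\\\\";                              // lone backslash is doubled
    }
  }
  return out;
}
```

Wrapping the result in double quotes and passing it to `JSON.parse` is a quick way to check that an input escape such as `\n` still decodes to a newline.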
theagenticguy merged commit 82fba62 into main on May 10, 2026
37 checks passed
theagenticguy deleted the chore/security-and-release-hardening branch on May 10, 2026, 17:42
theagenticguy added a commit that referenced this pull request May 10, 2026
…section (#87)

## Summary

The OpenCodeHub Starlight docs site was deleted in PR #53 (May 4, commit
`4431b53`) under T-M2-3 with the explicit promise to spin it up as
`theagenticguy/opencodehub-docs`. That separate repo was never created.
The site at https://theagenticguy.github.io/opencodehub/ has been
serving the May 1 snapshot ever since — 28-tool / DuckDB-default / Node
20 / 14-language prose, missing every milestone since (M3-M7, Track A-D,
parse-runtime flip, 20-scanner inventory, supply-chain hardening).

This PR restores `packages/docs/` + `.github/workflows/pages.yml` from
`4431b53^`, refreshes every page against v1 reality, adds a deep
agent-friendly `agents/` section, ships a machine-readable tool catalog,
hardens the workflow, and lifts `LadybugDB` out of the banned-strings
policy now that it's a first-class product name.

Three deep specialists ran in parallel after the bulk-restore, with one
polish pass at the end.

## What's in here

### Restoration (`f801f1a`)
56 files restored from history. Build clean out of the box: 47 pages,
links valid, Pagefind index, llm-nav banners.

### Content refresh (8 commits, `00a0fce` → `c0376d8`)
- **Start here** — install (Node 22 or 24, mise, `codehub init`),
quick-start (first MCP call), what-is-opencodehub, codehub-init,
first-query — all v1.
- **MCP** — `mcp/overview.md` reframes 29 tools across five families
(exploration, group/federation, scan/findings/verdict, HTTP/routing,
meta). `mcp/tools.md` rewritten as full per-tool catalog with
when-to-use / when-not-to-use / signature / example. `mcp/resources.md`
+ `mcp/prompts.md` updated.
- **Reference** — `cli.md` verified against `packages/cli/src/index.ts`
shape; `configuration.md` env-var inventory + `AMBIGUOUS_REPO` envelope
+ `EMBEDDER_MISMATCH` from ADR 0014; `languages.md` 15-language table;
`error-codes.md` current set.
- **Architecture** — overview, monorepo-map (17 packages, dropped
eval/gym, added cobol-proleap/frameworks/pack/policy/wiki), embeddings
(3-backend precedence), parsing-and-resolution (WASM-default + native
opt-in), determinism (graphHash invariant), scanners-and-sarif
(20-scanner inventory), scip-reconciliation, supply-chain, adrs
(0001-0014 index).
- **New architecture pages** — `storage-backend.md` (LadybugDB + DuckDB
segregation, IGraphStore/ITemporalStore, community-adapter escape
hatch); `cross-repo-federation.md` (repo-as-typed-node, AMBIGUOUS_REPO,
group_* tools); `lessons.md` (pointer to `.erpaval/solutions/`).
- **New guides** — `migrating-from-duckdb.md` (three migration paths).
- **Index hero** — splash with three CTAs (Install / Use / Develop)
using Starlight `<Card>` / `<CardGrid>` — no marketing tiles.
- **Sidebar IA** — Start here · Agents · MCP · Reference · Guides ·
Architecture · Skills · Contributing.
- **astro.config llms-txt** — `description` + `details` rewritten with
current 29-tool / 15-language / LadybugDB-default reality (per the
durable lesson `llms-txt-as-ground-truth.md`).

### Tool catalog as data (`b112b67`)
`packages/docs/public/tool-catalog.json` — machine-readable canonical
catalog of all 29 tools. Schema: `{ tools: [{ name, family, description,
when_to_use, when_not_to_use, signature_sketch, example }] }`. Agents can
`fetch('https://theagenticguy.github.io/opencodehub/tool-catalog.json')`.
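
An agent consuming the catalog might validate the payload before trusting it. The sketch below derives its types from the schema line above; treating every field as a string is an assumption, since the field types are not stated, and `isToolCatalog` is a hypothetical name.

```typescript
// Field names come from the schema above; string types are assumed.
const TOOL_FIELDS = [
  "name", "family", "description", "when_to_use",
  "when_not_to_use", "signature_sketch", "example",
] as const;

type ToolEntry = Record<(typeof TOOL_FIELDS)[number], string>;
interface ToolCatalog { tools: ToolEntry[] }

// Hypothetical type guard: checks the top-level shape and that each entry
// carries all seven fields as strings.
function isToolCatalog(value: unknown): value is ToolCatalog {
  if (typeof value !== "object" || value === null) return false;
  const tools = (value as { tools?: unknown }).tools;
  if (!Array.isArray(tools)) return false;
  return tools.every(
    (tool) =>
      typeof tool === "object" &&
      tool !== null &&
      TOOL_FIELDS.every((f) => typeof (tool as Record<string, unknown>)[f] === "string"),
  );
}
```

After a `fetch` of `/tool-catalog.json`, the parsed body can be passed to `isToolCatalog` and rejected early if the shape has drifted.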

### Agents section (4 commits, `4e55203` → `3547b74`)
A new `packages/docs/src/content/docs/agents/` section, 14 pages,
dedicated to AI-coding-agent discovery + usage:
- `agents/index.md` — section landing with 90-second setup + 5-editor
card grid.
- `agents/why-mcp.md` — what an agent can't see without the graph; three
failure modes; four MCP tool families.
- `agents/install.md` — generic install for any MCP-speaking agent:
prereqs, `mise run cli:link`, `codehub init` (writes `.mcp.json` +
plugin link), `codehub analyze`, `codehub doctor`, per-editor handoff.
- `agents/editors/claude-code.md` — deepest editor page: `.mcp.json`
shape, 5 slash commands, `code-analyst` subagent, all 11 skills tabled,
`hooks.json`.
- `agents/editors/cursor.md` — `.cursor/mcp.json` (project + global),
absolute-path fallback, verification.
- `agents/editors/codex.md` — `~/.codex/config.toml` + CLI helper,
stdio-only caveat.
- `agents/editors/windsurf.md` — `~/.codeium/windsurf/mcp_config.json`,
restart caveat.
- `agents/editors/opencode.md` — `opencode.json` with the differing key
shape (`mcp` vs `mcpServers`, `command: [...]`, `environment` vs `env`).
- `agents/tool-decision-matrix.md` — 21-row single-repo intent → tool
table with anti-pattern column, plus 5-row group-mode table and a "When
to chain" section.
- `agents/idiomatic-prompts.md` — 5 paste-ready prompts (rename audit /
auth-flow surfacing / HTTP contract reconstruction /
findings-vs-baseline / onboarding) with target editor + expected tool
calls + expected output.
- `agents/discovery-and-resources.md` — site URL, `/llms.txt`,
`/llms-full.txt`, `/llms-small.txt`, `/tool-catalog.json`, `AGENTS.md`,
`CLAUDE.md`, registries.
- `agents/registries.md` — Official MCP Registry (`server.json` shape),
Smithery (`smithery.yaml` shape), Glama, awesome-mcp-servers, aggregator
directories.
- `agents/llms-txt-cheatsheet.md` — picking guidance for the three core
bundles + custom sets.

### Banned-strings policy (`d8dddb2`)
Removed `ladybug` and `kuzu` from `BANNED_LITERALS` in
`scripts/check-banned-strings.sh`. LadybugDB is the default graph
backend (M7) and a first-class product name in docs. The original ban
dated from when the project was still deciding which graph engine to
vendor; that decision shipped. `kuzu` is retained as historical lineage
in cross-link prose ("the open-source successor to the pre-1.0 Kuzu
codebase") which already lives in ADR 0011.

### Pages workflow hardening (`c54231d`)
- `actions/checkout@v6` → `@de0fac2e...` (v6.0.2)
- `jdx/mise-action@v4` → `@c37c9329...` (v2.4.4)
- `actions/upload-pages-artifact@v5` → `@fc324d35...`
- `actions/deploy-pages@v5` → `@cd2ce8fc...`

Top-level `permissions: contents: read`; write scopes (`pages: write` +
`id-token: write`) granted only on the `deploy` job. Resolves the same
Token-Permissions HIGH pattern fixed in PR #78 for the other 4
workflows.

### LadybugDB polish (`3c7166b`)
38 prose substitutions across 13 files: replace awkward "the
graph-database backend" workarounds with plain "LadybugDB" now that the
literal is allowed. `@ladybugdb/core` (npm package) and `graph.lbug`
(file extension) preserved.

## Validation

- `mise run check` exit 0 — 1,339 tests across 8 packages (lint +
typecheck + test + banned-strings + verdict)
- `pnpm -F @opencodehub/docs build` — **64 pages built, all internal
links valid**, Pagefind index ok, llm-nav banners patch all 63 .md files
- `actionlint .github/workflows/*.yml` — clean
- `bash scripts/check-banned-strings.sh` — PASS
- `rg 'AC-[A-Z]-[0-9]|T-M[0-9]+-[0-9]+|W-[A-Z]-[0-9]+|S-[A-Z]-[0-9]+|E-[A-Z]-[0-9]+|CL-[A-Z]+|architecture-revised\.md' packages/docs/src/` — zero hits
- Marketing-words sweep (`effortless`, `leverage`, `synergy`,
`world-class`, `blazing-fast`, `cutting-edge`) — zero hits in docs prose

## Test plan

- [ ] CI green on `docs/site-restore-v1`
- [ ] After merge, the Pages workflow at `.github/workflows/pages.yml`
triggers on first push to `main` (paths-filter on `packages/docs/**`)
- [ ] Deployed site at https://theagenticguy.github.io/opencodehub/
replaces the May 1 snapshot
- [ ] Manual verification: visit /agents/, /mcp/tools/,
/tool-catalog.json
- [ ] Manual verification: `/llms.txt`, `/llms-full.txt`,
`/llms-small.txt` all resolve and contain "29 tools" / "LadybugDB" /
"WASM" facts

## Out of scope

- Submission to `skills.sh`, the official MCP Registry, Smithery,
awesome-mcp-servers — research file at
`.erpaval/sessions/session-05809d/research-skills-sh.md` and
`.erpaval/sessions/session-05809d/research-agent-docs.md` capture the
exact shape; PR-able as separate follow-ups.
- Importing `.erpaval/solutions/**.md` as a Starlight content collection
— investigated, deemed not worth shipping (lessons audience is the agent
at edit-time, not docs readers; some lesson titles include literals the
docs build's other guardrails reject). The `architecture/lessons.md`
stub points readers at the directory.
theagenticguy added a commit that referenced this pull request May 10, 2026
theagenticguy added a commit that referenced this pull request May 10, 2026
…section (#87)

## Summary

The OpenCodeHub Starlight docs site was deleted in PR #53 (May 4, commit
`ecc86a3`) under T-M2-3 with the explicit promise to spin it up as
`theagenticguy/opencodehub-docs`. That separate repo was never created.
The site at https://theagenticguy.github.io/opencodehub/ has been
serving the May 1 snapshot ever since — 28-tool / DuckDB-default / Node
20 / 14-language prose, missing every milestone since (M3-M7, Track A-D,
parse-runtime flip, 20-scanner inventory, supply-chain hardening).

This PR restores `packages/docs/` + `.github/workflows/pages.yml` from
`ecc86a3^`, refreshes every page against v1 reality, adds a deep
agent-friendly `agents/` section, ships a machine-readable tool catalog,
hardens the workflow, and lifts `LadybugDB` out of the banned-strings
policy now that it's a first-class product name.

Three deep specialists ran in parallel after the bulk-restore, with one
polish pass at the end.

## What's in here

### Restoration (`d393ecf`)
56 files restored from history. Build clean out of the box: 47 pages,
links valid, Pagefind index, llm-nav banners.

### Content refresh (8 commits, `3148769` → `1eb333d`)
- **Start here** — install (Node 22 or 24, mise, `codehub init`),
quick-start (first MCP call), what-is-opencodehub, codehub-init,
first-query — all v1.
- **MCP** — `mcp/overview.md` reframes 29 tools across five families
(exploration, group/federation, scan/findings/verdict, HTTP/routing,
meta). `mcp/tools.md` rewritten as full per-tool catalog with
when-to-use / when-not-to-use / signature / example. `mcp/resources.md`
+ `mcp/prompts.md` updated.
- **Reference** — `cli.md` verified against `packages/cli/src/index.ts`
shape; `configuration.md` env-var inventory + `AMBIGUOUS_REPO` envelope
+ `EMBEDDER_MISMATCH` from ADR 0014; `languages.md` 15-language table;
`error-codes.md` current set.
- **Architecture** — overview, monorepo-map (17 packages, dropped
eval/gym, added cobol-proleap/frameworks/pack/policy/wiki), embeddings
(3-backend precedence), parsing-and-resolution (WASM-default + native
opt-in), determinism (graphHash invariant), scanners-and-sarif
(20-scanner inventory), scip-reconciliation, supply-chain, adrs
(0001-0014 index).
- **New architecture pages** — `storage-backend.md` (LadybugDB + DuckDB
segregation, IGraphStore/ITemporalStore, community-adapter escape
hatch); `cross-repo-federation.md` (repo-as-typed-node, AMBIGUOUS_REPO,
group_* tools); `lessons.md` (pointer to `.erpaval/solutions/`).
- **New guides** — `migrating-from-duckdb.md` (three migration paths).
- **Index hero** — splash with three CTAs (Install / Use / Develop)
using Starlight `<Card>` / `<CardGrid>` — no marketing tiles.
- **Sidebar IA** — Start here · Agents · MCP · Reference · Guides ·
Architecture · Skills · Contributing.
- **astro.config llms-txt** — `description` + `details` rewritten with
current 29-tool / 15-language / LadybugDB-default reality (per the
durable lesson `llms-txt-as-ground-truth.md`).

### Tool catalog as data (`b3aed17`)
`packages/docs/public/tool-catalog.json` — machine-readable canonical
catalog of all 29 tools. Schema: `{ tools: [{ name, family, description,
when_to_use, when_not_to_use, signature_sketch, example }] }`. Agents can
`fetch('https://theagenticguy.github.io/opencodehub/tool-catalog.json')`.
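
A minimal sketch of how an agent might consume the catalog, based only on the schema line above. The interface names, `loadCatalog`, and `toolNamesByFamily` are illustrative and not part of the repo, and every field is assumed to be a string, which the schema sketch does not actually state.

```typescript
// Field names come from the schema line in this PR; the string types are an assumption.
interface ToolEntry {
  name: string;
  family: string;
  description: string;
  when_to_use: string;
  when_not_to_use: string;
  signature_sketch: string;
  example: string;
}

interface ToolCatalog {
  tools: ToolEntry[];
}

const CATALOG_URL =
  "https://theagenticguy.github.io/opencodehub/tool-catalog.json";

// Fetch and parse the catalog. The cast is unchecked: no runtime validation happens here.
async function loadCatalog(url: string = CATALOG_URL): Promise<ToolCatalog> {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`catalog fetch failed: HTTP ${res.status}`);
  }
  return (await res.json()) as ToolCatalog;
}

// Group tool names by family so a caller can inspect one family at a time.
function toolNamesByFamily(catalog: ToolCatalog): Map<string, string[]> {
  const byFamily = new Map<string, string[]>();
  for (const tool of catalog.tools) {
    const names = byFamily.get(tool.family) ?? [];
    names.push(tool.name);
    byFamily.set(tool.family, names);
  }
  return byFamily;
}
```

A caller could chain the two, for example `loadCatalog().then(toolNamesByFamily)`, and then read a single family from the resulting map.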

### Agents section (4 commits, `c771f40` → `473cb82`)
A new `packages/docs/src/content/docs/agents/` section, 14 pages,
dedicated to AI-coding-agent discovery + usage:
- `agents/index.md` — section landing with 90-second setup + 5-editor
card grid.
- `agents/why-mcp.md` — what an agent can't see without the graph; three
failure modes; four MCP tool families.
- `agents/install.md` — generic install for any MCP-speaking agent:
prereqs, `mise run cli:link`, `codehub init` (writes `.mcp.json` +
plugin link), `codehub analyze`, `codehub doctor`, per-editor handoff.
- `agents/editors/claude-code.md` — deepest editor page: `.mcp.json`
shape, 5 slash commands, `code-analyst` subagent, all 11 skills tabled,
`hooks.json`.
- `agents/editors/cursor.md` — `.cursor/mcp.json` (project + global),
absolute-path fallback, verification.
- `agents/editors/codex.md` — `~/.codex/config.toml` + CLI helper,
stdio-only caveat.
- `agents/editors/windsurf.md` — `~/.codeium/windsurf/mcp_config.json`,
restart caveat.
- `agents/editors/opencode.md` — `opencode.json` with the differing key
shape (`mcp` vs `mcpServers`, `command: [...]`, `environment` vs `env`).
- `agents/tool-decision-matrix.md` — 21-row single-repo intent → tool
table with anti-pattern column, plus 5-row group-mode table and a "When
to chain" section.
- `agents/idiomatic-prompts.md` — 5 paste-ready prompts (rename audit /
auth-flow surfacing / HTTP contract reconstruction /
findings-vs-baseline / onboarding) with target editor + expected tool
calls + expected output.
- `agents/discovery-and-resources.md` — site URL, `/llms.txt`,
`/llms-full.txt`, `/llms-small.txt`, `/tool-catalog.json`, `AGENTS.md`,
`CLAUDE.md`, registries.
- `agents/registries.md` — Official MCP Registry (`server.json` shape),
Smithery (`smithery.yaml` shape), Glama, awesome-mcp-servers, aggregator
directories.
- `agents/llms-txt-cheatsheet.md` — picking guidance for the three core
bundles + custom sets.
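
The key-shape difference called out for `agents/editors/opencode.md` can be sketched as a small converter. This is an illustration under assumptions, not code from the repo: the `mcpServers` entry is assumed to use the common `command` / `args` / `env` fields, only the three differences named above are translated, and `opencode.json` may require fields this sketch does not emit. The function and interface names are hypothetical.

```typescript
// Assumed entry shape under "mcpServers" (command string, optional args and env).
interface McpServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

// Entry shape under "mcp" in opencode.json, limited to the differences this PR names.
interface OpencodeMcpEntry {
  command: string[];
  environment?: Record<string, string>;
}

// Translate { mcpServers: ... } into { mcp: ... }:
//   - top-level key "mcpServers" becomes "mcp"
//   - command + args collapse into a single command array
//   - "env" becomes "environment"
function toOpencodeConfig(config: {
  mcpServers: Record<string, McpServerEntry>;
}): { mcp: Record<string, OpencodeMcpEntry> } {
  const mcp: Record<string, OpencodeMcpEntry> = {};
  for (const [name, server] of Object.entries(config.mcpServers)) {
    const entry: OpencodeMcpEntry = {
      command: [server.command, ...(server.args ?? [])],
    };
    if (server.env !== undefined) {
      entry.environment = { ...server.env };
    }
    mcp[name] = entry;
  }
  return { mcp };
}
```

The per-editor page remains the reference for the real file contents; the sketch only makes the three renames concrete.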

### Banned-strings policy (`a85a8f4`)
Removed `ladybug` and `kuzu` from `BANNED_LITERALS` in
`scripts/check-banned-strings.sh`. LadybugDB is the default graph
backend (M7) and a first-class product name in docs. The original ban
dated from when the project was still deciding which graph engine to
vendor; that decision shipped. `kuzu` comes off the list as well because
the name still appears as historical lineage in cross-link prose ("the
open-source successor to the pre-1.0 Kuzu codebase"), which already
lives in ADR 0011.

### Pages workflow hardening (`808d97f`)
- `actions/checkout@v6` → `@de0fac2e...` (v6.0.2)
- `jdx/mise-action@v4` → `@c37c9329...` (v2.4.4)
- `actions/upload-pages-artifact@v5` → `@fc324d35...`
- `actions/deploy-pages@v5` → `@cd2ce8fc...`

Top-level `permissions: contents: read`; write scopes (`pages: write` +
`id-token: write`) granted only on the `deploy` job. Resolves the same
Token-Permissions HIGH pattern fixed in PR #78 for the other 4
workflows.
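
For reviewers who want to spot-check the pinning, here is a rough sketch of a line-based scan that flags `uses:` references not pinned to a full 40-character commit SHA. It is not part of this PR or of Scorecard; `findUnpinnedUses` is a hypothetical helper, and it deliberately ignores local (`./path`) and `docker://` references.

```typescript
interface UnpinnedUse {
  line: number; // 1-based line number in the workflow file
  ref: string; // the "owner/repo@ref" text that was found
}

// Matches "uses: owner/repo@ref" with an optional leading "- " list marker.
const USES_RE = /^\s*(?:-\s*)?uses:\s*([^\s@#]+)@([^\s#]+)/;
// A full-length, lowercase hex commit SHA.
const FULL_SHA_RE = /^[0-9a-f]{40}$/;

function findUnpinnedUses(workflowYaml: string): UnpinnedUse[] {
  const unpinned: UnpinnedUse[] = [];
  workflowYaml.split(/\r?\n/).forEach((text, index) => {
    const match = USES_RE.exec(text);
    if (match === null) return;
    const action = match[1] ?? "";
    const ref = match[2] ?? "";
    // docker:// images are pinned by digest, not by git SHA; skip them here.
    if (action.startsWith("docker://")) return;
    if (!FULL_SHA_RE.test(ref)) {
      unpinned.push({ line: index + 1, ref: `${action}@${ref}` });
    }
  });
  return unpinned;
}
```

Run against the pre-hardening `pages.yml`, a scan like this would be expected to report the four tag-pinned actions listed above; a text scan is only a convenience and does not replace the Scorecard check.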

### LadybugDB polish (`3d78ab8`)
38 prose substitutions across 13 files: replace awkward "the
graph-database backend" workarounds with plain "LadybugDB" now that the
literal is allowed. `@ladybugdb/core` (npm package) and `graph.lbug`
(file extension) preserved.

## Validation

- `mise run check` exit 0 — 1,339 tests across 8 packages (lint +
typecheck + test + banned-strings + verdict)
- `pnpm -F @opencodehub/docs build` — **64 pages built, all internal
links valid**, Pagefind index ok, llm-nav banners patch all 63 .md files
- `actionlint .github/workflows/*.yml` — clean
- `bash scripts/check-banned-strings.sh` — PASS
- `rg 'AC-[A-Z]-[0-9]|T-M[0-9]+-[0-9]+|W-[A-Z]-[0-9]+|S-[A-Z]-[0-9]+|E-[A-Z]-[0-9]+|CL-[A-Z]+|architecture-revised\.md' packages/docs/src/` — zero hits
- Marketing-words sweep (`effortless`, `leverage`, `synergy`,
`world-class`, `blazing-fast`, `cutting-edge`) — zero hits in docs prose
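
To make the planning-ID sweep above easier to reason about, the same pattern can be exercised in TypeScript. This is a sketch for illustration only: `findPlanningIds` is a hypothetical helper, and the validation in this PR ran `rg`, not this code.

```typescript
// Same alternation as the rg invocation in the Validation list.
const PLANNING_ID_SOURCE =
  "AC-[A-Z]-[0-9]|T-M[0-9]+-[0-9]+|W-[A-Z]-[0-9]+|S-[A-Z]-[0-9]+|" +
  "E-[A-Z]-[0-9]+|CL-[A-Z]+|architecture-revised\\.md";

// Return every internal planning identifier found in the text, in order.
function findPlanningIds(text: string): string[] {
  // A fresh global regex per call avoids shared lastIndex state.
  const re = new RegExp(PLANNING_ID_SOURCE, "g");
  const hits: string[] = [];
  let match: RegExpExecArray | null = re.exec(text);
  while (match !== null) {
    hits.push(match[0]);
    match = re.exec(text);
  }
  return hits;
}
```

Like the `rg` call, the pattern is unanchored, so it also matches identifiers embedded in longer tokens; an empty result is what "zero hits" means here.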

## Test plan

- [ ] CI green on `docs/site-restore-v1`
- [ ] After merge, the Pages workflow at `.github/workflows/pages.yml`
triggers on first push to `main` (paths-filter on `packages/docs/**`)
- [ ] Deployed site at https://theagenticguy.github.io/opencodehub/
replaces the May 1 snapshot
- [ ] Manual verification: visit /agents/, /mcp/tools/,
/tool-catalog.json
- [ ] Manual verification: `/llms.txt`, `/llms-full.txt`,
`/llms-small.txt` all resolve and contain "29 tools" / "LadybugDB" /
"WASM" facts

## Out of scope

- Submission to `skills.sh`, the official MCP Registry, Smithery,
awesome-mcp-servers. The research files at
`.erpaval/sessions/session-05809d/research-skills-sh.md` and
`.erpaval/sessions/session-05809d/research-agent-docs.md` capture the
exact shape; each submission can ship as a separate follow-up PR.
- Importing `.erpaval/solutions/**.md` as a Starlight content collection
— investigated, deemed not worth shipping (lessons audience is the agent
at edit-time, not docs readers; some lesson titles include literals the
docs build's other guardrails reject). The `architecture/lessons.md`
stub points readers at the directory.