release: To Prod by suisuss · Pull Request #1337 · KeeperHub/keeperhub

suisuss · 2026-05-21T12:52:56Z

Summary

Release staging -> prod. Notable security-critical content in this release:

Security incident 2026-05-21 follow-up (PR #1335 merged into staging)

NetworkPolicy on workflow runner pods: kubernetes egress block to 169.254.0.0/16, fe80::/10, fc00::/7 (IMDS + link-local + ULA).
AI Prompt off in prod: gated behind NEXT_PUBLIC_AI_PROMPT_ENABLED (defaults false). UI + /api/ai/generate both gated.
DB-level kill switch (drizzle migration 0082):
- block_database_integrations_security_2026_05_21 declared in migrations (the trigger was already ad-hoc on prod).
- NEW block_executions_security_2026_05_21 rejects INSERT into workflow_executions when the workflow owner is deactivated_at IS NOT NULL or the workflow is deleted_at IS NOT NULL. Raises 42501.
Both triggers are idempotent (CREATE OR REPLACE FUNCTION + DROP TRIGGER IF EXISTS / CREATE TRIGGER).
This closes the live attacker leak where deactivated accounts continued running workflows.

Test plan

CI green on PR hotfix(security): defence-in-depth - NetworkPolicy, AI Prompt off, DB kill switch #1335 before merge to staging
Staging deploy succeeded (run 26234015878)
CI green on this PR before merge to prod
After merge to prod: verify both triggers present on prod DB (SELECT tgname FROM pg_trigger WHERE tgname LIKE '%security_2026_05_21%')
Sample insert into workflow_executions for a soft-deleted workflow on prod returns 42501 error
No spike in legitimate execution errors after deploy (Joel's canaries, Dumitru's Sky monitoring continue running)

Extracts the SSRF blocklist data (CIDR ranges, hostname patterns, NAT64 well-known prefix) to a new lib/ssrf-blocklist.ts module. Both consumers read from this one file: - lib/safe-fetch.ts imports the data and builds its node:net BlockList and Set<string> at module-load time. - lib/sandbox/child-source.ts interpolates the data into the grandchild source template via JSON.stringify, so the inlined literals in the rendered "node -e" string are automatically in sync with the main-app blocklist. Closes the drift this PR was originally responding to. Before this commit the sandbox guard was a hand-maintained copy that stopped at KEEP-314 - missing the four IPv6 ranges added in the KEEP-603 PR (64:ff9b:1::/48 site-local NAT64, 2001::/32 Teredo, 2002::/16 6to4, 2001:db8::/32 documentation) and missing the entire pre-DNS hostname denylist (localhost, *.local, *.internal, *.svc.cluster.local, *.pod.cluster.local). It now picks up the same set automatically. The sandbox docblock is updated: imports are now allowed in this module, but restricted to pure data consumed at template-render time; runtime behaviour still cannot cross into the grandchild because the grandchild has no access to npm or to the rest of the codebase. Validation: pnpm vitest passes (189/189). Lint and type-check clean on the changed files. The rendered SANDBOX_CHILD_SOURCE string was piped through node --check to confirm the interpolated template is syntactically valid JavaScript, and the key CIDR and host-pattern markers are present in the rendered output. No public API change. The sandbox guard remains always-enforce; no SAFE_FETCH_ENFORCE wiring is needed (child-source.ts does not read the env var).

Fixes the standalone @keeperhub/sandbox package's tsc build, which failed on the previous commit's @/-prefixed import because that package's tsconfig maps `@/*` to its own `./src/*` (not to the keeperhub workspace root). The new import in child-source.ts is switched to a relative path that resolves identically from both build contexts; an inline comment notes the asymmetry with safe-fetch.ts. Adds tests/unit/sandbox-child-source.test.ts that asserts every entry from lib/ssrf-blocklist.ts appears inline in the rendered SANDBOX_CHILD_SOURCE string, plus a parse-only vm.Script check. This locks the SoT contract: a future regression that reverts to hardcoded literals in child-source.ts would fail the test, not slip through. Validation: sandbox tsc build now emits dist/lib/ssrf-blocklist.js alongside dist/lib/sandbox/child-source.js, with the compiled require target resolving correctly at runtime. sandbox package tests 25/25 pass; keeperhub touched suites 207/207 pass (was 197 before this file). Lint clean on changed files; no new type errors.

…ontext The sandbox Dockerfile's build stage takes a "copy only what is needed" approach and explicitly named `lib/sandbox/child-source.ts` as the one shared file pulled in from the keeperhub workspace. After this branch's SoT extraction, child-source.ts also imports `../ssrf-blocklist`, but that file wasn't being copied into the build context - so the sandbox tsc step failed with "Cannot find module '../ssrf-blocklist'" inside the Docker build (local pnpm builds passed because the full workspace tree was available). Adds the file as an explicit COPY alongside the existing child-source.ts copy, with a comment explaining the dependency. The sandbox tsc emits dist/lib/ssrf-blocklist.js next to dist/lib/sandbox/child-source.js; the latter's compiled `require("../ssrf-blocklist")` resolves at runtime.

CI's test-unit-sandbox-remote still failed after the Dockerfile COPY addition: the build stage now succeeded but the runtime container hit Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/app/dist/lib/ssrf-blocklist' imported from /app/dist/lib/sandbox/child-source.js The standalone @keeperhub/sandbox package is `"type": "module"` and runs its compiled output via plain `node`, whose strict ESM loader requires explicit file extensions in import specifiers. The extension- less `"../ssrf-blocklist"` import worked at tsc time (sandbox tsconfig uses moduleResolution: "bundler") and on the keeperhub side (Next.js bundler) but blew up at sandbox runtime. Switches the import to `"../ssrf-blocklist.js"` - the NodeNext / strict ESM convention. This matches how `sandbox/src/run-code.ts` already imports `"../../lib/sandbox/child-source.js"`. TS still resolves to the matching .ts source at compile time; the emitted output keeps the .js suffix verbatim so Node's resolver finds the compiled file. Validated locally: sandbox tsc emits the .js suffix into dist/lib/sandbox/child-source.js, and `node dist/sandbox/src/index.js` launches without module-resolution errors. Keeperhub side: 197/197 tests pass; type-check unchanged.

… import CI surfaced an irreducible conflict between the two consumers of the shared SSRF blocklist: - lib/safe-fetch.ts is bundled by Next.js Turbopack, which resolves extension-less relative imports to .ts source files but refuses .js-suffixed imports back to the .ts source ("Module not found: Can't resolve '../ssrf-blocklist.js'" during the keeperhub build). - lib/sandbox/child-source.ts is also compiled by the standalone @keeperhub/sandbox package, whose strict ESM runtime requires explicit .js extensions and rejects extension-less imports with ERR_MODULE_NOT_FOUND at startup. A .ts data file is unreachable from one consumer or the other with any single import specifier. Switches the data file to .json, which both bundlers resolve identically. - lib/ssrf-blocklist.json owns the data. - lib/ssrf-blocklist.ts is now a thin typed wrapper that imports the JSON and re-exports the values with proper Cidr tuple types. The keeperhub side imports paths are unchanged; safe-fetch.ts and connection-host-guard.ts keep their existing module specifiers. - lib/sandbox/child-source.ts imports the JSON directly with `with { type: "json" }`, skipping the TS wrapper entirely - the sandbox doesn't need TS types because the values are interpolated into the template literal as JSON.stringify(...). - sandbox/Dockerfile copies the .json (not the .ts) into the build context. - sandbox/package.json's build script copies the .json into dist after tsc, because tsc does not emit .json files even with resolveJsonModule enabled. Validated locally: sandbox tsc + JSON copy emits dist/lib/ssrf-blocklist.json alongside dist/lib/sandbox/child-source.js; `node dist/sandbox/src/index.js` starts without module-resolution errors; keeperhub unit suites 197/197 pass; lint clean; sandbox typecheck clean.

The workflow execution path that actually runs user step code (and therefore calls safeFetch) is the ephemeral workflow-* runner pod spawned by the executor controller via createWorkflowJob. These pods do not read Helm values directly - their env is constructed in keeperhub-executor/k8s-job.ts from a hardcoded explicit list plus the RUNNER_SYSTEM_ENV_VARS allowlist. Pre-fix posture: SAFE_FETCH_ENFORCE was on neither path, so runner pods got the env unset, isShadowMode() returned true, and the SSRF guard logged blocks but let requests through. The IMDS-401 case originally reported in production was almost certainly a runner pod - the controller pod can be hardened via Helm (PR #1331) but that does not propagate to its children. Hardcodes `{ name: "SAFE_FETCH_ENFORCE", value: "true" }` in the explicit list rather than adding to RUNNER_SYSTEM_ENV_VARS. The hardcoded form makes the runner enforce regardless of controller config drift; flipping it back to shadow mode now requires a code change here, which is the intended posture for an SSRF guard in an incident-response context. Tests added: - runner pod gets SAFE_FETCH_ENFORCE=true when controller env is unset - runner pod still gets SAFE_FETCH_ENFORCE=true (hardcoded value wins) when controller env says "false", and the entry appears exactly once

…etch-forward fix: force-enable SAFE_FETCH_ENFORCE on workflow runner pods

The six test files under keeperhub-executor/ (k8s-job.test.ts, runner-env.test.ts, startup-checks.test.ts, api-execute.test.ts, lib/metrics-shipping.test.ts, lib/ship-metrics.test.ts - 44 tests total) have not been running in CI. They are discoverable by the root vitest config, but the pr-checks test-unit job invokes `pnpm test:unit` which is `vitest run tests/unit` - a positional path filter that excludes everything outside tests/unit/. Adds a new `test:executor` package.json script (`vitest run keeperhub-executor`) and a follow-on step in the test-unit job that invokes it. The new step runs in the same job as the existing unit tests so it reuses the Node + pnpm setup and does not add a fresh runner; failure attribution stays clean because GitHub Actions reports each step separately. Verified locally: `pnpm test:executor` discovers and passes all 44 tests in ~10s. Why not a separate workflow file like pr-checks-scheduler.yml or pr-checks-events.yml: keeperhub-scheduler and keeperhub-events are separate pnpm workspaces with their own lockfiles, tsconfigs, and vitest configs. keeperhub-executor is part of the root workspace and shares dependencies, so a follow-on step in the existing job is the lighter-weight match. Why hardcoded as a separate script rather than expanding test:unit: `test:unit` is reused by the sandbox-remote job (`pnpm test:unit tests/unit/code-run-code.test.ts`) which appends a path arg to narrow the run. Expanding test:unit's base path to include keeperhub-executor would broaden that job too. Keeping the executor invocation in its own script preserves the existing test:unit semantics.

Mirrors the pr-checks-events.yml pattern: a standalone workflow file that fires only when keeperhub-executor/** (or the workflow itself) change, instead of an extra step on every PR's main test-unit job. The previous commit's approach (a follow-on step in pr-checks.yml test-unit) ran the executor suite on every PR regardless of whether any executor code changed, which wastes runner time and clutters the status checks UI. The dedicated workflow is cheap to skip when nothing matches the path filter. Differences from pr-checks-events.yml: events is a separate pnpm workspace with its own lockfile and setup-events-deps action; the executor lives in the root workspace, so the existing setup-node-pnpm action is reused. Executor tests are pure unit (no docker, no anvil), so only a single test-unit job is needed - no test-integration mirror. Verified locally: pnpm test:executor still passes 44/44. The test:executor package.json script added in the previous commit is retained because it is the entry point this workflow invokes.

…ntain permissions' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

…n-ci ci: run keeperhub-executor unit tests in pr-checks

Spawn the grandchild via node -e and assert await fetch(...) is rejected for every blocklist category (IMDSv2, RFC 1918, loopback, ULA, link-local, Teredo, 6to4, doc range, site-local NAT64, multicast, and pre-DNS hostname patterns). Existing data-only tests proved the list was inlined but not that the guard actually fires. Also tighten CIDR data tests to assert the full tuple, add a guard that locks child-source.ts to a single data-only import, and list the JSON dependency in sandbox/tsconfig.json.

…-parity hotfix: single source of truth for SSRF blocklist

… kill switch (#1335)

suisuss added 7 commits May 21, 2026 21:04

Merge pull request #1336 from KeeperHub/hotfix/KEEP-603-runner-safe-f…

0980c84

…etch-forward fix: force-enable SAFE_FETCH_ENFORCE on workflow runner pods

suisuss temporarily deployed to staging May 21, 2026 12:53 — with GitHub Actions Inactive

suisuss and others added 3 commits May 21, 2026 22:58

Potential fix for pull request finding 'CodeQL / Workflow does not co…

2e40edb

…ntain permissions' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

suisuss temporarily deployed to staging May 21, 2026 13:04 — with GitHub Actions Inactive

Merge pull request #1338 from KeeperHub/chore/enable-executor-tests-i…

7a45727

…n-ci ci: run keeperhub-executor unit tests in pr-checks

suisuss temporarily deployed to staging May 21, 2026 13:05 — with GitHub Actions Inactive

suisuss temporarily deployed to staging May 21, 2026 13:20 — with GitHub Actions Inactive

suisuss had a problem deploying to staging May 21, 2026 13:20 — with GitHub Actions Failure

suisuss had a problem deploying to staging May 21, 2026 13:42 — with GitHub Actions Failure

Merge branch 'staging' into hotfix/KEEP-603-sandbox-guard-parity

7c22025

suisuss had a problem deploying to staging May 21, 2026 14:03 — with GitHub Actions Failure

joelorzet added 2 commits May 21, 2026 11:13

Merge pull request #1334 from KeeperHub/hotfix/KEEP-603-sandbox-guard…

0c270ca

…-parity hotfix: single source of truth for SSRF blocklist

joelorzet temporarily deployed to staging May 21, 2026 14:27 — with GitHub Actions Inactive

joelorzet had a problem deploying to staging May 21, 2026 14:31 — with GitHub Actions Failure

joelorzet had a problem deploying to staging May 21, 2026 14:33 — with GitHub Actions Failure

joelorzet temporarily deployed to staging May 21, 2026 14:39 — with GitHub Actions Inactive

hotfix(security): defence-in-depth - NetworkPolicy, AI Prompt off, DB…

90f2d7a

… kill switch (#1335)

OleksandrUA temporarily deployed to staging May 21, 2026 14:57 — with GitHub Actions Inactive

OleksandrUA temporarily deployed to staging May 21, 2026 15:10 — with GitHub Actions Inactive

OleksandrUA temporarily deployed to staging May 21, 2026 15:11 — with GitHub Actions Inactive

joelorzet merged commit 28f8c91 into prod May 21, 2026
37 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: To Prod#1337

release: To Prod#1337
joelorzet merged 15 commits into
prodfrom
staging

suisuss commented May 21, 2026 •

edited by OleksandrUA

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

suisuss commented May 21, 2026 • edited by OleksandrUA Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Security incident 2026-05-21 follow-up (PR #1335 merged into staging)

Other content

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

suisuss commented May 21, 2026 •

edited by OleksandrUA

Loading