release: to prod by joelorzet · Pull Request #1455 · KeeperHub/keeperhub

joelorzet · 2026-06-03T22:26:16Z

No description provided.

Raw fetch() in plugin step and connection-test files bypassed the SSRF guard in lib/safe-fetch.ts, so a user- or attacker-controlled destination could reach link-local (169.254.169.254) and RFC1918 hosts even with SAFE_FETCH_ENFORCE on. Replace every bare fetch() under plugins/ with safeFetch(url, { plugin }) so the guard validates each request and blocks attribute to the calling plugin. Prioritized the two user-controlled-host paths: the discord webhookUrl and the blockscout custom-instance URL. Also harden the discord webhook check: the old substring match on "discord.com/api/webhooks/" passed a URL that carried that string in its path while pointing the host at an internal address (e.g. http://169.254.169.254/discord.com/api/webhooks/x). Validate by parsed hostname over https with an /api/webhooks/ path instead. Update blockscout-steps unit tests to mock safeFetch, and add a discord send-message test covering the bypass URLs and the safeFetch wiring.

Add a lint-job step that git-greps for bare fetch()/axios/http.request under plugins/ (excluding *.test.ts, *.md, *.txt) and fails the build, so plugin egress cannot regress off safeFetch. Pattern avoids \b so it runs the same on the BSD and GNU regex engines.

Update the plugin generator template, authoring guides, and agent/command definitions to use safeFetch(url, { plugin }) instead of raw fetch(), so newly created plugins route egress through the SSRF guard and pass the new CI egress check by default.

The build failed because lib/safe-fetch.ts is "server-only" and the connection-test files (plugins/*/test.ts) are reachable from the client-bundled plugin registry (plugins/index.ts), so importing safeFetch into them pulled server-only into the client graph. Connection tests run server-side but cannot import the server-only guard from the client-reachable registry. Revert the nine test.ts files to the raw fetch global, exclude plugins/*/test.ts from the egress CI guard, and note the step-file vs connection-test distinction in the scaffolding and authoring guides. Workflow step files (the SSRF surface this ticket targets) remain routed through safeFetch and are unaffected.

Connection-test files (plugins/*/test.ts) are reachable from the client-bundled plugin registry, so they cannot import the server-only SSRF guard and fetch user-supplied instance URLs (e.g. blockscout's BLOCKSCOUT_API_URL) with raw fetch. A user could point that at 169.254.169.254 or an RFC1918 host and use the Test Connection result as a reachability oracle. Validate every user-supplied url-type field on the server in handlePluginTest, before the test runs, via assertUrlIsPublic -- mirroring the assertConnectionUrlIsPublic pre-flight already used for database connection tests. Generic across all plugins; no change to the client-reachable test.ts files.

Document, in the handlePluginTest guard comment and the plugin authoring guide, that assertUrlIsPublic is always-on (it does not honor SAFE_FETCH_ENFORCE), so Test Connection blocks a localhost/private instance URL in local dev even though workflow execution against it works under safeFetch shadow mode. Also normalize prose to avoid double hyphens.

…workpolicy comments This is a public repo. The networkpolicy.yaml comments described the staging/prod VPC ranges, DB-subnet CIDRs by AZ, internal file paths, env var names, and the egress threat model - unnecessary network-recon detail for a public file. Remove all comments; the policy spec is unchanged (the applied object is byte-identical), so there is no functional or deploy impact. Rationale now lives only in the private infra runbook.

…omments chore(deploy): removing comments from documentation references

Runner Job pods previously ran with serviceAccountName: default, a mounted SA token, an empty securityContext (root, gid=0), and high-value secrets passed as literal env values. This widened blast radius if a workflow step was compromised. Changes (TECH-29): - Dedicated zero-RBAC ServiceAccount keeperhub-workflow-runner with no RoleBinding and automountServiceAccountToken: false (manifests applied by deploy-executor.yaml, mirroring the sandbox SA pattern). - Pod securityContext: runAsNonRoot, runAsUser/Group 1000, fsGroup 1000, seccompProfile RuntimeDefault. Container securityContext: allowPrivilegeEscalation false, readOnlyRootFilesystem true, capabilities drop ALL. Writable /tmp emptyDir + TMPDIR so tsx/esbuild still works under the read-only root. - All 12 high-value credentials now reach the runner via valueFrom.secretKeyRef against the executor's External Secrets-synced Secrets instead of literal values; secret-valued vars are no longer relayed as plaintext through the controller env. DATABASE_URL and INTEGRATION_ENCRYPTION_KEY are required refs, the rest optional.

The PR environment is a different topology than staging: the executor is helm release executor-pr-<N> and DATABASE_URL is a kv literal pointing at the ephemeral per-PR Postgres. Without extra wiring the hardened runner cannot come up there: - the dedicated keeperhub-workflow-runner SA does not exist in pr-<N> (staging creates it via deploy-executor.yaml, never run for PR envs) - the runner's secret refs target keeperhub-executor-common-<slug>, but the PR executor's secrets are executor-pr-<N>-common-<slug> - DATABASE_URL has no chart-managed Secret since it is kv in the PR env Wiring added: - runner-sa.template.yaml: per-PR zero-RBAC runner SA, applied in the deploy job's prerequisite block - executor.template.yaml: RUNNER_SERVICE_ACCOUNT + RUNNER_SECRET_PREFIX (executor-pr-<N>-common) so the runner resolves the PR namespace's SA and secrets - deploy job creates executor-pr-<N>-common-db-url from the same ephemeral connection string, so the runner's required DATABASE_URL secretKeyRef resolves

…r-job-pods feat(executor): harden workflow-runner Job pods (dedicated SA, no token, non-root, secret refs)

Adds an ioredis-backed client for the app tier as a cross-replica coordination layer. getRedis() returns null when REDIS_HOST or REDIS_PASSWORD is unset and never opens an unauthenticated connection, so callers degrade to local behaviour and a Redis outage cannot block a request. Also adds a central Redis key registry.

… replicas The IP gate emailed the account owner the first time a request arrived from an untrusted network, deduped by a per-pod in-memory set. With more than one replica each pod deduped independently, so a single new network produced one email per pod. The dedup now uses a Redis SET NX claim with a 7 day TTL so exactly one replica sends per network. The notification runs fire-and-forget and never blocks the gate; when Redis is unavailable it skips the courtesy email rather than risk one send per request.

A per-pod cache of untrusted gate results outlived the moment another replica recorded the IP as trusted via verify-ip, so a verified user was bounced back through verification on a stale pod. Untrusted is now read from the database on every request, the shared source of truth, so any replica sees a freshly trusted IP immediately. Trusted results stay cached per pod because trust only ever grows.

Sources REDIS_HOST, REDIS_PORT, and REDIS_PASSWORD from the keeperhub SSM namespace for staging, prod, and PR environments.

staging and every PR environment share one Redis instance, so keys are prefixed with a per-deployment REDIS_KEY_PREFIX to keep them isolated. NODE_ENV cannot serve this since staging and PR envs are both development. Adds a deploymentKey helper that every key builder uses, with unit coverage for the helper and the prefix.

… key prefix Reuses the redis-host, redis-port, and redis-password parameters already provisioned under the keeperhub-events SSM namespace, so no new parameters are needed. Sets REDIS_KEY_PREFIX per deployment (staging, prod, pr-PR_NUMBER).

Updates the verify-ip comment that referenced the removed UNTRUSTED_TTL_MS constant, and removes the freshUser test helper that guarded against a per-pod dedup set the notification module no longer has.

…ess-through-safefetch fix(security): route all plugin egress through safeFetch

fix(security): make new-sign-in-location email and IP gate replica-safe

Replaces the per-pod in-memory TRUST_CACHE with a read-through Redis cache keyed per deployment. The gate reads Redis first and falls back to Postgres, the durable source of truth, on a miss, re-caching the result so the next request on any replica is a hit. /verify-ip warms the cache after the upsert so a just-verified IP is trusted fleet-wide immediately. Redis is best-effort: a missing or failing Redis falls straight through to Postgres, so the gate never fails and no one is locked out. Untrusted is still never cached. Removing the per-pod Map ends the cross-replica inconsistency where one pod cached a result the next pod did not have.

refactor(security): back the IP-trust gate with a shared Redis cache

joelorzet and others added 22 commits June 2, 2026 21:27

Merge pull request #1450 from KeeperHub/TECH-30-strip-networkpolicy-c…

59f359e

…omments chore(deploy): removing comments from documentation references

Merge pull request #1451 from KeeperHub/TECH-29-harden-workflow-runne…

a82d463

…r-job-pods feat(executor): harden workflow-runner Job pods (dedicated SA, no token, non-root, secret refs)

chore(deploy): wire REDIS_* env into app values

0d0dad2

Sources REDIS_HOST, REDIS_PORT, and REDIS_PASSWORD from the keeperhub SSM namespace for staging, prod, and PR environments.

chore(security): drop stale untrusted-cache comment and dead test helper

3a00f45

Updates the verify-ip comment that referenced the removed UNTRUSTED_TTL_MS constant, and removes the freshUser test helper that guarded against a per-pod dedup set the notification module no longer has.

Merge pull request #1441 from KeeperHub/fix/KEEP-674-route-plugin-egr…

5a1f7b6

…ess-through-safefetch fix(security): route all plugin egress through safeFetch

Merge pull request #1453 from KeeperHub/fix/ip-gate-cross-pod-dedup

014db1f

fix(security): make new-sign-in-location email and IP gate replica-safe

Merge pull request #1454 from KeeperHub/fix/redis-shared-trust-cache

08ad64f

refactor(security): back the IP-trust gate with a shared Redis cache

joelorzet temporarily deployed to staging June 3, 2026 22:26 — with GitHub Actions Inactive

joelorzet requested review from a team, OleksandrUA, eskp and suisuss and removed request for a team June 3, 2026 22:26

joelorzet merged commit cdfa2dd into prod Jun 3, 2026
25 checks passed

OleksandrUA mentioned this pull request Jun 4, 2026

release: promote staging to prod #1458

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: to prod#1455

release: to prod#1455
joelorzet merged 22 commits into
prodfrom
staging

joelorzet commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

joelorzet commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants