Conversation
Raw fetch() in plugin step and connection-test files bypassed the SSRF
guard in lib/safe-fetch.ts, so a user- or attacker-controlled destination
could reach link-local (169.254.169.254) and RFC1918 hosts even with
SAFE_FETCH_ENFORCE on. Replace every bare fetch() under plugins/ with
safeFetch(url, { plugin }) so the guard validates each request and blocks
attribute to the calling plugin.
Prioritized the two user-controlled-host paths: the discord webhookUrl and
the blockscout custom-instance URL. Also harden the discord webhook check:
the old substring match on "discord.com/api/webhooks/" passed a URL that
carried that string in its path while pointing the host at an internal
address (e.g. http://169.254.169.254/discord.com/api/webhooks/x). Validate
by parsed hostname over https with an /api/webhooks/ path instead.
Update blockscout-steps unit tests to mock safeFetch, and add a discord
send-message test covering the bypass URLs and the safeFetch wiring.
Add a lint-job step that git-greps for bare fetch()/axios/http.request under plugins/ (excluding *.test.ts, *.md, *.txt) and fails the build, so plugin egress cannot regress off safeFetch. Pattern avoids \b so it runs the same on the BSD and GNU regex engines.
Update the plugin generator template, authoring guides, and agent/command
definitions to use safeFetch(url, { plugin }) instead of raw fetch(), so
newly created plugins route egress through the SSRF guard and pass the
new CI egress check by default.
The build failed because lib/safe-fetch.ts is "server-only" and the connection-test files (plugins/*/test.ts) are reachable from the client-bundled plugin registry (plugins/index.ts), so importing safeFetch into them pulled server-only into the client graph. Connection tests run server-side but cannot import the server-only guard from the client-reachable registry. Revert the nine test.ts files to the raw fetch global, exclude plugins/*/test.ts from the egress CI guard, and note the step-file vs connection-test distinction in the scaffolding and authoring guides. Workflow step files (the SSRF surface this ticket targets) remain routed through safeFetch and are unaffected.
Connection-test files (plugins/*/test.ts) are reachable from the client-bundled plugin registry, so they cannot import the server-only SSRF guard and fetch user-supplied instance URLs (e.g. blockscout's BLOCKSCOUT_API_URL) with raw fetch. A user could point that at 169.254.169.254 or an RFC1918 host and use the Test Connection result as a reachability oracle. Validate every user-supplied url-type field on the server in handlePluginTest, before the test runs, via assertUrlIsPublic -- mirroring the assertConnectionUrlIsPublic pre-flight already used for database connection tests. Generic across all plugins; no change to the client-reachable test.ts files.
Document, in the handlePluginTest guard comment and the plugin authoring guide, that assertUrlIsPublic is always-on (it does not honor SAFE_FETCH_ENFORCE), so Test Connection blocks a localhost/private instance URL in local dev even though workflow execution against it works under safeFetch shadow mode. Also normalize prose to avoid double hyphens.
…workpolicy comments This is a public repo. The networkpolicy.yaml comments described the staging/prod VPC ranges, DB-subnet CIDRs by AZ, internal file paths, env var names, and the egress threat model - unnecessary network-recon detail for a public file. Remove all comments; the policy spec is unchanged (the applied object is byte-identical), so there is no functional or deploy impact. Rationale now lives only in the private infra runbook.
…omments chore(deploy): removing comments from documentation references
Runner Job pods previously ran with serviceAccountName: default, a mounted SA token, an empty securityContext (root, gid=0), and high-value secrets passed as literal env values. This widened blast radius if a workflow step was compromised. Changes (TECH-29): - Dedicated zero-RBAC ServiceAccount keeperhub-workflow-runner with no RoleBinding and automountServiceAccountToken: false (manifests applied by deploy-executor.yaml, mirroring the sandbox SA pattern). - Pod securityContext: runAsNonRoot, runAsUser/Group 1000, fsGroup 1000, seccompProfile RuntimeDefault. Container securityContext: allowPrivilegeEscalation false, readOnlyRootFilesystem true, capabilities drop ALL. Writable /tmp emptyDir + TMPDIR so tsx/esbuild still works under the read-only root. - All 12 high-value credentials now reach the runner via valueFrom.secretKeyRef against the executor's External Secrets-synced Secrets instead of literal values; secret-valued vars are no longer relayed as plaintext through the controller env. DATABASE_URL and INTEGRATION_ENCRYPTION_KEY are required refs, the rest optional.
The PR environment is a different topology than staging: the executor is helm release executor-pr-<N> and DATABASE_URL is a kv literal pointing at the ephemeral per-PR Postgres. Without extra wiring the hardened runner cannot come up there: - the dedicated keeperhub-workflow-runner SA does not exist in pr-<N> (staging creates it via deploy-executor.yaml, never run for PR envs) - the runner's secret refs target keeperhub-executor-common-<slug>, but the PR executor's secrets are executor-pr-<N>-common-<slug> - DATABASE_URL has no chart-managed Secret since it is kv in the PR env Wiring added: - runner-sa.template.yaml: per-PR zero-RBAC runner SA, applied in the deploy job's prerequisite block - executor.template.yaml: RUNNER_SERVICE_ACCOUNT + RUNNER_SECRET_PREFIX (executor-pr-<N>-common) so the runner resolves the PR namespace's SA and secrets - deploy job creates executor-pr-<N>-common-db-url from the same ephemeral connection string, so the runner's required DATABASE_URL secretKeyRef resolves
…r-job-pods feat(executor): harden workflow-runner Job pods (dedicated SA, no token, non-root, secret refs)
Adds an ioredis-backed client for the app tier as a cross-replica coordination layer. getRedis() returns null when REDIS_HOST or REDIS_PASSWORD is unset and never opens an unauthenticated connection, so callers degrade to local behaviour and a Redis outage cannot block a request. Also adds a central Redis key registry.
… replicas The IP gate emailed the account owner the first time a request arrived from an untrusted network, deduped by a per-pod in-memory set. With more than one replica each pod deduped independently, so a single new network produced one email per pod. The dedup now uses a Redis SET NX claim with a 7 day TTL so exactly one replica sends per network. The notification runs fire-and-forget and never blocks the gate; when Redis is unavailable it skips the courtesy email rather than risk one send per request.
A per-pod cache of untrusted gate results outlived the moment another replica recorded the IP as trusted via verify-ip, so a verified user was bounced back through verification on a stale pod. Untrusted is now read from the database on every request, the shared source of truth, so any replica sees a freshly trusted IP immediately. Trusted results stay cached per pod because trust only ever grows.
Sources REDIS_HOST, REDIS_PORT, and REDIS_PASSWORD from the keeperhub SSM namespace for staging, prod, and PR environments.
staging and every PR environment share one Redis instance, so keys are prefixed with a per-deployment REDIS_KEY_PREFIX to keep them isolated. NODE_ENV cannot serve this since staging and PR envs are both development. Adds a deploymentKey helper that every key builder uses, with unit coverage for the helper and the prefix.
… key prefix Reuses the redis-host, redis-port, and redis-password parameters already provisioned under the keeperhub-events SSM namespace, so no new parameters are needed. Sets REDIS_KEY_PREFIX per deployment (staging, prod, pr-PR_NUMBER).
Updates the verify-ip comment that referenced the removed UNTRUSTED_TTL_MS constant, and removes the freshUser test helper that guarded against a per-pod dedup set the notification module no longer has.
…ess-through-safefetch fix(security): route all plugin egress through safeFetch
fix(security): make new-sign-in-location email and IP gate replica-safe
Replaces the per-pod in-memory TRUST_CACHE with a read-through Redis cache keyed per deployment. The gate reads Redis first and falls back to Postgres, the durable source of truth, on a miss, re-caching the result so the next request on any replica is a hit. /verify-ip warms the cache after the upsert so a just-verified IP is trusted fleet-wide immediately. Redis is best-effort: a missing or failing Redis falls straight through to Postgres, so the gate never fails and no one is locked out. Untrusted is still never cached. Removing the per-pod Map ends the cross-replica inconsistency where one pod cached a result the next pod did not have.
refactor(security): back the IP-trust gate with a shared Redis cache
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.