release: To Prod by suisuss · Pull Request #1446 · KeeperHub/keeperhub

suisuss · 2026-06-03T08:33:13Z

No description provided.

…bstitutions Legacy-mode assertResolved previously called logSystemError, which paged on-call via Sentry and bumped the generic engine-errors counter. That made the would-have-aborted exposure question unanswerable from a dashboard without alert noise. Switch the legacy branch to logSystemWarn and emit keeperhub_template_resolve_legacy_substitution_total once per call, with action_type and reason labels (single reason or "mixed"). Dashboards can then run sum by (action_type, reason) (rate(...)) to quantify the exposure window before the env var is hard-deprecated.

KEEPERHUB_TEMPLATE_RESOLVE_MODE was never set in any deployment config, Helm value, or .env file. Strict has been the default since KEEP-468 (PR #1187) with no observed regressions, so the staged migration window the env var was protecting is no longer needed. Delete the env var read, the legacy branch in assertResolved, the TemplateResolveMode type, the dedicated exposure counter added in the previous commit (now dead code), and the legacy-mode test cases. Update the user-facing error message and templating doc to drop the opt-out hint. assertResolved now always throws TemplateResolutionError on any unresolved reference. Completes the KEEP-525 scope. PR #1305 stays draft until reviewed.

Replaces the bearer-token-only X-Service-Key scheme with HMAC-SHA256 request authentication bound to method, path, caller, body digest, and unix timestamp. 300-second replay window matches the agentic-wallet precedent; no server-side nonce cache for the same latency-cost reason. Signing secrets move into a new versioned internal_service_hmac_secrets table modelled on agentic_wallet_hmac_secrets: AES-256-GCM at rest, AAD-bound to (caller, key_version), 24h rotation grace via expires_at. v1 reuses AGENTIC_WALLET_HMAC_KMS_KEY (a dedicated key is a documented follow-up) and stores one shared secret under caller='*shared*' (per- producer split is a one-row follow-up). Verifier accepts both schemes during the dual-accept rollout window: - presence of any X-KH-* header commits to the HMAC path with NO fallback (prevents an attacker who knows the legacy bearer from appending a junk signature to slip through the looser path), - INTERNAL_AUTH_REQUIRE_HMAC=true rejects all legacy bearers and is the per-environment kill-switch, - X-Service-Key or X-Internal-Token falls back to the existing timingSafeEqual compare against the four *_SERVICE_API_KEY env vars plus KEEPERHUB_API_KEY. Consolidates the /api/workflows/events bespoke X-Internal-Token check into authenticateInternalService. The 500-on-misconfigured-env path is intentionally dropped because it leaks operational state to clients. All 12 internal-service consumer routes migrated to a raw-body pattern so the HMAC verifier hashes the same bytes that arrive over the wire. The workflow execute route was the highest-risk migration because it consumes the body twice; raw body is captured once at the top and parsed from the captured string thereafter. Audit log via logInternalAuthEvent fires once per request with caller, scheme, route, ip, key_version, latency. Caller field sanitized to the InternalCaller enum or 'unknown' to bound log cardinality. Logs-only for v1; Prometheus counter and Sentry breadcrumb deferred. Drizzle migration 0094 hand-written since drizzle-kit generate is broken in this repo; journal idx 94 appended with when=1779843900001 (monotonic). Verified locally: - pnpm exec biome check: clean on touched files - pnpm type-check: clean across the whole codebase - pnpm vitest run tests/unit tests/integration: 6132/6132 pass - pnpm db:migrate against the dev DB: applies cleanly, table exists with PK on (caller, key_version)

…sibility Add a pluggable geo-IP resolver with a provider fallback chain and use its coordinates to flag physically implausible logins, plus a country-coarse per-member session view for org owners and admins. - geo-IP strategy chain (ipinfo, ipapi.co, ipwho.is, freeipapi) behind one ResolvedLocation interface; keyless providers keep resolution working with no credentials configured - login-risk now records coordinates in risk_flags_json and flags impossible_travel when the velocity between consecutive sessions exceeds 800 km/h over at least 100 km, catching same-country city hops - org owners and admins can view a member's recent sessions narrowed to country plus the anomaly verdict; IP, city, and coordinates are filtered out server-side and never reach the client - shared risk-flags decoder and session formatters deduped across the self-service and admin views

…avel-admin

Make the per-member Sessions trigger and the member Remove control icon-only with accessible labels, and add a title to the member email so truncated addresses stay distinguishable. Frees horizontal room in the members panel so emails are no longer cut to ambiguity in the narrow modal.

Resolve drizzle/meta/_journal.json conflict: staging owns 0094_chain_status and 0095_keep_669_reconcile_incident_indexes; renumber the internal-service HMAC migration to 0096 (when=1779843900003) to preserve monotonic ordering. events/route.ts auto-merge verified clean (non-overlapping auth vs chains.status). type-check passes on the merged tree.

The internal-service HMAC store decrypts seeded secrets with AGENTIC_WALLET_HMAC_KMS_KEY (shared with agentic-wallet for v1). Staging and prod app values set it; the PR-environment app template did not, so a seeded internal_service_hmac_secrets row could never be decrypted and every HMAC request fell through to "No active signing secret". Mirror the staging entry so PR envs can validate the HMAC accept-path once INTERNAL_AUTH_REQUIRE_HMAC is flipped.

Standalone single-replica service that serves the DB-sourced Prometheus gauges (the former /api/metrics/db scrape) off the request-serving pods. Executor-style: build-context = repo root, imports lib/metrics verbatim via tsx, so the exposed gauge families are identical to today's endpoint. - index.ts boots a node:http server (PORT, default 9090) with graceful shutdown - server.ts serves GET /metrics (updateDbMetrics -> getDbMetrics, TTL- and pool-gated in lib/) and GET /health - server.test.ts mocks the metrics module and exercises the HTTP wiring on an ephemeral port (200 + gauges, /health, 500 on refresh failure, 404) Containerization, CI, and deploy wiring follow in later commits (TECH-6484).

Dockerfile metrics-collector stage (executor-style: reuses lib/ + root node_modules via tsx, shims server-only, serves on :9090) and a docker-bake.hcl target/group + METRICS_COLLECTOR_ECR_REPO var. Bake config validated with 'buildx bake --print'. Image build/push runs once the ECR repo (keeperhub-metrics-collector-{env}) exists and a deploy workflow wires the bake target -- following commits.

…(Stage 3) - deploy/metrics-collector/{staging,prod}/values.yaml: replicaCount 1, serves /metrics on :9090, ServiceMonitor scrapes the DB gauges from this one pod (deterministic, no hashmod). Minimal env -- only DATABASE_URL; no SQS/Turnkey. - .github/workflows/deploy-metrics-collector.yaml: standalone (events-style) trigger on staging/prod + dispatch; bakes the metrics-collector target and helm-deploys via the shared techops-services/common chart, release name keeperhub-metrics-collector, namespace keeperhub. Does NOT cut over: the app's db-metrics ServiceMonitor and /api/metrics/db route stay in place until the collector is verified in staging (Stage 4). Requires the ECR repo + TFC workspace (terraform drafted in techops_infrastructure, applied by infra) before the first deploy.

Validated by building the image: the metrics-collector stage now (1) is included in the Docker source stage (COPY keeperhub-metrics-collector/) so the build doesn't fail, and (2) drops the COPY --from=builder generated-file lines -- the metrics import graph references no builder-generated file at runtime (lib/db/schema only uses lib/types/integration via import type, which tsx erases). The stage now depends on source only, so the image builds without the expensive Next builder stage. Confirmed: image builds, prometheus-api + db-metrics import cleanly, /health 200, /metrics runs the real queries.

…avel-admin

Replace url.includes(host) substring checks in the geo-IP resolver test with exact hostname comparison, clearing four js/incomplete-url-substring- sanitization high alerts and removing the freeipapi/ipapi.co substring collision workaround.

…mpossible-travel-admin feat(security): impossible-travel detection and per-member session visibility

…-telemetry refactor(template-resolver): remove legacy template-resolution opt-out

…uild-check CI Review feedback on PR #1439 (suisuss): - #3 Startup defined twice: drop the helm command/args override from deploy/metrics-collector/{staging,prod}/values.yaml; rely on the Dockerfile CMD (tsx keeperhub-metrics-collector/index.ts) as the single source of truth. - #4 Nothing builds this stage in CI: add .github/workflows/collector-build-check.yml -- a PR smoke job that builds the metrics-collector target and runs keeperhub-metrics-collector/import-check.ts inside the image, asserting the runtime import graph resolves. Catches a future value-import of a builder-generated module (server.test.ts mocks the module, so it can't). Validated locally: image builds, import-check prints IMPORTS_OK, exit 0. - #1 ServiceMonitor port name: matches the executor pattern; will verify the actual Prometheus target in staging (replied on the PR).

Staging advanced 40 commits and added 0096_keep_683_workflow_gas_columns, colliding with our hand-numbered migration. Renumber the internal-service HMAC migration to 0097 (when=1779843900004) and keep staging's 0096. type-check passes on the merged tree.

…al-service-auth hotfix(security): harden internal service auth with HMAC

…lector-service feat(metrics): dedicated keeperhub-metrics-collector service (TECH-6484)

Marketplace indexers (thespawn) score the agent "not a callable endpoint" because their liveness probe needs an anonymous tools/list, while /mcp is OAuth-gated. Verified the top-rated agents pass by exposing an anonymous, x402-priced MCP surface; KeeperHub couldn't follow without re-exposing the org-scoped tools the recent security work locked down. Add a dedicated anonymous endpoint that serves only public, already-public data and never touches the hardened /mcp path: - app/mcp/public/route.ts: anonymous initialize + tools/list + tools/call, plus a GET health probe. Never reads Authorization; pins a non-empty org sentinel (__anon_public__) and the terminal mcp:public scope; per-IP rate limits (cf-connecting-ip), stricter on tools/call. - oauth-scopes: SCOPE_MCP_PUBLIC + hand-curated PUBLIC_TOOLS allowlist (7 read tools + call_workflow). isToolAllowed treats mcp:public as terminal deny-by-default -- never widens to read/write/admin. The allowlist excludes validate_workflow and prepare_test_pin_data (org-scoped reads that masquerade as read-only). - server: createMcpServer registers ONLY public tools under mcp:public (tools/list reflects registrations, so non-public tools must be dropped at registration, not just call-gated) and skips org-scoped resources. - rate-limit: getClientIp prefers the unspoofable cf-connecting-ip. - .well-known/mcp.json: advertise /mcp/public as the primary (anonymous) endpoint with the public tool list; keep the OAuth /mcp under authenticatedEndpoint. No onchain re-pin needed. call_workflow stays listed-only via the existing /api/mcp/workflows/[slug]/call guards (isListed + reachable; paid -> 402). /mcp is byte-for-byte unchanged. Tests: deny-by-default scope + registration filter (no catalog leak), plus route-level coverage for cross-surface session isolation, per-IP rate-limit buckets, and anon-identity pinning. Dev-verified: anon tools/list returns exactly the 9 public tools, search_workflows executes anonymously, delete_workflow is not found, non-SSE-Accept POST succeeds, /mcp still 401s.

Update the ERC-8004 agent card's agentWallet and platformFeeRecipient, the served /api/agent-registry route, and the docs example from the deprecated 0xC300...362B51 wallet to the current 0xAA70...8Fee address. The old wallet is no longer in use; the public card (and its onchain-pinned copy) and docs still advertised it, so an agent reading agentWallet / platformFeeRecipient directly, or following the doc example, could route funds to a stale address. The active x402 402 payTo is unaffected (it is built per-workflow from creatorWalletAddress, not this address). Propagation note: the onchain ERC-8004 card is IPFS-pinned; this repo change updates the served endpoints, but the pinned card requires a re-pin (pin-agent-card.ts -> new CID, update-agent-uri.ts -> setAgentURI) to propagate to indexers (8004scan/thespawn). Tracked under the security remediation epic.

hotfix(agent): rotate advertised agent wallet to current address

feat(mcp): anonymous /mcp/public marketplace endpoint

joelorzet and others added 21 commits May 19, 2026 19:25

Merge branch 'staging' into feat/KEEP-663-login-anomaly-impossible-tr…

ab36d2b

…avel-admin

Merge branch 'staging' into feat/KEEP-663-login-anomaly-impossible-tr…

012ee0a

…avel-admin

Merge pull request #1418 from KeeperHub/feat/KEEP-663-login-anomaly-i…

ab4e65d

…mpossible-travel-admin feat(security): impossible-travel detection and per-member session visibility

Merge pull request #1305 from KeeperHub/feat/keep-525-template-legacy…

283d494

…-telemetry refactor(template-resolver): remove legacy template-resolution opt-out

Merge pull request #1405 from KeeperHub/hotfix/KEEP-665-harden-intern…

5fb5b3e

…al-service-auth hotfix(security): harden internal service auth with HMAC

Merge pull request #1439 from KeeperHub/feature/TECH-6484-metrics-col…

b612334

…lector-service feat(metrics): dedicated keeperhub-metrics-collector service (TECH-6484)

suisuss temporarily deployed to staging June 3, 2026 08:33 — with GitHub Actions Inactive

eskp and others added 2 commits June 3, 2026 19:10

Merge pull request #1447 from KeeperHub/hotfix/rotate-agent-wallet

11fbba7

hotfix(agent): rotate advertised agent wallet to current address

eskp had a problem deploying to staging June 3, 2026 09:43 — with GitHub Actions Failure

eskp temporarily deployed to staging June 3, 2026 09:43 — with GitHub Actions Inactive

Merge pull request #1445 from KeeperHub/fix/mcp-marketplace-anon-probes

1f0c3ad

feat(mcp): anonymous /mcp/public marketplace endpoint

eskp temporarily deployed to staging June 3, 2026 09:44 — with GitHub Actions Inactive

eskp temporarily deployed to staging June 3, 2026 09:57 — with GitHub Actions Inactive

suisuss merged commit 5baf490 into prod Jun 3, 2026
35 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: To Prod#1446

release: To Prod#1446
suisuss merged 24 commits into
prodfrom
staging

suisuss commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

suisuss commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants