release: promote staging to prod by OleksandrUA · Pull Request #1458 · KeeperHub/keeperhub

OleksandrUA · 2026-06-04T15:19:06Z

Release: staging -> prod

Promotes the current staging HEAD to prod (10 commits, 4 PRs).

Included

| PR | Change | Type |
| --- | --- | --- |
| #1442 | TECH-6484 - cut over `/api/metrics/db` to the collector + PR-env wiring | functional |
| #1452 | KEEP-676 - resolve client IP via `CF-Connecting-IP` on direct session writes | fix |
| #1456 | KEEP-713 - align staging `keeperhub-common` CPU request with prod | chore |
| #1457 | TECH-29 - comment noting runner UID 1000 is coupled to the node base image | docs (no runtime change) |

Notes for the reviewer

The workflow-runner pod hardening (TECH-29, #1451, "feat(executor): harden workflow-runner Job pods (dedicated SA, no token, non-root, secret refs)") is already live on prod (shipped in release #1455, "release: to prod"). This release only carries the follow-up doc comment for it.
The one functional change here is TECH-6484 (#1442, "feat(metrics): cut over /api/metrics/db to the collector + PR-env wiring (TECH-6484)"), the /api/metrics/db collector cutover. Worth a look from that owner before merge.
All four PRs already passed their checks on merge to staging and were validated there.

Deploy

Merging this triggers the prod CI pipeline (build -> deploy) on the prod branch.

…ing (TECH-6484)

Stage 4 (cutover):
- Gate the app's /api/metrics/db route to 404 via METRICS_DB_OFFLOADED so the heavy aggregate scan never runs on the request-serving pods.
- Remove the db-metrics ServiceMonitor from deploy/keeperhub/{staging,prod} and set METRICS_DB_OFFLOADED=true. /api/metrics/api is unchanged.

Stage 5 (PR-env wiring + docs):
- deploy/pr-environment/metrics-collector.template.yaml (single replica, PR DB, ServiceMonitor off).
- deploy-pr-environment.yaml: opt-in deploy-pr-metrics label -> build-collector-image job + a gated deploy step. Default off, so existing PR envs are unaffected.
- METRICS_REFERENCE.md note on the collector + offload.

Depends on the collector being live + verified in staging (PR #1439). Cutover must merge only after that, else a DB-metrics gap.
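A minimal sketch of the Stage 4 gate, for reviewers who want the shape of it. The handler signature, the response type, and the `collectDbMetrics` callback are hypothetical stand-ins; only the `METRICS_DB_OFFLOADED` flag and the 404 behaviour come from the commit message, and reading the flag as the literal string `"true"` is an assumption.

```typescript
// Hypothetical types: the real route handler and its framework are not shown in this PR.
type MetricsResponse = { status: number; body: string };
type Env = Record<string, string | undefined>;

// Assumption: the flag is compared against the literal string "true",
// matching the METRICS_DB_OFFLOADED=true value set in the deploy config.
function isDbMetricsOffloaded(env: Env): boolean {
  return env.METRICS_DB_OFFLOADED === "true";
}

// collectDbMetrics stands in for the heavy aggregate scan.
function handleDbMetrics(env: Env, collectDbMetrics: () => string): MetricsResponse {
  if (isDbMetricsOffloaded(env)) {
    // Return before touching the database so the scan never runs on this pod.
    return { status: 404, body: "Not Found" };
  }
  return { status: 200, body: collectDbMetrics() };
}
```

The point of checking the flag first is that the scan callback is never invoked when the collector owns DB metrics, which is what keeps the load off the request-serving pods.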

Consistency with the #1439 review (#3): rely on the Dockerfile CMD, no helm command/args override. Matches deploy/metrics-collector/{staging,prod}.

…or lands

Found during B-now validation: adding deploy-pr-metrics to an already-deployed PR built the collector image but did not deploy it - the collector deploy step sits inside the should-deploy-gated deploy job. Set should-deploy=true on the metrics-only path (mirroring deploy-pr-executor) so the deploy re-runs and the collector step executes. The both-labels path was already correct.

The OAuth-MFA finalize and TOTP enrollment routes mint sessions directly and previously derived ip_address from the leftmost X-Forwarded-For hop, which is caller-controlled and can be rewritten by intermediate hops, so a subset of sessions stored an unreliable address rather than the real client IP. Better Auth's own session writes already resolve CF-Connecting-IP, but these direct writes bypassed that. Extract the existing CF-aware resolver in login-risk into a shared resolveClientIpFromHeaders helper and use it in both routes. In production only CF-Connecting-IP is trusted; outside production X-Forwarded-For then X-Real-IP stay as local-dev fallbacks. Sessions now store the attested client IP or null.
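The resolution order described above could look roughly like this. It is a sketch under assumptions, not the helper from the PR: the function name comes from the commit message, but the `HeaderLookup` interface, the `isProduction` parameter, and the `firstNonEmpty` helper are hypothetical.

```typescript
// Hypothetical minimal interface; a Fetch API Headers object satisfies it.
interface HeaderLookup {
  get(name: string): string | null;
}

// Trim a header value and treat empty or missing values as null.
function firstNonEmpty(value: string | null | undefined): string | null {
  const trimmed = value?.trim();
  return trimmed ? trimmed : null;
}

// Assumption: the caller passes the environment explicitly; the real helper
// may read it from configuration instead.
function resolveClientIpFromHeaders(headers: HeaderLookup, isProduction: boolean): string | null {
  const cfIp = firstNonEmpty(headers.get("cf-connecting-ip"));
  if (cfIp) return cfIp;

  // In production nothing except CF-Connecting-IP is trusted.
  if (isProduction) return null;

  // Local-dev fallbacks: leftmost X-Forwarded-For hop, then X-Real-IP.
  const forwarded = headers.get("x-forwarded-for");
  const leftmost = forwarded ? firstNonEmpty(forwarded.split(",")[0]) : null;
  if (leftmost) return leftmost;

  return firstNonEmpty(headers.get("x-real-ip"));
}
```

Returning null in production when CF-Connecting-IP is absent matches the stated outcome that sessions store the attested client IP or null, rather than a caller-controlled value.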

…utover feat(metrics): cut over /api/metrics/db to the collector + PR-env wiring (TECH-6484)

…ip-better-auth-sessions fix: resolve client IP via CF-Connecting-IP on direct session writes

Staging requested 1m CPU while prod requests 100m. The Grafana alert "KeeperHub High CPU Usage (Staging)" evaluates cpu_usage / cpu_request > 2 per container, so a 1m request made idle pods (~15-25m CPU) sit permanently at 15-25x and re-fire the P3 on every rollout. 14d per-pod usage (5m-rate): avg 19m, p95 29m, p99 48m, max 109m. Setting the request to 100m matches prod, covers p99 with headroom, and moves the alert bar to 2x = 200m (above the 109m observed max).
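The arithmetic behind the alert can be checked with a small sketch. The function names are hypothetical; the ratio threshold of 2 and the millicore figures are the ones quoted in the commit message.

```typescript
// Ratio threshold from the Grafana rule: cpu_usage / cpu_request > 2.
const ALERT_RATIO = 2;

// Both arguments are in millicores (m).
function usageRatio(usageMillicores: number, requestMillicores: number): number {
  return usageMillicores / requestMillicores;
}

// True when the per-container ratio exceeds the alert threshold.
function alertFires(usageMillicores: number, requestMillicores: number): boolean {
  return usageRatio(usageMillicores, requestMillicores) > ALERT_RATIO;
}

// Absolute usage (in millicores) above which the alert fires for a given request.
function alertThresholdMillicores(requestMillicores: number): number {
  return ALERT_RATIO * requestMillicores;
}
```

With a 1m request, the 19m average gives `usageRatio(19, 1) = 19`, so `alertFires(19, 1)` is true even for an idle pod. With a 100m request, `alertThresholdMillicores(100)` is 200, and the observed 109m maximum gives a ratio of 1.09, so `alertFires(109, 100)` is false.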

…mmon-cpu-request fix: align staging keeperhub-common CPU request with prod

The runner Job pins runAsUser/Group to 1000 (RUNNER_UID/RUNNER_GID in keeperhub-executor/k8s-job.ts), which only works because node:*-alpine ships a "node" user at 1000 and the copied app files are world-readable. That coupling spans two files that change independently: the Dockerfile picks the base image, the executor hardcodes the UID. Add cross-referencing comments in both so swapping the runner base image triggers a check of UID 1000 (or an update to RUNNER_UID/RUNNER_GID). Comment-only, no behavior change.
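For illustration, the executor side of that coupling might look like the sketch below. `RUNNER_UID` and `RUNNER_GID` are named in the commit message; the `runnerSecurityContext` function, the comment wording, and the `runAsNonRoot` setting are assumptions and are not taken from keeperhub-executor/k8s-job.ts.

```typescript
// COUPLING NOTE (illustrative wording): these IDs must match a user that exists
// in the runner base image. node:*-alpine ships a "node" user at UID/GID 1000.
// If the Dockerfile base image changes, re-check or update these values.
const RUNNER_UID = 1000;
const RUNNER_GID = 1000;

// Hypothetical shape: a subset of the Kubernetes PodSecurityContext fields.
interface RunnerSecurityContext {
  runAsNonRoot: boolean;
  runAsUser: number;
  runAsGroup: number;
}

// Builds the security context applied to the runner Job pod.
function runnerSecurityContext(): RunnerSecurityContext {
  return {
    runAsNonRoot: true,
    runAsUser: RUNNER_UID,
    runAsGroup: RUNNER_GID,
  };
}
```

Keeping the constants next to a comment that names the base image is what makes the dependency visible to whoever edits either file.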

…ling-note docs(executor): note runner UID 1000 is coupled to the node base image

chong-techops and others added 10 commits June 3, 2026 10:30

chore(metrics): drop startup override from PR-env collector template

1e6d2eb

Merge pull request #1442 from KeeperHub/feature/TECH-6484-collector-c…

76dc769

Merge pull request #1452 from KeeperHub/KEEP-676-capture-real-client-…

ff119b0

Merge pull request #1456 from KeeperHub/KEEP-713-staging-keeperhub-co…

bfe552f

Merge pull request #1457 from KeeperHub/TECH-29-runner-uid-image-coup…

81ee38a

OleksandrUA temporarily deployed to staging June 4, 2026 15:19 — with GitHub Actions Inactive

OleksandrUA temporarily deployed to staging June 4, 2026 15:23 — with GitHub Actions Inactive

OleksandrUA added the metrics-db-reviewed label (Reviewer sign-off: metrics aggregate queries optimised + tables indexed (KEEP-680)) Jun 4, 2026

OleksandrUA temporarily deployed to staging June 4, 2026 15:25 — with GitHub Actions Inactive

OleksandrUA merged commit 8dc708f into prod Jun 4, 2026
37 of 38 checks passed

OleksandrUA merged 10 commits into prod from staging

3 participants