release: promote staging to prod #1458
Merged
…ing (TECH-6484)
Stage 4 (cutover):
- Gate the app's /api/metrics/db route to 404 via METRICS_DB_OFFLOADED so the
heavy aggregate scan never runs on the request-serving pods.
- Remove the db-metrics ServiceMonitor from deploy/keeperhub/{staging,prod} and
set METRICS_DB_OFFLOADED=true. /api/metrics/api is unchanged.
Stage 5 (PR-env wiring + docs):
- deploy/pr-environment/metrics-collector.template.yaml (single replica, PR DB,
ServiceMonitor off).
- deploy-pr-environment.yaml: opt-in deploy-pr-metrics label -> build-collector-image
job + a gated deploy step. Default off, so existing PR envs are unaffected.
- METRICS_REFERENCE.md note on the collector + offload.
Depends on the collector being live + verified in staging (PR #1439). Cutover
must merge only after that; otherwise there is a DB-metrics gap.
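The Stage 4 gate can be pictured roughly as follows. This is a minimal sketch, not the code from the PR: only the METRICS_DB_OFFLOADED flag name and the 404 behaviour come from the commit message, while `handleDbMetrics`, `isDbMetricsOffloaded`, the `collect` callback, and the plain-object response shape are hypothetical and kept synchronous for brevity.

```typescript
// Hypothetical sketch of gating /api/metrics/db behind METRICS_DB_OFFLOADED.
// Only the flag name and the 404 response are taken from the commit message.
type Env = Record<string, string | undefined>;

interface SimpleResponse {
  status: number;
  body: string;
}

// Assumption: the flag is considered set only for the literal string "true".
function isDbMetricsOffloaded(env: Env): boolean {
  return env.METRICS_DB_OFFLOADED === "true";
}

// `collect` stands in for the heavy aggregate scan (hypothetical callback).
function handleDbMetrics(env: Env, collect: () => string): SimpleResponse {
  if (isDbMetricsOffloaded(env)) {
    // Return before calling collect() so the scan never runs on this pod.
    return { status: 404, body: "Not Found" };
  }
  return { status: 200, body: collect() };
}
```

The check sits before the scan rather than after it, so a pod with the flag set does no database work for this route at all.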
…or lands
Found during B-now validation: adding deploy-pr-metrics to an already-deployed PR built the collector image but did not deploy it - the collector deploy step sits inside the should-deploy-gated deploy job. Set should-deploy=true on the metrics-only path (mirroring deploy-pr-executor) so the deploy re-runs and the collector step executes. The both-labels path was already correct.
The OAuth-MFA finalize and TOTP enrollment routes mint sessions directly and derived ip_address from the leftmost X-Forwarded-For hop, which is caller-controlled and can be rewritten by intermediate hops, so a subset of sessions stored an unreliable address rather than the real client IP. Better Auth's own session writes already resolve CF-Connecting-IP, but these direct writes bypassed that. Extract the existing CF-aware resolver in login-risk into a shared resolveClientIpFromHeaders helper and use it in both routes. In production only CF-Connecting-IP is trusted; outside production X-Forwarded-For then X-Real-IP stay as local-dev fallbacks. Sessions now store the attested client IP or null.
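The resolution order described above could look something like this. It is a sketch under assumptions: the helper name resolveClientIpFromHeaders comes from the commit message, but its signature, the `isProduction` parameter, the `HeaderBag` type, and the choice to check CF-Connecting-IP first outside production are illustrative guesses, not the shipped implementation.

```typescript
// Hypothetical sketch: only the helper name and the header precedence are
// taken from the commit message; everything else is an assumption.
type HeaderBag = { get(name: string): string | null };

// Return the first comma-separated entry, trimmed, or null if it is empty.
function firstEntry(value: string | null): string | null {
  if (value === null) return null;
  const [first = ""] = value.split(",");
  const trimmed = first.trim();
  return trimmed.length > 0 ? trimmed : null;
}

function resolveClientIpFromHeaders(
  headers: HeaderBag,
  isProduction: boolean,
): string | null {
  const attested = firstEntry(headers.get("cf-connecting-ip"));
  if (attested !== null) return attested;

  // In production nothing except CF-Connecting-IP is trusted, so the
  // session stores null instead of a caller-controlled value.
  if (isProduction) return null;

  // Local-dev fallbacks only: X-Forwarded-For, then X-Real-IP.
  return (
    firstEntry(headers.get("x-forwarded-for")) ??
    firstEntry(headers.get("x-real-ip"))
  );
}
```

Returning null in production when the Cloudflare header is absent matches the stated outcome that sessions store "the attested client IP or null".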
…utover feat(metrics): cut over /api/metrics/db to the collector + PR-env wiring (TECH-6484)
…ip-better-auth-sessions fix: resolve client IP via CF-Connecting-IP on direct session writes
Staging requested 1m CPU while prod requests 100m. The Grafana alert "KeeperHub High CPU Usage (Staging)" evaluates cpu_usage / cpu_request > 2 per container, so a 1m request made idle pods (~15-25m CPU) sit permanently at 15-25x and re-fire the P3 on every rollout. 14d per-pod usage (5m-rate): avg 19m, p95 29m, p99 48m, max 109m. Setting the request to 100m matches prod, covers p99 with headroom, and moves the alert bar to 2x = 200m (above the 109m observed max).
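The arithmetic behind the alert can be checked with a few lines. The figures (1m and 100m requests, 15-25m idle usage, 48m p99, 109m max, threshold of 2) are from the commit message; the function names are illustrative only, and the real rule is a Grafana alert expression, not TypeScript.

```typescript
// Illustrative re-computation of the alert condition cpu_usage / cpu_request > 2.
// Function and constant names are hypothetical; the numbers are from the commit message.
const ALERT_RATIO_THRESHOLD = 2;

function cpuRatio(usageMillicores: number, requestMillicores: number): number {
  return usageMillicores / requestMillicores;
}

function alertFires(usageMillicores: number, requestMillicores: number): boolean {
  return cpuRatio(usageMillicores, requestMillicores) > ALERT_RATIO_THRESHOLD;
}

// Old staging request of 1m: an idle pod at 15m is already at 15x.
const idleOnOldRequest = alertFires(15, 1); // true

// New request of 100m: the observed 14d max of 109m is about 1.09x.
const maxOnNewRequest = alertFires(109, 100); // false

// Usage level at which the alert starts to fire with a 100m request.
const firingBarMillicores = ALERT_RATIO_THRESHOLD * 100; // 200
```

With the 100m request, usage has to exceed 200m before the ratio passes 2, which leaves the 109m observed maximum well below the bar.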
…mmon-cpu-request fix: align staging keeperhub-common CPU request with prod
The runner Job pins runAsUser/Group to 1000 (RUNNER_UID/RUNNER_GID in keeperhub-executor/k8s-job.ts), which only works because node:*-alpine ships a "node" user at 1000 and the copied app files are world-readable. That coupling spans two files that change independently: the Dockerfile picks the base image, the executor hardcodes the UID. Add cross-referencing comments in both so swapping the runner base image triggers a check of UID 1000 (or an update to RUNNER_UID/RUNNER_GID). Comment-only, no behavior change.
…ling-note docs(executor): note runner UID 1000 is coupled to the node base image
Release: staging -> prod
Promotes the current staging HEAD to prod (10 commits, 4 PRs).
Included
- feat(metrics): cut over /api/metrics/db to the collector + PR-env wiring
- fix: resolve client IP via CF-Connecting-IP on direct session writes
- fix: align staging keeperhub-common CPU request with prod
- docs(executor): note runner UID 1000 is coupled to the node base image
Notes for the reviewer
- … /api/metrics/db collector cutover. Worth a look from that owner before merge.
- … staging and were validated there.
Deploy
Merging this triggers the prod CI pipeline (build -> deploy) on the prod branch.