Scope Prometheus dashboard queries by environment #95
Merged
Production and staging share one Prometheus, so the uptime and probe-status dashboards, the live status page, and the rollup job all showed production data on staging. Filter every probe metric query by the environment label the OTel collector stamps (production/staging), leaving local/test queries unscoped.
Local metrics now carry an environment label too (development), via the relabel_configs on the dev Prometheus scrape and the seeded series, so the queries no longer need to special-case local.
Problem
Production and staging share a single Prometheus instance, but the engine's metric queries never filtered by environment. So on staging, the uptime dashboard, probe-status dashboard, public status page, and the daily rollup job all showed production data.
The OTel collector already stamps every probe series with an environment label (production by default, staging for the staging Kamal destination); the queries just weren't using it.
Fix
Added Upright.metrics_environment / Upright.environment_matcher and applied the matcher to every probe-metric query:
- Probes::Uptime (uptime dashboard)
- Probes::Status (probe-status dashboard)
- Services::LiveStatus (public status page)
- Rollups::ProbeRollup (daily rollup job)
Local/test queries stay unscoped (Rails.env.local?), since local metrics carry no environment label.
Deploy ordering
The Uptime / LiveStatus / ProbeRollup queries read the upright:probe_uptime_daily / upright:probe_down_fraction recording rules, which are currently hardcoded to environment="production" and drop the label. Those rules must be updated to emit per-environment series (in app_prometheus_scraper) before or together with this change, or those queries will match nothing in production. The probe-status dashboard reads the raw upright_probe_up metric and is correct immediately.
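For reviewers who want a concrete picture of the matcher described under Fix, here is a minimal sketch of how an environment matcher could be built and spliced into a PromQL selector. It is not the code from this PR: the module name EnvironmentScopeSketch, the selector helper, and the method signatures are hypothetical, and the list of local environments assumes Rails.env.local? covers development and test.

```ruby
# Hypothetical sketch only; the real Upright.metrics_environment and
# Upright.environment_matcher may be implemented differently.
module EnvironmentScopeSketch
  # Assumption: mirrors Rails.env.local?, which is true for development and test.
  LOCAL_ENVS = %w[development test].freeze

  # Returns the environment label value to filter on,
  # or nil when queries should stay unscoped (local/test).
  def self.metrics_environment(app_env)
    return nil if LOCAL_ENVS.include?(app_env)

    app_env
  end

  # Returns a PromQL label matcher such as environment="staging",
  # or nil when no scoping applies.
  def self.environment_matcher(app_env)
    env = metrics_environment(app_env)
    env && %(environment="#{env}")
  end

  # Builds a selector for a metric, appending the environment matcher
  # to any matchers the query already uses.
  def self.selector(metric, app_env, extra_matchers = [])
    matchers = (extra_matchers + [environment_matcher(app_env)]).compact
    matchers.empty? ? metric : "#{metric}{#{matchers.join(',')}}"
  end
end

puts EnvironmentScopeSketch.selector("upright_probe_up", "staging")
puts EnvironmentScopeSketch.selector("upright_probe_up", "development")
```

Under these assumptions the first call prints upright_probe_up{environment="staging"} and the second prints the bare metric name, which matches the behavior described above: scoped in production and staging, unscoped locally. Returning nil rather than an empty matcher keeps the local case from producing an empty {} selector.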