
Scope Prometheus dashboard queries by environment #95

Merged: lewispb merged 4 commits into main from scope-metrics-queries-by-environment on Jun 29, 2026

lewispb (Member) commented Jun 29, 2026

Problem

Production and staging share a single Prometheus instance, but the engine's metric queries never filtered by environment. So on staging, the uptime dashboard, probe-status dashboard, public status page, and the daily rollup job all showed production data.

The OTel collector already stamps every probe series with an environment label (production by default, staging for the staging Kamal destination); the queries just weren't using it.

Fix

Added Upright.metrics_environment and Upright.environment_matcher, and applied the matcher to every probe-metric query:

  • Probes::Uptime (uptime dashboard)
  • Probes::Status (probe-status dashboard)
  • Services::LiveStatus (public status page)
  • Rollups::ProbeRollup (daily rollup job)

Local and test queries stay unscoped (when Rails.env.local? is true), since local metrics carry no environment label.
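
A minimal Ruby sketch of the scoping idea, assuming the matcher is a PromQL label-matcher string spliced into each selector. EnvironmentScope, matcher, and selector are hypothetical names, not the PR's actual Upright.metrics_environment / Upright.environment_matcher API; the local: flag stands in for Rails.env.local?, and the probe label in the usage lines is made up for illustration.

```ruby
# Hypothetical sketch of an environment-scoped PromQL selector builder.
# Names here are illustrative, not the PR's real API.
class EnvironmentScope
  # environment: label value the OTel collector stamps ("production", "staging")
  # local:       stand-in for Rails.env.local?
  def initialize(environment:, local:)
    @environment = environment
    @local = local
  end

  # Returns a PromQL label matcher, or nil when queries should stay
  # unscoped (local/test series carry no environment label).
  def matcher
    return nil if @local

    %(environment="#{escape(@environment)}")
  end

  # Splices the environment matcher into `metric{...}` next to any
  # other matchers; falls back to the bare metric name when none apply.
  def selector(metric, *other_matchers)
    matchers = [matcher, *other_matchers].compact
    matchers.empty? ? metric : "#{metric}{#{matchers.join(", ")}}"
  end

  private

  # PromQL double-quoted strings escape backslashes and quotes.
  def escape(value)
    value.to_s.gsub(/[\\"]/) { |ch| "\\#{ch}" }
  end
end

staging = EnvironmentScope.new(environment: "staging", local: false)
staging.selector("upright_probe_up", %(probe="homepage"))
# => upright_probe_up{environment="staging", probe="homepage"}

local = EnvironmentScope.new(environment: "development", local: true)
local.selector("upright:probe_uptime_daily")
# => upright:probe_uptime_daily
```

Returning nil for the local case lets every query class share one code path: the selector simply falls back to the bare metric name.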

Deploy ordering

The Uptime, LiveStatus, and ProbeRollup queries read the upright:probe_uptime_daily and upright:probe_down_fraction recording rules, which are currently hardcoded to environment="production" and drop the environment label. Those rules must be updated in app_prometheus_scraper to emit per-environment series before or alongside this change; otherwise those queries will match nothing in production, because the rule output has no environment label for the matcher to select on. The probe-status dashboard reads the raw upright_probe_up metric, so it is correct as soon as this change deploys.
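
To see why the ordering matters, here is a simplified Ruby model of Prometheus equality matching. It relies on documented PromQL behaviour: a series that lacks a label is treated as having that label set to the empty string. The probe label and its value are hypothetical, and the actual rule change in app_prometheus_scraper is not shown here; one common approach (an assumption, not confirmed by this PR) is to keep environment in the rule's by (...) grouping.

```ruby
# Simplified model of a PromQL equality matcher (label="value").
# A series missing the label behaves as if the label were "".
module MatcherSketch
  module_function

  def matches?(labels, name, value)
    labels.fetch(name, "") == value
  end

  def filter(series, name, value)
    series.select { |labels| matches?(labels, name, value) }
  end
end

# Current rule output: production data only, environment label dropped.
OLD_RULE_SERIES = [
  { "__name__" => "upright:probe_uptime_daily", "probe" => "homepage" }
].freeze

# Per-environment rule output: the environment label survives.
NEW_RULE_SERIES = [
  { "__name__" => "upright:probe_uptime_daily", "probe" => "homepage", "environment" => "production" },
  { "__name__" => "upright:probe_uptime_daily", "probe" => "homepage", "environment" => "staging" }
].freeze

MatcherSketch.filter(OLD_RULE_SERIES, "environment", "production").size # => 0
MatcherSketch.filter(NEW_RULE_SERIES, "environment", "production").size # => 1
MatcherSketch.filter(NEW_RULE_SERIES, "environment", "staging").size    # => 1
```

Against the old rule output, the scoped query returns nothing even for production, which is why the rules have to change first.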

Production and staging share one Prometheus, so the uptime and
probe-status dashboards, the live status page, and the rollup job all
showed production data on staging. Filter every probe metric query by
the environment label the OTel collector stamps (production/staging),
leaving local/test queries unscoped.
Copilot AI review requested due to automatic review settings June 29, 2026 11:27

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Copilot AI review requested due to automatic review settings June 29, 2026 11:35

Local metrics now carry an environment label too (development), via the
relabel_configs on the dev Prometheus scrape and the seeded series, so
the queries no longer need to special-case local.
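
For illustration, a Ruby sketch that builds a dev scrape config with a static relabel rule and dumps it as YAML. The job name and target are placeholders, not taken from the PR; only the relabel entry reflects what the commit describes. It uses standard Prometheus relabel semantics: with no source_labels, the default regex (.*) matches the empty string, so the default replace action writes the replacement into target_label.

```ruby
require "yaml"

# Hypothetical dev scrape config; job_name and target are placeholders.
DEV_SCRAPE_CONFIG = {
  "job_name" => "upright-dev",
  "static_configs" => [{ "targets" => ["localhost:9394"] }],
  "relabel_configs" => [
    # No source_labels: the default regex (.*) matches "", so the default
    # "replace" action sets environment="development" on the target, and
    # Prometheus attaches that label to every series scraped from it.
    { "target_label" => "environment", "replacement" => "development" }
  ]
}.freeze

puts YAML.dump({ "scrape_configs" => [DEV_SCRAPE_CONFIG] })
```

With every environment, including development, carrying the label, the environment matcher can be applied unconditionally.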
lewispb merged commit 1b0457c into main Jun 29, 2026
8 checks passed