
feat(monitoring): split mega-dashboard into 4 standalone Grafana dashboards #1137

Merged: GrammaTonic merged 1 commit into develop from feature/grafana-dashboards-phase4 on Mar 2, 2026.



@GrammaTonic (Owner)

Summary

Split the combined github-runner.json mega-dashboard (24 panels across 4 rows) into 4 focused, standalone Grafana dashboards, aligning with the Phase 4 specification in Issue #1062.

Type of Change

  • New feature (non-breaking change which adds functionality)

Related Issues

  • Resolves #1062

Changes Made

Files Created

  • monitoring/grafana/dashboards/runner-overview.json - Runner Overview dashboard (12 panels, 3 rows)
  • monitoring/grafana/dashboards/performance-trends.json - Performance Trends dashboard (14 panels, 4 rows)
  • monitoring/grafana/provisioning/dashboards/dashboards.yml - Grafana dashboard auto-loading provisioning config
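The contents of dashboards.yml are not shown on this page. As a rough sketch of what a Grafana file-based dashboard provider looks like, the helper below renders one provider entry using Grafana's documented provisioning keys (`apiVersion`, `providers`, `type: file`, `options.path`, and so on). The provider name, folder, path, and refresh interval are hypothetical placeholders, not the values used in this PR.

```python
# Sketch: render a minimal Grafana file-based dashboard provider config.
# The name/folder/path values passed in are hypothetical placeholders;
# the actual dashboards.yml in this PR may differ.


def render_provider_yaml(name: str, folder: str, path: str,
                         update_interval_s: int = 30) -> str:
    """Return YAML text describing a single file-based dashboard provider."""
    return (
        "apiVersion: 1\n"
        "providers:\n"
        f"  - name: {name}\n"
        "    orgId: 1\n"
        f"    folder: {folder}\n"
        "    type: file\n"                 # load dashboards from JSON files on disk
        "    disableDeletion: false\n"
        f"    updateIntervalSeconds: {update_interval_s}\n"  # how often Grafana rescans the path
        "    allowUiUpdates: true\n"
        "    options:\n"
        f"      path: {path}\n"            # directory holding the *.json dashboards
    )


if __name__ == "__main__":
    print(render_provider_yaml("github-runner", "GitHub Runner",
                               "/var/lib/grafana/dashboards"))
```

With a provider like this, Grafana picks up every JSON file in the configured directory at startup and on each rescan, which is what makes the four dashboards load without manual import.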

Files Deleted

  • monitoring/grafana/dashboards/github-runner.json - replaced by runner-overview.json

Files Modified

  • plan/feature-prometheus-monitoring-1.md - updated Phase 4 task descriptions and dates
  • docs/features/GRAFANA_DASHBOARD_METRICS.md - updated the dashboard file listing to a 4-dashboard table

Files Unchanged

  • monitoring/grafana/dashboards/dora-metrics.json - already standalone
  • monitoring/grafana/dashboards/job-analysis.json - already standalone

Key Changes

  1. Runner Overview (runner-overview.json): Slimmed down to runner status, health, and uptime panels. Added a Runner Health timeseries row and a Quick Links navigation panel.

  2. Performance Trends (performance-trends.json): New standalone dashboard, expanded from 3 to 14 panels: summary stats, cache hit rate by type and runner, CPU/memory over time (cAdvisor), job duration percentiles (p50/p90/p95/p99), queue time trend, and average duration by runner type.

  3. Dashboard Provisioning: Added a dashboards.yml provisioning config so Grafana auto-loads the dashboards with zero manual setup.

  4. Inter-dashboard Navigation: Runner Overview includes a links panel pointing to DORA Metrics, Job Analysis, and Performance Trends.
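The PR does not show how the Quick Links panel is built. One common approach is a Grafana text panel in Markdown mode whose links target `/d/<uid>`, the URL path Grafana serves dashboards under. The sketch below assumes that approach; the panel id is hypothetical, while the UIDs are taken from the Dashboard Inventory table.

```python
import json

# UIDs taken from the Dashboard Inventory table in this PR.
LINKED_DASHBOARDS = {
    "DORA Metrics": "github-runner-dora",
    "Job Analysis": "github-runner-job-analysis",
    "Performance Trends": "github-runner-performance",
}


def quick_links_panel(panel_id: int, links: dict[str, str]) -> dict:
    """Build a Grafana text panel whose Markdown body links to dashboards by UID.

    Assumption: the real Quick Links panel may use a different panel type
    or link format; gridPos is omitted for brevity.
    """
    # Grafana resolves /d/<uid>; the trailing slug segment is optional.
    lines = [f"- [{title}](/d/{uid})" for title, uid in links.items()]
    return {
        "id": panel_id,
        "type": "text",
        "title": "Quick Links",
        "options": {"mode": "markdown", "content": "\n".join(lines)},
    }


if __name__ == "__main__":
    print(json.dumps(quick_links_panel(99, LINKED_DASHBOARDS), indent=2))
```

Linking by UID rather than by title keeps the links valid if a dashboard is renamed, since the UIDs are fixed in the JSON files.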

Dashboard Inventory

| Dashboard | File | UID | Panels | Refresh | Time range |
|---|---|---|---|---|---|
| Runner Overview | runner-overview.json | github-runner-runner-overview | 12 | 15s | 24h |
| DORA Metrics | dora-metrics.json | github-runner-dora | 12 | 30s | 7d |
| Performance Trends | performance-trends.json | github-runner-performance | 14 | 15s | 24h |
| Job Analysis | job-analysis.json | github-runner-job-analysis | 16 | 15s | 24h |

All dashboards: schemaVersion 39, Grafana >= 9.0.0, DS_PROMETHEUS datasource, runner_name/runner_type multi-select variables.
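The shared conventions in the line above lend themselves to an automated check. The sketch below assumes each file keeps Grafana's export-for-sharing `__inputs` block (where `DS_PROMETHEUS` is declared) and a `templating.list` of variables; it reports any dashboard that drifts from schemaVersion 39, lacks the datasource input, or lacks a multi-select `runner_name`/`runner_type` variable. The script itself is hypothetical and not part of this PR.

```python
import json
from pathlib import Path

REQUIRED_VARS = {"runner_name", "runner_type"}


def check_dashboard(dash: dict) -> list[str]:
    """Return a list of problems; an empty list means the conventions are met."""
    problems = []
    if dash.get("schemaVersion") != 39:
        problems.append(f"schemaVersion is {dash.get('schemaVersion')}, expected 39")
    # Assumes the portable datasource is declared in the __inputs block.
    input_names = {i.get("name") for i in dash.get("__inputs", [])}
    if "DS_PROMETHEUS" not in input_names:
        problems.append("missing DS_PROMETHEUS input")
    variables = {v.get("name"): v for v in dash.get("templating", {}).get("list", [])}
    for name in sorted(REQUIRED_VARS):
        var = variables.get(name)
        if var is None:
            problems.append(f"missing variable {name}")
        elif not var.get("multi", False):
            problems.append(f"variable {name} is not multi-select")
    return problems


if __name__ == "__main__":
    # Directory taken from the file paths listed in this PR.
    for path in sorted(Path("monitoring/grafana/dashboards").glob("*.json")):
        issues = check_dashboard(json.loads(path.read_text(encoding="utf-8")))
        print(path.name, "OK" if not issues else issues)
```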

Testing

  • All 4 JSON files validate with python3 -m json.tool
  • All PromQL queries reference only defined metrics
  • All dashboards use portable DS_PROMETHEUS input variable
  • YAML provisioning config validates
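The JSON check above can be folded into a single script that also guards against two dashboards sharing a UID, which would make Grafana's provisioner reject one of them. This is a hypothetical sketch, not the validation the PR actually ran; the file names come from the Dashboard Inventory table.

```python
import json
from pathlib import Path

# The four dashboard files listed in the Dashboard Inventory table.
EXPECTED_FILES = (
    "dora-metrics.json",
    "job-analysis.json",
    "performance-trends.json",
    "runner-overview.json",
)


def validate_dashboards(directory: Path) -> dict[str, str]:
    """Parse each expected dashboard file and return a UID -> file-name map.

    Raises ValueError for a missing file, invalid JSON, a missing UID,
    or a UID used by more than one dashboard.
    """
    uids: dict[str, str] = {}
    for name in EXPECTED_FILES:
        path = directory / name
        if not path.is_file():
            raise ValueError(f"{name}: file not found")
        try:
            # Same basic check as `python3 -m json.tool`: the file must parse.
            dash = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            raise ValueError(f"{name}: invalid JSON ({exc})") from exc
        uid = dash.get("uid")
        if not uid:
            raise ValueError(f"{name}: missing uid")
        if uid in uids:
            raise ValueError(f"{name}: uid {uid!r} already used by {uids[uid]}")
        uids[uid] = name
    return uids
```

Calling `validate_dashboards(Path("monitoring/grafana/dashboards"))` from the repository root would return the four UIDs mapped to their files, or raise on the first problem found.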

Checklist

  • I have performed a self-review of my code
  • My changes generate no new warnings

/cc @copilot

Commit message:

Replace the combined github-runner.json (24 panels, 4 rows) with 4
focused, standalone dashboards per the Phase 4 spec (Issue #1062):

- runner-overview.json: Runner status, health, uptime, Quick Links
- dora-metrics.json: Unchanged - DF, Lead Time, CFR, MTTR, trends
- performance-trends.json: NEW - Cache, CPU/Memory, build percentiles
- job-analysis.json: Unchanged - Summary, histogram, runner comparison

Also adds Grafana dashboard provisioning config, inter-dashboard
navigation links, and updated plan tracking and docs.

Resolves #1062
@gemini-code-assist

Summary of Changes


This pull request refactors the Grafana monitoring setup by breaking down a large, combined dashboard into several smaller, more specialized ones. This change improves clarity and usability by providing focused views for different aspects of GitHub Actions runner monitoring, such as overall status, performance trends, and DORA metrics. The introduction of automatic provisioning simplifies deployment and management of these dashboards.

Highlights

  • Dashboard Restructuring: The monolithic github-runner.json mega-dashboard has been split into four distinct, focused Grafana dashboards: 'Runner Overview', 'Performance Trends', 'DORA Metrics', and 'Job Analysis'.
  • New Runner Overview Dashboard: A new 'Runner Overview' dashboard (runner-overview.json) was created, focusing on runner status, health, uptime, and including a 'Quick Links' navigation panel.
  • Enhanced Performance Trends Dashboard: A new standalone 'Performance Trends' dashboard (performance-trends.json) was introduced, expanding from 3 to 14 panels to cover detailed cache hit rates, CPU/memory usage over time, job duration percentiles, queue time trends, and average duration by runner type.
  • Grafana Dashboard Provisioning: Grafana provisioning configuration (dashboards.yml) was added to enable zero-config auto-loading of the new dashboards.
  • Inter-dashboard Navigation: Navigation links have been implemented within the 'Runner Overview' dashboard to facilitate easy access to the 'DORA Metrics', 'Job Analysis', and 'Performance Trends' dashboards.


Changelog
  • docs/features/GRAFANA_DASHBOARD_METRICS.md
    • Updated the documentation to reflect the new structure of four standalone Grafana dashboards instead of a single combined one.
    • Added a table detailing each new dashboard's file, panel count, and focus.
    • Included information on inter-dashboard links, provisioning, and datasource portability.
  • monitoring/grafana/dashboards/github-runner.json
    • Removed the previous mega-dashboard file, as its functionality has been distributed among new, specialized dashboards.
  • monitoring/grafana/dashboards/performance-trends.json
    • Added a new Grafana dashboard JSON file dedicated to performance trends, including panels for cache hit rates, CPU/memory usage, job duration percentiles, and queue times.
  • monitoring/grafana/dashboards/runner-overview.json
    • Added a new Grafana dashboard JSON file providing an overview of runner status, health, and uptime, along with quick navigation links to other dashboards.
  • monitoring/grafana/provisioning/dashboards/dashboards.yml
    • Added a new YAML configuration file for Grafana dashboard provisioning, enabling automatic loading of dashboard JSON files.
  • plan/feature-prometheus-monitoring-1.md
    • Updated task descriptions and completion dates to reflect the implementation of standalone dashboards and Grafana provisioning.
    • Modified TASK-037 to specify the creation of runner-overview.json and the replacement of the mega-dashboard.
    • Revised TASK-040 to detail the creation of the standalone performance-trends.json dashboard.
    • Updated TASK-042 to emphasize consistent metadata and inter-dashboard navigation links across all four dashboards.
    • Adjusted TASK-043 and TASK-045 to confirm validation and auto-loading provisioning for all new dashboard files.
Activity
  • The author performed a self-review of the code.
  • The changes generate no new warnings.

@GrammaTonic merged commit a3a3e03 into develop on Mar 2, 2026
22 checks passed
@GrammaTonic deleted the feature/grafana-dashboards-phase4 branch on March 2, 2026 at 01:34

@gemini-code-assist (bot) left a review


Code Review

This pull request is a great improvement, refactoring a single large Grafana dashboard into four focused, standalone dashboards: Runner Overview, Performance Trends, DORA Metrics, and Job Analysis. This modular approach enhances usability and maintainability. The addition of a provisioning configuration file for auto-loading dashboards is also a welcome feature. My review focuses on the new dashboard definitions. I've identified a couple of areas for improvement in the performance-trends.json dashboard to make it more user-friendly by clarifying the nature of hardcoded memory thresholds.

{
"id": 4,
"title": "Memory Usage",
"description": "Current average memory usage across runner containers (requires cAdvisor)",


Priority: medium

The memory usage thresholds in this panel are hardcoded to specific values (2GiB, 4GiB, 6GiB). While these might be reasonable defaults for some environments, they are not universally applicable and may cause confusion. To improve clarity and maintainability, I suggest updating the panel description to mention these default values and advise users to adjust them according to their specific runner configurations.

Suggested change:
- "description": "Current average memory usage across runner containers (requires cAdvisor)",
+ "description": "Current average memory usage across runner containers (requires cAdvisor). Thresholds are example values (2GiB, 4GiB, 6GiB) and should be adjusted to match your environment.",
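Beyond documenting the defaults, the thresholds are easier to adjust if they are derived from GiB figures rather than typed as raw byte counts. The sketch below assumes the panel uses Grafana's `bytes` unit (IEC, 1024-based) and standard absolute-mode threshold steps; the colors are hypothetical, since the panel's actual colors are not shown here.

```python
GIB = 1024 ** 3  # IEC gibibyte, matching Grafana's 1024-based "bytes" unit


def memory_thresholds(gib_limits: list[float], colors: list[str]) -> dict:
    """Build a Grafana fieldConfig thresholds object from GiB limits.

    `colors` must have one more entry than `gib_limits`: the first color is
    the base step (value null in JSON), and each later color starts at the
    corresponding limit, with limits applied in ascending order.
    """
    if len(colors) != len(gib_limits) + 1:
        raise ValueError("need exactly one more color than limits")
    steps = [{"color": colors[0], "value": None}]
    for limit, color in zip(sorted(gib_limits), colors[1:]):
        steps.append({"color": color, "value": int(limit * GIB)})
    return {"mode": "absolute", "steps": steps}
```

For the 2/4/6 GiB defaults mentioned in the review, `memory_thresholds([2, 4, 6], ["green", "yellow", "orange", "red"])` yields steps at 2147483648, 4294967296, and 6442450944 bytes; a team with larger runners would change only the GiB list.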

{
"id": 22,
"title": "Memory Usage Over Time",
"description": "Memory usage per runner container over time (requires cAdvisor)",


Priority: medium

Similar to the "Memory Usage" stat panel, this timeseries panel contains hardcoded memory thresholds (2GiB, 6GiB) that may not be suitable for all users. To make the dashboard more user-friendly and easier to configure, please update the panel's description to clarify that these are example thresholds and should be customized.

Suggested change:
- "description": "Memory usage per runner container over time (requires cAdvisor)",
+ "description": "Memory usage per runner container over time (requires cAdvisor). Thresholds are example values (e.g., 2GiB, 6GiB) and should be adjusted for your environment.",
