
chore: promote develop to main — release v2.6.0 #1143

Merged: GrammaTonic merged 7 commits into main from develop on Mar 2, 2026

@GrammaTonic (Owner)

Release v2.6.0 — Prometheus Monitoring Complete

Promotes develop to main for production release v2.6.0.

What's Included

Prometheus Monitoring (Phases 2–6):

Security:

Merge Strategy

⚠️ Use REGULAR merge (not squash) — preserves shared history between develop and main.
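A regular merge can be sketched with plain git, assuming a local clone whose `main` and `develop` branches are up to date; `promote_develop_to_main` is a hypothetical helper name, not part of the repository. In the GitHub UI the equivalent is choosing "Create a merge commit" rather than "Squash and merge". A squash replaces develop's commits with one new commit on main, so the branches stop sharing those commits; a merge keeps them reachable from main, which is what "preserves shared history" refers to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: promote develop to main with a regular merge commit.
# Assumes a local clone whose main and develop branches are up to date.
promote_develop_to_main() {
  local repo="${1:-.}"
  # --no-ff always records a merge commit (two parents), even when a
  # fast-forward would be possible; --no-edit keeps the default message.
  git -C "$repo" checkout --quiet main &&
    git -C "$repo" merge --quiet --no-ff --no-edit develop
}

# Usage (then push main as usual, e.g. `git push origin main`):
#   promote_develop_to_main /path/to/repo
```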

Checklist

  • VERSION bumped to 2.6.0
  • CHANGELOG updated
  • All CI checks green on develop
  • No merge conflicts
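The first two checklist items can be checked mechanically before promoting. The helper below is hypothetical (`check_release_metadata` does not exist in the repository); it assumes the `VERSION` file holds only the version string and treats any mention of the version in `docs/releases/CHANGELOG.md` as a changelog entry, which is a deliberately loose check.

```shell
#!/usr/bin/env bash
# Hypothetical pre-promotion check: VERSION must equal the expected release
# and the changelog must mention it. Default paths follow this PR's file list.
check_release_metadata() {
  local expected="$1"
  local version_file="${2:-VERSION}"
  local changelog="${3:-docs/releases/CHANGELOG.md}"
  local actual

  # Strip whitespace/newlines so "2.6.0\n" compares equal to "2.6.0".
  actual="$(tr -d '[:space:]' < "$version_file")" || return 1
  if [ "$actual" != "$expected" ]; then
    echo "VERSION is '$actual', expected '$expected'" >&2
    return 1
  fi

  # Fixed-string search: dots in the version are not regex wildcards.
  if ! grep -qF "$expected" "$changelog"; then
    echo "$changelog has no entry for $expected" >&2
    return 1
  fi

  echo "release metadata OK for $expected"
}
```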

fix: improve security-advisories.yml - fix severity filter logic, add Chrome-Go scan, add concurrency group, remove excess permissions, add critical vuln notification
…#1135)

feat(prometheus): Phase 2 - fix Chrome/Chrome-Go metrics gaps (#1060)

Add netcat-openbsd to Chrome Dockerfiles, reorder entrypoint-chrome.sh to start metrics before token validation, update prometheus.yml scrape targets, and add metrics env vars to config examples.
feat: Phase 3 DORA metrics - job lifecycle hooks, duration histogram, queue time, cache stubs, 3 Grafana dashboards. Closes #1061
…boards (#1137)

feat(monitoring): split mega-dashboard into 4 standalone Grafana dashboards (Resolves #1062)
Phase 5 Prometheus documentation: 6 new docs/features files, 4 new wiki pages, updated README/API/env examples, fixed wiki port references
Phase 6 Testing & Validation: 6 Prometheus monitoring test suites (149 assertions), CI integration, shellcheck compliance fixes
chore: bump version to 2.6.0 and update changelog
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This release, v2.6.0, focuses on observability for the self-hosted GitHub Actions runners. It completes the Prometheus monitoring work: new runner, job, and DORA metrics, four dedicated Grafana dashboards, and supporting documentation, giving users visibility into runner performance and job execution across their CI/CD infrastructure.

Highlights

  • Prometheus Monitoring (Phases 2-6) Complete: This release finalizes the comprehensive Prometheus monitoring integration, covering metrics gaps, DORA metrics, Grafana dashboards, extensive documentation, and a robust integration test suite.
  • Enhanced DORA Metrics & Job Lifecycle Tracking: Implemented job-started.sh and job-completed.sh hooks to track job durations, queue times, and other DORA-related metrics.
  • New Grafana Dashboards: Introduced four specialized Grafana dashboards (Runner Overview, Job Analysis, Performance Trends, and DORA Metrics), replacing the single mega-dashboard.
  • Comprehensive Monitoring Documentation: Added extensive user guides for quick start, setup, usage, metrics reference, architecture, and troubleshooting for the Prometheus monitoring system.
  • Robust Integration Test Suite: Developed six new integration test scripts with 149 assertions covering the metrics endpoint, performance, persistence, scaling, security, and documentation.
  • Security Workflow Improvement: Enhanced the logic and coverage of the security-advisories.yml workflow.


Changelog
  • README.md
    • Updated the monitoring section to reflect the new Prometheus metrics endpoint, Grafana dashboards, and links to detailed documentation.
  • VERSION
    • Bumped the project version from 2.5.0 to 2.6.0.
  • config/chrome-go-runner.env.example
    • Added optional Prometheus metrics configuration variables (RUNNER_TYPE, METRICS_PORT, METRICS_UPDATE_INTERVAL).
  • config/chrome-runner.env.example
    • Added optional Prometheus metrics configuration variables (RUNNER_TYPE, METRICS_PORT, METRICS_UPDATE_INTERVAL).
  • config/runner.env.example
    • Added optional Prometheus metrics configuration variables (RUNNER_TYPE, METRICS_PORT, METRICS_UPDATE_INTERVAL).
  • docker/Dockerfile
    • Added job-started.sh and job-completed.sh scripts for job lifecycle hooks.
  • docker/Dockerfile.chrome
    • Added netcat-openbsd dependency and job-started.sh/job-completed.sh scripts.
  • docker/Dockerfile.chrome-go
    • Added netcat-openbsd dependency and job-started.sh/job-completed.sh scripts.
  • docker/entrypoint-chrome.sh
    • Reordered startup so metrics collection starts before token validation.
    • Configured job lifecycle hook environment variables (ACTIONS_RUNNER_HOOK_JOB_STARTED, ACTIONS_RUNNER_HOOK_JOB_COMPLETED) and created a job state directory.
  • docker/entrypoint.sh
    • Configured job lifecycle hook environment variables (ACTIONS_RUNNER_HOOK_JOB_STARTED, ACTIONS_RUNNER_HOOK_JOB_COMPLETED) and created a job state directory.
  • docker/job-completed.sh
    • Added a new script to record job completion events, calculate duration and queue time, determine status, and append final entries to /tmp/jobs.log.
  • docker/job-started.sh
    • Added a new script to record job start events, create a preliminary "running" entry in /tmp/jobs.log, and save the start timestamp.
  • docker/metrics-collector.sh
    • Updated to include job duration histogram buckets, average queue time calculation, and stubbed cache hit rate metrics, along with label enhancements for existing metrics.
  • docs/API.md
    • Updated the /metrics endpoint documentation to detail exposed metrics, their types, labels, and links to comprehensive references.
  • docs/README.md
    • Added a new "Prometheus Monitoring" section with links to all related documentation files.
  • docs/features/GRAFANA_DASHBOARD_METRICS.md
    • Updated to describe the new four-dashboard structure and provisioning details.
  • docs/features/PHASE3_DORA_METRICS.md
    • Added new documentation detailing the architecture, data flow, log format, new metrics, PromQL examples, and DORA classification for Phase 3.
  • docs/features/PROMETHEUS_ARCHITECTURE.md
    • Added new documentation outlining the system architecture, component descriptions, data flow, design decisions, and scalability considerations for Prometheus monitoring.
  • docs/features/PROMETHEUS_METRICS_REFERENCE.md
    • Added new comprehensive documentation detailing all Prometheus metrics exposed, including types, labels, sources, and PromQL examples.
  • docs/features/PROMETHEUS_QUICKSTART.md
    • Added new quick start guide for setting up Prometheus monitoring in 5 minutes.
  • docs/features/PROMETHEUS_SETUP.md
    • Added new detailed setup guide for deploying runners with metrics, configuring Prometheus scrape targets, and importing Grafana dashboards.
  • docs/features/PROMETHEUS_TROUBLESHOOTING.md
    • Added new troubleshooting guide for common issues with Prometheus monitoring, including endpoint, update, and Grafana dashboard problems.
  • docs/releases/CHANGELOG.md
    • Updated the changelog to include details for release v2.6.0, specifically listing Prometheus Monitoring phases 2-6 and security improvements.
  • monitoring/grafana/dashboards/dora-metrics.json
    • Added a new Grafana dashboard JSON for DORA metrics, including deployment frequency, lead time, change failure rate, and MTTR.
  • monitoring/grafana/dashboards/github-runner.json
    • Removed the old, monolithic Grafana dashboard.
  • monitoring/grafana/dashboards/job-analysis.json
    • Added a new Grafana dashboard JSON for detailed job analysis, including duration histograms and status breakdowns.
  • monitoring/grafana/dashboards/performance-trends.json
    • Added a new Grafana dashboard JSON for performance trends, covering cache hit rates, CPU/memory usage, and build times.
  • monitoring/grafana/dashboards/runner-overview.json
    • Added a new Grafana dashboard JSON for a high-level runner overview, including status, uptime, and job success rates.
  • monitoring/grafana/provisioning/dashboards/dashboards.yml
    • Added new Grafana provisioning configuration to automatically load the new dashboards.
  • monitoring/prometheus-scrape-example.yml
    • Added a new example Prometheus scrape configuration for all three runner types.
  • monitoring/prometheus.yml
    • Updated Prometheus scrape configurations to target specific runner types on their respective metrics ports.
  • plan/feature-prometheus-monitoring-1.md
    • Updated the status of all Prometheus monitoring implementation phases (3, 4, 5, 6) to "✅ Complete".
  • tests/README.md
    • Updated the test suite structure to include new Prometheus metrics tests and documentation validation tests.
  • tests/integration/test-docs-validation.sh
    • Added a new integration test script to validate documentation consistency and file existence.
  • tests/integration/test-job-lifecycle.sh
    • Added a new integration test script to validate job lifecycle hooks and jobs.log entries.
  • tests/integration/test-metrics-endpoint.sh
    • Added a new integration test script to validate the Prometheus metrics endpoint's format, content, and runtime behavior.
  • tests/integration/test-metrics-performance.sh
    • Added a new integration test script to measure metrics endpoint response time and collector update interval accuracy.
  • tests/integration/test-metrics-persistence.sh
    • Added a new integration test script to validate metrics data persistence across container restarts.
  • tests/integration/test-metrics-scaling.sh
    • Added a new integration test script to validate multi-runner deployment with unique metrics and port mappings.
  • tests/integration/test-metrics-security.sh
    • Added a new integration test script to scan for sensitive data leakage in metrics output and scripts.
  • tests/unit/test-metrics-phase1.sh
    • Updated the unit test script to include a "SKIP" result option.
  • wiki-content/Chrome-Runner.md
    • Updated the wiki page to include information about Prometheus metrics for the Chrome runner.
  • wiki-content/Docker-Configuration.md
    • Updated the wiki page to include a link to the new monitoring setup documentation.
  • wiki-content/Grafana-Dashboards.md
    • Added a new wiki page detailing the four pre-built Grafana dashboards, import instructions, and variables.
  • wiki-content/Home.md
    • Updated the home wiki page to include a new "Monitoring & Observability" section with links to all new monitoring documentation and updated the project status.
  • wiki-content/Metrics-Reference.md
    • Added a new wiki page providing a comprehensive reference for all Prometheus metrics exposed by the runners.
  • wiki-content/Monitoring-Setup.md
    • Added a new wiki page providing a quick start guide and detailed setup instructions for Prometheus monitoring.
  • wiki-content/Monitoring-Troubleshooting.md
    • Added a new wiki page for troubleshooting common issues with Prometheus monitoring.
  • wiki-content/Production-Deployment.md
    • Updated the Prometheus metrics port and added a link to the full monitoring guide.
  • wiki-content/Quick-Start.md
    • Updated the quick start wiki page to include a link to the new monitoring setup documentation.
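The job-started.sh / job-completed.sh pair listed above can be sketched as two bash functions. Everything here is an assumption for illustration: the function names, the `/tmp/job-state` directory, and the pipe-separated log line (`timestamp|job_id|status|duration_seconds`) are not the project's actual format, and queue time is omitted because it needs a queued-at timestamp this sketch does not model. The job ID and final status are passed in as arguments; a real hook script would have to derive them from the job's environment. In the actual setup, the entrypoint exports the script paths as `ACTIONS_RUNNER_HOOK_JOB_STARTED` and `ACTIONS_RUNNER_HOOK_JOB_COMPLETED`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a job lifecycle hook pair. Paths and the log
# format are assumptions, not the project's real jobs.log layout.
JOB_STATE_DIR="${JOB_STATE_DIR:-/tmp/job-state}"
JOBS_LOG="${JOBS_LOG:-/tmp/jobs.log}"

# Start hook: save the start timestamp and log a preliminary "running" entry.
job_started() {
  local job_id="$1" now="${2:-$(date +%s)}"
  mkdir -p "$JOB_STATE_DIR"
  printf '%s\n' "$now" > "$JOB_STATE_DIR/$job_id.start"
  printf '%s|%s|running|0\n' "$now" "$job_id" >> "$JOBS_LOG"
}

# Completion hook: compute duration from the saved start time and append
# the final entry. A missing start file yields a duration of 0.
job_completed() {
  local job_id="$1" status="$2" now="${3:-$(date +%s)}"
  local start_file="$JOB_STATE_DIR/$job_id.start" start duration=0
  if [ -r "$start_file" ]; then
    start="$(cat "$start_file")"
    duration=$((now - start))
    rm -f "$start_file"
  fi
  printf '%s|%s|%s|%s\n' "$now" "$job_id" "$status" "$duration" >> "$JOBS_LOG"
}
```

Keeping the start timestamp in a per-job state file, rather than parsing it back out of the log, lets the completion hook stay a simple read-compute-append step.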
Ignored Files
  • Ignored by pattern: .github/workflows/** (2)
    • .github/workflows/ci-cd.yml
    • .github/workflows/security-advisories.yml
Activity
  • The pull request author completed all items in the provided checklist: bumping the version, updating the changelog, confirming CI checks were green on develop, and confirming there were no merge conflicts.

@gemini-code-assist (bot) left a comment


Code Review

This pull request promotes the develop branch to main for the v2.6.0 release, which notably completes the Prometheus monitoring feature. The changes are extensive, including the addition of job lifecycle tracking for DORA metrics, new Grafana dashboards, and comprehensive documentation and testing for the monitoring system. My review focused on the new monitoring scripts. I found a high-severity issue in the histogram calculation logic within metrics-collector.sh that would lead to incorrect metrics. The rest of the implementation, including the new job hook scripts and entrypoint modifications, appears solid and well-designed.

Comment on lines +130 to +135
# Make buckets cumulative (each bucket includes all smaller buckets)
# The above loop already counts per-bucket, but Prometheus requires cumulative
# So we need to accumulate: bucket[i] += bucket[i-1]
for ((i = 1; i < ${#HISTOGRAM_BUCKETS[@]}; i++)); do
  bucket_counts_ref[i]=$((bucket_counts_ref[i] + bucket_counts_ref[i - 1]))
done


Severity: high

The logic for calculating histogram buckets is incorrect. The loop on lines 121-125 already computes cumulative counts for the histogram buckets, and this second loop accumulates those already-cumulative values again, inflating every bucket after the first. For example, a single observation that falls into the le="60" bucket leaves a count of 1 in le="60" and in every larger bucket; this loop then turns those into 1, 2, 3, ... across successive buckets, although each should still read 1.

The comment on line 131 is also misleading: the preceding loop is already cumulative, not 'per-bucket'.

To fix this, remove the redundant loop.
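For reference, a single-pass cumulative histogram can be sketched in bash as follows. This is not the code in metrics-collector.sh: the function name, the metric name in the test, and every bucket bound other than le="60" are assumptions, and durations are taken as whole seconds because bash arithmetic is integer-only. Each observation increments every bucket whose upper bound it does not exceed, so the counts are cumulative by construction and no second accumulation pass is needed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: emit a Prometheus histogram (text exposition format)
# from integer durations in seconds. Bucket bounds are assumptions.
HISTOGRAM_BUCKETS=(60 300 600 1800 3600)

emit_duration_histogram() {
  local metric="$1"; shift
  local -a counts=()
  local i d sum=0 total=0

  for ((i = 0; i < ${#HISTOGRAM_BUCKETS[@]}; i++)); do
    counts[i]=0
  done

  for d in "$@"; do
    # Cumulative by construction: each value increments EVERY bucket whose
    # upper bound (le) it does not exceed, so no second pass is required.
    for ((i = 0; i < ${#HISTOGRAM_BUCKETS[@]}; i++)); do
      if ((d <= HISTOGRAM_BUCKETS[i])); then
        counts[i]=$((counts[i] + 1))
      fi
    done
    sum=$((sum + d))
    total=$((total + 1))
  done

  printf '# TYPE %s histogram\n' "$metric"
  for ((i = 0; i < ${#HISTOGRAM_BUCKETS[@]}; i++)); do
    printf '%s_bucket{le="%s"} %d\n' "$metric" "${HISTOGRAM_BUCKETS[i]}" "${counts[i]}"
  done
  # The +Inf bucket always equals the total number of observations.
  printf '%s_bucket{le="+Inf"} %d\n' "$metric" "$total"
  printf '%s_sum %d\n' "$metric" "$sum"
  printf '%s_count %d\n' "$metric" "$total"
}
```

For durations of 45, 120, and 700 seconds this prints bucket counts 1, 2, 2, 3, 3 for le="60" through le="3600", with le="+Inf" and `_count` both 3 and `_sum` 865. The counts never decrease from one bucket to the next, which is the invariant the double-accumulating loop breaks.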

GrammaTonic merged commit a6d40f5 into main on Mar 2, 2026 (22 checks passed).