feat(prometheus): Phase 2 - fix Chrome/Chrome-Go metrics gaps #1135
GrammaTonic merged 1 commit into develop
Add netcat-openbsd to Dockerfile.chrome and Dockerfile.chrome-go apt-get blocks so the metrics-server.sh nc dependency is satisfied.

Reorder entrypoint-chrome.sh to start the Prometheus metrics collector and HTTP server BEFORE the GITHUB_TOKEN/GITHUB_REPOSITORY checks, matching the standard entrypoint pattern and enabling standalone metrics testing without runner registration.

Update monitoring/prometheus.yml scrape targets from the stale runner:8080 placeholder to per-variant jobs: github-runner-main:9091, github-runner-chrome:9091, and github-runner-chrome-go:9091.

Add RUNNER_TYPE, METRICS_PORT, and METRICS_UPDATE_INTERVAL entries to config/chrome-runner.env.example and config/chrome-go-runner.env.example so users discover the metrics configuration options.

Closes #1060 (TASK-013 through TASK-019 code changes)
Summary of Changes (Gemini Code Assist)
This pull request addresses and resolves several critical gaps in the Phase 2 Prometheus metrics implementation for Chrome and Chrome-Go runners. It ensures that metrics collection functions correctly by installing necessary dependencies, reordering initialization logic for proper startup, and updating Prometheus configuration to accurately target the different runner types. The changes aim to provide complete and reliable observability for these runner environments.
Code Review
This pull request addresses several gaps in the Prometheus metrics implementation for Chrome and Chrome-Go runners. It correctly adds netcat-openbsd to the Docker images, reorders the entrypoint script logic to allow standalone metrics testing, and updates the example environment files with metrics-related variables. The changes are well-structured and align with the goals described. I have one suggestion for the prometheus.yml file to improve maintainability by reducing configuration duplication using YAML anchors. Overall, this is a solid contribution to enhance the monitoring capabilities.
    # GitHub Runner application metrics - Standard runner
    - job_name: "github-runner-standard"
      static_configs:
-       - targets: ["runner:8080"]
+       - targets: ["github-runner-main:9091"]
      scrape_interval: 15s
      metrics_path: /metrics
      scrape_timeout: 10s

    # GitHub Runner application metrics - Chrome runner
    - job_name: "github-runner-chrome"
      static_configs:
        - targets: ["github-runner-chrome:9091"]
      scrape_interval: 15s
      metrics_path: /metrics
      scrape_timeout: 10s

    # GitHub Runner application metrics - Chrome-Go runner
    - job_name: "github-runner-chrome-go"
      static_configs:
        - targets: ["github-runner-chrome-go:9091"]
      scrape_interval: 15s
      metrics_path: /metrics
      scrape_timeout: 10s
This is a great update to the scrape configurations. To improve maintainability and reduce duplication, you could consider using YAML anchors. This would allow you to define common scrape parameters like scrape_interval once and reuse them across all runner jobs. This makes future updates to these parameters easier as you'd only need to change them in one place.
    # GitHub Runner application metrics - Standard runner
    - job_name: "github-runner-standard"
      static_configs:
        - targets: ["github-runner-main:9091"]
      scrape_interval: &scrape_interval 15s
      metrics_path: &metrics_path /metrics
      scrape_timeout: &scrape_timeout 10s

    # GitHub Runner application metrics - Chrome runner
    - job_name: "github-runner-chrome"
      static_configs:
        - targets: ["github-runner-chrome:9091"]
      scrape_interval: *scrape_interval
      metrics_path: *metrics_path
      scrape_timeout: *scrape_timeout

    # GitHub Runner application metrics - Chrome-Go runner
    - job_name: "github-runner-chrome-go"
      static_configs:
        - targets: ["github-runner-chrome-go:9091"]
      scrape_interval: *scrape_interval
      metrics_path: *metrics_path
      scrape_timeout: *scrape_timeout
Summary
Fixes the remaining gaps in the Phase 2 Prometheus metrics implementation for Chrome and Chrome-Go runners (Issue #1060, TASK-013 through TASK-019).
All code-level tasks were already implemented on develop, but four blockers/gaps prevented successful build and runtime validation (TASK-020 through TASK-026). This PR addresses all four.
Changes Made
1. Add netcat-openbsd to Chrome Dockerfiles (BLOCKER fix)
metrics-server.sh requires nc (netcat). The standard Dockerfile already installs netcat-openbsd, but both Chrome variants were missing it. Without this, the metrics HTTP server silently fails at runtime.
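One way to turn this kind of silent failure into a loud one is a dependency check at startup. The following is a minimal sketch only; the helper name `require_cmd` is hypothetical and is not taken from metrics-server.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): report a missing command
# on stderr and return non-zero instead of failing silently later.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "ERROR: required command '$1' not found in PATH" >&2
    return 1
  fi
}
```

A metrics script could then call `require_cmd nc || exit 1` before launching its HTTP listener, so an image built without netcat-openbsd fails visibly at container start.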
2. Reorder Chrome entrypoint token validation
entrypoint-chrome.sh checked GITHUB_TOKEN before starting metrics, despite a comment saying "Start metrics services BEFORE token validation". The standard entrypoint.sh correctly starts metrics first. This reordering matches that pattern and enables standalone metrics testing.
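The intended ordering can be sketched roughly as follows. This is an illustration under assumptions, not the actual entrypoint-chrome.sh: the function names are hypothetical, and the 9091 default is inferred from the scrape targets in this PR.

```shell
#!/bin/sh
# Hypothetical stand-in for launching the metrics collector and HTTP server.
start_metrics() {
  echo "metrics started on port ${METRICS_PORT:-9091}"
}

# Succeeds only when both registration variables are non-empty.
validate_registration_env() {
  [ -n "${GITHUB_TOKEN:-}" ] && [ -n "${GITHUB_REPOSITORY:-}" ]
}

entrypoint_main() {
  # Metrics come first, so they are available even without credentials.
  start_metrics
  if ! validate_registration_env; then
    echo "GITHUB_TOKEN/GITHUB_REPOSITORY not set; skipping runner registration" >&2
    return 1
  fi
  echo "registering runner"
}
```

With this order, calling `entrypoint_main` with no token set still starts the metrics step before the validation fails, which is what allows testing metrics without runner registration.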
3. Update Prometheus scrape targets
monitoring/prometheus.yml still had the stale placeholder runner:8080. Updated to per-variant scrape jobs matching the actual Docker Compose service names and ports.
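To check that the new targets actually answer, a small probe along these lines could be run from a container on the same Docker Compose network. It is a sketch under assumptions: the helper names are hypothetical, the service names and port are the ones listed in this PR, and curl is assumed to be available wherever it runs.

```shell
#!/bin/sh
# Service names taken from the scrape targets in this PR.
RUNNER_SERVICES="github-runner-main github-runner-chrome github-runner-chrome-go"

# Build the metrics URL for one service; the port defaults to 9091.
metrics_url() {
  echo "http://$1:${2:-9091}/metrics"
}

# Probe every endpoint; returns non-zero if any target does not respond.
probe_all() {
  rc=0
  for svc in $RUNNER_SERVICES; do
    url=$(metrics_url "$svc")
    if curl -fsS --max-time 5 "$url" >/dev/null; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      rc=1
    fi
  done
  return $rc
}
```

Running `probe_all` prints one OK or FAIL line per runner variant. The hostnames resolve only where the Compose service names are reachable.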
4. Document metrics env vars in config examples
Added RUNNER_TYPE, METRICS_PORT, and METRICS_UPDATE_INTERVAL entries to config/chrome-runner.env.example and config/chrome-go-runner.env.example.
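For orientation, the added entries might look something like the fragment below. The variable names come from this PR, but the values are placeholders: 9091 matches the scrape targets above, while the RUNNER_TYPE value and the interval (assumed here to be in seconds) are illustrative guesses, not the contents of the real example files.

```shell
# Metrics settings (placeholder values; consult the repository's
# *.env.example files for the actual defaults)
RUNNER_TYPE=chrome
METRICS_PORT=9091
METRICS_UPDATE_INTERVAL=30
```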