feat(helm): add metrics endpoint and ServiceMonitor support#1435
feat(helm): add metrics endpoint and ServiceMonitor support#1435optimus-fulcria wants to merge 4 commits intokagent-dev:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds first-class Helm configuration for exposing the controller’s Prometheus metrics endpoint and optionally deploying a Prometheus Operator ServiceMonitor to scrape it (closes #1369).
Changes:
- Introduces
controller.metricsvalues (enablement, port, secure mode, andserviceMonitorconfig). - Updates controller Deployment/Service/ConfigMap templates to expose the metrics port and set
METRICS_*env vars when enabled. - Adds a new
controller-servicemonitor.yamltemplate plus Helm unit tests for the new behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| helm/kagent/values.yaml | Adds controller.metrics and controller.metrics.serviceMonitor values to configure metrics + ServiceMonitor. |
| helm/kagent/templates/controller-configmap.yaml | Conditionally sets METRICS_BIND_ADDRESS / METRICS_SECURE when metrics are enabled. |
| helm/kagent/templates/controller-deployment.yaml | Conditionally exposes a metrics container port. |
| helm/kagent/templates/controller-service.yaml | Conditionally exposes a metrics Service port. |
| helm/kagent/templates/controller-servicemonitor.yaml | Adds optional ServiceMonitor resource for Prometheus Operator scraping. |
| helm/kagent/tests/controller-deployment_test.yaml | Adds unit tests for deployment port + configmap env vars behavior. |
| helm/kagent/tests/controller-service_test.yaml | Adds unit tests for conditional Service metrics port exposure. |
| helm/kagent/tests/controller-servicemonitor_test.yaml | Adds unit tests for ServiceMonitor rendering and configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -0,0 +1,33 @@ | |||
| {{- if and .Values.controller.metrics.enabled .Values.controller.metrics.serviceMonitor.enabled }} | |||
There was a problem hiding this comment.
ServiceMonitor rendering is only gated by values. This chart already guards optional CRD-based resources with .Capabilities.APIVersions.Has (e.g., openshift-route.yaml) to avoid install/upgrade failures when the CRD is not installed. Consider also gating this template on .Capabilities.APIVersions.Has "monitoring.coreos.com/v1" so controller.metrics.serviceMonitor.enabled=true doesn’t hard-fail on clusters without Prometheus Operator CRDs.
| {{- if and .Values.controller.metrics.enabled .Values.controller.metrics.serviceMonitor.enabled }} | |
| {{- if and .Values.controller.metrics.enabled .Values.controller.metrics.serviceMonitor.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") }} |
| insecureSkipVerify: true | ||
| {{- end }} |
There was a problem hiding this comment.
When controller.metrics.secure is true, the controller enables controller-runtime’s authn/authz filter on the metrics endpoint. This ServiceMonitor config sets scheme: https but does not configure any bearer token/auth settings, so Prometheus scrapes are likely to be rejected. Add an endpoint auth configuration (e.g., bearerTokenFile) when secure metrics are enabled (and/or allow configuring the full endpoint auth block via values).
| insecureSkipVerify: true | |
| {{- end }} | |
| insecureSkipVerify: true | |
| {{- with .Values.controller.metrics.serviceMonitor.auth }} | |
| {{- toYaml . | nindent 6 }} | |
| {{- else }} | |
| bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token | |
| {{- end }} | |
| {{- end }} |
71c2bf6 to
686d15f
Compare
| {{- if .Values.controller.metrics.enabled }} | ||
| - port: {{ .Values.controller.metrics.port }} | ||
| targetPort: {{ .Values.controller.metrics.port }} | ||
| protocol: TCP | ||
| name: metrics | ||
| {{- end }} |
There was a problem hiding this comment.
Do we need to expose these on the service? The controller service can potentially be a LoadBalancer because of the MCP + A2A server so I think metrics should either be scraped from the pod directly, or there should be a different service
There was a problem hiding this comment.
Good point. Moved the metrics port to a dedicated ClusterIP service (controller-metrics-service.yaml) so it stays internal regardless of the main service type. Updated the ServiceMonitor selector to target the new metrics service.
…rviceMonitor Closes kagent-dev#1369 Add first-class Helm values for metrics configuration: - controller.metrics.enabled: enables the metrics endpoint - controller.metrics.port: configurable metrics port (default 9093) - controller.metrics.secure: HTTPS vs HTTP toggle - controller.metrics.serviceMonitor.*: Prometheus Operator ServiceMonitor When enabled, the chart: - Sets METRICS_BIND_ADDRESS and METRICS_SECURE env vars in the configmap - Adds a metrics port to the controller deployment and service - Optionally creates a ServiceMonitor with configurable labels, interval, scrapeTimeout, and relabeling configs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Optimus (AI Agent) <agent@fulcria.com>
…metrics Address review feedback: - Gate ServiceMonitor rendering on .Capabilities.APIVersions.Has to avoid failures on clusters without Prometheus Operator CRDs - Add bearerTokenFile for service account auth when secure metrics enabled - Add test for CRD unavailability scenario - Add test for bearerTokenFile in secure mode Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Optimus (AI Agent) <agent@fulcria.com>
Per review feedback, metrics should not be exposed on the main controller service which may be a LoadBalancer. This creates a separate ClusterIP service for metrics scraping. - Remove metrics port from controller-service.yaml - Add controller-metrics-service.yaml (ClusterIP, metrics only) - Update ServiceMonitor selector to target the metrics service Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Optimus (AI Agent) <agent@fulcria.com>
cc4b927 to
9119305
Compare
…tor labels - Add configurable bearerTokenSecret as alternative to the default bearerTokenFile, allowing users to override Prometheus auth via values (defaults to SA token file when not set) - Fix duplicate YAML keys in ServiceMonitor selector and metrics Service labels by using kagent.selectorLabels/kagent.labels instead of kagent.controller.selectorLabels/kagent.controller.labels - Update controller-service test to reflect metrics port move to dedicated service - Add test for bearerTokenSecret override - Fix selector label test to expect component: metrics Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Optimus (AI Agent) <agent@fulcria.com>
Summary
Closes #1369
Adds first-class Helm values for enabling the controller metrics endpoint and optionally deploying a Prometheus Operator
ServiceMonitor.Changes
values.yaml: Newcontroller.metricssection withenabled,port,secure, andserviceMonitorsub-valuescontroller-configmap.yaml: SetsMETRICS_BIND_ADDRESSandMETRICS_SECUREenv vars when metrics are enabledcontroller-deployment.yaml: Adds metrics container port when enabledcontroller-service.yaml: Adds metrics service port when enabledcontroller-servicemonitor.yaml(new): Optional ServiceMonitor with configurable labels, interval, scrapeTimeout, and relabelingUsage
Test plan
🤖 Generated with Claude Code