CNTRLPLANE-3584: Add RBAC for kube-scheduler /metrics/resources endpoint by dhgautam99 · Pull Request #8680 · openshift/hypershift

dhgautam99 · 2026-06-05T07:57:56Z

What this PR does / why we need it:

Adds guest-cluster RBAC to allow the metrics-client certificate identity (system:serviceaccount:hypershift:prometheus) to access the kube-scheduler /metrics/resources endpoint.

The built-in system:monitoring ClusterRole only grants GET on /metrics and /metrics/slis. The /metrics/resources endpoint (KEP-1748, exposing kube_pod_resource_request and kube_pod_resource_limit) requires a separate authorization grant. Without this, prometheus-user-workload on the management cluster receives a 403 Forbidden when scraping /metrics/resources via the ServiceMonitor.

This PR creates a new hypershift-metrics-resources-reader ClusterRole and ClusterRoleBinding in the guest cluster via HCCO, granting GET on /metrics/resources to the metrics-client identity.

Which issue(s) this PR fixes:

Fixes CNTRLPLANE-3584

Depends on #8489

Special notes for your reviewer:

The RBAC is reconciled by HCCO (Hosted Cluster Config Operator) in the guest cluster, following the same pattern as the existing system:monitoring ClusterRoleBinding
kube-scheduler uses --authorization-kubeconfig pointing to the guest cluster KAS, so RBAC checks for metrics endpoints happen on the guest cluster even though the scheduler runs on the management cluster
The /metrics/resources endpoint only returns data on the leader scheduler pod

Checklist:

Subject and description added to both, commit and PR.
Relevant issues have been referenced.
This change includes docs.
This change includes unit tests.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Kube-scheduler now uses TLS certificates for secure client connections on port 10259.
- Kube-scheduler metrics are now exposed via HTTPS monitoring with ServiceMonitor configuration.
- Added RBAC resources granting prometheus metrics resource access.
Tests
- Added tests for scheduler server certificate reconciliation, idempotency, and ServiceMonitor configuration.

…erts The kube-scheduler previously auto-generated self-signed serving certificates via --cert-dir. This change adds a CA-signed serving certificate, a Service, and a ServiceMonitor to enable Prometheus metrics scraping with proper mTLS authentication.

Add unit tests for scheduler component options, ServiceMonitor adapter, and server certificate reconciliation. Regenerate test fixtures after adding Service, ServiceMonitor, and CA-signed serving certificate support.

The built-in system:monitoring ClusterRole only grants access to /metrics. This adds a ClusterRole and ClusterRoleBinding in the guest cluster to allow the metrics client to scrape /metrics/resources on kube-scheduler, which exposes kube_pod_resource_request and kube_pod_resource_limit metrics.

openshift-merge-bot · 2026-06-05T07:57:59Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

openshift-ci · 2026-06-05T07:58:00Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci-robot · 2026-06-05T07:58:00Z

@dhgautam99: This pull request references CNTRLPLANE-3584 which is a valid jira issue.

Details

In response to this:

What this PR does / why we need it:

Adds guest-cluster RBAC to allow the metrics-client certificate identity (system:serviceaccount:hypershift:prometheus) to access the kube-scheduler /metrics/resources endpoint.

The built-in system:monitoring ClusterRole only grants GET on /metrics and /metrics/slis. The /metrics/resources endpoint (KEP-1748, exposing kube_pod_resource_request and kube_pod_resource_limit) requires a separate authorization grant. Without this, prometheus-user-workload on the management cluster receives a 403 Forbidden when scraping /metrics/resources via the ServiceMonitor.

This PR creates a new hypershift-metrics-resources-reader ClusterRole and ClusterRoleBinding in the guest cluster via HCCO, granting GET on /metrics/resources to the metrics-client identity.

Which issue(s) this PR fixes:

Fixes CNTRLPLANE-3584

Depends on #8489

Special notes for your reviewer:

The RBAC is reconciled by HCCO (Hosted Cluster Config Operator) in the guest cluster, following the same pattern as the existing system:monitoring ClusterRoleBinding

kube-scheduler uses --authorization-kubeconfig pointing to the guest cluster KAS, so RBAC checks for metrics endpoints happen on the guest cluster even though the scheduler runs on the management cluster

The /metrics/resources endpoint only returns data on the leader scheduler pod

Checklist:

Subject and description added to both, commit and PR.

Relevant issues have been referenced.

This change includes docs.

This change includes unit tests.

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-06-05T07:58:11Z

📝 Walkthrough

Walkthrough

This PR enables TLS serving on kube-scheduler and metrics scraping via HTTPS. It adds scheduler server certificate PKI reconciliation, updates the Deployment to use explicit TLS certificate files, exposes a new service port for secure client communication, configures a ServiceMonitor for Prometheus metrics collection with TLS, and establishes the necessary metrics relabel configuration and RBAC permissions for the Prometheus instance to scrape scheduler metrics.

🚥 Pre-merge checks | ✅ 11

✅ Passed checks (11 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately identifies the main feature: adding RBAC for kube-scheduler /metrics/resources endpoint, which aligns with the primary objective of the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All new test names use static, descriptive strings with no dynamic content, generated suffixes, timestamps, UUIDs, or other unstable values.
Test Structure And Quality	✅ Passed	Tests meet quality standards: single-responsibility subtests, proper setup via helpers, in-memory resources (no cleanup), consistent assertions, and alignment with codebase Go testing conventions.
Topology-Aware Scheduling Compatibility	✅ Passed	Deployment manifest has no replicas, affinity, nodeSelector, or topologySpreadConstraints; no topology-breaking scheduling constraints introduced.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	The PR adds only standard Go unit tests using testing.T and Gomega, not Ginkgo e2e tests (which use It(), Describe(), etc.). The custom check does not apply to this PR.
No-Weak-Crypto	✅ Passed	No weak crypto algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time comparisons found in the 15 modified files.
Container-Privileges	✅ Passed	No privileged, hostPID, hostNetwork, hostIPC, SYS_ADMIN, or allowPrivilegeEscalation settings in new manifests; root is justified for kube-scheduler control plane component.
No-Sensitive-Data-In-Logs	✅ Passed	No sensitive data logging found in PR. Code adds PKI/ServiceMonitor/RBAC logic without exposing passwords, tokens, keys, PII, or customer data in logs.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-06-05T07:59:23Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dhgautam99
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov · 2026-06-05T08:05:40Z

Codecov Report

❌ Patch coverage is 29.57746% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.43%. Comparing base (fce95fc) to head (0b0a67a).
⚠️ Report is 76 commits behind head on main.

Files with missing lines	Patch %	Lines
...igoperator/controllers/resources/rbac/reconcile.go	0.00%	22 Missing ⚠️
...igoperator/controllers/resources/manifests/rbac.go	0.00%	12 Missing ⚠️
support/metrics/sets.go	0.00%	8 Missing ⚠️
.../hostedcontrolplane/v2/kube_scheduler/component.go	0.00%	4 Missing ⚠️
...ostedcontrolplane/hostedcontrolplane_controller.go	50.00%	2 Missing and 1 partial ⚠️
...or/controllers/hostedcontrolplane/manifests/pki.go	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8680      +/-   ##
==========================================
+ Coverage   40.69%   41.43%   +0.74%     
==========================================
  Files         755      758       +3     
  Lines       93373    93718     +345     
==========================================
+ Hits        37994    38831     +837     
+ Misses      52646    52165     -481     
+ Partials     2733     2722      -11

Files with missing lines	Coverage Δ
...or/controllers/hostedcontrolplane/pki/scheduler.go	`100.00% <100.00%> (ø)`
...edcontrolplane/v2/kube_scheduler/servicemonitor.go	`100.00% <100.00%> (ø)`
...rconfigoperator/controllers/resources/resources.go	`56.73% <100.00%> (+0.12%)`	⬆️
...or/controllers/hostedcontrolplane/manifests/pki.go	`0.00% <0.00%> (ø)`
...ostedcontrolplane/hostedcontrolplane_controller.go	`45.71% <50.00%> (+0.68%)`	⬆️
.../hostedcontrolplane/v2/kube_scheduler/component.go	`26.08% <0.00%> (+26.08%)`	⬆️
support/metrics/sets.go	`2.29% <0.00%> (-0.06%)`	⬇️
...igoperator/controllers/resources/manifests/rbac.go	`0.00% <0.00%> (ø)`
...igoperator/controllers/resources/rbac/reconcile.go	`0.00% <0.00%> (ø)`

... and 46 files with indirect coverage changes

Flag	Coverage Δ
cmd-support	`34.87% <0.00%> (+0.16%)`	⬆️
cpo-hostedcontrolplane	`43.59% <70.37%> (+1.78%)`	⬆️
cpo-other	`42.63% <5.55%> (+1.24%)`	⬆️
hypershift-operator	`51.57% <ø> (+0.72%)`	⬆️
other	`31.64% <ø> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

hypershift-jira-solve-ci · 2026-06-05T08:10:05Z

I have all the information needed. Here is the analysis:

Test Failure Analysis Complete

Job Information

Prow Job: codecov/patch (GitHub check, not a Prow CI job)
Build ID: Check Run ID 79688744447
PR: #8680 — CNTRLPLANE-3584: Add RBAC for kube-scheduler /metrics/resources endpoint
Branch: enable-kube-scheduler-resource-metrics → main
Status: codecov/patch ❌ FAIL | codecov/project ✅ PASS

Test Failure Analysis

Error

Patch coverage is 29.57% with 50 lines in your changes missing coverage.
Target: 40.69% (project-level baseline). Patch coverage fell 11.12 percentage points below the target.

Summary

The codecov/patch check failed because 50 out of ~71 new executable lines added by this PR have zero unit-test coverage, yielding a patch coverage of only 29.57% — well below the project baseline of 40.69%. The uncovered code is concentrated in three areas: the new RBAC manifest/reconcile functions for hypershift-metrics-resources-reader (34 lines, 0%), the new SchedulerRelabelConfigs function in support/metrics/sets.go (8 lines, 0%), and minor gaps in component.go, pki.go, and hostedcontrolplane_controller.go. Notably, codecov/project passed — overall project coverage actually increased from 40.69% to 41.43% — so this is strictly a patch-level (new-code-only) coverage gap, not a regression of the broader codebase.

Root Cause

The PR introduces new Go functions that lack corresponding unit tests:

manifests/rbac.go — 12 lines, 0% covered: Two new manifest constructor functions (MetricsResourcesClusterRole() and MetricsResourcesClusterRoleBinding()) that create the hypershift-metrics-resources-reader ClusterRole and ClusterRoleBinding. These are simple factory functions but are not exercised by any test.
rbac/reconcile.go — 22 lines, 0% covered: Two new reconcile functions (ReconcileMetricsResourcesClusterRole and ReconcileMetricsResourcesClusterRoleBinding) that set the RBAC policy rules and subject bindings. These contain the core RBAC logic (NonResourceURL /metrics/resources with get verb, bound to system:serviceaccount:hypershift:prometheus) and are the most important uncovered code.
support/metrics/sets.go — 8 lines, 0% covered: The new SchedulerRelabelConfigs() function (a switch on MetricsSet returning relabel configs for telemetry/SRE sets) has no tests. This follows the exact same pattern as the existing KASRelabelConfigs(), KCMRelabelConfigs(), etc., which also have 0% coverage — so the file has a pre-existing coverage debt pattern.
v2/kube_scheduler/component.go — 4 lines, 0% covered: New code adding the scheduler-server secret as a volume source to the kube-scheduler component.
manifests/pki.go — 1 line, 0% covered: The new SchedulerServerCertSecret() factory function.
hostedcontrolplane_controller.go — 3 lines, 50% covered: The new cert reconciliation block for schedulerServerSecret — partially covered because the controller reconcile path is partially exercised by existing integration-style tests, but the new block's error path is not.

The PR does include good test coverage for some components — pki/scheduler.go (100%), v2/kube_scheduler/servicemonitor.go (100%), and resources/resources.go (100%) — but the RBAC manifest/reconcile layer and the metrics-set helper were shipped without tests.

Codecov uses the project-level coverage (40.69%) as the default patch target when no explicit target is set in codecov.yml. Since the patch coverage (29.57%) falls below this threshold, the check fails.

Recommendations

Add unit tests for the RBAC reconcile functions (highest impact — 22 lines):
- Test ReconcileMetricsResourcesClusterRole — verify it sets the correct NonResourceURLs (/metrics/resources) and Verbs (get).
- Test ReconcileMetricsResourcesClusterRoleBinding — verify the RoleRef points to hypershift-metrics-resources-reader and the subject is system:serviceaccount:hypershift:prometheus.
- These are simple struct-setting functions; the tests would be straightforward table-driven tests similar to existing RBAC reconcile tests in the same file.
Add unit tests for the RBAC manifest constructors (12 lines):
- Test MetricsResourcesClusterRole() and MetricsResourcesClusterRoleBinding() — verify they return objects with the correct name (hypershift-metrics-resources-reader).
Add unit tests for SchedulerRelabelConfigs (8 lines):
- Test the three MetricsSet branches (Telemetry, SRE, default). Note: the existing sibling functions (KASRelabelConfigs, KCMRelabelConfigs) also lack tests, so this is consistent with the pattern — but adding a test would be simple and would clear the coverage check.
Alternatively, if the team considers these functions too trivial to test, the codecov/patch failure can be tolerated as a non-blocking signal — codecov/project already passes, confirming no overall coverage regression. Check with the team whether codecov/patch is a required merge check or informational only.

Evidence

Evidence	Detail
Check result	`codecov/patch`: failure; `codecov/project`: pass
Patch coverage	29.57% (21 of ~71 executable lines covered)
Target	40.69% (project baseline, no explicit override in `codecov.yml`)
Lines missing coverage	50 lines across 6 files
Largest gap	`rbac/reconcile.go` — 22 lines at 0% (new `ReconcileMetricsResourcesClusterRole` and `ReconcileMetricsResourcesClusterRoleBinding`)
Second largest gap	`manifests/rbac.go` — 12 lines at 0% (`MetricsResourcesClusterRole()`, `MetricsResourcesClusterRoleBinding()`)
Third largest gap	`support/metrics/sets.go` — 8 lines at 0% (`SchedulerRelabelConfigs()`)
Well-covered files	`pki/scheduler.go` (100%), `servicemonitor.go` (100%), `resources.go` (100%)
Project coverage delta	+0.74% (40.69% → 41.43%) — overall coverage improved
Codecov config	`codecov.yml` has no explicit patch `target` — defaults to project-level coverage
Codecov report	Full report

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

control-plane-operator/hostedclusterconfigoperator/controllers/resources/rbac/reconcile.go (1)

294-298: ⚡ Quick win

Use manifest-derived role name to prevent drift.

Line 297 hardcodes "hypershift-metrics-resources-reader" instead of referencing the manifest constructor. If either side changes, role binding will silently point to the wrong role.

💡 Proposed fix

 	r.RoleRef = rbacv1.RoleRef{
 		APIGroup: rbacv1.SchemeGroupVersion.Group,
 		Kind:     "ClusterRole",
-		Name:     "hypershift-metrics-resources-reader",
+		Name:     hccomanifests.MetricsResourcesClusterRole().Name,
 	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/resources/rbac/reconcile.go`
around lines 294 - 298, Replace the hardcoded RoleRef.Name string in the
r.RoleRef assignment with the canonical name produced by the manifest
constructor that builds the corresponding ClusterRole (the same code that
currently constructs "hypershift-metrics-resources-reader"); locate that
manifest constructor or exported name/constant used when creating the
ClusterRole and use it to set r.RoleRef.Name in the r.RoleRef assignment so the
RoleBinding always references the manifest-derived role name and cannot drift.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kube_scheduler/servicemonitor.go`:
- Around line 11-16: The adaptServiceMonitor function currently indexes
sm.Spec.Endpoints[0] without verifying endpoints exist, which can panic; update
adaptServiceMonitor to first check len(sm.Spec.Endpoints) > 0 and return a clear
error (or create/append a default endpoint if desired) when it is zero, then
only set MetricRelabelConfigs and call util.ApplyClusterIDLabel on
sm.Spec.Endpoints[0]; reference the adaptServiceMonitor function,
sm.Spec.Endpoints, MetricRelabelConfigs, and util.ApplyClusterIDLabel when
making the change.

In `@support/metrics/sets.go`:
- Around line 229-237: The Telemetry branch in SchedulerRelabelConfigs
incorrectly returns sreMetricsSetConfig.KubeScheduler causing Telemetry to
depend on the SRE config; change the MetricsSetTelemetry case to return the
Telemetry-specific relabel config (e.g., telemetryMetricsSetConfig.KubeScheduler
or a dedicated telemetry config variable) instead of
sreMetricsSetConfig.KubeScheduler, ensuring MetricsSetSRE still returns
sreMetricsSetConfig.KubeScheduler and the default returns nil.

---

Nitpick comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/resources/rbac/reconcile.go`:
- Around line 294-298: Replace the hardcoded RoleRef.Name string in the
r.RoleRef assignment with the canonical name produced by the manifest
constructor that builds the corresponding ClusterRole (the same code that
currently constructs "hypershift-metrics-resources-reader"); locate that
manifest constructor or exported name/constant used when creating the
ClusterRole and use it to set r.RoleRef.Name in the r.RoleRef assignment so the
RoleBinding always references the manifest-derived role name and cannot drift.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7d14cef1-a721-4b03-b73a-d3d4e5886071

📥 Commits

Reviewing files that changed from the base of the PR and between f13c62d and 0b0a67a.

⛔ Files ignored due to path filters (20)

control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/AROSwift/zz_fixture_TestControlPlaneComponents_kube_scheduler_controlplanecomponent.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/AROSwift/zz_fixture_TestControlPlaneComponents_kube_scheduler_deployment.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/AROSwift/zz_fixture_TestControlPlaneComponents_kube_scheduler_service.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/AROSwift/zz_fixture_TestControlPlaneComponents_kube_scheduler_servicemonitor.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/GCP/zz_fixture_TestControlPlaneComponents_kube_scheduler_controlplanecomponent.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/GCP/zz_fixture_TestControlPlaneComponents_kube_scheduler_deployment.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/GCP/zz_fixture_TestControlPlaneComponents_kube_scheduler_service.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/GCP/zz_fixture_TestControlPlaneComponents_kube_scheduler_servicemonitor.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/IBMCloud/zz_fixture_TestControlPlaneComponents_kube_scheduler_controlplanecomponent.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/IBMCloud/zz_fixture_TestControlPlaneComponents_kube_scheduler_deployment.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/IBMCloud/zz_fixture_TestControlPlaneComponents_kube_scheduler_service.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/IBMCloud/zz_fixture_TestControlPlaneComponents_kube_scheduler_servicemonitor.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_scheduler_controlplanecomponent.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_scheduler_deployment.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_scheduler_service.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_scheduler_servicemonitor.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/zz_fixture_TestControlPlaneComponents_kube_scheduler_controlplanecomponent.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/zz_fixture_TestControlPlaneComponents_kube_scheduler_deployment.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/zz_fixture_TestControlPlaneComponents_kube_scheduler_service.yaml is excluded by !**/testdata/**
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-scheduler/zz_fixture_TestControlPlaneComponents_kube_scheduler_servicemonitor.yaml is excluded by !**/testdata/**

📒 Files selected for processing (15)

control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
control-plane-operator/controllers/hostedcontrolplane/manifests/pki.go
control-plane-operator/controllers/hostedcontrolplane/pki/scheduler.go
control-plane-operator/controllers/hostedcontrolplane/pki/scheduler_test.go
control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-scheduler/deployment.yaml
control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-scheduler/service.yaml
control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-scheduler/servicemonitor.yaml
control-plane-operator/controllers/hostedcontrolplane/v2/kube_scheduler/component.go
control-plane-operator/controllers/hostedcontrolplane/v2/kube_scheduler/component_test.go
control-plane-operator/controllers/hostedcontrolplane/v2/kube_scheduler/servicemonitor.go
control-plane-operator/controllers/hostedcontrolplane/v2/kube_scheduler/servicemonitor_test.go
control-plane-operator/hostedclusterconfigoperator/controllers/resources/manifests/rbac.go
control-plane-operator/hostedclusterconfigoperator/controllers/resources/rbac/reconcile.go
control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
support/metrics/sets.go

coderabbitai · 2026-06-05T08:10:21Z

+func adaptServiceMonitor(cpContext component.WorkloadContext, sm *prometheusoperatorv1.ServiceMonitor) error {
+	sm.Spec.NamespaceSelector = prometheusoperatorv1.NamespaceSelector{
+		MatchNames: []string{sm.Namespace},
+	}
+	sm.Spec.Endpoints[0].MetricRelabelConfigs = metrics.SchedulerRelabelConfigs(cpContext.MetricsSet)
+	util.ApplyClusterIDLabel(&sm.Spec.Endpoints[0], cpContext.HCP.Spec.ClusterID)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard endpoint indexing to avoid reconcile panic.

Line 15 and Line 16 dereference sm.Spec.Endpoints[0] without validating length. A zero-endpoint manifest will panic the controller instead of returning a reconcile error.

💡 Proposed fix

import ( + "fmt" + component "github.com/openshift/hypershift/support/controlplane-component" "github.com/openshift/hypershift/support/metrics" "github.com/openshift/hypershift/support/util" @@ func adaptServiceMonitor(cpContext component.WorkloadContext, sm *prometheusoperatorv1.ServiceMonitor) error { sm.Spec.NamespaceSelector = prometheusoperatorv1.NamespaceSelector{ MatchNames: []string{sm.Namespace}, } + if len(sm.Spec.Endpoints) == 0 { + return fmt.Errorf("kube-scheduler ServiceMonitor must define at least one endpoint") + } sm.Spec.Endpoints[0].MetricRelabelConfigs = metrics.SchedulerRelabelConfigs(cpContext.MetricsSet) util.ApplyClusterIDLabel(&sm.Spec.Endpoints[0], cpContext.HCP.Spec.ClusterID)

As per coding guidelines: "Avoid panics in Go except in truly unrecoverable cases."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func adaptServiceMonitor(cpContext component.WorkloadContext, sm *prometheusoperatorv1.ServiceMonitor) error {

sm.Spec.NamespaceSelector = prometheusoperatorv1.NamespaceSelector{

MatchNames: []string{sm.Namespace},

}

sm.Spec.Endpoints[0].MetricRelabelConfigs = metrics.SchedulerRelabelConfigs(cpContext.MetricsSet)

util.ApplyClusterIDLabel(&sm.Spec.Endpoints[0], cpContext.HCP.Spec.ClusterID)

import (

"fmt"

component "github.com/openshift/hypershift/support/controlplane-component"

"github.com/openshift/hypershift/support/metrics"

"github.com/openshift/hypershift/support/util"

prometheusoperatorv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"

)

func adaptServiceMonitor(cpContext component.WorkloadContext, sm *prometheusoperatorv1.ServiceMonitor) error {

sm.Spec.NamespaceSelector = prometheusoperatorv1.NamespaceSelector{

MatchNames: []string{sm.Namespace},

}

if len(sm.Spec.Endpoints) == 0 {

return fmt.Errorf("kube-scheduler ServiceMonitor must define at least one endpoint")

}

sm.Spec.Endpoints[0].MetricRelabelConfigs = metrics.SchedulerRelabelConfigs(cpContext.MetricsSet)

util.ApplyClusterIDLabel(&sm.Spec.Endpoints[0], cpContext.HCP.Spec.ClusterID)

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control-plane-operator/controllers/hostedcontrolplane/v2/kube_scheduler/servicemonitor.go` around lines 11 - 16, The adaptServiceMonitor function currently indexes sm.Spec.Endpoints[0] without verifying endpoints exist, which can panic; update adaptServiceMonitor to first check len(sm.Spec.Endpoints) > 0 and return a clear error (or create/append a default endpoint if desired) when it is zero, then only set MetricRelabelConfigs and call util.ApplyClusterIDLabel on sm.Spec.Endpoints[0]; reference the adaptServiceMonitor function, sm.Spec.Endpoints, MetricRelabelConfigs, and util.ApplyClusterIDLabel when making the change.

coderabbitai · 2026-06-05T08:10:22Z

+func SchedulerRelabelConfigs(set MetricsSet) []prometheusoperatorv1.RelabelConfig {
+	switch set {
+	case MetricsSetTelemetry:
+		return sreMetricsSetConfig.KubeScheduler
+	case MetricsSetSRE:
+		return sreMetricsSetConfig.KubeScheduler
+	default:
+		return nil
+	}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Telemetry branch is incorrectly coupled to SRE config state.

Line 232 returns sreMetricsSetConfig.KubeScheduler for MetricsSetTelemetry, which makes Telemetry output depend on the SRE ConfigMap content. This can silently alter Telemetry scrape behavior whenever SRE config is present.

💡 Proposed fix

func SchedulerRelabelConfigs(set MetricsSet) []prometheusoperatorv1.RelabelConfig { switch set { case MetricsSetTelemetry: - return sreMetricsSetConfig.KubeScheduler + return nil case MetricsSetSRE: return sreMetricsSetConfig.KubeScheduler default: return nil } }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@support/metrics/sets.go` around lines 229 - 237, The Telemetry branch in SchedulerRelabelConfigs incorrectly returns sreMetricsSetConfig.KubeScheduler causing Telemetry to depend on the SRE config; change the MetricsSetTelemetry case to return the Telemetry-specific relabel config (e.g., telemetryMetricsSetConfig.KubeScheduler or a dedicated telemetry config variable) instead of sreMetricsSetConfig.KubeScheduler, ensuring MetricsSetSRE still returns sreMetricsSetConfig.KubeScheduler and the default returns nil.

dhruv-gautam added 3 commits June 4, 2026 13:52

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 5, 2026

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 5, 2026

openshift-ci Bot added the do-not-merge/needs-area label Jun 5, 2026

coderabbitai Bot reviewed Jun 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CNTRLPLANE-3584: Add RBAC for kube-scheduler /metrics/resources endpoint#8680

CNTRLPLANE-3584: Add RBAC for kube-scheduler /metrics/resources endpoint#8680
dhgautam99 wants to merge 3 commits into
openshift:mainfrom
dhgautam99:enable-kube-scheduler-resource-metrics

dhgautam99 commented Jun 5, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

openshift-merge-bot Bot commented Jun 5, 2026

Uh oh!

openshift-ci Bot commented Jun 5, 2026

Uh oh!

openshift-ci-robot commented Jun 5, 2026 •

edited by openshift-ci Bot

Loading

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Checklist:

Uh oh!

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading

Walkthrough

Uh oh!

openshift-ci Bot commented Jun 5, 2026

Uh oh!

codecov Bot commented Jun 5, 2026 •

edited

Loading

Uh oh!

hypershift-jira-solve-ci Bot commented Jun 5, 2026 •

edited by openshift-ci Bot

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 5, 2026

Uh oh!

coderabbitai Bot Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dhgautam99 commented Jun 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Checklist:

Summary by CodeRabbit

Uh oh!

openshift-merge-bot Bot commented Jun 5, 2026

Uh oh!

openshift-ci Bot commented Jun 5, 2026

Uh oh!

openshift-ci-robot commented Jun 5, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Checklist:

Uh oh!

coderabbitai Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

openshift-ci Bot commented Jun 5, 2026

Uh oh!

codecov Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

hypershift-jira-solve-ci Bot commented Jun 5, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

Summary

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dhgautam99 commented Jun 5, 2026 •

edited by coderabbitai Bot

Loading

openshift-ci-robot commented Jun 5, 2026 •

edited by openshift-ci Bot

Loading

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading

codecov Bot commented Jun 5, 2026 •

edited

Loading

hypershift-jira-solve-ci Bot commented Jun 5, 2026 •

edited by openshift-ci Bot

Loading