Skip to content

Conversation

@DavidHurta
Copy link
Contributor

@DavidHurta DavidHurta commented Dec 10, 2025

This pull request proposes to migrate the CVO metrics endpoint from bearer token authentication to mutual TLS (mTLS) to comply with OpenShift metrics conventions.

Key changes:

  • Replaced custom certificate watching logic with the k8s.io/apiserver/pkg/server/dynamiccertificates package to simplify logic
  • Authenticate clients using mutual TLS (mTLS)
  • Authorize clients by verifying CN
  • Add configurable authentication/authorization options for HyperShift compatibility
  • Update ServiceMonitor to use mTLS instead of bearer tokens

Breaking Changes:

  • Prometheus must now authenticate using client certificates signed by a trusted cluster CA. Clients without valid certificates are rejected during the TLS handshake.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 10, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 10, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link

coderabbitai bot commented Dec 10, 2025

Walkthrough

Switched metrics authentication from bearer tokens to TLS client certificates with CN-based authorization, added dynamic serving-certificate and client-CA controllers, introduced a public MetricsOptions API and updated RunMetrics/createHttpServer signatures; updated ServiceMonitor, go.mod, tests, and startup flag wiring.

Changes

Cohort / File(s) Summary
Go module updates
go.mod
Added direct k8s.io/apiserver v0.34.1; removed gopkg.in/fsnotify.v1 v1.4.7 from the first require block and introduced github.com/fsnotify/fsnotify v1.9.0 in the other block (dependency reorganization).
ServiceMonitor manifest
install/0000_90_cluster-version-operator_02_servicemonitor.yaml
Removed bearerTokenFile from the ServiceMonitor endpoint and configured TLS client-cert scraping via tlsConfig.certFile and tlsConfig.keyFile.
Metrics implementation
pkg/cvo/metrics.go
Replaced token-review/auth flow with TLS client-certificate CN-based authorization; added public MetricsOptions struct; changed RunMetrics and createHttpServer signatures; introduced dynamic serving-cert and client-CA controllers and GetConfigForClient wiring; removed file-watcher/fsnotify and token-review helpers; added coordinated shutdown via contexts.
Metrics tests
pkg/cvo/metrics_test.go
Converted tests from bearer-token/tokenReview to TLS client-certificate-based tests (mocked TLS PeerCertificates and CN checks); removed fake token-review scaffolding and updated expectations.
Startup, flags, and call sites
pkg/start/start.go, cmd/cluster-version-operator/start.go, pkg/start/start_integration_test.go
Replaced flat ListenAddr/ServingCertFile/ServingKeyFile fields with nested MetricsOptions in exported Options; updated flag bindings and call sites to pass MetricsOptions to RunMetrics; updated integration tests to use MetricsOptions.ListenAddress.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@DavidHurta
Copy link
Contributor Author

/test ?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 10, 2025

@DavidHurta: The following commands are available to trigger required jobs:

/test e2e-agnostic-operator
/test e2e-agnostic-operator-techpreview
/test e2e-agnostic-ovn
/test e2e-agnostic-ovn-upgrade-into-change
/test e2e-agnostic-ovn-upgrade-into-change-techpreview
/test e2e-agnostic-ovn-upgrade-out-of-change
/test e2e-agnostic-ovn-upgrade-out-of-change-techpreview
/test e2e-aws-ovn-techpreview
/test e2e-hypershift
/test e2e-hypershift-conformance
/test gofmt
/test images
/test lint
/test okd-scos-images
/test unit
/test verify-deps
/test verify-update
/test verify-yaml

The following commands are available to trigger optional jobs:

/test e2e-agnostic-operator-devpreview
/test e2e-extended-tests
/test okd-scos-e2e-aws-ovn

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator
pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-ovn
pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-ovn-upgrade-into-change
pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-ovn-upgrade-out-of-change
pull-ci-openshift-cluster-version-operator-main-e2e-aws-ovn-techpreview
pull-ci-openshift-cluster-version-operator-main-e2e-hypershift
pull-ci-openshift-cluster-version-operator-main-e2e-hypershift-conformance
pull-ci-openshift-cluster-version-operator-main-gofmt
pull-ci-openshift-cluster-version-operator-main-images
pull-ci-openshift-cluster-version-operator-main-lint
pull-ci-openshift-cluster-version-operator-main-okd-scos-images
pull-ci-openshift-cluster-version-operator-main-unit
pull-ci-openshift-cluster-version-operator-main-verify-deps
pull-ci-openshift-cluster-version-operator-main-verify-update
pull-ci-openshift-cluster-version-operator-main-verify-yaml
Details

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 10, 2025
@DavidHurta
Copy link
Contributor Author

/test e2e-agnostic-ovn
/test e2e-hypershift
/test e2e-hypershift-conformance

1 similar comment
@DavidHurta
Copy link
Contributor Author

/test e2e-agnostic-ovn
/test e2e-hypershift
/test e2e-hypershift-conformance

@DavidHurta
Copy link
Contributor Author

/test all

@DavidHurta
Copy link
Contributor Author

/test all

@DavidHurta
Copy link
Contributor Author

DavidHurta commented Dec 13, 2025

Note: PromeCIeus and the e2e-hypershift job can be used to verify that the CVO metrics in a hosted control plane can be scraped (when HyperShift is specified to use the OCP monitoring).

@DavidHurta
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Dec 13, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/cvo/metrics.go (1)

201-299: Harden the listener TLS config: ensure MinVersion/ciphers apply even if GetConfigForClient returns nil.

The base TLS config is secured via crypto.SecureTLSConfig, but the listener uses a fresh tls.Config literal containing only GetConfigForClient. If the callback returns nil or fails, the listener operates with default TLS settings lacking explicit cipher suite and minimum version constraints. Wrap the listener config with crypto.SecureTLSConfig to guarantee hardened settings at the listener level.

-	tlsConfig := &tls.Config{GetConfigForClient: servingCertController.GetConfigForClient}
+	tlsConfig := crypto.SecureTLSConfig(&tls.Config{GetConfigForClient: servingCertController.GetConfigForClient})
 	go startListening(server, tlsConfig, listenAddress, resultChannel)

Also verify the process has RBAC to read kube-system/extension-apiserver-authentication and that client-ca-file is the intended trust root for Prometheus' client cert chain in all target environments.

🧹 Nitpick comments (3)
pkg/cvo/metrics_test.go (1)

4-6: CN-based TLS auth tests are aligned with the new implementation; consider using httptest.NewRequest for a fully-formed request.
Current http.NewRequest("GET", "url-not-important", nil) is probably fine for this handler, but httptest.NewRequest("GET", "https://example.invalid/metrics", nil) is more idiomatic and avoids edge cases if the handler later reads URL/Host.

Also applies to: 1030-1088

pkg/cvo/metrics.go (2)

126-143: Rename disableAuth to disableAuthorization to avoid confusion (authn vs authz).
Right now createHttpServer(disableAuth bool) is called with metricsOptions.DisableAuthorization, but the parameter name reads like “disable authentication”.

-func createHttpServer(disableAuth bool) *http.Server {
-	if disableAuth {
+func createHttpServer(disableAuthorization bool) *http.Server {
+	if disableAuthorization {
 		handler := http.NewServeMux()
 		handler.Handle("/metrics", promhttp.Handler())
 		server := &http.Server{
 			Handler: handler,
 		}
 		return server
 	}

149-166: CN authorization is very strict; confirm the allowed identity set is complete for all supported deployments.
Hardcoding only system:serviceaccount:openshift-monitoring:prometheus-k8s may be correct, but it will break scraping if the client cert CN differs in hosted control planes / alternate Prometheus setups. Consider allowing a small allowlist (still static) to reduce fragility.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between d4cb3b0 and 920ad71.

⛔ Files ignored due to path filters (69)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/fsnotify/fsnotify/.cirrus.yml is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/.gitignore is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/.mailmap is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/CHANGELOG.md is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/CONTRIBUTING.md is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/LICENSE is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/README.md is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/backend_fen.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/backend_inotify.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/backend_kqueue.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/backend_other.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/backend_windows.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/fsnotify.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/darwin.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_darwin.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_dragonfly.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_freebsd.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_kqueue.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_linux.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_netbsd.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_openbsd.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_solaris.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_windows.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/freebsd.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/internal.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/unix.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/unix2.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/windows.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/shared.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/staticcheck.conf is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/system_bsd.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/fsnotify/fsnotify/system_darwin.go is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/.editorconfig is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/.gitignore is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/.travis.yml is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/AUTHORS is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/CHANGELOG.md is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/CONTRIBUTING.md is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/LICENSE is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/README.md is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/fen.go is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/fsnotify.go is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/inotify.go is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/inotify_poller.go is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/kqueue.go is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/open_mode_bsd.go is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/open_mode_darwin.go is excluded by !vendor/**, !**/vendor/**
  • vendor/gopkg.in/fsnotify.v1/windows.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/cert_key.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/client_ca.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/dynamic_cafile_content.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/dynamic_serving_content.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/dynamic_sni_content.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/interfaces.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/named_certificates.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/static_content.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/tlsconfig.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/union_content.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/util.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/client-go/tools/events/OWNERS is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/client-go/tools/events/doc.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/client-go/tools/events/event_broadcaster.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/client-go/tools/events/event_recorder.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/client-go/tools/events/fake.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/client-go/tools/events/helper.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/client-go/tools/events/interfaces.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (5)
  • go.mod (2 hunks)
  • install/0000_90_cluster-version-operator_02_servicemonitor.yaml (1 hunks)
  • pkg/cvo/metrics.go (5 hunks)
  • pkg/cvo/metrics_test.go (2 hunks)
  • pkg/start/start.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • install/0000_90_cluster-version-operator_02_servicemonitor.yaml
  • pkg/cvo/metrics_test.go
  • pkg/start/start.go
  • pkg/cvo/metrics.go
  • go.mod
🧬 Code graph analysis (1)
pkg/start/start.go (1)
pkg/cvo/metrics.go (2)
  • MetricsOptions (196-199)
  • RunMetrics (208-299)
🪛 ast-grep (0.40.0)
pkg/cvo/metrics.go

[warning] 245-245: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(missing-ssl-minversion-go)


[warning] 247-247: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{ClientAuth: tls.RequireAndVerifyClientCert}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(missing-ssl-minversion-go)


[warning] 273-273: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{GetConfigForClient: servingCertController.GetConfigForClient}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(missing-ssl-minversion-go)

🔇 Additional comments (5)
install/0000_90_cluster-version-operator_02_servicemonitor.yaml (1)

14-21: Verify Prometheus mounts metrics-client-certs at the referenced file paths (otherwise scraping will fail).
ServiceMonitor now expects client cert/key at /etc/prometheus/secrets/metrics-client-certs/{tls.crt,tls.key}; ensure the cluster-monitoring-managed Prometheus pod has that secret mounted and populated in all target topologies (self-managed + hosted).

go.mod (1)

29-29: Dependency changes look consistent with dynamiccertificates adoption; please verify module graph is tidy and conflict-free.
Run a go mod tidy-equivalent in CI and ensure no remaining imports of gopkg.in/fsnotify.v1, and no k8s.io/* version skew introduced by the new direct k8s.io/apiserver requirement.

Also applies to: 42-42

pkg/start/start.go (1)

353-358: Double-check HyperShift semantics: this disables both client-cert auth and CN authorization.
Wiring to o.HyperShift is clean, but please confirm the hosted-control-plane scraping model really expects unauthenticated metrics over TLS (vs. mTLS with different client identity).

pkg/cvo/metrics.go (2)

18-20: Good move switching to dynamiccertificates; verify k8s.io/apiserver integration matches the vendored K8s minor version.
This import-level change is appropriate for hot-reloading certs/CA; just ensure no k8s.io/* skew and that dynamiccertificates APIs used here match v0.34.1 expectations.


196-199: MetricsOptions is a clean API shape for callers.

@DavidHurta
Copy link
Contributor Author

/test e2e-hypershift
/test e2e-agnostic-ovn

@DavidHurta
Copy link
Contributor Author

/test all

@DavidHurta
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Dec 19, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/cvo/metrics.go (1)

149-166: Certificate-based authorization correctly implemented.

The CN validation logic is sound: it checks for the presence of peer certificates and validates that the Common Name matches the expected Prometheus service account. The HTTP status codes (401 for missing certificate, 403 for unauthorized CN) are appropriate.

Note: The CN "system:serviceaccount:openshift-monitoring:prometheus-k8s" is hardcoded. Consider whether this should be configurable via MetricsOptions if other clients might need access in the future.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 920ad71 and ff6d96e.

📒 Files selected for processing (4)
  • install/0000_90_cluster-version-operator_02_servicemonitor.yaml (1 hunks)
  • pkg/cvo/metrics.go (5 hunks)
  • pkg/cvo/metrics_test.go (2 hunks)
  • pkg/start/start.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/cvo/metrics_test.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • install/0000_90_cluster-version-operator_02_servicemonitor.yaml
  • pkg/start/start.go
  • pkg/cvo/metrics.go
🧬 Code graph analysis (1)
pkg/start/start.go (1)
pkg/cvo/metrics.go (2)
  • MetricsOptions (196-199)
  • RunMetrics (208-312)
🪛 ast-grep (0.40.0)
pkg/cvo/metrics.go

[warning] 256-256: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{ClientAuth: clientAuth}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(missing-ssl-minversion-go)


[warning] 283-286: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{
ClientAuth: clientAuth,
GetConfigForClient: servingCertController.GetConfigForClient,
}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(missing-ssl-minversion-go)

🔇 Additional comments (10)
install/0000_90_cluster-version-operator_02_servicemonitor.yaml (1)

14-21: LGTM! TLS client certificate authentication correctly configured.

The ServiceMonitor now uses TLS client certificates (certFile and keyFile) for authentication instead of bearer tokens, aligning with the mTLS implementation in pkg/cvo/metrics.go that validates the client certificate's CN.

pkg/cvo/metrics.go (8)

18-19: LGTM! Imports correctly updated for certificate-based approach.

The new imports for dynamiccertificates and kubernetes client replace the previous token-based authentication dependencies and support the dynamic certificate loading implementation.


126-143: LGTM! Simplified server creation aligns with TLS-based authentication.

The function signature correctly removes the authentication client dependency since authentication is now handled at the TLS layer via client certificates, with authorization delegated to the authHandler.


196-199: LGTM! Clean options struct for metrics configuration.

The MetricsOptions struct provides a clear, type-safe way to configure authentication and authorization behavior for the metrics server.


208-223: LGTM! Serving certificate controller correctly initialized.

The serving certificate controller is properly set up to watch and automatically reload the server's TLS certificate and key files. The RunOnce initialization before starting the background watcher ensures the files are loaded before the server starts.


225-249: LGTM! Client CA controller correctly set up when authentication is enabled.

The conditional creation and initialization of the client CA controller is appropriate—it's only needed when client certificate authentication is enabled. The controller watches the extension-apiserver-authentication ConfigMap in kube-system for CA changes, which aligns with Kubernetes' standard approach for extension API server authentication.


276-288: LGTM! Server and TLS configuration properly wired.

The HTTP server creation respects the DisableAuthorization setting, and the TLS configuration correctly uses GetConfigForClient to enable dynamic certificate updates. The ClientAuth setting in the final config ensures proper fallback behavior if the callback fails.


290-311: LGTM! Server lifecycle properly managed with graceful shutdown.

The implementation correctly handles the run/shutdown contexts, waits for server termination, performs graceful shutdown on context cancellation, and properly collects results from the server goroutine. Error handling and logging are appropriate.


252-267: This concern is not valid. crypto.SecureTLSConfig from github.com/openshift/library-go/pkg/crypto automatically sets MinVersion to the cluster's default TLS version (via DefaultTLSVersion()) when MinVersion is not explicitly set in the passed tls.Config. The code in metrics.go correctly relies on this behavior and does not need explicit MinVersion configuration.

Likely an incorrect or invalid review comment.

pkg/start/start.go (1)

353-357: LGTM! MetricsOptions correctly configured for HyperShift mode.

The creation of the MetricsOptions struct with both DisableAuthentication and DisableAuthorization set to o.HyperShift is appropriate. This ensures that in HyperShift deployments, both authentication and authorization are disabled together, while in standard deployments, both are enabled.

@DavidHurta
Copy link
Contributor Author

/retest

1 similar comment
@DavidHurta
Copy link
Contributor Author

/retest

@DavidHurta
Copy link
Contributor Author

/test e2e-hypershift-conformance
/test e2e-hypershift

@DavidHurta
Copy link
Contributor Author

/test all

@DavidHurta
Copy link
Contributor Author

/test all

@DavidHurta
Copy link
Contributor Author

/test all

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 27, 2026
DavidHurta and others added 20 commits January 27, 2026 19:34
…updates

This commit's goal is to prepare the existing code for mTLS support.

In OpenShift, core operators SHOULD require authentication, and they
SHOULD support TLS client certificate authentication [1]. They also
SHOULD support local authorization and SHOULD allow the well-known
metrics scraping identity [1]. To achieve this, an operator must be able
to verify a client's certificate. To do this, the certificate can be
verified using the certificate authority (CA) bundle located in a
ConfigMap in the kube-system namespace [2].

This would entail an implementation of a new controller to watch the
ConfigMap for changes. To avoid such implementation to avoid
potential bugs and future maintenance, my goal is to utilize the
`k8s.io/apiserver/pkg/server/dynamiccertificates` package for this goal
as the package provides a functionality for this specific use case.

While doing so, we can also rework the existing, a bit complex,
implementation and utilize the package for existing use cases as well
to simplify the logic and use an existing, well-tested library.

[1]: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#metrics
[2]: https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/#exposing-metrics-for-prometheus

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
In OpenShift, core operators SHOULD require authentication and they
SHOULD support TLS client certificate authentication [1]. They also
SHOULD support local authorization and SHOULD allow the well-known
metrics scraping identity [1]. To achieve this, an operator must be able
to verify a client's certificate. To do this, the certificate can be
verified using the certificate authority (CA) bundle located at the
client-ca-file key of the kube-system/extension-apiserver-authentication
ConfigMap [2].

[1]: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#metrics
[2]: https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/#exposing-metrics-for-prometheus

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
In OpenShift, core operators SHOULD support local authorization and
SHOULD allow the well-known metrics scraping identity
(system:serviceaccount:openshift-monitoring:prometheus-k8s) to access
the /metrics endpoint. They MAY support delegated authorization check
via SubjectAccessReviews. [1]

The well-known metrics scraping identity's client certificate is issued
for the system:serviceaccount:openshift-monitoring:prometheus-k8s
Common Name (CN) and signed by the kubernetes.io/kube-apiserver-client
signer. [2]

Thus, the commit utilizes this fact to check the client's certificate
for this specific CN value. This is also done by the hardcodedauthorizer
package utilized by other OpenShift operators for the metrics
endpoint [3].

We could utilize the existing bearer token authorization as a fallback.
However, I would like to minimize the attack surface. Especially for
security things that we are implementing and testing, rather than
importing from well-established modules.

The commit implements a user information extraction from a
certificate to minimize the needed dependencies.

[1]: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#metrics
[2]: https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/#exposing-metrics-for-prometheus
[3]: https://pkg.go.dev/github.com/openshift/library-go/pkg/authorization/hardcodedauthorizer

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The `o.HyperShift` option is not available in older release branches.
In newer branches, the option can be utilized.
In HyperShift, the CVO currently needs to have disabled both
authorization and authentication. Ensure the aspects are disabled so as
not break HyperShift.

However, in the future, the authentication will be enabled using mTLS
and a mounted CA bundle file. Thus, authentication needs to be
configurable.

Authorization needs to be configurable as well because HyperShift
allows a custom monitoring stack to scrape hosted control plane
components. In the future in HyperShift, authentication of the metrics
endpoint of the CVO will be enforced; however, the authorization will be
disabled. This commit prepares the code for these changes.
The previous approach of checking the client certificate on every HTTP
request was redundant. The certificate remains constant for the entire
session duration, which can span days with TLS session tickets and
resumptions.

Optimize authorization by verifying the CN once during the TLS handshake
instead.

This moves the authorization from the application onto the TLS layer.
This can result in harder debugging due to the TLS layer failing
without providing detailed information. For example:

```
curl: (56) OpenSSL SSL_read: error:0A000412:SSL routines::sslv3 alert bad certificate, errno 0
command terminated with exit code 56
```

However, it also makes the current authorization mechanism more
explicit. The previous mechanism could infer that every HTTP request
was being authorized, but that was not the case in reality.

Some alternatives may be:
- Disable session tickets and the resumption support to force more
frequent authorizations.
- Verify the client's certificate in the application layer, but add
useful logic, such as checking for the certificate expiration date per
every HTTP request.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This reverts commit 290ee8b.

The used library does not allow to modify the returned
TLS error message to return `access_denied` on authorization failures.
The logic would proceed to return `bad_certificate` on authorization
failures.

Revert to HTTP authorization to be able to return a user-friendly
return values, which are correct.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@DavidHurta
Copy link
Contributor Author

/test ci/prow/unit

A flake?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 27, 2026

@DavidHurta: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-agnostic-operator
/test e2e-agnostic-operator-techpreview
/test e2e-agnostic-ovn
/test e2e-agnostic-ovn-upgrade-into-change
/test e2e-agnostic-ovn-upgrade-into-change-techpreview
/test e2e-agnostic-ovn-upgrade-out-of-change
/test e2e-agnostic-ovn-upgrade-out-of-change-techpreview
/test e2e-aws-ovn-techpreview
/test e2e-hypershift
/test e2e-hypershift-conformance
/test gofmt
/test images
/test lint
/test okd-scos-images
/test unit
/test verify-deps
/test verify-update
/test verify-yaml

The following commands are available to trigger optional jobs:

/test e2e-agnostic-operator-devpreview
/test e2e-extended-tests
/test okd-scos-e2e-aws-ovn

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator
pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-ovn
pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-ovn-upgrade-into-change
pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-ovn-upgrade-out-of-change
pull-ci-openshift-cluster-version-operator-main-e2e-aws-ovn-techpreview
pull-ci-openshift-cluster-version-operator-main-e2e-hypershift
pull-ci-openshift-cluster-version-operator-main-e2e-hypershift-conformance
pull-ci-openshift-cluster-version-operator-main-gofmt
pull-ci-openshift-cluster-version-operator-main-images
pull-ci-openshift-cluster-version-operator-main-lint
pull-ci-openshift-cluster-version-operator-main-okd-scos-images
pull-ci-openshift-cluster-version-operator-main-unit
pull-ci-openshift-cluster-version-operator-main-verify-deps
pull-ci-openshift-cluster-version-operator-main-verify-update
pull-ci-openshift-cluster-version-operator-main-verify-yaml
Details

In response to this:

/test ci/prow/unit

A flake?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@DavidHurta
Copy link
Contributor Author

/test unit

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 27, 2026

@DavidHurta: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hongkailiu
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DavidHurta, hongkailiu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [DavidHurta,hongkailiu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants