OCPBUGS-86008: Gate Route watch on management cluster capability#8484
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Skipping CI for Draft Pull Request. |
📝 WalkthroughWalkthroughThe operator now does a one-time management-cluster discovery at startup to detect capabilities and stores them in HostedClusterConfigOperatorConfig.ManagementClusterCapabilities. The resources controller's Setup checks that capability checker and registers a watch for OpenShift Route resources only when the Route capability is detected; startup fails if discovery or capability detection returns an error. Sequence Diagram(s)sequenceDiagram
participant Cmd
participant DiscoveryClient
participant ManagementClusterAPI
participant CapabilitiesDetector
participant ResourcesController
participant RouteAPI
Cmd->>DiscoveryClient: create discovery client from kubeconfig
DiscoveryClient->>ManagementClusterAPI: query API resources
ManagementClusterAPI-->>DiscoveryClient: API resource list
DiscoveryClient->>CapabilitiesDetector: provide resource list
CapabilitiesDetector-->>Cmd: detected capabilities
Cmd->>ResourcesController: provide ManagementClusterCapabilities
ResourcesController->>CapabilitiesDetector: check CapabilityRoute
alt Route capability present
ResourcesController->>RouteAPI: register watch for Route resources
RouteAPI-->>ResourcesController: watch started
else Route capability absent
ResourcesController-->>ResourcesController: skip Route watch
end
Suggested reviewers
🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
daa5044 to
94461ec
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8484 +/- ##
==========================================
- Coverage 40.40% 40.40% -0.01%
==========================================
Files 755 755
Lines 93235 93249 +14
==========================================
Hits 37675 37675
- Misses 52858 52872 +14
Partials 2702 2702
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
94461ec to
f54aa29
Compare
|
@smrtrfszm: This pull request references Jira Issue OCPBUGS-86008, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@smrtrfszm: This pull request references Jira Issue OCPBUGS-86008, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
f54aa29 to
07098a5
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go`:
- Around line 313-318: The reconcile logic still performs a Get on routev1.Route
even when the management cluster lacks the Route API; update the code to gate
that Route-dependent logic by checking
opts.ManagementClusterCapabilities.Has(capabilities.CapabilityRoute) before
attempting any route operations. Add the capability check either at the caller
that invokes reconcileMetricsForwarder or at the start of
reconcileMetricsForwarder itself so that when CapabilityRoute is false it
returns early (or skips the Route.Get/Route-specific flows) and does not attempt
to interact with routev1.Route.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 1d5b9f2e-167f-44f7-93d1-a03784aaf380
📒 Files selected for processing (3)
control-plane-operator/hostedclusterconfigoperator/cmd.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.gocontrol-plane-operator/hostedclusterconfigoperator/operator/config.go
|
/lgtm |
|
Scheduling tests matching the |
The hosted-cluster-config-operator unconditionally watches route.openshift.io/v1 Routes against the management cluster to react to hostname changes on the metrics-proxy Route. On management clusters that do not expose the Routes API (e.g. non-OpenShift management clusters) this watch fails during controller setup and prevents HCCO from starting. Detect the management cluster Route capability using the existing capabilities.DetectManagementClusterCapabilities helper and only register the watch when route.openshift.io is registered. This mirrors the pattern already used in other parts of the code.
07098a5 to
1f4af14
Compare
|
/lgtm |
|
/retest |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
|
/retest |
1 similar comment
|
/retest |
|
/verified by @smrtrfszm Before this change on a vanilla kubernetes, HCCO was in a crash loop with error message: After this change on a vanilla kubernetes, HCCO is 1/1 ready: |
|
@smrtrfszm: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@coderabbitai resume |
✅ Actions performedReviews resumed. |
|
@smrtrfszm: This pull request references Jira Issue OCPBUGS-86008, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@control-plane-operator/hostedclusterconfigoperator/cmd.go`:
- Around line 228-238: Add unit tests covering the startup capability-detection
path: write tests that exercise discovery.NewDiscoveryClientForConfig and
capabilities.DetectManagementClusterCapabilities wiring in cmd.go so you verify
(1) successful discovery returns expected mgmtClusterCaps and those capabilities
are injected into controller setup inputs, (2) discovery client creation failure
returns the expected error, and (3) capability detection failure returns the
expected error. Use a fake discovery client (or mock
NewDiscoveryClientForConfig) to simulate success and failure paths and assert
the returned error messages and that the mgmtClusterCaps value is passed into
the controllers' setup/config functions (the code that consumes mgmtClusterCaps
during controller initialization). Ensure tests reference the functions
NewDiscoveryClientForConfig, DetectManagementClusterCapabilities and the
controller setup entrypoints to validate propagation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 4cc20e08-83f7-4c1f-b004-e24b4f32f511
📒 Files selected for processing (3)
control-plane-operator/hostedclusterconfigoperator/cmd.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.gocontrol-plane-operator/hostedclusterconfigoperator/operator/config.go
| // Detect the management cluster capabilities once at startup so controllers can | ||
| // gate optional behavior (e.g. watching route.openshift.io Routes) without each | ||
| // having to build its own discovery client. | ||
| cpDiscoveryClient, err := discovery.NewDiscoveryClientForConfig(cpConfig) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to create management cluster discovery client: %w", err) | ||
| } | ||
| mgmtClusterCaps, err := capabilities.DetectManagementClusterCapabilities(cpDiscoveryClient) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to detect management cluster capabilities: %w", err) | ||
| } |
There was a problem hiding this comment.
Add unit coverage for capability detection and config wiring.
Line 228 and Line 341 introduce startup-time capability detection and behavior-driving config injection, but this path has no accompanying tests in the PR. Please add unit tests for at least: discovery success, discovery failure, and capability propagation into controller setup inputs.
As per coding guidelines: **/*.go: Always include unit tests when creating new functions or modifying existing ones.
Also applies to: 320-342
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@control-plane-operator/hostedclusterconfigoperator/cmd.go` around lines 228 -
238, Add unit tests covering the startup capability-detection path: write tests
that exercise discovery.NewDiscoveryClientForConfig and
capabilities.DetectManagementClusterCapabilities wiring in cmd.go so you verify
(1) successful discovery returns expected mgmtClusterCaps and those capabilities
are injected into controller setup inputs, (2) discovery client creation failure
returns the expected error, and (3) capability detection failure returns the
expected error. Use a fake discovery client (or mock
NewDiscoveryClientForConfig) to simulate success and failure paths and assert
the returned error messages and that the mgmtClusterCaps value is passed into
the controllers' setup/config functions (the code that consumes mgmtClusterCaps
during controller initialization). Ensure tests reference the functions
NewDiscoveryClientForConfig, DetectManagementClusterCapabilities and the
controller setup entrypoints to validate propagation.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cblecker, smrtrfszm The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/override codecov/patch |
|
/override codecov/project |
|
@csrwng: Overrode contexts on behalf of csrwng: codecov/patch DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@csrwng: Overrode contexts on behalf of csrwng: codecov/project DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@smrtrfszm: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Now I have all the evidence needed. Let me compile the final report. Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryBoth Codecov checks fail because none of the 38 new/modified lines introduced by this PR are exercised by any unit test. The Root CauseThe root cause is straightforward: the two files modified by this PR lack unit test coverage for the changed lines.
The capability detection logic itself ( Recommendations
Evidence
|
25817d4
into
openshift:main
|
@smrtrfszm: Jira Issue Verification Checks: Jira Issue OCPBUGS-86008 Jira Issue OCPBUGS-86008 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Fix included in release 5.0.0-0.nightly-2026-06-06-100407 |
|
/cherry-pick release-4.22 |
|
@smrtrfszm: new pull request created: #8692 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What this PR does / why we need it:
The hosted-cluster-config-operator unconditionally watches route.openshift.io/v1 Routes against the management cluster to react to hostname changes on the metrics-proxy Route. On management clusters that do not expose the Routes API (e.g. non-OpenShift management clusters) this watch fails during controller setup and prevents HCCO from starting.
Detect the management cluster Route capability using the existing
capabilities.DetectManagementClusterCapabilitieshelper and only register the watch when route.openshift.io is registered. This mirrors the pattern already used in other parts of the code.Which issue(s) this PR fixes:
Fixes #OCPBUGS-86008
Special notes for your reviewer:
Checklist:
Summary by CodeRabbit
New Features
Bug Fixes