OCPBUGS-54790: Move packageserver PDB from guest cluster to management cluster #8459

Open
dhgautam99 wants to merge 2 commits into openshift:main from dhgautam99:remove-packageserver-pdb-from-guest

Conversation

@dhgautam99 dhgautam99 commented May 7, 2026

What this PR does / why we need it:

The packageserver PodDisruptionBudget was being created in the guest cluster's
openshift-operator-lifecycle-manager namespace by CVO. However, packageserver
pods run on the management cluster in the clusters-<hosted-cluster> namespace,
making the guest cluster PDB ineffective.
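
The reason the guest-cluster PDB has no effect is that a PodDisruptionBudget is namespaced: its selector only matches pods in its own namespace, on its own cluster. A minimal sketch of that rule follows. The `Pod` and `PDB` types and the `coveredPods` helper are simplified, hypothetical stand-ins (not the Kubernetes API types), and the `app: packageserver` label and `clusters-example` namespace are assumptions for illustration.

```go
package main

import "fmt"

// Pod and PDB are simplified, hypothetical stand-ins for the Kubernetes types.
type Pod struct {
	Cluster, Namespace string
	Labels             map[string]string
}

type PDB struct {
	Cluster, Namespace string
	Selector           map[string]string
}

// coveredPods returns the pods a PDB can protect: same cluster, same
// namespace, and labels that satisfy every selector entry.
func coveredPods(pdb PDB, pods []Pod) []Pod {
	var out []Pod
	for _, p := range pods {
		if p.Cluster != pdb.Cluster || p.Namespace != pdb.Namespace {
			continue
		}
		match := true
		for k, v := range pdb.Selector {
			if got, ok := p.Labels[k]; !ok || got != v {
				match = false
				break
			}
		}
		if match {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	sel := map[string]string{"app": "packageserver"} // assumed label
	pods := []Pod{
		{Cluster: "management", Namespace: "clusters-example", Labels: sel},
		{Cluster: "management", Namespace: "clusters-example", Labels: sel},
	}
	guest := PDB{Cluster: "guest", Namespace: "openshift-operator-lifecycle-manager", Selector: sel}
	mgmt := PDB{Cluster: "management", Namespace: "clusters-example", Selector: sel}

	fmt.Println("guest PDB covers:", len(coveredPods(guest, pods)))
	fmt.Println("management PDB covers:", len(coveredPods(mgmt, pods)))
}
```

In this model the guest PDB covers no pods, while a PDB placed next to the pods covers both.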

This PR:

  • Prevents CVO from creating the packageserver PDB in the guest cluster (via manifestsToOmit)
  • Cleans up the orphaned PDB on existing clusters during upgrade (via resourcesToRemove)
  • Creates the PDB in the management cluster namespace using the cpov2 framework
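
As a rough illustration of the first step, the sketch below filters a list of payload file names against an omit list. It assumes `manifestsToOmit` holds manifest file names (the name `0000_50_olm_00-packageserver.pdb.yaml` appears in the review thread on this PR); `filterPayload` and the second file name are hypothetical and not part of the hypershift code.

```go
package main

import "fmt"

// filterPayload is a hypothetical helper. It returns the payload file names
// that are not listed in omit, preserving their original order.
func filterPayload(payload, omit []string) []string {
	skip := make(map[string]struct{}, len(omit))
	for _, name := range omit {
		skip[name] = struct{}{}
	}
	kept := make([]string, 0, len(payload))
	for _, name := range payload {
		if _, found := skip[name]; !found {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	// The first name is taken from the review thread; the second is made up.
	payload := []string{
		"0000_50_olm_00-packageserver.pdb.yaml",
		"0000_50_example-still-applied.yaml",
	}
	manifestsToOmit := []string{"0000_50_olm_00-packageserver.pdb.yaml"}

	for _, name := range filterPayload(payload, manifestsToOmit) {
		fmt.Println("apply:", name)
	}
}
```

Only the made-up manifest is left to apply, so the PDB would never reach a newly created guest cluster.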

Which issue(s) this PR fixes:

Fixes OCPBUGS-54790

Special notes for your reviewer:

The PDB cleanup applies to all platforms (both IBM/PowerVS and default) since
packageserver runs on the management cluster regardless of platform.
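
A minimal sketch of what "applies to all platforms" means in practice: every branch of the platform switch returns the PDB. The types below are simplified stand-ins for `hyperv1.PlatformType` and `client.Object`, and the extra `ConfigMap` entry in the default branch is a made-up placeholder, not an entry from the real list.

```go
package main

import "fmt"

// PlatformType and ResourceRef are simplified, hypothetical stand-ins.
type PlatformType string

const (
	IBMCloud PlatformType = "IBMCloud"
	PowerVS  PlatformType = "PowerVS"
	AWS      PlatformType = "AWS"
)

type ResourceRef struct {
	Kind, Namespace, Name string
}

// packageserverPDB uses the name and namespace quoted in the review thread.
var packageserverPDB = ResourceRef{
	Kind:      "PodDisruptionBudget",
	Namespace: "openshift-operator-lifecycle-manager",
	Name:      "packageserver-pdb",
}

// resourcesToRemove returns the cleanup list for a platform. The PDB is
// present in both branches; the other default entry is a placeholder.
func resourcesToRemove(p PlatformType) []ResourceRef {
	switch p {
	case IBMCloud, PowerVS:
		return []ResourceRef{packageserverPDB}
	default:
		return []ResourceRef{
			packageserverPDB,
			{Kind: "ConfigMap", Namespace: "example", Name: "hypothetical-entry"},
		}
	}
}

// contains reports whether want is present in refs.
func contains(refs []ResourceRef, want ResourceRef) bool {
	for _, r := range refs {
		if r == want {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []PlatformType{IBMCloud, PowerVS, AWS} {
		fmt.Println(p, "removes PDB:", contains(resourcesToRemove(p), packageserverPDB))
	}
}
```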

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Improvements

    • Added a PodDisruptionBudget for the packageserver to improve availability during disruptions.
  • Tests

    • Added a unit test validating the packageserver component is created and named "packageserver".

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot added the jira/valid-reference (this PR references a valid Jira ticket of any type) and jira/valid-bug (the referenced Jira bug is valid for the branch this PR is targeting) labels May 7, 2026
@openshift-ci-robot

@dhgautam99: This pull request references Jira Issue OCPBUGS-54790, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 7, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Contributor

coderabbitai Bot commented May 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

A PodDisruptionBudget manifest (packageserver-pdb, minAvailable: 1) was added. The packageserver component now adapts that PDB manifest via AdaptPodDisruptionBudget, and a unit test for NewComponent was added. The CVO deployment code imports policy/v1 and includes packageserver-pdb in resourcesToRemove for IBMCloudPlatform/PowerVSPlatform and the default cleanup list.
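
The walkthrough does not show what the adapter does with the manifest. Purely as an illustration, an adapter of this kind could choose the budget field from the control plane's availability policy. Everything in the sketch below is an assumption: the `PDBSpec` type, the `adaptPDB` function, and the policy-to-field mapping are hypothetical and are not taken from the hypershift implementation of `AdaptPodDisruptionBudget`.

```go
package main

import "fmt"

// AvailabilityPolicy mirrors the idea of a controller availability policy.
type AvailabilityPolicy string

const (
	SingleReplica   AvailabilityPolicy = "SingleReplica"
	HighlyAvailable AvailabilityPolicy = "HighlyAvailable"
)

// PDBSpec is a simplified stand-in; exactly one field is set after adapting.
type PDBSpec struct {
	MinAvailable   *int
	MaxUnavailable *int
}

// adaptPDB is hypothetical. Assumed mapping: a highly available control
// plane tolerates one unavailable pod, anything else keeps minAvailable: 1.
func adaptPDB(spec *PDBSpec, policy AvailabilityPolicy) {
	one := 1
	switch policy {
	case HighlyAvailable:
		spec.MinAvailable, spec.MaxUnavailable = nil, &one
	default:
		spec.MinAvailable, spec.MaxUnavailable = &one, nil
	}
}

func main() {
	for _, policy := range []AvailabilityPolicy{SingleReplica, HighlyAvailable} {
		var spec PDBSpec
		adaptPDB(&spec, policy)
		if spec.MinAvailable != nil {
			fmt.Println(policy, "-> minAvailable:", *spec.MinAvailable)
		} else {
			fmt.Println(policy, "-> maxUnavailable:", *spec.MaxUnavailable)
		}
	}
}
```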

Sequence Diagram(s)

sequenceDiagram
    participant Operator as HostedControlPlane Operator
    participant Component as packageserver Component
    participant CVO as CVO / deployment.go
    participant Kube as Kubernetes API

    Operator->>Component: NewComponent()
    Component->>Component: WithManifestAdapter(pdb.yaml -> AdaptPodDisruptionBudget)
    Operator->>CVO: prepare resourcesToRemove
    CVO->>CVO: include packageserver-pdb in resourcesToRemove
    CVO->>Kube: generate payload (may omit pdb in some payloads)
    Operator->>Kube: apply manifests (packageserver-pdb present/ensured)
🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main objective of the PR: moving the packageserver PodDisruptionBudget from the guest cluster to the management cluster, which is the core change across all modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | The PR adds a standard Go unit test, not Ginkgo tests. The custom check applies only to Ginkgo tests (It(), Describe(), Context(), When()), which are not present in this PR. |
| Test Structure And Quality | ✅ Passed | Test follows codebase patterns: standard Go testing, single responsibility, proper t.Parallel(), Gomega assertions, no resource cleanup needed. Consistent with similar component tests. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PDB manifest is topology-aware: minAvailable: 1 with no affinity constraints; AdaptPodDisruptionBudget() function adjusts spec per ControllerAvailabilityPolicy. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR adds no Ginkgo e2e tests; only a standard unit test (testing.T) with Gomega assertions that has no IPv4 assumptions or external connectivity requirements. |
| No-Weak-Crypto | ✅ Passed | No weak cryptography patterns (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB, custom crypto, or insecure secret comparisons) were found in any changed files. |
| Container-Privileges | ✅ Passed | No privileged container configurations, host access settings (hostPID/hostNetwork/hostIPC), SYS_ADMIN capabilities, or allowPrivilegeEscalation: true found in any modified files. |
| No-Sensitive-Data-In-Logs | ✅ Passed | No sensitive data logging found. Added files contain only Kubernetes manifests and component configuration with no logging statements exposing credentials or PII. |

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


@openshift-ci Bot added the area/control-plane-operator label (the PR includes changes for the control plane operator - in an OCP release) and removed the do-not-merge/needs-area label May 7, 2026
@openshift-ci-robot

@dhgautam99: This pull request references Jira Issue OCPBUGS-54790, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
@codecov

codecov Bot commented May 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.70%. Comparing base (2f52041) to head (1185ea9).
⚠️ Report is 61 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8459      +/-   ##
==========================================
+ Coverage   40.61%   40.70%   +0.08%     
==========================================
  Files         755      755              
  Lines       93227    93374     +147     
==========================================
+ Hits        37864    38004     +140     
+ Misses      52640    52636       -4     
- Partials     2723     2734      +11     
Files with missing lines Coverage Δ
...ontrollers/hostedcontrolplane/v2/cvo/deployment.go 41.17% <100.00%> (+0.58%) ⬆️
...stedcontrolplane/v2/olm/packageserver/component.go 73.91% <100.00%> (+73.91%) ⬆️

... and 7 files with indirect coverage changes

Flag Coverage Δ
cmd-support 34.70% <ø> (ø)
cpo-hostedcontrolplane 41.91% <100.00%> (+0.13%) ⬆️
cpo-other 41.39% <ø> (+0.32%) ⬆️
hypershift-operator 50.81% <ø> (+0.05%) ⬆️
other 31.61% <ø> (+0.03%) ⬆️

@dhgautam99 dhgautam99 force-pushed the remove-packageserver-pdb-from-guest branch from 46bdd14 to ffe96be Compare May 7, 2026 13:51
@dhgautam99 dhgautam99 marked this pull request as ready for review May 7, 2026 14:10
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2026
@openshift-ci openshift-ci Bot requested review from cblecker and enxebre May 7, 2026 14:12
@cblecker
Member

cblecker commented May 7, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 7, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@cwbotbot

cwbotbot commented May 7, 2026

Test Results

e2e-aws

e2e-aks

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2052414791430967296 | Cost: $3.95796535 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2052414791355469824 | Cost: $3.2685650000000006 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@dhgautam99
Author

/test e2e-aws
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2052722353925787648 | Cost: $2.8874605499999997 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@dhgautam99
Author

/test e2e-aws

"0000_50_olm_07-collect-profiles.cronjob.yaml",
"0000_50_olm_08-catalog-operator.deployment.ibm-cloud-managed.yaml",
"0000_50_olm_08-catalog-operator.deployment.yaml",
"0000_50_olm_00-packageserver.pdb.yaml",
Member

Author

done

@dhgautam99 dhgautam99 force-pushed the remove-packageserver-pdb-from-guest branch from ffe96be to 86fec73 Compare May 20, 2026 06:40
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 20, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

New changes are detected. LGTM label has been removed.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component.go (1)

9-9: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Test name should follow "When...it should..." pattern.

As per coding guidelines, unit tests should use the format "When ... it should ..." for test case descriptions.

📝 Suggested fix
-func TestNewComponent(t *testing.T) {
+func TestNewComponent_WhenCreatingComponent_ItShouldReturnValidPackageserverComponent(t *testing.T) {

As per coding guidelines: "Always use 'When ... it should ...' format for describing test cases when creating unit tests".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component.go`
at line 9, Test descriptions in the OLM package server tests must follow the
"When ... it should ..." pattern; locate any test declarations (e.g.,
t.Run("..."), It("..."), or DescribeTable entries) related to the PackageServer
component and rename their string descriptions to the form "When <condition> it
should <expected outcome>" (for example change "validates X" to "When X it
should validate Y"). Ensure all changed descriptions remain clear and update any
related test helpers or snapshots that assert on the test name.
🧹 Nitpick comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component_test.go (1)

9-9: ⚡ Quick win

Rename test to follow Gherkin "When... it should..." pattern.

The test name should follow the Gherkin syntax pattern as specified in the coding guidelines. Consider renaming to something like TestNewComponent_WhenCalled_ItShouldReturnComponentNamedPackageserver to align with project standards.

As per coding guidelines, "Always use 'When ... it should ...' format for describing test cases when creating unit tests."

♻️ Proposed fix for test naming
-func TestNewComponent(t *testing.T) {
+func TestNewComponent_WhenCalled_ItShouldReturnComponentNamedPackageserver(t *testing.T) {
 	t.Parallel()
 	g := NewWithT(t)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component_test.go`
at line 9, Rename the unit test function TestNewComponent to follow the Gherkin
"When... it should..." pattern; update the function name to something like
TestNewComponent_WhenCalled_ItShouldReturnComponentNamedPackageserver and adjust
any references or test runners accordingly so the test still compiles and
executes (look for the TestNewComponent function in component_test.go and rename
it consistently).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3c0b9051-33db-4f0b-b4b2-c45a059fff33

📥 Commits

Reviewing files that changed from the base of the PR and between ffe96be and 86fec73.

⛔ Files ignored due to path filters (15)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/AROSwift/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/GCP/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/IBMCloud/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/AROSwift/zz_fixture_TestControlPlaneComponents_packageserver_controlplanecomponent.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/AROSwift/zz_fixture_TestControlPlaneComponents_packageserver_pdb_poddisruptionbudget.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/GCP/zz_fixture_TestControlPlaneComponents_packageserver_controlplanecomponent.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/GCP/zz_fixture_TestControlPlaneComponents_packageserver_pdb_poddisruptionbudget.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/IBMCloud/zz_fixture_TestControlPlaneComponents_packageserver_controlplanecomponent.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/IBMCloud/zz_fixture_TestControlPlaneComponents_packageserver_pdb_poddisruptionbudget.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_packageserver_controlplanecomponent.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_packageserver_pdb_poddisruptionbudget.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/zz_fixture_TestControlPlaneComponents_packageserver_controlplanecomponent.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/zz_fixture_TestControlPlaneComponents_packageserver_pdb_poddisruptionbudget.yaml is excluded by !**/testdata/**
📒 Files selected for processing (4)
  • control-plane-operator/controllers/hostedcontrolplane/v2/assets/packageserver/pdb.yaml
  • control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component_test.go
✅ Files skipped from review due to trivial changes (1)
  • control-plane-operator/controllers/hostedcontrolplane/v2/assets/packageserver/pdb.yaml

switch platformType {
case hyperv1.IBMCloudPlatform, hyperv1.PowerVSPlatform:
return []client.Object{
&policyv1.PodDisruptionBudget{ObjectMeta: metav1.ObjectMeta{Name: "packageserver-pdb", Namespace: "openshift-operator-lifecycle-manager"}},
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for packageserver PDB manifest in the codebase and verify manifestsToOmit contains it.

# Search for packageserver PDB manifest file
echo "=== Searching for packageserver PDB manifest ==="
fd -e yaml -e yml . | rg -i 'packageserver.*pdb|pdb.*packageserver'

# Check if it's referenced in manifestsToOmit
echo -e "\n=== Checking manifestsToOmit array ==="
rg -A 60 'manifestsToOmit.*=.*\[\]string' control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go | rg -i 'packageserver.*pdb'

# Search for the actual manifest name in CVO payload or OLM manifests
echo -e "\n=== Searching for OLM packageserver manifests ==="
rg -l '0000_50_olm.*packageserver' --type yaml

echo -e "\n=== Checking if PDB manifest name follows OLM pattern ==="
rg 'kind:\s*PodDisruptionBudget' --type yaml -A 5 -B 5 | rg -i packageserver

Repository: openshift/hypershift

Length of output: 17195


🏁 Script executed:

cat -n control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go | sed -n '140,200p'

Repository: openshift/hypershift

Length of output: 3942


🏁 Script executed:

cat -n control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go | sed -n '265,290p'

Repository: openshift/hypershift

Length of output: 2747


🏁 Script executed:

# Search for packageserver PDB manifest and check if it exists with a standard OLM naming pattern
cd control-plane-operator/controllers/hostedcontrolplane/v2/assets/packageserver
ls -la
cat pdb.yaml

Repository: openshift/hypershift

Length of output: 464


🏁 Script executed:

# Check how packageserver PDB is loaded and deployed in deployment.go
# Look for references to pdb.yaml or how assets are loaded
rg -A 10 -B 5 'packageserver.*pdb|pdb.*packageserver' control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go

Repository: openshift/hypershift

Length of output: 2425


🏁 Script executed:

# Search for how packageserver component is structured and deployed
# Check the packageserver component code to understand asset handling
rg -l 'packageserver' control-plane-operator/controllers/hostedcontrolplane/v2/ --type go | head -5

Repository: openshift/hypershift

Length of output: 502


🏁 Script executed:

# Check the packageserver component to see how it deploys assets
cat -n control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component.go | head -50

Repository: openshift/hypershift

Length of output: 1782


🏁 Script executed:

# The component.go shows pdb.yaml is deployed by hypershift packageserver component.
# Now check if there's a packageserver PDB from CVO payload that needs to be omitted.
# Search for OLM manifests with packageserver and pdb pattern

# First, let's check what the CVO payload looks like in test fixtures
rg 'kind: PodDisruptionBudget' control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/ -A 5 -B 5 | grep -A 5 -B 5 packageserver

Repository: openshift/hypershift

Length of output: 11022


🏁 Script executed:

# Check the CVO deployment script to see what manifest filename the packageserver PDB has
# Extract more context from one of the CVO deployment test files
grep -n "0000_01_cleanup.yaml" control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml | head -1

Repository: openshift/hypershift

Length of output: 142


🏁 Script executed:

# Search for the actual packageserver PDB manifest file in the CVO payload
# Check what manifests are actually in the payload vs what's in cleanup
rg 'packageserver.*pdb|0000_.*packageserver.*pdb' control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml | head -20

Repository: openshift/hypershift

Length of output: 100


🏁 Script executed:

# Look at the preparePayloadScript function to understand how manifests are processed
# Search for the actual manifest list operations
rg -A 200 'func preparePayloadScript' control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go | grep -i 'packageserver\|manifest' | head -20

Repository: openshift/hypershift

Length of output: 1654


🏁 Script executed:

# Check what the actual cleanup script contains in the test fixtures
sed -n '190,250p' control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml

Repository: openshift/hypershift

Length of output: 2468


🏁 Script executed:

# Check if there's an actual packageserver PDB manifest in the CVO release that should be omitted
# Look at the preparePayloadScript more carefully to see where release-manifests come from
sed -n '198,280p' control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go

Repository: openshift/hypershift

Length of output: 4453


🏁 Script executed:

# Look for any comments or references to packageserver PDB manifest naming
# and check if there's a specific manifest file for it
rg '0000_.*packageserver' --type yaml control-plane-operator/controllers/hostedcontrolplane/v2/

Repository: openshift/hypershift

Length of output: 46


🏁 Script executed:

# Check if there's a packageserver PDB manifest in the CVO payload by searching git history or release content
# First, let's check what files are being copied into the payload
rg 'release-manifests' control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go -B 5 -A 5 | head -30

Repository: openshift/hypershift

Length of output: 1720


🏁 Script executed:

# Check if there's any packageserver PDB manifest from OpenShift release in the test data
# Look for what manifests are in the release-manifests directory
find control-plane-operator/controllers/hostedcontrolplane/testdata -name "*.yaml" | xargs grep -l 'kind: PodDisruptionBudget' | xargs grep -l 'packageserver'

Repository: openshift/hypershift

Length of output: 2598


🏁 Script executed:

# Final verification: Check if there are any git changes or recent commits related to manifestsToOmit
git log --oneline -20 -- control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go 2>/dev/null || echo "Git log not available"

Repository: openshift/hypershift

Length of output: 136


Add packageserver PDB manifest to manifestsToOmit array.

The packageserver PDB ships in the CVO release payload (the resourcesToRemove cleanup entry confirms this), but it is missing from the manifestsToOmit array. resourcesToRemove cleans the PDB up on upgrades, while manifestsToOmit is needed to prevent the manifest from being deployed on new clusters in the first place. Without this entry, the CVO creates the packageserver PDB on a new cluster only for the cleanup manifest to remove it again.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go`
at line 270, Add the packageserver PodDisruptionBudget manifest to the
manifestsToOmit list so it is never deployed (rather than created then removed);
specifically append the PodDisruptionBudget entry for Name="packageserver-pdb",
Namespace="openshift-operator-lifecycle-manager" (the same object literal used
in the diff: &policyv1.PodDisruptionBudget{ObjectMeta: metav1.ObjectMeta{Name:
"packageserver-pdb", Namespace: "openshift-operator-lifecycle-manager"}}) into
the manifestsToOmit array where other omitted manifests are listed (look for the
manifestsToOmit slice/variable in this file) so the CVO will skip deploying that
PDB on new clusters.
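
The difference between omitting a manifest and removing a resource can be illustrated with a toy model. The Go sketch below is a simplification under stated assumptions, not HyperShift or CVO code: `ref` and `reconcile` are hypothetical names, the cluster is an in-memory map, and both lists are modeled as plain resource references.

```go
package main

import "fmt"

// ref identifies a resource. It is a hypothetical stand-in for a typed
// Kubernetes object and exists only for this illustration.
type ref struct{ Kind, Namespace, Name string }

// reconcile applies a payload to a cluster state. Resources listed in omit
// are never created; resources listed in remove are deleted if present.
// It returns the number of create operations performed.
func reconcile(cluster map[ref]bool, payload, omit, remove []ref) int {
	skip := map[ref]bool{}
	for _, r := range omit {
		skip[r] = true
	}
	creates := 0
	for _, r := range payload {
		if skip[r] || cluster[r] {
			continue
		}
		cluster[r] = true
		creates++
	}
	for _, r := range remove {
		delete(cluster, r)
	}
	return creates
}

func main() {
	pdb := ref{"PodDisruptionBudget", "openshift-operator-lifecycle-manager", "packageserver-pdb"}
	payload := []ref{pdb}

	// New cluster, cleanup only: the PDB is created and then removed.
	fresh := map[ref]bool{}
	fmt.Println(reconcile(fresh, payload, nil, []ref{pdb}), fresh[pdb]) // 1 false

	// New cluster, omit plus cleanup: the PDB is never created.
	fresh = map[ref]bool{}
	fmt.Println(reconcile(fresh, payload, []ref{pdb}, []ref{pdb}), fresh[pdb]) // 0 false

	// Existing cluster that already has the PDB: cleanup deletes it.
	existing := map[ref]bool{pdb: true}
	fmt.Println(reconcile(existing, payload, []ref{pdb}, []ref{pdb}), existing[pdb]) // 0 false
}
```

In all three cases the PDB ends up absent from the guest cluster. The omit list only changes whether a create happens first, which is the redundant step the finding asks to avoid.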

@dhgautam99
Author

/retest

@dhgautam99 dhgautam99 force-pushed the remove-packageserver-pdb-from-guest branch from 86fec73 to 938c84c Compare May 28, 2026 05:51
…ster

The packageserver PDB was being created in the guest cluster's
openshift-operator-lifecycle-manager namespace by CVO, but packageserver
pods run on the management cluster in the clusters-<hosted-cluster>
namespace. This moves the PDB to the correct location.

- Add packageserver PDB manifest to manifestsToOmit to prevent CVO from
  creating it in guest clusters
- Add packageserver-pdb to resourcesToRemove for all platforms to clean
  up the orphaned PDB on existing clusters during upgrade
- Register PDB manifest adapter in packageserver component to create the
  PDB in the management cluster namespace
@dhgautam99
Author

/retest ci/prow/security

@dhgautam99
Author

/test security

@dhgautam99 dhgautam99 force-pushed the remove-packageserver-pdb-from-guest branch from 938c84c to 8bfaa36 Compare May 28, 2026 13:59
@coderabbitai
Contributor

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@cblecker
Member

/unassign
/uncc

@openshift-ci openshift-ci Bot removed the request for review from cblecker May 29, 2026 18:56
Regenerate CVO deployment and packageserver component test fixtures
to reflect the packageserver PDB being omitted from the guest cluster
CVO payload and added to the management cluster namespace.
@dhgautam99 dhgautam99 force-pushed the remove-packageserver-pdb-from-guest branch from 08fa09c to 1185ea9 Compare June 1, 2026 09:53
@openshift-ci
Contributor

openshift-ci Bot commented Jun 1, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dhgautam99
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci Bot commented Jun 1, 2026

@dhgautam99: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hypershift-jira-solve-ci

I now have all the evidence needed to produce the report. The root cause is clear: this is a Konflux infrastructure outage, not a code issue with PR #8459.

Test Failure Analysis Complete

Job Information

  • Prow Job: Red Hat Konflux on-pull-request pipelines (4 pipelines)
  • Build ID: control-plane-operator-main-on-pull-request-lvhmh, hypershift-operator-main-on-pull-request-cdn4h, hypershift-release-mce-50-on-pull-request-cjqxn, hypershift-cli-mce-50-on-pull-request-6g69m
  • PR: #8459 — OCPBUGS-54790: Move packageserver PDB from guest cluster to management cluster
  • CI System: Red Hat Konflux (Tekton-based build pipelines on stone-prd-rh01)
  • Failed Task: clone-repository (all 4 pipelines)
  • Failure Window: 2026-06-01 09:54–10:12 UTC

Test Failure Analysis

Error

task clone-repository has the status "Failed":
error: Command failed after 10 tries with status 1
Command exited with non-zero status 1

Summary

All four Konflux pipeline failures are caused by a transient Konflux infrastructure outage on 2026-06-01 between ~08:11 and ~10:12 UTC — not by any code change in PR #8459. Every pipeline failed at the very first substantive step (clone-repository) before any code was compiled or tested. The same clone-repository failure hit an unrelated PR (#8647, "Bump go-toolset to 1.25.9") at the same time, and an even earlier failure on PR #8500 at 08:11 UTC shows a 504 Gateway Timeout from quay.io when resolving Tekton pipeline task bundles, confirming the Konflux platform was experiencing network/registry connectivity issues. PRs that started their Konflux runs ~15–20 minutes later (#8479 at 10:14 UTC, #8617 at 10:12 UTC) progressed successfully past clone-repository, indicating the outage was self-resolving.

Root Cause

Transient Konflux CI infrastructure outage (network/registry connectivity failure)

The clone-repository Tekton task in the Konflux pipeline was unable to clone the openshift/hypershift repository from GitHub. The task retried 10 times over 17 minutes before giving up with exit status 1. This is infrastructure-level — the git clone happens inside a Tekton TaskRun pod running on the stone-prd-rh01 Konflux cluster, and the failure indicates the pod could not reach GitHub (or an intermediary git-cache/proxy).

Corroborating evidence that this is infrastructure, not code:

  1. All 4 pipelines failed identically: control-plane-operator-main, hypershift-operator-main, hypershift-release-mce-50, and hypershift-cli-mce-50 all failed at clone-repository with the same error after exactly 17 minutes. No code was ever built or tested.

  2. Unrelated PRs failed the same way at the same time: PR #8647 ("OCPBUGS-74599: Bump go-toolset to 1.25.9 to fix GO-2025-4155"), which changes completely different files, had its Konflux clone-repository task fail with the identical error (Command failed after 10 tries with status 1) at 09:55–10:12 UTC.

  3. Earlier outage symptom: PR #8500 ("OCPSTRAT-1677: fix(cpo): increase NTH default workers from 10 to 20") failed at 08:11 UTC with a different but related infrastructure error: GET https://quay.io/v2/: unexpected status code 504 Gateway Timeout. The Konflux cluster couldn't even pull the Tekton Task bundle images from quay.io, indicating broader network/registry issues on the Konflux platform.

  4. The outage resolved itself: PRs #8479 ("OCPBUGS-76530: Fix intermittent etcd peer communication failures") and #8617 ("OCPBUGS-86662: Optimize CPO deployment polling interval in tests of additional trust bundle propagation"), whose Konflux runs started at 10:12–10:14 UTC (just minutes after PR #8459's run completed), were in progress and advancing past clone-repository successfully.

  5. Same pipelines passed previously: all four of these exact pipeline names passed on PR #8489 ("CNTRLPLANE-3584: Add kube-scheduler ServiceMonitor with CA-signed serving certs", May 18), PR #8597 ("OCPBUGS-43444: Allow kubevirt-csi storageclass default to be changed by user", May 27), and PR #8627 ("NO-JIRA: feat(ci): add skopeo and gh CLI to ARC runner image", May 29), confirming no persistent misconfiguration.

  6. PR code is not involved — The PR's HEAD commit (1185ea9) has not changed since May 7. The Konflux pipeline never got past the git clone step, so no code from this PR was ever compiled or evaluated.

Recommendations
  1. Retrigger the Konflux pipelines — The outage appears to have resolved by ~10:14 UTC. Simply re-running the failed check runs (via the GitHub UI "Re-run" button or a /retest comment if supported) should succeed.

  2. No code changes needed — This failure is entirely infrastructure-related. The PR's code changes (moving packageserver PDB from guest cluster to management cluster) were never reached by the CI system.

  3. Monitor for recurrence: if the retrigger also fails at clone-repository, escalate to the Konflux/RHTAP infrastructure team (crt-redhat-acm-tenant namespace on stone-prd-rh01). The earlier quay.io 504 error on PR #8500 suggests potential upstream registry instability that could recur.

  4. Consider ignoring these failures for merge — If the Prow CI jobs (e2e tests, unit tests, etc.) pass, the Konflux build failures should not block merging since they are demonstrably unrelated to the PR's changes.

Evidence
| Evidence | Detail |
| --- | --- |
| Failed task | clone-repository: first substantive pipeline step; no code was compiled or tested |
| Error message | Command failed after 10 tries with status 1 (all 4 pipelines, identical) |
| Failure duration | Exactly 17 minutes per pipeline (10 retry attempts) |
| PR #8459 failure window | 2026-06-01 09:54:01 – 10:11:45 UTC |
| PR #8647 same failure | Unrelated PR failed identically at 09:55:12 – 10:12:59 UTC (same error, same task) |
| PR #8500 earlier failure | 08:11 UTC: quay.io 504 Gateway Timeout pulling Tekton task bundles |
| PR #8479 started later, succeeding | Started 10:14 UTC: all 4 pipelines in progress, past clone-repository |
| PR #8617 started later, succeeding | Started 10:12 UTC: 3 pipelines in progress, past clone-repository |
| PR #8489 same pipelines passed | All 4 passed on 2026-05-18 |
| PR code last changed | 2026-05-07 (commit 1185ea9); no recent code changes |
| Konflux cluster | stone-prd-rh01.pg1f.p1.openshiftapps.com, namespace crt-redhat-acm-tenant |

@dhgautam99
Author

/retest


Labels

area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
