
CNTRLPLANE-3510: Enable additional golangci-lint linters #8567

Merged
openshift-merge-bot[bot] merged 7 commits into openshift:main from bryan-cox:add-linter on Jun 2, 2026
Conversation

@bryan-cox
Member

@bryan-cox bryan-cox commented May 21, 2026

Summary

Enables five new golangci-lint linters and fixes all violations across the codebase:

  • errorlint: Enforces errors.As/errors.Is instead of type assertions/direct comparisons on errors, enabling proper wrapped-error detection
  • nilerr: Detects functions that check err != nil then silently return nil, swallowing errors. Intentional cases (e.g., not-found during cleanup) are suppressed with //nolint:nilerr comments
  • noctx: Requires http.NewRequestWithContext instead of http.NewRequest/http.Get, ensuring HTTP requests respect context cancellation
  • usestdlibvars: Enforces stdlib constants (http.StatusOK, http.MethodGet) instead of magic literals
  • dupword: Catches duplicate words in comments (e.g., "the the", "is is")

Also adds unit tests covering the behavioral changes introduced by the linter fixes.
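
To make the errorlint point concrete, here is a minimal sketch of why a type assertion misses a wrapped error while errors.As finds it. The notFoundError type and the lookup helper are hypothetical names invented for this illustration; they are not taken from the repository.

```go
package main

import (
	"errors"
	"fmt"
)

// notFoundError is a hypothetical error type used only for this illustration.
type notFoundError struct{ name string }

func (e *notFoundError) Error() string { return e.name + " not found" }

// lookup returns a notFoundError wrapped with extra context via %w.
func lookup(name string) error {
	return fmt.Errorf("lookup failed: %w", &notFoundError{name: name})
}

// isNotFoundAssert uses a direct type assertion, which only inspects the
// outermost error and therefore misses wrapped errors.
func isNotFoundAssert(err error) bool {
	_, ok := err.(*notFoundError)
	return ok
}

// isNotFoundAs uses errors.As, which walks the whole wrap chain.
func isNotFoundAs(err error) bool {
	var nf *notFoundError
	return errors.As(err, &nf)
}

func main() {
	err := lookup("zone-a")
	fmt.Println("type assertion:", isNotFoundAssert(err)) // false
	fmt.Println("errors.As:", isNotFoundAs(err))          // true
}
```

The assertion-based check is the pattern errorlint reports; the errors.As variant keeps working when a caller adds context with %w.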

Commits

  1. chore: enable additional golangci-lint linters - .golangci.yml configuration
  2. fix: resolve all errorlint violations - errors.As/errors.Is migration across ~30 files
  3. fix: suppress intentional nilerr violations - //nolint:nilerr for legitimate error-swallowing patterns
  4. fix: resolve all noctx violations - http.NewRequestWithContext migration
  5. fix: resolve all usestdlibvars violations - stdlib constant usage
  6. fix: resolve all dupword violations - comment typo fixes
  7. test: add unit tests for linter fix error paths - 21 new test cases validating wrapped-error detection, URL parsing, error retryability, and more

Test plan

  • make lint passes with all new linters enabled
  • make test passes (unit tests)
  • New unit tests cover key behavioral changes from errorlint (errors.As wrapped error detection) and noctx fixes
  • CI e2e tests pass

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved error messages and preservation so underlying causes can be inspected; refined not-found/permission handling and a few user-facing error texts.
    • Made many network/HTTP operations context-aware to respect cancellations/timeouts.
  • Tests

    • Expanded unit and integration tests covering error handling, context-aware requests, polling/retry flows, and edge cases.
  • Chores

    • Enabled additional linters to strengthen code quality checks.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) May 21, 2026
@openshift-ci-robot

@bryan-cox: This pull request explicitly references no jira issue.

Details

In response to this:

Summary

  • Enable the errorlint golangci-lint linter which catches three categories of error handling bugs:
  • %v/%s used instead of %w in fmt.Errorf() — errors not wrapped, breaks errors.Is/errors.As chains
  • Direct error comparison (==/!=) instead of errors.Is() — fails on wrapped errors
  • Error type assertions instead of errors.As() — fails on wrapped errors
  • Fix all existing violations across 76 files
  • Set max-same-issues: 0 so all violations surface in a single lint run

Test plan

  • make lint passes with 0 issues
  • make test passes with no regressions
  • CI e2e tests pass

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) May 21, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the following labels and removed the do-not-merge/needs-area label on May 21, 2026:

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/cli: Indicates the PR includes changes for CLI
  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • area/control-plane-pki-operator: Indicates the PR includes changes for the control plane PKI operator - in an OCP release
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
  • area/karpenter-operator: Indicates the PR includes changes related to the Karpenter operator
  • area/platform/aws: PR/issue for AWS (AWSPlatform) platform
  • area/platform/gcp: PR/issue for GCP (GCPPlatform) platform
  • area/platform/kubevirt: PR/issue for KubeVirt (KubevirtPlatform) platform
  • area/platform/powervs: PR/issue for PowerVS (PowerVSPlatform) platform
  • area/testing: Indicates the PR includes changes for e2e testing

@coderabbitai
Contributor

coderabbitai Bot commented May 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR standardizes error handling and context usage across the repository: it enables the errorlint and nilerr linters; replaces many non-wrapping fmt.Errorf uses with %w; converts direct type assertions/equality to errors.As / errors.Is where appropriate; makes HTTP requests, dialing, and some command executions context-aware (http.NewRequestWithContext, DialContext, exec.CommandContext); adds //nolint:nilerr annotations on intentional retry branches returning nil errors; and updates tests and minor textual fixes. No public API signatures were changed.

Suggested reviewers

  • cblecker
  • clebs
  • muraee

@bryan-cox bryan-cox changed the title from "NO-JIRA: Enable errorlint and fix all existing violations" to "NO-JIRA: Enable errorlint and nilerr linters, fix all violations" May 21, 2026
@codecov

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 41.01562% with 151 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.18%. Comparing base (89e19f8) to head (7d7f0b0).
⚠️ Report is 76 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ostedcontrolplane/hostedcontrolplane_controller.go | 35.00% | 13 Missing ⚠️ |
| ...trollers/hostedcluster/hostedcluster_controller.go | 55.55% | 7 Missing and 1 partial ⚠️ |
| availability-prober/availability_prober.go | 0.00% | 7 Missing ⚠️ |
| cmd/infra/azure/rbac.go | 0.00% | 6 Missing ⚠️ |
| cmd/infra/powervs/destroy.go | 0.00% | 6 Missing ⚠️ |
| cmd/infra/aws/iam.go | 0.00% | 5 Missing ⚠️ |
| ...ontrollers/hostedcontrolplane/oauth/idp_convert.go | 28.57% | 5 Missing ⚠️ |
| ...rollers/hostedcontrolplane/v2/oauth/idp_convert.go | 28.57% | 5 Missing ⚠️ |
| ...perator/controllers/hostedcontrolplane/kas/auth.go | 66.66% | 4 Missing ⚠️ |
| ...operator/controllers/uwmtelemetry/uwm_telemetry.go | 20.00% | 3 Missing and 1 partial ⚠️ |

... and 56 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8567      +/-   ##
==========================================
+ Coverage   40.61%   41.18%   +0.57%     
==========================================
  Files         755      755              
  Lines       93227    93253      +26     
==========================================
+ Hits        37864    38408     +544     
+ Misses      52640    52129     -511     
+ Partials     2723     2716       -7     
| Files with missing lines | Coverage Δ |
|---|---|
| cmd/infra/aws/util/errors.go | 100.00% <100.00%> (+100.00%) ⬆️ |
| cmd/nodepool/core/create.go | 20.23% <100.00%> (ø) |
| ...erator/controllers/gcpprivateserviceconnect/dns.go | 18.25% <100.00%> (+0.23%) ⬆️ |
| ...perator/controllers/resources/ingress/reconcile.go | 69.93% <100.00%> (+7.18%) ⬆️ |
| ...tor/controllers/resources/kas/admissionpolicies.go | 93.82% <100.00%> (+84.56%) ⬆️ |
| ...ontrollers/resources/registry/admissionpolicies.go | 93.93% <100.00%> (+18.18%) ⬆️ |
| ...esigningcontroller/certificatesigningcontroller.go | 51.87% <100.00%> (+1.87%) ⬆️ |
| ...rollers/hostedcluster/internal/proxy/validation.go | 82.60% <100.00%> (ø) |
| ...rshift-operator/controllers/nodepool/conditions.go | 53.93% <100.00%> (ø) |
| ignition-server/controllers/cache.go | 100.00% <100.00%> (ø) |

... and 72 more
| Flag | Coverage Δ |
|---|---|
| cmd-support | 34.86% <38.88%> (+0.16%) ⬆️ |
| cpo-hostedcontrolplane | 43.46% <41.97%> (+1.68%) ⬆️ |
| cpo-other | 42.44% <74.07%> (+1.37%) ⬆️ |
| hypershift-operator | 50.92% <42.10%> (+0.16%) ⬆️ |
| other | 31.60% <18.42%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/nodepool/core/create.go`:
- Around line 198-201: The Get call that checks the HostedCluster payload
currently swallows all errors (client.Get(..., hc) then return nil), which hides
transient API/RBAC issues; update the error handling to only ignore NotFound by
using apierrors.IsNotFound(err) (from k8s.io/apimachinery/pkg/api/errors) — if
IsNotFound(err) return nil, otherwise return the error (or wrap and return it);
ensure you add the import for apierrors and adjust the branch after the
client.Get so logger.Info remains for NotFound but other errors are propagated
instead of being dropped.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 4469-4471: The code currently embeds the raw AWS error text into
the returned error (fmt.Errorf("%w: aws returned an error: %w", wrapped, err)),
which then flows into the ValidOIDCConfiguration status message; change this to
avoid surfacing provider-specific error text by returning a generic provider
error (e.g., fmt.Errorf("%w: aws returned an error", wrapped)) and separately
log the original err to the controller logger (or record it as an event) so the
detailed AWS error is available in logs but not in the status message; update
the site that constructs/returns wrapped (the place where variables wrapped and
err are used with fmt.Errorf) to remove embedding err and ensure you call the
controller logger (or event recorder) to log err.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 567af7d2-b72e-4f37-b4ed-c788a51be81d

📥 Commits

Reviewing files that changed from the base of the PR and between 36dfb1b and 2fe9cef.

📒 Files selected for processing (86)
  • .golangci.yml
  • cmd/bastion/aws/create.go
  • cmd/cluster/core/create.go
  • cmd/infra/aws/iam.go
  • cmd/infra/aws/iam_policies.go
  • cmd/infra/aws/route53.go
  • cmd/infra/aws/util/errors.go
  • cmd/infra/powervs/create.go
  • cmd/infra/powervs/destroy.go
  • cmd/kubeconfig/create.go
  • cmd/nodepool/core/create.go
  • control-plane-operator/controllers/gcpprivateserviceconnect/dns.go
  • control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/auth.go
  • control-plane-operator/controllers/hostedcontrolplane/kas_pki_setup.go
  • control-plane-operator/controllers/hostedcontrolplane/oauth/idp_convert.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/assets/assets.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/kubevirt/config.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/powervs/config.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/ignitionserver/pki.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/auth.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/kubeconfig.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/oauth.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/oauth/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/oauth/idp_convert.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/olm/catalogs/deployment.go
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/kas/admissionpolicies.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/registry/admissionpolicies.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
  • control-plane-operator/hostedclusterconfigoperator/operator/config.go
  • control-plane-pki-operator/certificatesigningcontroller/certificatesigningcontroller.go
  • control-plane-pki-operator/targetconfigcontroller/targetconfigcontroller.go
  • control-plane-pki-operator/topology/detector.go
  • etcd-backup/etcdbackup.go
  • etcd-recovery/etcdrecovery.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_webhook.go
  • hypershift-operator/controllers/hostedcluster/internal/platform/kubevirt/kubevirt_test.go
  • hypershift-operator/controllers/hostedcluster/internal/proxy/validation.go
  • hypershift-operator/controllers/nodepool/apiserver-haproxy/haproxy.go
  • hypershift-operator/controllers/nodepool/aws_test.go
  • hypershift-operator/controllers/nodepool/conditions.go
  • hypershift-operator/controllers/nodepool/config.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/nto.go
  • hypershift-operator/controllers/platform/aws/controller.go
  • hypershift-operator/controllers/platform/gcp/privateserviceconnect_controller.go
  • hypershift-operator/controllers/uwmtelemetry/uwm_telemetry.go
  • ignition-server/cmd/start.go
  • ignition-server/controllers/local_ignitionprovider.go
  • ignition-server/controllers/tokensecret_controller.go
  • karpenter-operator/controllers/karpenter/machine_approver.go
  • karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.go
  • karpenter-operator/main.go
  • kubevirtexternalinfra/externalinfra.go
  • pkg/etcdcli/etcdcli.go
  • support/azureutil/azureutil.go
  • support/controlplane-component/controlplane-component.go
  • support/controlplane-component/controlplane-component_test.go
  • support/controlplane-component/kubeconfig.go
  • support/gcpapi/gcs_client.go
  • support/konnectivityproxy/dialer.go
  • support/releaseinfo/registryclient/client.go
  • support/releaseinfo/releaseinfo.go
  • support/supportedversion/version.go
  • support/thirdparty/docker/pkg/archive/archive.go
  • support/thirdparty/library-go/pkg/image/registryclient/client.go
  • support/thirdparty/oc/pkg/cli/image/manifest/manifest.go
  • support/util/util.go
  • support/validations/authentication.go
  • test/e2e/util/aws.go
  • test/e2e/util/dump/dump.go
  • test/e2e/util/node.go
  • test/e2e/util/oauth.go
  • test/e2e/util/reqserving/verifycp.go
  • test/e2e/util/reqserving/verifypods.go
  • test/e2e/util/reqserving/vpa.go
  • test/e2e/util/reqserving/waitfor.go
  • test/e2e/util/util.go
  • test/e2e/util/version.go

Comment thread cmd/nodepool/core/create.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/infra/aws/route53.go`:
- Around line 189-191: The cleanup currently swallows every error from
LookupZone; change it to only ignore the specific "zone not found" case and
return any other error. Update the block around the LookupZone call (id, err :=
LookupZone(ctx, client, name, false)) to inspect err: if it matches the
sentinel/not-found condition returned by LookupZone (or use errors.Is, a helper
like IsZoneNotFound(err), or AWS's NotFound error check), then continue silently,
otherwise return the error so real Route53/API failures are propagated.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Line 1435: The error return wraps the wrong variable because the outer err
(from platform.GetPlatform) is nil; update the return in the
PlatformCredentialsFound status update failure so only the statusErr is wrapped
with %w and the reconcile error is included with a non-wrapping verb (or capture
the reconcile error into a separate variable). Locate the code around
p.ReconcileCredentials(...) and the fmt.Errorf(...) return and change the
formatting to include the reconcile error without %w (e.g., %v) and use %w only
for statusErr so you don't accidentally wrap a nil outer err.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: c16340aa-bb8a-4eb6-91c1-7304d463990f

📥 Commits

Reviewing files that changed from the base of the PR and between 2fe9cef and 680d473.

📒 Files selected for processing (86)
  • (same 86 files as listed for the previous review)
✅ Files skipped from review due to trivial changes (16)
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/kubevirt/config.go
  • etcd-recovery/etcdrecovery.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • test/e2e/util/reqserving/vpa.go
  • support/thirdparty/docker/pkg/archive/archive.go
  • .golangci.yml
  • test/e2e/util/reqserving/verifycp.go
  • hypershift-operator/controllers/nodepool/conditions.go
  • test/e2e/util/node.go
  • hypershift-operator/controllers/uwmtelemetry/uwm_telemetry.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/kas/admissionpolicies.go
  • test/e2e/util/reqserving/waitfor.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/powervs/config.go
  • karpenter-operator/controllers/karpenter/machine_approver.go
  • test/e2e/util/aws.go
  • test/e2e/util/oauth.go

Comment thread cmd/infra/aws/route53.go Outdated
@openshift-ci openshift-ci Bot added the area/platform/azure label (PR/issue for Azure (AzurePlatform) platform) May 21, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (1)

1004-1011: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Close HTTP response body in KAS health check.

httpClient.Do(req) response is never closed. In a recurrent reconcile loop this can leak connections and eventually degrade controller reliability.

Suggested fix
 resp, err := httpClient.Do(req)
 if err != nil {
 	return err
 }
+defer resp.Body.Close()
 
 if resp.StatusCode != http.StatusOK {
 	return fmt.Errorf("APIServer endpoint %s is not healthy", ingressPoint)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`
around lines 1004 - 1011, The HTTP response from httpClient.Do(req) is not
closed, which can leak connections; after the successful call to
httpClient.Do(req) (the resp variable returned) add proper cleanup by ensuring
the response body is consumed/discarded as needed and closed (e.g., call
io.Copy(io.Discard, resp.Body) then defer resp.Body.Close() immediately after
checking err) before checking resp.StatusCode for ingressPoint health so
connections are returned to the pool.
kubernetes-default-proxy/kubernetes_default_proxy.go (1)

70-84: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Close the listener on ctx cancellation so Accept() unblocks and shutdown completes.
net.ListenConfig.Listen(ctx, ...) only uses ctx during the listen setup; canceling ctx does not close the returned net.Listener, so a blocked listener.Accept() can remain stuck. Trigger shutdown by calling listener.Close() when ctx.Done() fires, and when Accept() returns an error, exit cleanly when the error is due to closure (e.g., net.ErrClosed) / when ctx.Err()!=nil—don’t just log and continue.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kubernetes-default-proxy/kubernetes_default_proxy.go` around lines 70 - 84,
The listener created by (&net.ListenConfig{}).Listen(ctx, ...) must be closed
when ctx is cancelled so Accept() unblocks: start a goroutine that waits for
<-ctx.Done() and calls listener.Close(), and change the Accept() error handling
in the accept loop (function/method containing listener, Accept, s.log and ctx)
to stop continuing on errors caused by closure—use errors.Is(err, net.ErrClosed)
or check ctx.Err()!=nil and then return (or return ctx.Err()) instead of logging
and continue; keep logging for unexpected Accept errors only.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@availability-prober/availability_prober.go`:
- Around line 120-127: The infinite retry loop around creating and sending the
request (the for ; ; time.Sleep(sleepTime) loop that calls
http.NewRequestWithContext(ctx, ...) and client.Do(req)) doesn't stop when ctx
is canceled; update the loop to observe ctx.Done() and exit cleanly: before
sleeping or retrying check if ctx.Err() != nil (or select on ctx.Done()) and
break/return when canceled, and also after a failed client.Do(req) check
ctx.Err() and stop rather than continuing; ensure this change touches the loop
that constructs req and calls client.Do so the goroutine can exit when the
provided context is canceled.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 47b1656b-eed1-4f13-beb4-589d2a91e790

📥 Commits

Reviewing files that changed from the base of the PR and between 680d473 and d46eed7.

📒 Files selected for processing (98)
  • .golangci.yml
  • availability-prober/availability_prober.go
  • cmd/bastion/aws/create.go
  • cmd/cluster/core/create.go
  • cmd/infra/aws/iam.go
  • cmd/infra/aws/iam_policies.go
  • cmd/infra/aws/route53.go
  • cmd/infra/aws/util/errors.go
  • cmd/infra/azure/rbac.go
  • cmd/infra/powervs/create.go
  • cmd/infra/powervs/destroy.go
  • cmd/kubeconfig/create.go
  • cmd/nodepool/core/create.go
  • control-plane-operator/controllers/gcpprivateserviceconnect/dns.go
  • control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/auth.go
  • control-plane-operator/controllers/hostedcontrolplane/kas_pki_setup.go
  • control-plane-operator/controllers/hostedcontrolplane/oauth/idp_convert.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/assets/assets.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/kubevirt/config.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/powervs/config.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cno/deployment_init_container_test.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/ignitionserver/pki.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/auth.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/kubeconfig.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/oauth.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/oauth/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/oauth/idp_convert.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/olm/catalogs/deployment.go
  • control-plane-operator/endpoint-resolver/server_test.go
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/kas/admissionpolicies.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/registry/admissionpolicies.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
  • control-plane-operator/hostedclusterconfigoperator/operator/config.go
  • control-plane-operator/metrics-proxy/proxy_test.go
  • control-plane-pki-operator/certificatesigningcontroller/certificatesigningcontroller.go
  • control-plane-pki-operator/targetconfigcontroller/targetconfigcontroller.go
  • control-plane-pki-operator/topology/detector.go
  • etcd-backup/etcdbackup.go
  • etcd-recovery/etcdrecovery.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_webhook.go
  • hypershift-operator/controllers/hostedcluster/internal/platform/kubevirt/kubevirt_test.go
  • hypershift-operator/controllers/hostedcluster/internal/proxy/validation.go
  • hypershift-operator/controllers/nodepool/apiserver-haproxy/haproxy.go
  • hypershift-operator/controllers/nodepool/aws_test.go
  • hypershift-operator/controllers/nodepool/conditions.go
  • hypershift-operator/controllers/nodepool/config.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/nto.go
  • hypershift-operator/controllers/platform/aws/controller.go
  • hypershift-operator/controllers/platform/gcp/privateserviceconnect_controller.go
  • hypershift-operator/controllers/uwmtelemetry/uwm_telemetry.go
  • ignition-server/cmd/start.go
  • ignition-server/controllers/local_ignitionprovider.go
  • ignition-server/controllers/tokensecret_controller.go
  • karpenter-operator/controllers/karpenter/machine_approver.go
  • karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.go
  • karpenter-operator/main.go
  • kubernetes-default-proxy/kubernetes_default_proxy.go
  • kubevirtexternalinfra/externalinfra.go
  • pkg/etcdcli/etcdcli.go
  • sharedingress-config-generator/controller.go
  • sharedingress-config-generator/controller_test.go
  • sharedingress-config-generator/haproxy_client.go
  • support/azureutil/azureutil.go
  • support/controlplane-component/controlplane-component.go
  • support/controlplane-component/controlplane-component_test.go
  • support/controlplane-component/kubeconfig.go
  • support/gcpapi/gcs_client.go
  • support/konnectivityproxy/dialer.go
  • support/releaseinfo/registryclient/client.go
  • support/releaseinfo/releaseinfo.go
  • support/supportedversion/version.go
  • support/thirdparty/docker/pkg/archive/archive.go
  • support/thirdparty/library-go/pkg/image/registryclient/client.go
  • support/thirdparty/oc/pkg/cli/image/manifest/manifest.go
  • support/util/util.go
  • support/validations/authentication.go
  • test/e2e/util/aws.go
  • test/e2e/util/dump/dump.go
  • test/e2e/util/dump/journals.go
  • test/e2e/util/external_oidc.go
  • test/e2e/util/node.go
  • test/e2e/util/oauth.go
  • test/e2e/util/reqserving/verifycp.go
  • test/e2e/util/reqserving/verifypods.go
  • test/e2e/util/reqserving/vpa.go
  • test/e2e/util/reqserving/waitfor.go
  • test/e2e/util/util.go
  • test/e2e/util/version.go
  • test/integration/framework/hosted-cluster.go
✅ Files skipped from review due to trivial changes (14)
  • support/validations/authentication.go
  • support/util/util.go
  • .golangci.yml
  • control-plane-operator/controllers/hostedcontrolplane/v2/oauth/deployment.go
  • hypershift-operator/controllers/nodepool/apiserver-haproxy/haproxy.go
  • test/e2e/util/aws.go
  • hypershift-operator/controllers/uwmtelemetry/uwm_telemetry.go
  • test/e2e/util/reqserving/vpa.go
  • etcd-backup/etcdbackup.go
  • test/e2e/util/node.go
  • karpenter-operator/controllers/karpenter/machine_approver.go
  • test/e2e/util/reqserving/verifycp.go
  • cmd/nodepool/core/create.go
  • cmd/cluster/core/create.go

Comment on lines 120 to 127
for ; ; time.Sleep(sleepTime) {
response, err := client.Get(target.String())
req, err := http.NewRequestWithContext(ctx, http.MethodGet, target.String(), nil)
if err != nil {
log.Error(err, "Failed to create request, retrying...")
continue
}
response, err := client.Do(req)
if err != nil {

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle context cancellation to stop the probe loop.

Line 120 currently retries forever. Once ctx is canceled, Do(req) will keep failing and this loop never exits.

Suggested fix
 	for ; ; time.Sleep(sleepTime) {
+		select {
+		case <-ctx.Done():
+			log.Info("probe canceled, exiting", "reason", ctx.Err())
+			return
+		default:
+		}
 		req, err := http.NewRequestWithContext(ctx, http.MethodGet, target.String(), nil)
 		if err != nil {
 			log.Error(err, "Failed to create request, retrying...")
 			continue
 		}
 		response, err := client.Do(req)
 		if err != nil {
+			if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+				log.Info("probe canceled, exiting", "reason", err)
+				return
+			}
 			log.Error(err, "Request failed, retrying...")
 			continue
 		}

As per coding guidelines "Do not leak goroutines — ensure they exit cleanly".
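
As a variation on the suggested fix, the wait between attempts can itself be made cancelable by selecting on ctx.Done() and a timer instead of calling time.Sleep. The sketch below is an assumption-laden illustration, not the prober's code: probe and probeStopsOnDeadline are hypothetical names, and the local httptest server stands in for the real target.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probe is a hypothetical stand-in for the prober loop. It retries until the
// target answers 200 OK or ctx is done, in which case it returns the
// cancellation error.
func probe(ctx context.Context, client *http.Client, url string, sleepTime time.Duration) error {
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			// This sketch gives up here: a malformed URL will not fix itself.
			return fmt.Errorf("building request: %w", err)
		}
		resp, err := client.Do(req)
		if err != nil {
			if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
				return err
			}
			// Any other failure falls through to the wait below and is retried.
		} else {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		// Wait for the next attempt, but wake up early on cancellation.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(sleepTime):
		}
	}
}

// probeStopsOnDeadline probes a server that always answers 503 and reports
// whether the loop ended because the 100ms deadline expired.
func probeStopsOnDeadline() bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusServiceUnavailable)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	err := probe(ctx, srv.Client(), srv.URL, 10*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(probeStopsOnDeadline())
}
```

With time.Sleep in the loop's post statement, a canceled probe still waits out the current sleep before noticing; the select removes that delay at the cost of a slightly longer loop body.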

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@availability-prober/availability_prober.go` around lines 120 - 127, The
infinite retry loop around creating and sending the request (the for ; ;
time.Sleep(sleepTime) loop that calls http.NewRequestWithContext(ctx, ...) and
client.Do(req)) doesn't stop when ctx is canceled; update the loop to observe
ctx.Done() and exit cleanly: before sleeping or retrying check if ctx.Err() !=
nil (or select on ctx.Done()) and break/return when canceled, and also after a
failed client.Do(req) check ctx.Err() and stop rather than continuing; ensure
this change touches the loop that constructs req and calls client.Do so the
goroutine can exit when the provided context is canceled.

bryan-cox and others added 3 commits May 27, 2026 09:34
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bryan-cox
Member Author

/uncc @clebs
/cc @enxebre

@openshift-ci openshift-ci Bot requested review from enxebre and removed request for clebs May 27, 2026 14:26
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@enxebre
Member

enxebre commented May 28, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 28, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@bryan-cox
Member Author

/pipeline required

@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@cwbotbot

cwbotbot commented May 28, 2026

Test Results

e2e-aws

e2e-aks

@bryan-cox
Member Author

/retest

4 similar comments
@bryan-cox
Member Author

/retest

@bryan-cox
Member Author

/retest

@bryan-cox
Member Author

/retest

@bryan-cox
Member Author

/retest

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 2, 2026

The changes are clearly lint-only fixes: %v → %w, adding http.NewRequestWithContext, fixing duplicate words, etc. These are cosmetic/lint compliance changes that shouldn't affect runtime behavior. The failure is an infrastructure issue. Let me now generate the report:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

TestCreateCluster (7197.39s): Failed to wait for HostedCluster to rollout in 30m0s — 
ClusterVersionProgressing=True: ClusterOperatorNotAvailable(Unable to apply 5.0.0-0.ci-...: 
the cluster operator monitoring is not available). Process timed out after 2h0m30s (exit 127).

TestCreateClusterHABreakGlassCredentials (1328.13s): Failed to wait for kubeconfig to be 
published in 10m0s — HostedCluster control plane never bootstrapped (EtcdAvailable=False: 
StatefulSetNotFound, KubeAPIServerAvailable=False: NotFound, Degraded=True: capi-provider 
deployment has 1 unavailable replicas).

Summary

This is an infrastructure flake unrelated to PR #8567. The PR only enables additional golangci-lint linters (dupword, durationcheck, errorlint, fatcontext, nilerr, noctx, usestdlibvars) and applies their auto-fixes (%v → %w, http.NewRequestWithContext, typo corrections). None of these changes affect runtime behavior. Two independent hosted cluster failures occurred: (1) TestCreateClusterHABreakGlassCredentials: the capi-provider deployment never became available, preventing the control plane from bootstrapping (etcd StatefulSet never created, kube-apiserver never deployed), timing out after 10 minutes waiting for kubeconfig; (2) TestCreateCluster: the cluster bootstrapped and nodes joined (35min), but the monitoring cluster operator remained unavailable, keeping ClusterVersion in Partial state for 30 minutes until timeout. After the 2-hour hard deadline, GKE OIDC credentials expired, causing cascading Unauthorized errors that prevented cluster teardown.

Root Cause

The root cause is GCP/GKE infrastructure instability during this CI run, manifesting as two independent hosted cluster provisioning failures:

  1. TestCreateClusterHABreakGlassCredentials — The capi-provider deployment had 1 unavailable replica from the start (Degraded=True: UnavailableReplicas). This single upstream failure prevented the entire control plane from bootstrapping: etcd StatefulSet was never created (StatefulSetNotFound), kube-apiserver was never deployed (NotFound), ignition server was never deployed (NotFound), and consequently no kubeconfig was ever published (KubeconfigWaitingForCreate). The test timed out after 10 minutes waiting for kubeconfig.

  2. TestCreateCluster — This cluster progressed further: kubeconfig was published (3m36s), guest API server connected (1m18s), 2 nodes became ready (35m21s). However, the ClusterVersion rollout stalled because the monitoring cluster operator never became available: ClusterVersionProgressing=True: ClusterOperatorNotAvailable(Unable to apply 5.0.0-0.ci-...: the cluster operator monitoring is not available). The rollout stayed in Partial state for the entire 30-minute wait window.

  3. Cascading credential expiry — The 2-hour test execution hard limit was hit at 13:43:08Z. By this point, the GKE OIDC token had expired, causing all subsequent API calls to return Unauthorized. This prevented cluster teardown/dump/cleanup, flooding the logs with hundreds of failed to dump cluster: Unauthorized messages and ultimately causing the test step to exit with code 127 (process timed out).

  4. Controller-runtime cache sync failure — At 11:45:06Z (2 minutes into test execution), the e2e observer controller failed: "failed to wait for pod caches to sync: timed out waiting for cache to be synced for Kind *v1.Pod". This suggests the management GKE cluster was experiencing API server pressure or network issues early in the run.

Why this is unrelated to PR #8567: The PR changes are exclusively lint-compliance fixes — converting %v to %w in error wrapping, adding context to HTTP requests, fixing duplicate words in comments, and similar mechanical transformations. None of these changes affect control plane bootstrapping, cluster operator behavior, or GKE credential handling.
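
For readers unfamiliar with the %v to %w transformation mentioned above, here is a minimal sketch. The names errNotFound, wrapWithV, wrapWithW and matches are hypothetical and chosen only for illustration. Both verbs produce the same message text; %w additionally records the cause so that errors.Is and errors.As can find it in the chain.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel error used only for this sketch.
var errNotFound = errors.New("not found")

// wrapWithV formats the cause as text; the result has no link back to it.
func wrapWithV(cause error) error { return fmt.Errorf("lookup failed: %v", cause) }

// wrapWithW records the cause so errors.Is and errors.As can unwrap to it.
func wrapWithW(cause error) error { return fmt.Errorf("lookup failed: %w", cause) }

// matches reports whether errNotFound appears anywhere in err's chain.
func matches(err error) bool { return errors.Is(err, errNotFound) }

func main() {
	fmt.Println(matches(wrapWithV(errNotFound))) // false
	fmt.Println(matches(wrapWithW(errNotFound))) // true
	// The rendered messages are identical.
	fmt.Println(wrapWithV(errNotFound).Error() == wrapWithW(errNotFound).Error()) // true
}
```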

Recommendations
  1. Retest the PR: this failure is an infrastructure flake. Trigger a /retest on the PR to re-run the e2e-gke job.

  2. No code changes needed — The PR's golangci-lint changes are not related to this failure. The monitoring cluster operator unavailability and capi-provider deployment issues are GCP infrastructure problems.

  3. If failure recurs — Check whether the e2e-gke job has elevated flake rates by reviewing recent runs of pull-ci-openshift-hypershift-main-e2e-gke on other PRs. If this pattern (monitoring operator stuck, capi-provider unavailable) repeats across PRs, it may indicate a systemic GKE CI environment issue that should be reported to the CI infrastructure team.

  4. Consider the 2h timeout — The test spent ~50 minutes on actual validation before hitting the rollout timeout, then ~50 minutes in teardown retry loops. The cascading Unauthorized errors during teardown are wasteful — the teardown loop could benefit from a credential-expiry check to fail fast rather than retrying for 50 minutes.
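
The fail-fast idea in recommendation 4 could look roughly like the sketch below. Everything here is hypothetical: errUnauthorized stands in for whatever error the real client returns on expired credentials, and retryUnlessUnauthorized and expiredDump are illustrative names, not functions from the e2e framework.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnauthorized is a hypothetical sentinel standing in for the
// "Unauthorized" error a real API client would return.
var errUnauthorized = errors.New("unauthorized")

// retryUnlessUnauthorized calls op up to maxAttempts times. It stops at once
// when op fails with errUnauthorized, because retrying with expired
// credentials cannot succeed. It returns the attempts made and the last error.
func retryUnlessUnauthorized(maxAttempts int, op func() error) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return attempt, nil
		}
		if errors.Is(err, errUnauthorized) {
			return attempt, fmt.Errorf("giving up, credentials rejected: %w", err)
		}
	}
	return maxAttempts, err
}

// expiredDump simulates a dump call made with an expired token.
func expiredDump() error {
	return fmt.Errorf("failed to dump cluster: %w", errUnauthorized)
}

func main() {
	attempts, err := retryUnlessUnauthorized(100, expiredDump)
	fmt.Println(attempts, err)
}
```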

Evidence
| Evidence | Detail |
| --- | --- |
| Failed Test Step | hypershift-gcp-run-e2e (exit code 127, process timed out after 2h0m30s) |
| Test 1 | TestCreateClusterHABreakGlassCredentials: FAIL after 1328.13s; ValidateHostedCluster timed out at 600s |
| Test 1 Root Cause | capi-provider deployment unavailable → control plane never bootstrapped |
| Test 1 Conditions | EtcdAvailable=False: StatefulSetNotFound, KubeAPIServerAvailable=False: NotFound, Degraded=True: UnavailableReplicas(capi-provider) |
| Test 2 | TestCreateCluster: FAIL after 7197.39s; ValidateHostedCluster 4215.35s, Teardown 2976.93s |
| Test 2 Root Cause | monitoring cluster operator not available → ClusterVersion stuck in Partial state for 30min |
| Test 2 Condition | ClusterVersionProgressing=True: ClusterOperatorNotAvailable(the cluster operator monitoring is not available) |
| Credential Expiry | Unauthorized errors begin after ~2h (GKE OIDC token lifetime exceeded) |
| Cache Sync Failure | 11:45:06Z: failed to wait for pod caches to sync: timed out waiting for cache to be synced for Kind *v1.Pod |
| PR Nature | Lint-only: enables dupword, durationcheck, errorlint, fatcontext, nilerr, noctx, usestdlibvars linters; 100 files changed with mechanical fixes (%v → %w, NewRequestWithContext, typo fixes) |
| Post-step failures | dump and hypershift-k8sgpt steps also failed with Unauthorized (consequence of expired credentials, not independent failures) |

@bryan-cox
Member Author

/verified by lint

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 2, 2026
@openshift-ci-robot

@bryan-cox: This PR has been marked as verified by lint.

Details

In response to this:

/verified by lint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci Bot commented Jun 2, 2026

@bryan-cox: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gke | 7d7f0b0 | link | false | /test e2e-gke |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit eb04f61 into openshift:main Jun 2, 2026
43 of 44 checks passed
@bryan-cox bryan-cox deleted the add-linter branch June 2, 2026 15:04

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/cli: Indicates the PR includes changes for CLI
  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • area/control-plane-pki-operator: Indicates the PR includes changes for the control plane PKI operator - in an OCP release
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
  • area/karpenter-operator: Indicates the PR includes changes related to the Karpenter operator
  • area/platform/aws: PR/issue for AWS (AWSPlatform) platform
  • area/platform/azure: PR/issue for Azure (AzurePlatform) platform
  • area/platform/gcp: PR/issue for GCP (GCPPlatform) platform
  • area/platform/kubevirt: PR/issue for KubeVirt (KubevirtPlatform) platform
  • area/platform/powervs: PR/issue for PowerVS (PowerVSPlatform) platform
  • area/testing: Indicates the PR includes changes for e2e testing
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria
