Skip to content

CNTRLPLANE-3324: Restore non-obvious comments after gocyclo refactor#8487

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
bryan-cox:restore-gocyclo-comments
May 12, 2026
Merged

CNTRLPLANE-3324: Restore non-obvious comments after gocyclo refactor#8487
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
bryan-cox:restore-gocyclo-comments

Conversation

@bryan-cox

@bryan-cox bryan-cox commented May 12, 2026

Copy link
Copy Markdown
Member

Summary

Changes by area

control-plane-operator/

  • awsprivatelink_controller.go: Stale-status endpoint adoption, EndpointID clearing, requeue error, subnet/SG ensure, ExternalName services
  • azureprivatelinkservice/controller.go: DNS deletion using zone name from status, base domain zone sharing guard, OAuth sibling check, PE deterministic name deletion
  • hostedcontrolplane_controller.go: Service-CA serving cert upgrade compatibility (Multus/NetworkNodeIdentity/OVN), release image digest resolution, controlPlaneVersion partial status
  • oauth/idp_convert.go: IBM Cloud OIDC special case, GitLab legacy OIDC, Keystone identity, unsafe ID claim, challenge-redirecting IdPs
  • v2/oauth/idp_convert.go: Fix //Handle// Handle formatting
  • main.go: HCCO image detection, registry overrides nil check, empty string anomaly, release provider lookup
  • resources/resources.go: DNS operator uncached client, ARO HCP cleanup, CIRO/PVC bootstrap, ROSA HCP TODO, data collection, MTU

hypershift-operator/

  • scheduler/aws/autoscaler.go: Pair label algorithm (7 blocks), collectTakenPairLabels doc, available nodes/machinesets criteria, paired machineset scale-up
  • scheduler/aws/dedicated_request_serving_nodes.go: Node count check, placeholder deletion, unlabeled node backfill, subnet format
  • nodepool/capi.go: CAPI webhook defaults, HOSTEDCP-971 label propagation, globalPS managed label, MachineDeployment annotations with CAPI docs link, interruptible label, OpenStack/GCP FailureDomain, version check
  • nodepool/aws.go: IMDS hop limit (AWS link), market type precedence, CapacityBlock default, Windows AMI, capacity reservation
  • hostedcluster/network_policies.go: VirtLauncher external infra policy (9-line doc), RBAC gap for networks.config, management cluster ingress, TODO for CPO migration
  • hostedcluster/metrics/metrics.go: ARO-HCP link, upgrade history, credential status encoding, proxy CA validity
  • nodepool/metrics/metrics.go: Autoscaling Replicas field usage, Karpenter vCPU seeding, loop labels
  • hostedclustersizing_controller.go: T-shirt sizing config validity, status/label split, SLA timing, transition delay, size class fallback
  • etcd_recovery.go: Intermittent failure handling, statefulset availability
  • auditlogpersistence/snapshot_controller.go: Annotation corruption handling, spec defaults copy, intentional retention error swallowing
  • platform/aws/controller.go: Obsolete endpoint service cleanup, EndpointServiceName clearing
  • main.go: Azure credential modes, webhook cleanup, UWM controller, ARO HCP skip, container status sha256

ignition-server/

  • local_ignitionprovider.go: MCO/MCC/MCS pipeline flow, version flags, disconnected overrides, TLS cert caching, MCS hard-coded TLS requirement, hash matching NOTE

cmd/

  • cluster/core/create.go: Release image defaulting, pausedUntil validation, RFC1123, feature set
  • install/install.go: GCP Workload Identity, scale-from-zero validation, platform validation
  • install/assets/hypershift_operator.go: Azure Workload Identity webhook explanation

Context

PR #8309 enabled the gocyclo linter and reduced cyclomatic complexity by extracting helper functions. Review feedback from @enxebre flagged that the refactor removed valuable comments documenting non-obvious behavior. Every removed comment line was classified as either non-obvious (restore) or redundant with the now-self-documenting extracted function names (skip).

A detailed HTML report documenting every removed comment and its disposition is available in the branch at comment-restoration-report.html (not committed).

Test plan

  • go build ./... passes
  • Comment-only changes — no behavioral impact

🤖 Generated with Claude Code

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 12, 2026
@openshift-ci-robot

openshift-ci-robot commented May 12, 2026

Copy link
Copy Markdown

@bryan-cox: This pull request references CNTRLPLANE-3324 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

Changes

  • awsprivatelink_controller.go: Add missing comment explaining why an error is returned after VPC Endpoint deletion (to trigger reconciliation requeue)
  • oauth/idp_convert.go (v1 and v2): Fix //Handle// Handle comment formatting for IBM Cloud OIDC special case
  • hostedcluster/metrics/metrics.go: Add ARO-HCP OpenAPI spec link for HostedClusterManagedAzureResourceType constant
  • hostedclustersizing_controller.go: Improve comment line wrapping; move "can't update status and labels in one call" comment before the condition it documents
  • nodepool/capi.go: Fix typo ("an" → "and") and improve line wrapping in CAPI 1.4.0 TODO comment

Context

PR #8309 enabled the gocyclo linter and reduced cyclomatic complexity by extracting helper functions across ~35 files. Review feedback from @enxebre flagged that the refactor removed valuable comments documenting non-obvious behavior. After systematic analysis of all ~600 removed comment lines, the squash merge was found to have preserved the vast majority. This PR addresses the remaining gaps.

Test plan

  • go build ./... passes
  • Comment-only changes — no behavioral impact

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/needs-area labels May 12, 2026
@coderabbitai

coderabbitai Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This pull request adds and refines inline and block comments across many controllers and command files (control-plane-operator, hypershift-operator, ignition-server, and several cmd packages). Changes clarify reconciliation assumptions, finalizer/deletion behavior, IDP conversion notes (including IBM Cloud/OpenID), metrics encoding/semantics, scheduler/autoscaler decision rationale, installer flags (Azure Workload Identity, GCP, external-dns), and other implementation intents. No functional logic, control flow, exported API signatures, or runtime behavior were changed.

Suggested reviewers

  • enxebre
  • hasueki
🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.38% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: restoring non-obvious comments after a gocyclo refactor, which is the primary purpose of this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed Not applicable: PR only adds documentation comments to non-test files. No test files were modified, no Ginkgo test names added or changed.
Test Structure And Quality ✅ Passed Check not applicable: PR modifies only production code with comment/documentation changes. No Ginkgo test files (_test.go) were modified. Custom check targets test code quality.
Microshift Test Compatibility ✅ Passed This PR does not add any new Ginkgo e2e tests. All 24 modified files are implementation/controller files with comment-only changes. The MicroShift Test Compatibility check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests added. PR contains only comment updates to controller and CLI code. The SNO test compatibility check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed This PR only adds documentation comments. No scheduling constraints, affinity rules, node selectors, or replica count changes are introduced. The changes document existing behavior.
Ote Binary Stdout Contract ✅ Passed Comment-only PR with no stdout writes in process-level code. stdout writes are in cobra command handlers (application-level), not main/init/suite setup.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests added in this PR. All changes are comment-only modifications to source code files (controllers, operators, CLI commands). Check is not applicable.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform and removed do-not-merge/needs-area labels May 12, 2026
@openshift-ci

openshift-ci Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2026
@openshift-ci openshift-ci Bot requested review from clebs and sdminonne May 12, 2026 09:53
@bryan-cox bryan-cox force-pushed the restore-gocyclo-comments branch from 73d3ea4 to 31bd1b7 Compare May 12, 2026 09:57
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 12, 2026
@codecov

codecov Bot commented May 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 85.45455% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.90%. Comparing base (b7c62b0) to head (7b92a2f).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
...ition-server/controllers/local_ignitionprovider.go 0.00% 5 Missing ⚠️
hypershift-operator/main.go 0.00% 4 Missing ⚠️
...ollers/awsprivatelink/awsprivatelink_controller.go 40.00% 3 Missing ⚠️
...t-operator/controllers/nodepool/metrics/metrics.go 0.00% 2 Missing ⚠️
...perator/controllers/hostedcluster/etcd_recovery.go 0.00% 1 Missing ⚠️
...ft-operator/controllers/platform/aws/controller.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8487      +/-   ##
==========================================
+ Coverage   39.85%   39.90%   +0.05%     
==========================================
  Files         751      751              
  Lines       92556    92661     +105     
==========================================
+ Hits        36888    36977      +89     
- Misses      52994    53010      +16     
  Partials     2674     2674              
Files with missing lines Coverage Δ
cmd/cluster/core/create.go 54.26% <100.00%> (+0.25%) ⬆️
cmd/install/assets/hypershift_operator.go 48.10% <ø> (ø)
cmd/install/install.go 61.77% <100.00%> (+0.12%) ⬆️
.../controllers/azureprivatelinkservice/controller.go 92.12% <100.00%> (+0.07%) ⬆️
...ostedcontrolplane/hostedcontrolplane_controller.go 45.30% <100.00%> (+0.17%) ⬆️
...ontrollers/hostedcontrolplane/oauth/idp_convert.go 61.10% <100.00%> (+0.34%) ⬆️
...rollers/hostedcontrolplane/v2/oauth/idp_convert.go 64.84% <100.00%> (ø)
...rconfigoperator/controllers/resources/resources.go 55.37% <100.00%> (+0.09%) ⬆️
...rollers/auditlogpersistence/snapshot_controller.go 62.13% <ø> (ø)
...rator/controllers/hostedcluster/metrics/metrics.go 92.73% <100.00%> (+0.06%) ⬆️
... and 12 more
Flag Coverage Δ
cmd-support 34.05% <100.00%> (+0.01%) ⬆️
cpo-hostedcontrolplane 40.56% <100.00%> (+0.04%) ⬆️
cpo-other 40.14% <82.35%> (+0.05%) ⬆️
hypershift-operator 50.47% <87.30%> (+0.09%) ⬆️
other 30.68% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@bryan-cox bryan-cox force-pushed the restore-gocyclo-comments branch from 31bd1b7 to beaddff Compare May 12, 2026 10:06
@openshift-ci openshift-ci Bot added area/cli Indicates the PR includes changes for CLI area/platform/azure PR/issue for Azure (AzurePlatform) platform labels May 12, 2026
@bryan-cox bryan-cox force-pushed the restore-gocyclo-comments branch 2 times, most recently from e9904ed to 7a558c5 Compare May 12, 2026 10:55
Restore and improve comments that document hidden constraints, edge
cases, and design decisions across files affected by the gocyclo
complexity refactor in PR openshift#8309.

Changes:
- Add missing requeue comment in AWS private link deletion flow
- Restore OAuth IDP conversion comments: GitLab OIDC requirement,
  Keystone ID, IBM Cloud special case, ID claim safety, challenge
  flow validation constraint
- Fix comment formatting (//Handle -> // Handle) in v2 OAuth IDP
- Add ARO-HCP OpenAPI spec link for Azure resource type constant
- Restore CAPI label propagation doc comment with TODO and JIRA link
  (HOSTEDCP-971) explaining why labels/taints bypass rolling upgrades
- Restore globalPS DaemonSet scheduling comment

Signed-off-by: Bryan Cox <brcox@redhat.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bryan-cox bryan-cox force-pushed the restore-gocyclo-comments branch from 7a558c5 to 7b92a2f Compare May 12, 2026 11:20
@enxebre

enxebre commented May 12, 2026

Copy link
Copy Markdown
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2026
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@bryan-cox

Copy link
Copy Markdown
Member Author

/verified bypass

This is just adding comments back in. No functional code changes.

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 12, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@bryan-cox: The verified label has been added.

Details

In response to this:

/verified bypass

This is just adding comments back in. No functional code changes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@bryan-cox bryan-cox closed this May 12, 2026
@bryan-cox bryan-cox deleted the restore-gocyclo-comments branch May 12, 2026 13:11
@bryan-cox

Copy link
Copy Markdown
Member Author

/open

@bryan-cox

Copy link
Copy Markdown
Member Author

/reopen

@hypershift-jira-solve-ci

Copy link
Copy Markdown

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2054178982760288256 | Cost: $2.42316275 | Failed step: hypershift-azure-run-e2e

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@openshift-ci

openshift-ci Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

@bryan-cox: Failed to re-open PR: state cannot be changed. The restore-gocyclo-comments branch has been deleted.

Details

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@bryan-cox bryan-cox restored the restore-gocyclo-comments branch May 12, 2026 13:15
@bryan-cox bryan-cox reopened this May 12, 2026
@openshift-ci-robot

openshift-ci-robot commented May 12, 2026

Copy link
Copy Markdown

@bryan-cox: This pull request references CNTRLPLANE-3324 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

Changes by area

control-plane-operator/

  • awsprivatelink_controller.go: Stale-status endpoint adoption, EndpointID clearing, requeue error, subnet/SG ensure, ExternalName services
  • azureprivatelinkservice/controller.go: DNS deletion using zone name from status, base domain zone sharing guard, OAuth sibling check, PE deterministic name deletion
  • hostedcontrolplane_controller.go: Service-CA serving cert upgrade compatibility (Multus/NetworkNodeIdentity/OVN), release image digest resolution, controlPlaneVersion partial status
  • oauth/idp_convert.go: IBM Cloud OIDC special case, GitLab legacy OIDC, Keystone identity, unsafe ID claim, challenge-redirecting IdPs
  • v2/oauth/idp_convert.go: Fix //Handle// Handle formatting
  • main.go: HCCO image detection, registry overrides nil check, empty string anomaly, release provider lookup
  • resources/resources.go: DNS operator uncached client, ARO HCP cleanup, CIRO/PVC bootstrap, ROSA HCP TODO, data collection, MTU

hypershift-operator/

  • scheduler/aws/autoscaler.go: Pair label algorithm (7 blocks), collectTakenPairLabels doc, available nodes/machinesets criteria, paired machineset scale-up
  • scheduler/aws/dedicated_request_serving_nodes.go: Node count check, placeholder deletion, unlabeled node backfill, subnet format
  • nodepool/capi.go: CAPI webhook defaults, HOSTEDCP-971 label propagation, globalPS managed label, MachineDeployment annotations with CAPI docs link, interruptible label, OpenStack/GCP FailureDomain, version check
  • nodepool/aws.go: IMDS hop limit (AWS link), market type precedence, CapacityBlock default, Windows AMI, capacity reservation
  • hostedcluster/network_policies.go: VirtLauncher external infra policy (9-line doc), RBAC gap for networks.config, management cluster ingress, TODO for CPO migration
  • hostedcluster/metrics/metrics.go: ARO-HCP link, upgrade history, credential status encoding, proxy CA validity
  • nodepool/metrics/metrics.go: Autoscaling Replicas field usage, Karpenter vCPU seeding, loop labels
  • hostedclustersizing_controller.go: T-shirt sizing config validity, status/label split, SLA timing, transition delay, size class fallback
  • etcd_recovery.go: Intermittent failure handling, statefulset availability
  • auditlogpersistence/snapshot_controller.go: Annotation corruption handling, spec defaults copy, intentional retention error swallowing
  • platform/aws/controller.go: Obsolete endpoint service cleanup, EndpointServiceName clearing
  • main.go: Azure credential modes, webhook cleanup, UWM controller, ARO HCP skip, container status sha256

ignition-server/

  • local_ignitionprovider.go: MCO/MCC/MCS pipeline flow, version flags, disconnected overrides, TLS cert caching, MCS hard-coded TLS requirement, hash matching NOTE

cmd/

  • cluster/core/create.go: Release image defaulting, pausedUntil validation, RFC1123, feature set
  • install/install.go: GCP Workload Identity, scale-from-zero validation, platform validation
  • install/assets/hypershift_operator.go: Azure Workload Identity webhook explanation

Context

PR #8309 enabled the gocyclo linter and reduced cyclomatic complexity by extracting helper functions. Review feedback from @enxebre flagged that the refactor removed valuable comments documenting non-obvious behavior. Every removed comment line was classified as either non-obvious (restore) or redundant with the now-self-documenting extracted function names (skip).

A detailed HTML report documenting every removed comment and its disposition is available in the branch at comment-restoration-report.html (not committed).

Test plan

  • go build ./... passes
  • Comment-only changes — no behavioral impact

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@bryan-cox

Copy link
Copy Markdown
Member Author

/retest

@bryan-cox

Copy link
Copy Markdown
Member Author

Claude mistakenly closed this PR before.

@hypershift-jira-solve-ci

Copy link
Copy Markdown

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2054178982856757248 | Cost: $2.4558885000000013 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@bryan-cox

Copy link
Copy Markdown
Member Author

/pipeline required

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@hypershift-jira-solve-ci

Copy link
Copy Markdown

I have all the evidence needed. All 8 jobs share the exact same root cause: CI infrastructure pod scheduling timeout on the build01 cluster. Here is the report:

Test Failure Analysis Complete

Job Information

  • Prow Jobs: All 8 presubmit jobs for PR CNTRLPLANE-3324: Restore non-obvious comments after gocyclo refactor #8487 (openshift/hypershift)
  • Build IDs: 2054255060883345408, 2054255067413876736, 2054255064905682944, 2054255069930459136, 2054255074972012544, 2054255072442847232, 2054255079988400128, 2054255083356426240
  • Affected Jobs: e2e-aks, e2e-aws, e2e-aks-4-22, e2e-aws-4-22, e2e-azure-self-managed, e2e-aws-upgrade-hypershift-operator, e2e-v2-aws, e2e-v2-gke
  • Cluster: build01
  • Start Time: 2026-05-12T17:39:06Z (all 8 triggered simultaneously)
  • End Time: ~2026-05-12T18:09:08Z (all timed out after exactly 30 minutes)

Test Failure Analysis

Error

Pod scheduling timeout.

0/91 nodes are available: 1 node(s) didn't satisfy existing pods anti-affinity rules,
1 node(s) had untolerated taint {node-role.kubernetes.io/ci-builds-tmpfs-worker: ci-builds-tmpfs-worker},
1 node(s) had untolerated taint {node-role.kubernetes.io/ci-builds-worker: ci-builds-worker},
17 node(s) didn't match Pod's node affinity/selector,
2 node(s) had untolerated taint {node-role.kubernetes.io/ci-longtests-worker: ci-longtests-worker},
3 Insufficient memory,
3 node(s) had untolerated taint {node-role.kubernetes.io/infra: },
3 node(s) had untolerated taint {node-role.kubernetes.io/master: },
60 node(s) had untolerated taint {node-role.kubernetes.io/ci-tests-worker: ci-tests-worker}.
preemption: 0/91 nodes are available: 1 Insufficient memory, 1 node(s) didn't satisfy
existing pods anti-affinity rules, 2 No preemption victims found for incoming pod,
87 Preemption is not helpful for scheduling.

Summary

All 8 jobs failed identically due to a CI infrastructure capacity issue — not a problem with the PR code. Every job pod was created on the build01 cluster at 17:39:06Z and remained in Pending/Unschedulable state for 30 minutes until Prow's pod scheduling timeout killed them. The pods required nodes with ci-workload=prowjobs node selector and toleration for ci-prowjobs-worker taint, but of the 91 nodes in the cluster, none were available: 60 were ci-tests-worker nodes (wrong role), 17 didn't match pod affinity/selector, 3 had insufficient memory, and the remaining were infrastructure/master/specialized nodes. Preemption could not help either. No test code ever executed — no build-log.txt was produced. This is a transient CI infrastructure problem unrelated to PR #8487's changes.

Root Cause

The root cause is CI infrastructure resource exhaustion on the build01 cluster, specifically on the ci-prowjobs-worker node pool. This is entirely a CI platform issue and has no relationship to the PR's code changes (restoring comments after a gocyclo refactor).

Detailed breakdown of the 91-node cluster at failure time:

  1. 60 nodes — tainted as ci-tests-worker (pods only tolerate ci-prowjobs-worker)
  2. 17 nodes — didn't match the pod's node affinity/selector (ci-workload=prowjobs + kubernetes.io/arch in [amd64, arm64])
  3. 3 nodes — had insufficient memory (the eligible prowjob worker nodes were overcommitted)
  4. 3 nodes — infrastructure nodes (taint node-role.kubernetes.io/infra)
  5. 3 nodes — master nodes (taint node-role.kubernetes.io/master)
  6. 2 nodes — longtests workers (taint ci-longtests-worker)
  7. 1 node — builds-tmpfs worker
  8. 1 node — builds worker
  9. 1 node — blocked by pod anti-affinity rules

The few nodes that would normally schedule prowjobs (the ci-prowjobs-worker pool) were either memory-exhausted or blocked by anti-affinity. With all 8 jobs triggered simultaneously at 17:39:06Z, they competed for the same scarce resources. Kubernetes preemption was also unable to free capacity — 87 nodes were not helpful for preemption, and the remaining candidates had insufficient memory or anti-affinity conflicts.

The 30-minute scheduling timeout is a Prow-level safety net that terminates pods that cannot be scheduled within a reasonable window, preventing indefinite queuing.

Recommendations
  1. Retest the PR — simply re-trigger the jobs with /retest or /test all. The scheduling pressure is transient and typically resolves as other jobs complete and free up node capacity.

  2. No code changes needed — PR CNTRLPLANE-3324: Restore non-obvious comments after gocyclo refactor #8487 restores non-obvious comments after a gocyclo refactor. The failure is entirely in CI infrastructure; the test code was never executed.

  3. If retests continue to fail with the same pod scheduling timeout, escalate to the CI infrastructure team (OpenShift DPTP / Test Platform) to investigate capacity on the build01 cluster's ci-prowjobs-worker node pool. The issue would indicate sustained oversubscription rather than a transient spike.

Evidence
Evidence Detail
Failure mode Pod scheduling timeout — identical across all 8 jobs
Cluster build01
Pod state PendingFailed (never reached Running)
Scheduling condition PodScheduled=False, reason: Unschedulable
Node count 0/91 nodes available for scheduling
Primary blocker 60/91 nodes tainted ci-tests-worker (pods need ci-prowjobs-worker)
Secondary blocker 3 eligible nodes had insufficient memory
Preemption result 0/91 nodes available for preemption (87 not helpful, 1 insufficient memory, 1 anti-affinity, 2 no victims)
Pod node selector ci-workload: prowjobs
Pod toleration node-role.kubernetes.io/ci-prowjobs-worker
All jobs started 2026-05-12T17:39:06Z (simultaneous)
All jobs timed out ~2026-05-12T18:09:08Z (exactly 30m later)
Build log Not generated (pod never ran)
Artifacts produced Only prowjob.json, podinfo.json, started.json, finished.json

@bryan-cox

Copy link
Copy Markdown
Member Author

/retest

@cwbotbot

cwbotbot commented May 12, 2026

Copy link
Copy Markdown

Test Results

e2e-aks

e2e-aws

@bryan-cox

Copy link
Copy Markdown
Member Author

/test images

@openshift-ci

openshift-ci Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

@bryan-cox: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 76bbefc into openshift:main May 12, 2026
61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform area/platform/azure PR/issue for Azure (AzurePlatform) platform jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants