Skip to content

no-jira: GCP Destroy: Group Load Balancer Resources & Retry on Errors#10586

Open
patrickdillon wants to merge 1 commit into
openshift:mainfrom
patrickdillon:gcp-destroy
Open

no-jira: GCP Destroy: Group Load Balancer Resources & Retry on Errors#10586
patrickdillon wants to merge 1 commit into
openshift:mainfrom
patrickdillon:gcp-destroy

Conversation

@patrickdillon
Copy link
Copy Markdown
Contributor

@patrickdillon patrickdillon commented Jun 1, 2026

Groups GCP Load Balancer resources together so they are destroyed before attempting to destroy subsequent, dependent resources.

Retry delete operations when an internal error is encountered instead of continually checking the operation with an internal error.

Fixes #10584

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced error handling for cloud operations during cluster destruction. The system now detects transient internal errors and automatically retries affected operations, improving overall reliability.
    • Optimized resource deletion sequence to ensure load balancer components are properly cleaned up before instance groups, resulting in more consistent and reliable cluster teardown.

Groups GCP Load Balancer resources together so they are destroyed
before attempting to destroy subsequent, dependent resources.

Retry delete operations when an internal error is encountered
instead of continually checking the operation with an internal error.
@patrickdillon patrickdillon changed the title GCP Destroy: Group Load Balancer Resources & Retry on Errors no-jira: GCP Destroy: Group Load Balancer Resources & Retry on Errors Jun 1, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1c7bc053-f710-41de-a919-04bda8a68da6

📥 Commits

Reviewing files that changed from the base of the PR and between e871a5d and fe4e7a8.

📒 Files selected for processing (1)
  • pkg/destroy/gcp/gcp.go

Walkthrough

The PR restructures GCP resource destruction to respect load-balancer dependency chains and adds detection for internal errors during operation polling. Load balancer frontends (forwarding rules, target pools, proxies) are now deleted before backends (services, health checks), which are deleted before instance groups. A new helper detects internal errors on incomplete operations and triggers retryable failures.

Changes

GCP Resource Destruction Improvements

Layer / File(s) Summary
Load Balancer Deletion Stage Reordering
pkg/destroy/gcp/gcp.go
The stagedFuncs pipeline is rebuilt to separate and sequence load-balancer deletions: forwarding rules, target pools, and target TCP proxies are deleted before backend services, health checks, and HTTP health checks, which are deleted before instance groups. This respects GCP's enforced dependency chain and prevents resource-in-use failures.
Internal Error Detection and Retryable Handling
pkg/destroy/gcp/gcp.go
A hasInternalError helper detects INTERNAL_ERROR codes in operation error responses. The handleOperation function now treats non-DONE operations with internal errors as retryable by resetting request IDs, suppressing warnings, and returning a failure message that triggers retry logic.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the two main changes: grouping load balancer resources and adding retry logic on errors, which aligns with the PR objectives.
Linked Issues check ✅ Passed Changes implement the suggested fix from #10584: reordering load balancer deletion stages and adding internal error retry logic to handle operation failures.
Out of Scope Changes check ✅ Passed All changes are scoped to GCP destroy logic and directly address the dependency ordering and error handling issues described in #10584.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Stable And Deterministic Test Names ✅ Passed Check is not applicable; PR modifies only pkg/destroy/gcp/gcp.go (non-test file). No Ginkgo tests exist in the repository or were added.
Test Structure And Quality ✅ Passed PR modifies only gcp.go and non-Ginkgo test files. The destroy package tests use standard Go testing (t.Run), not Ginkgo. Custom check is inapplicable.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests are added in this PR. The changes are limited to GCP destroy logic in pkg/destroy/gcp/gcp.go, which is production code without any e2e test components.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR modifies only pkg/destroy/gcp/gcp.go, a non-test provider file with no Ginkgo test definitions. The SNO compatibility check applies only to new e2e tests, which are not present in this PR.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies GCP cluster destruction logic only. No deployment manifests, operator code, or controllers with scheduling constraints are added or modified—check not applicable.
Ote Binary Stdout Contract ✅ Passed File is part of openshift-install CLI tool, not OTE binary. No stdout writes detected (no fmt.Print*, log.Print*, or klog calls). Check not applicable.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR modifies only pkg/destroy/gcp/gcp.go (implementation code), not Ginkgo e2e tests. Custom check for IPv6/disconnected network compatibility does not apply.
No-Weak-Crypto ✅ Passed No weak cryptography detected. PR adds error handling and reorders GCP resource deletion stages; no MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB, custom crypto, or insecure secret comparisons found.
Container-Privileges ✅ Passed PR modifies only Go source files in pkg/destroy/gcp/ for cluster destruction logic. No K8s manifests, container configs, or YAML files are present or modified.
No-Sensitive-Data-In-Logs ✅ Passed Logging added for GCP operation errors contains diagnostic info (error codes and resource names) from standard GCP API responses, which do not include passwords, tokens, keys, or PII.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 1, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@patrickdillon: This pull request explicitly references no jira issue.

Details

In response to this:

Groups GCP Load Balancer resources together so they are destroyed before attempting to destroy subsequent, dependent resources.

Retry delete operations when an internal error is encountered instead of continually checking the operation with an internal error.

Fixes #10584

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from barbacbd and rna-afk June 1, 2026 23:34
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 1, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign patrickdillon for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 2, 2026

@patrickdillon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-xpn-dedicated-dns-project fe4e7a8 link false /test e2e-gcp-xpn-dedicated-dns-project
ci/prow/e2e-aws-ovn fe4e7a8 link true /test e2e-aws-ovn
ci/prow/e2e-gcp-ovn-byo-vpc fe4e7a8 link false /test e2e-gcp-ovn-byo-vpc
ci/prow/e2e-gcp-xpn-custom-dns fe4e7a8 link false /test e2e-gcp-xpn-custom-dns
ci/prow/e2e-gcp-ovn-xpn fe4e7a8 link false /test e2e-gcp-ovn-xpn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment thread pkg/destroy/gcp/gcp.go
}

func hasInternalError(op *compute.Operation) bool {
if op.Error == nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if op.Error == nil {
if op == nil || op.Error == nil {

Comment thread pkg/destroy/gcp/gcp.go
return false
}
for _, e := range op.Error.Errors {
if e.Code == "INTERNAL_ERROR" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if e.Code == "INTERNAL_ERROR" {
if e != nil && e.Code == "INTERNAL_ERROR" {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we define "INTERNAL_ERROR" as a const for consistancy?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GCP destroy: instance group deletion fails due to dependency ordering with backend services

3 participants