Resume a paused Redshift cluster before deleting it #68922
Open
seanghaeli wants to merge 1 commit into
A ``paused`` Redshift cluster cannot be deleted: ``delete_cluster`` raises
``InvalidClusterStateFault`` ("There is an operation running on the Cluster"),
and ``RedshiftDeleteClusterOperator``'s retry loop cannot recover because a
paused cluster never leaves that state on its own. The retries exhaust and the
cluster is left behind -- silently leaked until external cleanup reaps it.
Resume the cluster first when it is paused (and wait until it is ``available``)
before issuing the delete. Clusters that are not paused are unaffected.
Generated-by: Claude Code (Opus via Claude Code) on behalf of Sean Ghaeli
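The resume-then-delete flow described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: it assumes a boto3-style Redshift client is passed in, the helper names `resume_if_paused` and `delete_cluster` are hypothetical, and the final snapshot is always skipped to keep the example short.

```python
def resume_if_paused(client, cluster_identifier: str) -> bool:
    """Resume the cluster and wait for ``available`` if it is paused.

    Returns True when a resume was issued, False otherwise.
    ``client`` is assumed to behave like ``boto3.client("redshift")``.
    """
    response = client.describe_clusters(ClusterIdentifier=cluster_identifier)
    status = response["Clusters"][0]["ClusterStatus"]
    if status != "paused":
        # Any other state is left alone (early return).
        return False
    client.resume_cluster(ClusterIdentifier=cluster_identifier)
    # Block until the cluster reports ``available`` again.
    client.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_identifier)
    return True


def delete_cluster(client, cluster_identifier: str) -> None:
    """Delete a cluster, resuming it first when it is paused."""
    resume_if_paused(client, cluster_identifier)
    # Assumption for this sketch: no final snapshot is taken.
    client.delete_cluster(
        ClusterIdentifier=cluster_identifier,
        SkipFinalClusterSnapshot=True,
    )
```

With a real client this would be called as `delete_cluster(boto3.client("redshift"), "my-cluster")`, where `"my-cluster"` is a placeholder identifier.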
o-nikolas reviewed Jun 24, 2026
    def execute(self, context: Context):
        # A paused cluster cannot be deleted; resume it first (otherwise the retry loop below
        # would exhaust against InvalidClusterStateFault and the cluster would be leaked).
        self._resume_if_paused()
Contributor
Should we do something to ensure this stays transactional? If we resume the cluster and then fail at some point before the deletion, the cluster is left running unexpectedly when the user thought it was paused (essentially the inverse of the situation we find ourselves in now).
If we can't ensure the operation is transactional, we should at least make this opt-in.
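One possible shape for the opt-in behaviour, with a best-effort compensation step, is sketched below. It is only an illustration of the idea raised in this comment: the function name and the `resume_if_paused` flag are hypothetical, `client` is assumed to behave like a boto3 Redshift client, and re-pausing after a failure is not truly transactional, since the pause request can itself be rejected (for example while the cluster is still resuming).

```python
import logging

log = logging.getLogger(__name__)


def delete_cluster_opt_in(client, cluster_identifier: str, resume_if_paused: bool = False) -> None:
    """Delete a cluster; resume a paused one only when the caller opts in."""
    described = client.describe_clusters(ClusterIdentifier=cluster_identifier)
    status = described["Clusters"][0]["ClusterStatus"]

    resumed = False
    if status == "paused":
        if not resume_if_paused:
            # Default: leave the paused cluster untouched and fail loudly.
            raise RuntimeError(
                f"Cluster {cluster_identifier} is paused; "
                "pass resume_if_paused=True to resume and delete it"
            )
        client.resume_cluster(ClusterIdentifier=cluster_identifier)
        resumed = True

    try:
        if resumed:
            client.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_identifier)
        # Assumption for this sketch: no final snapshot is taken.
        client.delete_cluster(ClusterIdentifier=cluster_identifier, SkipFinalClusterSnapshot=True)
    except Exception:
        if resumed:
            # Best-effort compensation: try to put the cluster back to paused.
            try:
                client.pause_cluster(ClusterIdentifier=cluster_identifier)
            except Exception:
                log.exception("Could not re-pause cluster %s", cluster_identifier)
        raise
```

The original exception is always re-raised, so a failed compensation is logged rather than masking the root cause.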
Why

A ``paused`` Redshift cluster cannot be deleted. ``delete_cluster`` raises ``InvalidClusterStateFault`` ("There is an operation running on the Cluster"). ``RedshiftDeleteClusterOperator`` already retries this error, but the retry cannot recover a paused cluster: a paused cluster never leaves that state on its own, so every attempt hits the same fault. The retries exhaust and the cluster is left behind, silently leaked until external cleanup reaps it.

This was observed in practice with the ``example_redshift`` system test: a cluster left in the ``paused`` phase (e.g. after an upstream task failed before ``resume_cluster``) was never deleted by the ``delete_cluster`` teardown task, and accumulated as a stale resource.

What

Add ``_resume_if_paused()`` and call it at the start of ``execute()``: if the cluster is ``paused``, resume it and wait for ``available`` before issuing the delete. Clusters in any other state are unaffected (early return), and the existing busy-retry loop for transient ``InvalidClusterStateFault`` during deletion is unchanged.

Tests

- ``test_delete_paused_cluster_resumes_first``: a paused cluster is resumed, waited on (``cluster_available``), then deleted.
- ``test_delete_available_cluster_does_not_resume``: a non-paused cluster is deleted directly, with no spurious resume.

Verified locally in Breeze: all ``TestDeleteClusterOperator`` tests pass.

Generated-by: Claude Code (Opus via Claude Code) on behalf of Sean Ghaeli
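For readers who want to see what the two test cases check, here is a rough, self-contained sketch using `unittest.mock`. It is not the test code from this PR: `delete_with_resume` is a hypothetical stand-in for the operator logic, written against a boto3-style client, and the real tests exercise `RedshiftDeleteClusterOperator` instead.

```python
from unittest import mock


def delete_with_resume(client, cluster_identifier):
    """Hypothetical stand-in for the operator's resume-then-delete logic."""
    described = client.describe_clusters(ClusterIdentifier=cluster_identifier)
    if described["Clusters"][0]["ClusterStatus"] == "paused":
        client.resume_cluster(ClusterIdentifier=cluster_identifier)
        client.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_identifier)
    client.delete_cluster(ClusterIdentifier=cluster_identifier, SkipFinalClusterSnapshot=True)


def test_delete_paused_cluster_resumes_first():
    client = mock.MagicMock()
    client.describe_clusters.return_value = {"Clusters": [{"ClusterStatus": "paused"}]}

    delete_with_resume(client, "test-cluster")

    # mock_calls records calls in order, so this also checks the sequence:
    # describe, resume, wait for cluster_available, then delete.
    assert client.mock_calls == [
        mock.call.describe_clusters(ClusterIdentifier="test-cluster"),
        mock.call.resume_cluster(ClusterIdentifier="test-cluster"),
        mock.call.get_waiter("cluster_available"),
        mock.call.get_waiter().wait(ClusterIdentifier="test-cluster"),
        mock.call.delete_cluster(ClusterIdentifier="test-cluster", SkipFinalClusterSnapshot=True),
    ]


def test_delete_available_cluster_does_not_resume():
    client = mock.MagicMock()
    client.describe_clusters.return_value = {"Clusters": [{"ClusterStatus": "available"}]}

    delete_with_resume(client, "test-cluster")

    # No spurious resume or wait for a cluster that is not paused.
    client.resume_cluster.assert_not_called()
    client.get_waiter.assert_not_called()
    client.delete_cluster.assert_called_once_with(
        ClusterIdentifier="test-cluster", SkipFinalClusterSnapshot=True
    )
```

Asserting on the full ordered call list, rather than on each call separately, is what distinguishes "resumed, then deleted" from "deleted, then resumed".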