Resume a paused Redshift cluster before deleting it by seanghaeli · Pull Request #68922 · apache/airflow

seanghaeli · 2026-06-23T23:23:59Z

Why

A paused Redshift cluster cannot be deleted. delete_cluster raises:

InvalidClusterStateFault: There is an operation running on the Cluster. Please try to delete it at a later time.

RedshiftDeleteClusterOperator already retries this error, but the retry cannot recover a paused cluster — a paused cluster never leaves that state on its own, so every attempt hits the same fault. The retries exhaust and the cluster is left behind, silently leaked until external cleanup reaps it.
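The failure mode above can be sketched in isolation. This is a minimal illustration, not the operator's code: `PausedCluster`, `delete_with_retries`, and the local `InvalidClusterStateFault` class are hypothetical stand-ins, and the real retry loop also waits between attempts.

```python
class InvalidClusterStateFault(Exception):
    """Local stand-in for the AWS error of the same name (hypothetical class)."""


class PausedCluster:
    """Hypothetical fake: the status only changes if something explicitly changes it."""

    def __init__(self):
        self.status = "paused"

    def delete(self):
        # Mirrors the behaviour described above: only an available cluster can be deleted.
        if self.status != "available":
            raise InvalidClusterStateFault("There is an operation running on the Cluster.")
        self.status = "deleting"


def delete_with_retries(cluster, max_attempts=3):
    """Retry the delete a fixed number of times; return the attempt that succeeded."""
    for attempt in range(1, max_attempts + 1):
        try:
            cluster.delete()
            return attempt
        except InvalidClusterStateFault:
            # Nothing here changes the cluster state, so a paused cluster fails
            # identically on every attempt until the retries are exhausted.
            if attempt == max_attempts:
                raise
```

Because nothing inside the loop moves the cluster out of `paused`, adding more attempts does not help; the state has to be changed before the delete is issued.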

This was observed in practice with the example_redshift system test: a cluster left in the paused phase (e.g. after an upstream task failed before resume_cluster) was never deleted by the delete_cluster teardown task, and accumulated as a stale resource.

What

Add _resume_if_paused() and call it at the start of execute(): if the cluster is paused, resume it and wait for available before issuing the delete. Clusters in any other state are unaffected (early return), and the existing busy-retry loop for transient InvalidClusterStateFault during deletion is unchanged.
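The control flow described above might look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: `FakeRedshiftClient`, `cluster_status`, `resume_if_paused`, and `delete_cluster` are hypothetical names, the fake only mimics the boto3 Redshift client calls `describe_clusters`, `resume_cluster`, and `delete_cluster`, and a simple polling loop stands in for the `cluster_available` wait.

```python
import time


class FakeRedshiftClient:
    """Hypothetical stand-in for a boto3 Redshift client tracking one cluster."""

    def __init__(self, status):
        self.status = status
        self.calls = []

    def describe_clusters(self, ClusterIdentifier):
        self.calls.append("describe_clusters")
        return {"Clusters": [{"ClusterIdentifier": ClusterIdentifier, "ClusterStatus": self.status}]}

    def resume_cluster(self, ClusterIdentifier):
        self.calls.append("resume_cluster")
        self.status = "available"  # a real cluster takes time to become available
        return {}

    def delete_cluster(self, ClusterIdentifier, **kwargs):
        self.calls.append("delete_cluster")
        if self.status == "paused":
            raise RuntimeError("InvalidClusterStateFault")
        self.status = "deleting"
        return {}


def cluster_status(client, cluster_identifier):
    response = client.describe_clusters(ClusterIdentifier=cluster_identifier)
    return response["Clusters"][0]["ClusterStatus"]


def resume_if_paused(client, cluster_identifier, poll_interval=0.0, max_attempts=30):
    """Resume a paused cluster and wait for 'available'; return True if it was resumed."""
    if cluster_status(client, cluster_identifier) != "paused":
        return False  # early return: clusters in any other state are untouched
    client.resume_cluster(ClusterIdentifier=cluster_identifier)
    for _ in range(max_attempts):
        if cluster_status(client, cluster_identifier) == "available":
            return True
        time.sleep(poll_interval)
    raise TimeoutError(f"{cluster_identifier} did not become available")


def delete_cluster(client, cluster_identifier):
    """Resume first if needed, then issue the delete."""
    resume_if_paused(client, cluster_identifier)
    client.delete_cluster(ClusterIdentifier=cluster_identifier, SkipFinalClusterSnapshot=True)
```

With a paused fake, the call sequence is describe, resume, describe, delete; with an available one it is just describe, delete.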

Tests

test_delete_paused_cluster_resumes_first — a paused cluster is resumed, waited on (cluster_available), then deleted.
test_delete_available_cluster_does_not_resume — a non-paused cluster is deleted directly, with no spurious resume.
Existing delete-operator tests (deferrable paths, busy-retry exhaustion) unchanged and passing.

Verified locally in Breeze: all TestDeleteClusterOperator tests pass.
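For readers who want to see the shape of the two new tests, here is a simplified sketch using `unittest.mock`. It assumes a hypothetical free function `delete_cluster` rather than the real operator and hook, so the names and call signatures below are illustrative only; the test names are taken from the list above.

```python
from unittest import mock


def delete_cluster(client, cluster_identifier):
    """Hypothetical function under test: resume a paused cluster, wait, then delete."""
    response = client.describe_clusters(ClusterIdentifier=cluster_identifier)
    if response["Clusters"][0]["ClusterStatus"] == "paused":
        client.resume_cluster(ClusterIdentifier=cluster_identifier)
        client.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_identifier)
    client.delete_cluster(ClusterIdentifier=cluster_identifier)


def make_client(status):
    """Build a mocked client whose describe_clusters reports the given status."""
    client = mock.MagicMock()
    client.describe_clusters.return_value = {"Clusters": [{"ClusterStatus": status}]}
    return client


def test_delete_paused_cluster_resumes_first():
    client = make_client("paused")
    delete_cluster(client, "demo")
    client.resume_cluster.assert_called_once_with(ClusterIdentifier="demo")
    client.get_waiter.assert_called_once_with("cluster_available")
    client.get_waiter.return_value.wait.assert_called_once_with(ClusterIdentifier="demo")
    client.delete_cluster.assert_called_once_with(ClusterIdentifier="demo")
    # The resume must be recorded before the delete.
    names = [call[0] for call in client.mock_calls]
    assert names.index("resume_cluster") < names.index("delete_cluster")


def test_delete_available_cluster_does_not_resume():
    client = make_client("available")
    delete_cluster(client, "demo")
    client.resume_cluster.assert_not_called()
    client.get_waiter.assert_not_called()
    client.delete_cluster.assert_called_once_with(ClusterIdentifier="demo")
```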

Generated-by: Claude Code (Opus via Claude Code) on behalf of Sean Ghaeli

A ``paused`` Redshift cluster cannot be deleted: ``delete_cluster`` raises ``InvalidClusterStateFault`` ("There is an operation running on the Cluster"), and ``RedshiftDeleteClusterOperator``'s retry loop cannot recover because a paused cluster never leaves that state on its own. The retries exhaust and the cluster is left behind -- silently leaked until external cleanup reaps it.

Resume the cluster first when it is paused (and wait until it is ``available``) before issuing the delete. Clusters that are not paused are unaffected.

Generated-by: Claude Code (Opus via Claude Code) on behalf of Sean Ghaeli

o-nikolas · 2026-06-24T00:22:18Z

    def execute(self, context: Context):
+        # A paused cluster cannot be deleted; resume it first (otherwise the retry loop below
+        # would exhaust against InvalidClusterStateFault and the cluster would be leaked).
+        self._resume_if_paused()


Should we do something to ensure this stays transactional? If we resume the cluster and then fail somewhere between the resume and the deletion, the cluster is left running unexpectedly when the user thought it was paused (essentially the inverse of the situation we find ourselves in now).

If we can't ensure the operation is transactional, we should perhaps at least make this opt-in.
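One possible shape for the opt-in idea, sketched under assumptions: the `resume_paused` flag and the `delete_cluster` function are hypothetical and not part of this PR, and `client` is assumed to behave like a boto3 Redshift client (`describe_clusters`, `resume_cluster`, `pause_cluster`, `delete_cluster`, and the `cluster_available` waiter). The re-pause is best-effort compensation only; it does not make the operation truly transactional.

```python
import logging

log = logging.getLogger(__name__)


def delete_cluster(client, cluster_identifier, resume_paused=False):
    """Delete a cluster; resume a paused one only when the caller opts in."""
    response = client.describe_clusters(ClusterIdentifier=cluster_identifier)
    status = response["Clusters"][0]["ClusterStatus"]

    resumed = False
    if status == "paused":
        if not resume_paused:
            # Default behaviour: never start a cluster the user believes is paused.
            raise RuntimeError(
                f"{cluster_identifier} is paused; pass resume_paused=True to resume and delete it"
            )
        client.resume_cluster(ClusterIdentifier=cluster_identifier)
        resumed = True

    try:
        if resumed:
            client.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_identifier)
        client.delete_cluster(ClusterIdentifier=cluster_identifier)
    except Exception:
        if resumed:
            # Best-effort compensation: try to put the cluster back into the paused state.
            # This can itself fail (for example while the cluster is still resuming).
            try:
                client.pause_cluster(ClusterIdentifier=cluster_identifier)
            except Exception:
                log.warning("Could not re-pause %s after a failed delete", cluster_identifier)
        raise
```

With the flag off, a paused cluster fails fast instead of being started; with it on, a failure after the resume triggers an attempt to pause the cluster again before the original error is re-raised.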

seanghaeli requested a review from o-nikolas as a code owner June 23, 2026 23:24

boring-cyborg Bot added area:providers provider:amazon AWS/Amazon - related issues labels Jun 23, 2026

o-nikolas reviewed Jun 24, 2026

seanghaeli wants to merge 1 commit into apache:main from aws-mwaa:feature/redshift-resume-before-delete
