acc: Always destroy deployed bundles on test exit in cloud runs by chrisst · Pull Request #5585 · databricks/cli

chrisst · 2026-06-12T19:45:53Z

What

Adds a harness-level guarantee that bundles deployed by acceptance tests are destroyed when the test exits, for cloud runs (CLOUD_ENV set):

- runTest registers a t.Cleanup before the script starts (covering failures, require aborts, and script timeouts), capturing a clone of the exact env the script ran with;
- the cleanup walks the test temp dir for .databricks/bundle/<target> state dirs and runs $CLI bundle destroy --auto-approve --target <target> per bundle root (10-minute cap per destroy);
- destroy errors are logged via t.Logf only and never fail the test; double-destroy is harmless ("No active deployment found to destroy!" exits 0);
- local/testserver runs: complete no-op.

Why

Acceptance scripts run under bash -e, and script.cleanup fragments are appended after the main body — they never execute when a script fails or times out between bundle deploy and bundle destroy. Against shared cloud test workspaces this leaks real resources: during the 2026-06-11/12 incident one shared GCP workspace had accumulated 100+ leaked test warehouses and dozens of leaked test-bundle-pipeline-* pipelines, exhausting the project's local-SSD quota and blocking terraform-provider CI for ~2 days (ref ES-1974228).

Cleanup output cannot pollute golden files: it runs after output comparison and goes only to the test log.

Known limitation: destroy is best-effort — a bundle deployed with required --var flags or a config corrupted mid-test may still fail to destroy; this is logged as a leak warning rather than failing the run.

Tests

go build ./..., go vet ./acceptance pass; local deploy+destroy acceptance tests (bundle/resources/sql_warehouses, bundle/resources/pipelines/recreate-keys across all engine variants) pass with no output regressions.

This pull request and its description were written by Isaac.

When acceptance tests run against real cloud workspaces (CLOUD_ENV set), a test that fails, times out, or exits mid-script never reaches its own 'bundle destroy' step: scripts run under 'bash -e' and the merged script.cleanup parts are skipped on failure. The deployed resources (SQL warehouses, pipelines, jobs, ...) then leak in the shared test workspaces. Leaked started warehouses recently exhausted a GCP quota and took CI down for two days; we observed 100+ leaked warehouses and dozens of leaked test pipelines in a single workspace.

This adds a harness-level safety net: on cloud runs, runTest registers a t.Cleanup (before starting the script, so it also covers timeouts and mid-test failures) that scans the test's temp dir for bundle state directories (<bundle_root>/.databricks/bundle/<target>) and runs '$CLI bundle destroy --auto-approve --target <target>' in each bundle root, reusing the exact environment the script ran with.

The mechanism is deliberately best effort and invisible to test output:

- It is a no-op for local testserver runs (gated on CLOUD_ENV).
- It runs after output comparison and logs only via t.Logf, so cleanup output is never compared against expected out files.
- Double-destroy is harmless: 'bundle destroy' on an already-destroyed bundle exits 0 with 'No active deployment found to destroy!'. In the common success path the shared script.cleanup already removed .databricks, so nothing is even attempted.
- Destroy failures are logged but never fail the test.

Co-authored-by: Isaac

github-actions · 2026-06-12T20:12:34Z

Waiting for approval

Based on git history, these people are best suited to review:

@denik -- recent work in acceptance/

Eligible reviewers: @andrewnester, @anton-107, @pietern, @renaudhartert-db, @shreyas-goenka, @simonfaltum

Suggestions based on git history. See OWNERS for ownership rules.

eng-dev-ecosystem-bot · 2026-06-12T20:43:51Z

Integration test report

Commit: 9aa5344

Run: 27794839349

	Env	💚RECOVERED	🙈SKIP	✅pass	🙈skip	Time
💚	aws linux	7	13	264	1011	5:12
💚	aws windows	7	13	266	1009	7:39
💚	aws-ucws linux	7	13	360	925	5:38
💚	aws-ucws windows	7	13	362	923	8:06
💚	azure linux	1	15	267	1009	5:02
💚	azure windows	1	15	269	1007	6:44
💚	azure-ucws linux	1	15	365	921	6:11
💚	azure-ucws windows	1	15	367	919	8:05
💚	gcp linux	1	15	263	1012	5:57
💚	gcp windows	1	15	265	1010	8:08

20 interesting tests: 13 SKIP, 7 RECOVERED

	Test Name	aws linux	aws windows	aws-ucws linux	aws-ucws windows	azure linux	azure windows	azure-ucws linux	azure-ucws windows	gcp linux	gcp windows
💚	TestAccept	💚R	💚R	💚R	💚R	💚R	💚R	💚R	💚R	💚R	💚R
🙈	TestAccept/bundle/invariant/no_drift	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/permissions	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions	💚R	💚R	💚R	💚R	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct	💚R	💚R	💚R	💚R
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform	💚R	💚R	💚R	💚R
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions	💚R	💚R	💚R	💚R	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct	💚R	💚R	💚R	💚R
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform	💚R	💚R	💚R	💚R
🙈	TestAccept/bundle/resources/postgres_branches/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/recreate	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/replace_existing	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/update_protected	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/without_branch_id	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_endpoints/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_projects/update_display_name	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/synced_database_tables/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/ssh/connection	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S

Top 20 slowest tests (at least 2 minutes):

duration	env	testname
4:53	gcp linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:35	gcp windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:29	gcp linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:25	gcp windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:22	azure-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:22	aws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:21	aws-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:02	aws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:54	aws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:52	aws-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:49	azure-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:48	azure linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:47	aws-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:47	azure windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:37	aws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:33	azure-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:33	aws-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:27	azure linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:27	azure windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:20	azure-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

shreyas-goenka

Can you expand on the motivation for this change? We do have cleanup scripts in eng-dev-ecosystem that clean up all resources and file trees in our test workspaces every night. That automatically captures orphaned resources.

Are we seeing more resource exhausted errors or running into limits?

Tangentially related: @pietern is also looking into creating separate workspaces for DABs vs PAE to help with the test load.

shreyas-goenka · 2026-06-23T09:50:41Z

+	}
+}
+
+func destroyBundle(t *testing.T, cliPath, bundleRoot, target string, env []string) {


A lot of acceptance tests do not create a bundle. Most (all?) also clean up after themselves. It's not clear having this is a net benefit.

chrisst · 2026-06-23T10:02:04Z

> Can you expand on the motivation for this change? We do have cleanup scripts in eng-dev-ecosystem that clean up all resources and file trees in our test workspaces every night. That automatically captures orphaned resources.
>
> Are we seeing more resource exhausted errors or running into limits?
>
> Tangentially related: @pietern is also looking into creating separate workspaces for DABs vs PAE to help with the test load.

Resources are currently leaking so fast that they are causing daily quota failures, and I've been struggling lately to get PRs merged because quotas keep being hit. IMO a nightly cleanup is nice but an imperfect solution. If we clean up right after the test run, resources stop building up during the day and are less likely to cause quota failures. It's also better from a cost management perspective. Cleaning up with high certainty right after a test suite runs is the gold standard.

Re splitting DABs and PAE: that's nice but tangential. We should still aim to clean up resources as soon as they are irrelevant, and the nightly cleanup should serve as a good janitor.

Given all that, I'm not sure of the best way to do targeted cleanups. This was just an attempt to use DABs to do the cleanup when DABs was the one doing the creation. I've got some other ideas for cleaning up per run, but this felt like a cheap and easy way to attempt a targeted cleanup.

chrisst · 2026-06-24T08:23:52Z

I'll be handling this another layer up to account for any timeout-induced resource leaks.

chrisst temporarily deployed to test-trigger-is June 12, 2026 19:46 — with GitHub Actions Inactive

acc: Address lint findings and tighten comments

2aadb70

Use context.WithoutCancel(t.Context()) instead of context.Background() (t.Context() is already canceled when cleanups run), make the best-effort nilerr skip explicit, and trim narration comments. Co-authored-by: Isaac

chrisst temporarily deployed to test-trigger-is June 12, 2026 20:08 — with GitHub Actions Inactive

chrisst requested a review from pietern June 12, 2026 20:11

chrisst marked this pull request as ready for review June 12, 2026 20:12

Merge branch 'main' into chris.stephens/fix4-deferred-teardown

ed2ccb2

chrisst temporarily deployed to test-trigger-is June 17, 2026 19:00 — with GitHub Actions Inactive

chrisst requested a review from shreyas-goenka June 18, 2026 23:06

chrisst enabled auto-merge June 18, 2026 23:07

Merge branch 'main' into chris.stephens/fix4-deferred-teardown

9aa5344

chrisst temporarily deployed to test-trigger-is June 18, 2026 23:07 — with GitHub Actions Inactive

shreyas-goenka reviewed Jun 23, 2026

chrisst closed this Jun 24, 2026

auto-merge was automatically disabled June 24, 2026 08:23
Pull request was closed

chrisst wants to merge 4 commits into main from chris.stephens/fix4-deferred-teardown
