Spark: Stop streaming queries before dropping table in streaming read test teardown#16976
Draft
huan233usc wants to merge 1 commit into
Draft
Spark: Stop streaming queries before dropping table in streaming read test teardown#16976huan233usc wants to merge 1 commit into
huan233usc wants to merge 1 commit into
Conversation
… test teardown TestStructuredStreamingRead3 sets STREAMING_SNAPSHOT_POLLING_INTERVAL_MS=1 for the async parameter, so AsyncSparkMicroBatchPlanner's background thread refreshes the table from the catalog ~1000x/second. The class has two @AfterEach methods, stopStreams() and removeTables(), whose relative order is not guaranteed. When DROP TABLE runs while the planner thread is still alive, the flood of catalog refreshes contends with the drop and stalls teardown for ~20s per async test execution. Stop active streams before dropping the table in removeTables() so the background refresh thread is gone before the drop. The streaming reads themselves were never the bottleneck (each completes in <1s). Full-class TestStructuredStreamingRead3 on spark v3.5 drops from ~305s to ~188s (66 tests, still green). Applied to v3.5, v4.0 and v4.1.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
TestStructuredStreamingRead3is the largest single class in the Spark CI wall-time report (~1000s). Profiling shows the cost is not in the streaming reads (eachprocessAllAvailableis <1s) — it's in teardown.The class sets
STREAMING_SNAPSHOT_POLLING_INTERVAL_MS=1for theasync=trueparameter, soAsyncSparkMicroBatchPlanner's background thread refreshes the table from the catalog ~1000×/second. The class also has two@AfterEachmethods —stopStreams()andremoveTables()— whose relative order is not guaranteed. WhenDROP TABLEruns while the planner's background thread is still alive, that flood of catalog refreshes contends with the drop and stalls teardown by ~20s per async test execution.Measured (instrumenting teardown):
Change
Stop active streams before dropping the table in
removeTables(), so the background refresh thread is gone before the drop. One-line behavioral change;stopStreams()is unchanged and still runs as its own@AfterEach.Result
Full-class
TestStructuredStreamingRead3on spark v3.5: ~305s → ~188s (66 tests, 0 failures). Applied identically to v3.5, v4.0 and v4.1 (v4.0/v4.1 smoke-tested green).