
Spark: Stop streaming queries before dropping table in streaming read test teardown#16976

Draft
huan233usc wants to merge 1 commit into apache:main from huan233usc:spark-streaming-test-polling-storm


@huan233usc

Contributor

Problem

TestStructuredStreamingRead3 is the largest single class in the Spark CI wall-time report (~1000s). Profiling shows the cost is not in the streaming reads (each processAllAvailable call takes <1s); it is in teardown.

The class sets STREAMING_SNAPSHOT_POLLING_INTERVAL_MS=1 for the async=true parameter, so AsyncSparkMicroBatchPlanner's background thread refreshes the table from the catalog ~1000×/second. The class also has two @AfterEach methods — stopStreams() and removeTables() — whose relative order is not guaranteed. When DROP TABLE runs while the planner's background thread is still alive, that flood of catalog refreshes contends with the drop and stalls teardown by ~20s per async test execution.

Measured (instrumenting teardown):

DROP TABLE, async=true  : 20142 ms
DROP TABLE, async=false :     9 ms
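
The timings above came from instrumenting teardown. A minimal sketch of that kind of wall-clock wrapper follows. The class name TeardownTimer, the timeMillis helper, and the simulated drop are illustrative assumptions, not code from this PR.

```java
import java.util.concurrent.TimeUnit;

class TeardownTimer {
  /** Runs the given step and returns its wall-clock duration in milliseconds. */
  static long timeMillis(Runnable step) {
    long start = System.nanoTime();
    step.run();
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) {
    // Hypothetical stand-in for the real DROP TABLE call in the test teardown.
    Runnable simulatedDrop = () -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    System.out.printf("DROP TABLE (simulated): %d ms%n", timeMillis(simulatedDrop));
  }
}
```

In the real test class, the Runnable would wrap the actual drop statement, and the result would be logged once per parameter value so async=true and async=false can be compared.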

Change

Stop active streams before dropping the table in removeTables(), so the background refresh thread is gone before the drop. One-line behavioral change; stopStreams() is unchanged and still runs as its own @AfterEach.
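
The ordering idea can be sketched without Spark or JUnit. The example below is a stdlib-only simulation under stated assumptions: PollingStream is a hypothetical stand-in for a streaming query whose planner polls the catalog, and removeTables() only marks where the real DROP TABLE would run. It is not the code in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class StopBeforeDropSketch {
  /** Hypothetical stream with a background thread that "refreshes" on a fixed interval. */
  static final class PollingStream {
    private final ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();
    final AtomicLong refreshes = new AtomicLong();

    PollingStream(long intervalMs) {
      poller.scheduleAtFixedRate(refreshes::incrementAndGet, 0, intervalMs, TimeUnit.MILLISECONDS);
    }

    /** Stops the poller and waits for its thread to finish. */
    void stop() {
      poller.shutdownNow();
      try {
        poller.awaitTermination(5, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }

    boolean isActive() {
      return !poller.isTerminated();
    }
  }

  static final List<PollingStream> activeStreams = new CopyOnWriteArrayList<>();

  /** Safe to call more than once: a second call finds no active streams. */
  static void stopStreams() {
    for (PollingStream stream : activeStreams) {
      stream.stop();
    }
    activeStreams.clear();
  }

  /** Stops streams first, then "drops"; returns how many pollers were still alive at drop time. */
  static int removeTables() {
    List<PollingStream> snapshot = new ArrayList<>(activeStreams);
    stopStreams();
    int stillPolling = 0;
    for (PollingStream stream : snapshot) {
      if (stream.isActive()) {
        stillPolling++;
      }
    }
    // The real teardown would issue DROP TABLE here, with no poller left running.
    return stillPolling;
  }

  public static void main(String[] args) {
    activeStreams.add(new PollingStream(1));
    System.out.println("pollers alive at drop: " + removeTables());
  }
}
```

Because stopStreams() is idempotent in this sketch, it does not matter whether the separate stopStreams() teardown runs before or after removeTables(); the drop never overlaps a live poller in either order.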

Result

Full-class TestStructuredStreamingRead3 on Spark v3.5: ~305s → ~188s (66 tests, 0 failures). The same change is applied to v3.5, v4.0 and v4.1 (v4.0/v4.1 smoke-tested green).

Note: the underlying 1ms polling interval refreshing the catalog regardless of need is a planner-side smell worth a separate look; this PR just makes the test teardown robust to it.

… test teardown

TestStructuredStreamingRead3 sets STREAMING_SNAPSHOT_POLLING_INTERVAL_MS=1
for the async parameter, so AsyncSparkMicroBatchPlanner's background thread
refreshes the table from the catalog ~1000x/second. The class has two
@AfterEach methods, stopStreams() and removeTables(), whose relative order
is not guaranteed. When DROP TABLE runs while the planner thread is still
alive, the flood of catalog refreshes contends with the drop and stalls
teardown for ~20s per async test execution.

Stop active streams before dropping the table in removeTables() so the
background refresh thread is gone before the drop. The streaming reads
themselves were never the bottleneck (each completes in <1s).

Full-class TestStructuredStreamingRead3 on spark v3.5 drops from ~305s to
~188s (66 tests, still green). Applied to v3.5, v4.0 and v4.1.
@github-actions github-actions Bot added the spark label Jun 26, 2026