Spark: Stop streaming queries before dropping table in streaming read test teardown by huan233usc · Pull Request #16976 · apache/iceberg

huan233usc · 2026-06-26T20:40:30Z

Problem

TestStructuredStreamingRead3 is the largest single class in the Spark CI wall-time report (~1000s). Profiling shows the cost is not in the streaming reads (each processAllAvailable call takes <1s); it is in teardown.

The class sets STREAMING_SNAPSHOT_POLLING_INTERVAL_MS=1 for the async=true parameter, so AsyncSparkMicroBatchPlanner's background thread refreshes the table from the catalog ~1000×/second. The class also has two @AfterEach methods — stopStreams() and removeTables() — whose relative order is not guaranteed. When DROP TABLE runs while the planner's background thread is still alive, that flood of catalog refreshes contends with the drop and stalls teardown by ~20s per async test execution.
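To make the polling behaviour concrete, here is a minimal stdlib-only model. It is not Iceberg code: `PollingModel` and its methods are hypothetical names, and the scheduled task only stands in for the planner's background refresh thread. The real AsyncSparkMicroBatchPlanner may be structured differently.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a planner that refreshes a table on a fixed polling interval. */
public class PollingModel implements AutoCloseable {
  // Daemon thread so a forgotten close() cannot keep the JVM alive.
  private final ScheduledExecutorService poller =
      Executors.newSingleThreadScheduledExecutor(
          runnable -> {
            Thread thread = new Thread(runnable, "model-poller");
            thread.setDaemon(true);
            return thread;
          });
  private final AtomicLong refreshes = new AtomicLong();

  public PollingModel(long pollingIntervalMs) {
    // Each tick models one catalog refresh; a real refresh would do catalog I/O here.
    poller.scheduleWithFixedDelay(
        refreshes::incrementAndGet, 0, pollingIntervalMs, TimeUnit.MILLISECONDS);
  }

  public long refreshCount() {
    return refreshes.get();
  }

  /** Stops the poller and waits for it to exit, so no refresh can run after this returns. */
  @Override
  public void close() throws InterruptedException {
    poller.shutdownNow();
    if (!poller.awaitTermination(5, TimeUnit.SECONDS)) {
      throw new IllegalStateException("poller did not terminate");
    }
  }
}
```

With a 1 ms interval the model ticks continuously until `close()` returns. Anything that runs while the poller is alive, such as a table drop, has to share the catalog with that stream of refreshes.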

Measured (instrumenting teardown):

DROP TABLE, async=true  : 20142 ms
DROP TABLE, async=false :     9 ms

Change

Stop active streams before dropping the table in removeTables(), so the background refresh thread is gone before the drop. One-line behavioral change; stopStreams() is unchanged and still runs as its own @AfterEach.
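The ordering fix can be sketched as follows. This is a simplified, stdlib-only model rather than the actual diff: `TeardownSketch` and `Query` are hypothetical stand-ins for the test class and Spark's streaming queries, and calling `stopStreams()` from `removeTables()` is one assumed way to get the ordering described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the teardown; the real test stops Spark queries and runs DROP TABLE. */
public class TeardownSketch {
  /** Stand-in for a running streaming query. */
  public interface Query {
    void stop();
  }

  private final List<Query> activeQueries;
  private final List<String> events = new ArrayList<>();

  public TeardownSketch(List<Query> activeQueries) {
    this.activeQueries = new ArrayList<>(activeQueries);
  }

  /** Still usable as its own teardown hook; a no-op once nothing is active. */
  public void stopStreams() {
    for (Query query : activeQueries) {
      query.stop();
      events.add("stop");
    }
    activeQueries.clear();
  }

  /** Stops streams first, so the result no longer depends on which hook runs first. */
  public void removeTables() {
    stopStreams();
    events.add("drop"); // models DROP TABLE
  }

  public List<String> events() {
    return events;
  }
}
```

Because stopping is idempotent in this model, either hook order yields the same sequence: every query is stopped before the drop, and the second `stopStreams()` call does nothing.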

Result

Full-class TestStructuredStreamingRead3 run on Spark v3.5: ~305s → ~188s (66 tests, 0 failures). The same change is applied to v3.5, v4.0 and v4.1 (v4.0 and v4.1 smoke-tested green).

Note: the underlying 1ms polling interval refreshing the catalog regardless of need is a planner-side smell worth a separate look; this PR just makes the test teardown robust to it.

… test teardown

TestStructuredStreamingRead3 sets STREAMING_SNAPSHOT_POLLING_INTERVAL_MS=1 for the async parameter, so AsyncSparkMicroBatchPlanner's background thread refreshes the table from the catalog ~1000x/second. The class has two @AfterEach methods, stopStreams() and removeTables(), whose relative order is not guaranteed. When DROP TABLE runs while the planner thread is still alive, the flood of catalog refreshes contends with the drop and stalls teardown for ~20s per async test execution.

Stop active streams before dropping the table in removeTables() so the background refresh thread is gone before the drop. The streaming reads themselves were never the bottleneck (each completes in <1s).

Full-class TestStructuredStreamingRead3 on spark v3.5 drops from ~305s to ~188s (66 tests, still green). Applied to v3.5, v4.0 and v4.1.

github-actions Bot added the spark label Jun 26, 2026

huan233usc wants to merge 1 commit into apache:main from huan233usc:spark-streaming-test-polling-storm
