[fix][test] Fix flaky BacklogQuotaManagerTest.createNamespaces (HTTP 409 cascade) #25680
Open
lhotari wants to merge 1 commit into apache:master
…409 cascade)
Two `BacklogQuotaManagerTest` cases consistently fail together: an
`@AfterMethod` `clearNamespaces` times out at 10s while force-deleting
`prop/ns-quota`, leaving the namespace in metadata; the next test's
`@BeforeMethod` `createNamespaces` then hits HTTP 409 ("Namespace already
exists") and the cascade continues for every subsequent test.
Root cause: PR apache#25624 added two pending-acks tests that leave Key_Shared
consumers with unacked messages on `prop/ns-quota` and an active
backlog-quota check loop running every 2s. Force-deleting that state can
exceed the 10s default `Awaitility` budget used by the shared
`MockedPulsarServiceBaseTest.deleteNamespaceWithRetry`. The afterMethod
times out, leftover metadata remains, and the next beforeMethod hits 409.
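The timeout mechanism above can be illustrated without Awaitility as a plain deadline loop. This is only a sketch: `retryUntil` and `slowCleanup` are hypothetical names, not the code in `MockedPulsarServiceBaseTest`, and the millisecond values are illustrative stand-ins for the real 10s and 20s budgets.

```java
import java.util.function.BooleanSupplier;

public class DeadlineRetrySketch {

    /** Polls the condition until it holds or the budget is spent; true means it held in time. */
    public static boolean retryUntil(long atMostMillis, long pollMillis, BooleanSupplier condition) {
        long deadline = System.nanoTime() + atMostMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                // Awaitility would throw ConditionTimeoutException at this point.
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Simulated force-delete that only reports completion after neededMillis (assumption for illustration). */
    public static BooleanSupplier slowCleanup(long neededMillis) {
        long start = System.nanoTime();
        return () -> System.nanoTime() - start >= neededMillis * 1_000_000L;
    }

    public static void main(String[] args) {
        // A cleanup that needs 400 ms does not fit a 100 ms budget...
        boolean tight = retryUntil(100, 10, slowCleanup(400));
        // ...but the same cleanup fits a roomier 2000 ms budget.
        boolean roomy = retryUntil(2000, 10, slowCleanup(400));
        System.out.println("tight budget finished in time: " + tight);
        System.out.println("roomy budget finished in time: " + roomy);
    }
}
```

When the tight budget expires, the simulated namespace is still "being deleted", which corresponds to the leftover metadata that the next `@BeforeMethod` trips over.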
Fix:
1. `MockedPulsarServiceBaseTest.deleteNamespaceWithRetry` now uses
`atMost(20, SECONDS)` instead of relying on the Awaitility 10s
default. 2× headroom is enough for the heaviest cleanups in this
class while staying conservative for the many tests that share this
helper.
2. `BacklogQuotaManagerTest.createNamespaces` is split into a helper
that catches `ConflictException` and recovers by force-deleting the
leftover and recreating, so a slow prior cleanup cannot cascade as a
409 — defense in depth on top of (1).
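The recovery path in (2) might look roughly like the sketch below. It is not the PR's code: `FakeNamespaces` is a hypothetical in-memory stand-in for the admin client, and the local `ConflictException` is an unchecked stand-in for `PulsarAdminException.ConflictException` (which is a checked exception in Pulsar). The return value is added here only to make the behavior observable.

```java
import java.util.HashSet;
import java.util.Set;

public class ConflictRecoverySketch {

    /** Unchecked stand-in for PulsarAdminException.ConflictException (HTTP 409). */
    public static class ConflictException extends RuntimeException {
        public ConflictException(String message) {
            super(message);
        }
    }

    /** Hypothetical in-memory stand-in for the namespaces admin API. */
    public static class FakeNamespaces {
        private final Set<String> existing = new HashSet<>();

        public void createNamespace(String namespace) {
            if (!existing.add(namespace)) {
                throw new ConflictException("Namespace already exists");
            }
        }

        public void deleteNamespace(String namespace, boolean force) {
            existing.remove(namespace);
        }

        public boolean exists(String namespace) {
            return existing.contains(namespace);
        }
    }

    /**
     * Creates the namespace; on a conflict, force-deletes the leftover and recreates it.
     * Returns true when the recovery path was taken.
     */
    public static boolean createNamespaceForTest(FakeNamespaces admin, String namespace) {
        try {
            admin.createNamespace(namespace);
            return false;
        } catch (ConflictException e) {
            // Leftover from a prior cleanup that did not finish: remove it and start clean.
            admin.deleteNamespace(namespace, true);
            admin.createNamespace(namespace);
            return true;
        }
    }

    public static void main(String[] args) {
        FakeNamespaces admin = new FakeNamespaces();
        System.out.println(createNamespaceForTest(admin, "prop/ns-quota")); // false: clean start
        System.out.println(createNamespaceForTest(admin, "prop/ns-quota")); // true: leftover recovered
    }
}
```

Catching only the conflict case keeps every other admin failure visible to the test, so the helper hides the cascade without masking unrelated errors.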
Verified locally with `@Test(invocationCount = 10)` on the two new
pending-acks tests: 20/20 passes, all `clearNamespaces` cycles within
the new 20s budget.
60c77ed to 6f792e5
merlimat approved these changes May 5, 2026
dao-jun approved these changes May 6, 2026
Motivation
When CI runs `BacklogQuotaManagerTest`, two cases fail together as a cascade:
1. `BacklogQuotaManagerTest.clearNamespaces` (the `@AfterMethod`) fails with
`org.awaitility.core.ConditionTimeoutException: Condition ... was not fulfilled within 10 seconds.`
while force-deleting `prop/ns-quota`.
2. `@BeforeMethod` `createNamespaces` then fails with
`PulsarAdminException$ConflictException: Namespace already exists` (HTTP 409),
because the prior cleanup never finished, and the same 409 keeps cascading for
every subsequent test.
Example: https://github.com/apache/pulsar/actions/runs/25387367872/job/74455105884 (PR #25676 attempt 1; the failure is unrelated to that PR's change).
Root cause: PR #25624 added `testConsumerBacklogEvictionSizeQuotaCleansPendingAcks`
and `testConsumerBacklogEvictionTimeQuotaNotPreciseCleansPendingAcks`, which leave
Key_Shared consumers with unacked messages on `prop/ns-quota` and an active
backlog-quota check loop running every 2s. Force-deleting that state can exceed
the 10s default `Awaitility` budget used by
`MockedPulsarServiceBaseTest.deleteNamespaceWithRetry`. The afterMethod times out,
leftover metadata remains, and the next beforeMethod hits 409.
Modifications
1. `MockedPulsarServiceBaseTest.deleteNamespaceWithRetry` now sets
`atMost(20, TimeUnit.SECONDS)` instead of relying on Awaitility's 10s default.
2× headroom is enough for the heaviest cleanups while staying conservative for
the many tests that share this helper.
2. `BacklogQuotaManagerTest.createNamespaces` is split: a new
`createNamespaceForTest(String)` helper catches
`PulsarAdminException.ConflictException` and recovers by force-deleting the
leftover and recreating. This is defense in depth: even if cleanup ever exceeds
the 20s budget, the next test starts with clean state instead of cascading 409s.
Verifying this change
This change is already covered by existing tests, such as
`BacklogQuotaManagerTest.testConsumerBacklogEvictionSizeQuotaCleansPendingAcks` and
`BacklogQuotaManagerTest.testConsumerBacklogEvictionTimeQuotaNotPreciseCleansPendingAcks`
(the two heavy tests that exposed the cascade). Verified locally with
`@Test(invocationCount = 10)` on both: 20/20 passes, all `clearNamespaces` cycles
within the new 20s budget.
Does this pull request potentially affect one of the following parts: