[fix][test] Fix flaky BacklogQuotaManagerTest.createNamespaces (HTTP 409 cascade)#25680

Open
lhotari wants to merge 1 commit into apache:master from lhotari:lh-fix-flaky-BacklogQuotaManagerTest

lhotari commented May 5, 2026

Motivation

When CI runs BacklogQuotaManagerTest, two cases fail together as a cascade:

  1. BacklogQuotaManagerTest.clearNamespaces (the @AfterMethod) times out while force-deleting prop/ns-quota and fails with org.awaitility.core.ConditionTimeoutException: "Condition ... was not fulfilled within 10 seconds."
  2. The next test method's @BeforeMethod createNamespaces then fails with PulsarAdminException$ConflictException: Namespace already exists (HTTP 409), because the prior cleanup never finished. The same 409 then repeats for every subsequent test.

Example: https://github.com/apache/pulsar/actions/runs/25387367872/job/74455105884 (PR #25676 attempt 1; the failure is unrelated to that PR's change).

Root cause: PR #25624 added testConsumerBacklogEvictionSizeQuotaCleansPendingAcks and testConsumerBacklogEvictionTimeQuotaNotPreciseCleansPendingAcks, which leave Key_Shared consumers with unacked messages on prop/ns-quota and an active backlog-quota check loop running every 2s. Force-deleting that state can exceed the 10s default Awaitility budget used by MockedPulsarServiceBaseTest.deleteNamespaceWithRetry. The afterMethod times out, leftover metadata remains, and the next beforeMethod hits 409.

Modifications

  1. MockedPulsarServiceBaseTest.deleteNamespaceWithRetry now sets atMost(20, TimeUnit.SECONDS) instead of relying on Awaitility's 10s default. Doubling the budget gives enough headroom for the heaviest cleanups while staying conservative for the many tests that share this helper.
  2. BacklogQuotaManagerTest.createNamespaces is split: a new createNamespaceForTest(String) helper catches PulsarAdminException.ConflictException and recovers by force-deleting the leftover namespace and recreating it. This is defense in depth: even if cleanup ever exceeds the 20s budget, the next test starts with clean state instead of cascading 409s.
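
For readers without the diff open, the two modifications can be sketched roughly as follows. This is not the code in this PR: it is a dependency-free illustration in which `NamespaceAdmin`, `FakeAdmin`, `awaitAtMost`, and the local `ConflictException` are hypothetical stand-ins for `PulsarAdmin`, Awaitility, and `PulsarAdminException.ConflictException`, and the method signatures are assumptions made so the example runs on its own.

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.Set;
import java.util.function.BooleanSupplier;

// Illustration only: every type here is a hypothetical stand-in, not Pulsar's or Awaitility's API.
class NamespaceSetupSketch {

    /** Models the HTTP 409 "Namespace already exists" failure. */
    static class ConflictException extends RuntimeException {
        ConflictException(String message) { super(message); }
    }

    /** The two admin operations the test setup and cleanup need. */
    interface NamespaceAdmin {
        void createNamespace(String ns);      // throws ConflictException if ns already exists
        boolean tryForceDelete(String ns);    // returns true once the namespace is gone
    }

    /** Polls the condition until it holds or the explicit budget elapses. */
    static void awaitAtMost(Duration budget, Duration pollInterval, BooleanSupplier condition) {
        long deadline = System.nanoTime() + budget.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Condition was not fulfilled within " + budget);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    /** Modification 1: retry the force delete under an explicit budget instead of a default. */
    static void deleteNamespaceWithRetry(NamespaceAdmin admin, String ns, Duration budget) {
        awaitAtMost(budget, Duration.ofMillis(10), () -> admin.tryForceDelete(ns));
    }

    /** Modification 2: on a conflict, remove the leftover namespace and create it again. */
    static void createNamespaceForTest(NamespaceAdmin admin, String ns, Duration budget) {
        try {
            admin.createNamespace(ns);
        } catch (ConflictException e) {
            deleteNamespaceWithRetry(admin, ns, budget);
            admin.createNamespace(ns);
        }
    }

    /** In-memory admin whose force delete only succeeds after a fixed number of attempts. */
    static class FakeAdmin implements NamespaceAdmin {
        final Set<String> namespaces = new HashSet<>();
        final int deleteAttemptsNeeded;
        int deleteAttempts = 0;

        FakeAdmin(int deleteAttemptsNeeded) { this.deleteAttemptsNeeded = deleteAttemptsNeeded; }

        @Override
        public void createNamespace(String ns) {
            if (!namespaces.add(ns)) {
                throw new ConflictException("Namespace already exists: " + ns);
            }
        }

        @Override
        public boolean tryForceDelete(String ns) {
            deleteAttempts++;
            if (deleteAttempts < deleteAttemptsNeeded) {
                return false; // cleanup still in progress
            }
            namespaces.remove(ns);
            return true;
        }
    }

    public static void main(String[] args) {
        FakeAdmin admin = new FakeAdmin(3);
        admin.namespaces.add("prop/ns-quota"); // leftover from a cleanup that timed out
        createNamespaceForTest(admin, "prop/ns-quota", Duration.ofSeconds(20));
        // prints "namespace present: true"
        System.out.println("namespace present: " + admin.namespaces.contains("prop/ns-quota"));
    }
}
```

In the sketch, the conflict path reuses the same bounded delete as the cleanup path, so a stuck delete still surfaces as a timeout rather than hanging the setup.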

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as BacklogQuotaManagerTest.testConsumerBacklogEvictionSizeQuotaCleansPendingAcks and BacklogQuotaManagerTest.testConsumerBacklogEvictionTimeQuotaNotPreciseCleansPendingAcks (the two heavy tests that exposed the cascade). Verified locally with @Test(invocationCount = 10) on both: 20/20 runs passed, and every clearNamespaces cycle completed within the new 20s budget.

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

…409 cascade)

Two `BacklogQuotaManagerTest` cases consistently fail together: an
`@AfterMethod` `clearNamespaces` times out at 10s while force-deleting
`prop/ns-quota`, leaving the namespace in metadata; the next test's
`@BeforeMethod` `createNamespaces` then hits HTTP 409 ("Namespace already
exists") and the cascade continues for every subsequent test.

Root cause: PR apache#25624 added two pending-acks tests that leave Key_Shared
consumers with unacked messages on `prop/ns-quota` and an active
backlog-quota check loop running every 2s. Force-deleting that state can
exceed the 10s default `Awaitility` budget used by the shared
`MockedPulsarServiceBaseTest.deleteNamespaceWithRetry`. The afterMethod
times out, leftover metadata remains, and the next beforeMethod hits 409.

Fix:
1. `MockedPulsarServiceBaseTest.deleteNamespaceWithRetry` now uses
   `atMost(20, SECONDS)` instead of relying on the Awaitility 10s
   default. 2× headroom is enough for the heaviest cleanups in this
   class while staying conservative for the many tests that share this
   helper.
2. `BacklogQuotaManagerTest.createNamespaces` is split into a helper
   that catches `ConflictException` and recovers by force-deleting the
   leftover and recreating, so a slow prior cleanup cannot cascade as a
   409 — defense in depth on top of (1).

Verified locally with `@Test(invocationCount = 10)` on the two new
pending-acks tests: 20/20 passes, all `clearNamespaces` cycles within
the new 20s budget.
lhotari force-pushed the lh-fix-flaky-BacklogQuotaManagerTest branch from 60c77ed to 6f792e5 on May 5, 2026 22:02