[aws2] Fix flaky SqsIOWriteBatchesTest timeout tests (#38946) #38971

Open
tkaymak wants to merge 1 commit into apache:master from tkaymak:sqs-writebatches-flake-fix

Conversation

@tkaymak tkaymak commented Jun 15, 2026

Contributor

What

Fixes the flaky SqsIOWriteBatchesTest (#38946).

Why

The four timeout-related tests:

  • testWriteBatchesWithTimeout
  • testWriteBatchesWithStrictTimeout
  • testWriteBatchesToDynamicWithTimeout
  • testWriteBatchesToDynamicWithStrictTimeout

asserted the exact grouping of messages into SendMessageBatch calls.
Those groupings depend on wall-clock timing: the per-message Thread.sleep delay races the configured withBatchTimeout. On loaded CI runners the batches therefore form differently, and the strict verify(sqs).sendMessageBatch(request(exact entries)) checks fail with Mockito's ArgumentsAreDifferent.
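
To illustrate the race, here is a minimal, simplified model. It is not SqsIO's actual batching code: the class name, the flush rule (flush on append once the batch is at least the timeout old), and all timing values are assumptions chosen for illustration. It shows how stretching a single sleep from 100 ms to 160 ms changes the grouping even though the same messages are sent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of timeout-driven batching (not SqsIO's implementation).
final class BatchTimingSketch {

  // Groups messages by arrival time: when a message arrives at least timeoutMs
  // after the first message of the open batch, that batch is flushed first.
  static List<List<String>> group(long[] arrivalMs, long timeoutMs) {
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    long batchStart = 0;
    for (int i = 0; i < arrivalMs.length; i++) {
      if (!current.isEmpty() && arrivalMs[i] - batchStart >= timeoutMs) {
        batches.add(current);
        current = new ArrayList<>();
      }
      if (current.isEmpty()) {
        batchStart = arrivalMs[i];
      }
      current.add("msg" + i);
    }
    if (!current.isEmpty()) {
      batches.add(current); // final flush at end of input
    }
    return batches;
  }

  public static void main(String[] args) {
    long timeoutMs = 150; // assumed batch timeout
    long[] idleRunner = {0, 100, 200, 300, 400}; // every sleep takes exactly 100 ms
    long[] loadedRunner = {0, 160, 260, 420, 520}; // some sleeps overshoot to 160 ms
    System.out.println(group(idleRunner, timeoutMs));
    // [[msg0, msg1], [msg2, msg3], [msg4]]
    System.out.println(group(loadedRunner, timeoutMs));
    // [[msg0], [msg1, msg2], [msg3, msg4]]
  }
}
```

Both runs send the same five messages in three batches, but an assertion on the exact entries of each batch passes only for the first run.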

How

Replace the exact-grouping assertions with timing-independent invariants:

  • every expected message body is sent exactly once (captured via ArgumentCaptor, grouped by queue),
  • no batch exceeds the size implied by the timeout cadence,
  • at least the minimum number of batches is produced (verify(sqs, atLeast(n))).

This still exercises timeout-driven flushing (both the synchronous on-append path and the strict separate-thread variant) without depending on exact wall-clock behavior. The non-timeout tests are unchanged.
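
The three invariants above can be sketched roughly as follows. This is an illustrative stand-in, not the code in this PR: the actual tests capture SendMessageBatchRequest objects with Mockito's ArgumentCaptor, whereas this sketch uses plain lists of message bodies so it runs without dependencies. The class name, method name, and parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of timing-independent batch assertions.
final class BatchInvariantsSketch {

  // Throws AssertionError unless the captured batches satisfy all three invariants.
  static void assertInvariants(
      List<List<String>> capturedBatches,
      List<String> expectedBodies,
      int maxBatchSize,
      int minBatches) {
    List<String> sent = new ArrayList<>();
    for (List<String> batch : capturedBatches) {
      // Invariant 2: no batch exceeds the size implied by the timeout cadence.
      if (batch.size() > maxBatchSize) {
        throw new AssertionError("batch too large: " + batch);
      }
      sent.addAll(batch);
    }
    // Invariant 1: every expected body is sent exactly once, in any grouping.
    // Comparing sorted copies detects both missing and duplicated bodies.
    List<String> expected = new ArrayList<>(expectedBodies);
    Collections.sort(sent);
    Collections.sort(expected);
    if (!sent.equals(expected)) {
      throw new AssertionError("sent " + sent + " but expected " + expected);
    }
    // Invariant 3: at least the minimum number of batches was produced.
    if (capturedBatches.size() < minBatches) {
      throw new AssertionError("only " + capturedBatches.size() + " batches");
    }
  }

  public static void main(String[] args) {
    List<String> expected = Arrays.asList("msg0", "msg1", "msg2", "msg3", "msg4");
    // Two different groupings of the same messages both pass.
    assertInvariants(
        Arrays.asList(
            Arrays.asList("msg0", "msg1"),
            Arrays.asList("msg2", "msg3"),
            Arrays.asList("msg4")),
        expected, 2, 3);
    assertInvariants(
        Arrays.asList(
            Arrays.asList("msg0"),
            Arrays.asList("msg1", "msg2"),
            Arrays.asList("msg3", "msg4")),
        expected, 2, 3);
    System.out.println("both groupings satisfy the invariants");
  }
}
```

The check still fails if a message is dropped, duplicated, or if timeout-driven flushing stops happening (too few batches or an oversized batch), which is what the timeout tests are meant to guard.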

Testing

Ran the class 6 times under deliberate 4-core CPU saturation (load avg ~12) with no failures; previously the timeout tests would flake under load.

Fixes #38946

@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses flaky tests in SqsIOWriteBatchesTest caused by reliance on wall-clock timing for batch grouping assertions. By shifting from strict order-based verification to invariant-based verification, the tests are now resilient to varying execution speeds on loaded CI runners while still effectively validating the timeout-driven flushing logic.

Highlights

  • Test Stability: Replaced brittle, timing-dependent Mockito assertions in SqsIOWriteBatchesTest with robust, timing-independent invariants to prevent flakiness in CI environments.
  • Assertion Logic: Introduced helper methods captureBatchRequests and assertMessageBodies to verify that all expected messages are sent exactly once and that batch sizes remain within expected limits, regardless of exact execution timing.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request refactors several tests in SqsIOWriteBatchesTest.java to use timing-independent invariants instead of strict timing assertions, which helps prevent test flakiness on loaded machines. The feedback focuses on resolving potential Checker Framework nullness warnings by properly wrapping nullable objects (such as the results of Map.get and the @Nullable parameter in assertMessageBodies) with checkNotNull before dereferencing or passing them to assertions.
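
As a rough illustration of the nullness feedback, the pattern looks like the sketch below. It uses java.util.Objects.requireNonNull as a standard-library stand-in for the checkNotNull the review mentions; the class name, method name, and queue names are hypothetical and not taken from the test.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: guard a nullable Map.get result before dereferencing it.
final class NullnessSketch {

  static int bodyCountForQueue(Map<String, List<String>> bodiesByQueue, String queue) {
    // Map.get returns null for a missing key, so a nullness checker flags a direct
    // dereference. Failing fast here yields a non-null value and a clear message.
    List<String> bodies =
        Objects.requireNonNull(
            bodiesByQueue.get(queue), "no batches captured for queue " + queue);
    return bodies.size();
  }

  public static void main(String[] args) {
    Map<String, List<String>> bodiesByQueue = new HashMap<>();
    bodiesByQueue.put("queue-a", Arrays.asList("msg0", "msg1"));
    System.out.println(bodyCountForQueue(bodiesByQueue, "queue-a")); // 2
  }
}
```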

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

The four timeout-related tests asserted the exact grouping of messages into
SendMessageBatch calls. Those groupings depend on wall-clock timing (the
per-message Thread.sleep delay racing the configured batch timeout), so on
loaded CI runners batches form differently and the strict
verify(...).sendMessageBatch(request(exact entries)) checks fail with
Mockito ArgumentsAreDifferent.

Rewrite the assertions to verify timing-independent invariants instead:
all expected message bodies are sent exactly once, no batch exceeds the
size implied by the timeout cadence, and at least the minimum number of
batches is produced. This still exercises the timeout-driven flushing
(both synchronous and the strict separate-thread variant) without
depending on exact wall-clock behavior.

Fixes apache#38946
@tkaymak tkaymak force-pushed the sqs-writebatches-flake-fix branch from 456be77 to 62a4c6d Compare June 15, 2026 19:47
@github-actions

Contributor

Assigning reviewers:

R: @Abacn for label java.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Development

Successfully merging this pull request may close these issues.

[Failing Test]: SqsIOWriteBatchesTest is flaky (Mockito ArgumentsAreDifferent in timeout-related tests)
