Flink: Fix monitor source rate limit for sub-second intervals#16979
Merged
Conversation
The monitor source rate was computed as perSecond(1.0 / rateLimit.getSeconds()). Duration.getSeconds() rounds any sub-second rate limit down to 0, so the rate became perSecond(Infinity), which turns off rate limiting. This change computes the rate from millis, so sub-second intervals are limited correctly. The 60s default is unchanged. This made the maintenance E2E tests flaky, since they use sub-second rate limits. With no rate limit the source calls table.refresh() in a loop and uses a full CPU core per job. CI runs the tests with -DtestParallelism=auto, which starts one MiniCluster per fork, so these busy sources use up all the cores. The converter runs on a timer, and under that load the timer fires too slowly, letting the test fail after the timeout hits.
Contributor
Author
Guosmilesmile
approved these changes
Jun 28, 2026
Contributor
|
Thanks @mxm for the PR! Thanks @Guosmilesmile for the review! |
Contributor
Author
|
Thanks @huaxiangsun @Guosmilesmile for reviewing! Thank you for merging @huaxiangsun! Backport is here: #16992 |
singhpk234
pushed a commit
that referenced
this pull request
Jun 28, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The monitor source rate was computed as
perSecond(1.0 / rateLimit.getSeconds()).Duration.getSeconds()rounds any sub-second rate limit down to 0, so the rate became perSecond(Infinity), which turns off rate limiting. This change computes the rate from millis, so sub-second intervals are limited correctly. The 60s default is unchanged.This made an equality conversion maintenance E2E test flaky (see #16969 (comment)), since they use sub-second rate limits. With no rate limit the source calls table.refresh() in a loop and uses a full CPU core per job. CI runs the tests with
-DtestParallelism=auto, which starts one MiniCluster per fork, so these busy sources use up all the cores. The converter runs on a timer, and under that load the timer fires too slowly, letting the test fail after the timeout hits.We have seen this issue across all Flink versions, so this will have to be backported to 1.20 and 2.0 as well.