Skip to content

Flink: Fix monitor source rate limit for sub-second intervals#16979

Merged
huaxingao merged 1 commit into
apache:mainfrom
mxm:flink-monitor-rate-limit-millis
Jun 28, 2026
Merged

Flink: Fix monitor source rate limit for sub-second intervals#16979
huaxingao merged 1 commit into
apache:mainfrom
mxm:flink-monitor-rate-limit-millis

Conversation

@mxm

@mxm mxm commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

The monitor source rate was computed as perSecond(1.0 / rateLimit.getSeconds()). Duration.getSeconds() rounds any sub-second rate limit down to 0, so the rate became perSecond(Infinity), which turns off rate limiting. This change computes the rate from millis, so sub-second intervals are limited correctly. The 60s default is unchanged.

This made an equality conversion maintenance E2E test flaky (see #16969 (comment)), since they use sub-second rate limits. With no rate limit the source calls table.refresh() in a loop and uses a full CPU core per job. CI runs the tests with -DtestParallelism=auto, which starts one MiniCluster per fork, so these busy sources use up all the cores. The converter runs on a timer, and under that load the timer fires too slowly, letting the test fail after the timeout hits.

We have seen this issue across all Flink versions, so this will have to be backported to 1.20 and 2.0 as well.

The monitor source rate was computed as perSecond(1.0 / rateLimit.getSeconds()).
Duration.getSeconds() rounds any sub-second rate limit down to 0, so the rate
became perSecond(Infinity), which turns off rate limiting. This change computes
the rate from millis, so sub-second intervals are limited correctly. The 60s
default is unchanged.

This made the maintenance E2E tests flaky, since they use sub-second rate
limits.  With no rate limit the source calls table.refresh() in a loop and uses
a full CPU core per job. CI runs the tests with -DtestParallelism=auto, which
starts one MiniCluster per fork, so these busy sources use up all the cores. The
converter runs on a timer, and under that load the timer fires too slowly,
letting the test fail after the timeout hits.
@mxm

mxm commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

CC @huaxiangsun @pvary

@huaxingao huaxingao left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@huaxingao huaxingao merged commit 9ca2a4b into apache:main Jun 28, 2026
36 of 37 checks passed
@huaxingao

Copy link
Copy Markdown
Contributor

Thanks @mxm for the PR! Thanks @Guosmilesmile for the review!

@mxm

mxm commented Jun 28, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @huaxiangsun @Guosmilesmile for reviewing! Thank you for merging @huaxiangsun!

Backport is here: #16992

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants