[server] Make leader-only rebalance tasks execute sequentially #3071

Open
swuferhong wants to merge 1 commit into apache:main from swuferhong:rebalance-leader-switch

@swuferhong
Contributor

Purpose

Linked issue: #3070

Previously, leader-only rebalance tasks (where only the leader changes
but the replica set stays the same) were completed immediately after
triggering the leader election, without waiting for the TabletServer
to acknowledge the change. This caused all leader migrations to fire
simultaneously, putting excessive pressure on tablet servers, especially
for KV tables.

This patch removes the immediate finishRebalanceTask() call for
leader-only changes and instead completes the task in
processNotifyLeaderAndIsrResponseReceivedEvent() when the TabletServer
responds. This ensures leader migrations execute one at a time, matching
the behavior of replica migration tasks.
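The sequencing described above can be sketched as a small queue that triggers the next leader election only after the previous one is acknowledged. This is a minimal illustration, not the coordinator's actual code: the class name, the `String` bucket identifiers, and the methods `submit` and `onLeaderChangeAcknowledged` are all hypothetical stand-ins for the rebalance task handling and the response callback.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of sequential leader-only rebalance; not the real coordinator. */
class SequentialLeaderRebalanceSketch {
    // Buckets whose leader still has to be moved, in execution order.
    private final Deque<String> pending = new ArrayDeque<>();
    // Every election that has been triggered so far, in trigger order.
    private final List<String> triggered = new ArrayList<>();
    // The bucket currently waiting for an acknowledgement, or null if idle.
    private String inFlight;

    void submit(List<String> buckets) {
        pending.addAll(buckets);
        triggerNext();
    }

    // Stand-in for the response callback: the task is finished here,
    // not at the moment the election is triggered.
    void onLeaderChangeAcknowledged(String bucket) {
        if (bucket.equals(inFlight)) {
            inFlight = null;
            triggerNext();
        }
    }

    // Starts the next election only when nothing is in flight.
    private void triggerNext() {
        if (inFlight == null && !pending.isEmpty()) {
            inFlight = pending.poll();
            triggered.add(inFlight);
        }
    }

    List<String> triggered() {
        return triggered;
    }

    String inFlight() {
        return inFlight;
    }

    public static void main(String[] args) {
        SequentialLeaderRebalanceSketch sketch = new SequentialLeaderRebalanceSketch();
        sketch.submit(List.of("bucket-0", "bucket-1", "bucket-2"));
        System.out.println(sketch.triggered()); // only the first bucket so far
        sketch.onLeaderChangeAcknowledged("bucket-0");
        System.out.println(sketch.triggered()); // the second bucket follows the ack
    }
}
```

Completing the task at trigger time would correspond to calling `triggerNext()` in a loop inside `submit`, which is the "all migrations fire simultaneously" behavior this patch removes.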

Also fixes TestTabletServerGateway to return per-bucket success results
in NotifyLeaderAndIsr responses, which is necessary for the coordinator
to identify acknowledged buckets in the response callback.
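To illustrate why the test gateway needs per-bucket results, here is a rough sketch under stated assumptions. `BucketResult`, `emptyResponse`, `perBucketResponse`, and `acknowledged` are hypothetical names invented for this example, and filtering on a success flag is an assumption about how the callback might treat results; the real `NotifyLeaderAndIsrResultForBucket` type and `TestTabletServerGateway` may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a fake gateway response and the callback that reads it. */
class PerBucketAckSketch {
    /** Simplified stand-in for a per-bucket result in the response. */
    static final class BucketResult {
        final String bucket;
        final boolean success;

        BucketResult(String bucket, boolean success) {
            this.bucket = bucket;
            this.success = success;
        }
    }

    // A fake gateway that answers without any per-bucket entries.
    static List<BucketResult> emptyResponse(List<String> requestedBuckets) {
        return new ArrayList<>();
    }

    // A fake gateway that reports success for every requested bucket.
    static List<BucketResult> perBucketResponse(List<String> requestedBuckets) {
        List<BucketResult> results = new ArrayList<>();
        for (String bucket : requestedBuckets) {
            results.add(new BucketResult(bucket, true));
        }
        return results;
    }

    // Callback side: only buckets that appear in the response can be
    // matched to a pending rebalance task (success filter is assumed).
    static Set<String> acknowledged(List<BucketResult> results) {
        Set<String> buckets = new LinkedHashSet<>();
        for (BucketResult result : results) {
            if (result.success) {
                buckets.add(result.bucket);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> requested = List.of("bucket-0", "bucket-1");
        System.out.println(acknowledged(emptyResponse(requested)));
        System.out.println(acknowledged(perBucketResponse(requested)));
    }
}
```

With an empty response the callback has nothing to iterate over, so no task would ever complete in tests once completion is tied to the acknowledgement.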

Brief change log

Tests

API and Format

Documentation

@swuferhong swuferhong force-pushed the rebalance-leader-switch branch from eda0fad to e60a784 Compare April 14, 2026 07:39
@swuferhong swuferhong force-pushed the rebalance-leader-switch branch from e60a784 to 6bf8f61 Compare April 14, 2026 08:14
Contributor

@LiebingYu LiebingYu left a comment

@swuferhong Thanks for the PR. I only have minor comments, and none of them block the PR.

// server to acknowledge the leader change before proceeding to the next migration.
for (NotifyLeaderAndIsrResultForBucket notifyLeaderAndIsrResultForBucket :
        notifyLeaderAndIsrResultForBuckets) {
    tryToCompleteRebalanceTask(notifyLeaderAndIsrResultForBucket.getTableBucket());
Contributor

@LiebingYu LiebingYu Apr 14, 2026

Switching to sequential leader transfer, one bucket at a time, introduces a new issue: if the leader transfer fails for any single bucket, it may block all subsequent leader transfers. I think it is acceptable not to address this for now, but users would then have to resolve such blocked rebalances manually. WDYT?
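The blocking scenario raised here can be sketched as follows. This is only an illustration of the concern, not the coordinator's behavior: the class, the `onResponse` callback, and the `cancelInFlight` method (standing in for whatever manual intervention a user would perform) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical sketch of how one failed transfer can stall a sequential queue. */
class StalledTransferSketch {
    // Buckets waiting for their turn.
    private final Deque<String> pending = new ArrayDeque<>();
    // The bucket whose leader transfer is awaiting a successful response.
    private String inFlight;

    StalledTransferSketch(String... buckets) {
        pending.addAll(Arrays.asList(buckets));
        inFlight = pending.poll();
    }

    // A successful response advances the queue; a failed one leaves the
    // same bucket in flight, so nothing behind it can start.
    void onResponse(String bucket, boolean success) {
        if (success && bucket.equals(inFlight)) {
            inFlight = pending.poll();
        }
    }

    // Assumed manual intervention: drop the stuck task and move on.
    void cancelInFlight() {
        inFlight = pending.poll();
    }

    String inFlight() {
        return inFlight;
    }

    int remaining() {
        return pending.size();
    }

    public static void main(String[] args) {
        StalledTransferSketch sketch = new StalledTransferSketch("bucket-0", "bucket-1", "bucket-2");
        sketch.onResponse("bucket-0", false);
        System.out.println(sketch.inFlight() + " still in flight, " + sketch.remaining() + " waiting");
        sketch.cancelInFlight();
        System.out.println(sketch.inFlight() + " now in flight");
    }
}
```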
