
feat: pause DDL during TiDB version upgrade (smooth upgrade)#6892

Draft
tennix wants to merge 1 commit into pingcap:main from tennix:feat/smooth-upgrade-ddl-pause


tennix commented May 12, 2026

Summary

  • Introduces /upgrade/start and /upgrade/finish lifecycle hooks in the TiDBGroup reconciliation loop, mirroring TiUP's smooth upgrade behavior
  • TaskSmoothUpgradeStart runs before TaskUpdater to pause DDL; TaskSmoothUpgradeFinish runs after TaskStatusRevisionAndReplicas to resume DDL
  • A persistent annotation (tidb.core.pingcap.com/smooth-upgrade-phase: in-progress) tracks in-flight state across operator restarts
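A minimal Go sketch of how the persisted annotation could drive the two tasks across reconciles, including after an operator restart. The annotation key and value come from the PR. The `nextAction` helper, the `Action` type, and the `eligible`/`allUpgraded` inputs are hypothetical and are not the PR's actual code.

```go
package main

import "fmt"

// Annotation key and value as named in the PR description.
const (
	AnnoSmoothUpgradePhase = "tidb.core.pingcap.com/smooth-upgrade-phase"
	PhaseInProgress        = "in-progress"
)

// Action is what the smooth-upgrade tasks should do on this reconcile (hypothetical).
type Action string

const (
	ActionNone   Action = "none"
	ActionStart  Action = "start"  // POST /upgrade/start, then set the annotation
	ActionFinish Action = "finish" // POST /upgrade/finish, then remove the annotation
)

// nextAction decides the step from persisted state only, so it gives the same
// answer after an operator restart.
//   eligible:    spec.version changed and both versions are >= v7.5.0
//   allUpgraded: every TiDB instance already runs the target version
func nextAction(annotations map[string]string, eligible, allUpgraded bool) Action {
	inProgress := annotations[AnnoSmoothUpgradePhase] == PhaseInProgress
	switch {
	case inProgress && allUpgraded:
		return ActionFinish
	case inProgress:
		// DDL is already paused; let the rolling update continue without a second start.
		return ActionNone
	case eligible && !allUpgraded:
		return ActionStart
	default:
		// Scale-only or config-only change, or an unsupported version pair.
		return ActionNone
	}
}

func main() {
	annotations := map[string]string{}
	fmt.Println(nextAction(annotations, true, false)) // start

	// Written only after /upgrade/start has succeeded.
	annotations[AnnoSmoothUpgradePhase] = PhaseInProgress
	fmt.Println(nextAction(annotations, true, false)) // none (e.g. operator restarted mid-rollout)
	fmt.Println(nextAction(annotations, true, true))  // finish
}
```

Because the annotation is written only after /upgrade/start succeeds, a failed call leaves the group on the `start` branch. This keeps the guard that no pod restarts without a successful DDL pause, and a restarted operator still reaches /upgrade/finish once the rollout completes.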

Behavior

The keyspace value from spec.template.spec.keyspace is passed unchanged in the /upgrade/start request body.
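A small sketch of how the keyspace could map onto the request body. The trimmed `TiDBTemplateSpec` type, the `buildUpgradeRequest`/`encodeBody` helpers, and the JSON field name `keyspace` are assumptions for illustration. So is the idea that Dedicated clusters send no keyspace: the commit message only says the pause is global for Dedicated and keyspace-scoped for Premium.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TiDBTemplateSpec is a hypothetical, trimmed-down view of spec.template.spec;
// only the keyspace field mentioned in the PR is modeled.
type TiDBTemplateSpec struct {
	Keyspace string
}

// UpgradeRequest body; the "keyspace" JSON field name is assumed.
type UpgradeRequest struct {
	Keyspace string `json:"keyspace,omitempty"`
}

// buildUpgradeRequest copies the keyspace verbatim. With omitempty, an empty
// keyspace (assumed to be the Dedicated case) encodes as "{}".
func buildUpgradeRequest(spec TiDBTemplateSpec) UpgradeRequest {
	return UpgradeRequest{Keyspace: spec.Keyspace}
}

func encodeBody(spec TiDBTemplateSpec) (string, error) {
	b, err := json.Marshal(buildUpgradeRequest(spec))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	for _, spec := range []TiDBTemplateSpec{{}, {Keyspace: "ks1"}} {
		body, err := encodeBody(spec)
		if err != nil {
			panic(err)
		}
		fmt.Println(body) // {} then {"keyspace":"ks1"}
	}
}
```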

Guards:

  • Only fires when spec.version changes (not scale in/out or config changes)
  • Both source and target versions must be ≥ v7.5.0
  • If /upgrade/start fails, the rolling upgrade is blocked and the call is retried; no pod restarts until DDL has been paused successfully
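The two version guards can be sketched as a plain comparison against v7.5.0. `SupportsSmoothUpgrade` is the name listed in the PR. The hand-rolled parser and the `shouldSmoothUpgrade` helper are assumptions, not the contents of pkg/compatibility/semver.go. Pre-release suffixes are ignored here as a simplification.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCore extracts MAJOR.MINOR.PATCH from strings such as "v7.5.0" or
// "8.1.1-nightly". Pre-release and build suffixes are dropped, which is a
// simplification: "v7.5.0-alpha" is treated as v7.5.0 here.
func parseCore(v string) (major, minor, patch int, ok bool) {
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return 0, 0, 0, false
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return 0, 0, 0, false
		}
		nums[i] = n
	}
	return nums[0], nums[1], nums[2], true
}

// SupportsSmoothUpgrade reports whether version is >= v7.5.0.
func SupportsSmoothUpgrade(version string) bool {
	major, minor, _, ok := parseCore(version)
	if !ok {
		return false
	}
	return major > 7 || (major == 7 && minor >= 5)
}

// shouldSmoothUpgrade applies both guards: spec.version must change (raw string
// comparison) and both source and target must support smooth upgrade.
func shouldSmoothUpgrade(from, to string) bool {
	return from != to && SupportsSmoothUpgrade(from) && SupportsSmoothUpgrade(to)
}

func main() {
	fmt.Println(shouldSmoothUpgrade("v7.5.1", "v8.1.0")) // true
	fmt.Println(shouldSmoothUpgrade("v7.4.2", "v8.1.0")) // false: source too old
	fmt.Println(shouldSmoothUpgrade("v8.1.0", "v8.1.0")) // false: no version change
}
```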

Note: Dedicated deployments on release-1.x (TiDB Operator v1) require a separate implementation and PR on that branch.

Changes

  • api/core/v1alpha1/tidb_types.go: annotation key/value constants
  • pkg/tidbapi/v1/types.go: UpgradeRequest struct
  • pkg/tidbapi/v1/client.go: UpgradeStart and UpgradeFinish on the TiDBClient interface
  • pkg/compatibility/semver.go: SupportsSmoothUpgrade(version) bool
  • pkg/controllers/tidbgroup/tasks/upgrade.go: new TaskSmoothUpgradeStart and TaskSmoothUpgradeFinish
  • pkg/controllers/tidbgroup/builder.go: wire the new tasks into the reconciliation loop

Test plan

  • make test passes (unit tests for new tasks, HTTP client methods, semver helper)
  • Version upgrade v7.5.x → v8.x: verify the annotation is set before the first pod restarts and removed after the last pod is upgraded
  • Scale-only change: verify annotation is never set
  • Upgrade from v7.4.x: verify smooth upgrade is skipped
  • Premium keyspace upgrade: verify the /upgrade/start request body contains the keyspace name

🤖 Generated with Claude Code


ti-chi-bot Bot commented May 12, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


ti-chi-bot Bot commented May 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign wizardxiao for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions github-actions Bot added the v2 for operator v2 label May 12, 2026
@ti-chi-bot ti-chi-bot Bot added the size/XXL label May 12, 2026

codecov-commenter commented May 12, 2026

Codecov Report

❌ Patch coverage is 76.40449% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.77%. Comparing base (2b81667) to head (dd778a7).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6892      +/-   ##
==========================================
+ Coverage   37.44%   37.77%   +0.33%     
==========================================
  Files         392      393       +1     
  Lines       22432    22572     +140     
==========================================
+ Hits         8399     8526     +127     
- Misses      14033    14046      +13     
Flag Coverage Δ
unittest 37.77% <76.40%> (+0.33%) ⬆️

Flags with carried forward coverage won't be shown.


Introduce /upgrade/start and /upgrade/finish lifecycle hooks in the
TiDBGroup reconciliation loop, mirroring TiUP's smooth upgrade behavior.

- Before rolling upgrade begins, call POST /upgrade/start on a healthy
  TiDB instance to pause DDL (global for Dedicated, keyspace-scoped for
  Premium / TiDB Worker).
- After all instances reach the new version, call POST /upgrade/finish
  to resume DDL.
- Guards: only fires when spec.version changes AND both source and target
  versions support smooth upgrade (>= v7.5.0); no-op for scale/config
  changes.
- Annotation tidb.core.pingcap.com/smooth-upgrade-phase tracks in-flight
  state across operator restarts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
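The commit message sends both calls to a single healthy TiDB instance and resumes DDL only after every instance reaches the new version. Below is a minimal sketch of those two checks. The `Instance` type and the `pickHealthy`/`allAtVersion` helpers are hypothetical, and the choice of the first healthy instance is an assumption rather than the PR's selection rule.

```go
package main

import "fmt"

// Instance is a hypothetical summary of one TiDB pod as the reconciler sees it.
type Instance struct {
	Name    string
	Healthy bool
	Version string
}

// pickHealthy returns the first healthy instance to send /upgrade/start or
// /upgrade/finish to; ok is false when none is healthy, so the task should retry.
func pickHealthy(insts []Instance) (Instance, bool) {
	for _, in := range insts {
		if in.Healthy {
			return in, true
		}
	}
	return Instance{}, false
}

// allAtVersion reports whether every instance runs target. An empty list
// returns false so DDL is not resumed before any instance has been observed.
func allAtVersion(insts []Instance, target string) bool {
	if len(insts) == 0 {
		return false
	}
	for _, in := range insts {
		if in.Version != target {
			return false
		}
	}
	return true
}

func main() {
	insts := []Instance{
		{Name: "tidb-0", Healthy: false, Version: "v7.5.1"},
		{Name: "tidb-1", Healthy: true, Version: "v7.5.1"},
	}
	if in, ok := pickHealthy(insts); ok {
		fmt.Println("call /upgrade/start on", in.Name) // tidb-1
	}
	fmt.Println(allAtVersion(insts, "v8.1.0")) // false until both pods are replaced
}
```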
tennix force-pushed the feat/smooth-upgrade-ddl-pause branch from 8229389 to dd778a7 May 12, 2026 23:33