Move-tables. Multi-table by zacharysierakowski · Pull Request #1726 · github/gh-ost

zacharysierakowski · 2026-06-23T15:52:51Z

re: https://github.com/github/database-infrastructure/issues/8205

Multi-table `--move-tables`

Extends feature-move-tables to move N tables in one run (--move-tables=a,b,c): one binlog stream, per-table interleaved row copy, a single per-run crash-safe checkpoint, and one atomic multi-table RENAME at cutover. A single table is just N=1 — no "representative" table, no index-0 special-casing.

Key decisions: per-table state containers (MoveTable) instead of a representative table; run-wide artifacts named from a stable hash of the table set (_gho_<token>_ghk); one checkpoint table with a hex-serialized per-table key range; GTID-based drain + atomic RENAME TABLE t1 TO …, t2 TO …. Single-table-only paths fail fast (panic/error) in move-tables mode.

Read in this order:

go/base/context.go — MoveTable + InitMoveTableContainers, OrderedMoveTables, MoveTablesRunToken, checkpoint/_del naming (start here)
applier.go — per-table query builders + buildDMLEventQuery routing; target-table create/validate
migrator.go: iterateChunksMoveTables — interleaved per-table copy
checkpoint.go + applier Write/ReadMoveTableCheckpoints + migrator resume — checkpoint redesign
migrator.go: moveTablesCutOver / drainMoveTablesCutOver / resume…FromCheckpoint — atomic cutover
The cross-cutting guards (context/inspect/applier/migrator) — the "no representative table" invariant
hooks / server / throttler — multi-table awareness
tests (localtests/move-tables/*, *_test.go) + script/move-tables/*

Validated: unit (checkpoint serialize/run-token), integration (1/2/3-table idle + DML, resume-after-crash ×3, atomic-cutover), and manual e2e on 2 clusters / 3–4 tables with live writes.

…sh of ordered table names for consistent identifier

…r things like "GetTargetTableName" in move-tables mode

…tance checks

zacharysierakowski · 2026-06-25T17:04:01Z

 // or a given table name
 func (mctx *MigrationContext) GetGhostTableName() string {
+	if mctx.IsMoveTablesMode() {
+		panic("GetGhostTableName() must not be called in move-tables mode; there is no ghost table (the target keeps each migrated table's name)")


I chose to add panics to function calls that I felt were dangerous enough if they accidentally were called by a codepath from --move-tables. Happy to discuss alternatives, or ditching the panics altogether, but they gave me confidence while making the changes across this PR.

One thought - we could swap the raw panics for failpoint panics and only enable them for integration tests if we wanted to get fancier while removing intentional panics from the prod binary. But I don't know if that gains us that much.

zacharysierakowski · 2026-06-25T17:07:20Z

+// It is used to name run-wide singular artifacts (checkpoint table, applier
+// advisory lock, serve socket) so they never depend on any single migrated
+// table name. Returns "" outside move-tables mode.
+func (mctx *MigrationContext) MoveTablesRunToken() string {


This is the func that determines the unique identifier per run based on the table names. More details in the comment above but just wanted to flag it directly with a comment.

A non-start alternative could have been concating all the table names, but that would make for a really long identifier in most cases.

Another alternative could have been to require ForceTmpTableName in --move-tables mode, so that the operator chooses what the identifier for the table is. We'd potentially need to extend that a bit for other callsites like the server sock file, etc.

zacharysierakowski · 2026-06-25T17:09:24Z

@@ -809,12 +959,6 @@ func (apl *Applier) CreateCheckpointTable() error {
 		"`gh_ost_dml_applied` bigint",
 		"`gh_ost_is_cutover` tinyint(1) DEFAULT '0'",
 	}
-	if apl.migrationContext.IsMoveTablesMode() {
-		colDefs = append(colDefs,
-			"`gh_ost_move_tables_cutover_started` tinyint(1) DEFAULT '0'",
-			"`gh_ost_move_tables_drain_gtid` text charset ascii",
-		)
-	}
 	for _, col := range apl.migrationContext.UniqueKey.Columns.Columns() {


we had originally made these conditional changes to CreateCheckpointTable in our feature branch for single move. These removals just get it back to baseline (main) since we have our own CreateCheckpointTable variant for move-tables now createMoveTablesCheckpointTable

zacharysierakowski · 2026-06-25T17:11:18Z

all the funky hex serialization stuff lives in this diff.

alternative could have been to create a checkpoint table per table, but there is shared state across the checkpoints that's valuable to keep in a single table. This way, on crash-resume when mid cutover, we can easily just query the last checkpoint row.

zacharysierakowski · 2026-06-25T17:11:40Z

no major changes here, just some guards against function calls we don't want called in move-tables mode (panic)

zacharysierakowski · 2026-06-25T17:14:18Z

+	var replicationLagQuery string
+	if !thlr.migrationContext.IsMoveTablesMode() {
+		replicationLagQuery = fmt.Sprintf(`
+			select value from %s.%s where hint = 'heartbeat' and id <= 255
+			`,
+			sql.EscapeName(thlr.migrationContext.DatabaseName),
+			sql.EscapeName(thlr.migrationContext.GetChangelogTableName()),
+		)
+	}


These lines just build a query that's only run in NOT move tables mode (below). However, building the query here calls GetChangelogTableName which is now guarded with the panic. So this just wraps the query building in the move tables check

zacharysierakowski · 2026-06-25T17:15:03Z

@chriskirkland and I talked about making some changes to the existing integration tests for the crash/resume so that they cover multi-table as well.

Copilot

Pull request overview

Extends --move-tables from single-table to true multi-table operation (one run handling N tables) by introducing per-table runtime containers, run-wide artifact naming via a stable run token, per-table interleaved row copy + DML routing, redesigned checkpointing, and an atomic multi-table cutover RENAME TABLE.

Changes:

Introduces per-table MoveTable containers and run-token-based naming for run-wide artifacts (checkpoint table, socket, lock identity).
Refactors migrator/applier/inspector to inspect/copy/apply/checkpoint multiple tables (interleaved copy, table-routed DML, multi-row checkpoint model, atomic multi-table cutover).
Adds/updates localtest fixtures and the script/move-tables manual harness to seed/reset/drive workload across 1–3 canonical tables, plus new atomic-cutover and resume validation scenarios.

Show a summary per file

File	Description
script/move-tables/setup	Switches manual harness seeding to the canonical 3-table fixture.
script/move-tables/reset	Resets all canonical tables and drops run-token checkpoint tables on target.
script/move-tables/README.md	Updates manual workflow docs for 1/2/3-table runs, reset, and CI scenarios.
script/move-tables/insert-source-primary-loop	Drives DML across whichever canonical fixtures exist (1–3 tables).
localtests/move-tables/resume-panic-on-row-copy/test.sh	Finds run-token checkpoint table dynamically for resume validation.
localtests/move-tables/resume-panic-before-drain-complete/test.sh	Finds run-token checkpoint table dynamically for drain/cutover resume validation.
localtests/move-tables/multiple-two/tables.txt	Adds 2-table move-tables fixture table list.
localtests/move-tables/multiple-two/create.sql	Adds 2-table heterogeneous schema fixture.
localtests/move-tables/multiple-three/tables.txt	Adds 3-table move-tables fixture table list.
localtests/move-tables/multiple-three/create.sql	Adds canonical 3-table heterogeneous schema fixture.
localtests/move-tables/multiple-three-concurrent-writes/tables.txt	Adds 3-table concurrent-writes fixture table list.
localtests/move-tables/multiple-three-concurrent-writes/on_test.sh	Starts DML loop during 3-table copy to validate DML capture.
localtests/move-tables/multiple-three-concurrent-writes/create.sql	Adds canonical 3-table concurrent-writes SQL fixture.
localtests/move-tables/atomic-multi-table-cutover/test.sh	New deterministic atomic-cutover workload + invariant validation.
localtests/move-tables/atomic-multi-table-cutover/tables.txt	Adds 2-table list for atomic-cutover test fixture.
localtests/move-tables/atomic-multi-table-cutover/create.sql	Adds paired-schema fixture for atomic-cutover validation.
go/logic/throttler.go	Avoids changelog-heartbeat lag query in move-tables mode; uses replica status lag.
go/logic/server.go	Updates interactive command table-arg matching/messages for multi-table migrations.
go/logic/migrator.go	Implements multi-table inspection, interleaved copy loop, resume logic, atomic multi-table cutover rename, and cleanup.
go/logic/migrator_move_tables_cutover_test.go	Adapts cutover tests for multi-table token/table-list expectations and cleanup API rename.
go/logic/migrator_move_tables_cleanup_test.go	Updates cleanup tests for run-token checkpoint naming and multi-table drop method.
go/logic/inspect.go	Adds move-tables-safe per-table validation/inspection/counting and removes representative-table calls.
go/logic/hooks.go	Exposes multi-table context via hook env vars (comma-joined tables + per-table rollback handles).
go/logic/checkpoint.go	Adds per-table checkpoint row support and serialized heterogeneous key-range encoding helpers.
go/logic/checkpoint_test.go	Unit tests for range serialization/deserialization and empty-range detection.
go/logic/applier.go	Adds per-table query builders, move-tables checkpoint table schema + read/write, and per-table DML routing.
go/logic/applier_test.go	Updates/extends tests for move-tables checkpointing, copy, and DML routing via per-table containers.
go/cmd/gh-ost/main.go	Allows multi-table `--move-tables`, rejects empty/duplicate names, and uses run-token socket naming.
go/base/context.go	Adds `MoveTable` container model, deterministic ordered iteration, run-token generation, and move-tables-specific naming helpers.
go/base/context_test.go	Tests run-token determinism/order-independence and `MoveTableDelName` behavior.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 30/30 changed files
Comments generated: 4

+	now := time.Now().Unix()
+	for _, chk := range rows {
+		coordStr := ""
+		if chk.LastTrxCoords != nil {
+			coordStr = chk.LastTrxCoords.String()
+		}
+		args := sqlutils.Args(
+			now,
+			chk.TableName,
+			coordStr,
+			chk.Iteration,
+			chk.RowsCopied,
+			chk.DMLApplied,
+			chk.IsCutover,
+			chk.MoveTablesCutOverStarted,
+			apl.checkpointDrainGTIDString(chk),
+			serializeRangeValues(chk.IterationRangeMin),
+			serializeRangeValues(chk.IterationRangeMax),
+		)
+		if _, err := apl.checkpointDB().Exec(query, args...); err != nil {
+			return err
+		}
+	}
+	return nil


 	if mgtr.sourcePrimaryDB == nil {
 		return errors.New("source primary connection not initialized; cannot drop source __del table")
 	}


+func (isp *Inspector) selectUniqueKey(tableName string, candidateKeys []*sql.UniqueKey) *sql.UniqueKey {
 	for i, candidateKey := range candidateKeys {
-		isp.applyColumnTypes(isp.migrationContext.DatabaseName, isp.originalTableName(), &candidateKey.Columns)
+		isp.applyColumnTypes(isp.migrationContext.DatabaseName, tableName, &candidateKey.Columns)
 		uniqueKeyIsValid := true
 		for _, column := range candidateKey.Columns.Columns() {


chriskirkland

Looks great! No major concerns (for anything introduced here). Just some questions/comments that are non-blocking

chriskirkland · 2026-06-25T22:02:19Z

+	// One advisory lock per run. In move-tables mode it is keyed on the
+	// set-derived run token (not any single table) so two processes moving the
+	// same set of tables collide, while a single-table run keeps its table-keyed


// One advisory lock per run. In move-tables mode it is keyed on the // set-derived run token (not any single table) so two processes moving the // same set of tables collide, while a single-table run keeps its table-keyed

On it's face, I think this implies that intersecting move-table sets could be migrated at the same time. I think the CREATEs for the overlapping tables on the target would break for all but the first run (assuming they use the same database/schema). Not sure if this is a realizable concern at present... is there anything else preventing that kind of conflict?

Entirely possible this is covered by one of the design docs.

Great callout! I should have mentioned this as an implication to the choice here.

But yea at the very start it checks each table to ensure they aren't created on the target. I felt like that was enough to protect us.

Simple alt would be to create multiple locks if we want to be extra safe?

Yeah, I think we don't have to overcomplicate. Mostly wanted to make sure I understand the protections here.

chriskirkland · 2026-06-26T02:13:48Z

-	mgtr.migrationContext.SharedColumns = mgtr.migrationContext.OriginalTableColumns
-	mgtr.migrationContext.MappedSharedColumns = mgtr.migrationContext.OriginalTableColumns
+	// Aggregate the row estimate across all tables for overall progress reporting.
+	atomic.StoreInt64(&mgtr.migrationContext.RowsEstimate, totalRowsEstimate)


I think there's an existing bug here. If a table has primary key of type UNSIGNED BIGINT, it would need uint64 to avoid overflow here. That's not super likely to happen with single table, but more likely with multi table.

I don't think we'll be in the business of moving > 2^64 rows in a single cohort, but we might want to file an issue or something 🤷🏼

cc @danieljoos

I think we'll never reach more than 2^63 rows even across multiple tables.
Not even sure if that's possible at all in InnoDB.

hmmm. I can change RowsEstimate to a uint64 if we are concerned enough. The usage of that field seems minimal so probably low risk / easy effort.

danieljoos · 2026-06-26T08:45:15Z

+// accessor that has no meaning in move-tables mode (every table is handled
+// through its own MoveTable container), so calling it there is a programmer
+// error and panics to fail fast rather than silently operate on the wrong table.
 func (apl *Applier) originalTableName() string {


Same here, the originalTableName() func was introduced by our move-table changes. Maybe we can just get rid of it again.

I don't see as much harm in leaving this one here. There's a lot of callsites to it from our original PR that introduces it for single-table move. And if i remove it, it means all those callsites switch to apl.migrationContext.OriginalTableName directly and we lose our guard against calling accidentally calling OriginalTableName in move-tables mode

danieljoos · 2026-06-26T08:51:20Z

@@ -131,7 +137,7 @@ func (isp *Inspector) InspectOriginalTable() (err error) {

 func (isp *Inspector) originalTableName() string {


and another one here: the originalTableName() method has been introduced with move-tables.
Whole func could removed, imo.

same comment as here

danieljoos · 2026-06-26T09:16:10Z

-	mgtr.migrationContext.SharedColumns = mgtr.migrationContext.OriginalTableColumns
-	mgtr.migrationContext.MappedSharedColumns = mgtr.migrationContext.OriginalTableColumns
+	// Aggregate the row estimate across all tables for overall progress reporting.
+	atomic.StoreInt64(&mgtr.migrationContext.RowsEstimate, totalRowsEstimate)


I think we'll never reach more than 2^63 rows even across multiple tables.
Not even sure if that's possible at all in InnoDB.

zacharysierakowski added 10 commits June 22, 2026 13:39

wip

d6ab6c8

Merge branch 'feature-move-tables' into multitable-spike

43d298b

stash

7778d74

tidy printMigrationStatusHint

f9ad349

three tables

aadb5ee

Merge branch 'lag-measurement-mods' into multitable-spike

7e18a4e

Fix TableNames[0] refs and change checkpoint to row per table. Add ha…

d4ab1bf

…sh of ordered table names for consistent identifier

update GH_OST_TABLE_NAME

a85e5f6

misc

79394c2

strong guards (some panics) against any calls to .OriginalTableName o…

dd8b581

…r things like "GetTargetTableName" in move-tables mode

zacharysierakowski added the feature-move-tables PRs that are associated with the new move-tables feature label Jun 23, 2026

zacharysierakowski changed the title ~~Multitable~~ Move-tables. Multi-table Jun 23, 2026

zacharysierakowski added 5 commits June 23, 2026 16:10

remove dead code

a4fe2d2

guard read/write checkpoint funcs

43db172

fix row closure lint

42ce923

more guards on non-move-table specific funcs. fix up front table exis…

b133029

…tance checks

Merge branch 'chriskirkland/issues-8260-pt2' into multitable-spike

c6ca7ef

zacharysierakowski changed the base branch from feature-move-tables to chriskirkland/issues-8260-pt2 June 23, 2026 18:27

zacharysierakowski added 11 commits June 23, 2026 18:29

Merge branch 'chriskirkland/issues-8260-pt2' into multitable-spike

ec1671d

fix existing test case that queried the checkpoint table

a52ebf7

merge conflict

19002c0

rename func to be more clear for move-tables multi table drop

dc1d943

applier_test fixes

d38bf67

test coverage fix, mising movetables.tableNames

56a6c4a

failpoint merge conflict

6a3055a

Merge branch 'chriskirkland/issues-8260-pt2' into multitable-spike

d378a17

fixed callsite of move-tables-panic-after-row-copy

cdebc6c

naming things

dd6d32a

dynamic lookup _ghk

b203557

zacharysierakowski self-assigned this Jun 23, 2026

zacharysierakowski added 3 commits June 23, 2026 20:16

couple straggling issues from e2e testing

b68a807

Merge branch 'chriskirkland/issues-8260-pt2' into multitable-spike

cb1f359

Merge branch 'feature-move-tables' into multitable-spike

a3a8f09

zacharysierakowski changed the base branch from chriskirkland/issues-8260-pt2 to feature-move-tables June 24, 2026 15:03

zacharysierakowski added 2 commits June 24, 2026 15:57

added some unit tests for checkpoint

9e752b8

integration tests

69e1110

zacharysierakowski commented Jun 25, 2026

View reviewed changes

zacharysierakowski requested review from chriskirkland and danieljoos June 25, 2026 17:15

zacharysierakowski marked this pull request as ready for review June 25, 2026 17:15

zacharysierakowski requested review from meiji163 and timvaillancourt as code owners June 25, 2026 17:15

Copilot AI review requested due to automatic review settings June 25, 2026 17:15

Copilot started reviewing on behalf of zacharysierakowski June 25, 2026 17:17 View session

Copilot AI reviewed Jun 25, 2026

View reviewed changes

chriskirkland approved these changes Jun 26, 2026

View reviewed changes

danieljoos approved these changes Jun 26, 2026

View reviewed changes

get rid of GetTargetTableName

8738524

chriskirkland approved these changes Jun 26, 2026

View reviewed changes

zacharysierakowski merged commit a6d682a into feature-move-tables Jun 26, 2026
9 checks passed

		@@ -131,7 +137,7 @@ func (isp *Inspector) InspectOriginalTable() (err error) {

		func (isp *Inspector) originalTableName() string {

Uh oh!

Conversation

zacharysierakowski commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Multi-table --move-tables

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Uh oh!

chriskirkland left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

zacharysierakowski commented Jun 23, 2026 •

edited

Loading

Multi-table `--move-tables`