From 454ab5a2ba8143cea5e63a339f8ea702a1e0b09b Mon Sep 17 00:00:00 2001
From: nscuro
Date: Thu, 4 Jun 2026 14:40:40 +0200
Subject: [PATCH] Update v4 migration docs

Signed-off-by: nscuro
---
 .../administration/migrating-from-v4.md       | 73 ++++++++++++++++---
 docs/tutorials/rehearsing-the-v4-migration.md | 34 ++++++---
 2 files changed, 88 insertions(+), 19 deletions(-)

diff --git a/docs/guides/administration/migrating-from-v4.md b/docs/guides/administration/migrating-from-v4.md
index f1db4b0c..c118125e 100644
--- a/docs/guides/administration/migrating-from-v4.md
+++ b/docs/guides/administration/migrating-from-v4.md
@@ -161,9 +161,12 @@ v4-migrator verify \
 `verify` reads but never writes.
 On a freshly bootstrapped target with no staging present, expect:
 
-- Schema version `202605111028` reported `OK`.
+- Flyway head `202605111028` reported `OK`.
 - All row count columns **except for the v5 `PERMISSION` table** zero.
 - No probe entries.
+- The terminator line `== verify complete ==`.
+
+At this stage the migration has not moved any data, so `verify` shows the seeded baseline with nothing to flag.
 
 ### 3. Dry run
 
@@ -239,23 +242,44 @@ It now reports source / staging / v5 row counts per table and surfaces every pro
 
 ```text
 [Schema]
-  OK Schema version = 202605111028
+  OK Flyway head = 202605111028
 
 [Row counts]
-  Table          Source   Staging  v5
+  Table          Source   Staging  v5       Note
   LICENSE        5234     5234     5234
-  TEAM           47       42       42
+  TEAM           47       42       42       expected: dedup by NAME (-5)
+  COMPONENT      4823714  4823711  4823711  see [Probes] (-3)
+  PROJECTS_TAGS  12480    12462    12462    reduction (-18), see migration guide
   ...
 
 [Probes]
-  LICENSE 3 malformed UUID(s) dropped
+  COMPONENT 3 malformed UUID(s) dropped
 
 [Constraints]
   18 CHECK constraint(s) hold across 6 loaded table(s)
+
+== verify complete ==
 ```
 
-Investigate any mismatch where you do not expect dedup or skipping.
-The lossy changes section below lists the expected sources of mismatch.
+The `Note` column annotates tables whose row count drops from source to v5.
+Row counts often decrease during migration because of deduplication, filtering, and retention.
+The migrator makes these reductions intentionally. The `Note` (and the `[Probes]` section) tells you which
+reason applies, so you can confirm what accounts for each drop rather than leaving it unexplained.
+Each note is one of three states:
+
+- `expected: (-N)`. A known, intentional reduction. The reason maps to a transform in the
+  [Lossy and non-obvious changes](#lossy-and-non-obvious-changes) section.
+- `see [Probes] (-N)`. The `[Probes]` section itemizes the drop (invalid UUIDs, skipped users, case collisions).
+- `reduction (-N), see migration guide`. An intentional reduction without a dedicated note.
+  Consult the [Lossy and non-obvious changes](#lossy-and-non-obvious-changes) section to understand it.
+
+!!! warning "Permission join tables are not a data-loss signal"
+
+    `TEAMS_PERMISSIONS` and `USERS_PERMISSIONS` both lose and gain rows as part of the v4-to-v5 permission remap
+    (see [Portfolio access control bypass](#portfolio-access-control-bypass)), so their row-count delta carries no meaning.
+    Disregard any `Note` on these two tables.
+
+The final `== verify complete ==` line marks a clean exit. Its absence means `verify` stopped before finishing.
 
 ### 7. Drop staging
 
@@ -451,7 +475,7 @@ INFO -> LICENSE: 5231 rows in 27 ms (193740 rows/s)
 ```
 
 The load phase additionally emits a heartbeat every 5 seconds while a single table is still running,
-so large tables stay visible even when the underlying `INSERT … SELECT` has not yet committed:
+so large tables stay visible:
 
 ```text
 INFO Loading COMPONENT into v5
@@ -460,11 +484,40 @@ INFO .. COMPONENT: still loading after 30s (expected 4823714 rows)
 INFO -> COMPONENT: 4823714 rows in 32104 ms (150253 rows/s)
 ```
 
-The expected count comes from the staging `tgt_*` row count and is an upper bound
-(deduped, malformed-UUID-dropped rows may reduce the final number).
+The expected count is an upper bound. The final count can be lower because of deduplication and dropped rows.
 
 A 100 GB v4 dataset should complete in a few hours on a workstation-class disk, less on dedicated server hardware.
 
+### Phase completion lines
+
+Each phase prints an explicit completion line on success, after its per-table lines:
+
+```text
+INFO Extract phase completed: 64 table(s), 5821044 row(s) in 184302 ms
+INFO Transform phase completed: 64 table(s), 5820102 row(s) in 96118 ms
+INFO Load phase completed: 64 table(s), 5820097 row(s) in 211740 ms
+```
+
+The `run` command, which chains extract, transform, and load, prints one final line:
+
+```text
+INFO Migration completed: extract + transform + load finished. Run 'verify' to review row counts and probes.
+```
+
+A completed phase prints its completion line. For `run`, the `Migration completed` line confirms success.
+**Absence of that line after the per-table output means the phase did not finish.** Check the exit code and the preceding log lines.
+
+After the last table loads, the migrator emits progress lines while it finalizes:
+
+```text
+INFO Finalizing load: re-enabling triggers and resetting identity sequences
+INFO Analyzing 64 loaded table(s)
+INFO Refreshing PORTFOLIOMETRICS_GLOBAL materialized view
+INFO Applying v5.7.0 cleanup deletes
+```
+
+The `Load phase completed` line follows once finalization finishes.
+
 ## Resumability
 
 Each phase persists per-table state in the `migration_state` table inside the staging schema.
diff --git a/docs/tutorials/rehearsing-the-v4-migration.md b/docs/tutorials/rehearsing-the-v4-migration.md
index bd120a49..f3e86f1a 100644
--- a/docs/tutorials/rehearsing-the-v4-migration.md
+++ b/docs/tutorials/rehearsing-the-v4-migration.md
@@ -163,11 +163,13 @@ v4-migrator verify \
 Expected output:
 
 ```text linenums="1"
+== v4-migrator verify ==
+
 [Schema]
-  OK Schema version = 202605111028
+  OK Flyway head = 202605111028
 
 [Row counts]
-  Table    Source  Staging  v5
+  Table    Source  Staging  v5  Note
   (no source configured)
   LICENSE  -       0        0
   TEAM     -       0        0
@@ -178,9 +180,11 @@ Expected output:
 
 [Constraints]
   13 CHECK constraint(s) hold across 55 loaded table(s)
+
+== verify complete ==
 ```
 
-If the schema version is anything else or any row count is non-zero, **except in the `PERMISSION` table**,
+If the Flyway head differs or any row count is non-zero, **except in the `PERMISSION` table**,
 the rest of the rehearsal will not work.
 
 ## Dry-running the migration
@@ -248,16 +252,18 @@ We drop `--dry-run` and run the real extract, transform, and load:
 ```
 
 Runtime depends on the size of our v4 dataset.
-The migrator prints per-table progress and a heartbeat every five seconds for long-running tables:
+The migrator prints per-table progress and a heartbeat every five seconds for long-running tables, each phase prints a completion line, and `run` ends with a `Migration completed` line that confirms success:
 
 ```text linenums="1"
 MetricsRetention - Metrics retention set to 30 days (cutoff = 2026-04-15T11:38:10.636499285Z)
 ExtractPhase - Extracting LICENSE
 ExtractPhase - -> 811 rows in 395 ms
 ...
+ExtractPhase - Extract phase completed: 64 table(s), 5821044 row(s) in 184302 ms
 TransformPhase - Transforming LICENSE
 TransformPhase - -> 811 rows in 60 ms
 ...
+TransformPhase - Transform phase completed: 64 table(s), 5820102 row(s) in 96118 ms
 LoadPhase - Pre-creating metrics partitions for 32 day(s) from 2026-04-15 to 2026-05-16
 LoadPhase - Loading LICENSE into v5
 LoadProgressReporter - -> LICENSE: 811 rows in 56 ms (14482 rows/s)
@@ -267,6 +273,12 @@ LoadProgressReporter - .. VULNERABLESOFTWARE_VULNERABILITIES: still loading af
 LoadProgressReporter - .. VULNERABLESOFTWARE_VULNERABILITIES: still loading after 10s (expected 2740322 rows)
 LoadProgressReporter - .. VULNERABLESOFTWARE_VULNERABILITIES: still loading after 15s (expected 2740322 rows)
 ...
+LoadPhase - Finalizing load: re-enabling triggers and resetting identity sequences
+LoadPhase - Analyzing 64 loaded table(s)
+LoadPhase - Refreshing PORTFOLIOMETRICS_GLOBAL materialized view
+LoadPhase - Applying v5.7.0 cleanup deletes
+LoadPhase - Load phase completed: 64 table(s), 5820097 row(s) in 211740 ms
+RunCommand - Migration completed: extract + transform + load finished. Run 'verify' to review row counts and probes.
 ```
 
 ## Verifying the result
@@ -290,10 +302,12 @@ Expected output:
   OK Flyway head = 202605111028
 
 [Row counts]
-  Table                 Source  Staging  v5
+  Table                 Source  Staging  v5     Note
   LICENSE               811     811      811
   LICENSEGROUP          4       4        4
   LICENSEGROUP_LICENSE  131     131      131
+  TEAM                  47      42       42     expected: dedup by NAME (-5)
+  PROJECTS_TAGS         12480   12462    12462  reduction (-18), see migration guide
   ...
 
 [Probes]
@@ -301,13 +315,15 @@ Expected output:
 
 [Constraints]
   13 CHECK constraint(s) hold across 55 loaded table(s)
+
+== verify complete ==
 ```
 
-We expect mismatches between source and v5 row counts wherever the migrator
-deduplicates, drops, or rewrites rows.
+Source and v5 row counts differ wherever the migrator deduplicates, drops, or rewrites rows.
+The migrator makes these reductions intentionally. The `Note` column explains each one inline, pointing to the
+relevant transform or to the `[Probes]` section, so we can confirm what accounts for every drop.
 The migration guide's [lossy and non-obvious changes](../guides/administration/migrating-from-v4.md#lossy-and-non-obvious-changes)
-section catalogs every case.
-We read it now and confirm that the mismatches we see match the cases the guide describes.
+section catalogs every case, and we confirm that the reductions we see match the cases it describes.
 
 ## Dropping the staging schema
 