Merged
docs/guides/administration/migrating-from-v4.md: 73 changes (63 additions, 10 deletions)
@@ -161,9 +161,12 @@ v4-migrator verify \

`verify` reads but never writes. On a freshly bootstrapped target with no staging present, expect:

- Schema version `202605111028` reported `OK`.
- Flyway head `202605111028` reported `OK`.
- Every row count zero, **except the v5 count of the `PERMISSION` table**.
- No probe entries.
- The terminator line `== verify complete ==`.

At this stage the migration has not moved any data, so `verify` shows the seeded baseline with nothing to flag.
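If you want to script this baseline check, for example before each rehearsal, a minimal Python sketch could parse the captured `verify` output. It assumes the whitespace-separated layout shown in this guide (with `-` for an unconfigured source) and checks only the row counts, the probes, and the terminator line. `is_fresh_target` is a hypothetical helper, not part of `v4-migrator`.

```python
def is_fresh_target(report: str) -> bool:
    """Hypothetical helper: does a `verify` report match the fresh-target baseline?

    Assumes the text layout shown in this guide; not part of v4-migrator.
    """
    lines = [line.strip() for line in report.splitlines()]
    if "== verify complete ==" not in lines:
        return False  # verify stopped before finishing
    if "[Probes]" in lines and "No probe entries." not in lines:
        return False  # something was flagged
    in_counts = False
    for line in lines:
        if line == "[Row counts]":
            in_counts = True
            continue
        if not in_counts:
            continue
        if not line:
            break  # a blank line ends the [Row counts] section
        fields = line.split()
        if fields[0] in ("Table", "...") or line.startswith("("):
            continue  # header, elision, or "(no source configured)"
        counts = fields[1:4]  # Source, Staging, v5
        if fields[0] == "PERMISSION":
            counts = counts[:2]  # v5 seeds PERMISSION, so its v5 count may be non-zero
        if any(value not in ("-", "0") for value in counts):
            return False
    return in_counts  # False if the report had no [Row counts] section
```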

### 3. Dry run

@@ -239,23 +242,44 @@ It now reports source / staging / v5 row counts per table and surfaces every pro

```text
[Schema]
OK Schema version = 202605111028
OK Flyway head = 202605111028

[Row counts]
Table          Source  Staging       v5  Note
LICENSE          5234     5231     5231  see [Probes] (-3)
TEAM               47       42       42  expected: dedup by NAME (-5)
COMPONENT     4823714  4823711  4823711  see [Probes] (-3)
PROJECTS_TAGS   12480    12462    12462  reduction (-18), see migration guide
...

[Probes]
LICENSE 3 malformed UUID(s) dropped
COMPONENT 3 malformed UUID(s) dropped

[Constraints]
18 CHECK constraint(s) hold across 6 loaded table(s)

== verify complete ==
```

The `Note` column annotates tables whose row count drops from source to v5.
Row counts often decrease during migration because of deduplication, filtering, and retention, and the migrator makes these reductions intentionally.
The `Note` column, together with the `[Probes]` section, tells you which reason applies, so you can account for every drop instead of leaving it unexplained.
Investigate any mismatch that deduplication or skipping does not explain.
The [Lossy and non-obvious changes](#lossy-and-non-obvious-changes) section below lists the expected sources of mismatch.
Each note takes one of three forms:

- `expected: <reason> (-N)`. A known, intentional reduction. The reason maps to a transform in the
[Lossy and non-obvious changes](#lossy-and-non-obvious-changes) section.
- `see [Probes] (-N)`. The `[Probes]` section itemizes the drop (invalid UUIDs, skipped users, case collisions).
- `reduction (-N), see migration guide`. An intentional reduction without a dedicated note.
Consult the [Lossy and non-obvious changes](#lossy-and-non-obvious-changes) section to understand it.
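A script that post-processes the report could map each note to one of these forms. The minimal Python sketch below assumes the exact note wording shown above; `classify_note` and `Note` are hypothetical names, not part of `v4-migrator`.

```python
import re
from typing import NamedTuple, Optional


class Note(NamedTuple):
    state: str             # "expected", "probes", or "reduction"
    reduction: int         # N from "(-N)"
    reason: Optional[str]  # set only for the "expected" state


# One pattern per documented note form (assumes the exact wording above).
_PATTERNS = (
    ("expected", re.compile(r"expected: (?P<reason>.+) \(-(?P<n>\d+)\)")),
    ("probes", re.compile(r"see \[Probes\] \(-(?P<n>\d+)\)")),
    ("reduction", re.compile(r"reduction \(-(?P<n>\d+)\), see migration guide")),
)


def classify_note(note: str) -> Optional[Note]:
    """Hypothetical helper: map a Note cell to one of the three forms, or None."""
    text = note.strip()
    for state, pattern in _PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return Note(state, int(match["n"]), match.groupdict().get("reason"))
    return None
```

A `None` result means the note matches none of the three documented forms, which is itself worth a look.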

!!! warning "Permission join tables are not a data-loss signal"

    `TEAMS_PERMISSIONS` and `USERS_PERMISSIONS` both lose and gain rows as part of the v4-to-v5 permission remap
    (see [Portfolio access control bypass](#portfolio-access-control-bypass)), so their row-count delta carries no meaning.
    Disregard any `Note` on these two tables.

The final `== verify complete ==` line marks a clean exit. Its absence means `verify` stopped before finishing.
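To triage a long report, a minimal Python sketch like the following could list the tables whose source and v5 counts differ without any `Note`. It skips the two permission join tables and refuses to trust a report that lacks the terminator line. It assumes the column layout shown above; `rows_to_investigate` is a hypothetical helper.

```python
# The two join tables whose delta is meaningless after the permission remap.
PERMISSION_JOIN_TABLES = {"TEAMS_PERMISSIONS", "USERS_PERMISSIONS"}


def rows_to_investigate(report: str) -> list:
    """Hypothetical helper: tables whose Source and v5 counts differ with no Note."""
    lines = [line.strip() for line in report.splitlines()]
    if "== verify complete ==" not in lines:
        raise RuntimeError("verify did not finish: terminator line missing")
    flagged, in_counts = [], False
    for line in lines:
        if line == "[Row counts]":
            in_counts = True
            continue
        if not in_counts:
            continue
        if not line:
            break  # a blank line ends the section
        fields = line.split(maxsplit=4)  # Table, Source, Staging, v5, Note
        if len(fields) < 4 or not fields[1].isdigit():
            continue  # header, "...", or no source configured
        table, source, v5 = fields[0], int(fields[1]), int(fields[3])
        if table in PERMISSION_JOIN_TABLES:
            continue  # the permission remap makes their delta meaningless
        if source != v5 and len(fields) < 5:
            flagged.append(table)  # a drop (or gain) with no explanation
    return flagged
```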

### 7. Drop staging

@@ -451,7 +475,7 @@ INFO -> LICENSE: 5231 rows in 27 ms (193740 rows/s)
```

The load phase additionally emits a heartbeat every 5 seconds while a single table is still running,
so large tables stay visible:

```text
INFO Loading COMPONENT into v5
@@ -460,11 +484,40 @@ INFO .. COMPONENT: still loading after 30s (expected 4823714 rows)
INFO -> COMPONENT: 4823714 rows in 32104 ms (150253 rows/s)
```

The expected count is an upper bound. The final count can be lower because of deduplication and dropped rows.
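Because the heartbeat's expected count is an upper bound, a finished table should never report more rows than its heartbeat gave. A minimal Python sketch of that sanity check, assuming the log line formats shown above (`check_upper_bounds` is a hypothetical helper), pairs the two. Tables that finish before their first heartbeat have no estimate and are skipped.

```python
import re

# Assumed formats, copied from the heartbeat and per-table lines shown above.
HEARTBEAT = re.compile(
    r"\.\. (?P<table>\w+): still loading after \d+s \(expected (?P<expected>\d+) rows\)"
)
FINISHED = re.compile(r"-> (?P<table>\w+): (?P<rows>\d+) rows in \d+ ms")


def check_upper_bounds(log: str) -> dict:
    """Hypothetical helper: map table -> (final rows, expected rows).

    Raises ValueError if a final count exceeds its heartbeat estimate.
    """
    expected = {}
    result = {}
    for line in log.splitlines():
        if m := HEARTBEAT.search(line):
            expected[m["table"]] = int(m["expected"])
        elif (m := FINISHED.search(line)) and m["table"] in expected:
            final, bound = int(m["rows"]), expected[m["table"]]
            if final > bound:
                raise ValueError(f"{m['table']}: {final} rows exceeds expected {bound}")
            result[m["table"]] = (final, bound)
    return result
```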

A 100 GB v4 dataset should complete in a few hours on a workstation-class disk, less on dedicated server hardware.

### Phase completion lines

Each phase prints an explicit completion line on success, after its per-table lines:

```text
INFO Extract phase completed: 64 table(s), 5821044 row(s) in 184302 ms
INFO Transform phase completed: 64 table(s), 5820102 row(s) in 96118 ms
INFO Load phase completed: 64 table(s), 5820097 row(s) in 211740 ms
```
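The row totals shrink from phase to phase. In this example the transform phase ends with 5821044 - 5820102 = 942 fewer rows than extract, and load with 5820102 - 5820097 = 5 fewer than transform. A minimal Python sketch that computes these deltas from the completion lines follows. It assumes the totals count comparable rows across phases; `phase_row_totals` and `phase_deltas` are hypothetical helpers.

```python
import re

# Assumed format, copied from the completion lines shown above.
COMPLETED = re.compile(
    r"(?P<phase>Extract|Transform|Load) phase completed: "
    r"(?P<tables>\d+) table\(s\), (?P<rows>\d+) row\(s\) in (?P<ms>\d+) ms"
)


def phase_row_totals(log: str) -> dict:
    """Hypothetical helper: row total reported by each phase completion line."""
    return {m["phase"]: int(m["rows"]) for m in COMPLETED.finditer(log)}


def phase_deltas(totals: dict) -> dict:
    """Rows fewer than the preceding phase, keyed by the later phase."""
    order = ["Extract", "Transform", "Load"]
    return {
        later: totals[earlier] - totals[later]
        for earlier, later in zip(order, order[1:])
        if earlier in totals and later in totals
    }
```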

The `run` command, which chains extract, transform, and load, prints one final line:

```text
INFO Migration completed: extract + transform + load finished. Run 'verify' to review row counts and probes.
```

For `run`, the final `Migration completed` line confirms that all three phases succeeded.
**If a phase's completion line is missing after its per-table output, that phase did not finish; the same goes for `run` without its `Migration completed` line.** Check the exit code and the preceding log lines.
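The exit code stays the primary signal, but a minimal Python sketch like this one could also scan a captured log for missing completion lines. It matches the line prefixes shown above; `unfinished_phases` and its parameters are hypothetical.

```python
PHASES = ("Extract", "Transform", "Load")
RUN_DONE = "Migration completed: extract + transform + load finished."


def unfinished_phases(log: str, phases=PHASES, chained_run: bool = True) -> list:
    """Hypothetical helper: phases whose completion line never appeared in `log`.

    With chained_run=True (a log from `run`), a missing final line adds "run".
    """
    missing = [phase for phase in phases if f"{phase} phase completed:" not in log]
    if chained_run and RUN_DONE not in log:
        missing.append("run")
    return missing
```

An empty list means every expected completion line is present. Pass `phases` and `chained_run=False` when checking a log that covers only some phases.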

After the last table loads, the migrator emits progress lines while it finalizes:

```text
INFO Finalizing load: re-enabling triggers and resetting identity sequences
INFO Analyzing 64 loaded table(s)
INFO Refreshing PORTFOLIOMETRICS_GLOBAL materialized view
INFO Applying v5.7.0 cleanup deletes
```

The `Load phase completed` line follows once finalization finishes.
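If a load stops after its last per-table line but before `Load phase completed`, the most recent finalization line shows how far it got. The following minimal Python sketch assumes the finalization messages keep the wording shown above. `finalize_progress` is a hypothetical helper.

```python
from typing import Optional

# Finalization messages as shown above, followed by the completion line.
FINALIZE_MARKERS = (
    "Finalizing load:",
    "Analyzing ",
    "Refreshing PORTFOLIOMETRICS_GLOBAL materialized view",
    "Applying v5.7.0 cleanup deletes",
    "Load phase completed:",
)


def finalize_progress(log: str) -> Optional[str]:
    """Hypothetical helper: the latest finalization marker in the log, or None."""
    reached = None
    for line in log.splitlines():
        for marker in FINALIZE_MARKERS:
            if marker in line:
                reached = marker  # later lines overwrite earlier ones
    return reached
```

A result of `"Load phase completed:"` means finalization finished. Any other marker names the step the load was in when it stopped, and `None` means finalization never started.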

## Resumability

Each phase persists per-table state in the `migration_state` table inside the staging schema.
docs/tutorials/rehearsing-the-v4-migration.md: 34 changes (25 additions, 9 deletions)
@@ -163,11 +163,13 @@ v4-migrator verify \
Expected output:

```text linenums="1"
== v4-migrator verify ==

[Schema]
OK Schema version = 202605111028
OK Flyway head = 202605111028

[Row counts]
Table                Source  Staging       v5  Note
(no source configured)
LICENSE                   -        0        0
TEAM                      -        0        0
@@ -178,9 +180,11 @@ Expected output:

[Constraints]
13 CHECK constraint(s) hold across 55 loaded table(s)

== verify complete ==
```

If the Flyway head differs from `202605111028` or any row count is non-zero (**except in the `PERMISSION` table**),
the rest of the rehearsal will not work.
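We can script this gate for repeated rehearsals. Here is a minimal Python sketch that checks only the `[Schema]` section. It assumes the `OK <check> = <version>` lines shown above; `schema_ok` is a hypothetical helper, and `202605111028` is simply the head this tutorial expects.

```python
EXPECTED_HEAD = "202605111028"  # the head this tutorial's output shows


def schema_ok(report: str, expected: str = EXPECTED_HEAD) -> bool:
    """Hypothetical helper: both [Schema] checks report OK at the expected version."""
    wanted = {f"OK Schema version = {expected}", f"OK Flyway head = {expected}"}
    present = {line.strip() for line in report.splitlines()}
    return wanted <= present  # every wanted line must appear verbatim
```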

## Dry-running the migration
@@ -248,16 +252,18 @@ We drop `--dry-run` and run the real extract, transform, and load:
```

Runtime depends on the size of our v4 dataset.
The migrator prints per-table progress and a heartbeat every five seconds for long-running tables.
Each phase prints a completion line, and `run` ends with a `Migration completed` line that confirms success:

```text linenums="1"
MetricsRetention - Metrics retention set to 30 days (cutoff = 2026-04-15T11:38:10.636499285Z)
ExtractPhase - Extracting LICENSE
ExtractPhase - -> 811 rows in 395 ms
...
ExtractPhase - Extract phase completed: 64 table(s), 5821044 row(s) in 184302 ms
TransformPhase - Transforming LICENSE
TransformPhase - -> 811 rows in 60 ms
...
TransformPhase - Transform phase completed: 64 table(s), 5820102 row(s) in 96118 ms
LoadPhase - Pre-creating metrics partitions for 32 day(s) from 2026-04-15 to 2026-05-16
LoadPhase - Loading LICENSE into v5
LoadProgressReporter - -> LICENSE: 811 rows in 56 ms (14482 rows/s)
@@ -267,6 +273,12 @@ LoadProgressReporter - .. VULNERABLESOFTWARE_VULNERABILITIES: still loading af
LoadProgressReporter - .. VULNERABLESOFTWARE_VULNERABILITIES: still loading after 10s (expected 2740322 rows)
LoadProgressReporter - .. VULNERABLESOFTWARE_VULNERABILITIES: still loading after 15s (expected 2740322 rows)
...
LoadPhase - Finalizing load: re-enabling triggers and resetting identity sequences
LoadPhase - Analyzing 64 loaded table(s)
LoadPhase - Refreshing PORTFOLIOMETRICS_GLOBAL materialized view
LoadPhase - Applying v5.7.0 cleanup deletes
LoadPhase - Load phase completed: 64 table(s), 5820097 row(s) in 211740 ms
RunCommand - Migration completed: extract + transform + load finished. Run 'verify' to review row counts and probes.
```
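Runtime depends on our dataset, so it helps to know which tables dominate it before the real cutover. The following minimal Python sketch ranks tables by the load duration reported in the `-> TABLE: N rows in M ms` lines shown above. Extract and transform lines carry no table name there, so only load lines match. `slowest_tables` is a hypothetical helper.

```python
import re

# Assumed format of the LoadProgressReporter per-table lines shown above.
LOADED = re.compile(r"-> (?P<table>\w+): (?P<rows>\d+) rows in (?P<ms>\d+) ms")


def slowest_tables(log: str, top: int = 3) -> list:
    """Hypothetical helper: (table, duration_ms) pairs, slowest first."""
    timings = [(m["table"], int(m["ms"])) for m in LOADED.finditer(log)]
    return sorted(timings, key=lambda item: item[1], reverse=True)[:top]
```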

## Verifying the result
@@ -290,24 +302,28 @@ Expected output:
OK Flyway head = 202605111028

[Row counts]
Table                Source  Staging       v5  Note
LICENSE                 811      811      811
LICENSEGROUP              4        4        4
LICENSEGROUP_LICENSE    131      131      131
TEAM                     47       42       42  expected: dedup by NAME (-5)
PROJECTS_TAGS         12480    12462    12462  reduction (-18), see migration guide
...

[Probes]
No probe entries.

[Constraints]
13 CHECK constraint(s) hold across 55 loaded table(s)

== verify complete ==
```

Source and v5 row counts differ wherever the migrator deduplicates, drops, or rewrites rows.
The migrator makes these reductions intentionally. The `Note` column explains each one inline, pointing to the
relevant transform or to the `[Probes]` section, so we can confirm what accounts for every drop.
The migration guide's [lossy and non-obvious changes](../guides/administration/migrating-from-v4.md#lossy-and-non-obvious-changes)
section catalogs every case, and we confirm that the reductions we see match the cases it describes.
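One mechanical part of that confirmation is checking that each note's `(-N)` equals the difference between the `Source` and `v5` columns. It does for `TEAM` (47 - 42 = 5) and `PROJECTS_TAGS` (12480 - 12462 = 18) above. The following minimal Python sketch assumes that relationship holds for every note. `mismatched_notes` is a hypothetical helper.

```python
import re

REDUCTION = re.compile(r"\(-(\d+)\)")  # the "(-N)" part of a Note


def mismatched_notes(report: str) -> list:
    """Hypothetical helper: tables whose Note reduction disagrees with Source minus v5."""
    bad, in_counts = [], False
    for line in (raw.strip() for raw in report.splitlines()):
        if line == "[Row counts]":
            in_counts = True
            continue
        if not in_counts:
            continue
        if not line:
            break  # a blank line ends the section
        fields = line.split(maxsplit=4)  # Table, Source, Staging, v5, Note
        if len(fields) < 5 or not fields[1].isdigit():
            continue  # header, "...", or a row without a Note
        match = REDUCTION.search(fields[4])
        if match and int(fields[1]) - int(fields[3]) != int(match.group(1)):
            bad.append(fields[0])
    return bad
```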

## Dropping the staging schema
