feat(versioning): entity-version base infrastructure (gated off, dark launch) #41176

Draft
mikebridge wants to merge 25 commits into apache:master from mikebridge:sc-111231-versioning-base-infra

@mikebridge commented Jun 17, 2026

Contributor

SUMMARY

Lands the entity-versioning base infrastructure (schema + SQLAlchemy-Continuum wiring) gated off by default, so it deploys inert — a "base infra, dark" PR. Capture is activated later by flipping the gate default on, once validated in production.

With ENABLE_VERSIONING_CAPTURE=False (the shipped default), init_versioning() detaches Continuum's write listeners, so a save writes zero version_transaction rows and zero *_version shadow rows — proven by a behavioral test. The migration is additive and inert; the read-only /versions/ list + get endpoints are wired but return empty until capture is enabled.
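
The gate behaviour described above can be illustrated with a minimal, dependency-free sketch. Everything in it is a hypothetical stand-in: `ListenerRegistry` and `init_versioning_sketch` are not Superset or SQLAlchemy-Continuum APIs, they only model the idea that listeners are attached and then detached when the gate is off, so a flush records nothing.

```python
from typing import Callable, List


class ListenerRegistry:
    """Hypothetical stand-in for an ORM event hub: listeners run on every flush."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str], None]] = []

    def attach(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def detach_all(self) -> None:
        self._listeners.clear()

    def flush(self, entity: str) -> None:
        # Every attached listener sees every flushed entity.
        for listener in list(self._listeners):
            listener(entity)


def init_versioning_sketch(
    registry: ListenerRegistry, version_rows: List[str], capture_enabled: bool
) -> None:
    """Wire the capture listener, then detach it again when the gate is off."""
    registry.attach(version_rows.append)
    if not capture_enabled:
        # Gate off: the wiring exists, but no write listener remains attached.
        registry.detach_all()


# Gate off (assumed shipped default): a save produces no version rows.
rows_off: List[str] = []
registry_off = ListenerRegistry()
init_versioning_sketch(registry_off, rows_off, capture_enabled=False)
registry_off.flush("chart-1")

# Gate on (control): the same save produces one version row.
rows_on: List[str] = []
registry_on = ListenerRegistry()
init_versioning_sketch(registry_on, rows_on, capture_enabled=True)
registry_on.flush("chart-1")
```

The off and on cases mirror the two halves of the behavioral test named in the testing instructions: zero rows with capture off, one row in the capture-on control.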

⚠️ Stacked PR. Built on top of the composite-PK reshape in #39859 (Continuum's M2M tracker needs that shape). Until #39859 merges, the diff below includes its commits — review only the two versioning commits (feat(versioning): entity-version base infrastructure (gated off) and fix(versioning): gate write-path bookkeeping and harden capture); everything else belongs to #39859.

What's included (all inert with the gate off):

  • Alembic migration: version_transaction + the *_version shadow tables (additive).
  • Continuum wiring: make_versioned(), VersionTransactionFactory, VersioningFlaskPlugin, the superset/versioning/ capture machinery (baseline, change-records, diff engine).
  • The ENABLE_VERSIONING_CAPTURE gate (default off), a permanent operational kill-switch.
  • Read-only endpoints: GET /api/v1/{chart,dashboard,dataset}/<uuid>/versions/ and /versions/<version_uuid>/.
  • Mapper-level correctness (reset_ownership, UUID coercion) so existing import/clone paths behave correctly with the versioned mappers present.

Deferred to follow-ups: version restore, the cross-entity activity view, version-history retention/prune, and the frontend UI.

THIS IS A RELEASE TOGGLE / OPS KILL-SWITCH, NOT A LONG-LIVED FEATURE FLAG

A common, reasonable objection is "feature flags that change functionality are a maintenance hazard." That objection is about long-lived flags — permanent configuration surface that parameterizes product behavior per tenant, branches the test matrix indefinitely, and accumulates as config sprawl. ENABLE_VERSIONING_CAPTURE is a different category.

In the Continuous Delivery taxonomy it begins as a Release Toggle: a transitory toggle whose only job is to decouple deploy from release, so this not-yet-validated, high-blast-radius change (Continuum writes on every flush) can land on master and ship dark, then be switched on once trusted. It does not parameterize product behavior, gate an experiment, or vary per tenant.

Where it differs from the soft-delete rollout toggle in #41166 (a release toggle that is deleted after it soaks): capture is an active write path with a measurable save-path cost, so once validated the default flips off→on and the switch is retained as a permanent Ops Toggle — an operational kill-switch giving a ~30-second recovery (flip off, restart workers) if a versioning-induced regression appears in production, instead of revert-and-redeploy. Ops toggles are the one category in Fowler's taxonomy that legitimately live long; this is that, not an experiment or permission flag.

| | Long-lived feature flag (the wariness) | ENABLE_VERSIONING_CAPTURE |
|---|---|---|
| Category | Permission / Experiment toggle | Release toggle → Ops kill-switch |
| Purpose | Parameterize behavior / A-B per tenant, indefinitely | De-risk rollout of one high-blast-radius change, then serve as emergency stop |
| Lifespan | Indefinite by design | The off-default is transitory (flips on after soak); the switch itself persists as the kill-switch |
| End state | Both branches maintained forever | One intended path (on); the off-path is "inert, zero rows", proven by the acceptance test |
| Test matrix | Permanent fork | Suite runs capture on; the dark/off contract is pinned by one dedicated test |

So the default is off only for the introducing release (deploy ≠ release); it flips on after validation, and the switch then persists purely as an operational safety valve — not as a permanent product feature flag.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A — backend, gated off; no user-visible change on deploy.

TESTING INSTRUCTIONS

# Behavioral proof of the dark-launch contract (Postgres):
pytest tests/integration_tests/versioning/capture_disabled_tests.py
# capture-off → zero version_transaction + shadow rows; capture-on control → one version row.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required config flag: ENABLE_VERSIONING_CAPTURE (default off for this release; a release toggle that flips on after soak and is retained as a permanent operational kill-switch — not removed). See the section above.
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
  • Introduces new feature or API (gated off by default)
  • Removes existing feature or API

Stacked on #39859.

Mike Bridge and others added 24 commits June 17, 2026 14:54
Replace synthetic id INTEGER PRIMARY KEY with composite PRIMARY KEY (fk1, fk2)
on the eight pure-junction tables: dashboard_roles, dashboard_slices,
dashboard_user, report_schedule_user, rls_filter_roles, rls_filter_tables,
slice_user, sqlatable_user. The redundant UNIQUE(fk1, fk2) on dashboard_slices
and report_schedule_user is dropped (subsumed by the new PK).

Migration handles dialect quirks: copy_from for tables with pre-existing
UNIQUE (so SQLite's anonymous-constraint reflection doesn't matter), wrapped-
subquery dedupe for MySQL (ERROR 1093), sa.Identity(always=False) on downgrade
to backfill the restored id column without NOT NULL violations, and distinct
PK names per direction (pk_<table> on upgrade, <table>_pkey on downgrade) to
avoid round-trip index-name collisions on Postgres.

ORM Table() definitions updated to match. UPDATING.md entry added with
operator runbook (BI-tool impact, pre-flight inventory queries, dedupe-row-
loss notice, pg_dump workaround, FK-NOT-NULL downgrade asymmetry note).

Tests: 8 schema-shape assertions (post-upgrade), 8 duplicate-rejection unit
tests, 8 distinct-pair sanity tests, 1 round-trip + idempotency test
(in-memory SQLite via Alembic MigrationContext).

Continuum-restore verification against the new shape is out of scope for this
PR; it is the responsibility of the versioning epic (sc-103156).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cleanups from PR review:

1. ``dashboard_roles.dashboard_id`` was created nullable in revision
   e11ccdd12658 but was missing from ``TABLES_WITH_NULLABLE_FKS``. A
   production database with a stray NULL ``dashboard_id`` row would have
   failed the PK-add with a cryptic constraint violation. Fix by running
   the NULL-FK cleanup on every affected table — it is a no-op DELETE on
   tables whose FK columns are already NOT NULL, and it eliminates the
   risk of further drift in the hardcoded set. ``dashboard_roles`` is
   added to the documentation set; the runtime no longer consults it.

2. The unit-test parent-table name for ``rls_filter_roles`` and
   ``rls_filter_tables`` was ``rls_filter`` (does not exist) instead of
   the real parent ``row_level_security_filters``. Test passes either
   way (the in-memory FK is self-consistent), but the parameter is now
   accurate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five operator-experience improvements from the second review pass:

1. ``TABLES_WITH_NULLABLE_FKS`` is now explicitly documented as an
   informational set that is not consulted at runtime; the comment
   explains the previous ``dashboard_roles`` omission was the bug
   that motivated the always-run cleanup.
2. ``_delete_null_fk_rows`` docstring updated to match the
   "always run" semantics (was still claiming "called only on tables
   in TABLES_WITH_NULLABLE_FKS").
3. ``_check_no_external_fks_to_id`` now documents its scope
   limitation: ``Inspector.get_table_names()`` returns the default
   schema only, so cross-schema FKs in non-standard multi-schema
   PostgreSQL deployments would not be caught. The single-schema
   case (Superset's documented deployment) is fully covered.
4. ``_dedupe_by_min_id`` now logs a sample of up to 10 discarded
   ``(fk1, fk2, id)`` tuples at WARN before deletion, so operators
   can audit which rows the ``MIN(id)`` policy drops. The keep-
   original policy is correct in practice but discards later
   re-grants on ownership tables; the sample makes that visible.
5. ``UPDATING.md`` documents the upgrade/downgrade primary-key
   name divergence (``pk_<table>`` vs ``<table>_pkey``) so
   operators using schema-comparison tools don't mistake it for
   migration drift.

No schema or runtime-behaviour changes. All 44 migration tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Beto's review comments on apache#39859: replace
``sa.text(f"...")`` SQL construction in the three pre-flight helpers
(``_delete_null_fk_rows``, ``_dedupe_by_min_id``, ``_assert_no_duplicates``)
with SQLAlchemy core constructs (``sa.delete``, ``sa.select``,
``sa.func``, ``.subquery()``, ``.notin_()``).

A small ``_table_clause()`` helper builds a lightweight ``TableClause``
exposing the columns the queries reference; the three helpers consume
it. Removes all ``# noqa: S608`` comments — they are no longer needed
because there is no string-interpolated SQL.

Verified the compiled SQL is identical on Postgres, MySQL, and SQLite,
including the MySQL ERROR 1093 workaround (the inner aggregation is
wrapped in a derived table via ``.subquery()``, producing
``... NOT IN (SELECT keep_id FROM (SELECT min(id) ...) AS keep_min)``).
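
As an illustration of the keep-``MIN(id)`` dedupe and its derived-table wrapping, here is a minimal sketch against an in-memory SQLite database. It uses raw SQL purely for brevity (the commit above builds the statement from SQLAlchemy Core constructs instead), and the function name and sample rows are assumptions made for the example.

```python
import sqlite3


def dedupe_by_min_id(conn: sqlite3.Connection, table: str, fk1: str, fk2: str) -> int:
    """Delete every row except the lowest-id row per (fk1, fk2); return rows deleted.

    Identifiers are interpolated only because this sketch passes fixed,
    trusted names; do not use this pattern with untrusted input.
    """
    sql = f"""
        DELETE FROM {table}
        WHERE id NOT IN (
            SELECT keep_id FROM (
                SELECT MIN(id) AS keep_id FROM {table} GROUP BY {fk1}, {fk2}
            ) AS keep_min
        )
    """
    return conn.execute(sql).rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE slice_user (id INTEGER PRIMARY KEY, user_id INTEGER, slice_id INTEGER)"
)
# Three copies of the pair (1, 10) and one copy of (2, 20).
conn.executemany(
    "INSERT INTO slice_user (user_id, slice_id) VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 20), (1, 10)],
)

removed = dedupe_by_min_id(conn, "slice_user", "user_id", "slice_id")
survivors = conn.execute(
    "SELECT id, user_id, slice_id FROM slice_user ORDER BY id"
).fetchall()
```

The inner aggregation sits inside a named derived table (``keep_min``), which is the shape that avoids MySQL's ERROR 1093; SQLite accepts the same statement, so the sketch runs without a MySQL server.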

Also drops the redundant ``f`` prefix on the two non-interpolating
lines of the ``_check_no_external_fks_to_id`` error message.

44 migration tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI test-mysql failed with:

  MySQLdb.OperationalError: (1826, "Duplicate foreign key constraint
  name 'fk_dashboard_slices_slice_id_slices'")

Root cause: MySQL scopes foreign-key constraint names per-database,
not per-table (PostgreSQL and SQLite scope per-table). The
``batch_alter_table(... recreate="always", copy_from=...)`` path
used for ``dashboard_slices`` and ``report_schedule_user`` builds
``_alembic_tmp_<table>`` carrying the original FK names from
``copy_from`` while the original table still holds those names — MySQL
rejects the temp-table creation with ERROR 1826.

Fix: on MySQL only, drop the original FK constraints by name before
the ``batch_alter_table`` runs. The ``copy_from`` re-creates them on
the rebuilt table with their original names, so the post-migration
shape is unchanged. On PostgreSQL and SQLite the original code path
still runs unchanged.

Local SQLite tests (44 passed, 1 skipped) still pass; CI will validate
on MySQL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two MySQL-only failures in the downgrade path, found by running the
full migration history against a fresh MySQL 8 container:

1. ``MySQLdb.OperationalError: (1553, "Cannot drop index 'PRIMARY':
   needed in a foreign key constraint")``. InnoDB uses the composite
   PK index to back the FK on the leftmost column. The downgrade
   tried to drop the composite PK before dropping the FKs, orphaning
   the FK's backing index. PostgreSQL and SQLite create separate
   indexes for FK columns and don't trip on this.

2. ``Field 'id' doesn't have a default value`` on subsequent INSERT.
   ``sa.Identity(always=False)`` only emits ``AUTO_INCREMENT`` on
   MySQL when the column is created with ``primary_key=True`` — our
   portable path adds the column first then creates the PK separately,
   so MySQL leaves the column without auto-generation. Existing rows
   would all collide on id=0, and future inserts would fail because the
   column has no default.
   Postgres' ``GENERATED BY DEFAULT AS IDENTITY`` and SQLite's
   ``INTEGER PRIMARY KEY`` rowid alias don't have this gap.

Fix: extract ``_downgrade_mysql_table()`` that emits the canonical
MySQL idiom — drop FKs, then a single ALTER combining
``DROP PRIMARY KEY, ADD COLUMN id INT NOT NULL AUTO_INCREMENT,
ADD PRIMARY KEY (id)`` (which backfills existing rows with sequential
ids and preserves AUTO_INCREMENT), restore the redundant UNIQUE on
the 2 tables that originally had it, and re-add the FKs with their
original names. Postgres and SQLite keep the existing portable
``batch_alter_table`` path.

Raw SQL is unavoidable for the combined-ALTER form; per the
constitution it's allowed for dialect-specific DDL with no SQLA
equivalent, with triple-quoted strings for legibility.

Verified end-to-end: upgrade → downgrade → upgrade against a fresh
MySQL 8 container with INSERT-without-id sanity check showing the
restored ``id`` column auto-increments correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Found by running fresh-install + round-trip against a real SQLite DB:
6 of the 8 affected tables had FK columns that were originally
declared nullable. PostgreSQL and MySQL implicitly promote the
constituent columns of an ``ALTER TABLE ... ADD PRIMARY KEY`` to
``NOT NULL``; SQLite does not. This is a long-standing SQLite quirk:
on an ordinary rowid table, only a single-column ``INTEGER PRIMARY KEY``
implies NOT NULL, so composite-PK columns stay nullable unless declared
otherwise. Result: a fresh SQLite install would accept
``INSERT INTO dashboard_slices VALUES (NULL, 5)`` despite both columns
being part of the composite PK.

Our integration tests previously masked this: the test fixture seeds
columns with ``nullable=False``, so the post-upgrade NOT NULL
assertion passed regardless of whether the migration enforced it.

Fix: add explicit ``batch_op.alter_column(fk, nullable=False)`` for
both FK columns inside the per-table batch_alter_table block. On
PostgreSQL and MySQL this is a no-op (PK already implies NOT NULL);
on SQLite it adds the missing NOT NULL declaration so a fresh
install matches the data-model.md "After" contract.

Verified end-to-end:
- Postgres + MySQL: column shape unchanged (still NOT NULL)
- SQLite fresh install + round-trip: all 8 tables now have NOT NULL
  on FK columns, ``INSERT (NULL, 5)`` correctly rejected with
  IntegrityError on dashboard_slices, dashboard_user, sqlatable_user
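
The SQLite behaviour described above can be reproduced with the standard-library ``sqlite3`` module. This is a minimal sketch: the two ``CREATE TABLE`` statements are simplified stand-ins for the real ``dashboard_slices`` schema (no foreign keys, no other columns), and the helper name is hypothetical.

```python
import sqlite3

# Composite PK only: SQLite does not imply NOT NULL on the member columns.
PK_ONLY = (
    "CREATE TABLE dashboard_slices ("
    "dashboard_id INTEGER, slice_id INTEGER, "
    "PRIMARY KEY (dashboard_id, slice_id))"
)

# Composite PK plus explicit NOT NULL, the shape the fix produces on SQLite.
PK_NOT_NULL = (
    "CREATE TABLE dashboard_slices ("
    "dashboard_id INTEGER NOT NULL, slice_id INTEGER NOT NULL, "
    "PRIMARY KEY (dashboard_id, slice_id))"
)


def accepts_null_fk(create_sql: str) -> bool:
    """Return True if a row with a NULL dashboard_id can be inserted."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(create_sql)
        conn.execute(
            "INSERT INTO dashboard_slices (dashboard_id, slice_id) VALUES (NULL, 5)"
        )
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


null_allowed_without_fix = accepts_null_fk(PK_ONLY)
null_allowed_with_fix = accepts_null_fk(PK_NOT_NULL)
```

With only the composite primary key the insert is accepted; with the explicit ``NOT NULL`` declarations it raises ``IntegrityError``, which is the contract the fix restores.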

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI cypress + playwright shards were red with:

  ERROR [flask_migrate] Error: Multiple head revisions are present
  for given argument 'head'

The recent rebase onto master pulled in
``33d7e0e21daa_add_semantic_layers_and_views.py`` (from PR apache#37815,
"semantic layer extension"), which had been authored against
``ce6bd21901ab`` as its parent — the same parent our migration
referenced. After the rebase both migrations point at
``ce6bd21901ab``, producing two heads and breaking ``flask db
upgrade head`` for any downstream consumer (CI's Cypress / Playwright
shards spin up a real Superset instance via ``superset db upgrade``,
which is why those shards failed first; the integration shards run
against a precomputed schema and didn't surface this).

Fix: chain our migration after the semantic-layer migration by
pointing ``down_revision`` at ``33d7e0e21daa``. The chain is now
linear:

    ... → ce6bd21901ab → 33d7e0e21daa (semantic layers)
                          → 2bee73611e32 (composite PK, this PR)

Verified with ``superset db heads`` (returns single head
``2bee73611e32``) and the local migration test suite (44 passed,
1 skipped).
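
To make the two-heads failure concrete, the sketch below models a migration graph as a plain mapping from revision to ``down_revision`` and computes its heads. It does not use Alembic; ``find_heads`` is a hypothetical helper, and ``ce6bd21901ab`` is treated as a root only to keep the example small.

```python
from typing import Dict, Optional, Set


def find_heads(down_revisions: Dict[str, Optional[str]]) -> Set[str]:
    """A head is a revision that no other revision names as its down_revision."""
    parents = {parent for parent in down_revisions.values() if parent is not None}
    return set(down_revisions) - parents


# After the rebase: both migrations point at the same parent.
forked = {
    "ce6bd21901ab": None,  # treated as the root for this sketch only
    "33d7e0e21daa": "ce6bd21901ab",  # semantic layers
    "2bee73611e32": "ce6bd21901ab",  # composite PK
}

# After the fix: the composite-PK migration chains after semantic layers.
linear = {**forked, "2bee73611e32": "33d7e0e21daa"}

heads_before = find_heads(forked)
heads_after = find_heads(linear)
```

Two entries in ``heads_before`` correspond to the "Multiple head revisions" error; re-pointing one ``down_revision`` leaves a single head.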

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…105349)

Add a "Sizing the maintenance window on PostgreSQL" sub-section to the
operator runbook. The simple per-table COUNT/duplicate/NULL queries
that were already there are dialect-portable but only count rows;
operators on PostgreSQL with large deployments need to characterize
the migration's runtime cost before scheduling it.

Adds four diagnostic queries:

- Per-table size, row count (from pg_class.reltuples), and which
  migration path each table will take (recreate-rewrite vs direct
  ALTER). Sizes the work concretely.
- Aggregated duplicate-row roll-up: dup_groups + total rows_dropped
  per table. Replaces eight separate per-table queries with one
  consolidated result for audit/dump-before-apply decisions.
- External-FK pre-flight check (the same one the migration runs at
  upgrade time and aborts on). Lets operators surface any blocking
  external reference ahead of the maintenance window. Should be
  empty on a stock install.
- Lock-window estimate for the two full-rewrite tables, using
  pg_relation_size and a conservative 100 MB/s rewrite throughput
  assumption. The other six use direct ALTER and are dominated by
  composite-index build time (seconds for low-millions-of-rows
  tables).
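
The lock-window estimate is simple arithmetic: relation size divided by an assumed rewrite throughput. A minimal sketch follows, where the function name is hypothetical, 1 MB is taken as 1024 × 1024 bytes (an assumption; the runbook query may define it differently), and the 5 GiB input is an invented example rather than a measurement.

```python
def estimate_rewrite_seconds(table_bytes: int, throughput_mb_per_s: float = 100.0) -> float:
    """Estimate how long a full table rewrite holds its lock, in seconds.

    table_bytes would come from something like pg_relation_size();
    the default throughput is the conservative 100 MB/s assumption.
    """
    bytes_per_second = throughput_mb_per_s * 1024 * 1024
    return table_bytes / bytes_per_second


# Invented example: a 5 GiB table at the default 100 MB/s.
example_seconds = estimate_rewrite_seconds(5 * 1024**3)

# If several tables are rewritten in one transaction, the windows add up.
total_seconds = sum(
    estimate_rewrite_seconds(size) for size in (5 * 1024**3, 1 * 1024**3)
)
```

Halving the assumed throughput doubles the estimate, so the figure is best read as an order-of-magnitude guide for scheduling rather than a prediction.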

Prompted by reviewer feedback on apache#39859 from a large
deployment asking how to size the maintenance window. The original
pre-flight queries are kept for cross-dialect operators (MySQL,
SQLite) since the new queries use PostgreSQL-specific catalog views.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…349)

Mirror of the PostgreSQL diagnostic queries added in 1114877,
adapted for MySQL/InnoDB. One important difference: InnoDB rebuilds
the clustered index on every PK change, so all eight tables undergo
a full table rebuild on MySQL — not just the two that go through
the explicit ``recreate="always"`` path. The lock-window estimate
query is updated to cover all eight rather than just two, and the
"migration_path" column makes the rebuild expectation explicit
("direct ALTER (still rebuilds InnoDB clustered index)").

Other notes:
- ``information_schema.TABLES.TABLE_ROWS`` is an InnoDB estimate,
  analogous to PostgreSQL's ``reltuples``; documented inline.
- ``KEY_COLUMN_USAGE`` carries both sides of the FK in a single
  row on MySQL, so the external-FK pre-flight check is simpler
  than the PostgreSQL version (no joins between three views).
- The aggregated dedupe query is portable standard SQL; included
  verbatim for copy-paste convenience.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds ``docker-compose-mysql.yml``, a compose-override file that swaps
the default Postgres metadata DB for MySQL 8 with one extra ``-f``
flag:

  docker compose -f docker-compose.yml -f docker-compose-mysql.yml up

Useful for evaluating dialect-specific behaviour (e.g., the runtime
cost of DDL migrations on a deployment whose production metadata DB
is MySQL — the question raised by review feedback on this PR).

Mirrors the connection settings used by CI's ``test-mysql`` shard:
``mysql+mysqldb`` dialect, charset ``utf8mb4`` with binary_prefix.
Host port defaults to 13306 (configurable via ``DATABASE_PORT_MYSQL``)
to avoid colliding with a native MySQL install on 3306.

A separate volume (``db_home_mysql``) keeps MySQL data isolated from
the Postgres ``db_home`` volume, so switching between the two with
``-f`` flag toggles doesn't corrupt either side.

The Postgres-specific init scripts under
``docker/docker-entrypoint-initdb.d/`` are not mounted on the MySQL
service (they are postgres-only). Examples / cypress fixtures still
load via ``superset-init``'s post-startup steps, which run
``superset load-examples`` against whichever metadata DB is in use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix two follow-on issues reported when starting the dev stack with
docker-compose-mysql.yml:

1. ``superset-init`` step 4 (load-examples) fails with
   ``MySQLdb.OperationalError: (2002, "Can't connect to server on 'db'")``
   because the analytics-examples DB connection inherits ``EXAMPLES_PORT=5432``
   (Postgres port) from ``docker/.env``. The override flipped
   ``DATABASE_DIALECT`` to ``mysql+mysqldb`` but left the EXAMPLES_*
   group on Postgres defaults, so the URI became
   ``mysql+mysqldb://examples:examples@db:5432/examples``, but the MySQL
   container has no listener on 5432.

   Fix: add ``EXAMPLES_HOST/PORT/DB/USER/PASSWORD`` and a complete
   ``SUPERSET__SQLALCHEMY_EXAMPLES_URI`` to the ``mysql-env`` anchor.

2. The Postgres init scripts under
   ``docker/docker-entrypoint-initdb.d/`` (``cypress-init.sh``,
   ``examples-init.sh``) get mounted on the MySQL container too —
   compose merges volume lists. They invoke ``psql`` which doesn't
   exist in the MySQL image, abort with ``psql: command not found``,
   and prevent the ``examples`` DB from being created.

   Fix: add a MySQL-specific init script
   ``docker/mysql-init/examples-init.sql`` that creates the
   ``examples`` database and user, and mount it at
   ``/docker-entrypoint-initdb.d`` in the override. Compose's
   later-takes-precedence rule on duplicate volume targets displaces
   the Postgres init dir, so the MySQL container only sees the
   MySQL-compatible script.

   (Used a plain duplicate-target mount rather than the ``!override``
   tag because pre-commit's ``check-yaml`` doesn't recognize Compose's
   custom YAML tags.)

Recovery for an existing failed MySQL stack: ``docker compose -f
docker-compose.yml -f docker-compose-mysql.yml down``, then
``docker volume rm superset_db_home_mysql`` (so the new init script
runs on the next fresh boot), then ``up`` again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add ``scripts/seed_junction_load.py``, a backend-agnostic script that
bulk-inserts synthetic parent rows (dashboards, slices, users, roles,
tables, dbs) and many-to-many junction rows for the four largest
association tables targeted by the composite-PK migration:
``dashboard_slices``, ``slice_user``, ``dashboard_user``,
``dashboard_roles``.

Designed for measuring migration runtime at varying scales — run with
a series of size flags (100K / 1M / 5M / 10M for the target table)
and time the migration at each scale to verify the predicted
``O(N log N)`` extrapolation against real numbers.

Properties:
- **Reproducible**: deterministic cross-product walk through parent IDs
  produces a stable pair sequence; re-running is replayable.
- **Idempotent**: re-running with the same target is a no-op; with a
  higher target, only new rows are added.
- **Backend-agnostic**: connects via Superset's standard ``DATABASE_*``
  env vars (or ``SUPERSET__SQLALCHEMY_DATABASE_URI``). Branches on
  dialect for ``BINARY(16)`` vs ``UUID`` vs TEXT/BLOB UUID columns.
- **Batched**: bulk INSERT 10K rows per statement.
- **Per-phase timing**: logs elapsed wall time for the parents phase,
  the junctions phase as a whole, and per junction-table.
- **Avoidance set**: loads existing junction pairs into a Python set
  so re-runs on top of pre-existing data don't collide on the
  uniqueness constraint.
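
The reproducible, idempotent, and batched properties above can be sketched in a few lines. This is not the code of ``seed_junction_load.py``; ``plan_new_pairs`` and ``batches`` are hypothetical helpers that only illustrate a deterministic cross-product walk combined with an avoidance set.

```python
from itertools import product
from typing import Iterator, List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def plan_new_pairs(
    parent_a_ids: Sequence[int],
    parent_b_ids: Sequence[int],
    existing: Set[Pair],
    target: int,
) -> List[Pair]:
    """Return the pairs still to insert so the table reaches `target` rows."""
    needed = target - len(existing)
    new_pairs: List[Pair] = []
    if needed <= 0:
        return new_pairs  # already at or above target: re-run is a no-op
    # product() walks the cross product in a fixed order, so runs are replayable.
    for pair in product(parent_a_ids, parent_b_ids):
        if pair in existing:
            continue  # avoidance set: never re-insert a pair already present
        new_pairs.append(pair)
        if len(new_pairs) == needed:
            break
    return new_pairs


def batches(rows: Sequence[Pair], size: int = 10_000) -> Iterator[Sequence[Pair]]:
    """Yield slices of at most `size` rows, one per bulk INSERT statement."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


first_run = plan_new_pairs([1, 2], [10, 20], set(), target=3)
same_target = plan_new_pairs([1, 2], [10, 20], set(first_run), target=3)
higher_target = plan_new_pairs([1, 2], [10, 20], set(first_run), target=4)
```

Re-running at the same target plans nothing, and raising the target plans only the missing rows, which is the idempotency contract stated in the list.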

Usage (inside the Superset container):

    docker exec superset-superset-1 \
        /app/.venv/bin/python /app/scripts/seed_junction_load.py \
        --dashboard-slices 1000000

Defaults target a "large multi-team install" shape: 1M
``dashboard_slices``, 100K each ``slice_user`` / ``dashboard_user``,
10K ``dashboard_roles``. Override per-table via flags.

Tested locally on MySQL (the user's current eval stack):
- 200/100/100/50 row mini-run produced expected counts.
- Re-running at the same target is a no-op (idempotent).
- ``--dry-run`` plans without writing.

Junction tables not yet covered (``sqlatable_user``, ``rls_filter_*``,
``report_schedule_user``) are typically small in production and
require additional parent seeding (RLS filters, report schedules)
that wasn't worth the scope here. Adding them is straightforward by
extending ``JUNCTIONS`` and writing the corresponding parent seeder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the stress-test seed script with an optional duplicate-row
injection step, used to measure the empirical cost of the migration's
``_dedupe_by_min_id`` phase.

Usage: after running the normal seed at a given scale, add
``--dirty-duplicates-pct 5`` (or any non-zero value) to inject that
percentage of duplicate ``(fk1, fk2)`` rows into each non-UNIQUE
junction (slice_user, dashboard_user, dashboard_roles —
dashboard_slices is skipped because its UNIQUE constraint, present
both pre- and post-migration, rejects duplicates).

Pre-condition: requires the DB to be at the pre-migration revision
(33d7e0e21daa). The post-migration composite PK rejects duplicates,
so attempting to inject on the upgraded schema errors out.

Empirical result on MySQL @ 10M dashboard_slices + ~2.1M other
junction rows + 105K injected duplicates (5% on the 3 non-UNIQUE
tables):
  Upgrade time: 1m 36s vs clean baseline 1m 37s
  → dedupe cost is within measurement noise; the table-scan that
    the migration already performs dominates whether or not
    duplicates exist.

This empirically confirms what the cost-model predicted: the
``_dedupe_by_min_id`` GROUP BY scan is the dominant cost of that
phase, and the actual per-duplicate DELETE is negligible.

NULL-FK injection deliberately skipped — would require altering the
six non-UNIQUE FK columns from NOT NULL back to nullable (the
migration's downgrade keeps them NOT NULL by design), which adds
per-backend ALTER complexity for a code path that's structurally
identical in cost shape (DELETE WHERE col IS NULL is the same scan
shape as the dedupe scan).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…5349)

Justin Park (@justinpark) reported on apache#39859:

  MySQLdb.OperationalError: (1832, "Cannot change column 'dashboard_id':
  used in a foreign key constraint 'fk_dashboard_roles_dashboard_id_dashboards'")

Root cause: ``batch_op.alter_column(fk1, nullable=False)`` for the six
non-UNIQUE association tables emits ``ALTER COLUMN`` on a column that
participates in an FK constraint. MySQL 8 rejects this with ERROR 1832
when the table has data — even when the change is just ``NULL`` →
``NOT NULL`` and the column is already part of a freshly-added
composite primary key (which InnoDB has just made implicitly NOT NULL
anyway). The error fires on populated tables only; CI's ``test-mysql``
shard runs against empty tables and so didn't catch this, while a
real production-shaped install does.

The ``alter_column`` was only ever needed for SQLite, where composite
``PRIMARY KEY`` does not promote constituent columns to ``NOT NULL``
(a long-standing SQLite quirk — only ``INTEGER PRIMARY KEY`` does).
PostgreSQL and MySQL implicitly promote PK columns to ``NOT NULL`` as
part of ``ADD PRIMARY KEY``, so the explicit step is unnecessary on
both — and on MySQL it's actively broken on populated tables.

Fix: extract the ``alter_column`` pair into a helper
``_enforce_not_null_for_sqlite()`` that no-ops on Postgres and MySQL.
Both branches of the per-table upgrade (the ``recreate="always"`` path
for the two UNIQUE-bearing tables, and the direct-ALTER path for the
other six) now call the helper instead of inlining the
``alter_column``.

Verified end-to-end: downgrade-then-upgrade against MySQL with
~12M total junction rows (10M dashboard_slices + 1M each
slice_user/dashboard_user + 100K dashboard_roles) completes in
1m 39s with no ERROR 1832. The 44 in-memory SQLite tests still pass.

Considered Justin's alternative (drop FKs on MySQL across all eight
tables, unifying the two branches) but rejected as more invasive —
it would require capturing FK metadata and explicitly re-creating
the FKs for the six non-recreate tables, since they don't go through
the ``copy_from`` path that re-creates FKs automatically. The
SQLite-only approach is more targeted: it removes the operation that
MySQL rejects rather than working around the rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three improvements from @aminghadersohi's review on
apache#39859:

1. **``fk["name"]`` unguarded in ``_downgrade_mysql_table`` re-add loop**

   The drop loop gates on ``if fk_name := fk.get("name"):`` but the
   re-add loop accessed ``fk["name"]`` unconditionally in an f-string.
   MySQL/InnoDB always assigns FK names, so this branch was defensive,
   but the asymmetry was confusing. Symmetrized via ``continue`` at the
   top of the re-add loop.

2. **``ondelete`` whitelist before raw-SQL interpolation**

   The value comes from MySQL's ``information_schema`` (not user
   input), but interpolating a reflected string into raw SQL without a
   guard left a "what if an unexpected value appears" footgun. Added
   ``_VALID_ONDELETE_ACTIONS`` (the four SQL-standard actions) and a
   ``RuntimeError`` when an unexpected value is reflected.

3. **Direct ALTER on PostgreSQL for tables with pre-existing UNIQUE**

   ``recreate="always"`` is dialect-agnostic — on PostgreSQL it
   triggers ``CREATE TABLE AS SELECT → DROP → RENAME`` holding
   ``ACCESS EXCLUSIVE`` for the full table-copy duration. For a
   multi-million-row ``dashboard_slices``, that lock window can be
   noticeable. The reflected UNIQUE constraint has a stable name on
   PostgreSQL (default ``<table>_<cols>_key`` convention), so dropping
   it directly and then running the structural change as a direct ALTER
   avoids the copy entirely.

   The reflected UNIQUE name is wrapped in a new
   ``_drop_redundant_unique_by_name()`` helper. Postgres takes the
   direct path; MySQL keeps ``recreate="always"`` because InnoDB binds
   FKs to the UNIQUE's underlying index for back-reference (``DROP
   CONSTRAINT`` on the UNIQUE there raises ``ERROR 1553``); SQLite
   keeps ``recreate="always"`` because unnamed UNIQUEs reflect with
   ``name=None`` and can't be dropped by name.
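
The whitelist guard from item 2 can be sketched as below. The set contents are an assumption: the commit says only "the four SQL-standard actions", so the four values listed here are a guess at what ``_VALID_ONDELETE_ACTIONS`` contains, and ``ondelete_clause`` is a hypothetical helper name.

```python
from typing import Optional

# Assumed contents; the real _VALID_ONDELETE_ACTIONS may differ.
VALID_ONDELETE_ACTIONS = frozenset({"CASCADE", "SET NULL", "RESTRICT", "NO ACTION"})


def ondelete_clause(reflected: Optional[str]) -> str:
    """Return an ON DELETE fragment, refusing any value outside the whitelist."""
    if reflected is None:
        return ""  # no ON DELETE rule was reflected for this FK
    action = reflected.strip().upper()
    if action not in VALID_ONDELETE_ACTIONS:
        # Fail loudly instead of interpolating an unexpected string into raw SQL.
        raise RuntimeError(f"Unexpected ON DELETE action reflected: {reflected!r}")
    return f" ON DELETE {action}"


cascade_fragment = ondelete_clause("cascade")
empty_fragment = ondelete_clause(None)
```

Only strings that match the whitelist ever reach the interpolated SQL, so an unexpected reflected value stops the migration rather than producing malformed DDL.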

Verified end-to-end: downgrade-then-upgrade against MySQL with
~12M total junction rows seeded completes in ~1m 41s (within the
range of the prior measurements).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Belt-and-braces invariant: ``t.name`` is interpolated as a
backtick-quoted identifier into the ALTER statements emitted by
``_downgrade_mysql_table``. The values originate from
``AFFECTED_TABLES`` (a module-level literal), so SQL injection is
already structurally precluded at the call site. Adding an explicit
``allowed = {a.name for a in AFFECTED_TABLES}`` membership check
makes that invariant load-bearing rather than implicit — a future
refactor that loosens the call-site can't slip past review.

Surfaced during a downstream SQLAlchemy review on the entity-versioning
branch that stacks on top of this one; lifted onto sc-105349 because
the patch is properly scoped to this branch's composite-PK migration.

After rebasing onto master, 2bee73611e32 and master's 31dae2559c05 both revised 33d7e0e21daa, forking the Alembic chain into two heads ('superset db upgrade' refuses to run). Re-point down_revision at 31dae2559c05 so the versioning chain extends the real head.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The MySQL branch dropped the live FK constraints and then re-reflected them for the copy_from table — which only returned the pre-drop list via the Inspector's per-instance info_cache, an implementation detail. Capture the list before dropping and pass it through explicitly (the downgrade path already did this).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two fixes from a 4-lens review pass:

- Resumability guard: on MySQL every DDL statement auto-commits, so a failure at table N of 8 left tables 1..N-1 converted with alembic_version un-stamped — re-running failed at table 1 (drop_column('id') on a converted table) and downgrade couldn't run either. Skip tables whose id column is already gone, making re-runs safe on every dialect.

- The down_revision re-point left two stale 33d7e0e21daa references: the migration docstring header, and — operationally worse — the seed script's --dirty-duplicates-pct help text, which instructed a downgrade that would unwind every migration since 2025-11. The help text now points at the migration's down_revision instead of hardcoding a hash.
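
The resumability guard can be sketched roughly as follows, using the standard-library `sqlite3` module in place of the real Alembic operations. The `left_id` / `right_id` column names and both helper functions are assumptions made for the illustration:

```python
import sqlite3
from typing import List


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """True if `table` currently has a column named `column`."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name


def convert_tables(conn: sqlite3.Connection, tables: List[str]) -> List[str]:
    """Convert each table that still has `id`; return the ones converted now."""
    converted = []
    for table in tables:
        if not has_column(conn, table, "id"):
            continue  # already converted by an earlier, interrupted run
        # SQLite cannot drop a primary-key column in place, so rebuild the table.
        conn.execute(
            f'CREATE TABLE "{table}__new" ('
            "left_id INTEGER NOT NULL, right_id INTEGER NOT NULL, "
            "PRIMARY KEY (left_id, right_id))"
        )
        conn.execute(
            f'INSERT INTO "{table}__new" SELECT left_id, right_id FROM "{table}"'
        )
        conn.execute(f'DROP TABLE "{table}"')
        conn.execute(f'ALTER TABLE "{table}__new" RENAME TO "{table}"')
        converted.append(table)
    conn.commit()
    return converted
```

Calling `convert_tables` a second time with the same list is a no-op, which is the property a half-finished MySQL run needs.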

Also: drop the never-called required_parent_count helper and trim report_schedule_user from JUNCTIONS_WITH_UNIQUE (the script never seeds that table; the entry implied coverage that doesn't exist).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The migration's riskiest half — _delete_null_fk_rows / _dedupe_by_min_id / _assert_no_duplicates — had zero coverage: the fixtures seeded no rows, and both test schema builders created FK columns NOT NULL, diverging from the real pre-migration shape (six of eight tables allowed NULLs), so test_fk_columns_not_null passed trivially.

- Build the pre-migration schema with historically-accurate nullable FKs (keyed on the migration's TABLES_WITH_NULLABLE_FKS, giving that documentation set a load-bearing consumer).
- Add test_upgrade_scrubs_null_fks_and_duplicates: seeds NULL-FK rows and duplicate pairs, runs upgrade, asserts exactly the distinct non-NULL pairs survive. Verified that the test detects a regression: commenting out _dedupe_by_min_id makes it fail.
- Delete the permanently-skipped placeholder test and the captured-but-never-asserted pre_shape; replace spec-kit references (T034a/tasks.md/quickstart.md) with self-contained prose.
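
The shape of such a test can be sketched against an in-memory SQLite table. The two helpers below borrow the names of the migration's functions, but their SQL is an assumption for illustration, and the `junction` table with `left_id` / `right_id` is hypothetical:

```python
import sqlite3


def delete_null_fk_rows(conn, table, fk_cols):
    """Remove rows where any FK column is NULL."""
    where = " OR ".join(f'"{c}" IS NULL' for c in fk_cols)
    conn.execute(f'DELETE FROM "{table}" WHERE {where}')


def dedupe_by_min_id(conn, table, fk_cols):
    """Keep only the lowest-id row for each distinct FK pair."""
    cols = ", ".join(f'"{c}"' for c in fk_cols)
    conn.execute(
        f'DELETE FROM "{table}" WHERE id NOT IN '
        f'(SELECT MIN(id) FROM "{table}" GROUP BY {cols})'
    )


def surviving_pairs(rows):
    """Seed a pre-migration-shaped table (nullable FKs), scrub it, return pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE junction "
        "(id INTEGER PRIMARY KEY, left_id INTEGER, right_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO junction (left_id, right_id) VALUES (?, ?)", rows
    )
    delete_null_fk_rows(conn, "junction", ["left_id", "right_id"])
    dedupe_by_min_id(conn, "junction", ["left_id", "right_id"])
    return conn.execute(
        "SELECT left_id, right_id FROM junction ORDER BY left_id, right_id"
    ).fetchall()


# One duplicate pair, one unique pair, and two NULL-FK rows.
seeded = [(1, 10), (1, 10), (1, 11), (None, 10), (2, None)]
assert surviving_pairs(seeded) == [(1, 10), (1, 11)]
```

Removing the `dedupe_by_min_id` call leaves `(1, 10)` in twice and the assertion fails, which mirrors the check described in the bullet above.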

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Four corrections to the maintenance-window guidance:

- PostgreSQL takes the direct-ALTER path for ALL eight tables (the redundant UNIQUEs are dropped by name); the doc described a recreate='always' full rewrite the code deliberately avoids, and sized only those two tables. The lock-window query now covers all eight.
- State the cumulative-lock property: Alembic runs the upgrade in one transaction on Postgres, so ACCESS EXCLUSIVE locks are held until commit — total unavailability is the sum of per-table windows; quiesce the app.
- MySQL: DROP COLUMN and ADD PRIMARY KEY are separate ALTERs, so most tables pay the InnoDB clustered-index rebuild twice — budget ~2x the single-rebuild estimate.
- Downgrade is a comparable maintenance window in its own right, not a quick undo.
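
A rough sketch of the budgeting arithmetic implied by the cumulative-lock and double-rebuild points. The per-table estimates are made-up numbers and `total_lock_window` is a hypothetical helper; applying 2x to every MySQL table is a conservative simplification, since only most tables pay the rebuild twice:

```python
from typing import Dict


def total_lock_window(per_table_seconds: Dict[str, float], rebuilds: int = 1) -> float:
    """Sum per-table windows; `rebuilds` scales for repeated index rebuilds."""
    return sum(per_table_seconds.values()) * rebuilds


# Hypothetical single-rebuild estimates, in seconds, for two tables.
estimates = {"table_a": 12.0, "table_b": 30.0}

postgres_budget = total_lock_window(estimates)           # locks held until commit
mysql_budget = total_lock_window(estimates, rebuilds=2)  # DROP COLUMN + ADD PRIMARY KEY

print(postgres_budget, mysql_budget)  # 42.0 84.0
```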

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After rebasing onto current master, the migration root pointed at a stale
master revision, forking alembic into multiple heads in the PR-merge CI.
Re-point down_revision onto master's current head so the chain is linear.
Land the version-history schema + SQLAlchemy-Continuum wiring inert: the
ENABLE_VERSIONING_CAPTURE flag defaults OFF, so init_versioning detaches
Continuum's write listeners and a save writes zero version_transaction /
shadow rows (proven by capture_disabled_tests). The read-only /versions/
list + get endpoints are wired (return empty until capture is enabled).
Restore and the version-history UI ship in follow-ups.

- migration: version_transaction + *_version shadow tables (additive, inert)
- Continuum wiring: make_versioned, VersionTransactionFactory,
  VersioningFlaskPlugin, the superset/versioning/ module (minus restore)
- gate: ENABLE_VERSIONING_CAPTURE (default off; permanent kill-switch)
- read endpoints: GET /api/v1/{chart,dashboard,dataset}/<uuid>/versions[/...]
- behavioral test: capture-off writes nothing; capture-on control writes one
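
For orientation, the read endpoint paths can be assembled as in the sketch below. The path shapes follow the PR summary; `versions_path` and `VERSIONED_ENTITIES` are hypothetical names, and the response format is not shown because it is not described here:

```python
from typing import Optional

VERSIONED_ENTITIES = ("chart", "dashboard", "dataset")


def versions_path(
    entity: str, entity_uuid: str, version_uuid: Optional[str] = None
) -> str:
    """Build the list path, or the single-version path if version_uuid is given."""
    if entity not in VERSIONED_ENTITIES:
        raise ValueError(f"unsupported entity type: {entity!r}")
    path = f"/api/v1/{entity}/{entity_uuid}/versions/"
    if version_uuid is not None:
        path += f"{version_uuid}/"
    return path
```

With capture off, a GET on the list path is expected to return an empty result, per the description above.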

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions (bot) added labels on Jun 17, 2026: risk:db-migration (PRs that require a DB migration), api (Related to the REST API), risk:ci-script (PR modifies scripts that execute in CI (supply chain risk))
@netlify

netlify Bot commented Jun 17, 2026


Deploy Preview for superset-docs-preview ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 7e7c2fa |
| 🔍 Latest deploy log | https://app.netlify.com/projects/superset-docs-preview/deploys/6a331b2175b0a80008a8f56d |
| 😎 Deploy Preview | https://deploy-preview-41176--superset-docs-preview.netlify.app |

@codecov

codecov Bot commented Jun 17, 2026


Codecov Report

❌ Patch coverage is 57.48588% with 602 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.66%. Comparing base (6d08e79) to head (ae756e4).
⚠️ Report is 49 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| superset/versioning/diff.py | 60.82% | 87 Missing and 18 partials ⚠️ |
| superset/versioning/queries.py | 22.13% | 95 Missing ⚠️ |
| superset/daos/dataset.py | 11.32% | 47 Missing ⚠️ |
| superset/versioning/api_helpers.py | 34.28% | 46 Missing ⚠️ |
| superset/versioning/baseline/children.py | 27.65% | 34 Missing ⚠️ |
| superset/versioning/changes/listener.py | 76.51% | 24 Missing and 7 partials ⚠️ |
| superset/versioning/factory.py | 72.22% | 23 Missing and 7 partials ⚠️ |
| superset/versioning/baseline/dirty.py | 63.33% | 13 Missing and 9 partials ⚠️ |
| superset/versioning/baseline/insertion.py | 47.61% | 20 Missing and 2 partials ⚠️ |
| superset/versioning/changes/shadow_queries.py | 77.01% | 14 Missing and 6 partials ⚠️ |

... and 19 more

❗ There is a different number of reports uploaded between BASE (6d08e79) and HEAD (ae756e4).

HEAD has 86 uploads less than BASE

| Flag | BASE (6d08e79) | HEAD (ae756e4) |
|---|---|---|
| python | 68 | 2 |
| presto | 11 | 1 |
| hive | 11 | 1 |

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #41176      +/-   ##
==========================================
- Coverage   64.31%   55.66%   -8.65%     
==========================================
  Files        2652     2672      +20     
  Lines      144799   146193    +1394     
  Branches    33415    33651     +236     
==========================================
- Hits        93122    81383   -11739     
- Misses      50016    64013   +13997     
+ Partials     1661      797     -864     

| Flag | Coverage Δ |
|---|---|
| hive | 39.24% <34.46%> (-0.09%) ⬇️ |
| mysql | ? |
| postgres | ? |
| presto | 41.30% <57.48%> (+0.39%) ⬆️ |
| python | 41.36% <57.48%> (-18.19%) ⬇️ |
| sqlite | ? |
| unit | ? |


Review hardening for the dark base-infra landing. The gate must make the
infrastructure truly inert when off, and capture must basically work when on.

Gate / dark-contract:
- init_versioning's flag fallback now defaults False (was True), so any
  app-factory path that doesn't load config.py stays inert instead of
  silently enabling capture; config + docstrings reconciled to "ships off".
- The chart/dashboard/dataset PUT and GET endpoints no longer run version
  bookkeeping queries unconditionally: the lookups move behind gated helpers
  (current_entity_version_info / current_entity_etag_uuid) that issue zero
  queries when capture is off, so the kill-switch covers the full save path.
- _remove_continuum_write_listeners also flips Continuum's master option off
  and verifies the detach, making options['versioning'] a single switch that
  silences Continuum's listeners and the custom baseline writer alike.
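
The fallback behavior described in the first bullet can be sketched as a plain config lookup. `capture_enabled` is a hypothetical helper name; the real logic lives inside `init_versioning` and is not reproduced here:

```python
from typing import Any, Mapping


def capture_enabled(config: Mapping[str, Any]) -> bool:
    """Read the gate, treating a missing key as off rather than on."""
    return bool(config.get("ENABLE_VERSIONING_CAPTURE", False))


# The three cases the structural unit tests are described as covering.
print(capture_enabled({"ENABLE_VERSIONING_CAPTURE": False}))  # False (off)
print(capture_enabled({}))                                    # False (absent)
print(capture_enabled({"ENABLE_VERSIONING_CAPTURE": True}))   # True  (on)
```

Defaulting the lookup to `False` means an app-factory path that never loads `config.py` behaves the same as one that explicitly ships the flag off.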

Robustness when on:
- The baseline before_flush body is wrapped so an infra error logs and the
  user's save proceeds rather than aborting the transaction.
- The baseline listener honors the master switch (it mints its own
  transaction row via direct SQL, so it can't rely on Continuum being
  detached); the change-record listener already self-gates on the absence of
  a Continuum transaction.
- The baseline shadow writer logs when a content column is dropped for a
  live/shadow name divergence instead of silently storing NULL.
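
The fail-open wrapping from the first bullet might look like the sketch below. `fail_open` and `baseline_before_flush` are hypothetical names, and the listener body simply raises to simulate an infrastructure error; the argument list follows SQLAlchemy's `before_flush` event signature:

```python
import functools
import logging

logger = logging.getLogger(__name__)


def fail_open(listener):
    """Wrap a listener so its errors are logged instead of propagating."""

    @functools.wraps(listener)
    def wrapper(*args, **kwargs):
        try:
            return listener(*args, **kwargs)
        except Exception:
            # Log with traceback, then let the caller's flush continue.
            logger.exception("versioning baseline capture failed; save proceeds")
            return None

    return wrapper


@fail_open
def baseline_before_flush(session, flush_context, instances):
    """Stand-in listener body that fails, to show the wrapper's effect."""
    raise RuntimeError("simulated infra error")
```

Catching broad `Exception` is deliberate in this sketch: version capture is treated as best-effort, so any failure in it is traded for a log line rather than an aborted user save.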

Tests:
- Structural unit tests drive init_versioning's config branch (off, absent,
  on) so a gate/default inversion is caught without a DB.
- The capture-on control now asserts a version_changes row, exercising the
  full pipeline; the integration test config enables capture so the suite
  runs it (production still ships off).
- New guard test fails if a version snapshot would expose a sensitive column.
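
A guard of the kind described in the last bullet can be as small as a set intersection. The column names in `SENSITIVE_COLUMNS` and the helper `exposed_sensitive_columns` are assumptions for illustration, not the names the actual test uses:

```python
from typing import Iterable, List

# Hypothetical denylist; the real test's list of sensitive columns is not shown.
SENSITIVE_COLUMNS = frozenset({"password", "encrypted_extra"})


def exposed_sensitive_columns(snapshot_columns: Iterable[str]) -> List[str]:
    """Return the sensitive columns a version snapshot would expose, sorted."""
    return sorted(set(snapshot_columns) & SENSITIVE_COLUMNS)


# A guard test would assert this list is empty for every versioned model.
assert exposed_sensitive_columns(["id", "slice_name", "params"]) == []
```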

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Labels

- api (Related to the REST API)
- review:draft
- risk:ci-script (PR modifies scripts that execute in CI (supply chain risk))
- risk:db-migration (PRs that require a DB migration)
- size/XXL
