Skip to content

feat(versioning): capture and expose version history for charts, dashboards, and datasets#39603

Draft
mikebridge wants to merge 93 commits into
apache:masterfrom
mikebridge:sc-103156-versioning
Draft

feat(versioning): capture and expose version history for charts, dashboards, and datasets#39603
mikebridge wants to merge 93 commits into
apache:masterfrom
mikebridge:sc-103156-versioning

Conversation

@mikebridge

@mikebridge mikebridge commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

SUMMARY

Adds backend plumbing to capture a version history for every save of a chart, dashboard, or dataset, and to expose that history via three new REST endpoints per entity (list, get, restore). No frontend in this PR. Second PR in the Versioning epic (sc-103156), depending on #39859 (composite-PK reshape on M2M association tables — sc-105349) and orthogonal to #39286 (sc-103157 soft-delete). See SIP-210 / issue #39492 for full design rationale.

🚧🚧🚧 This is still a draft/spike, not ready for final review. Branch contains two temp(*) commits (demo UI dropdowns + French i18n; URL-param stripping on restore navigation) that will revert before merge. 🚧🚧🚧

Stacked PRs — merge order. This PR is the base of the versioning stack and must merge first. Two follow-up PRs build on it and merge after (either order relative to each other):

  1. Version-history retention (feat(versioning): per-workspace version-history retention cleanup job #41075) — sc-111099, branch sc-111099-version-history-retention. The time-based Celery-beat cleanup job, extracted from this PR so it ships/reviews independently. Hard-depends on the schema introduced here.
  2. Cross-entity activity view (feat(versioning): cross-entity version activity view #41076) — sc-107283, branch sc-107283-versioning-activity-view. Reads this PR's version_changes data; adds no migrations.

Upstream dependency unchanged: #39859 (composite-PK reshape, sc-105349) → this PR → {retention, activity view}.

What changed:

  • Continuum wiring. Adds sqlalchemy-continuum as a base dependency. Wired in superset/extensions/__init__.py with the validity strategy; a custom VersionTransactionFactory renames the transaction table to version_transaction (the word transaction is reserved in several dialects) and a VersioningFlaskPlugin supplies the acting user via get_user_id() (not Flask-Login's current_user) so CLI / Celery / JWT-auth API saves all attribute correctly.

  • Six shadow tables, all Continuum-native:

    • parent shadows: dashboards_version, slices_version, tables_version
    • child shadows: table_columns_version, sql_metrics_version
    • M2M shadow: dashboard_slices_version

    Plus one version_transaction table (the per-flush "who/when/where" envelope) and one version_changes table (structured diff records, FK to version_transaction with ON DELETE CASCADE).

  • Three endpoints per entity type (/chart, /dashboard, /dataset):

    • GET /api/v1/<resource>/<uuid>/versions/ — list history (version_number, version_uuid, issued_at, changed_by)
    • GET /api/v1/<resource>/<uuid>/versions/<version_uuid>/ — one snapshot (scalar fields; plus columns / metrics for datasets, slices for dashboards)
    • POST /api/v1/<resource>/<uuid>/versions/<version_uuid>/restore — restore entity to that version
  • <version_uuid> is a deterministic UUIDv5 (fixed namespace, derived from the entity UUID + Continuum transaction id). Stable across replicas and retention pruning — the same transaction always produces the same version uuid, so API consumers can cache references safely.

  • ETag headers (ETag: W/"<version_uuid>") on all three GET endpoints + the live entity GET. Foundation for optimistic-locking enforcement on writes (Phase 2); not enforced in this PR.

  • Restore uses Continuum's native Reverter wrapped in a single_flush_scope context manager (suppresses autoflush inside the block, emits one trailing flush). The single-revert / single-flush shape was the spike outcome — earlier attempts at split-revert and JSON-snapshot tables were abandoned (see spike-continuum-restore.md and the revised ADR-004 in the spec folder).

  • Baseline capture. First save under versioning of an entity that pre-existed the migration inserts a synthetic operation_type=0 row capturing the pre-edit state, attributed to the entity's existing changed_on / changed_by_fk. Listener runs before Continuum's own before_flush so the baseline transaction_id is lower than the edit's (correct ordering).

  • No-op suppression. A SkipUnmodifiedPlugin marks Continuum Operations processed=True when post-flush column values are content-equal to the previous shadow row — including JSON-aware comparison for Dashboard.json_metadata that strips frontend-stamped audit sub-keys (map_label_colors, chart_configuration, …) so saves that only re-stamp those don't pollute history.

  • Force-parent-dirty on child changes. A before_flush listener flags the versioned parent (SqlaTable) as dirty when only its versioned children (TableColumn / SqlMetric) changed, so child-only edits surface in the parent's version dropdown.

  • Structured change records. Every save writes per-field diff records to version_changes keyed to the same transaction_id. Records carry kind / path / from_value / to_value — backbone for the Phase-2 UI's "Added column X" rendering, captured in V1 so the data is available from day one without a backfill.

  • Retention (the time-based Celery-beat prune) has been extracted to the stacked follow-up PR (sc-111099) so it ships and is reviewed independently. This PR captures history; the cleanup job that ages shadow rows out lands in the retention PR. ON DELETE CASCADE on version_changes.transaction_id (added in this PR) keeps diffs in sync once the prune runs.

  • Composite PK reshape on M2M associations (sc-105349, PR refactor(db): composite PK on M2M association tables (sc-105349) #39859 — required for Continuum's M2M tracker to populate dashboard_slices_version correctly). The PRs are intended to merge in order refactor(db): composite PK on M2M association tables (sc-105349) #39859 → this one; the migration is included on this branch because the rebased history depends on it.

  • Authorisation. Version endpoints reuse the resource's existing can_write permission. No new FAB permissions. Row-level access enforced via security_manager.raise_for_ownership(entity) in the restore command.

What is NOT versioned in v1 (see specs/sc-103156-entity-versioning/future-work.md):

  • Tags, owners, roles — explicitly excluded from the versioned column set; restore leaves these at their live values.
  • Per-chart position inside a dashboardposition_json is versioned as an opaque blob (restored wholesale on dashboard restore); finer-grained layout versioning is Phase 2.
  • Auto-generated human-readable change summary text, change-type icons, search over diff content, AI attribution — all captured as Phase-2 frontend work.

Coordination with #39286 (sc-103157 soft-delete) — orthogonal in design; merge order can go either way. When sc-103157 merges, one small change hooks deleted_at into find_active_by_uuid() and the versioned models' Continuum exclude lists. Tracked as T043 in the spec.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A — backend-only. No UI is wired to the new endpoints yet. The temp(*) commits add non-final demo dropdowns and i18n strings for manual testing only; they will revert before merge.

TESTING INSTRUCTIONS

  1. Save a chart a few times to generate history:
CHART_UUID=$(curl -s http://localhost:8088/api/v1/chart/?q='(page_size:1)' | jq -r '.result[0].uuid')
for i in 1 2 3; do
  curl -s -X PUT http://localhost:8088/api/v1/chart/$CHART_UUID -d "{\"slice_name\":\"Test v$i\"}" -H 'Content-Type: application/json'
done
  1. List the history:
curl http://localhost:8088/api/v1/chart/$CHART_UUID/versions/ | jq

Expect an ordered array with version_number, version_uuid, issued_at, changed_by. Response will include an ETag: W/"<version_uuid>" header for the most recent version.

  1. Fetch one version:
VERSION_UUID=$(curl -s http://localhost:8088/api/v1/chart/$CHART_UUID/versions/ | jq -r '.result[0].version_uuid')
curl -i http://localhost:8088/api/v1/chart/$CHART_UUID/versions/$VERSION_UUID/ | head -20

Expect 200 + ETag header. A second request with If-None-Match: "<that-etag>" returns 304.

  1. Restore:
curl -X POST http://localhost:8088/api/v1/chart/$CHART_UUID/versions/$VERSION_UUID/restore

Expect 200. GET /api/v1/chart/$CHART_UUID should now reflect the restored state, and a new version row (the restore itself) appears in the version list.

  1. Inspect structured diffs:
curl http://localhost:8088/api/v1/chart/$CHART_UUID/versions/$VERSION_UUID/ | jq '.result.changes'

Each change carries {kind, path, from_value, to_value}.

  1. Repeat for dashboards and datasets using /api/v1/dashboard/ and /api/v1/dataset/. Datasets exercise child shadows (columns / metrics); dashboards exercise the M2M shadow (slices).

  2. Run the test suite:

pytest tests/integration_tests/charts/version_history_tests.py \
       tests/integration_tests/dashboards/version_history_tests.py \
       tests/integration_tests/datasets/version_history_tests.py \
       tests/integration_tests/versioning/ -v
  1. Run the performance validation harness (skipped in CI; run on demand):
SUPERSET_PERF_VALIDATION=1 pytest tests/integration_tests/versioning/perf_validation_tests.py -v -s

Asserts the three Success Criteria: list < 1 s, restore < 3 s, save p95 overhead < 50 ms.

ADDITIONAL INFORMATION

Migration list (in dependency order):

Migration Operation Reversible?
2bee73611e32 Composite PK reshape on dashboard_slices + 7 other association tables (sc-105349 / #39859) Yes
56cd24c07170 Create version_transaction + parent shadow tables (dashboards_version, slices_version, tables_version) Yes
e1f3c5a7b9d0 Create version_changes (structured diff records, ON DELETE CASCADE FK to version_transaction) Yes
f7a2b3c4d5e6 Create child shadow tables (table_columns_version, sql_metrics_version) + M2M shadow (dashboard_slices_version) Yes

All migrations are additive on the pre-existing slices / dashboards / tables / child tables — no existing columns altered. The composite-PK migration (2bee73611e32) reshapes the M2M association tables; round-trip tested on PostgreSQL, MySQL, and SQLite, including the MySQL FK / AUTO_INCREMENT quirks that required raw SQL workarounds (commits 56c36fde54, 65a3491861).

Write cost per save:

Table Rows added
version_transaction 1
*_version parent shadow 0 if SkipUnmodifiedPlugin filtered the save; 1 if scalars changed
Child/M2M shadows one per changed child/M2M entry
version_changes one per atomic field-level change (zero for skipped saves)

No write-path retention overhead — pruning is asynchronous via Celery beat (shipped in the stacked retention PR, sc-111099).

Performance:

The numbers below were captured before the ADR-004 reversal (JSON-snapshot → full Continuum). Architecture has changed since — child writes now go through Continuum shadows instead of dataset_snapshots / dashboard_snapshots JSON tables. Re-validation against the final architecture pending before review. Targets unchanged:

Criterion Target
SC-002 list endpoint < 1000 ms
SC-003 restore endpoint < 3000 ms
SC-004 save p95 overhead < 50 ms

Harness: SUPERSET_PERF_VALIDATION=1 pytest tests/integration_tests/versioning/perf_validation_tests.py -v -s.

  • Has associated issue: sc-103156 / SIP-210 #39492
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions Bot added risk:db-migration PRs that require a DB migration api Related to the REST API labels Apr 23, 2026
mikebridge pushed a commit to mikebridge/superset that referenced this pull request Apr 23, 2026
Phase 1 of versioning added ``sqlalchemy-continuum==1.6.0`` to
``requirements/base.txt`` directly, but the pin was missing from
``pyproject.toml``'s ``[project.dependencies]``. CI's
``check-python-deps`` job regenerates the pinned files from the
``.in`` sources via ``scripts/uv-pip-compile.sh``; without the
pyproject declaration, regeneration strips the pin out, causing:

  ModuleNotFoundError: No module named 'sqlalchemy_continuum'

…on every Python-based job (test-sqlite, test-postgres, test-mysql,
unit-tests, test-postgres-hive, test-postgres-presto,
test-load-examples, docker-build) because ``superset/extensions/
__init__.py`` unconditionally imports from it at module load time.

Adds ``"sqlalchemy-continuum>=1.6.0, <2.0.0"`` to pyproject and
re-runs ``uv-pip-compile.sh`` to sync ``base.txt`` and
``development.txt``. One package regenerates in place; the only
other diffs are uv-resolver comment-graph updates (numpy's ``# via``
list) which CI's filter ignores.

Fixes CI failures on PR apache#39603.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mikebridge pushed a commit to mikebridge/superset that referenced this pull request Apr 24, 2026
Phase 1 of versioning added ``sqlalchemy-continuum==1.6.0`` to
``requirements/base.txt`` directly, but the pin was missing from
``pyproject.toml``'s ``[project.dependencies]``. CI's
``check-python-deps`` job regenerates the pinned files from the
``.in`` sources via ``scripts/uv-pip-compile.sh``; without the
pyproject declaration, regeneration strips the pin out, causing:

  ModuleNotFoundError: No module named 'sqlalchemy_continuum'

…on every Python-based job (test-sqlite, test-postgres, test-mysql,
unit-tests, test-postgres-hive, test-postgres-presto,
test-load-examples, docker-build) because ``superset/extensions/
__init__.py`` unconditionally imports from it at module load time.

Adds ``"sqlalchemy-continuum>=1.6.0, <2.0.0"`` to pyproject and
re-runs ``uv-pip-compile.sh`` to sync ``base.txt`` and
``development.txt``. One package regenerates in place; the only
other diffs are uv-resolver comment-graph updates (numpy's ``# via``
list) which CI's filter ignores.

Fixes CI failures on PR apache#39603.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-103156-versioning branch from 8774778 to a1f0ddb Compare April 24, 2026 00:09
@codecov

codecov Bot commented Apr 24, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 69.57619% with 603 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.37%. Comparing base (6d08e79) to head (a655e0c).

Files with missing lines Patch % Lines
...end/src/pages/ChartList/VersionHistoryDropdown.tsx 2.32% 126 Missing ⚠️
...src/pages/DashboardList/VersionHistoryDropdown.tsx 2.32% 126 Missing ⚠️
...d/src/pages/DatasetList/VersionHistoryDropdown.tsx 2.32% 126 Missing ⚠️
superset/versioning/diff.py 83.20% 32 Missing and 13 partials ⚠️
superset/versioning/changes/listener.py 85.60% 15 Missing and 4 partials ⚠️
superset/versioning/changes/state.py 76.00% 10 Missing and 8 partials ⚠️
superset/versioning/queries.py 85.24% 11 Missing and 7 partials ⚠️
superset/initialization/__init__.py 51.51% 14 Missing and 2 partials ⚠️
superset/versioning/factory.py 85.18% 12 Missing and 4 partials ⚠️
superset/versioning/baseline/children.py 70.21% 11 Missing and 3 partials ⚠️
... and 19 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #39603      +/-   ##
==========================================
+ Coverage   64.31%   64.37%   +0.05%     
==========================================
  Files        2652     2680      +28     
  Lines      144799   146727    +1928     
  Branches    33415    33852     +437     
==========================================
+ Hits        93122    94449    +1327     
- Misses      50016    50551     +535     
- Partials     1661     1727      +66     
Flag Coverage Δ
hive 39.21% <33.08%> (-0.13%) ⬇️
javascript 68.12% <3.51%> (-0.34%) ⬇️
mysql 58.66% <86.17%> (+0.62%) ⬆️
postgres 58.72% <86.17%> (+0.61%) ⬆️
presto 41.25% <53.66%> (+0.35%) ⬆️
python 60.14% <86.17%> (+0.58%) ⬆️
sqlite 58.35% <86.17%> (+0.63%) ⬆️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

mikebridge pushed a commit to mikebridge/superset that referenced this pull request Apr 27, 2026
Phase 1 of versioning added ``sqlalchemy-continuum==1.6.0`` to
``requirements/base.txt`` directly, but the pin was missing from
``pyproject.toml``'s ``[project.dependencies]``. CI's
``check-python-deps`` job regenerates the pinned files from the
``.in`` sources via ``scripts/uv-pip-compile.sh``; without the
pyproject declaration, regeneration strips the pin out, causing:

  ModuleNotFoundError: No module named 'sqlalchemy_continuum'

…on every Python-based job (test-sqlite, test-postgres, test-mysql,
unit-tests, test-postgres-hive, test-postgres-presto,
test-load-examples, docker-build) because ``superset/extensions/
__init__.py`` unconditionally imports from it at module load time.

Adds ``"sqlalchemy-continuum>=1.6.0, <2.0.0"`` to pyproject and
re-runs ``uv-pip-compile.sh`` to sync ``base.txt`` and
``development.txt``. One package regenerates in place; the only
other diffs are uv-resolver comment-graph updates (numpy's ``# via``
list) which CI's filter ignores.

Fixes CI failures on PR apache#39603.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-103156-versioning branch from 7979999 to 70e21bc Compare April 27, 2026 22:25
@netlify

netlify Bot commented Apr 27, 2026

Copy link
Copy Markdown

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit a655e0c
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/6a31713447d3dc0007b7817b
😎 Deploy Preview https://deploy-preview-39603--superset-docs-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

mikebridge pushed a commit to mikebridge/superset that referenced this pull request Apr 28, 2026
Phase 1 of versioning added ``sqlalchemy-continuum==1.6.0`` to
``requirements/base.txt`` directly, but the pin was missing from
``pyproject.toml``'s ``[project.dependencies]``. CI's
``check-python-deps`` job regenerates the pinned files from the
``.in`` sources via ``scripts/uv-pip-compile.sh``; without the
pyproject declaration, regeneration strips the pin out, causing:

  ModuleNotFoundError: No module named 'sqlalchemy_continuum'

…on every Python-based job (test-sqlite, test-postgres, test-mysql,
unit-tests, test-postgres-hive, test-postgres-presto,
test-load-examples, docker-build) because ``superset/extensions/
__init__.py`` unconditionally imports from it at module load time.

Adds ``"sqlalchemy-continuum>=1.6.0, <2.0.0"`` to pyproject and
re-runs ``uv-pip-compile.sh`` to sync ``base.txt`` and
``development.txt``. One package regenerates in place; the only
other diffs are uv-resolver comment-graph updates (numpy's ``# via``
list) which CI's filter ignores.

Fixes CI failures on PR apache#39603.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-103156-versioning branch 2 times, most recently from db1b08c to c338d49 Compare April 30, 2026 17:29
mikebridge pushed a commit to mikebridge/superset that referenced this pull request Apr 30, 2026
Phase 1 of versioning added ``sqlalchemy-continuum==1.6.0`` to
``requirements/base.txt`` directly, but the pin was missing from
``pyproject.toml``'s ``[project.dependencies]``. CI's
``check-python-deps`` job regenerates the pinned files from the
``.in`` sources via ``scripts/uv-pip-compile.sh``; without the
pyproject declaration, regeneration strips the pin out, causing:

  ModuleNotFoundError: No module named 'sqlalchemy_continuum'

…on every Python-based job (test-sqlite, test-postgres, test-mysql,
unit-tests, test-postgres-hive, test-postgres-presto,
test-load-examples, docker-build) because ``superset/extensions/
__init__.py`` unconditionally imports from it at module load time.

Adds ``"sqlalchemy-continuum>=1.6.0, <2.0.0"`` to pyproject and
re-runs ``uv-pip-compile.sh`` to sync ``base.txt`` and
``development.txt``. One package regenerates in place; the only
other diffs are uv-resolver comment-graph updates (numpy's ``# via``
list) which CI's filter ignores.

Fixes CI failures on PR apache#39603.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mikebridge pushed a commit to mikebridge/superset that referenced this pull request May 4, 2026
Phase 1 of versioning added ``sqlalchemy-continuum==1.6.0`` to
``requirements/base.txt`` directly, but the pin was missing from
``pyproject.toml``'s ``[project.dependencies]``. CI's
``check-python-deps`` job regenerates the pinned files from the
``.in`` sources via ``scripts/uv-pip-compile.sh``; without the
pyproject declaration, regeneration strips the pin out, causing:

  ModuleNotFoundError: No module named 'sqlalchemy_continuum'

…on every Python-based job (test-sqlite, test-postgres, test-mysql,
unit-tests, test-postgres-hive, test-postgres-presto,
test-load-examples, docker-build) because ``superset/extensions/
__init__.py`` unconditionally imports from it at module load time.

Adds ``"sqlalchemy-continuum>=1.6.0, <2.0.0"`` to pyproject and
re-runs ``uv-pip-compile.sh`` to sync ``base.txt`` and
``development.txt``. One package regenerates in place; the only
other diffs are uv-resolver comment-graph updates (numpy's ``# via``
list) which CI's filter ignores.

Fixes CI failures on PR apache#39603.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-103156-versioning branch from c338d49 to 04c6b8b Compare May 4, 2026 23:36
mikebridge pushed a commit to mikebridge/superset that referenced this pull request May 5, 2026
Phase 1 of versioning added ``sqlalchemy-continuum==1.6.0`` to
``requirements/base.txt`` directly, but the pin was missing from
``pyproject.toml``'s ``[project.dependencies]``. CI's
``check-python-deps`` job regenerates the pinned files from the
``.in`` sources via ``scripts/uv-pip-compile.sh``; without the
pyproject declaration, regeneration strips the pin out, causing:

  ModuleNotFoundError: No module named 'sqlalchemy_continuum'

…on every Python-based job (test-sqlite, test-postgres, test-mysql,
unit-tests, test-postgres-hive, test-postgres-presto,
test-load-examples, docker-build) because ``superset/extensions/
__init__.py`` unconditionally imports from it at module load time.

Adds ``"sqlalchemy-continuum>=1.6.0, <2.0.0"`` to pyproject and
re-runs ``uv-pip-compile.sh`` to sync ``base.txt`` and
``development.txt``. One package regenerates in place; the only
other diffs are uv-resolver comment-graph updates (numpy's ``# via``
list) which CI's filter ignores.

Fixes CI failures on PR apache#39603.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-103156-versioning branch 2 times, most recently from d632e5e to e4f548e Compare May 7, 2026 21:37
@github-actions github-actions Bot added i18n Namespace | Anything related to localization i18n:french Translation related to French language risk:ci-script PR modifies scripts that execute in CI (supply chain risk) labels May 7, 2026
@mikebridge mikebridge force-pushed the sc-103156-versioning branch 6 times, most recently from 5549d67 to f031a2c Compare May 14, 2026 21:52
@mikebridge mikebridge force-pushed the sc-103156-versioning branch from f031a2c to c2b6db7 Compare May 18, 2026 21:47
Mike Bridge and others added 29 commits June 16, 2026 09:51
Three operational signals on the nightly prune, matching the
activity-view orchestrator's emission shape so a single Grafana filter
``superset.versioning.*`` catches both sides of the feature:

* ``superset.versioning.retention.pruned_transactions`` — gauge of
  rows actually pruned this run. ``0`` is a meaningful signal (means
  retention is enabled but nothing aged out); a sustained ``0`` over
  multiple runs is the signature of a misconfigured beat schedule
  that previously was log-only.

* ``superset.versioning.retention.skipped`` — counter; fires for the
  ``retention_days <= 0`` (operator disabled) and the
  ``no versioned classes resolved`` (init-order regression) branches.
  Lets an alert distinguish "disabled" from "running and producing
  zero" without log scraping.

* ``superset.versioning.retention.retried`` — counter; fires on every
  serialization conflict that triggered an inline retry. A sudden
  rise correlates with concurrent-write pressure and is the leading
  indicator for the ``OperationalError`` give-up path that re-raises
  after ``_MAX_RETRY_ATTEMPTS``.

The activity-view side already uses ``stats_logger_manager`` for the
per-phase ``superset.activity_view.*`` timings; the retention side
was log-only. Bringing it up to the same standard closes the v4 CD
review's "observability deficit on the nightly job" concern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Operators who redefine ``CeleryConfig`` in ``superset_config.py`` —
instead of subclassing or merging the default — silently lose the
``version_history.prune_old_versions`` entry that registers the
nightly prune. The capture path keeps writing shadow rows; the prune
never runs; disk grows until paged.

Add a startup check inside ``init_versioning`` that inspects the
resolved ``CELERY_CONFIG.beat_schedule`` and emits a WARNING when
the entry is absent. The misconfiguration is now visible in the
deploy log instead of waiting for disk pressure to surface it at
03:00 some weeks later.

Same shape as the existing ``CORS_OPTIONS["expose_headers"]`` operator
note in UPDATING.md — a known-misconfiguration mode the codebase
catches at startup so the team doesn't relearn it in production.

From the v4 continuous-delivery review (Farley + Humble): "hidden
coordination" anti-pattern — the change-set assumes the operator
will do something correct (merge their override with the new default)
that the code does not verify at runtime. This commit verifies it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The migration's ``downgrade()`` is correct against an empty schema —
verified throughout development — but the realistic operational
scenario is: we deployed, accumulated versioning rows from real saves
for hours or days, then need to roll back. That code path was
unexercised, and the CD review v4 flagged that absence as load-
bearing: an operator hitting the rollback in anger is a worse place
to find a downgrade bug than CI.

New test file ``versioning_round_trip__tests.py`` matches the
existing ``composite_pk_round_trip__tests.py`` pattern (in-memory
SQLite + Alembic ``MigrationContext``) and walks three scenarios:

* **Populated round-trip** — upgrade both migrations, insert rows
  into all 8 versioning tables (live + closed shadow rows, both
  parent and child, with field-level ``version_changes`` records and
  M2M shadow rows), downgrade both migrations, assert every table
  is gone, then re-upgrade and assert the schema shape matches the
  first upgrade byte-for-byte (idempotency under round-trip).

* **Empty downgrade** — sanity belt-and-braces that downgrade run
  immediately after upgrade (no rows) is also clean. Catches the
  case where the population step somehow influenced the drop path.

* **Indexes-downgrade idempotency** — runs ``8f3a1b2c4d5e.downgrade``
  twice in a row. The second call must be a no-op (the migration
  uses ``if_exists=True`` on every drop) so an operator who
  interrupts and re-runs doesn't hit a missing-index error.

The MEDIUMTEXT cross-backend dimension is delegated to the CI matrix
(SQLite collapses every text column to TEXT regardless of declared
type); the shape pinned here is reversibility under load.

From the v4 continuous-delivery review's "untested operational
rollback" finding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d code)

Four small follow-ups from the v5 review cycle, all on the operational
infrastructure that landed in v4:

* **`_warn_if_retention_beat_missing` runs before the kill-switch
  early-return** (clean-code C1 / tidy-first W1 / python W3). The
  retention task lives in its own beat entry and runs against existing
  shadow data regardless of capture; the warn-log was previously
  bypassed on exactly the deployment path (kill-switch flipped in
  anger) where the operator is most likely to also have a hand-rolled
  ``CeleryConfig`` with a missing prune entry. Move the call above the
  early-return so both misconfigurations surface at the same restart.

* **`_warn_if_retention_beat_missing` handles dict-form and None
  `CELERY_CONFIG`** (python W2 / sqlalchemy W1 / CD-2). The default
  shape is ``type[CeleryConfig] | None``, but Celery itself accepts a
  dict via ``config_from_object``, and ``None`` is the documented
  "disable Celery entirely" path. The prior ``getattr(_, "beat_schedule",
  None) if _ else None`` fell through to the WARNING in both cases,
  emitting a false positive for operators who chose either shape on
  purpose. Discriminate by ``isinstance(dict)`` and short-circuit on
  ``None``. Also extract the retention task name to a class-level
  ``_RETENTION_TASK_NAME`` constant so the previous 90-char line
  shortens and the literal is no longer duplicated against the default
  in ``config.py``.

* **`ENABLE_VERSIONING_CAPTURE` env-var-to-bool uses
  ``utils.parse_boolean_string``** (python W1). The hand-rolled
  ``.lower() == "true"`` only matched the literal ``"true"``; operators
  setting ``1``, ``yes``, ``on``, ``True`` (no .lower call) silently
  got ``False`` (capture on — the safe direction for a default-on
  kill-switch, but surprising and inconsistent with the rest of
  ``config.py`` which uses the helper).

* **`from __future__ import annotations` in `versioning/factory.py`**
  (python W4). Every other versioning module has it; this was the lone
  outlier. PEP 604 union syntax in a local-variable annotation worked
  without it on Python 3.10+, so this is consistency, not correctness.

* **Drop the dead `last_exc` / unreachable `RuntimeError`** in
  ``_prune_old_versions_impl`` (tidy-first dead-code finding). The
  retry loop always returns or re-raises; the post-loop fallback was
  defensive code for a control-flow path the loop's exit condition
  cannot reach. Replaced with a short ``AssertionError`` that mypy
  needs for type-narrowing and that documents the invariant
  (post-loop = "the loop's exit condition was changed incorrectly").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…etrics

Three v4 operational features shipped without direct test coverage; the
v5 review flagged this as load-bearing for the next refactor pass. Add
ten focused unit tests:

**`tests/unit_tests/initialization_test.py::TestInitVersioning`** — six
tests covering the kill-switch flow and the four ``CELERY_CONFIG``
shapes the warn-log helper now discriminates:

* ``test_kill_switch_off_skips_listener_registration`` — pins the
  contract that ``ENABLE_VERSIONING_CAPTURE=False`` short-circuits
  ``init_versioning`` before either listener registers.
* ``test_warn_when_celery_beat_schedule_missing_retention_entry`` —
  class-shaped config with the entry absent fires WARNING.
* ``test_no_warn_when_celery_beat_schedule_includes_retention_entry``
  — class with the entry present is silent.
* ``test_no_warn_when_celery_config_is_none`` — Celery-disabled
  deployment doesn't see the false-positive that motivated the v5
  ``isinstance / None`` guard.
* ``test_dict_form_celery_config_with_entry_does_not_warn`` — Celery's
  documented dict-shape works.
* ``test_dict_form_celery_config_without_entry_warns`` — and the
  dict-shape symmetry holds when the entry is missing.

**`tests/unit_tests/tasks/test_version_history_retention.py`** — four
tests pinning the statsd-emission contract on every branch:

* ``test_retention_disabled_emits_skipped_metric`` — ``retention_days=0``
  fires ``.skipped``.
* ``test_no_versioned_classes_resolved_emits_skipped_metric`` — the
  init-order-regression branch also fires ``.skipped`` (same metric on
  purpose; dashboard alert is "no work happening", WARNING log carries
  the why).
* ``test_serialization_failure_then_success_increments_retried_once``
  — one ``OperationalError`` on attempt 1 fires ``.retried`` exactly
  once, succeeds on attempt 2, records ``retried=1`` in the stats
  dict, emits the ``.pruned_transactions`` gauge.
* ``test_all_attempts_fail_reraises_after_max_retries`` — exhausted
  retries fire ``.retried`` exactly ``_MAX_RETRY_ATTEMPTS`` times and
  re-raise so the outer Celery wrapper catches.

Total: 10 new tests, all passing in <1s wall-clock. Closes the v5 CD
+ clean-code finding ("the operational instrumentation that just
shipped isn't itself pipeline-gated by tests"). A future refactor that
restructures ``init_versioning`` or renames a metric now has to
deliberately update these tests rather than silently breaking the
contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed models

The versioning bootstrap probed each model with version_class() and
swallowed any failure with a bare `except Exception: pass`. A model that
failed to wire was silently dropped from VERSIONED_MODELS while the
listeners still registered — change capture would degrade with no log,
metric, or error, making the failure invisible to operators.

Keep degraded-mode boot (don't fail startup), but surface the failure at
WARNING with the model name and traceback so a broken wiring is visible
in the deploy log.

Surfaced by a Codex amin-review pass (M1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
VERSIONED_MODELS is module-level state; test fixtures that initialize multiple Superset apps per process appended duplicate entries on each re-init. Guard the append.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
op.drop_index(if_exists=True) emits DROP INDEX IF EXISTS, which stock MySQL 5.7/8.x rejects (MariaDB-only grammar) — the downgrade raised a syntax error on the first table. Probe the inspector for existence instead, which keeps the partial-application robustness on every dialect.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pre-migration graceful-degradation sites caught OperationalError only, but a missing relation raises ProgrammingError (UndefinedTable) on PostgreSQL — and the failed statement aborts the enclosing transaction there, so the user's save failed anyway despite the catch. Catch both exception classes and run each probe under a connection-level SAVEPOINT so a failure can't poison the outer transaction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
AuditMixinNullable stamps created_on/changed_on with naive datetime.now(); the restore stamp used datetime.utcnow(), skewing changed_on ordering on non-UTC servers (and utcnow is deprecated as of Python 3.12).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The natural-key upsert setattr'd every incoming property, so a payload 'id' on a name-matched column rewrote a live primary key, and a renamed column still carrying its old id INSERTed with a live PK while the old row's DELETE was pending in the same flush — INSERTs flush before DELETEs, colliding on the PK/UNIQUE constraints. Strip id/table_id in both branches; regression test included.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tion

With override_columns, RefreshDatasetCommand commits its own transaction between the update and the new_version reads, so the response attributed the refresh's version to the user's update. Capture the identifiers before the refresh; re-read only the ETag afterwards (it must reflect the current live version).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ion_id)

The dataset child-diff path (shadow_rows_valid_at + the prior-tx probe in
shadow_queries.py) filters table_columns_version / sql_metrics_version by
parent table_id plus a transaction-range bound. The base migration only
indexed transaction_id / end_transaction_id / operation_type, and the
live-row index leads with id — nothing served the table_id access pattern,
so those queries fell to a seq scan as version history grew.

Add a plain composite (table_id, transaction_id) on both child shadow
tables (serves both queries on every dialect, no partial-index split).
Folded into the existing shadow-index migration; round-trip test asserts
the index is present on the child shadows and absent on the parents.

Surfaced by a Codex sqlalchemy-review pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
make_versioned() runs at import of superset.extensions, so ENABLE_VERSIONING_CAPTURE=False skipped only the custom baseline/change-record listeners while Continuum's own mapper/session/engine listeners kept writing shadow and version_transaction rows on every save — contradicting the documented operator contract ('they just stop accumulating new rows'). Verified empirically: a chart save with the flag off grew slices_version and version_transaction.

Detach Continuum's write listeners in the flag-off branch. Deliberately a targeted subset of sqlalchemy_continuum.remove_versioning(): that helper also calls manager.reset(), which clears version_class_map — version_class() would then silently return the live model class and break the read-only /versions/ endpoints the flag promises to keep working. Verified both directions: flag off → no shadow/tx growth, reads intact; flag on → capture unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The marshmallow schemas in versioning/schemas.py were dead code drifting from the actual responses: the real rows carry version_uuid (absent from the schema), operation_type documented a 'restore' value that is never emitted, and the raw dicts went through Flask's jsonify — rendering issued_at as an RFC-1123 http-date instead of ISO-8601 and leaving version_uuid a UUID instance in the list but a string in the snapshot.

Dump list rows and the snapshot's _version block through VersionListItemSchema (ISO datetimes, string UUIDs, single source of shape), add the missing version_uuid field, fix the operation_type description, register the schema in all three APIs' openapi_spec_component_schemas, and $ref it from the endpoint docstrings instead of bare 'type: object'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
factory.py, api_helpers.py and UPDATING.md referenced superset.versioning.activity and /activity/ endpoints, which live on the follow-up branch (sc-107283) and don't exist in this change. For an apache/superset reviewer these point at nothing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three cleanups to the unmerged migration, mirrored in changes/table.py:
- Drop the parent shadow tables' audit columns (created_on/changed_on/created_by_fk/changed_by_fk, plus slices' last_saved_at/last_saved_by_fk): every one is in the models' __versioned__ exclude lists, so Continuum's runtime version Table omits them and nothing ever writes them — permanently-NULL drift that autogenerate would flag. Verified against a live app: no runtime shadow table carries any of these.
- Widen version_changes.sequence SmallInteger → Integer: per-entity sequence is assigned by unbounded enumerate(); a pathological diff could overflow 32767.
- Drop ix_version_changes_transaction_id: the UNIQUE(transaction_id, ...) constraint's backing index already serves transaction_id-prefix lookups.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The adhoc_filter subject was derived from chart id + db name — stable across runs, so on a persistent test DB only the FIRST run added a new natural key; every later run re-appended an existing key and the keyed differ correctly emitted nothing, failing the assertion. Use a per-run uuid4 suffix so each run adds a genuinely new filter key. (Belongs upstream on sc-103156 — the file rides this branch until the chain merges.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Version-history UI feedback (PR apache#40988): system-managed writes polluted the activity timeline with phantom rows the client had to suppress.

- Exclude the perm-string class (perm / schema_perm / catalog_perm) from Slice and SqlaTable versioning: bulk permission maintenance rewrites these across many entities, producing phantom transactions (10 'Chart updated' rows for one user save). They're derived security state, not user content — excluding them means Continuum sees no versioned change, so the phantom transactions disappear entirely (records, version rows, and shadow churn). Restore also can no longer resurrect stale permission strings. Shadow DDL columns dropped from the unmerged migration.

- Add shared_label_colors to DASHBOARD_JSON_METADATA_AUDIT_KEYS: the DAO rewrites it when a dashboard is merely viewed, producing phantom 'Properties updated' records. The view-time write itself is a separate round-trip-asymmetry issue (cf. apache#39706); this stops the change-record noise regardless.

Lets the frontend revert the NOISE_PATHS workaround for these paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Version-history UI feedback (PR apache#40988): a single dataset save produced two Continuum transactions stamped the same second (update + refresh each committed), rendered as two 'Dataset updated' rows for one user action.

Wrap UpdateDatasetCommand + RefreshDatasetCommand in one @transaction at the API layer — the decorator nests via g.in_transaction, so the commands' own decorators pass through and the wrapper commits once. The pass-through also skips the commands' on_error conversion, so it's replicated inline. This simplifies the earlier attribution fix: with a single transaction the post-commit version reads and the ETag are correct by construction. Intended semantic change: a failed refresh now rolls back the whole save instead of leaving the update applied with the refresh lost.

If the reporter's repro involved two separate HTTP requests rather than the override_columns path, that half is client-side; the regression test pins the backend path to one transaction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Version-history UI feedback (PR apache#40988): action_kind='restore' carried no pointer to WHICH version was restored, so 'Restored to X from [date]' couldn't be rendered from API data.

Implements the __meta__ headline convention: the restore command sets ACTION_META_KEY on session.info alongside ACTION_KIND_KEY; the change-record listener pops it on its first firing for the transaction and prepends a synthetic record (kind='__meta__', path=['__meta__','restore'], to_value={version_uuid, version_number}) at sequence 0. Same lifecycle as action_kind: popped on use, cleaned up on commit/rollback. The convention generalizes to import/clone headlines later.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e row

uuid is identity, not content. A snapshot can carry a DIFFERENT uuid for the same integer id — id reuse after a hard delete (SQLite and MySQL reuse max(id)+1) while retention still holds the old entity's shadow rows. Continuum's Reverter applied it, rewriting the live row's uuid: every uuid-routed endpoint 404'd immediately after the restore. Surfaced while converting the AV-015 xfail on the activity-view branch — the strict xfail had been 'passing' on this 404 rather than its stated reason. Preserve the pre-restore uuid (with a WARNING when a foreign uuid is detected); regression test plants a foreign uuid on the oldest shadow row and proves the round-trip survives. Verified the test fails with the fix removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…g test

The schema-wiring fix changed issued_at from Flask's RFC-1123 http-date to ISO-8601 (intended); this test still parsed the old format with email.utils.parsedate_to_datetime and failed on the new shape.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reverts the single-transaction dataset save (one logical save = one Continuum tx). Review + instrumentation uncovered two fatal interactions:

1. The listener's tx-dedup guard silently dropped the refresh's column change records (the refresh's validate() autoflushes the update; the commit flush's records were then skipped as already-processed).
2. DBEventLogger writes request logs through the SHARED scoped session and commits/rolls back mid-request (superset/utils/log.py): with the save held uncommitted across a logged sub-action, a transient logger failure ROLLED BACK the user's save (observed: 'database is locked' on SQLite → silent data loss with a 200), and on Postgres/MySQL the logger's commit would persist half-done state — breaking the atomicity the merge promised.

Until the event logger gets its own session, per-command commit boundaries are the only shape with honest failure modes. The two-transaction consequence for override_columns saves is documented at the call site; the version-history UI's adjacent-merge handling stands for this path. Keeps the attribution fix (capture identifiers before the refresh; ETag re-read after) and the DatasetRefreshFailedError 422 mapping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Multi-lens review of the headline work (PR apache#40988 feedback):

- build_action_headline() is the single owner of the record shape; the restore command no longer hand-rolls the dict (three-module string contract).
- operation='announce' (new vocabulary member): a headline announces an action, it doesn't edit a field. The verb no longer rides in path — path is pure navigation per the ChangeRecord contract; renderers join kind='__meta__' with the transaction's action_kind.
- The injector peeks instead of popping when the buffer is empty, and runs after the child-record append: a headline-only buffer no longer defeats the multi-flush short-circuit (which would have silently dropped a later flush's records once import/clone headlines arrive).
- _BUFFER_KEY is cleared on the processed-tx early return and popped in both after_commit/after_rollback cleanups — buffered records can no longer leak into the next transaction under the wrong tx id.
- Module docstring re-synced with the session keys it manages.
- Test hygiene: the perm-rewrite probe restores the perm trio in a finally (security-routing state on a persistent DB), and the foreign-uuid test un-plants the shadow row it corrupts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the time-based version-history retention work out of this branch so
it ships as its own stacked PR (sc-111099). Removed here:

- superset/tasks/version_history_retention.py (the prune_old_versions task)
- config: SUPERSET_VERSION_HISTORY_RETENTION_DAYS + the
  version_history.prune_old_versions CELERYBEAT_SCHEDULE entry
- celery_app.py task import
- initialization: _warn_if_retention_beat_missing beat-schedule check + call
- retention unit test + the TestDashboardVersionRetention integration class
- UPDATING.md retention config row

Doc comments in versioning/queries.py and the init test that referenced the
retention task are softened to point at sc-111099. Net cumulative diff for
this branch no longer contains the retention cleanup job.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
apache/master advances continuously; rebasing the versioning chain onto a
master commit leaves the chain's root (2bee73611e32) pointing at whatever
was master's head at rebase time, which forks alembic into multiple heads
once master lands its next migration (CI runs on the PR-merge commit, so
the fork surfaces there even when the branch alone is single-head). Point
the root at master's current head (78a40c08b4be) so the merge stays linear.
Re-run this re-point whenever master advances with a new migration before
merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The rebase onto current master brought ruff 0.9.7 + a stricter mypy that
flagged previously-clean base code:

- drop 4 now-unused `type: ignore[attr-defined]` comments (queries.py,
  changes/state.py, versioning_round_trip__tests.py)
- test_pin_audit_columns: the dynamic `declarative_base()` local can't be
  used as a typed base class — ignore [valid-type, misc] on the class,
  drop the now-unused attr-defined ignore
- initialization_test: type the `_initializer(config)` param as
  dict[str, Any]
- ruff-format reflow (queries.py, test_diff.py)

ruff check + format clean locally (0.9.7). mypy not runnable in this env;
fixes target the exact hook-reported errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two CI integration failures (surfaced once the alembic head-fork was
resolved and the test jobs actually ran):

- VersioningFlaskPlugin.transaction_args recorded whatever get_user_id()
  returned, guarding only `is None`. A mocked `g` (master's
  test_query_context_update_requires_chart_access, apache#40648) yields a Mock
  g.user.id, which then hit the integer version_transaction.user_id column
  and blew up the flush with a SQL bind error. Guard on isinstance(int) —
  production-safe (user_id is always int/None there), fixes the test.
- test_list_versions_denies_non_owner (chart/dashboard/dataset) asserted
  Alpha -> 403 on a "row-level ownership" theory the endpoint never
  implemented: it gates via raise_for_access, and Alpha has
  all_datasource_access + can_write (FR-013), so 200 is correct. Renamed to
  _allows_can_write_non_owner and assert 200; model-level denial stays
  covered by the Gamma tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-103156-versioning branch from 7dd245b to a655e0c Compare June 16, 2026 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api Related to the REST API i18n:french Translation related to French language i18n Namespace | Anything related to localization review:draft risk:ci-script PR modifies scripts that execute in CI (supply chain risk) risk:db-migration PRs that require a DB migration size/XXL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants