Skip to content

[SIP] Correct totals/subtotals for non-additive metrics (Table + Pivot)#41184

Draft
rusackas wants to merge 11 commits into
apache:masterfrom
rusackas:sip-non-additive-totals
Draft

[SIP] Correct totals/subtotals for non-additive metrics (Table + Pivot)#41184
rusackas wants to merge 11 commits into
apache:masterfrom
rusackas:sip-non-additive-totals

Conversation

@rusackas

Copy link
Copy Markdown
Member

SUMMARY

Draft SIP + prototype development zone for fixing one of Superset's
longest-standing correctness gaps: totals and subtotals are wrong for
non-additive metrics
(ratios, COUNT_DISTINCT, AVG, percentiles, and
"Percentage metrics") in both the Table and Pivot Table charts.

The proposal lives in SIP.md. This PR is intentionally a draft so
the SIP, a failing-test matrix of the known reported cases, and the POC
implementation can be rounded out together (TDD: the POC drives the matrix
green).

Core principle: totals must be computed by the DB at the grouping
granularity, never by re-aggregating already-aggregated cells. A SQL aggregate
metric is correct at any grouping level if the same expression is evaluated
grouped at that level, so no formula language is needed for the common cases.

Approach: detect additivity (additive-only = current cheap path, unchanged)
→ for non-additive, compute rollup levels via native GROUPING SETS in a
single query → fall back to multi-query (the #34592 approach) on engines lacking
a new supports_grouping_sets capability → recompute (don't sum) post-processing
% columns → one shared mechanism for Table + Pivot.

Consolidates: #25747 (canonical), #32260, #38674, #36165, #37627,
#34350/#34425/#34426. Design thread: #29297. Builds on prior art in #34592.

Out of scope: #39223, SIP-179 (#34245), #35089, SIP-205 (#38586).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A (proposal/scaffold).

TESTING INSTRUCTIONS

The TDD acceptance matrix is in SIP.md (§ Validation / TDD test matrix).
Failing tests and POC to follow in this PR.

ADDITIONAL INFORMATION

Draft — not for review yet. Opening early as a shared SIP + prototype space.

Draft proposal + prototype dev zone for correcting totals and subtotals
for non-additive metrics across Table and Pivot Table charts. Consolidates
issues apache#25747, apache#32260, apache#38674, apache#36165, apache#37627, apache#34350/apache#34425/apache#34426 and the
design discussion apache#29297.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Superset Dev and others added 2 commits June 17, 2026 23:10
…lvage

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Unit-layer acceptance tests encoding the known reported cases from SIP.md:
additive fast-path (green guard), the contribution zero-trap behind apache#37627
(characterized), and the ratio/apache#25747 + distinct-count/apache#36165 totals as
strict xfails the POC will flip green. CI stays green via xfail(strict).

Refs apache#25747, apache#29297

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@netlify

netlify Bot commented Jun 18, 2026

Copy link
Copy Markdown

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit b685383
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/6a342007f001a300086c1d2c
😎 Deploy Preview https://deploy-preview-41184--superset-docs-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

@codecov

codecov Bot commented Jun 18, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 78.64078% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.31%. Comparing base (ae0b1f0) to head (c246967).
⚠️ Report is 24 commits behind head on master.

Files with missing lines Patch % Lines
...s/plugin-chart-pivot-table/src/PivotTableChart.tsx 0.00% 14 Missing ⚠️
...hart-pivot-table/src/react-pivottable/utilities.ts 90.19% 5 Missing ⚠️
...gin-chart-pivot-table/src/plugin/transformProps.ts 77.77% 2 Missing ⚠️
...ivot-table/src/react-pivottable/TableRenderers.tsx 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #41184      +/-   ##
==========================================
- Coverage   64.31%   64.31%   -0.01%     
==========================================
  Files        2651     2652       +1     
  Lines      144773   144842      +69     
  Branches    33407    33439      +32     
==========================================
+ Hits        93111    93155      +44     
- Misses      49992    50011      +19     
- Partials     1670     1676       +6     
Flag Coverage Δ
hive 39.32% <ø> (-0.01%) ⬇️
javascript 68.44% <78.64%> (-0.01%) ⬇️
mysql 58.07% <ø> (+<0.01%) ⬆️
postgres 58.14% <ø> (+0.01%) ⬆️
presto 40.90% <ø> (-0.01%) ⬇️
python 59.58% <ø> (+<0.01%) ⬆️
sqlite 57.79% <ø> (+<0.01%) ⬆️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

End-to-end tests on the api/v1/chart/data path against birth_names:
- Bucket A regression guards: ratio + COUNT_DISTINCT grand totals are
  DB-computed at no-GROUP-BY, so they're already correct for the table
  summary (not the sum of per-group cells).
- Bucket B xfail(strict): percent/contribution column is absent from the
  summary row because the totals query is built with post_processing=[]
  (apache#37627, apache#34350).

Uses get_query_result (not get_payload) to avoid the cache layer.

Refs apache#25747, apache#29297

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Superset Dev and others added 2 commits June 18, 2026 11:27
CI confirmed the Bucket A hypothesis: the ratio grand total is DB-computed
(passed) and Bucket B percent column is broken (xfailed). The distinct-count
guard failed on a too-strong assertion (assert 619 < 619) because names don't
recur across states in the birth_names fixture. Switch to COUNT_DISTINCT(gender),
which recurs across every state, so summing per-group counts genuinely
double-counts and the grand total (=2) is strictly less.

Refs apache#25747, apache#29297

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
POC (phase 1, Table chart) for the non-additive totals SIP. CI confirmed the
Table grand total is already DB-correct (Bucket A); the only Table defect was
Bucket B: the show_totals summary query was built with post_processing=[],
dropping the contribution op behind percent metrics, so the summary % column
came back empty (apache#37627, apache#34350). Retain post_processing on that query -- with
no GROUP BY the total's contribution to itself is 100%.

- buildQuery.ts: inherit post_processing on the show_totals totals query
- buildQuery.test.ts: assert the summary query keeps the contribution op
- integration: reframe the Bucket B case as a backend-capability guard
- SIP.md: record the empirical finding and phase-1 progress

Refs apache#25747, apache#37627, apache#29297

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Superset Dev and others added 3 commits June 18, 2026 11:54
Phase 2 of the non-additive totals SIP (pivot table). Adopt the rollup-level
combination generator from apache#34592 plus its 10 unit tests, and add the Groupby
type. Each combination is a prefix of row dims x prefix of column dims; the
empty combination is the grand total. buildQuery will issue one query per level
so the DB computes each total at its own granularity (multi-query fallback;
GROUPING SETS optimization to follow).

Pure addition (not yet wired into buildQuery/transformProps, which must change
together). SIP.md updated with phase-2 plan + the additivity gate refinement.

Refs apache#25747, apache#32260, apache#36165, apache#29297

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Break down the remaining pivot work (QueryData type, gated multi-query
buildQuery, transformProps assembly, and the ~400-line react-pivottable
PivotData rewrite + TableRenderers) in dependency order, and note that the
rendering rewrite needs app-level visual verification and is best landed after
the apache#29297 design call confirms the multi-query approach.

Refs apache#25747, apache#29297

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adds issue-driven pivot test cases (apache#25747/apache#32260): the production react-
pivottable Sum aggregator totals the per-group completion ratios to 0.75
instead of the correct SUM(actual)/SUM(target)=0.4. A test.failing case
documents that no client-side aggregator can recover the right value, which is
exactly why phase 2 moves totals to DB-computed per-level rollup queries.

Refs apache#25747, apache#32260, apache#29297

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Superset Dev and others added 2 commits June 18, 2026 16:09
Make pivot subtotals/grand totals correct for non-additive metrics by computing
each rollup level in the database instead of re-aggregating already-aggregated
cells client-side (adapted from apache#34592, on current TS code).

- buildQuery: emit one query per rollup level (prefix of rows x prefix of cols)
- transformProps: zip each result with its level into QueryData[]; pick the
  longest-colnames result as the detail "mainQuery" for metadata/formatters
- PivotTableChart: tag every record with the level (rows/columns) that produced
  it; drop the client-side aggregatorsFactory
- react-pivottable/PivotData: replace re-aggregation with placement -- a
  `cellValue` passthrough stores the DB-computed value verbatim and processRecord
  drops it into the single matching slot (allTotal/rowTotals/colTotals/tree)
- remove the now-meaningless aggregateFunction control; TableRenderers label
  "Total" instead of "Total (<agg>)"
- update buildQuery/transformProps/tableRenders tests to the new contract;
  60/60 pivot tests pass

Correctness-first: multi-query runs for all metrics (correct for additive too);
the additivity gate + GROUPING SETS single-query are perf optimizations tracked
as phase 3 in SIP.md. Still needs in-app visual verification.

Refs apache#25747, apache#32260, apache#36165, apache#38674, apache#29297

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… work

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant