[SIP] Correct totals/subtotals for non-additive metrics (Table + Pivot)#41184
Draft
rusackas wants to merge 11 commits into
Draft
[SIP] Correct totals/subtotals for non-additive metrics (Table + Pivot)#41184rusackas wants to merge 11 commits into
rusackas wants to merge 11 commits into
Conversation
Draft proposal + prototype dev zone for correcting totals and subtotals for non-additive metrics across Table and Pivot Table charts. Consolidates issues apache#25747, apache#32260, apache#38674, apache#36165, apache#37627, apache#34350/apache#34425/apache#34426 and the design discussion apache#29297. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…lvage Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Unit-layer acceptance tests encoding the known reported cases from SIP.md: additive fast-path (green guard), the contribution zero-trap behind apache#37627 (characterized), and the ratio/apache#25747 + distinct-count/apache#36165 totals as strict xfails the POC will flip green. CI stays green via xfail(strict). Refs apache#25747, apache#29297 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #41184 +/- ##
==========================================
- Coverage 64.31% 64.31% -0.01%
==========================================
Files 2651 2652 +1
Lines 144773 144842 +69
Branches 33407 33439 +32
==========================================
+ Hits 93111 93155 +44
- Misses 49992 50011 +19
- Partials 1670 1676 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
End-to-end tests on the api/v1/chart/data path against birth_names: - Bucket A regression guards: ratio + COUNT_DISTINCT grand totals are DB-computed at no-GROUP-BY, so they're already correct for the table summary (not the sum of per-group cells). - Bucket B xfail(strict): percent/contribution column is absent from the summary row because the totals query is built with post_processing=[] (apache#37627, apache#34350). Uses get_query_result (not get_payload) to avoid the cache layer. Refs apache#25747, apache#29297 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
CI confirmed the Bucket A hypothesis: the ratio grand total is DB-computed (passed) and Bucket B percent column is broken (xfailed). The distinct-count guard failed on a too-strong assertion (assert 619 < 619) because names don't recur across states in the birth_names fixture. Switch to COUNT_DISTINCT(gender), which recurs across every state, so summing per-group counts genuinely double-counts and the grand total (=2) is strictly less. Refs apache#25747, apache#29297 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
POC (phase 1, Table chart) for the non-additive totals SIP. CI confirmed the Table grand total is already DB-correct (Bucket A); the only Table defect was Bucket B: the show_totals summary query was built with post_processing=[], dropping the contribution op behind percent metrics, so the summary % column came back empty (apache#37627, apache#34350). Retain post_processing on that query -- with no GROUP BY the total's contribution to itself is 100%. - buildQuery.ts: inherit post_processing on the show_totals totals query - buildQuery.test.ts: assert the summary query keeps the contribution op - integration: reframe the Bucket B case as a backend-capability guard - SIP.md: record the empirical finding and phase-1 progress Refs apache#25747, apache#37627, apache#29297 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Phase 2 of the non-additive totals SIP (pivot table). Adopt the rollup-level combination generator from apache#34592 plus its 10 unit tests, and add the Groupby type. Each combination is a prefix of row dims x prefix of column dims; the empty combination is the grand total. buildQuery will issue one query per level so the DB computes each total at its own granularity (multi-query fallback; GROUPING SETS optimization to follow). Pure addition (not yet wired into buildQuery/transformProps, which must change together). SIP.md updated with phase-2 plan + the additivity gate refinement. Refs apache#25747, apache#32260, apache#36165, apache#29297 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Break down the remaining pivot work (QueryData type, gated multi-query buildQuery, transformProps assembly, and the ~400-line react-pivottable PivotData rewrite + TableRenderers) in dependency order, and note that the rendering rewrite needs app-level visual verification and is best landed after the apache#29297 design call confirms the multi-query approach. Refs apache#25747, apache#29297 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adds issue-driven pivot test cases (apache#25747/apache#32260): the production react- pivottable Sum aggregator totals the per-group completion ratios to 0.75 instead of the correct SUM(actual)/SUM(target)=0.4. A test.failing case documents that no client-side aggregator can recover the right value, which is exactly why phase 2 moves totals to DB-computed per-level rollup queries. Refs apache#25747, apache#32260, apache#29297 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Make pivot subtotals/grand totals correct for non-additive metrics by computing each rollup level in the database instead of re-aggregating already-aggregated cells client-side (adapted from apache#34592, on current TS code). - buildQuery: emit one query per rollup level (prefix of rows x prefix of cols) - transformProps: zip each result with its level into QueryData[]; pick the longest-colnames result as the detail "mainQuery" for metadata/formatters - PivotTableChart: tag every record with the level (rows/columns) that produced it; drop the client-side aggregatorsFactory - react-pivottable/PivotData: replace re-aggregation with placement -- a `cellValue` passthrough stores the DB-computed value verbatim and processRecord drops it into the single matching slot (allTotal/rowTotals/colTotals/tree) - remove the now-meaningless aggregateFunction control; TableRenderers label "Total" instead of "Total (<agg>)" - update buildQuery/transformProps/tableRenders tests to the new contract; 60/60 pivot tests pass Correctness-first: multi-query runs for all metrics (correct for additive too); the additivity gate + GROUPING SETS single-query are perf optimizations tracked as phase 3 in SIP.md. Still needs in-app visual verification. Refs apache#25747, apache#32260, apache#36165, apache#38674, apache#29297 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… work Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
SUMMARY
Draft SIP + prototype development zone for fixing one of Superset's
longest-standing correctness gaps: totals and subtotals are wrong for
non-additive metrics (ratios,
COUNT_DISTINCT,AVG, percentiles, and"Percentage metrics") in both the Table and Pivot Table charts.
The proposal lives in
SIP.md. This PR is intentionally a draft sothe SIP, a failing-test matrix of the known reported cases, and the POC
implementation can be rounded out together (TDD: the POC drives the matrix
green).
Core principle: totals must be computed by the DB at the grouping
granularity, never by re-aggregating already-aggregated cells. A SQL aggregate
metric is correct at any grouping level if the same expression is evaluated
grouped at that level, so no formula language is needed for the common cases.
Approach: detect additivity (additive-only = current cheap path, unchanged)
→ for non-additive, compute rollup levels via native
GROUPING SETSin asingle query → fall back to multi-query (the #34592 approach) on engines lacking
a new
supports_grouping_setscapability → recompute (don't sum) post-processing% columns → one shared mechanism for Table + Pivot.
Consolidates: #25747 (canonical), #32260, #38674, #36165, #37627,
#34350/#34425/#34426. Design thread: #29297. Builds on prior art in #34592.
Out of scope: #39223, SIP-179 (#34245), #35089, SIP-205 (#38586).
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A (proposal/scaffold).
TESTING INSTRUCTIONS
The TDD acceptance matrix is in
SIP.md(§ Validation / TDD test matrix).Failing tests and POC to follow in this PR.
ADDITIONAL INFORMATION