[UT][VL] GlutenTPCHPlanStabilitySuite q19 golden file stale — Brand#12 ExprId normalization collision #12375

Description

@brijrajk

Summary

GlutenTPCHPlanStabilitySuite tpch/q19 fails in spark-test-spark40 CI for any PR that touches Velox backend Scala files. The failure is caused by a stale golden file combined with a known limitation in the ExprId normalizer.

Affected check

spark-test-spark40 (and spark-test-spark41)

Root cause

GlutenPlanStabilitySuite.glutenNormalizeIds() uses the regex (?<prefix>(?<!id=)#)\\d+L? which matches any #<number> in the explain text — including TPC-H string literals. The p_brand filter in q19 uses values Brand#11, Brand#12, Brand#13 (actual TPC-H spec data values). These appear unquoted in the explain output:

EqualTo(p_brand, Brand#12)

The normalizer incorrectly treats #12 as an ExprId and remaps it sequentially based on encounter order. The suite code itself documents this limitation at lines 67–68:

"Running all suites together in one JVM is recommended to avoid ExprId normalization issues where string constants (e.g., Brand#23 in TPCH q19) may collide with ExprId numbers."
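The collision can be reproduced outside the suite with a minimal sketch. It is written in Java because Scala's Regex is backed by java.util.regex, so the pattern quoted above can be used unchanged. The class ExprIdNormalizerSketch and its normalize method are hypothetical stand-ins, not the real glutenNormalizeIds(); the sketch assumes only the behavior described above, namely that each distinct match is renumbered in first-encounter order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the suite's normalizer; not the real implementation.
final class ExprIdNormalizerSketch {
    // Same pattern as quoted in this issue: any '#<digits>' not preceded by "id=".
    static final Pattern ID = Pattern.compile("(?<prefix>(?<!id=)#)\\d+L?");

    // Assumed behavior: each distinct match is renumbered 1, 2, 3, ... in encounter order.
    static String normalize(String plan) {
        Map<String, String> seen = new LinkedHashMap<>();
        Matcher m = ID.matcher(plan);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String id = m.group();
            String mapped = seen.get(id);
            if (mapped == null) {
                mapped = "#" + (seen.size() + 1);
                seen.put(id, mapped);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(mapped));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this sketch, normalize("EqualTo(p_brand#4, Brand#12)") rewrites the literal to Brand#2, while the same filter preceded by one extra expression ("Project [x#9]" on the line before) rewrites it to Brand#3. The literal's normalized value therefore depends on how many ExprIds appear earlier in the plan, which is why unrelated optimizer changes can invalidate the golden file.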

How it manifests

The golden file was committed in #11805 (c37fee4e5, 2026-03-24). Over the 264 commits since then, new optimizer rules and expressions shifted the ExprId counter. Brand#12 now normalizes to Brand#6 and _pre_1#14 shifts to _pre_1#13, causing a spurious mismatch.

Reproduced on main at commit 6097b59a6 (2026-06-25, [MINOR][VL] Build Arrow 18 with patch for Power #12344) without any pending PR:

Tests: succeeded 21, failed 1  ← tpch/q19
BUILD FAILURE

Short-term fix

Refresh q19/explain.txt via SPARK_GENERATE_GOLDEN_FILES=1 — tracked in #12374.

Long-term fix

Make glutenNormalizeIds skip #N patterns that appear inside string literal contexts (i.e., where the # is preceded by non-whitespace word characters that are not a column/expression name). This would prevent TPC-H brand values like Brand#12 from being incorrectly normalized.
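One narrow way to approximate this is sketched below, again in Java against java.util.regex. It adds a second negative lookbehind so that a # directly preceded by the text Brand is never treated as an ExprId. This is an assumption made for illustration, not the fix proposed for the suite: the class name LiteralAwareNormalizerSketch is hypothetical, the Brand allow-list is hard-coded, and a column whose name ends in "Brand" (capitalized) would be skipped by mistake. A general fix would need a more reliable way to recognize literal contexts.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the normalizer; illustrates the idea only.
final class LiteralAwareNormalizerSketch {
    // Assumption: '#' preceded by "id=" or by the literal text "Brand" is left alone.
    static final Pattern ID =
        Pattern.compile("(?<prefix>(?<!id=)(?<!Brand)#)\\d+L?");

    // Renumbers each distinct remaining match in first-encounter order.
    static String normalize(String plan) {
        Map<String, String> seen = new LinkedHashMap<>();
        Matcher m = ID.matcher(plan);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String id = m.group();
            String mapped = seen.get(id);
            if (mapped == null) {
                mapped = "#" + (seen.size() + 1);
                seen.put(id, mapped);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(mapped));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because the lookbehind is case-sensitive, the column reference p_brand#4 is still normalized while Brand#12 passes through unchanged, so the literal no longer depends on how many ExprIds precede it.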
