[BugFix] Fix dedup aggregation pushdown nullifying renamed fields (#5150) by qianheng-aws · Pull Request #27 · qianheng-aws/sql

qianheng-aws · 2026-04-07T16:02:55Z

Description

When a field is renamed via rename and a subsequent dedup operates on a different field, the aggregation-based DedupPushdownRule returns null for all renamed fields. This is because the top_hits response uses original index field names from _source, but the OpenSearchIndexEnumerator expects the renamed names from the logical plan's rowType.

Root cause: In AggregateAnalyzer.createTopHitsBuilder(), helper.inferNamedField() resolves RexInputRef fields to original index names (e.g., value), but the enumerator's field list uses renamed names (e.g., val). No mapping bridges this gap.

Fix: Build a rename mapping (originalName -> renamedName) in the LITERAL_AGG case of AggregateAnalyzer by comparing the resolved index field name with the project's field name. Pass this mapping to TopHitsParser, which applies it when constructing result maps from the _source response.

Related Issues

Resolves opensearch-project#5150

Check List

…pensearch-project#5269) (opensearch-project#5293) When querying across indices with conflicting field mappings (boolean vs text), numeric values like 0 were not coerced to boolean, causing "node must be a boolean, found NUMBER" error. Added numeric handling to parseBooleanValue() consistent with ObjectContent.booleanValue(). Signed-off-by: Heng Qian <qianheng@amazon.com>

…ensearch-project#5150) When a field is renamed via `rename` and a subsequent `dedup` operates on a different field, the aggregation-based DedupPushdownRule returns null for all renamed fields. The root cause is that the top_hits response uses original index field names from _source, but the enumerator expects the renamed names from the logical plan's rowType. Fix: Build a rename mapping (originalName -> renamedName) in the LITERAL_AGG case of AggregateAnalyzer and pass it to TopHitsParser. The parser applies this mapping when constructing result maps, so field names match what the enumerator expects. Signed-off-by: Heng Qian <qianheng@amazon.com>

qianheng-aws · 2026-04-07T16:03:18Z

Decision Log

Root Cause: The DedupPushdownRule converts dedup to a composite aggregation with top_hits. When building the top_hits, AggregateAnalyzer.createTopHitsBuilder() uses helper.inferNamedField() to resolve RexInputRef fields to original index field names (e.g., value). The OpenSearch response returns _source with these original names. However, OpenSearchIndexEnumerator.current() resolves field values using the renamed names from dedup.getRowType() (e.g., val). This mismatch causes all renamed fields to resolve as MISSING/null.

Approach: Added a fieldRenameMapping (Map<String, String>) to TopHitsParser. In the LITERAL_AGG case of AggregateAnalyzer, we build this mapping by comparing each RexInputRef field's resolved original name with the project's field name. When the parser constructs result maps from the _source response, it applies this mapping to rename keys before they reach the enumerator. The fix is minimal and backward-compatible: when no rename exists, the mapping is empty and the behavior is unchanged.

Alternatives Rejected:

Post-processing LogicalProject: Would add an extra plan node to rename fields after the scan. Rejected because it's more invasive and the scan already handles schema mapping via copyWithNewSchema.
Modify inferNamedField to return renamed names: Would break the fetchField call which needs the original index name to fetch data from OpenSearch.
Fix in OpenSearchIndexEnumerator: Would require the enumerator to maintain a name mapping, adding complexity to a general-purpose component for a dedup-specific issue.

Pitfalls:

The composite aggregation parser returns separate maps for bucket keys and metric values. The TopHitsParser rename mapping only affects the metric (top_hits) maps, not the bucket keys which already use renamed names from the composite key configuration.
Fields that are RexCall (eval expressions) or RexLiteral go through the ScriptField path which already uses the correct renamed name, so they are unaffected.

Things to Watch:

The TopHitsParser is also used for FIRST/LAST aggregations with returnSingleValue=true. The rename mapping only affects the else branch (multi-value/dedup path), so those aggregations are not impacted.
If future code adds a fetchSource with field-specific includes for the dedup path, the rename mapping would need to be applied to _source field names too (currently it handles both _source and fields response sections).

qianheng-aws added 2 commits April 3, 2026 17:00

qianheng-aws closed this Apr 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BugFix] Fix dedup aggregation pushdown nullifying renamed fields (#5150)#27

[BugFix] Fix dedup aggregation pushdown nullifying renamed fields (#5150)#27
qianheng-aws wants to merge 2 commits intorefactor/dedupe-reusable-workflowfrom
worktree-agent-a91850db

qianheng-aws commented Apr 7, 2026

Uh oh!

qianheng-aws commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

qianheng-aws commented Apr 7, 2026

Description

Related Issues

Check List

Uh oh!

qianheng-aws commented Apr 7, 2026

Decision Log

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant