Skip to content

[MSE] Queries with filtering such as A IS NULL OR A = 'value' don't return rows with null values #18726

@cristianpop

Description

@cristianpop

Queries that look for rows with either nulls or specific values for a column don't return the rows with null values.

Table schema excerpt:

{
	"schemaName": "test",
	"enableColumnBasedNullHandling": true,
	"dimensionFieldSpecs": [
		...
		{
		  "name": "prop",
		  "dataType": "STRING",
		  "fieldType": "DIMENSION"
		},
		...
	]
}

Using SSE the following query returns the correct count (102493):

SET enableNullHandling = true;

SELECT COUNT(*)
FROM test
WHERE prop IS NULL
  OR prop = 'value'

On the other hand, using MSE the same query doesn't return the correct count (80):

SET useMultistageEngine=true;
SET enableNullHandling = true;

SELECT COUNT(*)
FROM test
WHERE prop IS NULL
  OR prop = 'value'

If I query for rows with the given value for prop using MSE I get the correct count (80):

SET useMultistageEngine=true;
SET enableNullHandling = true;

SELECT COUNT(*)
FROM test
WHERE prop = 'value'

If I query for rows with null values for prop using MSE I get the correct count (102413):

SET useMultistageEngine=true;
SET enableNullHandling = true;

SELECT COUNT(*)
FROM test
WHERE prop IS NULL

As can be seen from the queries above, even if I ask for rows with prop as null or with given value, MSE only returns the rows that have the given value, ignoring those with nulls.

I looked at the execution plan and it looks fine since it has NULL AS TRUE set:

Execution Plan
PinotLogicalAggregate(group=[{}], agg#0=[COUNT($0)], aggType=[FINAL])
  PinotLogicalExchange(distribution=[hash])
    PinotLogicalAggregate(group=[{}], agg#0=[COUNT()], aggType=[LEAF])
      LogicalFilter(condition=[SEARCH($5, Sarg[_UTF-8'value':VARCHAR CHARACTER SET "UTF-8"; NULL AS TRUE]:VARCHAR CHARACTER SET "UTF-8")])
        PinotLogicalTableScan(table=[[default, test]])

It seems like the null as true flag gets lost in RexExpressionUtils::handleSearch because it doesn't take it into account at all. I already have a commit right here that fixes this issue so I can open a PR for it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething is not working as expectedmulti-stageRelated to the multi-stage query enginenull supportRelated to NULL value handling

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions