Queries that look for rows with either nulls or specific values for a column don't return the rows with null values.
Table schema excerpt:
{
"schemaName": "test",
"enableColumnBasedNullHandling": true,
"dimensionFieldSpecs": [
...
{
"name": "prop",
"dataType": "STRING",
"fieldType": "DIMENSION"
},
...
]
}
Using SSE the following query returns the correct count (102493):
SET enableNullHandling = true;
SELECT COUNT(*)
FROM test
WHERE prop IS NULL
OR prop = 'value'
On the other hand, using MSE the same query doesn't return the correct count (80):
SET useMultistageEngine=true;
SET enableNullHandling = true;
SELECT COUNT(*)
FROM test
WHERE prop IS NULL
OR prop = 'value'
If I query for rows with the given value for prop using MSE I get the correct count (80):
SET useMultistageEngine=true;
SET enableNullHandling = true;
SELECT COUNT(*)
FROM test
WHERE prop = 'value'
If I query for rows with null values for prop using MSE I get the correct count (102413):
SET useMultistageEngine=true;
SET enableNullHandling = true;
SELECT COUNT(*)
FROM test
WHERE prop IS NULL
As can be seen from the queries above, even if I ask for rows with prop as null or with given value, MSE only returns the rows that have the given value, ignoring those with nulls.
I looked at the execution plan and it looks fine since it has NULL AS TRUE set:
Execution Plan
PinotLogicalAggregate(group=[{}], agg#0=[COUNT($0)], aggType=[FINAL])
PinotLogicalExchange(distribution=[hash])
PinotLogicalAggregate(group=[{}], agg#0=[COUNT()], aggType=[LEAF])
LogicalFilter(condition=[SEARCH($5, Sarg[_UTF-8'value':VARCHAR CHARACTER SET "UTF-8"; NULL AS TRUE]:VARCHAR CHARACTER SET "UTF-8")])
PinotLogicalTableScan(table=[[default, test]])
It seems like the null as true flag gets lost in RexExpressionUtils::handleSearch because it doesn't take it into account at all. I already have a commit right here that fixes this issue so I can open a PR for it.
Queries that look for rows with either nulls or specific values for a column don't return the rows with null values.
Table schema excerpt:
{ "schemaName": "test", "enableColumnBasedNullHandling": true, "dimensionFieldSpecs": [ ... { "name": "prop", "dataType": "STRING", "fieldType": "DIMENSION" }, ... ] }Using SSE the following query returns the correct count (102493):
On the other hand, using MSE the same query doesn't return the correct count (80):
If I query for rows with the given value for prop using MSE I get the correct count (80):
If I query for rows with null values for prop using MSE I get the correct count (102413):
As can be seen from the queries above, even if I ask for rows with prop as null or with given value, MSE only returns the rows that have the given value, ignoring those with nulls.
I looked at the execution plan and it looks fine since it has NULL AS TRUE set:
It seems like the null as true flag gets lost in RexExpressionUtils::handleSearch because it doesn't take it into account at all. I already have a commit right here that fixes this issue so I can open a PR for it.