Add nomv command #5130
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new PPL command Changes
Sequence Diagram(s)sequenceDiagram
participant User as User
participant Parser as Parser / AstBuilder
participant AST as NoMv AST
participant Analyzer as Analyzer / FieldResolution
participant Calcite as CalciteRelNodeVisitor
participant Exec as Execution
User->>Parser: submit "nomv <field>" query
Parser->>AST: build NoMv(field)
AST->>Analyzer: accept(visitor) -> visitNoMv
Analyzer->>Analyzer: collect required fields (FieldResolutionVisitor)
Analyzer->>Calcite: rewrite NoMv -> Eval (rewriteAsEval) and request RelNode
Calcite->>Exec: produce plan with ARRAY_JOIN (via Eval translation)
Exec->>User: return explain or execute results
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/analysis/Analyzer.java`:
- Around line 545-548: Add JavaDoc to the public visitor method visitNoMv(NoMv
node, AnalysisContext context) in Analyzer: document the parameters (`@param` node
description, `@param` context description), the return value (`@return` LogicalPlan
describing what is returned), and the exception thrown (`@throws`
SomeExceptionType - e.g., the exception produced by
getOnlyForCalciteException("nomv")) so the method signature visitNoMv(NoMv,
AnalysisContext) and its throw behavior are covered by the JavaDoc. Ensure the
JavaDoc is placed immediately above the method declaration and uses proper
JavaDoc tags `@param`, `@return`, and `@throws`.
In `@core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java`:
- Around line 476-478: Add a JavaDoc comment to the new public visitor method
visitNoMv(NoMv node, C context) in AbstractNodeVisitor describing its purpose
and including `@param` for "node" and "context", `@return` describing the returned
T, and `@throws` for any runtime exceptions it may propagate (e.g.,
RuntimeException or specific exceptions thrown by visitChildren), ensuring the
JavaDoc follows the project's standard wording and tags.
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 3343-3346: Add JavaDoc to the public visitor method visitNoMv in
class CalciteRelNodeVisitor: document the method purpose, include `@param` entries
for the NoMv node and CalcitePlanContext context, an `@return` describing the
returned RelNode, and an `@throws` describing any runtime exceptions it may
propagate (e.g., if visitEval((Eval) node.rewriteAsEval(), context) can throw).
Reference the involved symbols in the comment (NoMv, Eval, visitEval,
CalcitePlanContext, RelNode) so readers know what is being described.
In
`@core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java`:
- Line 81: Remove the unused NOMV enum constant from the BuiltinFunctionName
enum: locate the BuiltinFunctionName declaration and delete the line defining
NOMV(FunctionName.of("nomv")), ensuring enum punctuation remains valid (adjust
trailing commas/semicolon as needed); no other code changes are required because
NoMv is rewritten to Eval(mvjoin(field, "\n")) before function resolution and
NOMV is not referenced elsewhere.
In `@docs/user/ppl/cmd/nomv.md`:
- Around line 53-61: The heading "Example 2: nomv with multiple fields" is
misleading because the nomv command accepts a single field; update the heading
to something like "Example 2: nomv with a single field" or "Example 2: nomv
usage" so it doesn't imply multi-field support, and ensure the example under the
nomv command still shows a single field (location) to match the heading;
reference the nomv command and the location field in the example when making the
change.
- Around line 11-15: Update the nomv.md documentation to explicitly state that
the target argument must be a direct field reference (not an expression,
function call, or computed value) and that the command will reject non-field
expressions; mention this constraint alongside the existing bullets about ARRAY
type, joining behavior, NULL filtering, and error-on-missing-field, and
reference Calcite's ARRAY_JOIN implementation so readers know the underlying
behavior.
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java`:
- Around line 45-178: Several test methods use underscore-based names; rename
each to follow the test<Functionality><Condition> pattern (camelCase starting
with "test"). Specifically rename: testNoMv_basicUsage_withArrayLiterals ->
testNoMvBasicUsageWithArrayLiterals, testNoMv_withArrayFromFields ->
testNoMvWithArrayFromFields, testNoMv_multipleArrays_appliedInSequence ->
testNoMvMultipleArraysAppliedInSequence,
testNoMv_inComplexPipeline_withWhereAndSort ->
testNoMvInComplexPipelineWithWhereAndSort,
testNoMv_fieldUsableInSubsequentOperations ->
testNoMvFieldUsableInSubsequentOperations, testNoMv_withStats_afterAggregation
-> testNoMvWithStatsAfterAggregation, testNoMv_withEval_worksOnComputedArrays ->
testNoMvWithEvalWorksOnComputedArrays, and testNoMv_preservesFieldInPlace ->
testNoMvPreservesFieldInPlace; update each method declaration accordingly and
ensure any references (if any) to these method names in test runners or
annotations are updated.
- Around line 210-222: Update the testNoMv_arrayWithNullValues method in
CalciteNoMvCommandIT to actually include a null element in the array passed to
eval (e.g., array('first', null, 'third')) and adjust the expected schema/data
assertion in verifySchema/verifyDataRows to reflect the null element;
additionally add a new test (e.g., testNoMv_nullFieldValue) that creates a
record with a field whose value is null and runs the same nomv/head/fields flow
to assert schema shows nullable type and verifyDataRows contains an explicit
null row entry; locate these changes around the existing
testNoMv_arrayWithNullValues, using executeQuery, verifySchema, and
verifyDataRows to update expectations.
In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java`:
- Around line 899-903: The visitNomvCommand implementation currently casts the
result of internalVisitExpression(ctx.fieldExpression()) to Field, which can
cause a ClassCastException; instead, call
internalVisitExpression(ctx.fieldExpression()), check that the returned
Expression is an instance of Field, and if not throw a SemanticCheckException
with a “non-direct reference” (4xx) message; if it is a Field, pass it to new
NoMv(field) as before. Ensure you reference the visitNomvCommand method,
internalVisitExpression, the Field type, NoMv constructor, and
SemanticCheckException in your change so the validation is explicit and fails
fast.
In `@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLNoMvTest.java`:
- Around line 199-265: Add unit tests verifying NULL handling for the
nomv/ARRAY_JOIN behavior: implement testNoMvWithNullElements and
testNoMvWithAllNullElements (similar to existing testNoMv* methods) that call
getRelNode(...) and assert plans with verifyLogical(...) and SQL with
verifyPPLToSparkSQL(...). For testNoMvWithNullElements, use ppl "source=EMP |
eval arr = array('a', null, 'b') | nomv arr | head 1 | fields arr" and assert
that ARRAY_JOIN ignores null elements (no extra delimiters) in both the logical
plan and generated Spark SQL; for testNoMvWithAllNullElements use "source=EMP |
eval arr = array(null, null) | nomv arr | head 1 | fields arr" and assert
ARRAY_JOIN produces an empty string; also add a test (or extend one) to confirm
that a null field value (arr = null) remains null in the logical plan/SQL. Use
getRelNode, verifyLogical, verifyPPLToSparkSQL and reuse expected formatting
from adjacent tests to build the expected strings.
🧹 Nitpick comments (3)
core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java (1)
33-40: Consider adding JavaDoc for public methods.Per coding guidelines for
core/src/main/java/**/*.java, public methods should have JavaDoc with@param,@return, and@throws. The constructor andattach()method would benefit from brief documentation.📝 Suggested JavaDoc additions
+ /** + * Constructs a NoMv node for the given field. + * + * `@param` field the multivalue field to convert to a single-value string + */ public NoMv(Field field) { this.field = field; } + /** + * Attaches a child plan to this node. + * + * `@param` child the child plan to attach + * `@return` this NoMv node for method chaining + */ public NoMv attach(UnresolvedPlan child) { this.child = child; return this; }integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
2722-2746: Tighten the assertions to avoid false positives.The substring checks can pass even if the plan regresses elsewhere. Consider comparing against dedicated expected-plan files (or assert key plan structure + projection) to make the test deterministic. As per coding guidelines, “Check test assertions are meaningful and specific.”
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLNoMvTest.java (1)
116-126: Good error condition test, but consider adding a non-array target test.This test correctly validates the error path for a non-existent field. However, per the PR objectives,
nomvshould also fail when applied to a non-array field. Consider adding a test like:`@Test` public void testNoMvNonArrayField() { // ENAME is a scalar VARCHAR, not an array String ppl = "source=EMP | nomv ENAME | head 1"; Exception ex = assertThrows(Exception.class, () -> getRelNode(ppl)); String msg = String.valueOf(ex.getMessage()); org.junit.Assert.assertTrue( "Expected error message to indicate non-array type. Actual: " + msg, msg.toLowerCase().contains("array") || msg.toLowerCase().contains("type")); }
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java
Outdated
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java
Show resolved
Hide resolved
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLNoMvTest.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java`:
- Around line 129-145: The test testNoMvWithStatsAfterAggregation currently only
asserts datarows length > 0; change it to assert the result contains exactly 2
rows (since the query uses head 2) and for each returned row validate the "arr"
field contains the literal "count" and also contains the age value as a string
(extract rows from result.getJSONArray("datarows") and for each row check
row.getString(2) contains "count" and String.valueOf(row.getInt(1)) or the
age_str is present), keeping existing schema checks; update assertions in the
testNoMvWithStatsAfterAggregation method accordingly to perform these precise
checks.
- Around line 271-284: The test testNoMvResultUsedInComparison currently only
asserts result.getJSONArray("datarows").length() > 0 which is too weak; update
it to assert a specific row count (e.g., equals 1 since the query uses head 1)
and assert the returned arr value is "test" by extracting the first datarow and
checking the appropriate column (fields order: account_number, arr so verify
datarows[0][1] == "test"), and optionally assert account_number is non-null or
of expected type; keep using executeQuery and result for these checks.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLNoMvTest.java`:
- Around line 45-69: Add a new multi-document test alongside testNoMvBasic that
verifies per-row behavior with more than one document: create a method (e.g.,
testNoMvMultiDocument) using the same PPL but with "head 2" instead of "head 1",
call getRelNode(ppl) and assert the logical plan shows LogicalSort(fetch=[2])
and the LogicalProject/LogicalTableScan remain correct using verifyLogical(root,
expectedLogical), and assert the generated SQL uses "LIMIT 2" in
verifyPPLToSparkSQL(root, expectedSparkSql); update the expectedLogical string
to replace fetch=[1]→fetch=[2] and the expectedSparkSql to use LIMIT 2 while
keeping the same ARRAY_JOIN expression so verifyLogical and verifyPPLToSparkSQL
validate multi-row behavior.
- Around line 150-160: Add two new unit tests in CalcitePPLNoMvTest: one (e.g.,
testNoMvNonArrayField) that runs getRelNode on a PPL query where nomv targets a
non-array field (for example after eval x = 1 | nomv x) and asserts an exception
is thrown whose message mentions "array" or "not an array"; and another (e.g.,
testNoMvNonDirectReference) that runs getRelNode on a PPL query where nomv
targets a non-direct reference (for example nomv on an expression like nomv
concat('a','b') or a computed field reference) and asserts an exception is
thrown whose message mentions "direct reference" or similar; use the existing
pattern (assertThrows, capture ex.getMessage(), and assertTrue checks) and
reference getRelNode and the nomv usage to locate where to add these tests.
- Around line 239-255: The test testNoMvEmptyArray hardcodes "\n" in
expectedSparkSql which should use the platform-agnostic LS constant; update the
expectedSparkSql string to replace the literal "\n" with the class-level LS
constant (used by verifyPPLToSparkSQL) so the assertion is consistent across
platforms and matches other tests that use LS.
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLNoMvTest.java
Show resolved
Hide resolved
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLNoMvTest.java
Show resolved
Hide resolved
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLNoMvTest.java
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java
Outdated
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> # Conflicts: # core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java # integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java # ppl/src/main/antlr/OpenSearchPPLParser.g4 # ppl/src/test/java/org/opensearch/sql/ppl/parser/FieldResolutionVisitorTest.java
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
…\n' delimiter Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
…\n' delimiter Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
…\n' delimiter Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
e71ca08 to
8632bc0
Compare
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Description
Converts values of the specified multivalue field into one single value. Separates the values using a new line "\n delimiter.
Related Issues
Resolves #5082
#5082
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.