Skip to content

Add nomv command #5130

Open
srikanthpadakanti wants to merge 7 commits intoopensearch-project:mainfrom
srikanthpadakanti:feature/nomv
Open

Add nomv command #5130
srikanthpadakanti wants to merge 7 commits intoopensearch-project:mainfrom
srikanthpadakanti:feature/nomv

Conversation

@srikanthpadakanti
Copy link
Contributor

Description

Converts values of the specified multivalue field into one single value. Separates the values using a new line "\n delimiter.

Related Issues

Resolves #5082
#5082

Check List

  • [ X ] New functionality includes testing.
  • [ X ] New functionality has been documented.
  • [ X ] New functionality has javadoc added.
  • [ X ] New functionality has a user manual doc added.
  • [ X ] New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • [ X ] Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new PPL command nomv: lexer/parser rule, AST node NoMv, parser builder, visitors across analysis and Calcite (including rewrite-to-Eval), anonymizer support, documentation, and extensive unit/integration tests with explain-plan expectations.

Changes

Cohort / File(s) Summary
Grammar & Lexer
ppl/src/main/antlr/OpenSearchPPLLexer.g4, ppl/src/main/antlr/OpenSearchPPLParser.g4
Add NOMV token and nomvCommand rule; include nomvCommand in commands and commandName alternatives.
AST Node
core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java
New NoMv AST node holding a Field and optional child; supports attach/getChild, visitor accept, and rewriteAsEval() that produces an Eval representing ARRAY_JOIN/mvjoin semantics.
Parser & Builder
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
Add visitor to construct NoMv from parsed nomv command (maps fieldExpressionNoMv).
Core Visitors / Analysis
core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java, core/src/main/java/org/opensearch/sql/analysis/Analyzer.java, core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java
Add visitNoMv implementations: generic passthrough in AbstractNodeVisitor, Analyzer delegating to Calcite-exception mechanism, and FieldResolutionVisitor updating required fields from the NoMv field.
Calcite Integration
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Add visitNoMv that rewrites NoMv to Eval (rewriteAsEval()) and delegates to existing Eval handling to produce ARRAY_JOIN in plans.
PPL Anonymization
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
Add visitNoMv to anonymize nomv command segments (mirrors mvcombine handling).
Docs & TOC
docs/user/ppl/cmd/nomv.md, docs/user/ppl/index.md, docs/category.json
Add nomv documentation (syntax, semantics, examples, error cases) and register the page in docs/category.json and the PPL commands index.
Unit Tests (PPL / Parser / Anonymizer)
ppl/src/test/java/.../CalcitePPLNoMvTest.java, ppl/src/test/java/.../FieldResolutionVisitorTest.java, ppl/src/test/java/.../PPLQueryDataAnonymizerTest.java
Add comprehensive unit tests covering logical plan generation, field resolution effects, anonymization, edge cases, and error scenarios for nomv.
Integration Tests & Resources
integ-test/src/test/java/.../CalciteNoMvCommandIT.java, integ-test/src/test/java/.../CalciteExplainIT.java, integ-test/src/test/java/.../CalciteCrossClusterSearchIT.java, integ-test/src/test/java/.../NewAddedCommandsIT.java, integ-test/src/test/resources/.../explain_nomv.yaml, integ-test/src/test/resources/.../calcite_no_pushdown/explain_nomv.yaml
Add new integration test class and test cases for nomv, include them in test suites, and add expected explain-plan YAMLs for pushdown and no-pushdown cases.

Sequence Diagram(s)

sequenceDiagram
  participant User as User
  participant Parser as Parser / AstBuilder
  participant AST as NoMv AST
  participant Analyzer as Analyzer / FieldResolution
  participant Calcite as CalciteRelNodeVisitor
  participant Exec as Execution

  User->>Parser: submit "nomv <field>" query
  Parser->>AST: build NoMv(field)
  AST->>Analyzer: accept(visitor) -> visitNoMv
  Analyzer->>Analyzer: collect required fields (FieldResolutionVisitor)
  Analyzer->>Calcite: rewrite NoMv -> Eval (rewriteAsEval) and request RelNode
  Calcite->>Exec: produce plan with ARRAY_JOIN (via Eval translation)
  Exec->>User: return explain or execute results
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

PPL, feature, calcite, backport 2.19-dev

Suggested reviewers

  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • anirudha
  • LantaoJin
  • ykmr1224
  • penghuo
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.27% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add nomv command' clearly and directly describes the main change in the PR, which is the addition of a new nomv command to the PPL language.
Description check ✅ Passed The description accurately explains that the PR adds a nomv command for converting multivalue field values into a single string using newline delimiters, and directly relates to the implementation shown in the changeset.
Linked Issues check ✅ Passed The PR implementation fully meets the requirements from issue #5082: adds nomv command with correct syntax, field validation, row-local transformation semantics, null/empty array handling, proper error conditions, and comprehensive tests and documentation.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing the nomv command as specified in issue #5082. Grammar, AST, parser, analysis, execution, documentation, and test additions are all within scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/analysis/Analyzer.java`:
- Around line 545-548: Add JavaDoc to the public visitor method visitNoMv(NoMv
node, AnalysisContext context) in Analyzer: document the parameters (`@param` node
description, `@param` context description), the return value (`@return` LogicalPlan
describing what is returned), and the exception thrown (`@throws`
SomeExceptionType - e.g., the exception produced by
getOnlyForCalciteException("nomv")) so the method signature visitNoMv(NoMv,
AnalysisContext) and its throw behavior are covered by the JavaDoc. Ensure the
JavaDoc is placed immediately above the method declaration and uses proper
JavaDoc tags `@param`, `@return`, and `@throws`.

In `@core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java`:
- Around line 476-478: Add a JavaDoc comment to the new public visitor method
visitNoMv(NoMv node, C context) in AbstractNodeVisitor describing its purpose
and including `@param` for "node" and "context", `@return` describing the returned
T, and `@throws` for any runtime exceptions it may propagate (e.g.,
RuntimeException or specific exceptions thrown by visitChildren), ensuring the
JavaDoc follows the project's standard wording and tags.

In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 3343-3346: Add JavaDoc to the public visitor method visitNoMv in
class CalciteRelNodeVisitor: document the method purpose, include `@param` entries
for the NoMv node and CalcitePlanContext context, an `@return` describing the
returned RelNode, and an `@throws` describing any runtime exceptions it may
propagate (e.g., if visitEval((Eval) node.rewriteAsEval(), context) can throw).
Reference the involved symbols in the comment (NoMv, Eval, visitEval,
CalcitePlanContext, RelNode) so readers know what is being described.

In
`@core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java`:
- Line 81: Remove the unused NOMV enum constant from the BuiltinFunctionName
enum: locate the BuiltinFunctionName declaration and delete the line defining
NOMV(FunctionName.of("nomv")), ensuring enum punctuation remains valid (adjust
trailing commas/semicolon as needed); no other code changes are required because
NoMv is rewritten to Eval(mvjoin(field, "\n")) before function resolution and
NOMV is not referenced elsewhere.

In `@docs/user/ppl/cmd/nomv.md`:
- Around line 53-61: The heading "Example 2: nomv with multiple fields" is
misleading because the nomv command accepts a single field; update the heading
to something like "Example 2: nomv with a single field" or "Example 2: nomv
usage" so it doesn't imply multi-field support, and ensure the example under the
nomv command still shows a single field (location) to match the heading;
reference the nomv command and the location field in the example when making the
change.
- Around line 11-15: Update the nomv.md documentation to explicitly state that
the target argument must be a direct field reference (not an expression,
function call, or computed value) and that the command will reject non-field
expressions; mention this constraint alongside the existing bullets about ARRAY
type, joining behavior, NULL filtering, and error-on-missing-field, and
reference Calcite's ARRAY_JOIN implementation so readers know the underlying
behavior.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java`:
- Around line 45-178: Several test methods use underscore-based names; rename
each to follow the test<Functionality><Condition> pattern (camelCase starting
with "test"). Specifically rename: testNoMv_basicUsage_withArrayLiterals ->
testNoMvBasicUsageWithArrayLiterals, testNoMv_withArrayFromFields ->
testNoMvWithArrayFromFields, testNoMv_multipleArrays_appliedInSequence ->
testNoMvMultipleArraysAppliedInSequence,
testNoMv_inComplexPipeline_withWhereAndSort ->
testNoMvInComplexPipelineWithWhereAndSort,
testNoMv_fieldUsableInSubsequentOperations ->
testNoMvFieldUsableInSubsequentOperations, testNoMv_withStats_afterAggregation
-> testNoMvWithStatsAfterAggregation, testNoMv_withEval_worksOnComputedArrays ->
testNoMvWithEvalWorksOnComputedArrays, and testNoMv_preservesFieldInPlace ->
testNoMvPreservesFieldInPlace; update each method declaration accordingly and
ensure any references (if any) to these method names in test runners or
annotations are updated.
- Around line 210-222: Update the testNoMv_arrayWithNullValues method in
CalciteNoMvCommandIT to actually include a null element in the array passed to
eval (e.g., array('first', null, 'third')) and adjust the expected schema/data
assertion in verifySchema/verifyDataRows to reflect the null element;
additionally add a new test (e.g., testNoMv_nullFieldValue) that creates a
record with a field whose value is null and runs the same nomv/head/fields flow
to assert schema shows nullable type and verifyDataRows contains an explicit
null row entry; locate these changes around the existing
testNoMv_arrayWithNullValues, using executeQuery, verifySchema, and
verifyDataRows to update expectations.

In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java`:
- Around line 899-903: The visitNomvCommand implementation currently casts the
result of internalVisitExpression(ctx.fieldExpression()) to Field, which can
cause a ClassCastException; instead, call
internalVisitExpression(ctx.fieldExpression()), check that the returned
Expression is an instance of Field, and if not throw a SemanticCheckException
with a “non-direct reference” (4xx) message; if it is a Field, pass it to new
NoMv(field) as before. Ensure you reference the visitNomvCommand method,
internalVisitExpression, the Field type, NoMv constructor, and
SemanticCheckException in your change so the validation is explicit and fails
fast.

In `@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLNoMvTest.java`:
- Around line 199-265: Add unit tests verifying NULL handling for the
nomv/ARRAY_JOIN behavior: implement testNoMvWithNullElements and
testNoMvWithAllNullElements (similar to existing testNoMv* methods) that call
getRelNode(...) and assert plans with verifyLogical(...) and SQL with
verifyPPLToSparkSQL(...). For testNoMvWithNullElements, use ppl "source=EMP |
eval arr = array('a', null, 'b') | nomv arr | head 1 | fields arr" and assert
that ARRAY_JOIN ignores null elements (no extra delimiters) in both the logical
plan and generated Spark SQL; for testNoMvWithAllNullElements use "source=EMP |
eval arr = array(null, null) | nomv arr | head 1 | fields arr" and assert
ARRAY_JOIN produces an empty string; also add a test (or extend one) to confirm
that a null field value (arr = null) remains null in the logical plan/SQL. Use
getRelNode, verifyLogical, verifyPPLToSparkSQL and reuse expected formatting
from adjacent tests to build the expected strings.
🧹 Nitpick comments (3)
core/src/main/java/org/opensearch/sql/ast/tree/NoMv.java (1)

33-40: Consider adding JavaDoc for public methods.

Per coding guidelines for core/src/main/java/**/*.java, public methods should have JavaDoc with @param, @return, and @throws. The constructor and attach() method would benefit from brief documentation.

📝 Suggested JavaDoc additions
+  /**
+   * Constructs a NoMv node for the given field.
+   *
+   * `@param` field the multivalue field to convert to a single-value string
+   */
   public NoMv(Field field) {
     this.field = field;
   }

+  /**
+   * Attaches a child plan to this node.
+   *
+   * `@param` child the child plan to attach
+   * `@return` this NoMv node for method chaining
+   */
   public NoMv attach(UnresolvedPlan child) {
     this.child = child;
     return this;
   }
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

2722-2746: Tighten the assertions to avoid false positives.

The substring checks can pass even if the plan regresses elsewhere. Consider comparing against dedicated expected-plan files (or assert key plan structure + projection) to make the test deterministic. As per coding guidelines, “Check test assertions are meaningful and specific.”

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLNoMvTest.java (1)

116-126: Good error condition test, but consider adding a non-array target test.

This test correctly validates the error path for a non-existent field. However, per the PR objectives, nomv should also fail when applied to a non-array field. Consider adding a test like:

`@Test`
public void testNoMvNonArrayField() {
  // ENAME is a scalar VARCHAR, not an array
  String ppl = "source=EMP | nomv ENAME | head 1";
  
  Exception ex = assertThrows(Exception.class, () -> getRelNode(ppl));
  
  String msg = String.valueOf(ex.getMessage());
  org.junit.Assert.assertTrue(
      "Expected error message to indicate non-array type. Actual: " + msg,
      msg.toLowerCase().contains("array") || msg.toLowerCase().contains("type"));
}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteNoMvCommandIT.java`:
- Around line 129-145: The test testNoMvWithStatsAfterAggregation currently only
asserts datarows length > 0; change it to assert the result contains exactly 2
rows (since the query uses head 2) and for each returned row validate the "arr"
field contains the literal "count" and also contains the age value as a string
(extract rows from result.getJSONArray("datarows") and for each row check
row.getString(2) contains "count" and String.valueOf(row.getInt(1)) or the
age_str is present), keeping existing schema checks; update assertions in the
testNoMvWithStatsAfterAggregation method accordingly to perform these precise
checks.
- Around line 271-284: The test testNoMvResultUsedInComparison currently only
asserts result.getJSONArray("datarows").length() > 0 which is too weak; update
it to assert a specific row count (e.g., equals 1 since the query uses head 1)
and assert the returned arr value is "test" by extracting the first datarow and
checking the appropriate column (fields order: account_number, arr so verify
datarows[0][1] == "test"), and optionally assert account_number is non-null or
of expected type; keep using executeQuery and result for these checks.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLNoMvTest.java`:
- Around line 45-69: Add a new multi-document test alongside testNoMvBasic that
verifies per-row behavior with more than one document: create a method (e.g.,
testNoMvMultiDocument) using the same PPL but with "head 2" instead of "head 1",
call getRelNode(ppl) and assert the logical plan shows LogicalSort(fetch=[2])
and the LogicalProject/LogicalTableScan remain correct using verifyLogical(root,
expectedLogical), and assert the generated SQL uses "LIMIT 2" in
verifyPPLToSparkSQL(root, expectedSparkSql); update the expectedLogical string
to replace fetch=[1]→fetch=[2] and the expectedSparkSql to use LIMIT 2 while
keeping the same ARRAY_JOIN expression so verifyLogical and verifyPPLToSparkSQL
validate multi-row behavior.
- Around line 150-160: Add two new unit tests in CalcitePPLNoMvTest: one (e.g.,
testNoMvNonArrayField) that runs getRelNode on a PPL query where nomv targets a
non-array field (for example after eval x = 1 | nomv x) and asserts an exception
is thrown whose message mentions "array" or "not an array"; and another (e.g.,
testNoMvNonDirectReference) that runs getRelNode on a PPL query where nomv
targets a non-direct reference (for example nomv on an expression like nomv
concat('a','b') or a computed field reference) and asserts an exception is
thrown whose message mentions "direct reference" or similar; use the existing
pattern (assertThrows, capture ex.getMessage(), and assertTrue checks) and
reference getRelNode and the nomv usage to locate where to add these tests.
- Around line 239-255: The test testNoMvEmptyArray hardcodes "\n" in
expectedSparkSql which should use the platform-agnostic LS constant; update the
expectedSparkSql string to replace the literal "\n" with the class-level LS
constant (used by verifyPPLToSparkSQL) so the assertion is consistent across
platforms and matches other tests that use LS.

@mengweieric mengweieric added enhancement New feature or request PPL Piped processing language backport 2.19-dev labels Feb 10, 2026
Srikanth Padakanti added 6 commits February 12, 2026 21:47
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java
#	integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java
#	ppl/src/main/antlr/OpenSearchPPLParser.g4
#	ppl/src/test/java/org/opensearch/sql/ppl/parser/FieldResolutionVisitorTest.java
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
…\n' delimiter

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
…\n' delimiter

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
…\n' delimiter

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC] Add nomv command in PPL

3 participants