[Feature] Support bi-directional graph traversal command graphlookup #5138
Commits included:

* succeed to graph lookup single index
* Implement graph lookup RelNode
* Refine - remove depth from BFS node
* Support bidirectional mode
* Support anonymize graph lookup
* Fix UT
* Add IT
* Add limitation for GraphLookup
* Simplify GraphLookup param names
* Refine IT
* Support value of list; Support retrieve circle edges also
* Add documentation for graph lookup
* Don't include loop edges
* Refine code
* spotlessApply
* Refine code
* Filter visited nodes in search query
* Fix UT
* Add parameter supportArray for handling fields with array values
* Remove unused code
* Support batch mode
* Close lookup table scan
* refine code
* Add param usePIT
* Struct return array value instead of string
* Support filter in GraphLookup
* Fix IT
* Add experimental tag in doc

Signed-off-by: Lantao Jin <ltjin@amazon.com> (first commit) and Heng Qian <qianheng@amazon.com> (all subsequent commits). Co-authored-by: Lantao Jin <ltjin@amazon.com>.
📝 Walkthrough (Summary by CodeRabbit)

This PR introduces a new graphLookup command for PPL to support multi-hop graph traversal queries. The implementation spans grammar rules, AST nodes, Calcite relational operators, execution engines, and comprehensive integration tests with sample graph data (employees, airports, travelers indices).
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: 4 passed, 1 failed (1 warning).
The branch was force-pushed from bfd31e0 to 5eadcc5, and again from 7d49e0d to 5eadcc5.
Actionable comments posted: 11
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java (1)
Lines 14-23: ⚠️ Potential issue | 🔴 Critical: StructImpl attributes should be recursively converted to handle nested structures.
`StructImpl` may contain nested maps, lists, or geo points. The current implementation at line 250 returns raw attributes without passing them through `processValue`, unlike the recursive handling for Maps (line 245) and Lists (line 256). This can leak unconverted types. Replace the current logic with iteration and recursive conversion:
🛠️ Proposed fix
```diff
 if (value instanceof StructImpl) {
-  return Arrays.asList(((StructImpl) value).getAttributes());
+  Object[] attributes = ((StructImpl) value).getAttributes();
+  List<Object> converted = new ArrayList<>(attributes.length);
+  for (Object attribute : attributes) {
+    converted.add(processValue(attribute));
+  }
+  return converted;
 }
```

Note: `ArrayList` is already imported; only the logic needs updating.
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/analysis/Analyzer.java`:
- Around line 550-553: Add missing JavaDoc to the public method
Analyzer.visitGraphLookup(GraphLookup node, AnalysisContext context) documenting
`@param` node, `@param` context, `@return` (LogicalPlan), and `@throws` for the
Calcite-only exception thrown by getOnlyForCalciteException("graphlookup"); then
add a unit test in the AnalyzerTest (or equivalent test class) that calls
visitGraphLookup with a mocked/constructed GraphLookup and AnalysisContext and
asserts that the getOnlyForCalciteException is thrown (verify exception type and
message contains "graphlookup"); ensure the test and JavaDoc are committed
together.
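As a rough illustration of both halves of that request (the `@throws` type and the test fixture names here are assumptions, not verified against the repository; adjust them to whatever `getOnlyForCalciteException` actually produces and to the real `AnalyzerTest` setup):

```java
/**
 * Builds the logical plan for a graphlookup command. The command is only supported by the
 * Calcite engine, so the v2 analyzer always rejects it.
 *
 * @param node the GraphLookup AST node being analyzed
 * @param context the current analysis context
 * @return never returns normally
 * @throws UnsupportedOperationException always, because graphlookup requires the Calcite engine
 */
@Override
public LogicalPlan visitGraphLookup(GraphLookup node, AnalysisContext context) {
  throw getOnlyForCalciteException("graphlookup");
}
```

```java
// In AnalyzerTest (or an equivalent test class); assumes JUnit's assertThrows and Mockito's mock.
@Test
public void graphLookup_is_rejected_without_calcite() {
  Exception e =
      assertThrows(
          Exception.class,
          () -> analyzer.visitGraphLookup(mock(GraphLookup.class), analysisContext));
  assertTrue(e.getMessage().contains("graphlookup"));
}
```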
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 2534-2536: The hard-coded builder.limit(0, 100) in
CalciteRelNodeVisitor (using context.relBuilder / builder) unconditionally
truncates graphLookup sources; remove this unconditional limit or replace it
with a configurable cap: delete the builder.limit(0, 100) call (or wrap it
behind a configuration check), add a new planner/config option (e.g.,
graphLookupSourceRowLimit) accessed from the visitor/context, and only apply
builder.limit(0, limit) when that config is set to a positive value so the
default preserves full source rows.
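For illustration, the configuration-gated variant could look roughly like this; `getGraphLookupSourceRowLimit()` is a hypothetical accessor for the new option suggested above, not an existing API:

```java
// Hypothetical config-driven cap on graphLookup source rows; a non-positive
// value (the default) means "no cap", preserving all source rows.
int sourceRowLimit = context.getGraphLookupSourceRowLimit();
if (sourceRowLimit > 0) {
  builder.limit(0, sourceRowLimit);
}
```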
In
`@core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalGraphLookup.java`:
- Around line 129-146: The public method copy() in class LogicalGraphLookup
lacks the required JavaDoc and there are no accompanying unit tests for this new
logical node; add a JavaDoc block above copy() that documents the method purpose
and includes `@param` traitSet, `@param` inputs, `@return` RelNode and any `@throws` (if
applicable), then add unit tests for LogicalGraphLookup exercising copy(),
equality, and basic plan behavior (e.g., when given two inputs and various flags
like bidirectional/supportArray/batchMode) in the same commit to satisfy core
guidelines.
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java`:
- Around line 115-152: Add boundary and invalid-parameter tests for graphLookup
by creating new test methods alongside testEmployeeHierarchyWithMaxDepth that
call executeQuery with invalid/edge values and assert proper validation errors:
(1) maxDepth=0 to verify minimum depth rejection, (2) negative maxDepth (e.g.,
-1) to verify numeric validation, (3) invalid direction value to verify
enum/domain validation, and (4) queries missing required fields (e.g., omit
startField or fromField) to verify required-field validation; use the same
helpers (executeQuery, verifySchema, verifyDataRows) where success is expected
and assert the returned JSONObject contains the appropriate error
messages/status when validation should fail.
- Around line 54-82: Add explicit NULL-input tests by creating two new test
methods in CalcitePPLGraphLookupIT: testEmployeeHierarchyNullStartValue and
testEmployeeHierarchyNullFieldParameters. For
testEmployeeHierarchyNullStartValue call executeQuery with the same graphLookup
but ensure it exercises documents whose startField value is NULL (use reportsTo
which already has a null row) and verify the reportingHierarchy for that row is
an empty array via verifyDataRows (use the existing rows/verify helpers). For
testEmployeeHierarchyNullFieldParameters call executeQuery with
invalid/empty/missing startField/fromField/toField names (e.g., empty string or
a non-existent field) and assert the current behavior: if the implementation
throws an exception assert that exception is raised, otherwise assert the query
returns rows with empty reportingHierarchy; reference the existing
testEmployeeHierarchyBasicTraversal, executeQuery, verifyDataRows, and schema
helpers to implement these checks.
- Around line 239-262: The graphLookup invocation in
testAirportConnectionsWithDepthField uses array-valued fields
(startField=connects and fromField=connects) but missing the supportArray=true
flag; update the query built in testAirportConnectionsWithDepthField (the
graphLookup operator call) to include supportArray=true so the graphLookup
optimization/visited-filter behavior is correct and consistent with other tests
that handle array-valued fields.
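Regarding the boundary and invalid-parameter tests requested for CalcitePPLGraphLookupIT above, one shape such a test could take is sketched below; the index constant, field names, and the assumption that `executeQuery` surfaces the failure as an exception are placeholders to adapt to the suite (note that a negative `maxDepth` may legitimately mean "unlimited", see the CalciteEnumerableGraphLookup comment further down, so the invalid-direction case is shown instead):

```java
@Test
public void testGraphLookupRejectsInvalidDirection() {
  // An unknown direction value should be rejected during parsing/validation
  // rather than silently falling back to a default traversal direction.
  assertThrows(
      Exception.class,
      () ->
          executeQuery(
              String.format(
                  "source=%s | graphLookup %s startField=reportsTo fromField=reportsTo"
                      + " toField=name direction=sideways as reportingHierarchy",
                  TEST_INDEX_GRAPH_EMPLOYEES, TEST_INDEX_GRAPH_EMPLOYEES)));
}
```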
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`:
- Around line 371-429: The loop increments currentLevelDepth and unconditionally
compares it to graphLookup.maxDepth, but graphLookup.maxDepth uses -1 to mean
"unlimited", so change the break condition in CalciteEnumerableGraphLookup (the
BFS loop using currentLevelDepth and graphLookup.maxDepth) to only perform the
comparison when graphLookup.maxDepth is non-negative (e.g., if
(graphLookup.maxDepth >= 0 && ++currentLevelDepth > graphLookup.maxDepth)
break), ensuring negative maxDepth skips the depth check and preserves existing
increment semantics.
- Around line 393-425: The loop over forwardResults in
CalciteEnumerableGraphLookup currently only adds rows when nextValues is
non-empty, dropping matched rows that don't expand traversal; change it to
always add the current row to results (respecting graphLookup.depthField by
appending currentLevelDepth when needed) and then use nextValues solely to
enqueue new nodes and mark visitedNodes — i.e., move the results.add(...) /
rowWithDepth logic out of the if (!nextValues.isEmpty()) block so rows are
unconditionally preserved while still only offering non-null, unvisited
nextValues to queue.offer and visitedNodes.add.
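For the maxDepth handling flagged above, the guard could be written roughly as follows (a sketch using the field names from that comment; the counter is advanced outside the condition so `depthField` bookkeeping keeps working even when the cap is disabled):

```java
// -1 (or any negative value) means "unlimited", so only enforce the cap
// when a non-negative maxDepth was requested.
currentLevelDepth++;
if (graphLookup.maxDepth >= 0 && currentLevelDepth > graphLookup.maxDepth) {
  break;
}
```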
In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java`:
- Around line 1498-1555: The visitGraphLookupCommand method currently only
enforces fromField/toField but not startField, so add a validation check before
building the GraphLookup plan: if startField is null, throw a
SemanticCheckException with a clear message (e.g., "startField must be specified
for graphLookup"); update the same validation block where fromField/toField are
checked in visitGraphLookupCommand to include startField to ensure GraphLookup
(or any later use of startField) cannot receive a null value.
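A minimal form of that guard, placed next to the existing fromField/toField checks (the exception message is illustrative):

```java
// graphLookup cannot run without a starting point; fail fast in the parser
// instead of letting a null startField reach plan construction.
if (startField == null) {
  throw new SemanticCheckException("startField must be specified for graphLookup");
}
```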
In
`@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLGraphLookupTest.java`:
- Around line 132-148: Add null, boundary, and error-condition test cases to
CalcitePPLGraphLookupTest alongside the existing testGraphLookupBidirectional:
create new `@Test` methods (e.g., testGraphLookupNullStartField,
testGraphLookupMissingRequiredField, testGraphLookupInvalidDirection,
testGraphLookupMaxDepthZero) that call getRelNode with crafted PPL strings
(null/empty startField, omit required fields, invalid direction value, explicit
maxDepth=0) and assert expected failures or specific logical plans using
verifyLogical or assertThrows as appropriate; reuse the existing helper methods
getRelNode and verifyLogical to validate correct error handling and boundary
behavior.
- Around line 150-170: The current CalcitePPLGraphLookupTest.config(...) only
sets up the schema and logical plan inputs but lacks assertions for SQL
generation and optimization; add test cases that exercise Calcite's SQL
generation and optimizer for the graphLookup scenario by invoking the
planner/validator to produce the generated SQL and the optimized plan from the
same setup (use the test class CalcitePPLGraphLookupTest and its config method
to build the SchemaPlus with the "employee" EmployeeTable), then assert the
generated SQL contains the expected graphLookup/recursive patterns and assert
the optimized plan (planner output or Programs.heuristicJoinOrder result)
contains expected optimization transformations (e.g., join order or pushed
predicates); ensure these assertions are added alongside existing logical-plan
checks so the test covers SQL generation and optimization paths.
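For the null/boundary cases requested for CalcitePPLGraphLookupTest above, a missing-required-field case could be sketched as follows; the PPL string, table name, and expected exception type are illustrative and should be aligned with what the parser actually raises:

```java
@Test
public void testGraphLookupMissingToFieldIsRejected() {
  // toField is intentionally omitted, so building the relational plan should fail.
  assertThrows(
      Exception.class,
      () -> getRelNode("source=employee | graphLookup employee fromField=manager as hierarchy"));
}
```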
🟡 Minor comments (11)
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java (lines 1816-1822)

Lines 1816-1822: ⚠️ Potential issue | 🟡 Minor: Add a null guard now that LiteralExpression is public.
With a public constructor, passing `null` will fail later in less obvious places; fail fast here (line 1820).
🔧 Suggested fix

```diff
 public LiteralExpression(RexLiteral literal) {
-  this.literal = literal;
+  this.literal = requireNonNull(literal, "literal");
 }
```

integ-test/src/test/resources/indexDefinitions/graph_travelers_index_mapping.json (lines 1-12)
Lines 1-12: ⚠️ Potential issue | 🟡 Minor: Address schema mismatch and incomplete edge-case coverage in test data.
The test data file (graph_travelers.json) contains an extra `index` field not defined in the mapping. Additionally, while the data correctly includes null values for `nearestAirport` (3 of 6 documents), it lacks multi-valued entries to test array handling. Either update the mapping to include the `index` field or remove it from the test data, and add at least one document with an array value for `nearestAirport` to exercise boundary conditions.

integ-test/src/test/resources/graph_travelers.json (lines 1-6)
Lines 1-6: ⚠️ Potential issue | 🟡 Minor: Add an edge-case traveler for a missing/unknown nearestAirport.
This dataset only includes valid single values, so null/missing start values aren't exercised. Consider adding a record that lacks `nearestAirport` (or sets it to null) to cover boundary behavior.
➕ Suggested NDJSON addition

```diff
 {"index":{"_id":"3"}}
 {"name":"Jeff","nearestAirport":"BOS"}
+{"index":{"_id":"4"}}
+{"name":"Ava","nearestAirport":null}
```

As per coding guidelines, ensure test data covers edge cases and boundary conditions.
core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (lines 480-482)

Lines 480-482: ⚠️ Potential issue | 🟡 Minor: Add JavaDoc for the new public visitor method.
The new public `visitGraphLookup` lacks the required JavaDoc with `@param`/`@return`.
✍️ Suggested JavaDoc

```diff
+  /**
+   * Visit GraphLookup node.
+   *
+   * @param node GraphLookup node
+   * @param context Context
+   * @return Return Type
+   */
   public T visitGraphLookup(GraphLookup node, C context) {
     return visitChildren(node, context);
   }
```

As per coding guidelines, public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`.

integ-test/src/test/resources/graph_airports.json (lines 1-10)
Lines 1-10: ⚠️ Potential issue | 🟡 Minor: Add a leaf airport with an empty `connects` list.
All airports currently have at least one connection; adding a no-outbound node helps test traversal boundaries.
➕ Suggested NDJSON addition

```diff
 {"index":{"_id":"5"}}
 {"airport":"LHR","connects":["PWM"]}
+{"index":{"_id":"6"}}
+{"airport":"SFO","connects":[]}
```

As per coding guidelines, ensure test data covers edge cases and boundary conditions.
integ-test/src/test/resources/graph_employees.json (lines 1-12)

Lines 1-12: ⚠️ Potential issue | 🟡 Minor: Add a cyclic/self-loop record to cover visited-node edge cases.
The dataset is a simple chain; adding a self-loop (or small cycle) will exercise visited-node/loop-edge handling in graph traversal tests.
As per coding guidelines: verify test data covers edge cases and boundary conditions; verify test data is realistic and representative.
💡 Suggested test data addition

```diff
@@
 {"index":{"_id":"6"}}
 {"id":6,"name":"Dan","reportsTo":"Andrew"}
+{"index":{"_id":"7"}}
+{"id":7,"name":"Loop","reportsTo":"Loop"}
```

docs/user/ppl/cmd/graphlookup.md (lines 33-36)
Lines 33-36: ⚠️ Potential issue | 🟡 Minor: Fix typos/grammar in the parameter descriptions.
Spelling and hyphenation issues reduce clarity in the parameter table.
As per coding guidelines: check for consistent formatting and style.
✍️ Suggested edits

```diff
-| `startField=<startField>` | Required | The field in the source documents whose value is used to start the recursive search. The value of this field is matched against `toField` in the lookup index. We support both single value and array values as starting points. |
+| `startField=<startField>` | Required | The field in the source documents whose value is used to start the recursive search. The value of this field is matched against `toField` in the lookup index. We support both single-value and array values as starting points. |
-| `maxDepth=<maxDepth>` | Optional | The maximum recursion depth of hops. Default is `0`. A value of `0` means only the direct connections to the statr values are returned. A value of `1` means 1 hop connections (initial match plus one recursive step), and so on. |
+| `maxDepth=<maxDepth>` | Optional | The maximum recursion depth of hops. Default is `0`. A value of `0` means only the direct connections to the start values are returned. A value of `1` means 1 hop connections (initial match plus one recursive step), and so on. |
```

core/src/main/java/org/opensearch/sql/calcite/plan/rel/GraphLookup.java (lines 109-190)
Lines 109-190: ⚠️ Potential issue | 🟡 Minor: Add JavaDoc tags for GraphLookup public APIs.
`getSource`/`getLookup`/`copy`/`estimateRowCount`/`explainTerms` are public but lack the `@param`/`@return`/`@throws` tags required by core guidelines.
As per coding guidelines: public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`.

ppl/src/main/antlr/OpenSearchPPLParser.g4 (lines 94-95)
Lines 94-95: ⚠️ Potential issue | 🟡 Minor: Add missing graphLookup option keywords to searchableKeyWord.
START_FIELD, SUPPORT_ARRAY, BATCH_MODE, and USE_PIT are defined as lexer tokens and used in graphLookupOption (lines 636–651) but are not added to searchableKeyWord, unlike FROM_FIELD and TO_FIELD (lines 1716–1723). Since the lexer tokenizes them as keywords, field names with these values will fail or require quoting. Add all four to searchableKeyWord for consistency.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java (lines 392-422)

Lines 392-422: ⚠️ Potential issue | 🟡 Minor: Update the Javadoc to match the asserted behavior.
The comment says `allConnections` is empty due to the array type, but the assertions expect non-empty results. Please align the comment with the actual expectation.
📝 Suggested comment update

```diff
-  /**
-   * Test 12: Bidirectional airport connections for ORD. Note: Currently returns empty
-   * allConnections array because the connects field is an array type.
-   */
+  /**
+   * Test 12: Bidirectional airport connections for ORD.
+   */
```

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java (lines 666-700)
Lines 666-700: ⚠️ Potential issue | 🟡 Minor: Fix the maxDepth comment mismatch.
The comment says `maxDepth=1`, but the query uses `maxDepth=3`. Please update the comment to reflect the actual parameter.
📝 Suggested comment update

```diff
- * travelers' nearest airports: JFK (Dev, Eliot), BOS (Jeff) Unified BFS from {JFK, BOS} with
- * maxDepth=1 finds connected airports.
+ * travelers' nearest airports: JFK (Dev, Eliot), BOS (Jeff) Unified BFS from {JFK, BOS} with
+ * maxDepth=3 finds connected airports.
```
🧹 Nitpick comments (4)
core/src/main/java/org/opensearch/sql/ast/tree/GraphLookup.java (1)
Lines 29-83: Prefer AST immutability by removing the class-level `@Setter`.
`attach` already mutates the child; exposing `setChild` weakens immutability without clear need. As per coding guidelines: `core/src/main/java/org/opensearch/sql/ast/**/*.java` requires AST nodes to be immutable where possible.
♻️ Suggested change

```diff
-import lombok.Setter;
 @@
-@Setter
 @ToString
```
Lines 646-712: Add anonymizer coverage for Mongo-style graphLookup and `usePIT`. The test only exercises the `fromField`/`toField` syntax. The alternate `startWith`/`connectFromField`/`connectToField` form and the `usePIT` flag aren't covered yet.
✅ Suggested test additions

```diff
 // graphLookup with compound filter
 assertEquals(
     "source=table | graphlookup table fromField=identifier toField=identifier"
         + " direction=uni filter=(identifier = *** and identifier > ***) as identifier",
     anonymize(
         "source=t | graphLookup employees fromField=manager toField=name"
             + " filter=(status = 'active' AND id > 2) as reportingHierarchy"));
+
+// graphLookup with Mongo-style syntax
+assertEquals(
+    "source=table | graphlookup table startWith=*** connectFromField=identifier"
+        + " connectToField=identifier direction=uni as identifier",
+    anonymize(
+        "source=t | graphLookup employees startWith='CEO' connectFromField=manager"
+            + " connectToField=name as reportingHierarchy"));
+
+// graphLookup with usePIT
+assertEquals(
+    "source=table | graphlookup table fromField=identifier toField=identifier"
+        + " direction=uni usePIT=true as identifier",
+    anonymize(
+        "source=t | graphLookup employees fromField=manager toField=name"
+            + " usePIT=true as reportingHierarchy"));
```
Lines 1648-1718: Add parser coverage for Mongo-style syntax and optional flags. The test validates required fields and some options, but doesn't cover the `startWith`/`connectFromField`/`connectToField` variant (or flags like `supportArray`, `batchMode`, `usePIT`). Adding at least one positive case for that syntax would better exercise the parser path.
As per coding guidelines, include edge cases and boundary conditions in parser and grammar tests.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
Lines 2530-2589: Extract visitGraphLookup into helper methods. This method is now over 50 lines; splitting parameter extraction, source/lookup materialization, and RelNode construction would keep the visitor readable.
As per coding guidelines: core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: Flag methods >50 lines - this file is known to be hard to read; Suggest extracting complex logic into helper methods.
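One possible decomposition, purely as a sketch: `GraphLookupParams` and the helper names below are made up for illustration and do not exist in the codebase.

```java
@Override
public RelNode visitGraphLookup(GraphLookup node, CalcitePlanContext context) {
  GraphLookupParams params = extractGraphLookupParams(node);   // parse options, fields, flags
  RelNode source = buildSourceInput(node, context);            // visit the child plan (plus any row cap)
  RelNode lookup = buildLookupInput(node, context);            // scan of the lookup index
  return assembleGraphLookup(params, source, lookup, context); // construct the LogicalGraphLookup node
}
```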
The flagged code in CalciteRelNodeVisitor.java:

```java
RelBuilder builder = context.relBuilder;
// TODO: Limit the number of source rows to 100 for now, make it configurable.
builder.limit(0, 100);
```

Remove the hard-coded 100-row cap on graphLookup sources.
The unconditional limit truncates the start set and will silently drop traversal results once there are more than 100 source rows. Please make this configurable or remove it to preserve correctness.
🛠️ Suggested change

```diff
-// TODO: Limit the number of source rows to 100 for now, make it configurable.
-builder.limit(0, 100);
```
🤖 Prompt for AI Agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 2534 - 2536, The hard-coded builder.limit(0, 100) in
CalciteRelNodeVisitor (using context.relBuilder / builder) unconditionally
truncates graphLookup sources; remove this unconditional limit or replace it
with a configurable cap: delete the builder.limit(0, 100) call (or wrap it
behind a configuration check), add a new planner/config option (e.g.,
graphLookupSourceRowLimit) accessed from the visitor/context, and only apply
builder.limit(0, limit) when that config is set to a positive value so the
default preserves full source rows.
Additional review comment threads (collapsed as resolved) on:
- core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalGraphLookup.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java (two comments)
- opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java
The flagged BFS loop in CalciteEnumerableGraphLookup.java (excerpt as shown in the diff):

```java
for (Object row : forwardResults) {
  Object[] rowArray = (Object[]) (row);
  Object fromValue = rowArray[fromFieldIdx];
  // Collect next values to traverse (may be single value or list)
  // For forward traversal: extract fromField values for next level
  // For bidirectional: also extract toField values.
  // Skip visited values while keep null value
  List<Object> nextValues = new ArrayList<>();
  collectValues(fromValue, nextValues, visitedNodes);
  if (graphLookup.bidirectional) {
    Object toValue = rowArray[toFieldIdx];
    collectValues(toValue, nextValues, visitedNodes);
  }

  // Add row to results if the nextValues is not empty
  if (!nextValues.isEmpty()) {
    if (graphLookup.depthField != null) {
      Object[] rowWithDepth = new Object[rowArray.length + 1];
      System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
      rowWithDepth[rowArray.length] = currentLevelDepth;
      results.add(rowWithDepth);
    } else {
      results.add(rowArray);
    }

    // Add unvisited non-null values to queue for next level traversal
    for (Object val : nextValues) {
      if (val != null) {
        visitedNodes.add(val);
        queue.offer(val);
      }
    }
  }
```
Include matched rows even when they don’t expand traversal.
Rows are only added when nextValues is non-empty, which can drop reachable docs when multiple start values or converging edges yield already-visited next nodes. Add the row unconditionally and use nextValues only for queueing.
🛠️ Suggested change
- // Add row to results if the nextValues is not empty
- if (!nextValues.isEmpty()) {
- if (graphLookup.depthField != null) {
- Object[] rowWithDepth = new Object[rowArray.length + 1];
- System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
- rowWithDepth[rowArray.length] = currentLevelDepth;
- results.add(rowWithDepth);
- } else {
- results.add(rowArray);
- }
-
- // Add unvisited non-null values to queue for next level traversal
- for (Object val : nextValues) {
- if (val != null) {
- visitedNodes.add(val);
- queue.offer(val);
- }
- }
- }
+ // Always include the matched row; use nextValues only for expansion.
+ if (graphLookup.depthField != null) {
+ Object[] rowWithDepth = new Object[rowArray.length + 1];
+ System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
+ rowWithDepth[rowArray.length] = currentLevelDepth;
+ results.add(rowWithDepth);
+ } else {
+ results.add(rowArray);
+ }
+
+ // Add unvisited non-null values to queue for next level traversal
+ for (Object val : nextValues) {
+ if (val != null) {
+ visitedNodes.add(val);
+ queue.offer(val);
+ }
+ }🤖 Prompt for AI Agents
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`
around lines 393 - 425, The loop over forwardResults in
CalciteEnumerableGraphLookup currently only adds rows when nextValues is
non-empty, dropping matched rows that don't expand traversal; change it to
always add the current row to results (respecting graphLookup.depthField by
appending currentLevelDepth when needed) and then use nextValues solely to
enqueue new nodes and mark visitedNodes — i.e., move the results.add(...) /
rowWithDepth logic out of the if (!nextValues.isEmpty()) block so rows are
unconditionally preserved while still only offering non-null, unvisited
nextValues to queue.offer and visitedNodes.add.
Additional review comment threads (collapsed as resolved) on ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLGraphLookupTest.java (two comments).
@penghuo can you review and merge this?
Description
The `graphlookup` command performs recursive graph traversal on OpenSearch indices using a breadth-first search (BFS) algorithm.

Syntax
Option 1
Example Query
Option 2 (follow MongoDB naming convention)
Example Query
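The example blocks above did not survive extraction from the PR page. Purely as an illustration, the two option styles correspond to queries of the following shape, mirroring the ones exercised by the anonymizer and integration tests elsewhere in this review (the index and field names are sample data from this PR, not a normative reference):

```java
// Option 1 style: fromField/toField naming.
String option1 =
    "source=graph_employees | graphLookup graph_employees startField=reportsTo"
        + " fromField=reportsTo toField=name as reportingHierarchy";

// Option 2 style: MongoDB-like startWith/connectFromField/connectToField naming.
String option2 =
    "source=graph_employees | graphLookup graph_employees startWith='CEO'"
        + " connectFromField=manager connectToField=name as reportingHierarchy";
```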
Performance
The OpenSearch `graphlookup` command has implemented several features which make it more applicable to RAG scenarios as well as suitable for different requirements:

- With `direction=uni` and `usePIT=true`, OpenSearch `graphlookup` achieves the same result as MongoDB's `$graphLookup` aggregation command while running 5-6x faster on the SNB SF-1 dataset and 29x faster on SNB SF-10. OpenSearch can finish all benchmarks on SNB SF-10 in 734 ms and on SNB SF-30 in 8000 ms, while MongoDB fails to complete.
- Batch mode (`batchMode`) is more compatible with the requirements of RAG. Apart from the final results already being deduplicated, it is also 58x faster than MongoDB on the SNB SF-1 dataset, and OpenSearch can finish the multiple-start-value benchmark on SNB SF-10 in 3508 ms and on SNB SF-30 in 10992 ms, while MongoDB fails to complete.
- Both `PIT` and `non-PIT` modes are supported; the latter gets better performance on large datasets by sacrificing accuracy. It is 5x faster than PIT mode on SNB SF-30 but misses about 27% of nodes.
Resolves #5088
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.