[Feature] Support bi-directional graph traversal command graphlookup #5138
Commits included:

* succeed to graph lookup single index
* Implement graph lookup RelNode
* Refine - remove depth from BFS node
* Support bidirectional mode
* Support anonymize graph lookup
* Fix UT
* Add IT
* Add limitation for GraphLookup
* Simplify GraphLookup param names
* Refine IT
* Support value of list; Support retrieve circle edges also
* Add documentation for graph lookup
* Don't include loop edges
* Refine code
* spotlessApply
* Refine code
* Filter visited nodes in search query
* Fix UT
* Add parameter supportArray for handling fields with array values
* Remove unused code
* Support batch mode
* Close lookup table scan
* refine code
* Add param usePIT
* Struct return array value instead of string
* Support filter in GraphLookup
* Fix IT
* Add experimental tag in doc

Signed-off-by: Lantao Jin <ltjin@amazon.com> (first commit) and Heng Qian <qianheng@amazon.com> (all subsequent commits). Co-authored-by: Lantao Jin <ltjin@amazon.com>.
📝 Walkthrough (Summary by CodeRabbit)

This PR introduces a new graphLookup command for PPL to support multi-hop graph traversal queries. The implementation spans grammar rules, AST nodes, Calcite relational operators, execution engines, and comprehensive integration tests with sample graph data (employees, airports, travelers indices).
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: 4 passed, 1 failed (1 warning).
The branch was force-pushed from bfd31e0 to 5eadcc5, and again from 7d49e0d to 5eadcc5.
Actionable comments posted: 11
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java (1)
Lines 14-23: ⚠️ Potential issue | 🔴 Critical: StructImpl attributes should be recursively converted to handle nested structures.
`StructImpl` may contain nested maps, lists, or geo points. The current implementation at line 250 returns raw attributes without passing them through `processValue`, unlike the recursive handling for Maps (line 245) and Lists (line 256). This can leak unconverted types. Replace the current logic with iteration and recursive conversion:
🛠️ Proposed fix
```diff
 if (value instanceof StructImpl) {
-  return Arrays.asList(((StructImpl) value).getAttributes());
+  Object[] attributes = ((StructImpl) value).getAttributes();
+  List<Object> converted = new ArrayList<>(attributes.length);
+  for (Object attribute : attributes) {
+    converted.add(processValue(attribute));
+  }
+  return converted;
 }
```

Note: `ArrayList` is already imported; only the logic needs updating.
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/analysis/Analyzer.java`:
- Around line 550-553: Add missing JavaDoc to the public method
Analyzer.visitGraphLookup(GraphLookup node, AnalysisContext context) documenting
`@param` node, `@param` context, `@return` (LogicalPlan), and `@throws` for the
Calcite-only exception thrown by getOnlyForCalciteException("graphlookup"); then
add a unit test in the AnalyzerTest (or equivalent test class) that calls
visitGraphLookup with a mocked/constructed GraphLookup and AnalysisContext and
asserts that the getOnlyForCalciteException is thrown (verify exception type and
message contains "graphlookup"); ensure the test and JavaDoc are committed
together.
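As a rough illustration of both halves of that request (the `@throws` type and the test fixture names here are assumptions, not verified against the repository; adjust them to whatever `getOnlyForCalciteException` actually produces and to the real `AnalyzerTest` setup):

```java
/**
 * Builds the logical plan for a graphlookup command. The command is only supported by the
 * Calcite engine, so the v2 analyzer always rejects it.
 *
 * @param node the GraphLookup AST node being analyzed
 * @param context the current analysis context
 * @return never returns normally
 * @throws UnsupportedOperationException always, because graphlookup requires the Calcite engine
 */
@Override
public LogicalPlan visitGraphLookup(GraphLookup node, AnalysisContext context) {
  throw getOnlyForCalciteException("graphlookup");
}
```

```java
// In AnalyzerTest (or an equivalent test class); assumes JUnit's assertThrows and Mockito's mock.
@Test
public void graphLookup_is_rejected_without_calcite() {
  Exception e =
      assertThrows(
          Exception.class,
          () -> analyzer.visitGraphLookup(mock(GraphLookup.class), analysisContext));
  assertTrue(e.getMessage().contains("graphlookup"));
}
```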
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 2534-2536: The hard-coded builder.limit(0, 100) in
CalciteRelNodeVisitor (using context.relBuilder / builder) unconditionally
truncates graphLookup sources; remove this unconditional limit or replace it
with a configurable cap: delete the builder.limit(0, 100) call (or wrap it
behind a configuration check), add a new planner/config option (e.g.,
graphLookupSourceRowLimit) accessed from the visitor/context, and only apply
builder.limit(0, limit) when that config is set to a positive value so the
default preserves full source rows.
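For illustration, the configuration-gated variant could look roughly like this; `getGraphLookupSourceRowLimit()` is a hypothetical accessor for the new option suggested above, not an existing API:

```java
// Hypothetical config-driven cap on graphLookup source rows; a non-positive
// value (the default) means "no cap", preserving all source rows.
int sourceRowLimit = context.getGraphLookupSourceRowLimit();
if (sourceRowLimit > 0) {
  builder.limit(0, sourceRowLimit);
}
```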
In
`@core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalGraphLookup.java`:
- Around line 129-146: The public method copy() in class LogicalGraphLookup
lacks the required JavaDoc and there are no accompanying unit tests for this new
logical node; add a JavaDoc block above copy() that documents the method purpose
and includes `@param` traitSet, `@param` inputs, `@return` RelNode and any `@throws` (if
applicable), then add unit tests for LogicalGraphLookup exercising copy(),
equality, and basic plan behavior (e.g., when given two inputs and various flags
like bidirectional/supportArray/batchMode) in the same commit to satisfy core
guidelines.
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java`:
- Around line 115-152: Add boundary and invalid-parameter tests for graphLookup
by creating new test methods alongside testEmployeeHierarchyWithMaxDepth that
call executeQuery with invalid/edge values and assert proper validation errors:
(1) maxDepth=0 to verify minimum depth rejection, (2) negative maxDepth (e.g.,
-1) to verify numeric validation, (3) invalid direction value to verify
enum/domain validation, and (4) queries missing required fields (e.g., omit
startField or fromField) to verify required-field validation; use the same
helpers (executeQuery, verifySchema, verifyDataRows) where success is expected
and assert the returned JSONObject contains the appropriate error
messages/status when validation should fail.
- Around line 54-82: Add explicit NULL-input tests by creating two new test
methods in CalcitePPLGraphLookupIT: testEmployeeHierarchyNullStartValue and
testEmployeeHierarchyNullFieldParameters. For
testEmployeeHierarchyNullStartValue call executeQuery with the same graphLookup
but ensure it exercises documents whose startField value is NULL (use reportsTo
which already has a null row) and verify the reportingHierarchy for that row is
an empty array via verifyDataRows (use the existing rows/verify helpers). For
testEmployeeHierarchyNullFieldParameters call executeQuery with
invalid/empty/missing startField/fromField/toField names (e.g., empty string or
a non-existent field) and assert the current behavior: if the implementation
throws an exception assert that exception is raised, otherwise assert the query
returns rows with empty reportingHierarchy; reference the existing
testEmployeeHierarchyBasicTraversal, executeQuery, verifyDataRows, and schema
helpers to implement these checks.
- Around line 239-262: The graphLookup invocation in
testAirportConnectionsWithDepthField uses array-valued fields
(startField=connects and fromField=connects) but missing the supportArray=true
flag; update the query built in testAirportConnectionsWithDepthField (the
graphLookup operator call) to include supportArray=true so the graphLookup
optimization/visited-filter behavior is correct and consistent with other tests
that handle array-valued fields.
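Regarding the boundary and invalid-parameter tests requested for CalcitePPLGraphLookupIT above, one shape such a test could take is sketched below; the index constant, field names, and the assumption that `executeQuery` surfaces the failure as an exception are placeholders to adapt to the suite (note that a negative `maxDepth` may legitimately mean "unlimited", see the CalciteEnumerableGraphLookup comment further down, so the invalid-direction case is shown instead):

```java
@Test
public void testGraphLookupRejectsInvalidDirection() {
  // An unknown direction value should be rejected during parsing/validation
  // rather than silently falling back to a default traversal direction.
  assertThrows(
      Exception.class,
      () ->
          executeQuery(
              String.format(
                  "source=%s | graphLookup %s startField=reportsTo fromField=reportsTo"
                      + " toField=name direction=sideways as reportingHierarchy",
                  TEST_INDEX_GRAPH_EMPLOYEES, TEST_INDEX_GRAPH_EMPLOYEES)));
}
```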
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`:
- Around line 371-429: The loop increments currentLevelDepth and unconditionally
compares it to graphLookup.maxDepth, but graphLookup.maxDepth uses -1 to mean
"unlimited", so change the break condition in CalciteEnumerableGraphLookup (the
BFS loop using currentLevelDepth and graphLookup.maxDepth) to only perform the
comparison when graphLookup.maxDepth is non-negative (e.g., if
(graphLookup.maxDepth >= 0 && ++currentLevelDepth > graphLookup.maxDepth)
break), ensuring negative maxDepth skips the depth check and preserves existing
increment semantics.
- Around line 393-425: The loop over forwardResults in
CalciteEnumerableGraphLookup currently only adds rows when nextValues is
non-empty, dropping matched rows that don't expand traversal; change it to
always add the current row to results (respecting graphLookup.depthField by
appending currentLevelDepth when needed) and then use nextValues solely to
enqueue new nodes and mark visitedNodes — i.e., move the results.add(...) /
rowWithDepth logic out of the if (!nextValues.isEmpty()) block so rows are
unconditionally preserved while still only offering non-null, unvisited
nextValues to queue.offer and visitedNodes.add.
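For the maxDepth handling flagged above, the guard could be written roughly as follows (a sketch using the field names from that comment; the counter is advanced outside the condition so `depthField` bookkeeping keeps working even when the cap is disabled):

```java
// -1 (or any negative value) means "unlimited", so only enforce the cap
// when a non-negative maxDepth was requested.
currentLevelDepth++;
if (graphLookup.maxDepth >= 0 && currentLevelDepth > graphLookup.maxDepth) {
  break;
}
```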
In `@ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java`:
- Around line 1498-1555: The visitGraphLookupCommand method currently only
enforces fromField/toField but not startField, so add a validation check before
building the GraphLookup plan: if startField is null, throw a
SemanticCheckException with a clear message (e.g., "startField must be specified
for graphLookup"); update the same validation block where fromField/toField are
checked in visitGraphLookupCommand to include startField to ensure GraphLookup
(or any later use of startField) cannot receive a null value.
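A minimal form of that guard, placed next to the existing fromField/toField checks (the exception message is illustrative):

```java
// graphLookup cannot run without a starting point; fail fast in the parser
// instead of letting a null startField reach plan construction.
if (startField == null) {
  throw new SemanticCheckException("startField must be specified for graphLookup");
}
```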
In
`@ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLGraphLookupTest.java`:
- Around line 132-148: Add null, boundary, and error-condition test cases to
CalcitePPLGraphLookupTest alongside the existing testGraphLookupBidirectional:
create new `@Test` methods (e.g., testGraphLookupNullStartField,
testGraphLookupMissingRequiredField, testGraphLookupInvalidDirection,
testGraphLookupMaxDepthZero) that call getRelNode with crafted PPL strings
(null/empty startField, omit required fields, invalid direction value, explicit
maxDepth=0) and assert expected failures or specific logical plans using
verifyLogical or assertThrows as appropriate; reuse the existing helper methods
getRelNode and verifyLogical to validate correct error handling and boundary
behavior.
- Around line 150-170: The current CalcitePPLGraphLookupTest.config(...) only
sets up the schema and logical plan inputs but lacks assertions for SQL
generation and optimization; add test cases that exercise Calcite's SQL
generation and optimizer for the graphLookup scenario by invoking the
planner/validator to produce the generated SQL and the optimized plan from the
same setup (use the test class CalcitePPLGraphLookupTest and its config method
to build the SchemaPlus with the "employee" EmployeeTable), then assert the
generated SQL contains the expected graphLookup/recursive patterns and assert
the optimized plan (planner output or Programs.heuristicJoinOrder result)
contains expected optimization transformations (e.g., join order or pushed
predicates); ensure these assertions are added alongside existing logical-plan
checks so the test covers SQL generation and optimization paths.
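For the null/boundary cases requested for CalcitePPLGraphLookupTest above, a missing-required-field case could be sketched as follows; the PPL string, table name, and expected exception type are illustrative and should be aligned with what the parser actually raises:

```java
@Test
public void testGraphLookupMissingToFieldIsRejected() {
  // toField is intentionally omitted, so building the relational plan should fail.
  assertThrows(
      Exception.class,
      () -> getRelNode("source=employee | graphLookup employee fromField=manager as hierarchy"));
}
```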
🟡 Minor comments (11)
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java (lines 1816-1822)

Lines 1816-1822: ⚠️ Potential issue | 🟡 Minor: Add a null guard now that LiteralExpression is public.
With a public constructor, passing `null` will fail later in less obvious places; fail fast here (line 1820).
🔧 Suggested fix

```diff
 public LiteralExpression(RexLiteral literal) {
-  this.literal = literal;
+  this.literal = requireNonNull(literal, "literal");
 }
```

integ-test/src/test/resources/indexDefinitions/graph_travelers_index_mapping.json (lines 1-12)
Lines 1-12: ⚠️ Potential issue | 🟡 Minor: Address schema mismatch and incomplete edge-case coverage in test data.
The test data file (graph_travelers.json) contains an extra `index` field not defined in the mapping. Additionally, while the data correctly includes null values for `nearestAirport` (3 of 6 documents), it lacks multi-valued entries to test array handling. Either update the mapping to include the `index` field or remove it from the test data, and add at least one document with an array value for `nearestAirport` to exercise boundary conditions.

integ-test/src/test/resources/graph_travelers.json (lines 1-6)
Lines 1-6: ⚠️ Potential issue | 🟡 Minor: Add an edge-case traveler for a missing/unknown nearestAirport.
This dataset only includes valid single values, so null/missing start values aren't exercised. Consider adding a record that lacks `nearestAirport` (or sets it to null) to cover boundary behavior.
➕ Suggested NDJSON addition

```diff
 {"index":{"_id":"3"}}
 {"name":"Jeff","nearestAirport":"BOS"}
+{"index":{"_id":"4"}}
+{"name":"Ava","nearestAirport":null}
```

As per coding guidelines, ensure test data covers edge cases and boundary conditions.
core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (lines 480-482)

Lines 480-482: ⚠️ Potential issue | 🟡 Minor: Add JavaDoc for the new public visitor method.
The new public `visitGraphLookup` lacks the required JavaDoc with `@param`/`@return`.
✍️ Suggested JavaDoc

```diff
+  /**
+   * Visit GraphLookup node.
+   *
+   * @param node GraphLookup node
+   * @param context Context
+   * @return Return Type
+   */
   public T visitGraphLookup(GraphLookup node, C context) {
     return visitChildren(node, context);
   }
```

As per coding guidelines, public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`.

integ-test/src/test/resources/graph_airports.json (lines 1-10)
Lines 1-10: ⚠️ Potential issue | 🟡 Minor: Add a leaf airport with an empty `connects` list.
All airports currently have at least one connection; adding a no-outbound node helps test traversal boundaries.
➕ Suggested NDJSON addition

```diff
 {"index":{"_id":"5"}}
 {"airport":"LHR","connects":["PWM"]}
+{"index":{"_id":"6"}}
+{"airport":"SFO","connects":[]}
```

As per coding guidelines, ensure test data covers edge cases and boundary conditions.
integ-test/src/test/resources/graph_employees.json (lines 1-12)

Lines 1-12: ⚠️ Potential issue | 🟡 Minor: Add a cyclic/self-loop record to cover visited-node edge cases.
The dataset is a simple chain; adding a self-loop (or small cycle) will exercise visited-node/loop-edge handling in graph traversal tests.
As per coding guidelines: verify test data covers edge cases and boundary conditions; verify test data is realistic and representative.
💡 Suggested test data addition

```diff
@@
 {"index":{"_id":"6"}}
 {"id":6,"name":"Dan","reportsTo":"Andrew"}
+{"index":{"_id":"7"}}
+{"id":7,"name":"Loop","reportsTo":"Loop"}
```

docs/user/ppl/cmd/graphlookup.md (lines 33-36)
Lines 33-36: ⚠️ Potential issue | 🟡 Minor: Fix typos/grammar in the parameter descriptions.
Spelling and hyphenation issues reduce clarity in the parameter table.
As per coding guidelines: check for consistent formatting and style.
✍️ Suggested edits

```diff
-| `startField=<startField>` | Required | The field in the source documents whose value is used to start the recursive search. The value of this field is matched against `toField` in the lookup index. We support both single value and array values as starting points. |
+| `startField=<startField>` | Required | The field in the source documents whose value is used to start the recursive search. The value of this field is matched against `toField` in the lookup index. We support both single-value and array values as starting points. |
-| `maxDepth=<maxDepth>` | Optional | The maximum recursion depth of hops. Default is `0`. A value of `0` means only the direct connections to the statr values are returned. A value of `1` means 1 hop connections (initial match plus one recursive step), and so on. |
+| `maxDepth=<maxDepth>` | Optional | The maximum recursion depth of hops. Default is `0`. A value of `0` means only the direct connections to the start values are returned. A value of `1` means 1 hop connections (initial match plus one recursive step), and so on. |
```

core/src/main/java/org/opensearch/sql/calcite/plan/rel/GraphLookup.java (lines 109-190)
Lines 109-190: ⚠️ Potential issue | 🟡 Minor: Add JavaDoc tags for GraphLookup public APIs.
`getSource`/`getLookup`/`copy`/`estimateRowCount`/`explainTerms` are public but lack the `@param`/`@return`/`@throws` tags required by core guidelines.
As per coding guidelines: public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`.

ppl/src/main/antlr/OpenSearchPPLParser.g4 (lines 94-95)
Lines 94-95: ⚠️ Potential issue | 🟡 Minor: Add missing graphLookup option keywords to searchableKeyWord.
START_FIELD, SUPPORT_ARRAY, BATCH_MODE, and USE_PIT are defined as lexer tokens and used in graphLookupOption (lines 636–651) but are not added to searchableKeyWord, unlike FROM_FIELD and TO_FIELD (lines 1716–1723). Since the lexer tokenizes them as keywords, field names with these values will fail or require quoting. Add all four to searchableKeyWord for consistency.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java (lines 392-422)

Lines 392-422: ⚠️ Potential issue | 🟡 Minor: Update the Javadoc to match the asserted behavior.
The comment says `allConnections` is empty due to the array type, but the assertions expect non-empty results. Please align the comment with the actual expectation.
📝 Suggested comment update

```diff
-  /**
-   * Test 12: Bidirectional airport connections for ORD. Note: Currently returns empty
-   * allConnections array because the connects field is an array type.
-   */
+  /**
+   * Test 12: Bidirectional airport connections for ORD.
+   */
```

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java (lines 666-700)
Lines 666-700: ⚠️ Potential issue | 🟡 Minor: Fix the maxDepth comment mismatch.
The comment says `maxDepth=1`, but the query uses `maxDepth=3`. Please update the comment to reflect the actual parameter.
📝 Suggested comment update

```diff
- * travelers' nearest airports: JFK (Dev, Eliot), BOS (Jeff) Unified BFS from {JFK, BOS} with
- * maxDepth=1 finds connected airports.
+ * travelers' nearest airports: JFK (Dev, Eliot), BOS (Jeff) Unified BFS from {JFK, BOS} with
+ * maxDepth=3 finds connected airports.
```
🧹 Nitpick comments (4)
core/src/main/java/org/opensearch/sql/ast/tree/GraphLookup.java (1)
Lines 29-83: Prefer AST immutability by removing the class-level `@Setter`.
`attach` already mutates the child; exposing `setChild` weakens immutability without clear need. As per coding guidelines: `core/src/main/java/org/opensearch/sql/ast/**/*.java` requires AST nodes to be immutable where possible.
♻️ Suggested change

```diff
-import lombok.Setter;
 @@
-@Setter
 @ToString
```
Lines 646-712: Add anonymizer coverage for Mongo-style graphLookup and `usePIT`. The test only exercises the `fromField`/`toField` syntax. The alternate `startWith`/`connectFromField`/`connectToField` form and the `usePIT` flag aren't covered yet.
✅ Suggested test additions

```diff
 // graphLookup with compound filter
 assertEquals(
     "source=table | graphlookup table fromField=identifier toField=identifier"
         + " direction=uni filter=(identifier = *** and identifier > ***) as identifier",
     anonymize(
         "source=t | graphLookup employees fromField=manager toField=name"
             + " filter=(status = 'active' AND id > 2) as reportingHierarchy"));
+
+// graphLookup with Mongo-style syntax
+assertEquals(
+    "source=table | graphlookup table startWith=*** connectFromField=identifier"
+        + " connectToField=identifier direction=uni as identifier",
+    anonymize(
+        "source=t | graphLookup employees startWith='CEO' connectFromField=manager"
+            + " connectToField=name as reportingHierarchy"));
+
+// graphLookup with usePIT
+assertEquals(
+    "source=table | graphlookup table fromField=identifier toField=identifier"
+        + " direction=uni usePIT=true as identifier",
+    anonymize(
+        "source=t | graphLookup employees fromField=manager toField=name"
+            + " usePIT=true as reportingHierarchy"));
```
Lines 1648-1718: Add parser coverage for Mongo-style syntax and optional flags. The test validates required fields and some options, but doesn't cover the `startWith`/`connectFromField`/`connectToField` variant (or flags like `supportArray`, `batchMode`, `usePIT`). Adding at least one positive case for that syntax would better exercise the parser path.
As per coding guidelines, include edge cases and boundary conditions in parser and grammar tests.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
Lines 2530-2589: Extract visitGraphLookup into helper methods. This method is now over 50 lines; splitting parameter extraction, source/lookup materialization, and RelNode construction would keep the visitor readable.
As per coding guidelines: core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: Flag methods >50 lines - this file is known to be hard to read; Suggest extracting complex logic into helper methods.
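One possible decomposition, purely as a sketch: `GraphLookupParams` and the helper names below are made up for illustration and do not exist in the codebase.

```java
@Override
public RelNode visitGraphLookup(GraphLookup node, CalcitePlanContext context) {
  GraphLookupParams params = extractGraphLookupParams(node);   // parse options, fields, flags
  RelNode source = buildSourceInput(node, context);            // visit the child plan (plus any row cap)
  RelNode lookup = buildLookupInput(node, context);            // scan of the lookup index
  return assembleGraphLookup(params, source, lookup, context); // construct the LogicalGraphLookup node
}
```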
The flagged code in CalciteRelNodeVisitor.java:

```java
RelBuilder builder = context.relBuilder;
// TODO: Limit the number of source rows to 100 for now, make it configurable.
builder.limit(0, 100);
```

Remove the hard-coded 100-row cap on graphLookup sources.
The unconditional limit truncates the start set and will silently drop traversal results once there are more than 100 source rows. Please make this configurable or remove it to preserve correctness.
🛠️ Suggested change

```diff
-// TODO: Limit the number of source rows to 100 for now, make it configurable.
-builder.limit(0, 100);
```
🤖 Prompt for AI Agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 2534 - 2536, The hard-coded builder.limit(0, 100) in
CalciteRelNodeVisitor (using context.relBuilder / builder) unconditionally
truncates graphLookup sources; remove this unconditional limit or replace it
with a configurable cap: delete the builder.limit(0, 100) call (or wrap it
behind a configuration check), add a new planner/config option (e.g.,
graphLookupSourceRowLimit) accessed from the visitor/context, and only apply
builder.limit(0, limit) when that config is set to a positive value so the
default preserves full source rows.
Additional review comment threads (collapsed as resolved) on:
- core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalGraphLookup.java
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLGraphLookupIT.java (two comments)
- opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java
The flagged BFS loop in CalciteEnumerableGraphLookup.java (excerpt as shown in the diff):

```java
for (Object row : forwardResults) {
  Object[] rowArray = (Object[]) (row);
  Object fromValue = rowArray[fromFieldIdx];
  // Collect next values to traverse (may be single value or list)
  // For forward traversal: extract fromField values for next level
  // For bidirectional: also extract toField values.
  // Skip visited values while keep null value
  List<Object> nextValues = new ArrayList<>();
  collectValues(fromValue, nextValues, visitedNodes);
  if (graphLookup.bidirectional) {
    Object toValue = rowArray[toFieldIdx];
    collectValues(toValue, nextValues, visitedNodes);
  }

  // Add row to results if the nextValues is not empty
  if (!nextValues.isEmpty()) {
    if (graphLookup.depthField != null) {
      Object[] rowWithDepth = new Object[rowArray.length + 1];
      System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
      rowWithDepth[rowArray.length] = currentLevelDepth;
      results.add(rowWithDepth);
    } else {
      results.add(rowArray);
    }

    // Add unvisited non-null values to queue for next level traversal
    for (Object val : nextValues) {
      if (val != null) {
        visitedNodes.add(val);
        queue.offer(val);
      }
    }
  }
```
Include matched rows even when they don’t expand traversal.
Rows are only added when nextValues is non-empty, which can drop reachable docs when multiple start values or converging edges yield already-visited next nodes. Add the row unconditionally and use nextValues only for queueing.
🛠️ Suggested change
- // Add row to results if the nextValues is not empty
- if (!nextValues.isEmpty()) {
- if (graphLookup.depthField != null) {
- Object[] rowWithDepth = new Object[rowArray.length + 1];
- System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
- rowWithDepth[rowArray.length] = currentLevelDepth;
- results.add(rowWithDepth);
- } else {
- results.add(rowArray);
- }
-
- // Add unvisited non-null values to queue for next level traversal
- for (Object val : nextValues) {
- if (val != null) {
- visitedNodes.add(val);
- queue.offer(val);
- }
- }
- }
+ // Always include the matched row; use nextValues only for expansion.
+ if (graphLookup.depthField != null) {
+ Object[] rowWithDepth = new Object[rowArray.length + 1];
+ System.arraycopy(rowArray, 0, rowWithDepth, 0, rowArray.length);
+ rowWithDepth[rowArray.length] = currentLevelDepth;
+ results.add(rowWithDepth);
+ } else {
+ results.add(rowArray);
+ }
+
+ // Add unvisited non-null values to queue for next level traversal
+ for (Object val : nextValues) {
+ if (val != null) {
+ visitedNodes.add(val);
+ queue.offer(val);
+ }
+ }🤖 Prompt for AI Agents
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableGraphLookup.java`
around lines 393 - 425, The loop over forwardResults in
CalciteEnumerableGraphLookup currently only adds rows when nextValues is
non-empty, dropping matched rows that don't expand traversal; change it to
always add the current row to results (respecting graphLookup.depthField by
appending currentLevelDepth when needed) and then use nextValues solely to
enqueue new nodes and mark visitedNodes — i.e., move the results.add(...) /
rowWithDepth logic out of the if (!nextValues.isEmpty()) block so rows are
unconditionally preserved while still only offering non-null, unvisited
nextValues to queue.offer and visitedNodes.add.
Additional review comment threads (collapsed as resolved) on ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLGraphLookupTest.java (two comments).
@penghuo can you review and merge this?
Description
The `graphlookup` command performs recursive graph traversal on OpenSearch indices using a breadth-first search (BFS) algorithm.

Syntax
Option 1
Example Query
Option 2 (follow MongoDB naming convention)
Example Query
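The example blocks above did not survive extraction from the PR page. Purely as an illustration, the two option styles correspond to queries of the following shape, mirroring the ones exercised by the anonymizer and integration tests elsewhere in this review (the index and field names are sample data from this PR, not a normative reference):

```java
// Option 1 style: fromField/toField naming.
String option1 =
    "source=graph_employees | graphLookup graph_employees startField=reportsTo"
        + " fromField=reportsTo toField=name as reportingHierarchy";

// Option 2 style: MongoDB-like startWith/connectFromField/connectToField naming.
String option2 =
    "source=graph_employees | graphLookup graph_employees startWith='CEO'"
        + " connectFromField=manager connectToField=name as reportingHierarchy";
```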
Performance
The OpenSearch `graphlookup` command has implemented several features which make it more applicable to RAG scenarios as well as suitable for different requirements:

- With `direction=uni` and `usePIT=true`, OpenSearch `graphlookup` achieves the same result as MongoDB's `$graphLookup` aggregation command while running 5-6x faster on the SNB SF-1 dataset and 29x faster on SNB SF-10. OpenSearch can finish all benchmarks on SNB SF-10 in 734 ms and on SNB SF-30 in 8000 ms, while MongoDB fails to complete.
- Batch mode (`batchMode`) is more compatible with the requirements of RAG. Apart from the final results already being deduplicated, it is also 58x faster than MongoDB on the SNB SF-1 dataset, and OpenSearch can finish the multiple-start-value benchmark on SNB SF-10 in 3508 ms and on SNB SF-30 in 10992 ms, while MongoDB fails to complete.
- Both `PIT` and `non-PIT` modes are supported; the latter gets better performance on large datasets by sacrificing accuracy. It is 5x faster than PIT mode on SNB SF-30 but misses about 27% of nodes.
Resolves #5088
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.