update ppl doc examples to use otel data by ritvibhatt · Pull Request #5303 · opensearch-project/sql

ritvibhatt · 2026-04-01T22:22:05Z

Description

Updates PPL command docs to use otel sample data instead of accounts/bank/etc
Applies feedback from docs website to keep in sync (only diff from exporter script is commented out sections for data sources section on documentation website)

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2026-04-01T22:23:25Z

PR Reviewer Guide 🔍

(Review updated until commit `844e0fe`)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Replace otellogs test data with new OTel dataset Relevant files: doctest/test_data/otellogs.json Sub-PR theme: Add playground button support for otellogs PPL blocks in docs exporter Relevant files: scripts/docs_exporter/export_to_docs_website.py Sub-PR theme: Update PPL command documentation examples to use OTel data Relevant files: docs/user/ppl/cmd/search.md docs/user/ppl/cmd/reverse.md docs/user/ppl/cmd/graphlookup.md docs/user/ppl/cmd/showdatasources.md
⚡ Recommended focus areas for review Regex Ordering The new PPL block regex (lines 267-277) converts `ppl` fences with `otellogs` to `sql` fences with playground buttons, and then line 280 converts remaining `ppl` fences to `sql`. However, the new regex at line 267 uses `re.DOTALL` and matches the full block including the closing fence. If the first regex matches and replaces a block, the second regex at line 280 (which only matches the opening fence line) should not re-match it since the fence is now `sql`. This ordering appears correct, but should be validated that no edge cases exist where a block is double-processed. content = re.sub( r'^```ppl\b[^\n]\n(.?)^```$', lambda m: ( '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}' if _is_playground_eligible(m.group(1), current_file_path) else ( '```ppl\n' + m.group(1) + '```' ) ), content, flags=re.MULTILINE \| re.DOTALL, ) # Convert remaining PPL code fences to SQL content = re.sub(r'^```ppl\b.$', '```sql', content, flags=re.MULTILINE) Playground Check* The `_is_playground_eligible` function checks for `source=otellogs` as a substring of the block text. This could produce false positives if a comment or string literal within the PPL block contains `source=otellogs` without it being the actual source command. A more precise check (e.g., matching `source\s=\sotellogs` as a word boundary or at the start of a line) would be more robust. def _is_playground_eligible(block_text, file_path): """Check if a PPL block should get a Try in Playground button.""" if 'source=otellogs' not in block_text: return False if file_path: stem = file_path.stem if hasattr(file_path, 'stem') else str(file_path).rsplit('/', 1)[-1].replace('.md', '') if stem in PLAYGROUND_UNSUPPORTED_CMDS: return False return True Formatting Inconsistency In the wildcard example result table at line 414, the `ERROR` value appears to have inconsistent spacing/padding (`\| ERROR \|` vs `\| ERROR \|`), which may indicate a copy-paste error in the table formatting. \| ERROR \| Missing Index Directives The old data file used explicit `{"index": {"_id": "N"}}` bulk API directives before each document. The new file omits these, which changes the bulk indexing format. If this file is used directly with the OpenSearch Bulk API, the missing action lines will cause indexing failures. Verify that the ingestion pipeline handles this format correctly or adds the action lines automatically. {"@timestamp": "2024-02-01T09:10:00Z", "time": "2024-02-01T09:10:00Z", "severityNumber": 9, "severityText": "INFO", "body": "[2024-02-01T09:10:00.123Z] \"GET /api/products HTTP/1.1\" 200 - 1024 45 frontend-6b7b4c9f-x2kl9", "traceId": "abcd1234efgh5678", "spanId": "span0001", "flags": 0, "instrumentationScope": {"name": "@opentelemetry/instrumentation-http", "version": "0.57.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:11:00Z", "time": "2024-02-01T09:11:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Order #1234 placed successfully by user U100", "traceId": "abcd1234efgh5678", "spanId": "span0002", "flags": 0, "instrumentationScope": {"name": "Microsoft.Extensions.Hosting", "version": "9.0.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "cart"}, "host": {"name": "cart-5d8f7b-mk29s"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:12:00Z", "time": "2024-02-01T09:12:00Z", "severityNumber": 13, "severityText": "WARN", "body": "Slow query detected: SELECT * FROM products WHERE category = 'electronics' took 3200ms", "traceId": "abcd1234efgh5678", "spanId": "span0003", "flags": 0, "instrumentationScope": {"name": "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc", "version": "0.49.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:13:00Z", "time": "2024-02-01T09:13:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Payment failed: connection timeout to payment gateway after 30000ms", "traceId": "ijkl9012mnop3456", "spanId": "span0004", "flags": 0, "instrumentationScope": {"name": "@opentelemetry/instrumentation-http", "version": "0.57.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "payment"}, "host": {"name": "payment-6f8d4b-ht7q3"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:14:00Z", "time": "2024-02-01T09:14:00Z", "severityNumber": 5, "severityText": "DEBUG", "body": "Cache miss for key user:session:U200 in Valkey cluster", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "cart"}, "host": {"name": "cart-5d8f7b-mk29s"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:15:00Z", "time": "2024-02-01T09:15:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "NullPointerException in CheckoutService.placeOrder at line 142", "traceId": "qrst5678uvwx9012", "spanId": "span0006", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "checkout"}, "host": {"name": "checkout-8b4c2d-jp5r7"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:16:00Z", "time": "2024-02-01T09:16:00Z", "severityNumber": 9, "severityText": "INFO", "body": "User U300 authenticated via OAuth2 from 10.0.0.5", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:17:00Z", "time": "2024-02-01T09:17:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Out of memory: Java heap space - shutting down pod payment-6f8d4b-ht7q3", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "payment"}, "host": {"name": "payment-6f8d4b-ht7q3"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:18:00Z", "time": "2024-02-01T09:18:00Z", "severityNumber": 13, "severityText": "WARN", "body": "Connection pool 80% utilized on database replica db-replica-02", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:19:00Z", "time": "2024-02-01T09:19:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Kafka consumer group rebalanced: 4 partitions assigned to consumer checkout-consumer-01", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "checkout"}, "host": {"name": "checkout-8b4c2d-jp5r7"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:20:00Z", "time": "2024-02-01T09:20:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "[2024-02-01T09:20:00.456Z] \"POST /api/checkout HTTP/1.1\" 503 - 0 30000 checkout-8d4f7b-mk2p9", "traceId": "yzab3456cdef7890", "spanId": "span0011", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend-proxy"}, "host": {"name": "frontendproxy-envoy-7d4b8c-xk2q9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:21:00Z", "time": "2024-02-01T09:21:00Z", "severityNumber": 5, "severityText": "DEBUG", "body": "gRPC call /ProductCatalogService/GetProduct completed in 12ms", "traceId": "abcd1234efgh5678", "spanId": "span0012", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:22:00Z", "time": "2024-02-01T09:22:00Z", "severityNumber": 13, "severityText": "WARN", "body": "SSL certificate for api.example.com expires in 14 days", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend-proxy"}, "host": {"name": "frontendproxy-envoy-7d4b8c-xk2q9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:23:00Z", "time": "2024-02-01T09:23:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Deployment frontend-v2.2.0 rolled out successfully to 3/3 replicas", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:24:00Z", "time": "2024-02-01T09:24:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Failed to process recommendation request: invalid product ID from 203.0.113.50", "traceId": "ghij7890klmn1234", "spanId": "span0015", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "recommendation"}, "host": {"name": "recommendation-5f7c-bn3k8"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:25:00Z", "time": "2024-02-01T09:25:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Database primary node unreachable: connection refused to db-primary-01:5432", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:26:00Z", "time": "2024-02-01T09:26:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Health check passed: all 5 dependencies healthy", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:27:00Z", "time": "2024-02-01T09:27:00Z", "severityNumber": 13, "severityText": "WARN", "body": "Rate limit threshold reached: 450/500 requests per minute for API key ending in ...abc789", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend-proxy"}, "host": {"name": "frontendproxy-envoy-7d4b8c-xk2q9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:28:00Z", "time": "2024-02-01T09:28:00Z", "severityNumber": 5, "severityText": "DEBUG", "body": "Valkey SETEX user:session:U300 3600 - session refreshed", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "cart"}, "host": {"name": "cart-5d8f7b-mk29s"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0} {"@timestamp": "2024-02-01T09:29:00Z", "time": "2024-02-01T09:29:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Kafka producer delivery failed: message too large for topic order-events (max 1048576 bytes)", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "checkout"}, "host": {"name": "checkout-8b4c2d-jp5r7"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}

github-actions · 2026-04-01T22:24:07Z

PR Code Suggestions ✨

Latest suggestions up to 844e0fe

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix malformed code block with stray lines The PPL code block for Example 3 is malformed. The closing triple backtick and the subsequent lines `\| where age > 30` and `\| fields state, age` appear to be leftover from the old example and are outside the code block, making the documentation incorrect and confusing. docs/user/ppl/cmd/replace.md [83-91] ```ppl source=otellogs \| where severityText = 'ERROR' \| replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name` \| fields `resource.attributes.service.name`, body \| head 3 -\| where age > 30 -\| fields state, age -``` <details><summary>Suggestion importance[1-10]: 9</summary> __ Why: The code block for Example 3 is genuinely malformed - the closing backticks appear mid-block and the lines `\| where age > 30` and `\| fields state, age` are leftover from the old example, appearing outside the new code block. This is a real documentation bug that would confuse readers. </details></details></td><td align=center>High </td></tr><tr><td> <details><summary>Fix unclosed syntax code block</summary> ___ The syntax code block is not closed before the <code>## Parameters</code> section begins, causing <br>the Parameters heading and all subsequent content to be rendered inside the code <br>block. A closing triple backtick is missing after the syntax line. [docs/user/ppl/cmd/transpose.md [9-15]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ec1fe93388c8d21c3152db80ad223e173eeb0bfb9685a46b9d3ea389f31748b0R9-R15) ```diff The `transpose` command has the following syntax: transpose [int] [column_name=] +``` Parameters <details><summary>Suggestion importance[1-10]: 9</summary> __ Why: The syntax code block is missing its closing triple backtick before the `## Parameters` heading, which would cause the entire Parameters section and all subsequent content to be rendered inside the code block, breaking the document structure significantly. </details></details></td><td align=center>High </td></tr><tr><td> <details><summary>Non-playground PPL blocks lose copy button</summary> ___ *The non-playground branch reconstructs the block as <code> </code><code></code>ppl <code> but the subsequent regex </code><br><code></code>re.sub(r'^<code></code><code>ppl\b.$', '</code><code></code>sql', ...)<code> will then convert it to SQL anyway, effectively </code><br><code>making the playground-ineligible branch a no-op and losing the </code>{% include copy.html <br>%}<code> button for those blocks. The non-playground branch should emit </code> <code></code><code>sql </code> with a copy <br>button directly, or the second regex should be guarded to skip blocks already <br>converted.** [scripts/docs_exporter/export_to_docs_website.py [267-277]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR267-R277) ```diff content = re.sub( r'^```ppl\b[^\n]\n(.?)^```$', lambda m: ( '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}' if _is_playground_eligible(m.group(1), current_file_path) else ( - '```ppl\n' + m.group(1) + '```' + '```sql\n' + m.group(1) + '```\n{% include copy.html %}' ) ), content, flags=re.MULTILINE \| re.DOTALL, ) Suggestion importance[1-10]: 8 __ Why: The suggestion correctly identifies a logic bug: the non-playground branch emits ```ppl which is then converted to ```sql by the subsequent regex on line 280, but without adding a `{% include copy.html %}` button. The improved code directly emits ```sql with a copy button, which is the correct behavior and avoids the silent loss of copy buttons for non-playground PPL blocks.	Medium
	Fix duplicate queries obscuring keepempty behavior Both the `keepempty=true` and the default (drop nulls) sub-examples in Example 3 use identical PPL queries, making it impossible to distinguish the two behaviors. The first query should include `keepempty=true` to demonstrate including null values, matching the description that says "By default, records with null values are dropped." docs/user/ppl/cmd/dedup.md [86-128] The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped: ```ppl source=otellogs -\| dedup instrumentationScope.name +\| dedup instrumentationScope.name keepempty=true \| fields instrumentationScope.name \| sort instrumentationScope.name The query returns the following results: `-fetched rows / total rows = 3/3 +fetched rows / total rows = 4/4 ...` The following query deduplicates while ignoring documents with empty values in the specified field: `source=otellogs \| dedup instrumentationScope.name \| fields instrumentationScope.name \| sort instrumentationScope.name` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: Both sub-examples in Example 3 use identical PPL queries (`dedup instrumentationScope.name` without `keepempty=true`), making it impossible to distinguish the two behaviors described. The first query should include `keepempty=true` to properly demonstrate the difference between keeping and dropping null values. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Fix duplicate row in override example output</summary> ___ The result table for Example 3 shows <code>WARN</code> appearing twice with <code>agg=4</code>, which is <br>inconsistent with the explanation that <code>override=true</code> replaces main search values <br>with subsearch values. The subsearch only covers <code>ERROR</code> and <code>WARN</code>, so <code>INFO</code> and <code>DEBUG</code> <br>rows should retain their original counts, and <code>WARN</code> should appear only once. The <br>duplicate <code>WARN</code> row and the presence of <code>INFO</code> with an overridden value suggest the <br>expected output is incorrect. [docs/user/ppl/cmd/appendcol.md [95-104]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R95-R104) ```diff fetched rows / total rows = 4/4 +-----+--------------+ \| agg \| severityText \| \|-----+--------------\| \| 7 \| ERROR \| \| 4 \| WARN \| \| 6 \| INFO \| -\| 4 \| WARN \| +\| 3 \| DEBUG \| +-----+--------------+ Suggestion importance[1-10]: 5 __ Why: The result table shows `WARN` appearing twice and `INFO` with an overridden value, which is inconsistent with the described behavior of `override=true`. However, the suggested fix introduces a `DEBUG` row that may not match the actual query behavior, making the correction uncertain.	Low
	General	Add missing query results for prefix pattern example The "Matching with a prefix pattern" subsection shows a query but does not include its expected output results. Every other subsection in the document includes a results block, so the missing output makes this example inconsistent and incomplete for readers trying to verify their queries. docs/user/ppl/cmd/where.md [108-130] ### Matching with a prefix pattern The following query uses a percent sign (`%`) to find all services starting with `frontend`: ```ppl source=otellogs \| where LIKE(`resource.attributes.service.name`, 'frontend%') \| fields severityText, `resource.attributes.service.name`, body \| head 3 -### Matching with a wildcard pattern -... The query returns the following results: +text +fetched rows / total rows = 3/3 ++--------------+----------------------------------+-------------------------------------------------------------------------------------------+ +\| severityText \| resource.attributes.service.name \| body \| +\|--------------+----------------------------------+-------------------------------------------------------------------------------------------\| +\| WARN \| frontend-proxy \| SSL certificate for api.example.com expires in 14 days \| +\| ERROR \| frontend-proxy \| [2024-02-01T09:20:00.456Z] "POST /api/checkout HTTP/1.1" 503 - 0 30000 checkout-8d4f7b-mk2p9 \| +\| INFO \| frontend \| Incoming request: GET /api/products from 203.0.113.50 \| ++--------------+----------------------------------+-------------------------------------------------------------------------------------------+ + + +### Matching with a wildcard pattern + <details><summary>Suggestion importance[1-10]: 5</summary> __ Why: The "Matching with a prefix pattern" subsection is missing its expected output results, making it inconsistent with all other examples in the document. However, the suggested output data is fabricated and cannot be verified from the PR diff, so the score is moderate. </details></details></td><td align=center>Low </td></tr><tr><td> <details><summary>Fragile substring check for playground eligibility</summary> ___ The check <code>'source=otellogs' not in block_text</code> is a plain substring match and is <br>case-sensitive. A block containing <code>source=OtelLogs</code> or <code>source=otellogs_v2</code> would be <br>incorrectly handled. Consider using a more precise regex match to avoid false <br>positives and improve robustness. [scripts/docs_exporter/export_to_docs_website.py [258-259]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR258-R259) ```diff -if 'source=otellogs' not in block_text: +if not re.search(r'\bsource\s=\sotellogs\b', block_text): return False Suggestion importance[1-10]: 4 __ Why: The suggestion raises a valid concern about the plain substring match for `source=otellogs` potentially matching unintended strings like `source=otellogs_v2`. Using a regex with word boundaries would be more precise, though in practice the current dataset may not have such edge cases.	Low
		Fix inconsistent table column padding in results The table formatting for the `resource.attributes.service.name` column is inconsistent — `cart` and `product-catalog` rows have truncated padding compared to other rows. All rows should be padded to the same column width to maintain proper table alignment. docs/user/ppl/cmd/sort.md [52-61] fetched rows / total rows = 4/4 +--------------+----------------+----------------------------------+ \| severityText \| severityNumber \| resource.attributes.service.name \| \|--------------+----------------+----------------------------------\| -\| DEBUG \| 5 \| cart \| -\| DEBUG \| 5 \| product-catalog \| -\| DEBUG \| 5 \| cart \| +\| DEBUG \| 5 \| cart \| +\| DEBUG \| 5 \| product-catalog \| +\| DEBUG \| 5 \| cart \| \| INFO \| 9 \| frontend \| +--------------+----------------+----------------------------------+ Suggestion importance[1-10]: 4 __ Why: The `cart` and `product-catalog` rows have inconsistent padding in the `resource.attributes.service.name` column, breaking the visual alignment of the ASCII table. The fix correctly pads all values to match the column width.	Low

Previous suggestions

Suggestions up to commit 3df36ba

Category Suggestion Impact

Possible issue

Fix unclosed syntax code block

The syntax code block is missing its closing triple-backtick fence. The ##
Parameters section header and everything that follows it is being rendered inside
the code block instead of as normal Markdown. A closing fence should be added after
the syntax line.

docs/user/ppl/cmd/transpose.md [9-15]

 The `transpose` command has the following syntax:

transpose [int] [column_name=]
+```

Parameters

<details><summary>Suggestion importance[1-10]: 9</summary>

__

Why: The syntax code block is missing its closing triple-backtick fence, which would cause the `## Parameters` section and everything following it to be rendered inside the code block. This is a critical formatting bug that breaks the entire document structure.


</details></details></td><td align=center>High

</td></tr><tr><td>



<details><summary>Fix malformed code block in example</summary>

___


**The PPL code block for Example 3 is malformed. The closing triple-backtick of the <br>first code block is missing, causing the leftover lines <code>| where age > 30</code> and <code>| </code><br><code>fields state, age</code> to appear as stray text outside the code block. The closing fence <br>should be added after <code>| head 3</code>.**

[docs/user/ppl/cmd/replace.md [83-91]](https://github.com/opensearch-project/sql/pull/5303/files#diff-8644862a762e96b24031ad83e31729768a40b3387fda49b9a3a4f0c6f4c3de39R83-R91)

```diff
 ```ppl
 source=otellogs
 | where severityText = 'ERROR'
 | replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, body
 | head 3

-| where age > 30
-| fields state, age
-```

<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: The code block for Example 3 is genuinely malformed - the closing triple-backtick is missing, causing `| where age > 30` and `| fields state, age` to appear as stray text outside the code block. This is a real rendering bug that would confuse readers.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Fix duplicate identical queries with different described behaviors</summary>

___


**Both queries in Example 3 are identical (both use <code>dedup instrumentationScope.name</code> <br>without <code>keepempty</code>), yet the text describes them as having different behaviors (one <br>drops nulls by default, the other "ignores documents with empty values"). The second <br>query should use <code>keepempty=true</code> to demonstrate including null values, matching the <br>original intent of the example.**

[docs/user/ppl/cmd/dedup.md [88-115]](https://github.com/opensearch-project/sql/pull/5303/files#diff-532601ea92e852acbb8b9bcdd2e269377bc05a311ec141be38148539d20c5bafR88-R115)

```diff
-The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
+The following query deduplicates while keeping documents with empty values in the specified field:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
-| fields instrumentationScope.name
-| sort instrumentationScope.name
-```
-  
-The query returns the following results:
-  
-```text
-fetched rows / total rows = 3/3
-...
-```
-  
-The following query deduplicates while ignoring documents with empty values in the specified field:
-  
-```ppl
-source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name

<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: Both queries in Example 3 are identical (both use `dedup instrumentationScope.name` without `keepempty`), yet the surrounding text describes them as having different behaviors. The second query should use `keepempty=true` to correctly demonstrate including null values, as the text implies.


</details></details></td><td align=center>Medium

</td></tr><tr><td rowspan=2>General</td>
<td>



<details><summary>Use robust file extension stripping for stem extraction</summary>

___


**The <code>stem</code> extraction for non-Path objects uses <code>rsplit('/', 1)[-1]</code> which returns the <br>full filename including the <code>.md</code> extension before calling <code>.replace('.md', '')</code>. <br>However, if the file has no <code>/</code> in its path, <code>rsplit('/', 1)[-1]</code> still returns the full <br>string correctly. The bigger issue is that <code>PLAYGROUND_UNSUPPORTED_CMDS</code> checks the <br>file stem (e.g., <code>multisearch</code>) but the set only contains <code>'multisearch'</code> — if the <br>actual filename is <code>multisearch.md</code>, the stem would be <code>multisearch</code>, which works. But <br>the check <code>stem in PLAYGROUND_UNSUPPORTED_CMDS</code> uses a set lookup on a <code>set</code>, which is <br>correct. This logic is fine, but the <code>str(file_path).rsplit('/', </code><br><code>1)[-1].replace('.md', '')</code> approach will fail for files with extensions other than <br><code>.md</code> (e.g., <code>.mdx</code>). Consider using <code>os.path.splitext</code> for robustness.**

[scripts/docs_exporter/export_to_docs_website.py [258-264]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR258-R264)

```diff
 if 'source=otellogs' not in block_text:
     return False
 if file_path:
-    stem = file_path.stem if hasattr(file_path, 'stem') else str(file_path).rsplit('/', 1)[-1].replace('.md', '')
+    import os
+    stem = file_path.stem if hasattr(file_path, 'stem') else os.path.splitext(str(file_path).rsplit('/', 1)[-1])[0]
     if stem in PLAYGROUND_UNSUPPORTED_CMDS:
         return False
 return True

Suggestion importance[1-10]: 3

__

Why: The suggestion to use os.path.splitext instead of .replace('.md', '') is a minor robustness improvement for non-.md extensions, but the codebase appears to only process .md files, making this an edge case with low practical impact. Also, adding an import inside a nested function is not ideal style.

Low

Fix misaligned table column values in example output

The unknown values in the result table for Example 1 are not padded to align with
the column width, causing the table to appear misformatted compared to the other
rows. The values should be padded with trailing spaces to match the column width
defined by the header.

docs/user/ppl/cmd/fillnull.md [53-57]

-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |

Suggestion importance[1-10]: 2

__

Why: The misalignment of unknown values in the ASCII table is a cosmetic issue. In Markdown rendered output, the table borders are defined by the separator row, so the padding difference has minimal visual impact on the final rendered documentation.

Low

Suggestions up to commit 5121c46

Category	Suggestion	Impact
Possible issue	Fix malformed code block with stray lines The PPL code block for Example 3 is malformed. The closing triple backtick for the first code block is missing, and leftover lines from the old example (`\| where age >` `30` and `\| fields state, age`) appear outside the code block. These stray lines should be removed and the code block properly closed. docs/user/ppl/cmd/replace.md [83-91] ```ppl source=otellogs \| where severityText = 'ERROR' \| replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name` \| fields `resource.attributes.service.name`, body \| head 3 -\| where age > 30 -\| fields state, age -``` <details><summary>Suggestion importance[1-10]: 9</summary> __ Why: The code block for Example 3 is clearly malformed - the closing triple backtick is missing and stray lines (`\| where age > 30` and `\| fields state, age`) from the old example appear outside the code block, which would break the rendered documentation significantly. </details></details></td><td align=center>High </td></tr><tr><td> <details><summary>Fix unclosed syntax code block</summary> ___ The syntax code block is not closed before the <code>## Parameters</code> section begins, causing <br>the entire Parameters section and everything that follows to be rendered as part of <br>the code block. A closing triple backtick must be added after the syntax line. [docs/user/ppl/cmd/transpose.md [9-15]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ec1fe93388c8d21c3152db80ad223e173eeb0bfb9685a46b9d3ea389f31748b0R9-R15) ```diff The `transpose` command has the following syntax: transpose [int] [column_name=] +``` Parameters <details><summary>Suggestion importance[1-10]: 9</summary> __ Why: The syntax code block starting with triple backticks is never closed before the `## Parameters` heading, which would cause the entire Parameters section and subsequent content to be rendered as part of the code block, severely breaking the documentation layout. </details></details></td><td align=center>High </td></tr><tr><td> <details><summary>Non-playground PPL blocks lose copy button</summary> ___ *The non-playground branch reconstructs the block as <code> </code><code></code>ppl <code> but the subsequent regex </code><br><code></code>re.sub(r'^<code></code><code>ppl\b.$', '</code><code></code>sql', ...)<code> will then convert it to SQL anyway, effectively </code><br><code>making the playground-ineligible branch a no-op and losing the </code>{% include copy.html <br>%}<code> button for those blocks. The non-playground branch should emit </code> <code></code><code>sql </code> with a copy <br>button directly, or the second regex should be guarded to skip already-converted <br>blocks.** [scripts/docs_exporter/export_to_docs_website.py [267-277]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR267-R277) ```diff content = re.sub( r'^```ppl\b[^\n]\n(.?)^```$', lambda m: ( '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}' if _is_playground_eligible(m.group(1), current_file_path) else ( - '```ppl\n' + m.group(1) + '```' + '```sql\n' + m.group(1) + '```\n{% include copy.html %}' ) ), content, flags=re.MULTILINE \| re.DOTALL, ) Suggestion importance[1-10]: 8 __ Why: This is a valid and important bug: the non-playground branch emits ```ppl which is then converted to ```sql by the subsequent regex on line 280, but without adding a `{% include copy.html %}` button. The fix correctly changes the non-playground branch to emit ```sql with a copy button directly, ensuring consistent behavior.	Medium
	Fix duplicate queries with different described behaviors Example 3 contains two identical PPL queries but the description says one uses `keepempty=true` and the other does not. The first query should include `keepempty=true` to demonstrate the difference in behavior, otherwise the two sub-examples are indistinguishable and the explanation is misleading. docs/user/ppl/cmd/dedup.md [86-115] The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped: ```ppl source=otellogs -\| dedup instrumentationScope.name +\| dedup instrumentationScope.name keepempty=true \| fields instrumentationScope.name \| sort instrumentationScope.name The query returns the following results: `fetched rows / total rows = 3/3 ...` The following query deduplicates while ignoring documents with empty values in the specified field: `source=otellogs \| dedup instrumentationScope.name \| fields instrumentationScope.name \| sort instrumentationScope.name` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: Both PPL queries in Example 3 are identical (`dedup instrumentationScope.name` without `keepempty=true`), but the descriptions say they demonstrate different behaviors. The first query should include `keepempty=true` to correctly illustrate the contrast between keeping and dropping null values. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Incorrect example output for override parameter behavior</summary> ___ The result table for Example 3 (override parameter) appears inconsistent: <code>WARN</code> <br>appears twice with the same count (<code>4</code>), and <code>INFO</code> and <code>DEBUG</code> rows are present even <br>though the subsearch only covers <code>ERROR</code> and <code>WARN</code>. With <code>override=true</code>, the main search <br>rows for <code>INFO</code> and <code>DEBUG</code> should retain their original <code>agg</code> values (since the subsearch <br>has no matching row to override them), and <code>WARN</code> should appear only once. The example <br>output is misleading and may confuse readers about how <code>override</code> works. [docs/user/ppl/cmd/appendcol.md [94-103]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R94-R103) ```diff fetched rows / total rows = 4/4 +-----+--------------+ \| agg \| severityText \| \|-----+--------------\| \| 7 \| ERROR \| -\| 4 \| WARN \| \| 6 \| INFO \| +\| 3 \| DEBUG \| \| 4 \| WARN \| +-----+--------------+ Suggestion importance[1-10]: 4 __ Why: The result table shows `WARN` appearing twice with the same count, which is inconsistent with how `override=true` should work. However, the "improved_code" provided may also not be accurate since it's unclear what the actual query would return, making this a potentially misleading correction.	Low
	General	Missing result block for a query example The prefix pattern subsection shows a PPL query but is missing its corresponding result block. Every other subsection in the document includes a "The query returns the following results:" block with a sample output table. Without it, the documentation is incomplete and inconsistent. docs/user/ppl/cmd/where.md [108-119] ### Matching with a prefix pattern The following query uses a percent sign (`%`) to find all services starting with `frontend`: ```ppl source=otellogs \| where LIKE(`resource.attributes.service.name`, 'frontend%') \| fields severityText, `resource.attributes.service.name`, body \| head 3 +The query returns the following results: + +text +fetched rows / total rows = 3/3 ++--------------+----------------------------------+-----------------------------------------------------------------------+ +\| severityText \| resource.attributes.service.name \| body \| +\|--------------+----------------------------------+-----------------------------------------------------------------------\| +\| ... \| frontend \| ... \| +\| ... \| frontend-proxy \| ... \| +\| ... \| frontend-proxy \| ... \| ++--------------+----------------------------------+-----------------------------------------------------------------------+ + + Matching with a wildcard pattern <details><summary>Suggestion importance[1-10]: 5</summary> __ Why: The "Matching with a prefix pattern" subsection includes a PPL query but no corresponding result block, making it inconsistent with every other example in the document. Adding the missing result block would improve documentation completeness. </details></details></td><td align=center>Low </td></tr><tr><td> <details><summary>Fix inconsistent column padding in result table</summary> ___ The result table for Example 1 has inconsistent column alignment: the <code>cart</code> and <br><code>product-catalog</code> values in the <code>resource.attributes.service.name</code> column are not padded <br>to match the column width, unlike <code>frontend</code>. This makes the table render incorrectly <br>in markdown and looks unprofessional in documentation. [docs/user/ppl/cmd/sort.md [52-61]](https://github.com/opensearch-project/sql/pull/5303/files#diff-10553d209efeb9e9bd7e1745b91efeedb8ab100d2ef2209cbced0fe9dadb32a2R52-R61) ```diff fetched rows / total rows = 4/4 +--------------+----------------+----------------------------------+ \| severityText \| severityNumber \| resource.attributes.service.name \| \|--------------+----------------+----------------------------------\| -\| DEBUG \| 5 \| cart \| -\| DEBUG \| 5 \| product-catalog \| -\| DEBUG \| 5 \| cart \| +\| DEBUG \| 5 \| cart \| +\| DEBUG \| 5 \| product-catalog \| +\| DEBUG \| 5 \| cart \| \| INFO \| 9 \| frontend \| +--------------+----------------+----------------------------------+ Suggestion importance[1-10]: 4 __ Why: The `cart` and `product-catalog` values in the result table are not padded to match the column width, causing inconsistent table formatting. The fix correctly aligns the column values to match the header width.	Low
		Case-sensitive playground eligibility check The check `'source=otellogs' not in block_text` is a plain substring match and is case-sensitive. A block containing `source=OtelLogs` or `SOURCE=otellogs` would be incorrectly excluded. Consider using a case-insensitive check or a regex to make the detection more robust. scripts/docs_exporter/export_to_docs_website.py [258-259] -if 'source=otellogs' not in block_text: +if 'source=otellogs' not in block_text.lower(): return False Suggestion importance[1-10]: 3 __ Why: While technically valid, the `source=otellogs` check is used against controlled documentation content where the casing is consistent. This is a minor robustness improvement with low practical impact in this context.	Low
		Fix blockquote formatting inconsistency The note is missing a space after the `>` blockquote marker, which is inconsistent with standard Markdown formatting conventions used elsewhere in the document (for example, the `> Note:` style used in `rex.md`). This may render incorrectly in some Markdown parsers. docs/user/ppl/cmd/convert.md [256] ->Note: The `none()` function is useful with wildcard support, allowing you to exclude specific fields from bulk conversions. +> Note: The `none()` function is useful with wildcard support, allowing you to exclude specific fields from bulk conversions. Suggestion importance[1-10]: 3 __ Why: The `>Note:` is missing a space after `>`, which is inconsistent with standard Markdown formatting. While minor, fixing it ensures consistent rendering across Markdown parsers.	Low

Suggestions up to commit 2000094

Category	Suggestion	Impact
Possible issue	Remove stray code lines outside code block The PPL code block for Example 3 is malformed. There is a stray `\| where age > 30` and `\| fields state, age` outside the code fence, which are leftover lines from the old example that were not properly removed. These lines should be deleted. docs/user/ppl/cmd/replace.md [83-91] ```ppl source=otellogs \| where severityText = 'ERROR' \| replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name` \| fields `resource.attributes.service.name`, body \| head 3 -\| where age > 30 -\| fields state, age -``` <details><summary>Suggestion importance[1-10]: 9</summary> __ Why: The code block for Example 3 is clearly malformed - there are leftover lines `\| where age > 30` and `\| fields state, age` outside the closing code fence, which are remnants from the old example. This is a real bug in the documentation that would confuse readers. </details></details></td><td align=center>High </td></tr><tr><td> <details><summary>Fix duplicate identical queries in example</summary> ___ Example 3 contains two identical PPL queries but claims they demonstrate different <br>behaviors (one with <code>keepempty=true</code> and one without). The second query should use <br><code>keepempty=true</code> to actually demonstrate including null values, matching the <br>description "deduplicates while ignoring documents with empty values" (which should <br>instead show <code>keepempty=true</code> behavior). The queries and their descriptions need to be <br>differentiated. [docs/user/ppl/cmd/dedup.md [86-115]](https://github.com/opensearch-project/sql/pull/5303/files#diff-532601ea92e852acbb8b9bcdd2e269377bc05a311ec141be38148539d20c5bafR86-R115) ```diff The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped: ```ppl source=otellogs \| dedup instrumentationScope.name \| fields instrumentationScope.name \| sort instrumentationScope.name The query returns the following results: `fetched rows / total rows = 3/3 -... ++-----------------------------------------------------------------------------+ +\| instrumentationScope.name \| +\|-----------------------------------------------------------------------------\| +\| @opentelemetry/instrumentation-http \| +\| Microsoft.Extensions.Hosting \| +\| go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc \| ++-----------------------------------------------------------------------------+` -The following query deduplicates while ignoring documents with empty values in the specified field: +The following query deduplicates while keeping documents with empty values in the specified field: `source=otellogs -\| dedup instrumentationScope.name +\| dedup instrumentationScope.name keepempty=true \| fields instrumentationScope.name \| sort instrumentationScope.name` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: The two PPL queries in Example 3 are identical but claim to demonstrate different behaviors. The second query should include `keepempty=true` to actually show the described behavior of "ignoring documents with empty values," making this a genuine documentation correctness issue. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Non-playground PPL blocks skip SQL conversion</summary> ___ The non-playground branch preserves the original <code>ppl</code> fence, which means the <br>subsequent <code>re.sub</code> for converting remaining PPL fences to SQL will still match and <br>convert them. However, the non-playground branch does NOT add a copy button, while <br>the original code added one for all SQL fences. This is likely intentional, but the <br>non-playground branch should also add a copy button after converting to SQL, or the <br>conversion to SQL should happen inside this lambda for the non-playground case as <br>well, to ensure consistent behavior. [scripts/docs_exporter/export_to_docs_website.py [259-269]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR259-R269) ```diff content = re.sub( r'^```ppl\b[^\n]\n(.?)^```$', lambda m: ( '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}' if _is_playground_eligible(m.group(1), current_file_path) else ( - '```ppl\n' + m.group(1) + '```' + '```sql\n' + m.group(1) + '```' ) ), content, flags=re.MULTILINE \| re.DOTALL, ) Suggestion importance[1-10]: 7 __ Why: The non-playground branch preserves the `ppl` fence, which will be caught by the subsequent `re.sub` on line 272 that converts remaining PPL fences to SQL. However, that conversion won't add a copy button, while playground-eligible blocks do get one. The suggestion to convert to `sql` directly in the non-playground branch is valid and would ensure consistent SQL conversion, though the copy button omission for non-playground blocks may be intentional design.	Medium
	Fix incorrect duplicate row in example result table The result table for Example 3 shows `WARN` appearing twice with the same count (`4`), which looks like a data error. The `override=true` behavior should replace the main search `agg` values with subsearch values only for matching rows (`ERROR` and `WARN`), leaving `INFO` and `DEBUG` from the main search unchanged. The duplicate `WARN` row and missing `DEBUG` row make the example misleading and potentially incorrect. docs/user/ppl/cmd/appendcol.md [95-104] fetched rows / total rows = 4/4 +-----+--------------+ \| agg \| severityText \| \|-----+--------------\| +\| 3 \| DEBUG \| \| 7 \| ERROR \| -\| 4 \| WARN \| \| 6 \| INFO \| \| 4 \| WARN \| +-----+--------------+ Suggestion importance[1-10]: 6 __ Why: The result table shows `WARN` appearing twice with the same count and is missing a `DEBUG` row, which appears to be a data error that could mislead readers about the `override=true` behavior.	Low
	General	Fix non-sequential example numbering The examples jump from Example 4 to Example 6, skipping Example 5. This appears to be a numbering error introduced in the PR when examples were reorganized. The example numbering should be sequential to avoid confusing readers. docs/user/ppl/cmd/sort.md [180] -## Example 6: Specifying the number of sorted documents to return +## Example 5: Specifying the number of sorted documents to return Suggestion importance[1-10]: 7 __ Why: The examples jump from Example 4 to Example 6, skipping Example 5. This is a real numbering error in the PR that would confuse readers navigating the documentation.	Medium
		Fix filter placement affecting aggregation correctness The `where` clause filters by `severityText` after `eventstats`, but `eventstats` computes counts over all rows (not just ERROR rows). This means `scope_count` reflects counts across all severity levels, not just errors, which is misleading. The `where` clause should be placed before `eventstats` to ensure the count reflects only ERROR entries, or the description should clarify this behavior. docs/user/ppl/cmd/eventstats.md [141-146] ```ppl source=otellogs +\| where severityText = 'ERROR' \| eventstats bucket_nullable=false count() as scope_count by instrumentationScope.name -\| where severityText = 'ERROR' \| sort `resource.attributes.service.name` \| fields `resource.attributes.service.name`, `instrumentationScope.name`, scope_count <details><summary>Suggestion importance[1-10]: 5</summary> __ Why: The `where severityText = 'ERROR'` clause is placed after `eventstats`, meaning `scope_count` reflects counts across all severity levels rather than just errors. While this may be intentional to demonstrate `bucket_nullable=false`, the query logic could be misleading. However, this may be a deliberate design choice for the example, so the impact is moderate. </details></details></td><td align=center>Low </td></tr><tr><td> <details><summary>Add missing result block for first subsection</summary> ___ The "Matching with a prefix pattern" subsection shows a query but has no result <br>block, while only the second subsection ("Matching with a wildcard pattern") shows <br>results. Readers of the first subsection will be left without example output. A <br>result block should be added after the first query, or a note should clarify that <br>only the second query's results are shown. [docs/user/ppl/cmd/where.md [108-130]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ea8ca7481f94d3d9401c4057d33afd640010be2df62b62730137101ef083c140R108-R130) ```diff ### Matching with a prefix pattern The following query uses a percent sign (`%`) to find all services starting with `frontend`: ```ppl source=otellogs \| where LIKE(`resource.attributes.service.name`, 'frontend%') \| fields severityText, `resource.attributes.service.name`, body \| head 3 +``` + +The query returns the following results: + +```text +fetched rows / total rows = 3/3 ++--------------+----------------------------------+-------+ +\| severityText \| resource.attributes.service.name \| body \| +\|--------------+----------------------------------+-------\| +\| ... \| frontend \| ... \| +\| ... \| frontend-proxy \| ... \| +\| ... \| frontend-proxy \| ... \| ++--------------+----------------------------------+-------+ Matching with a wildcard pattern The following query finds all logs from services containing `product` in their name: source=otellogs \| where LIKE(`resource.attributes.service.name`, '%product%') \| fields severityText, `resource.attributes.service.name`, body \| head 3 The query returns the following results: <details><summary>Suggestion importance[1-10]: 4</summary> __ Why: The "Matching with a prefix pattern" subsection lacks a result block, which is inconsistent with the rest of the documentation. However, the `improved_code` contains placeholder data (`...`) rather than actual query results, making it an incomplete fix. </details></details></td><td align=center>Low </td></tr><tr><td> <details><summary>Fix misaligned table column padding in result output</summary> ___ The <code>unknown</code> values in the result table are not padded to align with the column <br>width, causing the table to appear misformatted. The <code>unknown</code> entries should be <br>padded with trailing spaces to match the column width defined by the header, <br>consistent with the formatting of other result tables in the documentation. [docs/user/ppl/cmd/fillnull.md [53-64]](https://github.com/opensearch-project/sql/pull/5303/files#diff-e05ce7bc17faa0105fe2b13aeacef92b8f08f90aac016e003b615f5f575efaa4R53-R64) ```diff -\| ERROR \| checkout \| unknown \| -\| ERROR \| checkout \| unknown \| -\| ERROR \| frontend-proxy \| unknown \| -\| WARN \| frontend-proxy \| unknown \| -\| WARN \| frontend-proxy \| unknown \| +\| ERROR \| checkout \| unknown \| +\| ERROR \| checkout \| unknown \| +\| ERROR \| frontend-proxy \| unknown \| +\| WARN \| frontend-proxy \| unknown \| +\| WARN \| frontend-proxy \| unknown \| \| ERROR \| payment \| @opentelemetry/instrumentation-http \| -\| ERROR \| payment \| unknown \| +\| ERROR \| payment \| unknown \| \| WARN \| product-catalog \| go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc \| -\| WARN \| product-catalog \| unknown \| -\| ERROR \| product-catalog \| unknown \| -\| ERROR \| recommendation \| unknown \| +\| WARN \| product-catalog \| unknown \| +\| ERROR \| product-catalog \| unknown \| +\| ERROR \| recommendation \| unknown \| Suggestion importance[1-10]: 4 __ Why: The `unknown` values in the result table are not padded to match the column width, causing visual misalignment. This is a cosmetic issue that affects readability but not correctness.	Low
Lookahead pattern may not reliably prevent duplicate copy buttons The negative lookahead `(?!\n{% include copy)` uses `re.DOTALL`, which makes `.` match newlines, but the lookahead itself checks for a literal newline `\n` followed by `{%`. With `re.DOTALL`, the `$` in `^`$(?!\n...) `matches end-of-line (due to` re.MULTILINE`), so the lookahead should work correctly. However, the`{% include try-in-playground.html %} `button is added after` {% include copy.html %}`, so the pattern` (?!\n{% include copy) `will correctly skip blocks that already have a copy button. Verify that the` `lookahead correctly handles cases where the playground button was added (which also` `includes a copy button before it) to avoid double copy buttons.` scripts/docs_exporter/export_to_docs_website.py [280-282] -content = re.sub(r'^```(bash\|sh\|sql)\b[^\n]\n(.?)^```$(?!\n\{% include copy)', +content = re.sub(r'^```(bash\|sh\|sql)\b[^\n]\n(.?)^```$(?!\n\{%[ ]include copy)', r'```\1\n\2```\n{% include copy.html %}', content, flags=re.MULTILINE \| re.DOTALL) Suggestion importance[1-10]: 3 __ Why: The suggestion to add a space in the lookahead pattern (`{%[ ]include copy`) is a minor robustness improvement, but the existing pattern `{% include copy` already matches the literal string produced by the code. The `improved_code` change is minimal and the concern about double copy buttons is valid but the existing logic should handle it correctly since playground blocks add `copy.html` before `try-in-playground.html`.		Low
Add missing newline at end of file The file is missing a newline at the end. Files should end with a newline character to follow standard text file conventions and avoid potential issues with file concatenation or diff tools. docs/user/ppl/cmd/appendpipe.md [86] +* Schema compatibility: When fields with the same name exist in both the main search and the subpipeline but have incompatible types, the query fails with an error. To avoid type conflicts, ensure that fields with the same name share the same data type. Alternatively, use different field names. You can rename the conflicting fields using `eval` or select non-conflicting columns using `fields`. - Suggestion importance[1-10]: 1 __ Why: The `existing_code` and `improved_code` are identical, meaning no actual change is proposed. Adding a trailing newline is a very minor style concern with negligible impact.		Low

Suggestions up to commit c051fd0

Category Suggestion Impact

Possible issue

Remove orphaned code lines outside code fence

The PPL code block for Example 3 is broken — there is a stray closing
triple-backtick followed by two orphaned pipeline stages (| where age > 30 and |
fields state, age) that appear outside the code fence. These leftover lines from the
old example were not removed and will render as raw text in the documentation.

docs/user/ppl/cmd/replace.md [83-91]

 ```ppl
 source=otellogs
 | where severityText = 'ERROR'
 | replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, body
 | head 3

-| where age > 30
-| fields state, age
-```

<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: The `| where age > 30` and `| fields state, age` lines are leftover from the old example and appear outside the code fence, which will render as raw text in the documentation and break the example's presentation.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Prevent duplicate copy buttons on playground-eligible blocks</summary>

___


**The non-playground branch reconstructs the block as <code> </code><code></code>ppl <code> but the subsequent regex </code><br><code></code>re.sub(r'^<code></code><code>ppl\b.*$', '</code><code></code>sql', ...)<code> will then convert it to SQL anyway, which is the </code><br><code>intended behavior. However, the non-playground branch is missing the copy button and </code><br><code>the try-in-playground button, so the final output for non-otellogs PPL blocks will </code><br><code>only get a copy button from the later generic SQL regex — this is fine, but the </code><br><code>playground-eligible branch adds the copy button inline AND the later regex will add </code><br><code>another copy button because the </code>(?!\n\{% include copy)<code> lookahead checks for a </code><br><code>newline before </code>{% include copy<code>, but the playground block ends with </code>{% include <br>try-in-playground.html %}<code>, not </code>{% include copy<code>. This means playground-eligible </code><br><code>blocks will get a duplicate </code>{% include copy.html %}<code> appended.</code>**

[scripts/docs_exporter/export_to_docs_website.py [259-282]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR259-R282)

```diff
 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
             '```ppl\n' + m.group(1) + '```'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
 
+# Convert remaining PPL code fences to SQL
+content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)
+
+# Add copy buttons after sql/bash/sh fences that don't already have one
+content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)(?!\n\{% include try-in-playground)',
+                 r'```\1\n\2```\n{% include copy.html %}',
+                 content, flags=re.MULTILINE | re.DOTALL)
+

Suggestion importance[1-10]: 7

__

Why: This is a valid bug: playground-eligible blocks already include {% include copy.html %}, but the later regex at line 280 checks only for (?!\n\{% include copy) — since those blocks end with {% include try-in-playground.html %}, the lookahead won't match and a second copy button will be appended. The fix to also exclude blocks followed by {% include try-in-playground is correct and important.

Medium

Fix duplicate queries with different intended behaviors

The two sub-examples in Example 3 now use identical PPL queries (both lack
keepempty=true) and return identical results, making them indistinguishable. The
first sub-example should use keepempty=true to demonstrate retaining null values, as
the preceding description states "by default, records with null values are dropped."

docs/user/ppl/cmd/dedup.md [108-128]

-The following query deduplicates while ignoring documents with empty values in the specified field:
+The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name

-The query returns the following results:

-```text
-fetched rows / total rows = 3/3
-+-----------------------------------------------------------------------------+

-	instrumentationScope.name
-	@opentelemetry/instrumentation-http
-	Microsoft.Extensions.Hosting
-	go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
-+-----------------------------------------------------------------------------+
-```

<details><summary>Suggestion importance[1-10]: 7</summary>

__

Why: Both sub-examples in Example 3 use identical queries without `keepempty=true`, making them indistinguishable despite the description implying different behaviors. The first sub-example should include `keepempty=true` to match its stated purpose.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Fix incorrect duplicate row in example output</summary>

___


**The result table for Example 3 shows <code>WARN</code> appearing twice with the same count (<code>4</code>), <br>which looks like a data error. With <code>override=true</code>, the subsearch values should <br>replace the main search values for matching rows, so <code>INFO</code> and <code>DEBUG</code> rows (which have <br>no match in the subsearch) should retain their original counts while <code>ERROR</code> and <code>WARN</code> <br>rows get replaced. The current output is misleading and inconsistent with the <br>described behavior.**

[docs/user/ppl/cmd/appendcol.md [96-103]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R96-R103)

```diff
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
 | 4   | WARN         |
 +---...

github-actions · 2026-04-02T07:40:44Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f0e3a6a.

Path	Line	Severity	Description
docs/user/ppl/cmd/stats.md	18	low	Query result output is embedded directly inside section headings and aggregation function bullet points (e.g., '## fetched rows / total rows = 1/1 ... +arameters' and 'VAR_SAMfetched rows...+--...+arameter'). This corrupted content is anomalous and inconsistent with normal editing behavior, though it appears to be an accidental copy-paste error rather than deliberate injection.
docs/user/ppl/cmd/head.md	15	low	Query result tables are embedded inside the '## Parameters' section heading and the parameter table header, producing malformed markdown. The same pattern as stats.md — anomalous but likely an accidental copy-paste error during the documentation rewrite.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-04-02T07:41:53Z

Persistent review updated to latest commit d0b2a08

github-actions · 2026-04-03T07:20:18Z

Persistent review updated to latest commit f0e3a6a

github-actions · 2026-04-06T07:14:48Z

Persistent review updated to latest commit c051fd0

github-actions · 2026-04-29T16:47:43Z

Persistent review updated to latest commit 2000094

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-05-04T05:17:49Z

Persistent review updated to latest commit 5121c46

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-05-04T05:25:52Z

Persistent review updated to latest commit 3df36ba

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-05-04T05:38:56Z

Persistent review updated to latest commit 844e0fe

ritvibhatt added 7 commits April 29, 2026 15:09

update ppl doc examples to use otel data

5326862

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

update sample data

e37c073

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

update script and examples

ea1148e

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

update otel examples

30ed2bc

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

apply feedback from documentation website to docs

ad3d7fa

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

apply feedback from documentation website to docs

d2623ed

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix transpose

5121c46

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

ritvibhatt force-pushed the update-ppl-examples branch from 2000094 to 5121c46 Compare May 4, 2026 05:16

fix reverse doc

3df36ba

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix convert doc

844e0fe

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

ritvibhatt marked this pull request as ready for review May 4, 2026 16:17

ritvibhatt requested review from Swiddis, acarbonetto, anirudha, dai-chen, joshuali925, mengweieric, penghuo, ps48 and vamsimanohar as code owners May 4, 2026 16:17

ritvibhatt requested review from LantaoJin, RyanL1997, ahkcs, noCharger, qianheng-aws, songkant-aws, ykmr1224 and yuancu as code owners May 4, 2026 16:17

Conversation

ritvibhatt commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Check List

Uh oh!

github-actions Bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 844e0fe)

Uh oh!

github-actions Bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Parameters

Previous suggestions

Parameters

Parameters

Matching with a wildcard pattern

Matching with a wildcard pattern

-The query returns the following results:

Uh oh!

github-actions Bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Analyzer ❗

Uh oh!

github-actions Bot commented Apr 2, 2026

Uh oh!

github-actions Bot commented Apr 3, 2026

Uh oh!

github-actions Bot commented Apr 6, 2026

Uh oh!

github-actions Bot commented Apr 29, 2026

Uh oh!

github-actions Bot commented May 4, 2026

Uh oh!

github-actions Bot commented May 4, 2026

Uh oh!

github-actions Bot commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ritvibhatt commented Apr 1, 2026 •

edited

Loading

github-actions Bot commented Apr 1, 2026 •

edited

Loading

(Review updated until commit `844e0fe`)

github-actions Bot commented Apr 1, 2026 •

edited

Loading

github-actions Bot commented Apr 2, 2026 •

edited

Loading