Skip to content

update ppl doc examples to use otel data#5303

Open
ritvibhatt wants to merge 9 commits intoopensearch-project:mainfrom
ritvibhatt:update-ppl-examples
Open

update ppl doc examples to use otel data#5303
ritvibhatt wants to merge 9 commits intoopensearch-project:mainfrom
ritvibhatt:update-ppl-examples

Conversation

@ritvibhatt
Copy link
Copy Markdown
Contributor

@ritvibhatt ritvibhatt commented Apr 1, 2026

Description

  • Updates PPL command docs to use otel sample data instead of accounts/bank/etc
  • Applies feedback from docs website to keep in sync (only diff from exporter script is commented out sections for data sources section on documentation website)

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 1, 2026

PR Reviewer Guide 🔍

(Review updated until commit 844e0fe)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Replace otellogs test data with new OTel dataset

Relevant files:

  • doctest/test_data/otellogs.json

Sub-PR theme: Add playground button support for otellogs PPL blocks in docs exporter

Relevant files:

  • scripts/docs_exporter/export_to_docs_website.py

Sub-PR theme: Update PPL command documentation examples to use OTel data

Relevant files:

  • docs/user/ppl/cmd/search.md
  • docs/user/ppl/cmd/reverse.md
  • docs/user/ppl/cmd/graphlookup.md
  • docs/user/ppl/cmd/showdatasources.md

⚡ Recommended focus areas for review

Regex Ordering

The new PPL block regex (lines 267-277) converts ppl fences with otellogs to sql fences with playground buttons, and then line 280 converts remaining ppl fences to sql. However, the new regex at line 267 uses re.DOTALL and matches the full block including the closing fence. If the first regex matches and replaces a block, the second regex at line 280 (which only matches the opening fence line) should not re-match it since the fence is now sql. This ordering appears correct, but should be validated that no edge cases exist where a block is double-processed.

content = re.sub(
    r'^```ppl\b[^\n]*\n(.*?)^```$',
    lambda m: (
        '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
        if _is_playground_eligible(m.group(1), current_file_path)
        else (
            '```ppl\n' + m.group(1) + '```'
        )
    ),
    content, flags=re.MULTILINE | re.DOTALL,
)

# Convert remaining PPL code fences to SQL
content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)
Playground Check

The _is_playground_eligible function checks for source=otellogs as a substring of the block text. This could produce false positives if a comment or string literal within the PPL block contains source=otellogs without it being the actual source command. A more precise check (e.g., matching source\s*=\s*otellogs as a word boundary or at the start of a line) would be more robust.

def _is_playground_eligible(block_text, file_path):
    """Check if a PPL block should get a Try in Playground button."""
    if 'source=otellogs' not in block_text:
        return False
    if file_path:
        stem = file_path.stem if hasattr(file_path, 'stem') else str(file_path).rsplit('/', 1)[-1].replace('.md', '')
        if stem in PLAYGROUND_UNSUPPORTED_CMDS:
            return False
    return True
Formatting Inconsistency

In the wildcard example result table at line 414, the ERROR value appears to have inconsistent spacing/padding (| ERROR | vs | ERROR |), which may indicate a copy-paste error in the table formatting.

| ERROR       |
Missing Index Directives

The old data file used explicit {"index": {"_id": "N"}} bulk API directives before each document. The new file omits these, which changes the bulk indexing format. If this file is used directly with the OpenSearch Bulk API, the missing action lines will cause indexing failures. Verify that the ingestion pipeline handles this format correctly or adds the action lines automatically.

{"@timestamp": "2024-02-01T09:10:00Z", "time": "2024-02-01T09:10:00Z", "severityNumber": 9, "severityText": "INFO", "body": "[2024-02-01T09:10:00.123Z] \"GET /api/products HTTP/1.1\" 200 - 1024 45 frontend-6b7b4c9f-x2kl9", "traceId": "abcd1234efgh5678", "spanId": "span0001", "flags": 0, "instrumentationScope": {"name": "@opentelemetry/instrumentation-http", "version": "0.57.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:11:00Z", "time": "2024-02-01T09:11:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Order #1234 placed successfully by user U100", "traceId": "abcd1234efgh5678", "spanId": "span0002", "flags": 0, "instrumentationScope": {"name": "Microsoft.Extensions.Hosting", "version": "9.0.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "cart"}, "host": {"name": "cart-5d8f7b-mk29s"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:12:00Z", "time": "2024-02-01T09:12:00Z", "severityNumber": 13, "severityText": "WARN", "body": "Slow query detected: SELECT * FROM products WHERE category = 'electronics' took 3200ms", "traceId": "abcd1234efgh5678", "spanId": "span0003", "flags": 0, "instrumentationScope": {"name": "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc", "version": "0.49.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:13:00Z", "time": "2024-02-01T09:13:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Payment failed: connection timeout to payment gateway after 30000ms", "traceId": "ijkl9012mnop3456", "spanId": "span0004", "flags": 0, "instrumentationScope": {"name": "@opentelemetry/instrumentation-http", "version": "0.57.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "payment"}, "host": {"name": "payment-6f8d4b-ht7q3"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:14:00Z", "time": "2024-02-01T09:14:00Z", "severityNumber": 5, "severityText": "DEBUG", "body": "Cache miss for key user:session:U200 in Valkey cluster", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "cart"}, "host": {"name": "cart-5d8f7b-mk29s"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:15:00Z", "time": "2024-02-01T09:15:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "NullPointerException in CheckoutService.placeOrder at line 142", "traceId": "qrst5678uvwx9012", "spanId": "span0006", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "checkout"}, "host": {"name": "checkout-8b4c2d-jp5r7"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:16:00Z", "time": "2024-02-01T09:16:00Z", "severityNumber": 9, "severityText": "INFO", "body": "User U300 authenticated via OAuth2 from 10.0.0.5", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:17:00Z", "time": "2024-02-01T09:17:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Out of memory: Java heap space - shutting down pod payment-6f8d4b-ht7q3", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "payment"}, "host": {"name": "payment-6f8d4b-ht7q3"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:18:00Z", "time": "2024-02-01T09:18:00Z", "severityNumber": 13, "severityText": "WARN", "body": "Connection pool 80% utilized on database replica db-replica-02", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:19:00Z", "time": "2024-02-01T09:19:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Kafka consumer group rebalanced: 4 partitions assigned to consumer checkout-consumer-01", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "checkout"}, "host": {"name": "checkout-8b4c2d-jp5r7"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:20:00Z", "time": "2024-02-01T09:20:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "[2024-02-01T09:20:00.456Z] \"POST /api/checkout HTTP/1.1\" 503 - 0 30000 checkout-8d4f7b-mk2p9", "traceId": "yzab3456cdef7890", "spanId": "span0011", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend-proxy"}, "host": {"name": "frontendproxy-envoy-7d4b8c-xk2q9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:21:00Z", "time": "2024-02-01T09:21:00Z", "severityNumber": 5, "severityText": "DEBUG", "body": "gRPC call /ProductCatalogService/GetProduct completed in 12ms", "traceId": "abcd1234efgh5678", "spanId": "span0012", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:22:00Z", "time": "2024-02-01T09:22:00Z", "severityNumber": 13, "severityText": "WARN", "body": "SSL certificate for api.example.com expires in 14 days", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend-proxy"}, "host": {"name": "frontendproxy-envoy-7d4b8c-xk2q9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:23:00Z", "time": "2024-02-01T09:23:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Deployment frontend-v2.2.0 rolled out successfully to 3/3 replicas", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:24:00Z", "time": "2024-02-01T09:24:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Failed to process recommendation request: invalid product ID from 203.0.113.50", "traceId": "ghij7890klmn1234", "spanId": "span0015", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "recommendation"}, "host": {"name": "recommendation-5f7c-bn3k8"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:25:00Z", "time": "2024-02-01T09:25:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Database primary node unreachable: connection refused to db-primary-01:5432", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:26:00Z", "time": "2024-02-01T09:26:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Health check passed: all 5 dependencies healthy", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:27:00Z", "time": "2024-02-01T09:27:00Z", "severityNumber": 13, "severityText": "WARN", "body": "Rate limit threshold reached: 450/500 requests per minute for API key ending in ...abc789", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend-proxy"}, "host": {"name": "frontendproxy-envoy-7d4b8c-xk2q9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:28:00Z", "time": "2024-02-01T09:28:00Z", "severityNumber": 5, "severityText": "DEBUG", "body": "Valkey SETEX user:session:U300 3600 - session refreshed", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "cart"}, "host": {"name": "cart-5d8f7b-mk29s"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:29:00Z", "time": "2024-02-01T09:29:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Kafka producer delivery failed: message too large for topic order-events (max 1048576 bytes)", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "checkout"}, "host": {"name": "checkout-8b4c2d-jp5r7"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 1, 2026

PR Code Suggestions ✨

Latest suggestions up to 844e0fe

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix malformed code block with stray lines

The PPL code block for Example 3 is malformed. The closing triple backtick and the
subsequent lines | where age > 30 and | fields state, age appear to be leftover from
the old example and are outside the code block, making the documentation incorrect
and confusing.

docs/user/ppl/cmd/replace.md [83-91]

 ```ppl
 source=otellogs
 | where severityText = 'ERROR'
 | replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, body
 | head 3

-| where age > 30
-| fields state, age
-```

<details><summary>Suggestion importance[1-10]: 9</summary>

__

Why: The code block for Example 3 is genuinely malformed - the closing backticks appear mid-block and the lines `| where age > 30` and `| fields state, age` are leftover from the old example, appearing outside the new code block. This is a real documentation bug that would confuse readers.


</details></details></td><td align=center>High

</td></tr><tr><td>



<details><summary>Fix unclosed syntax code block</summary>

___


**The syntax code block is not closed before the <code>## Parameters</code> section begins, causing <br>the Parameters heading and all subsequent content to be rendered inside the code <br>block. A closing triple backtick is missing after the syntax line.**

[docs/user/ppl/cmd/transpose.md [9-15]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ec1fe93388c8d21c3152db80ad223e173eeb0bfb9685a46b9d3ea389f31748b0R9-R15)

```diff
 The `transpose` command has the following syntax:
 

transpose [int] [column_name=]
+```

Parameters

<details><summary>Suggestion importance[1-10]: 9</summary>

__

Why: The syntax code block is missing its closing triple backtick before the `## Parameters` heading, which would cause the entire Parameters section and all subsequent content to be rendered inside the code block, breaking the document structure significantly.


</details></details></td><td align=center>High

</td></tr><tr><td>



<details><summary>Non-playground PPL blocks lose copy button</summary>

___


**The non-playground branch reconstructs the block as <code> </code><code></code>ppl <code> but the subsequent regex </code><br><code></code>re.sub(r'^<code></code><code>ppl\b.*$', '</code><code></code>sql', ...)<code> will then convert it to SQL anyway, effectively </code><br><code>making the playground-ineligible branch a no-op and losing the </code>{% include copy.html <br>%}<code> button for those blocks. The non-playground branch should emit </code> <code></code><code>sql </code> with a copy <br>button directly, or the second regex should be guarded to skip blocks already <br>converted.**

[scripts/docs_exporter/export_to_docs_website.py [267-277]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR267-R277)

```diff
 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
-            '```ppl\n' + m.group(1) + '```'
+            '```sql\n' + m.group(1) + '```\n{% include copy.html %}'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a logic bug: the non-playground branch emits ```ppl which is then converted to ```sql by the subsequent regex on line 280, but without adding a {% include copy.html %} button. The improved code directly emits ```sql with a copy button, which is the correct behavior and avoids the silent loss of copy buttons for non-playground PPL blocks.

Medium
Fix duplicate queries obscuring keepempty behavior

Both the keepempty=true and the default (drop nulls) sub-examples in Example 3 use
identical PPL queries, making it impossible to distinguish the two behaviors. The
first query should include keepempty=true to demonstrate including null values,
matching the description that says "By default, records with null values are
dropped."

docs/user/ppl/cmd/dedup.md [86-128]

 The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name

The query returns the following results:

-fetched rows / total rows = 3/3
+fetched rows / total rows = 4/4
...

The following query deduplicates while ignoring documents with empty values in the specified field:

source=otellogs
| dedup instrumentationScope.name
| fields instrumentationScope.name
| sort instrumentationScope.name
<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: Both sub-examples in Example 3 use identical PPL queries (`dedup instrumentationScope.name` without `keepempty=true`), making it impossible to distinguish the two behaviors described. The first query should include `keepempty=true` to properly demonstrate the difference between keeping and dropping null values.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Fix duplicate row in override example output</summary>

___


**The result table for Example 3 shows <code>WARN</code> appearing twice with <code>agg=4</code>, which is <br>inconsistent with the explanation that <code>override=true</code> replaces main search values <br>with subsearch values. The subsearch only covers <code>ERROR</code> and <code>WARN</code>, so <code>INFO</code> and <code>DEBUG</code> <br>rows should retain their original counts, and <code>WARN</code> should appear only once. The <br>duplicate <code>WARN</code> row and the presence of <code>INFO</code> with an overridden value suggest the <br>expected output is incorrect.**

[docs/user/ppl/cmd/appendcol.md [95-104]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R95-R104)

```diff
 fetched rows / total rows = 4/4
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
 | 7   | ERROR        |
 | 4   | WARN         |
 | 6   | INFO         |
-| 4   | WARN         |
+| 3   | DEBUG        |
 +-----+--------------+
Suggestion importance[1-10]: 5

__

Why: The result table shows WARN appearing twice and INFO with an overridden value, which is inconsistent with the described behavior of override=true. However, the suggested fix introduces a DEBUG row that may not match the actual query behavior, making the correction uncertain.

Low
General
Add missing query results for prefix pattern example

The "Matching with a prefix pattern" subsection shows a query but does not include
its expected output results. Every other subsection in the document includes a
results block, so the missing output makes this example inconsistent and incomplete
for readers trying to verify their queries.

docs/user/ppl/cmd/where.md [108-130]

 ### Matching with a prefix pattern
 
 The following query uses a percent sign (`%`) to find all services starting with `frontend`:
 
 ```ppl
 source=otellogs
 | where LIKE(`resource.attributes.service.name`, 'frontend%')
 | fields severityText, `resource.attributes.service.name`, body
 | head 3

-### Matching with a wildcard pattern
-...
The query returns the following results:

+text +fetched rows / total rows = 3/3 ++--------------+----------------------------------+-------------------------------------------------------------------------------------------+ +| severityText | resource.attributes.service.name | body | +|--------------+----------------------------------+-------------------------------------------------------------------------------------------| +| WARN | frontend-proxy | SSL certificate for api.example.com expires in 14 days | +| ERROR | frontend-proxy | [2024-02-01T09:20:00.456Z] "POST /api/checkout HTTP/1.1" 503 - 0 30000 checkout-8d4f7b-mk2p9 | +| INFO | frontend | Incoming request: GET /api/products from 203.0.113.50 | ++--------------+----------------------------------+-------------------------------------------------------------------------------------------+ +
+
+### Matching with a wildcard pattern
+

<details><summary>Suggestion importance[1-10]: 5</summary>

__

Why: The "Matching with a prefix pattern" subsection is missing its expected output results, making it inconsistent with all other examples in the document. However, the suggested output data is fabricated and cannot be verified from the PR diff, so the score is moderate.


</details></details></td><td align=center>Low

</td></tr><tr><td>



<details><summary>Fragile substring check for playground eligibility</summary>

___


**The check <code>'source=otellogs' not in block_text</code> is a plain substring match and is <br>case-sensitive. A block containing <code>source=OtelLogs</code> or <code>source=otellogs_v2</code> would be <br>incorrectly handled. Consider using a more precise regex match to avoid false <br>positives and improve robustness.**

[scripts/docs_exporter/export_to_docs_website.py [258-259]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR258-R259)

```diff
-if 'source=otellogs' not in block_text:
+if not re.search(r'\bsource\s*=\s*otellogs\b', block_text):
     return False
Suggestion importance[1-10]: 4

__

Why: The suggestion raises a valid concern about the plain substring match for source=otellogs potentially matching unintended strings like source=otellogs_v2. Using a regex with word boundaries would be more precise, though in practice the current dataset may not have such edge cases.

Low
Fix inconsistent table column padding in results

The table formatting for the resource.attributes.service.name column is inconsistent
cart and product-catalog rows have truncated padding compared to other rows. All
rows should be padded to the same column width to maintain proper table alignment.

docs/user/ppl/cmd/sort.md [52-61]

 fetched rows / total rows = 4/4
 +--------------+----------------+----------------------------------+
 | severityText | severityNumber | resource.attributes.service.name |
 |--------------+----------------+----------------------------------|
-| DEBUG        | 5              | cart                     |
-| DEBUG        | 5              | product-catalog                |
-| DEBUG        | 5              | cart                     |
+| DEBUG        | 5              | cart                             |
+| DEBUG        | 5              | product-catalog                  |
+| DEBUG        | 5              | cart                             |
 | INFO         | 9              | frontend                         |
 +--------------+----------------+----------------------------------+
Suggestion importance[1-10]: 4

__

Why: The cart and product-catalog rows have inconsistent padding in the resource.attributes.service.name column, breaking the visual alignment of the ASCII table. The fix correctly pads all values to match the column width.

Low

Previous suggestions

Suggestions up to commit 3df36ba
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix unclosed syntax code block

The syntax code block is missing its closing triple-backtick fence. The ##
Parameters section header and everything that follows it is being rendered inside
the code block instead of as normal Markdown. A closing fence should be added after
the syntax line.

docs/user/ppl/cmd/transpose.md [9-15]

 The `transpose` command has the following syntax:
 

transpose [int] [column_name=]
+```

Parameters

<details><summary>Suggestion importance[1-10]: 9</summary>

__

Why: The syntax code block is missing its closing triple-backtick fence, which would cause the `## Parameters` section and everything following it to be rendered inside the code block. This is a critical formatting bug that breaks the entire document structure.


</details></details></td><td align=center>High

</td></tr><tr><td>



<details><summary>Fix malformed code block in example</summary>

___


**The PPL code block for Example 3 is malformed. The closing triple-backtick of the <br>first code block is missing, causing the leftover lines <code>| where age > 30</code> and <code>| </code><br><code>fields state, age</code> to appear as stray text outside the code block. The closing fence <br>should be added after <code>| head 3</code>.**

[docs/user/ppl/cmd/replace.md [83-91]](https://github.com/opensearch-project/sql/pull/5303/files#diff-8644862a762e96b24031ad83e31729768a40b3387fda49b9a3a4f0c6f4c3de39R83-R91)

```diff
 ```ppl
 source=otellogs
 | where severityText = 'ERROR'
 | replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, body
 | head 3

-| where age > 30
-| fields state, age
-```

<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: The code block for Example 3 is genuinely malformed - the closing triple-backtick is missing, causing `| where age > 30` and `| fields state, age` to appear as stray text outside the code block. This is a real rendering bug that would confuse readers.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Fix duplicate identical queries with different described behaviors</summary>

___


**Both queries in Example 3 are identical (both use <code>dedup instrumentationScope.name</code> <br>without <code>keepempty</code>), yet the text describes them as having different behaviors (one <br>drops nulls by default, the other "ignores documents with empty values"). The second <br>query should use <code>keepempty=true</code> to demonstrate including null values, matching the <br>original intent of the example.**

[docs/user/ppl/cmd/dedup.md [88-115]](https://github.com/opensearch-project/sql/pull/5303/files#diff-532601ea92e852acbb8b9bcdd2e269377bc05a311ec141be38148539d20c5bafR88-R115)

```diff
-The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
+The following query deduplicates while keeping documents with empty values in the specified field:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
-| fields instrumentationScope.name
-| sort instrumentationScope.name
-```
-  
-The query returns the following results:
-  
-```text
-fetched rows / total rows = 3/3
-...
-```
-  
-The following query deduplicates while ignoring documents with empty values in the specified field:
-  
-```ppl
-source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name
<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: Both queries in Example 3 are identical (both use `dedup instrumentationScope.name` without `keepempty`), yet the surrounding text describes them as having different behaviors. The second query should use `keepempty=true` to correctly demonstrate including null values, as the text implies.


</details></details></td><td align=center>Medium

</td></tr><tr><td rowspan=2>General</td>
<td>



<details><summary>Use robust file extension stripping for stem extraction</summary>

___


**The <code>stem</code> extraction for non-Path objects uses <code>rsplit('/', 1)[-1]</code> which returns the <br>full filename including the <code>.md</code> extension before calling <code>.replace('.md', '')</code>. <br>However, if the file has no <code>/</code> in its path, <code>rsplit('/', 1)[-1]</code> still returns the full <br>string correctly. The bigger issue is that <code>PLAYGROUND_UNSUPPORTED_CMDS</code> checks the <br>file stem (e.g., <code>multisearch</code>) but the set only contains <code>'multisearch'</code> — if the <br>actual filename is <code>multisearch.md</code>, the stem would be <code>multisearch</code>, which works. But <br>the check <code>stem in PLAYGROUND_UNSUPPORTED_CMDS</code> uses a set lookup on a <code>set</code>, which is <br>correct. This logic is fine, but the <code>str(file_path).rsplit('/', </code><br><code>1)[-1].replace('.md', '')</code> approach will fail for files with extensions other than <br><code>.md</code> (e.g., <code>.mdx</code>). Consider using <code>os.path.splitext</code> for robustness.**

[scripts/docs_exporter/export_to_docs_website.py [258-264]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR258-R264)

```diff
 if 'source=otellogs' not in block_text:
     return False
 if file_path:
-    stem = file_path.stem if hasattr(file_path, 'stem') else str(file_path).rsplit('/', 1)[-1].replace('.md', '')
+    import os
+    stem = file_path.stem if hasattr(file_path, 'stem') else os.path.splitext(str(file_path).rsplit('/', 1)[-1])[0]
     if stem in PLAYGROUND_UNSUPPORTED_CMDS:
         return False
 return True
Suggestion importance[1-10]: 3

__

Why: The suggestion to use os.path.splitext instead of .replace('.md', '') is a minor robustness improvement for non-.md extensions, but the codebase appears to only process .md files, making this an edge case with low practical impact. Also, adding an import inside a nested function is not ideal style.

Low
Fix misaligned table column values in example output

The unknown values in the result table for Example 1 are not padded to align with
the column width, causing the table to appear misformatted compared to the other
rows. The values should be padded with trailing spaces to match the column width
defined by the header.

docs/user/ppl/cmd/fillnull.md [53-57]

-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
Suggestion importance[1-10]: 2

__

Why: The misalignment of unknown values in the ASCII table is a cosmetic issue. In Markdown rendered output, the table borders are defined by the separator row, so the padding difference has minimal visual impact on the final rendered documentation.

Low
Suggestions up to commit 5121c46
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix malformed code block with stray lines

The PPL code block for Example 3 is malformed. The closing triple backtick for the
first code block is missing, and leftover lines from the old example (| where age >
30 and | fields state, age) appear outside the code block. These stray lines should
be removed and the code block properly closed.

docs/user/ppl/cmd/replace.md [83-91]

 ```ppl
 source=otellogs
 | where severityText = 'ERROR'
 | replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, body
 | head 3

-| where age > 30
-| fields state, age
-```

<details><summary>Suggestion importance[1-10]: 9</summary>

__

Why: The code block for Example 3 is clearly malformed - the closing triple backtick is missing and stray lines (`| where age > 30` and `| fields state, age`) from the old example appear outside the code block, which would break the rendered documentation significantly.


</details></details></td><td align=center>High

</td></tr><tr><td>



<details><summary>Fix unclosed syntax code block</summary>

___


**The syntax code block is not closed before the <code>## Parameters</code> section begins, causing <br>the entire Parameters section and everything that follows to be rendered as part of <br>the code block. A closing triple backtick must be added after the syntax line.**

[docs/user/ppl/cmd/transpose.md [9-15]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ec1fe93388c8d21c3152db80ad223e173eeb0bfb9685a46b9d3ea389f31748b0R9-R15)

```diff
 The `transpose` command has the following syntax:
 

transpose [int] [column_name=]
+```

Parameters

<details><summary>Suggestion importance[1-10]: 9</summary>

__

Why: The syntax code block starting with triple backticks is never closed before the `## Parameters` heading, which would cause the entire Parameters section and subsequent content to be rendered as part of the code block, severely breaking the documentation layout.


</details></details></td><td align=center>High

</td></tr><tr><td>



<details><summary>Non-playground PPL blocks lose copy button</summary>

___


**The non-playground branch reconstructs the block as <code> </code><code></code>ppl <code> but the subsequent regex </code><br><code></code>re.sub(r'^<code></code><code>ppl\b.*$', '</code><code></code>sql', ...)<code> will then convert it to SQL anyway, effectively </code><br><code>making the playground-ineligible branch a no-op and losing the </code>{% include copy.html <br>%}<code> button for those blocks. The non-playground branch should emit </code> <code></code><code>sql </code> with a copy <br>button directly, or the second regex should be guarded to skip already-converted <br>blocks.**

[scripts/docs_exporter/export_to_docs_website.py [267-277]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR267-R277)

```diff
 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
-            '```ppl\n' + m.group(1) + '```'
+            '```sql\n' + m.group(1) + '```\n{% include copy.html %}'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
Suggestion importance[1-10]: 8

__

Why: This is a valid and important bug: the non-playground branch emits ```ppl which is then converted to ```sql by the subsequent regex on line 280, but without adding a {% include copy.html %} button. The fix correctly changes the non-playground branch to emit ```sql with a copy button directly, ensuring consistent behavior.

Medium
Fix duplicate queries with different described behaviors

Example 3 contains two identical PPL queries but the description says one uses
keepempty=true and the other does not. The first query should include keepempty=true
to demonstrate the difference in behavior, otherwise the two sub-examples are
indistinguishable and the explanation is misleading.

docs/user/ppl/cmd/dedup.md [86-115]

 The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name

The query returns the following results:

fetched rows / total rows = 3/3
...

The following query deduplicates while ignoring documents with empty values in the specified field:

source=otellogs
| dedup instrumentationScope.name
| fields instrumentationScope.name
| sort instrumentationScope.name
<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: Both PPL queries in Example 3 are identical (`dedup instrumentationScope.name` without `keepempty=true`), but the descriptions say they demonstrate different behaviors. The first query should include `keepempty=true` to correctly illustrate the contrast between keeping and dropping null values.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Incorrect example output for override parameter behavior</summary>

___


**The result table for Example 3 (override parameter) appears inconsistent: <code>WARN</code> <br>appears twice with the same count (<code>4</code>), and <code>INFO</code> and <code>DEBUG</code> rows are present even <br>though the subsearch only covers <code>ERROR</code> and <code>WARN</code>. With <code>override=true</code>, the main search <br>rows for <code>INFO</code> and <code>DEBUG</code> should retain their original <code>agg</code> values (since the subsearch <br>has no matching row to override them), and <code>WARN</code> should appear only once. The example <br>output is misleading and may confuse readers about how <code>override</code> works.**

[docs/user/ppl/cmd/appendcol.md [94-103]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R94-R103)

```diff
 fetched rows / total rows = 4/4
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
+| 3   | DEBUG        |
 | 4   | WARN         |
 +-----+--------------+
Suggestion importance[1-10]: 4

__

Why: The result table shows WARN appearing twice with the same count, which is inconsistent with how override=true should work. However, the "improved_code" provided may also not be accurate since it's unclear what the actual query would return, making this a potentially misleading correction.

Low
General
Missing result block for a query example

The prefix pattern subsection shows a PPL query but is missing its corresponding
result block. Every other subsection in the document includes a "The query returns
the following results:" block with a sample output table. Without it, the
documentation is incomplete and inconsistent.

docs/user/ppl/cmd/where.md [108-119]

 ### Matching with a prefix pattern
 
 The following query uses a percent sign (`%`) to find all services starting with `frontend`:
 
 ```ppl
 source=otellogs
 | where LIKE(`resource.attributes.service.name`, 'frontend%')
 | fields severityText, `resource.attributes.service.name`, body
 | head 3

+The query returns the following results:
+
+text +fetched rows / total rows = 3/3 ++--------------+----------------------------------+-----------------------------------------------------------------------+ +| severityText | resource.attributes.service.name | body | +|--------------+----------------------------------+-----------------------------------------------------------------------| +| ... | frontend | ... | +| ... | frontend-proxy | ... | +| ... | frontend-proxy | ... | ++--------------+----------------------------------+-----------------------------------------------------------------------+ +
+

Matching with a wildcard pattern

<details><summary>Suggestion importance[1-10]: 5</summary>

__

Why: The "Matching with a prefix pattern" subsection includes a PPL query but no corresponding result block, making it inconsistent with every other example in the document. Adding the missing result block would improve documentation completeness.


</details></details></td><td align=center>Low

</td></tr><tr><td>



<details><summary>Fix inconsistent column padding in result table</summary>

___


**The result table for Example 1 has inconsistent column alignment: the <code>cart</code> and <br><code>product-catalog</code> values in the <code>resource.attributes.service.name</code> column are not padded <br>to match the column width, unlike <code>frontend</code>. This makes the table render incorrectly <br>in markdown and looks unprofessional in documentation.**

[docs/user/ppl/cmd/sort.md [52-61]](https://github.com/opensearch-project/sql/pull/5303/files#diff-10553d209efeb9e9bd7e1745b91efeedb8ab100d2ef2209cbced0fe9dadb32a2R52-R61)

```diff
 fetched rows / total rows = 4/4
 +--------------+----------------+----------------------------------+
 | severityText | severityNumber | resource.attributes.service.name |
 |--------------+----------------+----------------------------------|
-| DEBUG        | 5              | cart                     |
-| DEBUG        | 5              | product-catalog                |
-| DEBUG        | 5              | cart                     |
+| DEBUG        | 5              | cart                             |
+| DEBUG        | 5              | product-catalog                  |
+| DEBUG        | 5              | cart                             |
 | INFO         | 9              | frontend                         |
 +--------------+----------------+----------------------------------+
Suggestion importance[1-10]: 4

__

Why: The cart and product-catalog values in the result table are not padded to match the column width, causing inconsistent table formatting. The fix correctly aligns the column values to match the header width.

Low
Case-sensitive playground eligibility check

The check 'source=otellogs' not in block_text is a plain substring match and is
case-sensitive. A block containing source=OtelLogs or SOURCE=otellogs would be
incorrectly excluded. Consider using a case-insensitive check or a regex to make the
detection more robust.

scripts/docs_exporter/export_to_docs_website.py [258-259]

-if 'source=otellogs' not in block_text:
+if 'source=otellogs' not in block_text.lower():
     return False
Suggestion importance[1-10]: 3

__

Why: While technically valid, the source=otellogs check is used against controlled documentation content where the casing is consistent. This is a minor robustness improvement with low practical impact in this context.

Low
Fix blockquote formatting inconsistency

The note is missing a space after the > blockquote marker, which is inconsistent
with standard Markdown formatting conventions used elsewhere in the document (for
example, the > Note: style used in rex.md). This may render incorrectly in some
Markdown parsers.

docs/user/ppl/cmd/convert.md [256]

->**Note**: The `none()` function is useful with wildcard support, allowing you to exclude specific fields from bulk conversions.
+> **Note**: The `none()` function is useful with wildcard support, allowing you to exclude specific fields from bulk conversions.
Suggestion importance[1-10]: 3

__

Why: The >**Note**: is missing a space after >, which is inconsistent with standard Markdown formatting. While minor, fixing it ensures consistent rendering across Markdown parsers.

Low
Suggestions up to commit 2000094
CategorySuggestion                                                                                                                                    Impact
Possible issue
Remove stray code lines outside code block

The PPL code block for Example 3 is malformed. There is a stray | where age > 30 and
| fields state, age outside the code fence, which are leftover lines from the old
example that were not properly removed. These lines should be deleted.

docs/user/ppl/cmd/replace.md [83-91]

 ```ppl
 source=otellogs
 | where severityText = 'ERROR'
 | replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, body
 | head 3

-| where age > 30
-| fields state, age
-```

<details><summary>Suggestion importance[1-10]: 9</summary>

__

Why: The code block for Example 3 is clearly malformed - there are leftover lines `| where age > 30` and `| fields state, age` outside the closing code fence, which are remnants from the old example. This is a real bug in the documentation that would confuse readers.


</details></details></td><td align=center>High

</td></tr><tr><td>



<details><summary>Fix duplicate identical queries in example</summary>

___


**Example 3 contains two identical PPL queries but claims they demonstrate different <br>behaviors (one with <code>keepempty=true</code> and one without). The second query should use <br><code>keepempty=true</code> to actually demonstrate including null values, matching the <br>description "deduplicates while ignoring documents with empty values" (which should <br>instead show <code>keepempty=true</code> behavior). The queries and their descriptions need to be <br>differentiated.**

[docs/user/ppl/cmd/dedup.md [86-115]](https://github.com/opensearch-project/sql/pull/5303/files#diff-532601ea92e852acbb8b9bcdd2e269377bc05a311ec141be38148539d20c5bafR86-R115)

```diff
 The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
 | dedup instrumentationScope.name
 | fields instrumentationScope.name
 | sort instrumentationScope.name

The query returns the following results:

fetched rows / total rows = 3/3
-...
++-----------------------------------------------------------------------------+
+| instrumentationScope.name                                                   |
+|-----------------------------------------------------------------------------|
+| @opentelemetry/instrumentation-http                                         |
+| Microsoft.Extensions.Hosting                                                |
+| go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
++-----------------------------------------------------------------------------+

-The following query deduplicates while ignoring documents with empty values in the specified field:
+The following query deduplicates while keeping documents with empty values in the specified field:

source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
| fields instrumentationScope.name
| sort instrumentationScope.name
<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: The two PPL queries in Example 3 are identical but claim to demonstrate different behaviors. The second query should include `keepempty=true` to actually show the described behavior of "ignoring documents with empty values," making this a genuine documentation correctness issue.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Non-playground PPL blocks skip SQL conversion</summary>

___


**The non-playground branch preserves the original <code>ppl</code> fence, which means the <br>subsequent <code>re.sub</code> for converting remaining PPL fences to SQL will still match and <br>convert them. However, the non-playground branch does NOT add a copy button, while <br>the original code added one for all SQL fences. This is likely intentional, but the <br>non-playground branch should also add a copy button after converting to SQL, or the <br>conversion to SQL should happen inside this lambda for the non-playground case as <br>well, to ensure consistent behavior.**

[scripts/docs_exporter/export_to_docs_website.py [259-269]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR259-R269)

```diff
 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
-            '```ppl\n' + m.group(1) + '```'
+            '```sql\n' + m.group(1) + '```'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
Suggestion importance[1-10]: 7

__

Why: The non-playground branch preserves the ppl fence, which will be caught by the subsequent re.sub on line 272 that converts remaining PPL fences to SQL. However, that conversion won't add a copy button, while playground-eligible blocks do get one. The suggestion to convert to sql directly in the non-playground branch is valid and would ensure consistent SQL conversion, though the copy button omission for non-playground blocks may be intentional design.

Medium
Fix incorrect duplicate row in example result table

The result table for Example 3 shows WARN appearing twice with the same count (4),
which looks like a data error. The override=true behavior should replace the main
search agg values with subsearch values only for matching rows (ERROR and WARN),
leaving INFO and DEBUG from the main search unchanged. The duplicate WARN row and
missing DEBUG row make the example misleading and potentially incorrect.

docs/user/ppl/cmd/appendcol.md [95-104]

 fetched rows / total rows = 4/4
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
Suggestion importance[1-10]: 6

__

Why: The result table shows WARN appearing twice with the same count and is missing a DEBUG row, which appears to be a data error that could mislead readers about the override=true behavior.

Low
General
Fix non-sequential example numbering

The examples jump from Example 4 to Example 6, skipping Example 5. This appears to
be a numbering error introduced in the PR when examples were reorganized. The
example numbering should be sequential to avoid confusing readers.

docs/user/ppl/cmd/sort.md [180]

-## Example 6: Specifying the number of sorted documents to return
+## Example 5: Specifying the number of sorted documents to return
Suggestion importance[1-10]: 7

__

Why: The examples jump from Example 4 to Example 6, skipping Example 5. This is a real numbering error in the PR that would confuse readers navigating the documentation.

Medium
Fix filter placement affecting aggregation correctness

The where clause filters by severityText after eventstats, but eventstats computes
counts over all rows (not just ERROR rows). This means scope_count reflects counts
across all severity levels, not just errors, which is misleading. The where clause
should be placed before eventstats to ensure the count reflects only ERROR entries,
or the description should clarify this behavior.

docs/user/ppl/cmd/eventstats.md [141-146]

 ```ppl
 source=otellogs
+| where severityText = 'ERROR'
 | eventstats bucket_nullable=false count() as scope_count by instrumentationScope.name
-| where severityText = 'ERROR'
 | sort `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, `instrumentationScope.name`, scope_count
<details><summary>Suggestion importance[1-10]: 5</summary>

__

Why: The `where severityText = 'ERROR'` clause is placed after `eventstats`, meaning `scope_count` reflects counts across all severity levels rather than just errors. While this may be intentional to demonstrate `bucket_nullable=false`, the query logic could be misleading. However, this may be a deliberate design choice for the example, so the impact is moderate.


</details></details></td><td align=center>Low

</td></tr><tr><td>



<details><summary>Add missing result block for first subsection</summary>

___


**The "Matching with a prefix pattern" subsection shows a query but has no result <br>block, while only the second subsection ("Matching with a wildcard pattern") shows <br>results. Readers of the first subsection will be left without example output. A <br>result block should be added after the first query, or a note should clarify that <br>only the second query's results are shown.**

[docs/user/ppl/cmd/where.md [108-130]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ea8ca7481f94d3d9401c4057d33afd640010be2df62b62730137101ef083c140R108-R130)

```diff
 ### Matching with a prefix pattern
 
 The following query uses a percent sign (`%`) to find all services starting with `frontend`:
 
 ```ppl
 source=otellogs
 | where LIKE(`resource.attributes.service.name`, 'frontend%')
 | fields severityText, `resource.attributes.service.name`, body
 | head 3
+```
+
+The query returns the following results:
+
+```text
+fetched rows / total rows = 3/3
++--------------+----------------------------------+-------+
+| severityText | resource.attributes.service.name | body  |
+|--------------+----------------------------------+-------|
+| ...          | frontend                         | ...   |
+| ...          | frontend-proxy                   | ...   |
+| ...          | frontend-proxy                   | ...   |
++--------------+----------------------------------+-------+

Matching with a wildcard pattern

The following query finds all logs from services containing product in their name:

source=otellogs
| where LIKE(`resource.attributes.service.name`, '%product%')
| fields severityText, `resource.attributes.service.name`, body
| head 3

The query returns the following results:

<details><summary>Suggestion importance[1-10]: 4</summary>

__

Why: The "Matching with a prefix pattern" subsection lacks a result block, which is inconsistent with the rest of the documentation. However, the `improved_code` contains placeholder data (`...`) rather than actual query results, making it an incomplete fix.


</details></details></td><td align=center>Low

</td></tr><tr><td>



<details><summary>Fix misaligned table column padding in result output</summary>

___


**The <code>unknown</code> values in the result table are not padded to align with the column <br>width, causing the table to appear misformatted. The <code>unknown</code> entries should be <br>padded with trailing spaces to match the column width defined by the header, <br>consistent with the formatting of other result tables in the documentation.**

[docs/user/ppl/cmd/fillnull.md [53-64]](https://github.com/opensearch-project/sql/pull/5303/files#diff-e05ce7bc17faa0105fe2b13aeacef92b8f08f90aac016e003b615f5f575efaa4R53-R64)

```diff
-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
 | ERROR        | payment                          | @opentelemetry/instrumentation-http                                         |
-| ERROR        | payment                          | unknown                                                            |
+| ERROR        | payment                          | unknown                                                                     |
 | WARN         | product-catalog                  | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
-| WARN         | product-catalog                  | unknown                                                            |
-| ERROR        | product-catalog                  | unknown                                                            |
-| ERROR        | recommendation                   | unknown                                                            |
+| WARN         | product-catalog                  | unknown                                                                     |
+| ERROR        | product-catalog                  | unknown                                                                     |
+| ERROR        | recommendation                   | unknown                                                                     |
Suggestion importance[1-10]: 4

__

Why: The unknown values in the result table are not padded to match the column width, causing visual misalignment. This is a cosmetic issue that affects readability but not correctness.

Low
Lookahead pattern may not reliably prevent duplicate copy buttons

The negative lookahead (?!\n{% include copy) uses re.DOTALL, which makes . match
newlines, but the lookahead itself checks for a literal newline \n followed by {%.
With re.DOTALL, the $ in ^$(?!\n...) matches end-of-line (due to re.MULTILINE), so
the lookahead should work correctly. However, the
{% include try-in-playground.html
%} button is added after {% include copy.html %}, so the pattern (?!\n{% include
copy) will correctly skip blocks that already have a copy button. Verify that the
lookahead correctly handles cases where the playground button was added (which also
includes a copy button before it) to avoid double copy buttons.

scripts/docs_exporter/export_to_docs_website.py [280-282]

-content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)',
+content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{%[ ]include copy)',
                  r'```\1\n\2```\n{% include copy.html %}',
                  content, flags=re.MULTILINE | re.DOTALL)
Suggestion importance[1-10]: 3

__

Why: The suggestion to add a space in the lookahead pattern ({%[ ]include copy) is a minor robustness improvement, but the existing pattern {% include copy already matches the literal string produced by the code. The improved_code change is minimal and the concern about double copy buttons is valid but the existing logic should handle it correctly since playground blocks add copy.html before try-in-playground.html.

Low
Add missing newline at end of file

The file is missing a newline at the end. Files should end with a newline character
to follow standard text file conventions and avoid potential issues with file
concatenation or diff tools.

docs/user/ppl/cmd/appendpipe.md [86]

+* **Schema compatibility**: When fields with the same name exist in both the main search and the subpipeline but have incompatible types, the query fails with an error. To avoid type conflicts, ensure that fields with the same name share the same data type. Alternatively, use different field names. You can rename the conflicting fields using `eval` or select non-conflicting columns using `fields`.
 
-
Suggestion importance[1-10]: 1

__

Why: The existing_code and improved_code are identical, meaning no actual change is proposed. Adding a trailing newline is a very minor style concern with negligible impact.

Low
Suggestions up to commit c051fd0
CategorySuggestion                                                                                                                                    Impact
Possible issue
Remove orphaned code lines outside code fence

The PPL code block for Example 3 is broken — there is a stray closing
triple-backtick followed by two orphaned pipeline stages (| where age > 30 and |
fields state, age) that appear outside the code fence. These leftover lines from the
old example were not removed and will render as raw text in the documentation.

docs/user/ppl/cmd/replace.md [83-91]

 ```ppl
 source=otellogs
 | where severityText = 'ERROR'
 | replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, body
 | head 3

-| where age > 30
-| fields state, age
-```

<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: The `| where age > 30` and `| fields state, age` lines are leftover from the old example and appear outside the code fence, which will render as raw text in the documentation and break the example's presentation.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Prevent duplicate copy buttons on playground-eligible blocks</summary>

___


**The non-playground branch reconstructs the block as <code> </code><code></code>ppl <code> but the subsequent regex </code><br><code></code>re.sub(r'^<code></code><code>ppl\b.*$', '</code><code></code>sql', ...)<code> will then convert it to SQL anyway, which is the </code><br><code>intended behavior. However, the non-playground branch is missing the copy button and </code><br><code>the try-in-playground button, so the final output for non-otellogs PPL blocks will </code><br><code>only get a copy button from the later generic SQL regex — this is fine, but the </code><br><code>playground-eligible branch adds the copy button inline AND the later regex will add </code><br><code>another copy button because the </code>(?!\n\{% include copy)<code> lookahead checks for a </code><br><code>newline before </code>{% include copy<code>, but the playground block ends with </code>{% include <br>try-in-playground.html %}<code>, not </code>{% include copy<code>. This means playground-eligible </code><br><code>blocks will get a duplicate </code>{% include copy.html %}<code> appended.</code>**

[scripts/docs_exporter/export_to_docs_website.py [259-282]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR259-R282)

```diff
 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
             '```ppl\n' + m.group(1) + '```'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
 
+# Convert remaining PPL code fences to SQL
+content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)
+
+# Add copy buttons after sql/bash/sh fences that don't already have one
+content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)(?!\n\{% include try-in-playground)',
+                 r'```\1\n\2```\n{% include copy.html %}',
+                 content, flags=re.MULTILINE | re.DOTALL)
+
Suggestion importance[1-10]: 7

__

Why: This is a valid bug: playground-eligible blocks already include {% include copy.html %}, but the later regex at line 280 checks only for (?!\n\{% include copy) — since those blocks end with {% include try-in-playground.html %}, the lookahead won't match and a second copy button will be appended. The fix to also exclude blocks followed by {% include try-in-playground is correct and important.

Medium
Fix duplicate queries with different intended behaviors

The two sub-examples in Example 3 now use identical PPL queries (both lack
keepempty=true) and return identical results, making them indistinguishable. The
first sub-example should use keepempty=true to demonstrate retaining null values, as
the preceding description states "by default, records with null values are dropped."

docs/user/ppl/cmd/dedup.md [108-128]

-The following query deduplicates while ignoring documents with empty values in the specified field:
+The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name

-The query returns the following results:

-```text
-fetched rows / total rows = 3/3
-+-----------------------------------------------------------------------------+

- instrumentationScope.name
- @opentelemetry/instrumentation-http
- Microsoft.Extensions.Hosting
- go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
-+-----------------------------------------------------------------------------+
-```
<details><summary>Suggestion importance[1-10]: 7</summary>

__

Why: Both sub-examples in Example 3 use identical queries without `keepempty=true`, making them indistinguishable despite the description implying different behaviors. The first sub-example should include `keepempty=true` to match its stated purpose.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Fix incorrect duplicate row in example output</summary>

___


**The result table for Example 3 shows <code>WARN</code> appearing twice with the same count (<code>4</code>), <br>which looks like a data error. With <code>override=true</code>, the subsearch values should <br>replace the main search values for matching rows, so <code>INFO</code> and <code>DEBUG</code> rows (which have <br>no match in the subsearch) should retain their original counts while <code>ERROR</code> and <code>WARN</code> <br>rows get replaced. The current output is misleading and inconsistent with the <br>described behavior.**

[docs/user/ppl/cmd/appendcol.md [96-103]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R96-R103)

```diff
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
 | 4   | WARN         |
 +---...

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 2, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f0e3a6a.

PathLineSeverityDescription
docs/user/ppl/cmd/stats.md18lowQuery result output is embedded directly inside section headings and aggregation function bullet points (e.g., '## fetched rows / total rows = 1/1 ... +arameters' and 'VAR_SAMfetched rows...+--...+arameter'). This corrupted content is anomalous and inconsistent with normal editing behavior, though it appears to be an accidental copy-paste error rather than deliberate injection.
docs/user/ppl/cmd/head.md15lowQuery result tables are embedded inside the '## Parameters' section heading and the parameter table header, producing malformed markdown. The same pattern as stats.md — anomalous but likely an accidental copy-paste error during the documentation rewrite.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 2, 2026

Persistent review updated to latest commit d0b2a08

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Persistent review updated to latest commit f0e3a6a

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 6, 2026

Persistent review updated to latest commit c051fd0

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 2000094

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@ritvibhatt ritvibhatt force-pushed the update-ppl-examples branch from 2000094 to 5121c46 Compare May 4, 2026 05:16
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Persistent review updated to latest commit 5121c46

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Persistent review updated to latest commit 3df36ba

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Persistent review updated to latest commit 844e0fe

@ritvibhatt ritvibhatt marked this pull request as ready for review May 4, 2026 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant