[RFC][FEATURE] PPL/SQL Query Linter & Static Analyzer #5405

@RyanL1997

Description

1. Problem Statement

1.1 Today: validation is post-execution

In OpenSearch SQL/PPL, every form of feedback the user gets about query quality lands after the query has already been dispatched (or, in the editor, after the parser fails outright). There is no advisory layer that runs before execution to warn the user about queries that are syntactically valid but semantically suboptimal or wrong.

The current state, by stage:

| Stage | Where it runs | What it catches | What it misses |
| --- | --- | --- | --- |
| ANTLR parse | OSD Monaco worker + backend | Syntax errors only — rendered as MarkerSeverity.Error red squigglies | Anything past tokens/grammar |
| SemanticCheckException | Backend Analyzer, at execution time | Hard semantic failures (e.g., "can't resolve field X") | Soft mistakes that don't fail the resolver |
| Cluster execution | Backend executor | Engine-level failures, OOMs, timeouts | Wrong-but-completing results |
| Pre-execution advisory | — (does not exist today) | — | Everything semantically suboptimal |

Everything between "ANTLR parsed it" and "the cluster ran it" is invisible until after the user clicks Run. The Monaco editor's MarkerSeverity.Warning channel is currently unused.

1.2 What that means in practice

The pipeline is forgiving on input but unforgiving on output: the query is dispatched as written, possibly scans a year of data, and returns either an error, an empty result, or a wrong result. Examples we see in the wild:

  • ... | where service = "checkout" | stats count() by user_id on a 5-billion-doc index — runs to completion, OOMs the coordinator at the stats stage.
  • ... | where message like "%5xx%" against a keyword-mapped field — silently returns zero rows. The user assumes "no errors == no matches".
  • source = logs-* | head 100 against a wildcard pattern matching zero indices — empty result, no warning, indistinguishable from "no data in the time range".
  • ... | sort @timestamp after stats — sorts the wrong post-aggregation columns, producing non-deterministic output.
  • A typo in a field name (stauts vs status) — SemanticCheckException thrown, but only after the user submits and waits.

None of these are syntax errors. None necessarily throw SemanticCheckException early enough to be useful. The user discovers the problem only after the query has either run wrong or failed at the cluster.

1.3 Where peers are

| Tool | Pre-execution advisory | Auto-fix | Cost preview |
| --- | --- | --- | --- |
| Azure Data Explorer (Kusto) | Yes — "Best Practices" analyzer | Yes — Quick-fix | Yes |
| BigQuery Studio | Yes — partition-filter warnings | No | Yes — "will process N GB" |
| DataGrip | Yes — named SQL inspections | Yes | No |
| OpenSearch SQL/PPL today | No | No | No |

We are the outlier. The capability exists in every comparable analytics product.

1.4 The gap

Summarized: the project has post-execution validation (parse errors, runtime exceptions, engine failures), but no pre-execution layer that gives the user advisory feedback while they are still composing the query. We need to add one.


2. Proposed Solution

2.1 The shift: pre-execution lint layer

Add a pre-execution lint layer that runs against the parsed query without executing it and returns structured diagnostics the editor can render inline. The layer is purely advisory: it never blocks, never executes data-plane queries, and never mutates state. It complements — not replaces — the existing post-execution validation.

                    ┌───────────────────────────────┐
                    │  PRE-EXECUTION (NEW)          │
  user types        │                               │
   query  ───────►  │   ANTLR parse  →  Lint layer  │  ───►  diagnostics in editor
                    │   (errors)        (warnings)  │
                    └───────────────────────────────┘
                                    │
                                    │ user clicks Run
                                    ▼
                    ┌───────────────────────────────┐
                    │  POST-EXECUTION (TODAY)       │
                    │                               │
                    │   Analyzer  →  Planner  →     │  ───►  results / errors
                    │                Executor       │
                    └───────────────────────────────┘

2.2 What ships

  1. Lint REST endpoints — POST /_plugins/_ppl/_lint and POST /_plugins/_sql/_lint accepting a query plus target index/data source, returning a structured diagnostics array without executing the query.
  2. Pluggable rule framework — adding a new rule should not require changes to the REST handler or the response schema. Rules are independent units that walk the parsed query and emit diagnostics.
  3. Initial rule catalog — at least 8 rules covering correctness, performance, and determinism categories, discovered from the existing PPL/SQL docs and exception sites (see §2.5).
  4. Frontend Monaco integration — debounced calls to the lint endpoint, results rendered as MarkerSeverity.Warning markers alongside the existing Error syntax markers, with hover tooltips and a one-click Apply fix action.
  5. Auto-fix mechanism — diagnostics that are auto-correctable carry a structured fix the editor applies on click.
  6. Cluster settings — operators can disable individual rules, adjust thresholds, or kill-switch the entire feature without a restart.

2.3 Goals & non-goals

Goals

  • G1. Stable REST contract for linting PPL and SQL queries against a target index/data source without executing them.
  • G2. Pluggable rule catalog that grows incrementally — both in-tree and (eventually) by other plugins.
  • G3. Ship at least 8 concrete rules at GA.
  • G4. Diagnostics carry enough metadata (position, severity, category, rule ID, optional fix) for editors to render a useful UX out of the box.
  • G5. OpenSearch Dashboards renders warnings inline within ~500 ms of a keystroke at p50.
  • G6. Linting is safe: never executes user data-plane queries, never mutates state, observes per-cluster opt-out.

Non-goals

  • Replacing SemanticCheckException for hard errors. The lint layer is additive and advisory.
  • Full Calcite EXPLAIN cost-model integration on day one (planned as a later phase).
  • Linting async data sources (Spark/Glue) end-to-end — different planner, different scope.
  • Blocking execution. Lint is advisory; the engine remains the source of truth for refusal.

2.4 Requirements

2.4.1 Must-have

  1. Lint endpoint — POST /_plugins/_ppl/_lint accepting { query, index, dataSourceId? } and returning a structured diagnostics array. The SQL counterpart is symmetric.
  2. Pluggable rule framework — the REST surface and response schema do not change when a new rule is added.
  3. Diagnostic schema — every diagnostic includes:
    • ruleId (stable, kebab-case)
    • severity (error | warning | info | hint)
    • category (correctness | performance | determinism | style)
    • message (user-facing)
    • position (precise start/end line/column and offsets — must be reliable for editor underlines)
    • optional fix (structured edit: start/end offset + replacement text)
    • optional docUrl
  4. Initial rule catalog (≥ 8 rules) — discovered from PPL/SQL docs and exception sites; see §2.5.
  5. Severity-level configuration via cluster settings — operators can disable individual rules and adjust thresholds without a restart.
  6. Master switch — plugins.sql.lint.enabled (default false until GA) so clusters can opt in.
  7. Frontend Monaco integration — debounced calls, results rendered as MarkerSeverity.Warning markers alongside existing Error markers; hover shows message + suggestion + docs link.
  8. Auto-fix payload — diagnostics that can be auto-corrected return a structured fix the editor applies via a code action.
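The diagnostic schema above can be pinned down as a set of types. The following TypeScript sketch mirrors the fields in item 3 (field names follow this RFC; the exact wire shape remains an implementation choice), plus a small runtime check for the "stable, kebab-case" constraint on ruleId:

```typescript
// Illustrative types for the diagnostic schema in §2.4.1 item 3.
// Names mirror the RFC text; the exact wire shape is an implementation choice.

type Severity = "error" | "warning" | "info" | "hint";
type Category = "correctness" | "performance" | "determinism" | "style";

interface Position {
  startLine: number;
  startColumn: number;
  endLine: number;
  endColumn: number;
  startOffset: number;
  endOffset: number;
}

interface FixEdit {
  startOffset: number;
  endOffset: number;
  replacement: string;
}

interface Fix {
  title: string;
  edits: FixEdit[];
}

interface Diagnostic {
  ruleId: string;    // stable, kebab-case
  severity: Severity;
  category: Category;
  message: string;   // user-facing
  position: Position;
  fix?: Fix;         // present only when the diagnostic is auto-correctable
  docUrl?: string;
}

// Runtime guard matching the "stable, kebab-case" constraint on ruleId.
const isKebabCase = (id: string): boolean =>
  /^[a-z][a-z0-9]*(-[a-z0-9]+)*$/.test(id);
```

Keeping fix and docUrl optional lets rules ship without an auto-fix first and grow one later without a schema change.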

2.4.2 Nice-to-have

  • Stretch rules beyond the GA catalog — additional candidates surfaced from the same docs / exception sources (see §2.5).
  • Cost-estimate badge (à la BigQuery's "This query will process X GB") — feasible once Calcite explain-cost is wired through.
  • External plugin SPI so third-party plugins can register rules without forking core.
  • Per-data-source rule subsets (some rules apply only to OpenSearch indices, others to Glue, etc.).
  • Telemetry on which fixes get accepted, to guide which rules earn their keep.

2.5 Rule discovery & catalog

Rather than prescribing a fixed list of rules upfront, this RFC asks the intern to discover the initial catalog from sources of truth that already exist in the project. The PPL and SQL documentation, the existing exception sites in the codebase, and the issue tracker collectively describe — with examples — every footgun the team has cared about enough to write down. Those are the natural rule candidates.

Where to look for rule candidates

  1. PPL command docs — docs/user/ppl/cmd/*.md. Many commands include "Note", "Limitation", or "Caveat" callouts describing wrong-result patterns. Each one is a rule candidate.
  2. PPL function docs — docs/user/ppl/functions/*.md. Function-specific footguns (regex semantics, type coercion, null handling) belong here.
  3. PPL general guides — docs/user/ppl/general/. Cross-cutting topics like wildcards, identifiers, and time filters.
  4. PPL limitations — docs/user/ppl/limitations/ and docs/user/limitations/. By definition these are documented constraints; every entry is a candidate for a pre-execution warning.
  5. SQL docs — docs/user/dql/, docs/user/admin/, docs/user/optimization/. Same pattern as PPL.
  6. SemanticCheckException throw sites — every place the analyzer throws SemanticCheckException is a hard error that fires at execution time today. Most of them have a natural pre-execution counterpart: the lint version surfaces the same condition as a warning while the user is still typing.
  7. Issues labelled bug / wrong-result / confused user in the opensearch-project/sql issue tracker. Recurring user reports point to gaps that lint can close.

What makes a good rule

  • It catches a real, repeatable mistake (cited in docs or issues).
  • It can be detected from the parsed query plus index metadata (mapping, _stats, _field_caps) — no execution required.
  • It has a clear, low-false-positive trigger condition, or a tunable threshold.
  • It belongs to one of: correctness, performance, determinism, style.

Where to look for rule exceptions (carve-outs)

For every rule, there are legitimate uses that should not fire a warning. The same docs are the source of those carve-outs: usage patterns the docs explicitly endorse, examples in the docs, and "this is fine when …" notes. A rule without documented carve-outs is probably under-specified.

For instance, a rule "warn on head without sort" carries the documented carve-out "no warning when the upstream is already deterministic (e.g., top N, rare N, a single-row scalar source)" — a carve-out taken directly from the head command docs.
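To make the carve-out concrete, here is a sketch of such a rule over a deliberately simplified pipeline model (just the ordered command names; a real rule would walk the parsed AST). The set of commands treated as deterministic is an assumption drawn from the example above, not the definitive list:

```typescript
// Sketch of a "head-without-sort" rule over a simplified pipeline model:
// the PPL query reduced to its ordered command names. The real rule would
// inspect the parsed AST; the command-name model is the only assumption.

// Upstream commands assumed to make row order deterministic; these are
// the carve-outs the RFC's example cites (sort, top N, rare N).
const DETERMINISTIC_UPSTREAM = new Set(["sort", "top", "rare"]);

function headWithoutSort(commands: string[]): string[] {
  const warnings: string[] = [];
  commands.forEach((cmd, i) => {
    if (cmd !== "head") return;
    const upstream = commands.slice(0, i);
    // Carve-out: no warning when any upstream command already fixes order.
    if (!upstream.some((c) => DETERMINISTIC_UPSTREAM.has(c))) {
      warnings.push("head-without-sort");
    }
  });
  return warnings;
}
```

Note how the carve-out is the bulk of the rule body: without it, every top-N pipeline ending in head would fire a false positive.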

Catalog target

The minimum bar at GA is at least 8 rules spanning all four categories, with at least one auto-fix-capable rule per category where it makes sense. The intern proposes the actual list in week 1 (see §3) after surveying the sources above.

Each proposed rule is reviewed against:

  • Source citation (which doc entry / exception throw site / issue motivates it).
  • Documented carve-outs.
  • Default severity rationale.
  • Auto-fix design (or rationale for why it has none).

Per-rule cluster-settings example

plugins.sql.lint.enabled: true
plugins.sql.lint.rules.<rule-id>.severity: warning      # off | hint | info | warning | error
plugins.sql.lint.thresholds.<rule-id>.<param>: 100000   # rule-specific threshold

2.6 High-level design

The pre-execution layer sits next to the existing query path and reuses its parser and analyzer:

HTTP request
  → REST handler                        (new)
  → Parser (PPL or SQL)                 (existing ANTLR path, reused)
  → Lint context build                  (mapping, _stats, settings — cached)
  → Tolerant analyzer pass              (catches semantic errors as diagnostics, not exceptions)
  → Rule engine                         (each rule visits the plan + context, emits diagnostics)
  → JSON response

Two design points worth calling out:

  • Reuse the existing parser and analyzer. The lint engine should not re-implement type inference or field resolution — it should run the same code paths as a real query, but in a tolerant mode where what would normally throw becomes a diagnostic. This is where most of the high-value rules (field validation, type mismatch) come from "for free".
  • Diagnostics carry precise positions. Editors need offsets accurate to the offending token, not to the surrounding command. The structured position field is non-negotiable.
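Under the rule-registry design sketched above, the rule-engine stage could look like the following (names such as LintRule and runLint are hypothetical): each rule is an isolated unit, a disabled or throwing rule never affects the others, and adding a rule touches neither the runner nor the response schema.

```typescript
// Hypothetical shape of the pluggable rule engine in §2.6. Each rule
// receives the parsed plan plus a lint context and emits diagnostics;
// the runner enforces best-effort semantics and the per-rule kill switch.

interface LintContext {
  index: string;
  mapping: Record<string, string>;   // field name -> type (cached)
  settings: Record<string, number>;  // per-rule thresholds
}

interface Diag {
  ruleId: string;
  severity: string;
  message: string;
}

interface LintRule {
  id: string;
  check(plan: unknown, ctx: LintContext): Diag[];
}

function runLint(
  plan: unknown,
  ctx: LintContext,
  rules: LintRule[],
  enabled: Set<string>,
): Diag[] {
  const out: Diag[] = [];
  for (const rule of rules) {
    if (!enabled.has(rule.id)) continue; // per-rule cluster-setting off switch
    try {
      out.push(...rule.check(plan, ctx));
    } catch {
      // Best-effort: a failing or timed-out rule is skipped silently
      // rather than failing the whole lint response.
    }
  }
  return out;
}
```

The try/catch per rule is what makes the catalog safely pluggable: one buggy rule degrades to a missing warning, never to a broken endpoint.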

2.7 API contract

Request

POST /_plugins/_ppl/_lint
Content-Type: application/json

{
  "query":  "source=logs-* | where status = 500 | stats count() by user_id",
  "index":  "logs-*",
  "dataSourceId": "my_glue",        // optional, defaults to local cluster
  "rules":  ["field-validation"],   // optional allowlist
  "settings": {                     // optional per-call overrides
    "highCardinalityThreshold": 100000
  }
}

POST /_plugins/_sql/_lint is symmetric.

Response

{
  "took": 14,
  "diagnostics": [
    {
      "ruleId": "high-cardinality-groupby",
      "severity": "warning",
      "category": "performance",
      "message": "Grouping by 'user_id' (cardinality ≈ 4.2M) may exhaust coordinator memory.",
      "command": "stats",
      "position": {
        "startLine": 1, "startColumn": 51,
        "endLine":   1, "endColumn":   58,
        "startOffset": 50, "endOffset": 57
      },
      "fix": {
        "title": "Replace stats-by with `top N`",
        "edits": [
          { "startOffset": 50, "endOffset": 57, "replacement": "user_id | head 1000" }
        ]
      },
      "docUrl": "https://opensearch.org/docs/latest/.../lint/high-cardinality-groupby/"
    }
  ]
}

The endpoint always returns 200 if the request is well-formed — parse errors are surfaced as a single error-severity diagnostic with ruleId: "syntax" so the frontend renders them via the same code path.
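Because fix.edits are plain offset ranges into the original query text, applying a fix reduces to a string splice. This sketch assumes non-overlapping edits and an exclusive endOffset; applying edits right-to-left keeps earlier offsets valid:

```typescript
// Applying a structured fix from the lint response: each edit replaces
// the half-open range [startOffset, endOffset) in the original query.
// Assumes edits do not overlap; sorting descending by startOffset means
// later splices never invalidate earlier offsets.

interface Edit {
  startOffset: number;
  endOffset: number;
  replacement: string;
}

function applyFix(query: string, edits: Edit[]): string {
  const sorted = [...edits].sort((a, b) => b.startOffset - a.startOffset);
  let out = query;
  for (const e of sorted) {
    out = out.slice(0, e.startOffset) + e.replacement + out.slice(e.endOffset);
  }
  return out;
}
```

This is the whole contract the editor-side code action needs: no re-parse, no rule knowledge, just offsets and replacement text.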

2.8 Frontend integration (OpenSearch Dashboards)

  • An OSD server proxy route forwards /api/enhancements/ppl/lint to _plugins/_ppl/_lint (same shape as the existing _grammar proxy), threading dataSourceId for multi-DS clusters.
  • The Monaco PPL language module debounces lint calls (~500 ms) on content change, cancels in-flight requests on new keystrokes, and renders diagnostics as MarkerSeverity.Warning markers using the existing setModelMarkers() infrastructure.
  • A code-action provider reads each diagnostic's structured fix and exposes a one-click Apply fix in the lightbulb menu.
  • When the cluster has lint disabled, the proxy returns an empty diagnostics array; the editor never lights up.

No new UI framework, no new component library — pure addition on the existing Monaco infrastructure.
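The cancel-on-new-keystroke behavior boils down to a stale-response guard. This sketch isolates that guard with a sequence counter; Monaco specifics (setModelMarkers, the proxy route) are omitted, and render() is a stand-in for marker rendering:

```typescript
// Stale-response guard behind the debounced lint loop: each (debounced)
// keystroke bumps a sequence number, and a lint response is rendered only
// if no newer request started in the meantime. render() stands in for
// the Monaco setModelMarkers() call.

function createLintLoop(render: (diags: string[]) => void) {
  let seq = 0;
  return {
    // Call when a debounced content-change fires; returns this request's id.
    start(): number {
      return ++seq;
    },
    // Call when the lint response arrives; drops it if superseded.
    apply(requestSeq: number, diags: string[]): boolean {
      if (requestSeq !== seq) return false; // a newer keystroke won
      render(diags);
      return true;
    },
  };
}
```

In the real module the same guard would pair with an AbortController to cancel the in-flight HTTP request, not just discard its result.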

2.9 Cross-cutting concerns

  • Performance. Mapping and stats lookups must be cached (per cluster + index, short TTL) so repeated keystrokes don't hammer _mapping. Per-rule and per-request timeouts bound worst-case latency. Lint is best-effort: a rule that times out is silently skipped rather than failing the response.
  • Security. The endpoint reads _mapping and _stats for the target index — caller's role must already permit those reads. No privilege elevation. Audit-log impact (queries logged on every keystroke) is documented; the master switch lets operators opt out.
  • Multi-data-source. dataSourceId is plumbed end-to-end. For non-OpenSearch data sources, AST-only rules apply; metadata-dependent rules emit an info diagnostic noting they couldn't run.
  • Settings. plugins.sql.lint.enabled, plugins.sql.lint.rules.<rule-id>.severity, and per-rule thresholds are cluster settings (no restart required to change).
  • i18n. Diagnostic messages are externalized for translation; English ships day one.
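The per-cluster, per-index metadata cache in the Performance bullet can be as simple as a timestamped map. The TTL handling and injected clock in this sketch are illustrative choices, not the plugin's actual cache:

```typescript
// Minimal TTL cache for _mapping/_stats lookups keyed by cluster + index,
// so repeated keystrokes reuse metadata instead of re-fetching _mapping.
// The injected clock makes expiry testable; values here are illustrative.

function createTtlCache<V>(ttlMs: number, now: () => number = Date.now) {
  const entries = new Map<string, { value: V; expires: number }>();
  return {
    get(key: string): V | undefined {
      const e = entries.get(key);
      if (!e || e.expires <= now()) {
        entries.delete(key); // expired entries are evicted lazily on read
        return undefined;
      }
      return e.value;
    },
    set(key: string, value: V): void {
      entries.set(key, { value, expires: now() + ttlMs });
    },
  };
}
```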

3. Acceptance Criteria

  • POST /_plugins/_ppl/_lint and POST /_plugins/_sql/_lint return the documented schema for the initial catalog (≥ 8 rules), with positive and negative integration tests for each rule.
  • Per-rule cluster setting flips a rule off without a restart.
  • Adding a new rule does not require changes to the REST handler or the response schema.
  • A sample external-plugin rule is discovered and run end-to-end without modifying the SQL plugin or core.
  • "How to add a rule" guide is published and the intern's own rules serve as canonical examples.
  • OSD Monaco renders warning markers within ~500 ms p50 against a 10-line query.
  • With plugins.sql.lint.enabled: false, no behavior change anywhere; the new endpoints are inert.

4. Open Questions

  1. Default-on or default-off at GA? Lean default-off until P6 ships and feedback is positive, then flip.
  2. Source of cardinality estimates for cardinality-sensitive rules. _field_caps doesn't expose value cardinality; a sampled terms aggregation is one possible path but needs a cost ceiling.
  3. Where do per-rule docs live? Proposal: docs/user/ppl/lint/<rule-id>.md, with each diagnostic's docUrl pointing into the published docs.
  4. Should error-severity diagnostics block submission if the user hits Run anyway? Recommended: no — the engine remains the source of truth for refusal. Lint is advisory by definition.
  5. Telemetry on rule emission and fix-acceptance — useful for prioritizing the catalog, but needs a privacy review before P6.
  6. Streaming / partial parses — could we lint mid-typing? Out of scope for v1; needs partial-AST recovery.

5. References

Primary sources for rule and carve-out discovery (in-repo)

Prior art (external)


Appendix — Illustrative end-to-end example

The rule IDs below are placeholders for shape only — the actual catalog is locked in week 1 from the discovery output (see §2.5).

Input

source=logs-* | where status = 500 | stats count() by user_id | head 100

Diagnostics emitted (truncated for brevity)

[
  {
    "ruleId": "missing-time-filter",
    "severity": "info",
    "category": "performance",
    "message": "Index 'logs-*' has a @timestamp field but no predicate filters on it. Full scans on time-series indices are usually unintended.",
    "fix": {
      "title": "Restrict to last 1 hour",
      "edits": [{ "startOffset": 16, "endOffset": 16, "replacement": "where @timestamp > now() - 1h | " }]
    }
  },
  {
    "ruleId": "high-cardinality-groupby",
    "severity": "warning",
    "category": "performance",
    "message": "Grouping by 'user_id' (cardinality ≈ 4.2M) may exhaust coordinator memory."
  },
  {
    "ruleId": "head-without-sort",
    "severity": "info",
    "category": "determinism",
    "message": "head returns an arbitrary subset of rows; output is non-deterministic."
  }
]

Three warnings on a query the user thought was fine. That is the value proposition.

Labels: RFC (Request For Comments), enhancement (New feature or request), feature