The Kibana ES|QL parser uses the ANTLR library for lexing and parse tree (CST) generation. The ANTLR grammar is imported from the Elasticsearch repository in an automated CI job.
We use two ANTLR outputs, the token stream and the parse tree, for four purposes: (1) generating the Abstract Syntax Tree (AST), (2) syntax validation, (3) syntax highlighting, and (4) extracting formatting (comments and whitespace) and assigning it to AST nodes.
In general, ANTLR is resilient to syntax errors in the query, in the sense that it can produce a parse tree up to the point of the error before it stops. This makes it possible to perform partial tasks on broken queries: a partial AST can still be produced from an invalid query.
The parser is structured as follows:
src/
|- parser/ Contains the logic to parse the ES|QL query and generate the AST.
| |- index.ts Main parser exports - primary entry point for consumers.
| |- core/ High-level parsing logic and AST building.
| | |- parser.ts Main Parser class with parse(), parseCommand(), parseExpression() methods.
| | |- cst_to_ast_converter.ts Converts ANTLR CST to ES|QL AST.
| | |- esql_error_listener.ts Collects syntax errors during parsing.
| | |- constants.ts Parser constants and configuration.
| | |- helpers.ts Utility functions for parsing operations.
| | |- types.ts Parser-specific type definitions.
| | └- index.ts Exports from the core parsing module.
| |
| |- antlr/ ANTLR-generated grammar files and assets.
| | |- esql_lexer.g4 ES|QL ANTLR lexer grammar.
| | |- esql_parser.g4 ES|QL ANTLR parser grammar.
| | |- esql_lexer.ts Generated TypeScript lexer.
| | |- esql_parser.ts Generated TypeScript parser.
| | |- promql_*.g4 / promql_*.ts PromQL grammar and generated files.
| | └- lexer_config.js Lexer configuration for ANTLR.
| |
| └- __tests__/ Parser tests and test utilities.
The parse function returns the AST data structure. If a syntax error
occurs, the errors array is populated with the syntax errors.
import { Parser } from '@elastic/esql';
const src = 'FROM index | STATS 1 + AVG(myColumn) ';
const { root, errors } = await Parser.parse(src);
if (errors.length > 0) {
  console.log({ syntaxErrors: errors });
}
// do stuff with the ast
The root is the root node of the AST. The AST is a tree structure where each
node represents a part of the query. Each node has a type property which
indicates the type of the node.
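To illustrate how the type property can drive a traversal, here is a minimal sketch of a generic walker. The SketchNode shape, its args field, and the collectTypes helper are hypothetical and only mimic the idea of typed nodes; the node interfaces actually exported by the package are richer and may be named differently.

```typescript
// Hypothetical, simplified node shape: the real ES|QL AST types differ.
interface SketchNode {
  type: string;
  name?: string;
  args?: SketchNode[];
}

// Depth-first walk that records the `type` of every node it visits.
function collectTypes(node: SketchNode, out: string[] = []): string[] {
  out.push(node.type);
  for (const child of node.args ?? []) {
    collectTypes(child, out);
  }
  return out;
}

// Hand-written stand-in for the tree of `FROM index | STATS AVG(myColumn)`.
const sketchRoot: SketchNode = {
  type: 'query',
  args: [
    { type: 'command', name: 'from', args: [{ type: 'source', name: 'index' }] },
    {
      type: 'command',
      name: 'stats',
      args: [
        { type: 'function', name: 'avg', args: [{ type: 'column', name: 'myColumn' }] },
      ],
    },
  ],
};

const visitedTypes = collectTypes(sketchRoot);
console.log(visitedTypes);
```

The same pattern (switch on type, recurse into children) applies to a real AST once the hypothetical fields are replaced with the actual ones.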
When calling the parse method with the withFormatting flag set to true,
the AST will be populated with comments.
import { Parser } from '@elastic/esql';
const src = 'FROM /* COMMENT */ index';
const { root } = await Parser.parse(src, { withFormatting: true });
You can use Parser.parseCommand() or Parser.parseExpression() to parse a single
command or expression, respectively. For example:
import { Parser } from '@elastic/esql';
const { root } = await Parser.parseExpression('count(*) + 1');
By default, the parsed AST does not include any formatting information, such as comments or whitespace. This is because the AST is designed to be compact and to be used for syntax validation, syntax highlighting, and other high-level operations.
However, sometimes it is useful to have comments attached to the AST nodes. The
parser can collect all comments when the withFormatting flag is set to true
and attach them to the AST nodes. The comments are attached to the closest node,
while also considering the surrounding punctuation.
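Currently, inter-node comments (comments that fall between the parts of a single node) are attached to the node on their left side when parsed. Such comments can appear in the following places.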
Around colon in source identifier:
FROM cluster /* comment */ : index
Around dots in column identifier:
KEEP column /* comment */ . subcolumn
Cast expressions:
STATS "abc":: /* asdf */ integer
Time interval expressions:
STATS 1 /* asdf */ DAY
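The left-side attachment rule can be sketched as follows. This is a toy model under stated assumptions: SketchToken, SketchNodeToken, trailingComments, and attachToLeftNode are all hypothetical names, and the real parser works on the ANTLR token stream and AST nodes rather than on a flat list like this one.

```typescript
// Hypothetical token model, used only for this illustration.
type SketchNodeToken = { kind: 'node'; text: string; trailingComments: string[] };
type SketchToken =
  | SketchNodeToken
  | { kind: 'punct'; text: string }
  | { kind: 'comment'; text: string };

// Attach every comment to the closest node token on its left.
// Punctuation between the node and the comment is skipped.
// A comment with no node to its left is ignored in this sketch.
function attachToLeftNode(tokens: SketchToken[]): void {
  let lastNode: SketchNodeToken | undefined;
  for (const token of tokens) {
    if (token.kind === 'node') {
      lastNode = token;
    } else if (token.kind === 'comment' && lastNode) {
      lastNode.trailingComments.push(token.text);
    }
  }
}

// Stand-in for the source identifier `cluster /* comment */ : index`.
const clusterToken: SketchNodeToken = { kind: 'node', text: 'cluster', trailingComments: [] };
const indexToken: SketchNodeToken = { kind: 'node', text: 'index', trailingComments: [] };

attachToLeftNode([
  clusterToken,
  { kind: 'comment', text: '/* comment */' },
  { kind: 'punct', text: ':' },
  indexToken,
]);

console.log(clusterToken.trailingComments, indexToken.trailingComments);
```

In this model the comment ends up on cluster and not on index, which mirrors the rule that inter-node comments go to the node on the left.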
The pipeline is the following:
- ANTLR grammar files are added to this package.
- ANTLR grammar files are compiled to .ts assets in the antlr folder.
- A query is parsed to a CST by ANTLR.
- The CstToAstConverter traverses the CST and builds the AST.
- Optionally:
  - Comments and whitespace are extracted from the ANTLR lexer's token stream.
  - The comments and whitespace are attached to the AST nodes.
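The optional comment-extraction step can be approximated with a small standalone sketch. It scans the raw source with a regular expression instead of reading the ANTLR token stream, so it is only an illustration of the idea: ExtractedComment and extractComments are hypothetical names, and the sketch does not handle comment-like text inside string literals.

```typescript
// Hypothetical record for a comment found in the query text.
interface ExtractedComment {
  text: string;
  start: number; // offset of the first character of the comment
  end: number; // offset just past the last character of the comment
}

// Find block comments (/* ... */) and line comments (// ...) with their offsets.
function extractComments(src: string): ExtractedComment[] {
  const pattern = /\/\*[\s\S]*?\*\/|\/\/[^\n]*/g;
  const found: ExtractedComment[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(src)) !== null) {
    const text = match[0];
    found.push({ text, start: match.index, end: match.index + text.length });
  }
  return found;
}

const extracted = extractComments('FROM /* COMMENT */ index // trailing');
console.log(extracted);
```

Keeping the offsets is what makes the later attachment step possible: each comment can be compared against the source positions of the AST nodes to find the closest one.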
When a new command or option is added to ES|QL, it is done via a grammar update. Therefore, adding them is a two-step process:
To update the grammar:
- Make sure the lexer and parser files are up to date with their ES counterparts.
  - An existing CI job already updates them automatically.
- Run the script in package.json that compiles the ES|QL grammar.
- Write some code in the CstToAstConverter to translate the ANTLR parse tree into the custom AST (there are already a few utilities for that, but sometimes more code is required if the parser introduced a new flow).
  - Pro tip: use http://lab.antlr.org/ to visualize and debug the parse tree for a given statement (copy and paste the grammar files there).
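The kind of translation code mentioned in the last step can be sketched as below. This is not the actual CstToAstConverter: the SketchCst and SketchAst shapes, the rule names, and the toAst function are hypothetical and only show the general idea of mapping grammar rules to compact AST nodes while dropping punctuation.

```typescript
// Hypothetical CST shape: every rule and token is kept, including punctuation.
interface SketchCst {
  rule: string;
  text?: string;
  children?: SketchCst[];
}

// Hypothetical compact AST shape.
interface SketchAst {
  type: string;
  name: string;
  args: SketchAst[];
}

// Map each known grammar rule to an AST node; unknown rules produce nothing.
function toAst(cst: SketchCst): SketchAst | undefined {
  switch (cst.rule) {
    case 'functionCall': {
      // First child is the function name, the rest are arguments and punctuation.
      const [nameNode, ...rest] = cst.children ?? [];
      const args: SketchAst[] = [];
      for (const child of rest) {
        const converted = toAst(child);
        if (converted) args.push(converted);
      }
      return { type: 'function', name: nameNode?.text ?? '', args };
    }
    case 'identifier':
      return { type: 'column', name: cst.text ?? '', args: [] };
    default:
      // Punctuation such as parentheses carries no meaning in the AST.
      return undefined;
  }
}

// Stand-in CST for `AVG(myColumn)`.
const sketchCst: SketchCst = {
  rule: 'functionCall',
  children: [
    { rule: 'functionName', text: 'AVG' },
    { rule: 'punct', text: '(' },
    { rule: 'identifier', text: 'myColumn' },
    { rule: 'punct', text: ')' },
  ],
};

const sketchAst = toAst(sketchCst);
console.log(JSON.stringify(sketchAst));
```

When the grammar gains a new rule, a converter structured this way needs a new case for it; until then the new construct is silently dropped, which is why grammar updates and converter updates go together.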