Missing output for optional nodes #73

@kevinvanleer

First of all, thanks for developing this great library! I am prototyping a search grammar in JavaScript, with the hope of porting it to Java for use in an Android app.

I am building a search grammar that takes the form expression (operator expression)*. If I build the grammar using optional repetition, the optional nodes are missing from the output.

Example:

peg file

grammar CyclotrackSearch
  query <- (expression @" "? expr_op? @" "?)+
  expression <- negation? @" "? lvalue @" " operator @" " rvalue
  expr_op <- "and" / "or"
  lvalue <- "distance" / "date"
  rvalue <- [a-zA-Z0-9]+
  operator <- "is" / "equals" / "greater than" / "less than" / ">=" / "<=" / "=" / "<" / ">"
  negation <- "not" / "!"

test app.js

const search = require('./cyclotrack-search');

let tree = search.parse(process.argv[2]);

console.log(JSON.stringify(tree,null,2));

for (let node of tree) {
  console.log(node.offset, node.text);
}

If I attempt to parse the following string:

'not distance is 20 or not date is today and not date is tomorrow'

I get the following output using the URL parser sample code:

0 not distance is 20 or
22 not date is today and
44 not date is tomorrow

I expected to get:

0 not distance is 20
19 or
22 not date is today
40 and
44 not date is tomorrow

If I change the grammar so that all nodes are required, I get the expected output.

grammar CyclotrackSearch
  query <- expression @" " expr_op @" " expression @" " expr_op @" " expression
  #query <- (expression @" "? expr_op? @" "?)+
  expression <- negation? @" "? lvalue @" " operator @" " rvalue
  expr_op <- "and" / "or"
  lvalue <- "distance" / "date"
  rvalue <- [a-zA-Z0-9]+
  operator <- "is" / "equals" / "greater than" / "less than" / ">=" / "<=" / "=" / "<" / ">"
  negation <- "not" / "!"

If I print the entire tree, I see that all required nodes appear as named properties, but optional nodes do not. Here is an excerpt from an expression node:

"expression": {
        "text": "not distance is 20",
        "offset": 0,
        "elements": [
          {
            "text": "not",
            "offset": 0,
            "elements": []
          },
          {
            "text": "distance",
            "offset": 4,
            "elements": []
          },
          {
            "text": "is",
            "offset": 13,
            "elements": []
          },
          {
            "text": "20",
            "offset": 16,
            "elements": [
              {
                "text": "2",
                "offset": 16,
                "elements": []
              },
              {
                "text": "0",
                "offset": 17,
                "elements": []
              }
            ]
          }
        ],
        "lvalue": {
          "text": "distance",
          "offset": 4,
          "elements": []
        },
        "operator": {
          "text": "is",
          "offset": 13,
          "elements": []
        },
        "rvalue": {
          "text": "20",
          "offset": 16,
          "elements": [
            {
              "text": "2",
              "offset": 16,
              "elements": []
            },
            {
              "text": "0",
              "offset": 17,
              "elements": []
            }
          ]
        }
      }
    },

You can see that every node except the optional negation node is listed as a named property (lvalue, operator, rvalue). The negation node is parsed, but it appears only as a member of the elements list. The same is true for the expr_op nodes: they are only exposed as named nodes in the tree if they are required.
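
As a stopgap, the negation can be read back out of the elements list by position. This is only a minimal sketch: it assumes the node shape shown in the excerpt above (text, offset, elements) and that the optional negation slot is always the first entry of elements. getNegation is a hypothetical helper, not part of the library, and the sample node is hand-built rather than real parser output.

```javascript
// Hypothetical helper: recover the optional negation from an expression node.
// Assumes the node shape from the excerpt above: { text, offset, elements }.
const NEGATION_WORDS = new Set(['not', '!']);

function getNegation(expressionNode) {
  const first = expressionNode.elements[0];
  // Assumption: the optional slot comes first and holds the matched text
  // when present; anything else is treated as "no negation".
  return first && NEGATION_WORDS.has(first.text) ? first : null;
}

// Hand-built node mirroring the excerpt (not real parser output).
const expression = {
  text: 'not distance is 20',
  offset: 0,
  elements: [
    { text: 'not', offset: 0, elements: [] },
    { text: 'distance', offset: 4, elements: [] },
    { text: 'is', offset: 13, elements: [] },
    { text: '20', offset: 16, elements: [] },
  ],
};

const negation = getNegation(expression);
console.log(negation ? negation.text : '(none)'); // prints "not"
```

Matching on position and text is fragile, which is why a named negation property on the node would be preferable.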

If I change the grammar to use a different form of repetition, I get a third output:

query <- expression (@" " expr_op @" " expression)*

output:

0 not distance is 20
18  or not date is today and not date is tomorrow

However, the expr_op nodes are included as child nodes inside the second node.

It seems like

query <- expression (@" " expr_op @" " expression)*

and

query <- (expression @" "? expr_op? @" "?)+

should produce the same output and that the expr_op nodes and expression nodes should be at the same level in the tree. Your help is greatly appreciated!
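
For the first form, the nested result could be flattened after parsing. The following is a minimal sketch under assumptions: the root's elements hold the first expression followed by a single repetition node, and each child of that repetition holds expr_op and expression in its own elements (the muted @" " nodes being omitted). flattenQuery is a hypothetical helper, and the sample tree is hand-built to match the offsets of the example input, not actual parser output.

```javascript
// Hypothetical helper: flatten the tree produced by
//   query <- expression (@" " expr_op @" " expression)*
// into [expression, expr_op, expression, ...].
function flattenQuery(root) {
  const [first, repetition] = root.elements;
  const flat = [first];
  for (const pair of repetition ? repetition.elements : []) {
    // Assumption: each pair's elements are [expr_op, expression].
    flat.push(...pair.elements);
  }
  return flat;
}

// Hand-built stand-in for the parse tree (offsets match the example input).
const leaf = (text, offset) => ({ text, offset, elements: [] });
const sampleTree = {
  text: 'not distance is 20 or not date is today',
  offset: 0,
  elements: [
    leaf('not distance is 20', 0),
    {
      text: ' or not date is today',
      offset: 18,
      elements: [
        {
          text: ' or not date is today',
          offset: 18,
          elements: [leaf('or', 19), leaf('not date is today', 22)],
        },
      ],
    },
  ],
};

for (const node of flattenQuery(sampleTree)) {
  console.log(node.offset, node.text);
}
```

This yields the expression and expr_op nodes at the same level, which is the layout I expected from the parser directly.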

EDIT: I was hoping the tree object would look something like this:

[
  {expression: {
    negation: "",
    lvalue: "",
    operator: "",
    rvalue: "",
  }},
  {expr_op: ""},
  {expression: {
    negation: "",
    lvalue: "",
    operator: "",
    rvalue: "",
  }},
  {expr_op: ""},
  {expression: {
    negation: "",
    lvalue: "",
    operator: "",
    rvalue: "",
  }}
]
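
To make the target concrete, here is a rough sketch of a conversion into that shape. It assumes an alternating list of nodes (expression, expr_op, expression, ...), that expression nodes carry the lvalue, operator and rvalue properties seen in the excerpt, and that the optional negation is only reachable as the first entry of elements. toSearchTerms, mk and mockExpression are hypothetical names, and the mock nodes use placeholder offsets.

```javascript
// Hypothetical conversion to the hoped-for shape.
// Even positions are treated as expressions, odd positions as expr_op nodes.
function toSearchTerms(nodes) {
  return nodes.map((node, index) => {
    if (index % 2 === 1) {
      return { expr_op: node.text };
    }
    const first = node.elements[0];
    const negated = first && (first.text === 'not' || first.text === '!');
    return {
      expression: {
        negation: negated ? first.text : '',
        lvalue: node.lvalue.text,
        operator: node.operator.text,
        rvalue: node.rvalue.text,
      },
    };
  });
}

// Mock nodes for illustration only; offsets are placeholders.
const mk = (text, offset) => ({ text, offset, elements: [] });
function mockExpression(negation, lvalue, operator, rvalue) {
  const parts = [mk(negation, 0), mk(lvalue, 0), mk(operator, 0), mk(rvalue, 0)];
  return {
    text: parts.map((p) => p.text).filter(Boolean).join(' '),
    offset: 0,
    elements: parts,
    lvalue: parts[1],
    operator: parts[2],
    rvalue: parts[3],
  };
}

const terms = toSearchTerms([
  mockExpression('not', 'distance', 'is', '20'),
  mk('or', 19),
  mockExpression('', 'date', 'is', 'today'),
]);
console.log(JSON.stringify(terms, null, 2));
```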
