Skip to content

4.3 Using a ParseTree to Transform Text

Susan edited this page May 16, 2018 · 15 revisions

When ANTLR 4 generates a parser, it also generates a custom template for developing software to process any parse tree which has been output from that particular parser. This template, which functions as a listener to a parse tree walker, is the key that makes the ANTLR 4 approach to braille translating and to solving a large variety of other problems so powerful yet so easy to use. Since I'm generating ANTLR's default Java targets, the template is a Java class with empty methods named for each label supplied by the parser so from now on I'll use Java terminology. (If you aren't familiar with tree terminology, references are easy to find.)

Extending the Custom Template

The best way to use the template is as the basis of a new Java class that extends the template by overriding just those methods which are needed to accomplish whatever you are trying to do. Each method will get called or invoked automatically by the parse tree walker supplied by ANTLR; you do not have to implement a walker yourself! The parse tree walker automatically traverses every node in a parse tree by starting at the top or root node and "walking" down and back up in a zig-zag fashion that encounters each node in the order of the input and invokes the associated method. (If the extended class doesn't override a method, the tree walker will just invoke the empty method in the original template.) Actually each node (except for the leaf nodes at the bottom of the tree) is encountered twice, once when the walker enters it on the way down from its parent node and once when the walker exits it on the way back up to its parent node.

A Sample Transformation

Let's imagine that we want to extend the template to transform the original input to the example lexer on the previous page to (1) eliminate any individual letters, (2) replace each standalone "and" with an ampersand ("&"), and (3) enclose each "and" that is part of a longer word in parentheses. That is we want to convert the particular input

     dandy and z
     andiron stand and

to the output

     d(and)y &
     (and)iron st(and) &

Remember that parse tree created by the parser is the input to a template-based process that carries out a modification. As is typically the case, there are several different options for adding programming statements to the template so as to accomplish the example transformation. The parse tree display on the previous page suggests that the "exitAndAsWholeWord" and "exitAndAsPartWord" methods of the corresponding template are appropriate and easy to use choices. Since the parser has already correctly identified and grouped the input, it isn't necessary to examine the input to the first method before replacing it with an ampersand ("&"). As for the second method, it is necessary to examine and use its input in order to carry out the string manipulations for inserting the parentheses.

Is It Time to Go Back and Change the Parser Grammar?

Of course, if the example parser grammar had provided separate labels for the three possibilities where "and" is part of a longer word, the template would have provided different methods for each so the decision logic required when when using a single method for all three cases could be avoided.
(In my experience figuring out what's best for the parser to do, what's best left to the third step, and where to add instructions in the third step requires an iterative process which is luckily very easy to carry out since ANTLR can quickly generate new lexers, parsers, and templates when their grammars are changed. Of course, if your new parser grammar were to change some labels along with adding new ones, you might have to manually adapt your extended class to handle the name changes.)

What Should Be Done with the Transformations?

I haven't yet discussed what should be done with the transformations. Of course you could just print them out as they are created but that's not a very general solution. ANTLR supplies a very useful ParseTreeProperty object that essentially replicates a parse tree and can be used for storing transformations or what it refers to as "node annotations" or other useful information. This object, which is derived from an Identity Hash Map, uses pointers to the nodes in the parse tree as keys and whatever type you choose as values. Keeping the annotations in an accessible data structure prior to saving the final results facilitates further manipulations and transformations that make it possible to address situations much more complex translations than this simple example.

Clone this wiki locally