-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathREADME.html
More file actions
204 lines (190 loc) · 8.61 KB
/
README.html
File metadata and controls
204 lines (190 loc) · 8.61 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
<html>
<head>
<title>dsl-pa Documentation</title>
<style type="text/css">
body { font-family: Arial, Helvetica, sans-serif; }
h1 { color:#0005C8; }
h2 { color:#0005C8; }
h3 { color:#0005C8; }
h4 { color:#0005C8; }
code { color:#800000; }
pre.cmd { color:#0000c0; }
pre.code { color:#600000; }
pre.indented { margin-left:1.3cm; }
pre.proto { color:#800000; }
pre.anot { color:#0000c0; }
pre.conf { color:#0000c0; }
pre.schema { color:#0000ff; }
pre.xml { color:#0000ff; }
span.xtag { color:#800000; }
span.xattr { color:#ff0000; }
span.ckw { color:#0000ff; }
span.ccom { color:#008000; }
p.content-link { margin-top: 2px; margin-bottom: 4px; border: 2px 0; padding: 2px 0; }
.desc { background-color:#99DAFF; width:100%; padding: 5px; border-width: thin; border-style: solid; border-color: #99DAFF; }
.synop { color:#0000ff; background-color:#B7E5FF; padding: 5px; }
.flist { background-color:#FFF; padding: 5px; }
.ret { color: #FF9900; text-align: right; }
.fname { color: #800000; }
.args { color: #00f; }
.comment { color: #008000; }
</style>
</head>
<body>
<h1>dsl-pa Documentation</h1>
<p>dsl-pa is a Domain Specific Language Parsing Assistant library designed to
take advantage of the C++ logic shortcut operators such as && and ||.
dsl-pa function calls extract sections of language input based on a
specified alphabet. Calls to dsl-pa functions are combined using the C++
shortcut operators in order to parse a language in a reasonably compact
form. Consequently all dsl-pa functions return either a boolean value or
a value, such as an integer, that can be implicitly converted to a bool.
The idea is that the set of shortcut operations should mirror the
<a href='http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form' title="BNF Notation">BNF</a> or
similar notation (such as
<a href='http://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_Form' title="ABNF Notation">ABNF</a>)
that is used to describe the langauge being parsed.
Various reader classes can be used to read from file, std::string and
memory buffer sources.
</p>
<h2>Project Status</h2>
<p>This project is now ready for beta use by thrid-parties.</p>
<h1>Contents</h1>
<ul>
<li><a href='#dsl_pa'>The dsl_pa class</a></li>
<li><a href='#char-set'>Character sets</a></li>
<li><a href='#readers'>Readers</a></li>
<li><a href='#alphabets'>Alphabets</a></li>
</ul>
<a name='dsl_pa' />
<h1>The dsl_pa class</h1>
<p>dsl_pa is the main class of the library</p>
<h2>To inherit or not</h2>
<p>Possibly the simplest way to use <code>dsl_pa</code> is to create your own class that inherits <code>dsl_pa</code>.
This way all the methods in <code>dsl_pa</code> become first class citizens and the code is more compact.
Inheritance seems justified here on the basis that your class IS-A parser. However, you don't need to use
<code>dsl_pa</code> this way, and most of the test code uses <code>dsl_pa</code> by construction.
</p>
<p>If you do use inheritance to use <code>dsl_pa</code> and you use some form of factory for creating
instances of parsers, you can implement the <code>parse()</code> virtual method in <code>dsl_pa</code>
to invoke your parsing.
</p>
<h2>Using C++ shortcut operators</h2>
<p><code>dsl_pa</code> functions are designed so that they can be combined using C++'s shortcut logical
operators, such as && and ||. For example, if the strucure of your input is:
<pre class='code'>
width=10, height = 5
</pre>
</p>
<p>then you might parse it similar to the following:
</p>
<pre class='code'>
size_t width, height;
if( opt_space() &&
fixed( "width" ) &&
opt_space() && is_char( '=' ) && opt_space() &&
get_uint( &width ) &&
opt_space() && is_char( ',' ) && opt_space() &&
fixed( "height" ) &&
opt_space() && is_char( '=' ) && opt_space() &&
get_uint( &height ) )
{
fout << "Example 1 OK: w=" << width << " & h=" << height << "\n";
}
else
{
fout << "Unable to parse input\n";
}
</pre>
<h2>Skipping white space</h2>
<p>A common operation when parsing a language is skiiping white space. This can be done using the
following dsl_pa methods:</p>
<pre class='code'>
bool space();
bool opt_space();
bool wsp(); // From ABNF (RFC5234) whitespace: non-newline space chars
bool opt_wsp();
</pre>
<p>space() requires there to be white space (space, tab, plus newline characters) present in order to
return true and opt_space() allows white space to be optional.</p>
<p>wsp() and opt_wsp() are similar to space() and opt_space(), but only accepts non-newline whitespace
characters. They are handy for parsing line based syntaxes.</p>
<h3>Explicit or implicit space</h3>
<h2>dsl_pa primary getters</h2>
<p><code>dsl_pa</code> has a number of functions for reading items of data into std::strings. These are:</p>
<pre class='code'>
size_t /*num chars read*/ get( std::string * p_output, const alphabet & r_alphabet );
size_t get( std::string * p_output, const alphabet & r_alphabet, size_t max_chars );
size_t get_until( std::string * p_output, const alphabet & r_alphabet );
size_t get_bounded_until( std::string * p_output, const alphabet & r_alphabet, size_t max_chars );
size_t get_escaped_until( std::string * p_output, const alphabet & r_alphabet, char escape_char );
size_t get_until( std::string * p_output, const alphabet & r_alphabet, char escape_char, size_t max_chars );
</pre>
<p>In the above functions the p_output argument points to the instance of std::string that is to receive the output.
Any parsed data is appended to the string. This allows multiple get...() functions to accumulate data into a
single string. If you wish to clear a std::string first you can use the dsl_pa::clear() method within your set of
shortcut operators.</p>
<p>The r_alphabet parameter takes a reference to an alphabet class. An alphabet specifies the set of characters
that you want to either parse into your string (using the get() methods) or not get in your string (using the
get...until() methods.)</p>
<h2>dsl_pa type specific getters</h2>
<p><code>dsl_pa</code> has a number of functions for reading items of data such as integers, floats
and booleans.</p>
<pre class='code'>
bool /*is_parsed*/ get_bool( std::string * p_input );
bool /*is_parsed*/ get_bool( bool * p_bool );
size_t /*num chars read*/ get_int( std::string * p_num );
size_t /*num chars read*/ get_int( int * p_int );
size_t /*num chars read*/ get_uint( std::string * p_num );
size_t /*num chars read*/ get_uint( unsigned int * p_int );
bool get_float( std::string * p_num );
bool get_float( double * p_float );
bool get_float( float * p_float );
bool get_sci_float( std::string * p_num );
bool get_sci_float( double * p_float );
bool get_sci_float( float * p_float );
</pre>
<h2>Reading fixed text</h2>
<p>Occasionally when parsing a langauge you need to parse fixed text. The following dsl-pa functions support this:</p>
<pre class='code'>
bool fixed( const char * p_seeking );
bool ifixed( const char * p_seeking );
bool get_fixed( std::string * p_output, const char * p_seeking );
bool get_ifixed( std::string * p_output, const char * p_seeking );
</pre>
<p>If the specified text is not found in its entirity then the parser's input location will be restored to
where it was before the function was called.</p>
<p>
The methods <em>without</em> get_ in the name with simply skip over the specified text if it is present. The methods
<em>with</em> get_ in the name will collect the parsed text. The methods with _ifixed in the name are
case-insensitive for characters in the ASCII range. The other methods are case sensitive.</p>
<h2>Back tracking and recording locations</h2>
<h3>The location_logger class</h3>
<h2>dsl_pa helpers</h2>
<h3>dsl_pa::optional()</h3>
<h3>dsl_pa::optional_rewind()</h3>
<h3>dsl_pa::on_fail()</h3>
<h3>dsl_pa::set()</h3>
<h3>dsl_pa::record()</h3>
<h3>dsl_pa::clear()</h3>
<h3>dsl_pa::append()</h3>
<h3>dsl_pa::error() and dsl_pa::error_fatal()</h3>
<a name='char-set' />
<h1>Character sets</h1>
<p>
The language input is assumed to be in UTF-8 encoding and operations are done on C++ <code>char</code>s. It is assumed that
any parsing decisions of the input language will be based on the ASCII range of characters and any char valued 0x80 and
above is just considered to be part of "the rest" of Unicode. Hence dsl-pa operates on a <code>char</code> by <code>char</code> basis and isn't
multi-byte character aware.
</p>
<p>
A value of <code>0</code> is used to indicate the end of input has been reached.
</p>
<a name='readers' />
<h1>Readers</h1>
<h2>Reader locations</h2>
<a name='alphabets' />
<h1>Alphabets</h1>
<h2>Short alphabet names</h2>
</body>
</html>