README.adoc: 53 additions & 0 deletions
@@ -137,6 +137,59 @@ Pattern is a regex expression. It should be representing as a string without `//
Result is the replacement for the pattern's match. It can contain a string, Unicode characters specified by hexadecimal number, or a captured group reference. A string containing a hexadecimal escape or a captured group reference should be double-quoted, for example `"Y\u00eb"` or `"\\1\u00b7\\2"`. Captured groups are referenced by a double backslash followed by the group's number.

Because rules are applied in order, conflicts between rules that could apply to the same segment of a string are resolved by rule ordering; rules also take priority over `characters` mappings. For example:

[source,yaml]
----
map:
  rules:
    - pattern: \u03B3\u03B3 # γ (before Γ, Ξ, Χ)
      result: ng
    - pattern: (?<![Γγ])\u03B3(?=[ΕεέΗηήΙιίΥυύ]) # γ (before front vowels)
      result: y
----

(γι maps to `yi`, but γγ maps to `ng`. In the case of γγι, the first rule takes priority and the transliteration is `ngi`; once γγ has been converted, the second rule can no longer apply.)
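
The ordering described above can be sketched in Python. This is only an illustration, assuming that rules are applied as successive regex substitutions and that remaining characters are then looked up one by one; `transliterate`, `RULES`, and the one-entry `CHARACTERS` table are hypothetical names, not part of the project.

```python
import re

# Ordered (pattern, result) pairs mirroring the two YAML rules above.
RULES = [
    (r"\u03B3\u03B3", "ng"),                      # γγ
    (r"(?<![Γγ])\u03B3(?=[ΕεέΗηήΙιίΥυύ])", "y"),  # γ before a front vowel
]

# Hypothetical `characters` table, just large enough for the examples.
CHARACTERS = {"\u03B9": "i"}  # ι


def transliterate(text, rules=RULES, characters=CHARACTERS):
    """Apply every rule in list order, then map the remaining characters."""
    for pattern, result in rules:
        text = re.sub(pattern, result, text)
    return "".join(characters.get(ch, ch) for ch in text)


print(transliterate("\u03B3\u03B9"))        # γι  -> yi
print(transliterate("\u03B3\u03B3\u03B9"))  # γγι -> ngi
```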

[source,yaml]
----
map:
  rules:
    - pattern: (?<=\b)\u03BC[πΠ] # μπ (initially)
      result: b
    - pattern: \u03BC[πΠ] # μπ (medially)
      result: mb
----

(The first rule applies at the start of a word. The second rule specifies no context, so it applies in all remaining cases not covered by the first rule.)

[source,yaml]
----
map:
  rules:
    - pattern: ";"
      result: "?"

  characters:
    "\u00B7": ";"
----

(This guarantees that every `;` is converted to `?` before any new `;` is introduced. Because all three characters also occur in Latin script, the conversions could otherwise be confused with one another.)

Normally rules "bleed" each other: once a rule applies to a segment, that segment cannot trigger other rules, because it has already been converted to Roman script. In exceptional cases, a rule needs to add or remove characters in the original script rather than transliterate them, so that the same context can be matched by two rules in succession:

[source,yaml]
----
map:
  rules:
    - pattern: (?<=[АаЕеЁёИиОоУуЫыЭэЮюЯя])\u042b # Ы after any vowel character
      result: "\u00b7Ы"
    - pattern: \u042b(?=[АаУуЫыЭэ]) # Ы before а, у, ы, or э
      result: "Ы\u00b7"
----

(If the result were `"\u00B7Y"`, the second rule could not be applied afterwards, but we want ОЫУ to transliterate as `O·Y·U`. To make that happen, we preserve the Ы during the rules phase, giving О·Ы·У, and only convert the letters to Roman script in the `characters` phase.)
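
The difference between the two choices of result can be sketched in Python. This is only an illustration, assuming that rules run as successive regex substitutions followed by a per-character lookup; `transliterate`, `RULES`, `EAGER_RULES`, and the three-entry `CHARACTERS` table are hypothetical names chosen for the example.

```python
import re

# Rules as in the YAML above: Ы stays Cyrillic so the second rule can still match it.
RULES = [
    (r"(?<=[АаЕеЁёИиОоУуЫыЭэЮюЯя])\u042b", "\u00b7\u042b"),  # Ы after any vowel character
    (r"\u042b(?=[АаУуЫыЭэ])", "\u042b\u00b7"),                # Ы before а, у, ы, or э
]

# Variant whose first rule converts Ы to Roman "Y" too early.
EAGER_RULES = [(RULES[0][0], "\u00b7Y"), RULES[1]]

# Hypothetical `characters` table covering only the letters in the example.
CHARACTERS = {"\u041e": "O", "\u042b": "Y", "\u0423": "U"}  # О, Ы, У

WORD = "\u041e\u042b\u0423"  # ОЫУ


def transliterate(text, rules):
    """Rules phase (in order), then characters phase."""
    for pattern, result in rules:
        text = re.sub(pattern, result, text)
    return "".join(CHARACTERS.get(ch, ch) for ch in text)


print(transliterate(WORD, RULES))        # O·Y·U
print(transliterate(WORD, EAGER_RULES))  # O·YU (second rule never fires)
```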

=== Testing transliteration systems

To test all transliteration systems in the `maps` directory, run the following command: