Commit 95c6745

opoudjis authored and ronaldtse committed
Add alalc-ell 1997, elot-ell-743-2001, elot-ell-743-1982, un-ell-1987,
bgnpcgn-ell-1962, iso-ell-843
1 parent ab20c12 commit 95c6745

16 files changed, 4826 additions and 38 deletions

README.adoc

Lines changed: 53 additions & 0 deletions
@@ -137,6 +137,59 @@ Pattern is a regex expression. It should be representing as a string without `//

 Result is the replacement for the pattern's match. It can contain a string, Unicode characters specified by hexadecimal number, or a captured group reference. A string containing a hexadecimal escape or a captured group reference should be double quoted, for example `"Y\u00eb"` or `"\\1\u00b7\\2"`. Captured groups are referenced by a double backslash followed by the group's number.

+Because rules are applied in order, conflicts between rules that apply to the same segment of a string can be resolved by rule ordering. Rules also take priority over `characters`. For example:
+
+[source,yaml]
+----
+map:
+  rules:
+    - pattern: \u03B3\u03B3 # γ (before Γ, Ξ, Χ)
+      result: ng
+    - pattern: (?<![Γγ])\u03B3(?=[ΕεέΗηήΙιίΥυύ]) # γ (before front vowels)
+      result: y
+----
+
+(γι maps to `yi`, but γγ maps to `ng`. In the case of γγι, the first rule takes priority and the transliteration is `ngi`: once γγ has been converted, the second rule can no longer apply.)
+
+[source,yaml]
+----
+map:
+  rules:
+    - pattern: (?<=\b)\u03BC[πΠ] # μπ (initially)
+      result: b
+    - pattern: \u03BC[πΠ] # μπ (medially)
+      result: mb
+----
+
+(The first rule applies at the start of a word; the second rule does not specify a context, because it applies in all other cases not covered by the first rule.)
+
+[source,yaml]
+----
+map:
+  rules:
+    - pattern: ";"
+      result: "?"
+
+  characters:
+    "\u00B7": ";"
+----
+
+(This guarantees that every original `;` is converted to `?` before the `characters` phase introduces any new `;`. Because all three characters are Latin script, the conversions could otherwise be applied in the wrong order.)
+
+Normally rules "bleed" each other: once a rule has applied to a segment, that segment can no longer trigger other rules, because it has already been converted to Roman script. In exceptional cases, a rule needs to add or remove characters in the original script rather than transliterate them, so that the same context can be matched by two rules in succession:
+
+[source,yaml]
+----
+map:
+  rules:
+    - pattern: (?<=[АаЕеЁёИиОоУуЫыЭэЮюЯя])\u042b # Ы after any vowel character
+      result: "\u00b7Ы"
+    - pattern: \u042b(?=[АаУуЫыЭэ]) # Ы before а, у, ы, or э
+      result: "Ы\u00b7"
+----
+
+(If the result were "\u00B7Y", the second rule could not be applied afterwards; but we want ОЫУ to transliterate as `O·Y·U`. To make that happen, we preserve the Ы during the rules phase, which yields О·Ы·У; the letters are only converted to Roman script in the `characters` phase.)
+
 === Testing transliteration systems

 To test all transliteration systems in the `maps` directory, run the command:
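
The ordering behaviour described in the README hunk above can be tried out with a small Ruby sketch. This is an illustration under stated assumptions, not the library's implementation: `apply_rules` and `apply_characters` are hypothetical helpers, each rule is assumed to be applied to the whole string with `gsub`, and the Cyrillic character map is invented for the example.

```ruby
# Sketch only: hypothetical helpers, not the library's own code.

# Apply each rule to the whole string, in list order.
def apply_rules(text, rules)
  rules.reduce(text) do |out, rule|
    out.gsub(Regexp.new(rule[:pattern]), rule[:result])
  end
end

# Apply the character map after all rules, longest keys first.
def apply_characters(text, characters)
  characters.sort_by { |key, _value| -key.size }.reduce(text) do |out, (key, value)|
    out.gsub(key, value)
  end
end

# The two Greek rules from the README example.
GREEK_RULES = [
  { pattern: "\u03B3\u03B3", result: "ng" },
  { pattern: "(?<![Γγ])\u03B3(?=[ΕεέΗηήΙιίΥυύ])", result: "y" }
].freeze

# The two Cyrillic rules from the README example.
CYRILLIC_RULES = [
  { pattern: "(?<=[АаЕеЁёИиОоУуЫыЭэЮюЯя])\u042B", result: "\u00B7Ы" },
  { pattern: "\u042B(?=[АаУуЫыЭэ])", result: "Ы\u00B7" }
].freeze

# Assumed character map (Cyrillic О, Ы, У to Latin), for illustration only.
CYRILLIC_CHARACTERS = { "\u041E" => "O", "\u042B" => "Y", "\u0423" => "U" }.freeze

# γγι: the first rule consumes γγ, so the second rule has nothing to match.
greek = apply_rules("\u03B3\u03B3\u03B9", GREEK_RULES)

# ОЫУ: both rules fire because Ы stays in the original script until the characters phase.
after_rules = apply_rules("\u041E\u042B\u0423", CYRILLIC_RULES)
cyrillic = apply_characters(after_rules, CYRILLIC_CHARACTERS)
```

In this sketch `greek` still ends in the Greek letter ι, because no Greek character map is supplied; only the rules phase is shown for that example.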

lib/interscript.rb

Lines changed: 2 additions & 1 deletion
@@ -25,6 +25,7 @@ def transliterate(system_code, string)
     separator = mapping.character_separator || ""
     word_separator = mapping.word_separator || ""
     title_case = mapping.title_case
+    downcase = mapping.downcase

     charmap = mapping.characters&.sort_by { |k, _v| k.size }&.reverse&.to_h

@@ -49,7 +50,7 @@ def transliterate(system_code, string)
     charmap.each do |k, v|
       while (match = output&.match(/#{k}/))
         pos = match.offset(0).first
-        result = up_case_around?(output, pos) ? v.upcase : v
+        result = !downcase && up_case_around?(output, pos) ? v.upcase : v
         result = result[0] if result.is_a?(Array) # if more than one, choose the first one
         output[pos, match[0].size] = add_separator(separator, pos, result)
       end
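
The changed line relies on operator precedence: `&&` binds more tightly than the ternary, so the condition is `(!downcase && up_case_around?(output, pos))`. A minimal sketch of that logic in isolation follows; `cased_result` is a hypothetical helper, and `upper_context` stands in for whatever `up_case_around?` returns.

```ruby
# Hypothetical helper that isolates the expression from the changed line.
# `upper_context` is a stand-in for the result of up_case_around?(output, pos).
def cased_result(value, downcase:, upper_context:)
  # Parsed as (!downcase && upper_context) ? value.upcase : value
  !downcase && upper_context ? value.upcase : value
end

forced_lower = cased_result("th", downcase: true, upper_context: true)
contextual_upper = cased_result("th", downcase: false, upper_context: true)
```

With `downcase` set, the mapped value is returned unchanged even in an upper-case context; without it, the previous behaviour is kept.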

lib/interscript/mapping.rb

Lines changed: 3 additions & 1 deletion
@@ -19,7 +19,8 @@ class Mapping
       :destination_script,
       :character_separator,
       :word_separator,
-      :title_case
+      :title_case,
+      :downcase
     )

     def initialize(system_code, options = {})
@@ -74,6 +75,7 @@ def serialize_system_mappings(mappings)
       @character_separator = mappings["map"]["character_separator"] || nil
       @word_separator = mappings["map"]["word_separator"] || nil
       @title_case = mappings["map"]["title_case"] || false
+      @downcase = mappings["map"]["downcase"] || false
       @rules = mappings["map"]["rules"] || []
       @postrules = mappings["map"]["postrules"] || []
       @characters = mappings["map"]["characters"] || {}
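
The hunk above reads `downcase` from the `map` section of a mapping file and defaults it to `false`. The sketch below shows how that default behaves for a map that sets the key and one that omits it. `case_flags` is a hypothetical helper that only mirrors the two `|| false` lines, and the inline YAML documents are made up for the example.

```ruby
require "yaml"

# Hypothetical helper mirroring the `|| false` defaults in the hunk above.
def case_flags(mappings)
  {
    title_case: mappings["map"]["title_case"] || false,
    downcase: mappings["map"]["downcase"] || false
  }
end

# A map that opts in to the new flag.
with_flag = YAML.safe_load(<<~CONFIG)
  map:
    downcase: true
CONFIG

# A map that does not mention the flag at all.
without_flag = YAML.safe_load(<<~CONFIG)
  map:
    title_case: true
CONFIG

flags_on = case_flags(with_flag)
flags_off = case_flags(without_flag)
```

Because a missing key yields `nil`, existing mapping files that never mention `downcase` keep their earlier behaviour.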
