Unicode defines normalization forms (NFC, NFD, NFKC, NFKD) for strings and assigns every character a general category (character class).
It might work to normalize strings to NFKD and then remove every character whose general category is Mn (Nonspacing_Mark) (see table 12)
It might be necessary to specially handle conversions like ß to ss, because ß has no decomposition and NFKD leaves it unchanged
See also the Python answer on Stack Overflow
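The approach above could be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `strip_marks` and the `SPECIAL_CASES` table are assumptions made for illustration, and the table only covers the ß example mentioned above.

```python
import unicodedata

# Hypothetical table of characters that NFKD does not decompose
# and that therefore need explicit handling (only ß is listed here).
SPECIAL_CASES = {"ß": "ss"}


def strip_marks(text: str) -> str:
    """Return text with special cases replaced and nonspacing marks removed."""
    # Replace characters that normalization alone would leave untouched.
    for source, replacement in SPECIAL_CASES.items():
        text = text.replace(source, replacement)
    # NFKD splits composed characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop every character whose general category is Mn (Nonspacing_Mark).
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_marks("Straße café"))  # prints "Strasse cafe"
```

Letters such as ø also have no decomposition, so they pass through unchanged unless they are added to the special-case table.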