It's an effective Rust implementation of phonetic Polyphon algorithm.
Original paper: «Polyphon: An Algorithm for Phonetic String Matching in Russian Language».
Authors: Viacheslav V. Paramonov, Alexey O. Shigarov, Gennagy M. Ruzhnikov, Polina V. Belykh.
We propose a new phonetic algorithm to string matching in Russian language without transliteration from Cyrillic to Latin characters. It is based on the rules of sounds formation in Russian language.
Add the dependency:
[dependencies]
polyphon = "1.0"And then use:
use polyphon::encode;
let code = encode("литие"); // -> "лата"Note: encode works on a single word and removes any non-Russian characters (including spaces). If you want to encode multiple words, split them first and encode each separately.
There is a python wrapper for the public API. You can find it here.
src/lib.rs— public interface;src/normalize.rs— normalization (incl.normalizefunction);src/rules.rs— other algorithm steps.