Compress

Writing English with the Roman alphabet (also known as the Latin alphabet) isn't very efficient and has led to inconsistencies in spelling.

Why is the letter s pronounced in so many different ways?

So /s/, Sure /ʃ/, Measure /ʒ/

Furthermore, these inconsistencies and the adoption of foreign words into the core English vernacular have led to many ghost characters: letters that are written but never pronounced.

Half of this word is silent: Though

The Solution? Shavian

The Shavian alphabet (also known as the Shaw alphabet) is a constructed alphabet conceived to provide simple, phonemic orthography for the English language to replace the difficulties of conventional spelling using the Latin alphabet. It was posthumously funded by and named after Irish playwright Bernard Shaw.

How the Sausage is made

A text file is passed as the first argument to the program (the second entry in the argument list, after the program name).

compress ./helloworld.txt

𐑣𐑩𐑤𐑩𐑫  𐑢𐑻𐑤𐑛
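
Picking up the path and reading the file might look roughly like the sketch below. It is not taken from the project's source: `input_path` and `read_input` are hypothetical helpers, and the sketch assumes the path is the entry right after the program name in the argument list.

```rust
use std::fs;
use std::io;

/// Hypothetical helper: returns the input path, i.e. the entry that
/// follows the program name in the argument list.
fn input_path<I: Iterator<Item = String>>(mut args: I) -> Option<String> {
    // nth(1) skips index 0 (the program name) and yields index 1.
    args.nth(1)
}

/// Hypothetical helper: reads the whole file into a String.
fn read_input(path: &str) -> io::Result<String> {
    fs::read_to_string(path)
}
```

From `main`, this would be called as `input_path(std::env::args())`, with `None` meaning no file was given.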

This file is parsed, and the text is converted to the IPA (International Phonetic Alphabet) with the CLI tool espeak-ng.
This IPA passage is then converted to Shavian.
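
The espeak-ng step could be sketched as follows. This is an assumption about how the call might be made, not the project's actual code: `espeak_args` and `text_to_ipa` are hypothetical names, and the sketch relies on espeak-ng's `-q` (no audio) and `--ipa` (print phonemes as IPA) options with the default voice.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper: builds the argument list for espeak-ng.
/// `-q` suppresses audio output; `--ipa` prints the phonemes as IPA.
fn espeak_args(text: &str) -> Vec<String> {
    vec!["-q".to_string(), "--ipa".to_string(), text.to_string()]
}

/// Hypothetical helper: runs espeak-ng and returns its IPA output with
/// surrounding whitespace trimmed. Requires espeak-ng on PATH.
fn text_to_ipa(text: &str) -> io::Result<String> {
    let output = Command::new("espeak-ng").args(espeak_args(text)).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "espeak-ng exited with an error",
        ));
    }
    // espeak-ng writes UTF-8; invalid bytes are replaced rather than rejected.
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```

Keeping the argument list in its own function makes it possible to check the invocation without espeak-ng installed.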

A summary of each step can be seen with the --summary flag.

compress ./helloworld.txt --summary

###./helloworld.txt######################
==[ roman ]===============================
Contents: Hello World
Characters: 11
Words: 2
[-]=======================================

==[ ipa ]=================================
Contents: həlˈəʊ wˈɜːld
Characters: 13
Words: 2
[-]=======================================

==[ shavian ]=============================
Contents: 𐑣𐑩𐑤𐑩𐑫 𐑢𐑻𐑤𐑛
Characters: 10
Words: 2
[-]=======================================
##########################################

Roman to Shavian saves an entire character here 🤯
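
The counts in the summary are consistent with counting Unicode scalar values (spaces included) and whitespace-separated words. A minimal sketch under that assumption, with `Stats` and `stats` as hypothetical names rather than the project's own:

```rust
/// Character and word counts for one stage of the summary.
#[derive(Debug, PartialEq)]
struct Stats {
    characters: usize,
    words: usize,
}

/// Counts Unicode scalar values (spaces included) and
/// whitespace-separated words.
fn stats(text: &str) -> Stats {
    Stats {
        characters: text.chars().count(),
        words: text.split_whitespace().count(),
    }
}
```

Counting with `chars()` rather than `len()` matters here: `len()` returns the number of UTF-8 bytes, and each Shavian letter occupies four.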

Possible Pronunciation Pitfalls

Shavian is an alphabet designed for the English language; therefore, it doesn't have a letter for every sound the IPA can represent. This poses some challenges when the IPA is generated by espeak-ng. For instance, espeak-ng writes the English 'r' as /ɹ/ (an alveolar approximant), whereas the Shavian mapping expects the broader /r/ (strictly, the IPA symbol for a rolled 'r'). These differences are caught and corrected to the best of my ability but are subject to accent interpretation.

use std::collections::HashMap;

// Maps IPA symbols that espeak-ng emits, but that the Shavian table does not
// list directly, to the closest Shavian letter. Other symbols pass through.
pub fn predictive_fix(ipa: char) -> char {
   let mut case: HashMap<char, ShavianCharacter> = HashMap::new();

   case.insert('\u{0279}', ShavianCharacter { name: "ROAR", character: '\u{1046E}', phoneme: PhonemeCharacter { ipa: "\u{0072}", examples: ["r","",""] }});
   case.insert('\u{0250}', ShavianCharacter { name: "ADO", character: '\u{10469}', phoneme: PhonemeCharacter { ipa: "\u{0259}", examples: ["a","o",""] }});
   case.insert('\u{0069}', ShavianCharacter { name: "EAT", character: '\u{10470}', phoneme: PhonemeCharacter { ipa: "\u{0069}\u{02D0}", examples: ["ee","e",""] }});
   case.insert('\u{0061}', ShavianCharacter { name: "AH", character: '\u{1046D}', phoneme: PhonemeCharacter { ipa: "\u{0251}\u{02D0}", examples: ["a","",""] }});
   case.insert('\u{025C}', ShavianCharacter { name: "UP", character: '\u{10473}', phoneme: PhonemeCharacter { ipa: "\u{028C}", examples: ["u","",""] }});

   // Return the Shavian letter on a match, otherwise the original symbol.
   match case.get(&ipa) {
       Some(c) => c.character,
       None => ipa
   }
}

The conversions are as follows:

/ɹ/ => /r/
/ɐ/ => /ə/
/i/ => /iː/
/a/ => /ɑː/
/ɜ/ => /ʌ/
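
The same table can also be written as a `match`, which avoids rebuilding a map on every call. The sketch below is an alternative illustration, not the project's code: `predictive_fix_sketch` and `fix_all` are hypothetical names, and only the five code point pairs shown above are assumed.

```rust
/// Returns the Shavian letter for the five IPA symbols listed above
/// and passes any other character through unchanged.
fn predictive_fix_sketch(ipa: char) -> char {
    match ipa {
        '\u{0279}' => '\u{1046E}', // ɹ -> ROAR
        '\u{0250}' => '\u{10469}', // ɐ -> ADO
        '\u{0069}' => '\u{10470}', // i -> EAT
        '\u{0061}' => '\u{1046D}', // a -> AH
        '\u{025C}' => '\u{10473}', // ɜ -> UP
        other => other,
    }
}

/// Applies the fix to every character of an IPA string.
fn fix_all(ipa: &str) -> String {
    ipa.chars().map(predictive_fix_sketch).collect()
}
```

Because the fix works one character at a time, multi-character symbols such as /ɜː/ would have to be handled before this step.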

And with that, I say, goodbye Roman and 𐑣𐑩𐑤𐑩𐑫 𐑖𐑭𐑝𐑾𐑯!

CoreyRobinsonDev/compress
