ComancheNLP

Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language

Authors: Jesus Alvarez C, Daua Karajeanes, Ashley Prado, John Ruttan, Ivory Yang, Sean O’Brien, Vasu Sharma, Kevin Zhu

Explore how we accelerate Comanche NLP by combining a synthetic text generation pipeline with language identification to overcome the data scarcity typical of endangered languages.

🔗 Read the full paper (AmericasNLP 2025)


Clone the Repo

git clone https://github.com/comanchegenerate/ComancheSynthetic.git
cd ComancheSynthetic

What’s Inside

  • Datasets/: A 412-phrase Comanche-English corpus, the first for this language.
  • comanche_synthetic_generation.py: Generates validated synthetic Comanche text via GPT-4 few-shot prompting.
  • language_identification.ipynb: Language identification experiments showing that few-shot examples increase accuracy.
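
To give a feel for the few-shot prompting idea mentioned above, here is a minimal sketch of how a prompt could be assembled from Comanche-English pairs. It is not the code in comanche_synthetic_generation.py: the function name `build_few_shot_prompt`, the prompt wording, and the placeholder pairs are all assumptions made for illustration, and no model call is made.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], n_new: int = 5) -> str:
    """Assemble a few-shot prompt from (Comanche, English) pairs.

    Hypothetical helper for illustration; the repository's script may
    structure its prompt differently.
    """
    if not examples:
        raise ValueError("at least one example pair is required")

    lines = ["Here are Comanche sentences with English translations:", ""]
    # Number each pair so the model sees a consistent pattern to continue.
    for i, (comanche, english) in enumerate(examples, start=1):
        lines.append(f"{i}. Comanche: {comanche}")
        lines.append(f"   English: {english}")
    lines.append("")
    lines.append(f"Write {n_new} new Comanche sentences in the same format.")
    return "\n".join(lines)


# Placeholder pairs only; real pairs would come from the corpus in Datasets/.
examples = [
    ("<Comanche phrase 1>", "<English translation 1>"),
    ("<Comanche phrase 2>", "<English translation 2>"),
]
prompt = build_few_shot_prompt(examples, n_new=3)
print(prompt)
```

The resulting string would then be sent to a chat model of your choice. Any generated text should still be validated before it is treated as usable Comanche data.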

Contributing

Feedback and pull requests welcome!
