ComancheNLP

Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language

Authors: Jesus Alvarez C, Daua Karajeanes, Ashley Prado, John Ruttan, Ivory Yang, Sean O’Brien, Vasu Sharma, Kevin Zhu

This repository shows how we accelerate Comanche NLP by combining a synthetic text generation pipeline with language identification to address data scarcity in endangered languages.

🔗 Read the full paper (AmericasNLP 2025)


🚀 Clone the Repo

git clone https://github.com/comanchegenerate/ComancheSynthetic.git
cd ComancheSynthetic

📂 What’s Inside

  • Datasets/: A 412-phrase Comanche-English corpus, the first for this language.
  • comanche_synthetic_generation.py: Generates validated synthetic Comanche text via GPT-4 few-shot prompting.
  • language_identification.ipynb: Language identification experiments showing that few-shot examples increase accuracy.
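
For readers new to few-shot prompting, the idea behind the generation script can be sketched roughly as follows. This is an illustrative sketch only, not the code in comanche_synthetic_generation.py: the function name `build_few_shot_prompt`, the prompt wording, and the placeholder pairs are all assumptions, and the step that sends the prompt to a model is left out.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], n_new: int = 5) -> str:
    """Assemble a few-shot prompt from (comanche, english) pairs.

    Hypothetical helper for illustration; the repository's script may
    structure its prompt differently.
    """
    if not examples:
        raise ValueError("at least one example pair is required")

    lines = ["Here are example Comanche sentences with English translations:", ""]
    # Number each example pair so the model sees a consistent pattern.
    for i, (comanche, english) in enumerate(examples, start=1):
        lines.append(f"{i}. Comanche: {comanche}")
        lines.append(f"   English: {english}")
    lines.append("")
    lines.append(
        f"Write {n_new} new Comanche sentences in the same style, "
        "each with an English translation."
    )
    return "\n".join(lines)


# Placeholder pairs: substitute real rows from the corpus in Datasets/.
examples = [
    ("<Comanche phrase 1>", "<English translation 1>"),
    ("<Comanche phrase 2>", "<English translation 2>"),
]
prompt = build_few_shot_prompt(examples, n_new=3)
print(prompt)
```

The resulting string would then be passed to the language model, and the output checked before it is added to any dataset.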

🤝 Contributing

Feedback and pull requests welcome!