ZamAI-ORG/labs

ZamAI Labs

ZamAI Labs is where we develop and publish foundational work that supports the ZamAI ecosystem — including Pashto language resources, datasets, processing pipelines, and model work.


What we publish

  • Pashto datasets and curated language resources
  • Text processing and normalization pipelines
  • Training experiments and model artifacts
  • Evaluation tools and reproducible research assets

Repositories

| Area | Repository | Description | Link |
|---|---|---|---|
| Processing | pashto-processing | Main repository for processing Pashto texts (normalization, cleaning, preprocessing). | https://github.com/ZamAI-ORG/pashto-processing |
| Datasets | pashto-datasets | Curated and processed Pashto datasets with source attribution and dataset notes. | https://github.com/ZamAI-ORG/pashto-datasets |
| Models | zamai-models | Model artifacts and experiments published by ZamAI Labs. | https://github.com/ZamAI-ORG/zamai-models |
| Pipelines | zamai-training-pipelines | Central hub for data workflow and training/fine-tuning pipelines. | https://github.com/ZamAI-ORG/zamai-training-pipelines |
| Training | training-spaces | Reusable training/experiment spaces (templates, scripts, reproducible runs). | https://github.com/ZamAI-ORG/training-spaces |
| Fine-tuning | phi3-mini-pashto-lora | Pashto instruction-tuned LoRA adaptation of microsoft/Phi-3-mini-4k-instruct. | https://github.com/ZamAI-ORG/phi3-mini-pashto-lora |
| mT5 | mt5-pashto | Pashto-focused mT5 experiments and fine-tuning work. | https://github.com/ZamAI-ORG/mt5-pashto |
| Templates | zamai-pashto-template | Starter template for Pashto language projects in ZamAI Labs. | https://github.com/ZamAI-ORG/zamai-pashto-template |

Conventions

Dataset structure (recommended)

For dataset repositories/folders, we aim to keep:

  • raw/ (optional): small samples used for testing/validation
  • processed/: cleaned/normalized outputs ready for training
  • SOURCE.md: source URL + attribution + license/terms
  • notes.md: processing notes (what ZamAI changed)
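The recommended layout is simple enough to check mechanically. The sketch below is a hypothetical Python helper (`check_dataset_layout` is not part of any ZamAI repository) that reports which parts of the convention a dataset folder is missing. It treats `raw/` as optional and expects `processed/`, `SOURCE.md`, and `notes.md` to be present.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical checker for the recommended dataset layout (illustration only).
REQUIRED_DIRS = ("processed",)             # cleaned/normalized outputs
REQUIRED_FILES = ("SOURCE.md", "notes.md")  # attribution and processing notes
OPTIONAL_DIRS = ("raw",)                    # small test/validation samples


def check_dataset_layout(root: str | Path) -> list[str]:
    """Return a list of problems; an empty list means the layout matches."""
    root = Path(root)
    problems: list[str] = []
    for name in REQUIRED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}/")
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    # Optional entries may be absent, but if present they must be directories.
    for name in OPTIONAL_DIRS:
        path = root / name
        if path.exists() and not path.is_dir():
            problems.append(f"{name}/ exists but is not a directory")
    return problems
```

Running a check like this in CI over each dataset folder is one way to keep repositories consistent; an empty result means the folder follows the convention.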

Licensing & attribution

Licensing depends on the base datasets/models and their terms. Each repo should document:

  • original sources
  • required attribution
  • any restrictions (if applicable)
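This README does not prescribe a format for `SOURCE.md`. Assuming a simple one-field-per-line `Key: value` convention, with hypothetical field names `Source`, `Attribution`, and `License` chosen only for illustration, a check that the required licensing details are recorded could look like this:

```python
from __future__ import annotations

import re

# Assumed convention: one "Key: value" pair per line, optionally as a list item.
# The field names below are hypothetical, not a documented ZamAI format.
REQUIRED_FIELDS = ("source", "attribution", "license")

_FIELD_RE = re.compile(r"^\s*(?:[-*]\s+)?([A-Za-z]+)\s*:\s*(\S.*?)\s*$")


def parse_source_md(text: str) -> dict[str, str]:
    """Collect lowercase keys and non-empty values from Key: value lines."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        match = _FIELD_RE.match(line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


def missing_fields(text: str) -> list[str]:
    """Return required fields that are absent or empty, in a fixed order."""
    fields = parse_source_md(text)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```

A line with an empty value, such as `License:`, does not match the pattern, so it is reported as missing rather than silently accepted.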

Contact

© 2026 ZamAI. Home of Zeerak.

About

ZamAI Labs — datasets, Pashto processing, models, and training pipelines powering the ecosystem.
