MiniLM is a small, efficient, and educational language model built from scratch, inspired by Karpathy's nanoGPT and the TinyStories research paper by Microsoft. This project is designed to help understand the full lifecycle of training a Transformer-based model, from data preparation to inference, in an accessible and minimalist codebase.
- Pure PyTorch implementation, no external training libraries
- Clean, well-commented code for learning and experimentation
- Modular Transformer architecture
- Token-level and character-level tokenization options
- Training on a TinyStories-like synthetic dataset (or your own!)
- Efficient training with gradient accumulation and mixed precision
- Tiny inference script to generate new text from a prompt
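The character-level option is the simplest place to start: build a vocabulary from the unique characters in the corpus and map each character to an integer id. A minimal sketch of that idea (class and method names here are illustrative, not MiniLM's actual API):

```python
class CharTokenizer:
    """Character-level tokenizer sketch: one id per unique character."""

    def __init__(self, corpus: str):
        chars = sorted(set(corpus))          # deterministic vocabulary order
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()} # id -> char
        self.vocab_size = len(chars)

    def encode(self, text: str) -> list[int]:
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)


tok = CharTokenizer("once upon a time")
ids = tok.encode("upon")
assert tok.decode(ids) == "upon"  # encode/decode round-trips
```

Token-level tokenization (e.g. a BPE vocabulary) trades a larger embedding table for shorter sequences; the character-level version keeps the vocabulary tiny, which suits very small models.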
- nanoGPT: A compact reimplementation of OpenAI's GPT-style models by Andrej Karpathy, focusing on readability and simplicity.
- TinyStories: A Microsoft Research paper demonstrating that small Transformer models (under 10M parameters) can generate coherent short stories when trained on domain-specific synthetic datasets.
This project blends both ideas: nanoGPT's simplicity and training-loop style with the model scale and dataset philosophy of TinyStories.
- GPT-style Transformer decoder
- Causal self-attention
- Positional embeddings
- Configurable depth and width (e.g., 2-6 layers, 128-512 hidden units)
- Dropout for regularization