SmallDoge: Ultra-Fast Small Language Models

English | 简体中文

SmallDoge: Ultra-Fast Small Language Models

Train a 20M parameter language model in just 3 hours! 🚀

SmallDoge is a family of dynamic, ultra-fast small language models designed for efficiency and accessibility.

✨ Key Features

🚀 Ultra-Fast Training: 3-hour training for 20M models
💡 Innovative Architecture: Dynamic Mask Attention + Cross Domain MoE
🏎️ Lightning Inference: 142 tokens/s on i7-11 CPU
🔧 Complete Toolkit: Pre-training → Instruction Fine-tuning → Reasoning Fine-tuning
🌐 Web Interface: Built-in chat interface and OpenAI-compatible API

Webui-Doge-320M-Instruct running on i7-11 CPU

🚀 Quick Start

Installation

git clone https://github.com/SmallDoges/small-doge.git
cd small-doge
pip install -e .

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model
model_name = "SmallDoge/Doge-60M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Generate text
prompt = "Explain machine learning in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Web Interface

# Install WebUI
pip install -e '.[webui]'

# Launch interface
small-doge-webui

Access: http://localhost:7860 (Frontend) | http://localhost:8000 (API)

📖 Detailed guides: Quick Start | Installation | Training

📊 Available Models

Model	Size	Speed (i7-11 CPU)	MMLU	Use Case
Doge-20M	20M	142 tok/s	25.4	Ultra-fast prototyping
Doge-60M	60M	62 tok/s	26.4	Balanced performance
Doge-160M	160M	28 tok/s	29.2	Better reasoning
Doge-320M	320M	16 tok/s	33.8	Production ready

Instruction Models: Add -Instruct to any model name for chat-optimized versions.

Checkpoints: Add -checkpoint for continued training (see Model Docs).

🏗️ Architecture

Key Innovations:

Dynamic Mask Attention: Dynamic attention mechanism for efficient long sequences
Cross Domain Mixture of Experts: Sparse experts with dense-to-sparse continuation training
WSD Scheduler: Warmup-Stable-Decay for seamless checkpoint resumption

🎓 Training Pipeline

SmallDoge supports complete three-stage training:

Pre-training → Base models (Doge-Base)
Instruction Fine-tuning → Chat models (Doge-Instruct)
Reasoning Fine-tuning → Reasoning models (Doge-Reason)

Key Features:

🚀 One-stop processor: Unified data handling across all stages
🔧 Flexible recipes: Pre-configured training configs
📊 Efficient training: Optimized for small models
🔄 Seamless continuation: WSD scheduler for checkpoint resumption

Training Times (RTX 4090):

Doge-20M: 14 hours | Doge-60M: 128 hours | Doge-160M: 522 hours | Doge-320M: 1856 hours

📚 Learn more: Training Guide

📈 Evaluation Results

Base Models

Model	MMLU	ARC	PIQA	HellaSwag	Winogrande
Doge-20M	25.4	29.8	58.4	27.3	50.2
Doge-60M	26.4	37.9	61.4	31.5	50.8
Doge-160M	29.2	44.4	70.1	43.4	52.2
Doge-320M	33.8	52.1	73.9	52.7	55.0

Instruction Models

Model	IFEval	MMLU	BBH	Performance
Doge-20M-Instruct	7.3	26.3	18.3	Good for basic chat
Doge-60M-Instruct	7.4	27.5	27.7	Balanced chat model
Doge-160M-Instruct	16.8	29.7	29.1	Advanced reasoning

🔍 Evaluation toolkit: Evaluation Guide

🛠️ Use Cases

🤖 Edge AI: Deploy on resource-constrained devices
🎮 Gaming: Real-time NPC dialogue and game mechanics
📱 Mobile Apps: On-device AI assistants
🔬 Research: Fast prototyping and experimentation
📚 Education: Learning AI/ML with manageable models
🏭 Industry: Lightweight production deployments

📦 Project Structure

small-doge/
├── src/small_doge/          # Core implementation
│   ├── models/              # Model architectures  
│   ├── trainer/             # Training code
│   ├── processor/           # Data processing
│   └── webui/               # Web interface
├── recipes/                 # Training recipes
│   └── doge/                # Doge model configs
├── examples/                # Tutorials & examples
├── evaluation/              # Evaluation toolkit
├── docs/                    # Documentation
└── assets/                  # Images & resources

🤝 Contributing

We welcome contributions! Here's how you can help:

🐛 Report bugs: GitHub Issues
💡 Suggest features: Discussions
📚 Improve docs: Submit PRs for documentation
🏋️ Share models: Contribute trained models and recipes
💬 Join community: Discord

📚 Documentation

📖 Quick Start - Get started in 5 minutes
⚙️ Installation - Detailed setup guide
🗃️ Dataset Processors - Data processing utilities
🎓 Training - Complete training pipeline
🤖 Models - Architecture and performance
🌐 WebUI - Web interface guide
🔧 Examples - Jupyter notebooks and tutorials
📊 Evaluation - Benchmarking toolkit

📄 Citation

@misc{smalldoges2025,
    title={SmallDoges: A Family of Dynamic Ultra-Fast Small Language Models}, 
    author={Jingze Shi and Yifan Wu and Bingheng Wu and Yuyu Luo},
    year={2025},
    month={March},
    url={https://github.com/SmallDoges/small-doge}
}

📄 License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.

Built with ❤️ by the SmallDoge Team

Give us a ⭐ if you find SmallDoge helpful!

Name		Name	Last commit message	Last commit date
Latest commit History 551 Commits
assets		assets
docs		docs
evaluation		evaluation
examples		examples
projects/litecot		projects/litecot
recipes		recipes
src/small_doge		src/small_doge
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
README_zh.md		README_zh.md
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SmallDoge: Ultra-Fast Small Language Models

✨ Key Features

🚀 Quick Start

Installation

Basic Usage

Web Interface

📊 Available Models

🏗️ Architecture

🎓 Training Pipeline

📈 Evaluation Results

Base Models

Instruction Models

🛠️ Use Cases

📦 Project Structure

🤝 Contributing

📚 Documentation

📄 Citation

📄 License

About

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

License

SmallDoges/small-doge

Folders and files

Latest commit

History

Repository files navigation

SmallDoge: Ultra-Fast Small Language Models

✨ Key Features

🚀 Quick Start

Installation

Basic Usage

Web Interface

📊 Available Models

🏗️ Architecture

🎓 Training Pipeline

📈 Evaluation Results

Base Models

Instruction Models

🛠️ Use Cases

📦 Project Structure

🤝 Contributing

📚 Documentation

📄 Citation

📄 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors

Uh oh!

Languages