A complete GPT neural network in 31 lines of Python, encoded in vintage IBM punch cards.
From Andrej Karpathy's microgpt (240 lines) → attogpt (31 lines) — an 87.1% reduction while maintaining full functionality.
How small can a working GPT get?
- microgpt by Andrej Karpathy: 240 lines
- picogpt: 64 lines (shared as QR code)
- femtogpt: 53 lines (shared as 3000-digit prime number)
- attogpt: 31 lines (shared as 52 IBM punch cards!)
# Clone the repository
git clone https://github.com/yourusername/attogpt.git
cd attogpt
# Run the model (trains on names dataset)
python3 attogpt.py
After 1000 training steps, attogpt generates plausible new names:
step 996 / 1000 | loss 2.1018
step 997 / 1000 | loss 1.7791
step 998 / 1000 | loss 2.4764
step 999 / 1000 | loss 2.4730
step 1000 / 1000 | loss 2.6497
--- inference ---
sample 1: kamon
sample 2: ann
sample 3: karai
sample 4: jaire
sample 5: vialan
sample 6: karia
sample 7: yeran
sample 8: anna
sample 9: areli
sample 10: kaina
- Autograd Engine: Full backpropagation with computational graph
- Transformer Architecture: Multi-head self-attention mechanism
- Training Loop: Adam optimizer with bias correction
- Text Generation: Temperature-controlled sampling
- Position Embeddings: Learned positional encoding
- RMS Normalization: Layer normalization for stability
- Feed-Forward Networks: MLP blocks with ReLU activation
All in 3,117 characters using only Python's standard library.
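As an illustration of the temperature-controlled sampling listed above, here is a stdlib-only sketch (function and variable names are illustrative, not attogpt's actual identifiers):

```python
import math
import random

def sample(logits, temperature=0.5):
    # Scale logits by temperature: lower values sharpen the distribution,
    # higher values flatten it ("more creative").
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token index from the resulting distribution.
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

At temperature 0.5 (attogpt's default setting), the distribution is sharpened relative to a plain softmax, so samples stay close to the model's top choices while retaining some variety.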
To showcase the compression, attogpt is encoded in 52 IBM punch cards — the same format that powered computing in the 1960s and helped send humans to the moon. (Well, a lightly revisited format: classic punch-card character sets had no lowercase letters, hence the encoding scheme below.)
- Python code → UTF-8 bytes
- UTF-8 bytes → Base64 encoding
- Base64 characters → 12-bit punch patterns
- 80 columns per card × 52 cards = complete neural network
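The arithmetic behind the card count checks out with the standard library (a sketch of the sizing only — the actual 12-bit punch patterns live in punchcard_encoding.json):

```python
import base64
import math

# attogpt.py is 3,117 characters of UTF-8 (placeholder bytes used here).
source = b"x" * 3117
encoded = base64.b64encode(source)

# Base64 expands every 3 bytes into 4 characters: 3117 / 3 * 4 = 4156.
print(len(encoded))                  # 4156

# At 80 columns per card, the whole network needs ceil(4156 / 80) cards.
print(math.ceil(len(encoded) / 80))  # 52
```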
# View the encoding
cat punchcard_encoding.json
# Decode back to Python
python3 decode_punchcard_base64.py
attogpt/
├── attogpt.py # The 31-line GPT implementation
├── microgpt.py # Original 240-line version (for comparison)
├── picotgpt.py # 64-line version
├── femtogpt.py # 53-line version
├── punchcard_encoding.json # All 52 punch card patterns
├── encode_punchcard_base64.py # Script to encode Python → punch cards
├── decode_punchcard_base64.py # Script to decode punch cards → Python
├── input.txt # Names dataset (auto-downloaded if missing)
└── README.md # You are here
Embedding: 16-dimensional
Attention: 4 heads, 1 layer
Context: 16 tokens
Parameters: ~5,000 total
Block size: 16 tokens
Vocabulary: 28 tokens (26 letters + BOS + space)
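A back-of-the-envelope parameter count from the shapes above lands in the right ballpark (a sketch under assumed standard-GPT shapes — e.g. a 4x-wide MLP hidden layer — not attogpt's exact bookkeeping; norm gains and any width differences account for the gap to ~5,000):

```python
n_embd, n_head, vocab, block = 16, 4, 28, 16
mlp_hidden = 4 * n_embd            # assumed 4x expansion, as in standard GPTs

tok_emb = vocab * n_embd           # token embeddings: 28 * 16 = 448
pos_emb = block * n_embd           # learned position embeddings: 16 * 16 = 256
attn = 4 * n_embd * n_embd         # Q, K, V, and output projections: 1024
mlp = 2 * n_embd * mlp_hidden      # up and down projections: 2048
head = n_embd * vocab              # final logits layer: 448

total = tok_emb + pos_emb + attn + mlp + head
print(total)  # 4224
```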
- Lambda functions: Replace method definitions
- Dictionary state: Flatten parameter structure
- Expression folding: Combine operations into single statements
- Variable reuse: Aggressive name reuse
- Inline operations: Eliminate intermediate variables
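For example, the lambda and expression-folding tricks can collapse a multi-line method into a single statement (an illustrative pair, not lines taken from attogpt itself):

```python
# Verbose: a named function with intermediate variables.
def rms_norm_verbose(x, eps=1e-5):
    mean_sq = sum(v * v for v in x) / len(x)
    scale = (mean_sq + eps) ** -0.5
    return [v * scale for v in x]

# Minified: lambda + expression folding, no intermediates.
rms_norm = lambda x, e=1e-5: [v * (sum(u * u for u in x) / len(x) + e) ** -0.5 for v in x]

print(rms_norm([1.0, 2.0, 3.0]) == rms_norm_verbose([1.0, 2.0, 3.0]))  # True
```

The minified form trades readability (and some redundant recomputation) for line count — the whole point of the exercise.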
Despite the compression, attogpt produces identical results to microgpt.
| Version | Lines | Reduction | Shared As |
|---|---|---|---|
| microgpt | 240 | baseline | Text file |
| picogpt | 64 | -73.4% | QR code |
| femtogpt | 53 | -77.9% | Prime number |
| attogpt | 31 | -87.1% | 52 punch cards |
This project demonstrates:
- How small can neural networks get? Surprisingly small while staying functional
- Code compression techniques for Python
- Historical data formats and how far storage has evolved
- The elegance of transformers distilled to their essence
- Reversible encoding schemes for data preservation
None! attogpt uses only Python's standard library:
- os: File operations
- math: Mathematical functions
- random: Random number generation
- urllib.request: Dataset download (automatic)
Tested on Python 3.8+
# In attogpt.py, adjust these parameters:
num_steps = 1000 # Number of training steps
learning_rate = 0.01 # Learning rate
temperature = 0.5 # Generation temperature (higher = more creative)
Replace input.txt with your own text file (one entry per line):
# attogpt will automatically use your custom dataset
docs = [l.strip() for l in open('input.txt').read().strip().split('\n')]
Run all versions to see the outputs:
python3 microgpt.py # 240 lines, original
python3 picotgpt.py # 64 lines
python3 femtogpt.py # 53 lines
python3 attogpt.py # 31 lines (ours!)
- Andrej Karpathy - Original microgpt (240 lines)
- Kuber - picogpt (64 lines, QR code format)
- MatchaOnMuffins - femtogpt (53 lines, prime number format)
This project builds on their excellent work to push compression even further.