Character-level GPT built from scratch in pure NumPy. No PyTorch, no frameworks — just matrix math and backpropagation.
Trained on a curated vocabulary of memetics, information warfare, cognitive science, persuasion, cryptography, and network science terms. Generates novel concept names in the same domain.
```bash
# Generate novel concepts (creative checkpoint)
python3 microgpt.py --load ckpt_creative.npz --generate_only --num_samples 50

# Generate accurate, domain-grounded terms (v9 full)
python3 microgpt.py --load ckpt_v9.npz --generate_only --num_samples 50

# Generate with novelty analysis
python3 microgpt.py --load ckpt_creative.npz --generate_only --num_samples 100 --novelty

# Train a new model (small, ~1 hour)
python3 microgpt.py --n_embd 64 --n_layer 6 --n_head 8 --block_size 48 \
    --num_steps 200000 --save ckpt.npz

# Train a large model (~6 hours with v5 optimizations)
python3 microgpt.py --n_embd 128 --n_layer 6 --n_head 8 --block_size 64 \
    --num_steps 200000 --lr 0.003 --save ckpt_large.npz
```

- Multi-head causal self-attention with RMSNorm
- ReLU MLP (4x expansion)
- Cosine LR schedule with warmup
- AdamW optimizer
- Dropout on attention and MLP
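The core pieces listed above can be sketched in a few lines of NumPy. This is an illustrative single-head version; the function names and shapes here are mine, not the script's internals:

```python
import numpy as np

def rms_norm(x, gain, eps=1e-5):
    """RMSNorm: rescale by root-mean-square only (no mean-centering, unlike LayerNorm)."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

def causal_attention(q, k, v):
    """Single-head causal self-attention: each position attends only to itself and the past."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                    # (T, T) scaled similarities
    mask = np.triu(np.ones((T, T), dtype=bool), 1)   # strictly-future positions
    scores = np.where(mask, -np.inf, scores)         # masked positions get zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v
```

The multi-head version runs this per head on sliced projections and concatenates the results.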
| Feature | Description |
|---|---|
| First-char debiasing | Penalizes overrepresented starting characters (--debias 0.5) |
| Quality scoring | Automated pass/fail: uppercase, length, triple chars, vowel ratio, consonant soup, truncation |
| Repetition penalty | N-gram tracking to avoid repeated trigrams (--rep_penalty 1.5) |
| Post-processing | Auto-capitalize first letter |
| Novelty analysis | --novelty flag compares output to training data via Levenshtein distance |
| Sample deduplication | Within-run dedup with retry (MAX_RETRIES=10) |
| Garble filter | Rejects nonsense words via edit distance against training vocabulary |
| Novel term collection | --collect novel_terms.txt appends novel outputs with dedup |
| Self-contained checkpoints | Saves vocab + model config in .npz for zero-config loading |
| float32 training | 1.7x speedup on Apple Silicon vs float64 |
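First-char debiasing and the repetition penalty are both logit adjustments applied at sampling time. The README does not show the exact formulas, so the sketch below is a plausible reconstruction: `adjust_logits`, `first_char_counts`, and the log-scaled penalties are my assumptions, not the script's actual code.

```python
import numpy as np

def adjust_logits(logits, generated, first_char_counts, stoi,
                  debias=0.5, rep_penalty=1.5):
    """Hypothetical sketch of two sampling tweaks:
    - at position 0, penalize characters that start many training terms;
    - later, damp any token that would complete an already-seen trigram."""
    logits = logits.copy()
    if not generated:
        # Subtract debias * log-frequency from overrepresented starting chars.
        for ch, count in first_char_counts.items():
            logits[stoi[ch]] -= debias * np.log1p(count)
    elif len(generated) >= 2:
        seen = {tuple(generated[i:i + 3]) for i in range(len(generated) - 2)}
        prefix = tuple(generated[-2:])
        for tok in range(len(logits)):
            if prefix + (tok,) in seen:
                logits[tok] -= np.log(rep_penalty)  # damp repeated trigram
    return logits
```

Both tweaks operate before softmax, so they bias sampling without hard-banning any character.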
| Checkpoint | Params | Dataset | Steps | Quality | Novelty | Use for |
|---|---|---|---|---|---|---|
| ckpt_creative.npz | 1.2M | 5,869 terms (v7) | 50K | 96% | 94.5% | Novel concept generation |
| ckpt_v9.npz | 1.2M | 6,473 terms (v9) | 200K | 97% | 63% | Accurate, domain-grounded generation |
| ckpt_v8.npz | 1.2M | 6,447 terms (v8) | 200K | 97% | 69% | Superseded by v9 |
| ckpt_v7.npz | 1.2M | 5,869 terms (v7) | 200K | 88% | 93.5% | Superseded by v8 |
Two-checkpoint strategy: ckpt_creative.npz (v7 50K) is the novelty king at 94.5%. ckpt_v9.npz is the quality king at 97%. Novelty dropped in v8/v9 because the expanded dataset covers more concept space — a measurement artifact, not less creativity.
Note: ckpt_200k.npz and ckpt_200k_large.npz are legacy checkpoints without saved vocab. Use --data input_backup.txt when loading them.
input.txt — 6,508 terms across these clusters:
- Memetics & information warfare (original core)
- Cognitive biases & psychology
- Game theory & decision science
- Rhetoric & persuasion
- Network science & behavioral economics
- Cryptography & privacy
- Propaganda & IO techniques
- Disinformation & media manipulation
- MITRE ATT&CK, ATLAS & DISARM frameworks (AI attack/defense)
- OWASP LLM Top 10 (AI security)
- CAPEC attack patterns
- Dark patterns & deceptive design
- AI safety & alignment (MIRI, ARC, DeepMind)
- Cognitive security (COGSEC)
- EU AI Act & NIST AI RMF (governance)
- 167 curated novel terms from 6 flywheel cycles
| Script | Purpose |
|---|---|
| novelty_check.py | Standalone novelty analysis (edit distance vs training data) |
| diversity_metrics.py | Batch diversity: unique rate, entropy, type-token ratio |
| audit_expanded.py | Dataset quality audit: dupes, charset, length, near-dupes |
| profile_instrumented.py | Per-operation training profiler with FLOP estimation |
- v10 — (training) Flywheel cycle 6, 6,508 terms
- v9 — Flywheel cycle 6: 35 novel terms from v9 creative; 97% quality, 63% novel at 50K
- v8 — Flywheel cycle 5: 26 novel terms; 97% quality, 69% novel at 50K. Dataset expanded to 6,447 terms via ATT&CK, biases, fallacies, rhetoric, CAPEC, dark patterns
- v7 — Flywheel cycle 4: 13 novel terms → 5,869 terms; 94.5% novelty at 96% quality (50K sweet spot)
- v6 — Flywheel cycle 3: 25 novel terms → 5,856 terms; 87% novelty at 97% quality
- v5 — Dataset scaled to 5,831 terms via external sources (ATLAS, DISARM, OWASP, COGSEC, EU AI Act, NIST); float32 training, 1.7x speedup; garble filter, sample dedup, --collect flag
- v4 — Debiasing, quality scoring, repetition penalty, checkpoint vocab/config saving
- v3 — Cosine LR, AdamW, batched training, dropout, top-k/top-p sampling
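The cosine LR schedule with warmup introduced in v3 can be written as a small pure function. The warmup length, decay floor, and defaults below are illustrative, not the script's actual values:

```python
import math

def lr_at(step, max_lr=3e-3, warmup=2000, total=200_000, min_lr_frac=0.1):
    """Linear warmup to max_lr, then cosine decay to min_lr_frac * max_lr."""
    if step < warmup:
        return max_lr * (step + 1) / warmup            # linear ramp
    progress = (step - warmup) / max(1, total - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1 -> 0
    return max_lr * (min_lr_frac + (1 - min_lr_frac) * cosine)
```

Warmup avoids large Adam updates while the moment estimates are still noisy; the cosine tail anneals toward a small floor rather than zero.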
v7 sweep (1.2M params, 5,869 terms) shows novelty staying above 88% across all checkpoints:
| Steps | Quality | Novel% | AvgDist | Notes |
|---|---|---|---|---|
| 10K | 92% | 99.5% | 0.395 | Near-random, very novel |
| 20K | 98% | 96% | 0.357 | Highest quality |
| 50K | 96% | 94.5% | 0.308 | Best balance — sweet spot |
| 90K | 90% | 92% | 0.314 | Quality dip in mid-range |
| 140K | 96% | 92.5% | 0.326 | Quality recovers |
| 180K | 94% | 94.5% | 0.327 | Late novelty peak |
| 200K | 88% | 93.5% | 0.329 | Quality drops at end |
Novelty progression across versions: v4 55% → v5 84% → v6 87% → v7 94.5% → v8 69% → v9 63%.
v8+ novelty drop reflects broader training coverage (6,447+ terms vs 5,869), not reduced creativity. The v7 creative checkpoint remains the novelty benchmark.
- Fuse QKV projections for additional 5-10% speedup
- Explore word-level tokenizer for multi-word pattern capture
- Scale dataset to 10-15K terms for further diversity
- Continue data flywheel: collect → curate → retrain (6 cycles completed, 167 novel terms curated)
- Find earlier sweet spot (20-30K) for v8+ dataset size to recover novelty
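Fusing QKV projections means replacing three separate (T, d) x (d, d) matmuls with one (T, d) x (d, 3d) matmul, amortizing per-call overhead into a single larger GEMM. A minimal sketch of the idea (names are illustrative):

```python
import numpy as np

def fused_qkv(x, w_qkv):
    """One matmul against the concatenated weight, then split into q, k, v."""
    qkv = x @ w_qkv                    # (T, 3d)
    return np.split(qkv, 3, axis=-1)   # three (T, d) views
```

The fused result is numerically identical to the three separate projections, since concatenating weight columns just stacks independent output blocks.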