A Transformer Interpretability Framework Built From Scratch
"To understand a model, you must first see what it sees."
KAMUI is a decoder-only transformer language model and mechanistic interpretability framework built entirely from scratch in PyTorch.
No HuggingFace Trainer. No PyTorch Lightning. No black boxes.
Every weight matrix, every attention pattern, every residual stream activation is exposed, documented, and inspectable by design.
KAMUI is for researchers and students who want to understand how language models actually work — not just use them.
KAMUI is currently being built in public.
Current Progress:
Repository Foundation ██████████ ✅ complete
ModelConfig System ██████████ ✅ complete
Vocabulary System ██████████ ✅ complete
BPE Tokenizer ██████████ ✅ complete
Embeddings ██████████ ✅ complete
LayerNorm ██████████ ✅ complete
FeedForward Network ██████████ ✅ complete
Attention Mechanism ░░░░░░░░░░ ⏳ planned
Transformer Architecture ░░░░░░░░░░ ⏳ planned
Training Pipeline ░░░░░░░░░░ ⏳ planned
Hook System ░░░░░░░░░░ ⏳ planned
Logit Lens ░░░░░░░░░░ ⏳ planned
Activation Patching ░░░░░░░░░░ ⏳ planned
The roadmap and issue tracker reflect active development.
Most interpretability research is done on pretrained models (GPT-2, LLaMA) using tools that weren't designed for transparency. This creates two problems:
- The model is a black box: you can probe it, but you don't know what choices were made in training, initialisation, or architecture.
- The tools are abstractions: `model.run_with_cache()` hides the hook system. `AutoModelForCausalLM` hides the architecture.
KAMUI removes both layers of opacity. You train the model yourself. You read every line of every tool.
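For context, the mechanism that cache-style helpers typically wrap is PyTorch's forward-hook API. The sketch below shows that mechanism on a toy model. It is not KAMUI's hook system, which is still planned: `capture_activations` is a hypothetical helper name, and the `nn.Sequential` model is only a stand-in for a transformer.

```python
from contextlib import contextmanager

import torch
import torch.nn as nn


@contextmanager
def capture_activations(model: nn.Module, names: list[str]):
    """Record the outputs of the named submodules while the block runs.

    Hypothetical helper for illustration; not part of KAMUI's API.
    """
    cache: dict[str, torch.Tensor] = {}
    handles = []
    modules = dict(model.named_modules())

    def make_hook(name: str):
        def hook(module, inputs, output):
            # Detach so the cache does not keep the autograd graph alive.
            cache[name] = output.detach()
        return hook

    try:
        for name in names:
            handles.append(modules[name].register_forward_hook(make_hook(name)))
        yield cache
    finally:
        # Always remove hooks so they do not leak into later forward passes.
        for handle in handles:
            handle.remove()


# Toy stand-in model: nn.Sequential names its submodules "0", "1", "2".
toy = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(3, 4)

with capture_activations(toy, ["0", "1"]) as cache:
    toy(x)

print({name: tuple(act.shape) for name, act in cache.items()})
```

Using a context manager means the hooks are removed even if the forward pass raises, so a failed experiment cannot silently alter later runs.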
| | nanoGPT | TransformerLens | KAMUI |
|---|---|---|---|
| Implemented from scratch | ✅ | ❌ | ✅ |
| Trains from scratch | ✅ | ❌ | ✅ |
| Full interpretability toolkit | ❌ | ✅ | ✅ |
| Context-managed hook system | ❌ | partial | ✅ |
| Educational notebooks (7) | ❌ | ❌ | ✅ |
| Research infrastructure | ❌ | ❌ | ✅ |
| Zero magic abstractions | ✅ | ❌ | ✅ |
```shell
git clone https://github.com/RithvikReddy0-0/kamui
cd kamui
pip install -e ".[all]"
pytest
```

This clones the repo, installs all dependencies in editable mode, and runs the test suite. The tests cover the components implemented so far: config, vocabulary, and tokenizer infrastructure.
Once the core components are complete, the intended interface will look like this:
Train a model

```python
import kamui

model = kamui.KAMUITransformer.from_config("configs/small.yaml")
tokenizer = kamui.BPETokenizer.train("data/tinystories.txt", vocab_size=8192)
trainer = kamui.Trainer(model, tokenizer, config="configs/small.yaml")
trainer.train()
```

Run logit lens

```python
lens = kamui.LogitLens(model, tokenizer)
result = lens.run("The Eiffel Tower is located in the city of")
result.plot()  # layer × token heatmap: watch "Paris" emerge
```

Find induction heads

```python
detector = kamui.InductionHeadDetector(model)
scores = detector.score_all_heads()
detector.plot_scores(scores)
# Expect high scores at layer 1, heads 2 and 5
```

Causal intervention

```python
patcher = kamui.ActivationPatcher(model)
effect = patcher.patch_all_layers(
    clean="The Eiffel Tower is in Paris",
    corrupted="The Eiffel Tower is in Berlin",
)
effect.plot()  # which layer stores "Paris"?
```

KAMUI is organised into five layers with a strict one-direction dependency:
```text
tokenizer → model → hooks → mechinterp → evaluate
```

```text
text input
    ↓ BPETokenizer (from scratch, no tiktoken)
token_ids (B, S)
    ↓ Embedding: token + positional
residual_stream (B, S, D)
    ↓ × n_layers:
        Pre-LN → MultiHeadAttention → residual add
        Pre-LN → FeedForward → residual add
residual_stream (B, S, D)
    ↓ Final LayerNorm → Linear unembedding
logits (B, S, V)

HookManager captures any activation above ↑
mechinterp tools use captured activations for analysis
```
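The shape flow in the diagram can be traced with a small PyTorch sketch. This is an illustration under assumptions, not KAMUI's code: `PreLNBlock` and `TinyDecoder` are hypothetical names, the hyperparameters are arbitrary, and `nn.MultiheadAttention` stands in for the from-scratch attention module, which the progress list marks as planned.

```python
import torch
import torch.nn as nn


class PreLNBlock(nn.Module):
    """One layer: Pre-LN -> attention -> residual add, Pre-LN -> FFN -> residual add."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        # Stand-in for a from-scratch attention module (assumption).
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        seq_len = resid.size(1)
        # Boolean causal mask: True marks positions a query may NOT attend to.
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=resid.device),
            diagonal=1,
        )
        h = self.ln1(resid)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        resid = resid + attn_out                  # (B, S, D)
        resid = resid + self.ff(self.ln2(resid))  # (B, S, D)
        return resid


class TinyDecoder(nn.Module):
    def __init__(self, vocab=100, d_model=32, n_heads=4, n_layers=2, max_len=16):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList(
            [PreLNBlock(d_model, n_heads) for _ in range(n_layers)]
        )
        self.ln_f = nn.LayerNorm(d_model)
        self.unembed = nn.Linear(d_model, vocab, bias=False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        resid = self.tok_emb(token_ids) + self.pos_emb(positions)  # (B, S, D)
        for block in self.blocks:
            resid = block(resid)
        return self.unembed(self.ln_f(resid))                      # (B, S, V)


model = TinyDecoder()
token_ids = torch.randint(0, 100, (2, 5))  # (B=2, S=5)
logits = model(token_ids)
print(logits.shape)
```

Because normalisation is applied before each sublayer rather than after it, the residual stream itself is never normalised inside a block, which is why a final LayerNorm precedes the unembedding.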
| Tool | What it answers |
|---|---|
| `AttentionVisualizer` | What is each attention head attending to? |
| `LogitLens` | At each layer, what token does the model predict? |
| `LinearProbe` | At each layer, what linguistic properties are encoded? |
| `ActivationPatcher` | Which components are causally responsible for a behaviour? |
| `InductionHeadDetector` | Which heads implement in-context pattern matching? |
| `CircuitAblator` | What is the minimal circuit for a behaviour? |
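As one concrete example of the questions in this table, the logit-lens idea is to decode the intermediate residual stream with the final LayerNorm and unembedding, as if each layer were the last. The sketch below is a minimal illustration with random weights and a toy residual stack without attention. All names are hypothetical, and it does not show how KAMUI's `LogitLens` is implemented.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d_model, n_layers, seq_len = 50, 16, 3, 4

# Toy stand-in components (random weights, no attention).
embed = nn.Embedding(vocab, d_model)
blocks = nn.ModuleList([
    nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, d_model), nn.GELU())
    for _ in range(n_layers)
])
ln_f = nn.LayerNorm(d_model)
unembed = nn.Linear(d_model, vocab, bias=False)

token_ids = torch.randint(0, vocab, (1, seq_len))

with torch.no_grad():
    resid = embed(token_ids)                      # (1, S, D)
    per_layer_top = []
    for block in blocks:
        resid = resid + block(resid)              # residual add
        # Logit lens: decode the intermediate stream as if it were final.
        layer_logits = unembed(ln_f(resid))       # (1, S, V)
        per_layer_top.append(layer_logits.argmax(dim=-1))
    lens = torch.stack(per_layer_top, dim=0).squeeze(1)    # (n_layers, S)
    final_prediction = unembed(ln_f(resid)).argmax(dim=-1)  # (1, S)

print(lens)  # row i holds the top token id per position after layer i
```

By construction the last row equals the model's actual prediction. In a trained model, the earlier rows show how soon that prediction becomes readable from the residual stream.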
| Notebook | What you learn |
|---|---|
| `00_bpe_tokenizer` | Build BPE tokenisation from first principles |
| `01_attention_mechanics` | Visualise attention in a 2-layer model |
| `02_training_dynamics` | Loss curves, gradient norms, LR schedules |
| `03_logit_lens` | Watch predictions evolve layer by layer |
| `04_activation_patching` | Causal interventions: find where facts live |
| `05_induction_heads` | Detect and ablate induction circuits |
| `06_circuit_analysis` | Reverse-engineer a complete behaviour |
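To give a flavour of the first notebook's topic, here is a rough sketch of the core BPE training loop: count adjacent symbol pairs, merge the most frequent pair, repeat. It is a character-level simplification written for this README, not KAMUI's `BPETokenizer`. The function names and the tie-breaking rule are assumptions.

```python
from collections import Counter


def count_pairs(words: dict[tuple[str, ...], int]) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words: dict[tuple[str, ...], int], pair: tuple[str, str]):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged: dict[tuple[str, ...], int] = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus: str, num_merges: int):
    """Learn up to `num_merges` merges from a whitespace-split corpus."""
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        # Highest count wins; ties are broken by comparing the pairs themselves
        # (an arbitrary but deterministic choice made for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


merges, words = train_bpe("low low low lower lowest", num_merges=2)
print(merges)  # [('o', 'w'), ('l', 'ow')]
```

After two merges the frequent word "low" has become a single symbol, while "lower" and "lowest" are still split into "low" plus individual characters.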
KAMUI includes first-class research tooling:
```text
research/
├── experiments/      # one folder per experiment (config + results + notes)
├── reports/          # written findings and paper drafts
├── figures/          # publication-quality plots
├── future/           # v0.2 design specs (SAEs)
└── RESEARCH_LOG.md   # chronological experiment log
```
Every experiment is reproducible from its folder alone. The research log becomes the experiments section of your paper.
| Version | Scope | Status |
|---|---|---|
| v0.1 | Core transformer + 6 interpretability tools | 🔄 In progress |
| v0.2 | Sparse autoencoders, gradient attribution, RoPE | 📋 Designed |
See CHANGELOG.md for detailed version history.
```shell
# Minimal install (training + inference)
pip install kamui

# With visualisation tools
pip install "kamui[viz]"

# With Jupyter notebooks
pip install "kamui[notebooks]"

# Full development install
git clone https://github.com/RithvikReddy0-0/kamui
cd kamui
pip install -e ".[all]"
pre-commit install
```

Requirements: Python 3.11+, PyTorch 2.1+
KAMUI is an open research project. See CONTRIBUTING.md.
The easiest first contribution is adding a new interpretability tool to `kamui/mechinterp/`. The hook system handles activation capture, so you only write the analysis logic.
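As a rough idea of what "analysis logic only" could look like, the sketch below computes the mean attention entropy per head from an already-captured attention pattern. The function name, the `(B, H, S, S)` input layout, and the hand-built tensors are assumptions made for illustration. KAMUI's actual tool interface may differ.

```python
import math

import torch


def attention_entropy(pattern: torch.Tensor) -> torch.Tensor:
    """Mean entropy (in nats) of each head's attention distribution.

    pattern: attention weights of shape (B, H, S, S), each row summing to 1.
    Returns a tensor of shape (H,).
    """
    # Clamp avoids log(0) on zero-weight positions; those terms contribute 0.
    log_p = torch.log(pattern.clamp_min(1e-12))
    entropy = -(pattern * log_p).sum(dim=-1)  # (B, H, S)
    return entropy.mean(dim=(0, 2))           # (H,)


# Stand-in for captured activations: head 0 attends uniformly,
# head 1 puts all weight on a single position.
S = 4
uniform = torch.full((S, S), 1.0 / S)
one_hot = torch.eye(S)
pattern = torch.stack([uniform, one_hot]).unsqueeze(0)  # (1, 2, S, S)

scores = attention_entropy(pattern)
expected_uniform = math.log(S)  # entropy of a uniform distribution over S items
print(scores, expected_uniform)
```

A diffuse head scores near `log(S)` and a sharply focused head scores near zero, which makes entropy a cheap first filter before closer inspection of individual heads.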
Find open issues on GitHub Issues.
This project is built on a simple conviction:
Interpretability is not a feature. It is the prerequisite for trust.
We cannot trust systems we cannot understand. KAMUI is a tool for building that understanding — one component, one circuit, one forward pass at a time.
The framework is inspired by:
- nanoGPT — Andrej Karpathy's minimal GPT implementation
- TransformerLens — Neel Nanda's interpretability library
- Anthropic Interpretability Research — the circuits thread
If you use KAMUI in research, please cite:
```bibtex
@software{mukkara2026kamui,
  author    = {Mukkara, Rithvik Reddy},
  title     = {{KAMUI}: {K}nowledge {A}ctivation {M}apping \& {U}nderstanding {I}nterface},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/RithvikReddy0-0/kamui},
  license   = {MIT},
}
```

MIT. See LICENSE.
Amrita Vishwa Vidyapeetham · CSE · 2027