A comprehensive mechanistic interpretability library built on JAX and Equinox. Originally inspired by TransformerLens, IRTK provides 170+ analysis modules for understanding transformer internals — from basic logit lenses and activation patching to circuit discovery, sparse autoencoders, causal scrubbing, and representation engineering.
Initially vibe-coded by Opus, so YMMV. PRs welcome.
Installation
pip install irtk-jax
Development setup
git clone git@github.com:danielpcox/irtk.git
cd irtk
uv sync --extra dev
Requirements: Python 3.11+.
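To confirm an environment meets the version requirement above before installing, a small standard-library check along these lines may help. The helper name `meets_python_requirement` and the constant `MIN_PYTHON` are illustrative and not part of IRTK.

```python
import sys

# Minimum interpreter version stated in the requirements above.
MIN_PYTHON = (3, 11)


def meets_python_requirement(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`.

    Hypothetical helper for illustration; IRTK does not ship this function.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if not meets_python_requirement():
    found = sys.version.split()[0]
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {found}")
```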
Quick start
import irtk

# Load a pretrained model (GPT-2, GPT-Neo, GPT-NeoX, LLaMA, Mistral)
model = irtk.HookedTransformer.from_pretrained("gpt2")

# Run with activation caching
logits, cache = model.run_with_cache(tokens)

# Logit lens: decode residual stream at each layer
irtk.logit_lens.logit_lens(model, tokens, cache)

# Activation patching
irtk.patching.activation_patch(model, clean_tokens, corrupt_tokens, cache)

# Circuit analysis
irtk.circuits.direct_logit_attribution(model, tokens, cache)

# Attention head taxonomy
irtk.head_analysis.find_induction_heads(model, tokens, cache)
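For readers new to the logit lens, the idea behind the call above can be sketched in plain JAX on random toy arrays. This is not IRTK's implementation: the function `toy_logit_lens`, the array names, and the shapes are all assumptions made for illustration, and the sketch skips the final layer norm that a logit lens on a real model usually applies before unembedding.

```python
import jax
import jax.numpy as jnp


def toy_logit_lens(resid_per_layer, w_unembed):
    """Project every layer's residual stream through the unembedding matrix.

    resid_per_layer: (n_layers, seq_len, d_model) toy residual streams
    w_unembed:       (d_model, vocab_size) toy unembedding matrix
    Returns logits of shape (n_layers, seq_len, vocab_size).
    Hypothetical helper for illustration; not part of IRTK.
    """
    return jnp.einsum("lsd,dv->lsv", resid_per_layer, w_unembed)


key_resid, key_unembed = jax.random.split(jax.random.PRNGKey(0))
resid = jax.random.normal(key_resid, (4, 3, 8))   # 4 layers, 3 positions, d_model = 8
w_u = jax.random.normal(key_unembed, (8, 10))     # toy vocabulary of 10 tokens

layer_logits = toy_logit_lens(resid, w_u)
# Most likely token id at each layer and position, as read off by the lens.
top_tokens = jnp.argmax(layer_logits, axis=-1)
```

Reading `top_tokens` row by row shows how the decoded prediction changes from layer to layer, which is the quantity a logit lens plot typically visualizes.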
IRTK's 170+ modules cover the full spectrum of mechanistic interpretability research. Every module is importable from the top-level package (irtk.<module>).
The notebooks/ directory contains 350+ Jupyter notebooks with worked examples and API demos covering every module.
Start here: 00_getting_started.ipynb, a complete mechinterp investigation walkthrough from loading a model through logit lens, activation patching, and circuit analysis.
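As background for the activation patching step of that walkthrough, here is a minimal sketch of the technique on a toy two-layer network in JAX. It does not use IRTK's API: `toy_forward`, the weights, and the inputs are hypothetical, chosen so the arithmetic is easy to follow. The corrupted run is repeated with one hidden unit at a time overwritten by its value from the clean run, and the fraction of the clean output recovered is recorded.

```python
import jax.numpy as jnp


def toy_forward(x, w1, w2, patch_index=None, patch_value=None):
    """Toy network: hidden = tanh(x @ w1), output = hidden @ w2.

    If patch_index is given, that hidden unit is overwritten with patch_value
    before the output is computed. Hypothetical helper, not IRTK's API.
    """
    hidden = jnp.tanh(x @ w1)
    if patch_index is not None:
        hidden = hidden.at[patch_index].set(patch_value)
    return hidden @ w2, hidden


w1 = jnp.eye(3)                      # identity, so hidden = tanh(x)
w2 = jnp.array([2.0, 0.0, -0.5])     # output weights on the three hidden units
x_clean = jnp.array([1.0, 0.5, -1.0])
x_corrupt = jnp.array([-1.0, 0.5, 1.0])

out_clean, hidden_clean = toy_forward(x_clean, w1, w2)
out_corrupt, _ = toy_forward(x_corrupt, w1, w2)

# Patch each hidden unit from the clean run into the corrupted run and
# measure how much of the clean-vs-corrupt output gap is recovered.
recovery = []
for i in range(3):
    out_patched, _ = toy_forward(x_corrupt, w1, w2, i, hidden_clean[i])
    recovery.append(float((out_patched - out_corrupt) / (out_clean - out_corrupt)))
```

In this toy setup the unit with the largest output weight accounts for most of the recovered output, while the unit whose activation is identical in both runs contributes nothing.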
To run notebooks:
uv run jupyter lab notebooks/
If you don't have Jupyter installed yet:
uv add --dev jupyterlab
uv run jupyter lab notebooks/
Recommended reading order:
| Notebook | Topic |
| --- | --- |
| 00_getting_started | End-to-end mechinterp workflow |
| 01_api_cheatsheet | Quick reference for every module |
| 02_transformer_anatomy | Understanding transformer components |
| 03_logit_lens | Logit lens and tuned lens |
| 04_ioi_patching | Activation patching on IOI |
| 05_linear_probes | Training linear probes |
| 06_sparse_autoencoders | SAE feature discovery |
| 07_circuit_analysis | OV/QK circuits and composition |
| 08_gradient_interpretability | Gradient-based attribution |
| 09_automatic_circuit_discovery | ACDC-style circuit finding |
The remaining 340+ notebooks cover individual modules in depth — each module in the module reference above has a corresponding notebook.
Testing
uv run pytest # run full suite (4000+ tests)
uv run pytest irtk/tests/ -x -q # quick smoke test, stop on first failure