Exhaustive Circuit Mapping of a Single-Cell Foundation Model

Code and results for the paper:

Exhaustive Circuit Mapping of a Single-Cell Foundation Model Reveals Massive Redundancy, Heavy-Tailed Hub Architecture, and Layer-Dependent Differentiation Control

Ihor Kendiukhov

Department of Computer Science, University of Tübingen, Germany

Overview

This repository contains the analysis code and experimental results for three experiments that address systematic limitations in prior mechanistic interpretability work on single-cell foundation models:

1. Exhaustive Feature Tracing: traces all 4,065 active sparse autoencoder (SAE) features at layer 5 of Geneformer V2-316M, yielding 1,393,850 significant downstream edges and revealing a heavy-tailed hub architecture with systematic annotation bias.
2. Higher-Order Combinatorial Ablation: extends pairwise ablation to three-way feature triplets (8 triplets, 7 conditions each), demonstrating that redundancy deepens monotonically with interaction order (three-way ratio 0.59 vs. pairwise 0.74), with zero synergy.
3. Trajectory-Guided Feature Steering: causally tests 14 differentiation-associated switch features, establishing that late-layer features (L17) universally push cell states toward maturity while early- and mid-layer features push away from it.

Repository Structure

sae-biological-map/
├── src/
│   ├── sae_model.py                    # TopK sparse autoencoder model (d=1152, 4x expansion, k=32)
│   ├── exhaustive_feature_tracing.py   # Experiment 1: exhaustive L5 circuit tracing
│   ├── higher_order_ablation.py        # Experiment 2: three-way combinatorial ablation
│   └── trajectory_steering.py          # Experiment 3: causal trajectory steering
├── results/
│   ├── exhaustive_tracing/
│   │   └── exhaustive_summary.json     # Summary statistics (1.39M edges, hub distribution)
│   ├── higher_order_ablation/
│   │   ├── summary.json                # Aggregate ablation results
│   │   └── triplet_*.json              # Per-triplet detailed results (8 files)
│   └── trajectory_steering/
│       ├── summary.json                # Aggregate steering results
│       ├── steering_F*_L*.json         # Per-feature steering results (14 files)
│       └── state_signatures.npz        # Early/late pseudotime gene signatures
├── paper/
│   ├── manuscript.tex                  # LaTeX source
│   ├── references.bib                  # Bibliography (63 entries)
│   └── figures/                        # Figures 1-6
├── requirements.txt
├── LICENSE
└── README.md
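
For orientation, the TopK autoencoder listed for sae_model.py can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the repository's implementation: the dimensions (d=1152, 4x expansion giving 4,608 features, k=32) come from the tree above, while the random untrained weights, the decoder-bias subtraction, the clipping at zero, and all names (`TopKSAE`, `encode`, `decode`) are hypothetical.

```python
import numpy as np

D_MODEL = 1152   # residual-stream width (from the README)
EXPANSION = 4    # 4x expansion -> 4608 dictionary features
K = 32           # number of features kept active per input row

class TopKSAE:
    """Illustrative TopK sparse autoencoder with random, untrained weights."""

    def __init__(self, d_model=D_MODEL, expansion=EXPANSION, k=K, seed=0):
        rng = np.random.default_rng(seed)
        n_features = d_model * expansion
        self.k = k
        self.W_enc = rng.normal(0, 0.02, size=(d_model, n_features))
        self.b_enc = np.zeros(n_features)
        self.W_dec = rng.normal(0, 0.02, size=(n_features, d_model))
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        # Pre-activations for every dictionary feature.
        pre = (x - self.b_dec) @ self.W_enc + self.b_enc
        # Indices of the k largest pre-activations in each row.
        top_idx = np.argpartition(pre, -self.k, axis=-1)[..., -self.k:]
        top_val = np.take_along_axis(pre, top_idx, axis=-1)
        # Keep only the top-k values (clipped at zero); all others stay zero.
        codes = np.zeros_like(pre)
        np.put_along_axis(codes, top_idx, np.maximum(top_val, 0.0), axis=-1)
        return codes

    def decode(self, codes):
        # Reconstruct the input as a sparse combination of decoder rows.
        return codes @ self.W_dec + self.b_dec

sae = TopKSAE()
x = np.random.default_rng(1).normal(size=(4, D_MODEL))
codes = sae.encode(x)
recon = sae.decode(codes)
```

Each row of `codes` has at most 32 non-zero entries out of 4,608, which is what makes individual features tractable to trace and ablate one at a time.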

Prerequisites

Data

The following external datasets are required to reproduce the experiments:

K562 CRISPRi perturbation data (Replogle et al., 2022): Figshare
Tabula Sapiens immune subset (The Tabula Sapiens Consortium, 2022): CZ CELLxGENE
Geneformer V2-316M pretrained model (Theodoris et al., 2023): HuggingFace

Upstream dependencies

These experiments build on trained SAE models and extracted activations from a companion study (Kendiukhov, 2025). You will need:

Trained SAE checkpoints (sae_layer{N}.pt) for each Geneformer layer
Extracted residual-stream activations (layer_{N}_activations.npy)
Circuit tracing results from prior causal patching (for Experiment 2)
Trajectory dynamics results from prior pseudotime analysis (for Experiment 3)
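
Before launching a long run, it may help to confirm that the upstream files are in place. The check below is a sketch under assumptions: the file-name patterns are taken from the list above, but the flat directory layout (all files directly under one root), the helper name `missing_upstream_files`, and the choice of layers are hypothetical.

```python
from pathlib import Path

def missing_upstream_files(root, layers):
    """Return the expected SAE checkpoint and activation files absent under root."""
    root = Path(root)
    expected = []
    for n in layers:
        expected.append(root / f"sae_layer{n}.pt")             # trained SAE checkpoint
        expected.append(root / f"layer_{n}_activations.npy")   # residual-stream activations
    return [p for p in expected if not p.exists()]
```

For example, `missing_upstream_files("/path/to/phase1_k562", layers=[5, 6, 11, 17])` returns an empty list only if both files exist for every listed layer.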

Installation

conda create -n sae-bio python=3.10
conda activate sae-bio
pip install -r requirements.txt

Usage

Configure data paths via environment variables or edit the path constants at the top of each script:

export SAE_DATA_ROOT="/path/to/phase1_k562"      # SAE models and activations
export SAE_DATA_PATH="/path/to/replogle_concat.h5ad"  # K562 CRISPRi data
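
A script can pick these variables up with a fallback to an in-file constant. The snippet below is a sketch of that pattern, not the code in this repository: the variable names `SAE_DATA_ROOT` and `SAE_DATA_PATH` come from the commands above, whereas the helper `resolve_path` and the default locations are hypothetical.

```python
import os
from pathlib import Path

def resolve_path(env_var, default):
    """Return the path stored in env_var if it is set and non-empty, else default."""
    value = os.environ.get(env_var, "").strip()
    return Path(value) if value else Path(default)

# Hypothetical defaults, used only when the variables are not exported.
DATA_ROOT = resolve_path("SAE_DATA_ROOT", "./data/phase1_k562")
DATA_PATH = resolve_path("SAE_DATA_PATH", "./data/replogle_concat.h5ad")
```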

Experiment 1: Exhaustive Feature Tracing

python src/exhaustive_feature_tracing.py --n-cells 200 --source-layer 5

Traces all active features at layer 5 to downstream layers (L6, L11, L17). Outputs per-feature JSON files with resume support. Runtime: ~12 hours on Apple M2 Max.
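
Resume support with per-feature output files usually amounts to skipping any feature whose result already exists on disk. The following sketch shows that idea under assumptions: the file name `feature_{id}.json`, the function `trace_with_resume`, and the `trace_fn` callback are hypothetical and may not match what the script actually writes.

```python
import json
from pathlib import Path

def trace_with_resume(feature_ids, out_dir, trace_fn):
    """Run trace_fn for each feature without a saved result; return how many ran."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    n_computed = 0
    for fid in feature_ids:
        out_file = out_dir / f"feature_{fid}.json"
        if out_file.exists():
            continue  # finished in a previous run
        result = trace_fn(fid)
        # Write to a temporary file first so an interrupted run never
        # leaves a half-written JSON that would be mistaken for a result.
        tmp = out_file.with_name(out_file.name + ".tmp")
        tmp.write_text(json.dumps(result))
        tmp.replace(out_file)
        n_computed += 1
    return n_computed
```

With a roughly 12-hour runtime, restarting after an interruption then costs only the features that had not yet been written.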

Experiment 2: Higher-Order Ablation

python src/higher_order_ablation.py --n-cells 200 --n-triplets 10

Performs single, pairwise, and three-way ablation for 8 biologically motivated feature triplets. Runtime: ~2 hours.
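
The seven conditions per triplet are the non-empty subsets of three features: three single, three pairwise, and one three-way ablation. The sketch below enumerates them and adds one plausible way to express a redundancy ratio. The ratio's definition here (joint effect divided by the sum of single-feature effects) is an assumption for illustration; the exact metric is defined in the paper and in higher_order_ablation.py. The function names are hypothetical.

```python
from itertools import combinations

def ablation_conditions(features):
    """All non-empty subsets of the features, ordered by subset size."""
    return [subset
            for size in range(1, len(features) + 1)
            for subset in combinations(features, size)]

def redundancy_ratio(joint_effect, single_effects):
    """Assumed definition: joint ablation effect over the sum of single effects."""
    return joint_effect / sum(single_effects)

conditions = ablation_conditions(("A", "B", "C"))
print(len(conditions))  # 7
```

Under this assumed definition, a ratio below 1 means the joint ablation does less damage than the single ablations would predict additively (redundancy), and a ratio above 1 would indicate synergy.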

Experiment 3: Trajectory Steering

python src/trajectory_steering.py --alphas 2.0,5.0 --n-cells 500

Amplifies 14 switch features in early-pseudotime immune cells and measures state shift toward maturity. Runtime: ~1 minute.
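
One common way to amplify an SAE feature is to add a scaled copy of its decoder direction to the hidden state, so that the feature's contribution is multiplied by alpha. The sketch below illustrates that approach as an assumption; the actual intervention in trajectory_steering.py may differ, and the function and argument names are hypothetical.

```python
import numpy as np

def steer_feature(hidden, activation, decoder_dir, alpha):
    """Scale one feature's contribution to the hidden state by alpha.

    hidden:      (n_tokens, d) residual-stream activations
    activation:  (n_tokens,)   the feature's SAE activation per token
    decoder_dir: (d,)          the feature's decoder direction
    """
    # The feature currently contributes activation * decoder_dir;
    # adding (alpha - 1) times that term makes its contribution alpha times larger.
    return hidden + (alpha - 1.0) * activation[:, None] * decoder_dir[None, :]
```

With `alpha=1.0` the hidden state is unchanged, and the values passed on the command line above (2.0 and 5.0) would double or quintuple the feature's contribution.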

Key Results

| Experiment | Key Finding | Main Metric |
|---|---|---|
| Exhaustive tracing | 27x more edges than selective sampling; 40% of top-20 hubs unannotated | 1,393,850 edges from 4,065 features |
| Higher-order ablation | Redundancy deepens; zero synergy at all orders | Three-way ratio = 0.59 (vs. pairwise 0.74) |
| Trajectory steering | L17 universally pushes toward maturity; L0/L11 push away | L17 fraction positive = 1.00 |

Compute Environment

All experiments were run on a MacBook Pro with Apple M2 Max (38-core GPU, 96 GB unified memory) using PyTorch 2.1 with MPS backend. Total compute: ~26.3 hours.

Citation

If you use this code or results, please cite:

@article{kendiukhov2025exhaustive,
  title={Exhaustive Circuit Mapping of a Single-Cell Foundation Model Reveals Massive Redundancy, Heavy-Tailed Hub Architecture, and Layer-Dependent Differentiation Control},
  author={Kendiukhov, Ihor},
  year={2025}
}

License

This project is licensed under the MIT License. See LICENSE for details.
