NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks

Project Page | Paper | Demo Notebook

To appear at ICLR 2026

NerVE is a unified eigenspectral framework for understanding how feed-forward networks (FFNs) in large language models organize and regulate information flow in high-dimensional latent space. Despite FFNs dominating the parameter budget, their high-dimensional dynamics remain poorly understood. NerVE addresses this gap through lightweight, memory-efficient tracking of eigenspectrum dynamics via four complementary metrics.

Core finding: FFN nonlinearities do not merely rescale activations — they actively reinject variance across eigenmodes, reawakening previously inactive directions in high-dimensional latent space. Moreover, optimizer geometry strongly modulates the extent of this variance reinjection.
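
A toy illustration of what "reawakening inactive directions" can look like, offered as a sketch rather than the paper's experiment: pre-activations are confined to a 2-dimensional subspace, and a ReLU spreads variance into additional eigenmodes. The sizes, the relative tolerance, and the helper names `spectrum` and `active_modes` are assumptions made for this example and are not part of the `nerve` package.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, r = 4096, 64, 2

# Pre-activations confined to an r-dimensional subspace of R^D.
Z = rng.standard_normal((N, r)) @ rng.standard_normal((r, D))
H = np.maximum(Z, 0.0)  # ReLU applied elementwise

def spectrum(A):
    """Eigenvalues of the sample covariance of A, in descending order."""
    A = A - A.mean(axis=0, keepdims=True)
    cov = A.T @ A / (A.shape[0] - 1)
    eig = np.linalg.eigvalsh(cov)[::-1]
    return np.clip(eig, 0.0, None)  # clamp round-off negatives

def active_modes(eig, rel_tol=1e-3):
    """Count eigenvalues larger than rel_tol times the top eigenvalue."""
    return int((eig > rel_tol * eig[0]).sum())

pre, post = spectrum(Z), spectrum(H)
print("active modes before ReLU:", active_modes(pre))
print("active modes after ReLU: ", active_modes(post))
```

The count before the nonlinearity equals the subspace dimension; the count after it is larger because the ReLU output is no longer a linear function of the two latent coordinates.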

Metrics

NerVE tracks FFN eigenspectrum dynamics through four scale-invariant, distribution-aware metrics:

Metric	What it captures	Range
Spectral Entropy (SE)	Variance uniformity / dispersion	[0, ln D]
Participation Ratio (PR)	Effective latent dimensionality	[1, D]
Eigenvalue Early Enrichment (EEE)	Top-heaviness of the spectrum	[0, 1)
Jensen-Shannon Divergence (JS)	Distributional shift (pre → post activation)	[0, ln 2]
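
For orientation, the textbook definitions of three of these metrics can be sketched with the standard library alone. This is a minimal sketch under the usual definitions (Shannon entropy of the normalized spectrum, (Σλ)² / Σλ² for the participation ratio, and Jensen-Shannon divergence in nats), which reproduce the ranges in the table. The function names are made up for this example, they are not the `nerve` implementation, and EEE is left out because its definition is not given here.

```python
import math

def normalize(eigs):
    """Turn non-negative eigenvalues into a probability vector."""
    total = sum(eigs)
    return [e / total for e in eigs]

def spectral_entropy(p):
    """Shannon entropy (nats) of a normalized spectrum; lies in [0, ln D]."""
    return sum(-x * math.log(x) for x in p if x > 0)

def participation_ratio(eigs):
    """(sum of eigenvalues)^2 / sum of squared eigenvalues; lies in [1, D]."""
    return sum(eigs) ** 2 / sum(e * e for e in eigs)

def js_divergence(p, q):
    """Jensen-Shannon divergence (nats) between two spectra; lies in [0, ln 2]."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

flat = [1.0, 1.0, 1.0, 1.0]    # variance spread evenly over D = 4 modes
peaked = [1.0, 0.0, 0.0, 0.0]  # all variance in a single mode

print(spectral_entropy(normalize(flat)), participation_ratio(flat))      # ln 4 and 4
print(spectral_entropy(normalize(peaked)), participation_ratio(peaked))  # 0 and 1
print(js_divergence(normalize(flat), normalize(peaked)))
```

A flat spectrum attains the upper end of the SE and PR ranges, and a single-mode spectrum attains the lower end.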

Installation

pip install nerve-spectral

Or install from source:

git clone https://github.com/nerve-eigenspectrum/NerVE.git
cd NerVE
pip install -e .

Requirements

pip install -r requirements.txt

Python >= 3.8
PyTorch >= 1.12
Transformers >= 4.20 (for analyzer and training callback)

Quick Start

Analyze any pretrained model (5 lines)

from transformers import AutoModelForCausalLM, AutoTokenizer
from nerve import NerVEAnalyzer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m")

inputs = tokenizer("The role of feed-forward networks in transformers", return_tensors="pt")
analyzer = NerVEAnalyzer(model)  # auto-detects architecture
results = analyzer.analyze(inputs["input_ids"])
analyzer.print_summary(results)

Auto-detection supports: GPT-2, Pythia, GPT-J, LLaMA, Mistral, Gemma, Phi, Qwen2, and more. To add a new architecture:

NerVEAnalyzer.register_architecture(
    "my_model",
    block_path="backbone.layers",
    up="feed_forward.w1",
    down="feed_forward.w2",
    gated=True,
)
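
The arguments above read as dotted attribute paths: `block_path` relative to the model, and `up`/`down` relative to each block. How `NerVEAnalyzer` resolves them internally is not documented here, so the following is only an assumed interpretation, sketched with stand-in objects and a hypothetical `resolve` helper.

```python
from functools import reduce
from types import SimpleNamespace

def resolve(root, dotted_path):
    """Follow a dotted attribute path, e.g. 'backbone.layers', starting at root."""
    return reduce(getattr, dotted_path.split("."), root)

# Stand-in objects mimicking a model whose blocks live at model.backbone.layers.
block = SimpleNamespace(feed_forward=SimpleNamespace(w1="up-proj", w2="down-proj"))
model = SimpleNamespace(backbone=SimpleNamespace(layers=[block, block]))

for i, blk in enumerate(resolve(model, "backbone.layers")):
    print(i, resolve(blk, "feed_forward.w1"), resolve(blk, "feed_forward.w2"))
```

Checking that each path resolves on your model this way (or with PyTorch's `Module.get_submodule`) before registering it can catch typos early.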

Monitor during training (with HuggingFace Trainer)

from nerve import FFNEigenMetricsCallback, register_ffn_hooks

callback = FFNEigenMetricsCallback(
    log_steps=200,          # compute every 200 steps
    device='cuda',
    output_dir="eigen_metrics_logs",
    num_layers=12,
    do_sampling=False,      # full-batch covariance (recommended)
)

# `trainer` is an existing transformers.Trainer that wraps `model`
trainer.add_callback(callback)
register_ffn_hooks(model, callback)
trainer.train()
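
What `register_ffn_hooks` does internally is not shown above. As a generic illustration of the underlying mechanism, the sketch below uses plain PyTorch forward hooks to capture pre- and post-activation tensors from a toy FFN. `ToyFFN`, `captured`, and `save_as` are hypothetical names introduced for this example only.

```python
import torch
import torch.nn as nn

class ToyFFN(nn.Module):
    """Minimal up-projection / activation / down-projection block."""
    def __init__(self, d_model=16, d_ff=64):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.act = nn.GELU()
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.down(self.act(self.up(x)))

captured = {}

def save_as(name):
    def hook(module, inputs, output):
        # Flatten [batch, seq, D] to [N, D] so a covariance can be computed later.
        captured[name] = output.detach().reshape(-1, output.shape[-1])
    return hook

ffn = ToyFFN()
handles = [
    ffn.up.register_forward_hook(save_as("pre")),    # output of the up projection
    ffn.act.register_forward_hook(save_as("post")),  # output of the nonlinearity
]

with torch.no_grad():
    ffn(torch.randn(8, 32, 16))

for h in handles:
    h.remove()  # detach the hooks once the activations are collected

print(captured["pre"].shape, captured["post"].shape)
```

Removing the handles after collection keeps the hooks from adding overhead to later forward passes.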

Use metrics directly

import torch
from nerve import compute_covariance, compute_sorted_eigs, normalize_eigs
from nerve import compute_spectral_entropy, compute_participation_ratio, compute_eee, compute_js

# Example activations of shape [N, D]; substitute real FFN activations here
X = torch.randn(1024, 256)
cov = compute_covariance(X)
eigenvalues = compute_sorted_eigs(cov)
eigenvalues_norm = normalize_eigs(eigenvalues)

se = compute_spectral_entropy(eigenvalues_norm)    # Spectral Entropy
pr = compute_participation_ratio(eigenvalues)       # Participation Ratio
eee = compute_eee(eigenvalues)                      # Eigenvalue Early Enrichment

# JS divergence between pre- and post-activation spectra.
# eigenvalues_norm_pre and eigenvalues_norm_post are normalized spectra obtained
# by running the three steps above on the pre- and post-activation tensors.
js = compute_js(eigenvalues_norm_pre, eigenvalues_norm_post)

Key Findings

Experimental Axis	Key Finding
Activation (GELU vs ReLU)	Similar trend, distinct dynamics; GELU explores broader subspace
Norm-free models	GELU exhibits spectral inertia; ReLU compensates for missing LayerNorms
FFN weight geometry	Performance tracks sustained spectral flattening
Norm placement (Pre/Post/Mix)	PreLN: best return-on-width; PostLN: diminishing spectral returns
Positional encoding	RoPE prevents mid-to-deep spectral collapse
Optimizer (AdamW vs Muon)	Repair vs refinement; performance follows mid-layer capacity
Non-transformer (MLP-Mixer)	Core findings generalize beyond transformer architecture

Repository Structure

NerVE/
├── nerve/
│   ├── metrics.py          # Core metrics: SE, PR, EEE, JS (Section 2.2)
│   ├── analyzer.py         # Inference-time analysis for any pretrained model
│   ├── callback.py         # Training-time monitoring callback (Algorithm 1)
│   ├── trainer.py          # Custom HuggingFace Trainer
│   └── models/
│       └── gpt2.py         # GPT-2 architectural variants (Sections 3.1-3.4)
├── scripts/
│   └── run_clm.py          # Training entry point
├── configs/                # Hydra configuration files
└── notebooks/              # Demo notebooks (coming soon)

Reproducing Paper Experiments

All experiments are controlled via Hydra config overrides:

# Baseline GPT-2 with GELU (Section 3.1)
python scripts/run_clm.py model.activation_function=gelu

# ReLU variant
python scripts/run_clm.py model.activation_function=relu

# Norm-free models (Section 3.2)
python scripts/run_clm.py model.norm_type=free model.norm_position=free

# FFN weight geometry (Section 3.3)
python scripts/run_clm.py model.norm_type=free model.ffn_norm_type=spectral

# LayerNorm placement sweep (Section 3.4)
python scripts/run_clm.py model.norm_position=post    # PostLN
python scripts/run_clm.py model.norm_position=pre     # PreLN (default)
python scripts/run_clm.py model.norm_position=mixed model.post_ln_layers=4  # MixLN

# FFN width sweep (Section 3.4)
python scripts/run_clm.py model.mlp_width_mult=1   # D = 1d
python scripts/run_clm.py model.mlp_width_mult=4   # D = 4d (default)
python scripts/run_clm.py model.mlp_width_mult=8   # D = 8d
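
To run several of these settings in one go, the command lines can be generated programmatically. The sketch below only builds and prints them; it reuses the override keys shown above and assumes that `model.norm_position` and `model.mlp_width_mult` can be combined in a single run, which the commands above do not confirm.

```python
import itertools
import shlex

SCRIPT = "scripts/run_clm.py"
norm_positions = ["pre", "post"]
width_mults = [1, 4, 8]

def build_commands():
    """Return one argv list per (norm position, width multiplier) pair."""
    cmds = []
    for pos, mult in itertools.product(norm_positions, width_mults):
        cmds.append([
            "python", SCRIPT,
            f"model.norm_position={pos}",
            f"model.mlp_width_mult={mult}",
        ])
    return cmds

for cmd in build_commands():
    # To launch a run instead of printing it: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```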

Citation

If you find NerVE useful in your research, please cite:

@inproceedings{jha2026nerve,
    title={NerVE: Nonlinear Eigenspectrum Dynamics in {LLM} Feed-Forward Networks},
    author={Nandan Kumar Jha and Brandon Reagen},
    booktitle={The Fourteenth International Conference on Learning Representations (ICLR)},
    year={2026},
}

License

This project is licensed under the MIT License — see the LICENSE file for details.

Acknowledgments

Our GPT-2 implementation builds on the Simplified Transformers codebase.
