Jianliang He, Leda Wang, Siyu Chen, Zhuoran Yang Department of Statistics and Data Science, Yale University
[arXiv] [Blog] [Interactive Demo]
This repository provides the code for studying how a two-layer neural network learns modular arithmetic. Three phenomena are highlighted:
- Fourier Feature Learning — Each neuron independently discovers a cosine wave at a single frequency, collectively implementing a discrete Fourier transform that the network was never taught.
- Lottery Ticket Dynamics — Random initialization determines which frequency each neuron will specialize in: the frequency with the best initial phase alignment wins a winner-take-all competition.
- Grokking — Under partial data with weight decay, the network first memorizes, then suddenly generalizes through a three-stage process: memorization → sparsification → cleanup.
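The Fourier-feature claim above can be checked numerically. The sketch below (with an illustrative modulus, frequency, and phase not taken from the repo) builds a neuron whose input weights trace a single cosine over the residues and verifies that its DFT spectrum is concentrated at one frequency, using a standard inverse-participation-ratio (IPR) measure:

```python
import numpy as np

p = 23       # illustrative modulus
k = 5        # hypothetical frequency this neuron specialized in
phase = 0.7  # arbitrary phase offset

# A specialized neuron's input weights over residues 0..p-1 trace one cosine:
a = np.arange(p)
w = np.cos(2 * np.pi * k * a / p + phase)

# Magnitude spectrum of the weight vector
spec = np.abs(np.fft.rfft(w))
dominant = int(np.argmax(spec[1:])) + 1  # skip the DC bin

# IPR of the power spectrum: close to 1 when the energy
# sits at a single frequency, close to 1/N when spread out
power = spec ** 2
ipr = (power ** 2).sum() / power.sum() ** 2

print(dominant)       # 5
print(round(ipr, 3))  # 1.0
```

Since cos(2πka/p) is itself a DFT basis vector, the spectrum is exactly concentrated at bin k and the IPR is 1 up to floating-point error.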
An Interactive Demo on Hugging Face Spaces visualizes all results with 9 analysis tabs, interactive Plotly charts, and on-demand training for any odd modulus $p$.
```bash
pip install -r requirements.txt
python hf_app/app.py
# Opens at http://localhost:7860
```

We use the Hugging Face Python API to upload to Spaces, since HF now requires Xet storage for binary files (PNGs, etc.), which standard `git push` does not handle.
First-time setup:
```bash
pip install huggingface_hub hf_xet
```

Log in (get a write token from https://huggingface.co/settings/tokens):
```bash
huggingface-cli login
```

Upload to the Space:
```bash
python deploy_to_hf.py
# or with a custom commit message:
python deploy_to_hf.py --message "Update app"
```

The deploy script prepends the required Hugging Face Space metadata (SDK config, app path, etc.) to README.md before uploading, so the GitHub README stays clean.
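The metadata-prepending step can be sketched as follows. The YAML field values below are illustrative assumptions, not the repo's actual Space configuration; the idempotency check avoids stacking front matter on repeated deploys:

```python
# Sketch of prepending Hugging Face Space front matter to a README
# (field values here are assumptions, not the repo's real config).
SPACE_METADATA = """\
---
title: Modular Addition Feature Learning
sdk: gradio
app_file: hf_app/app.py
---
"""

def with_space_metadata(readme_text: str) -> str:
    # Skip if front matter is already present (keeps the call idempotent)
    if readme_text.startswith("---"):
        return readme_text
    return SPACE_METADATA + readme_text

readme = "# Modular Addition\nGitHub README body...\n"
print(with_space_metadata(readme).splitlines()[0])  # ---
```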
What gets uploaded: Only the files the app needs — hf_app/, precompute/, precomputed_results/, src/, requirements.txt, README.md. Model checkpoints, notebooks, and figures are excluded.
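A filter implementing that allow-list might look like the sketch below (the pattern list mirrors the files named above; the helper name is hypothetical). A real upload could then pass the same patterns to `HfApi.upload_folder` from `huggingface_hub` via its `allow_patterns` parameter with `repo_type="space"`:

```python
from fnmatch import fnmatch

# Hypothetical allow-list mirroring the upload set described above
ALLOW_PATTERNS = [
    "hf_app/*", "precompute/*", "precomputed_results/*",
    "src/*", "requirements.txt", "README.md",
]

def should_upload(path: str) -> bool:
    # fnmatch's "*" also matches "/" here, so "precomputed_results/*"
    # covers nested files like precomputed_results/p_023/plot.png
    return any(fnmatch(path, pat) for pat in ALLOW_PATTERNS)

print(should_upload("src/module_nn.py"))        # True
print(should_upload("notebooks/demo.ipynb"))    # False
```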
On-demand training: Users can generate results for any new odd modulus $p$ directly from the app.
Tip: For GPU-accelerated on-demand training, select a GPU runtime in your Space settings.
The precompute/ directory trains 5 model configurations per modulus and generates all plots + interactive JSON data. See precompute/README.md for full documentation.
```bash
# Full pipeline for a single modulus (train → plots → analytical → verify)
bash precompute/run_pipeline.sh 23

# With custom d_mlp
bash precompute/run_pipeline.sh 23 --d_mlp 128

# Delete checkpoints after generating plots (saves disk space)
CLEANUP=1 bash precompute/run_pipeline.sh 23

# Batch: all odd p in [3, 99]
bash precompute/run_all.sh

# Or up to p=199
MAX_P=199 bash precompute/run_all.sh
```

```bash
# Step 1: Train all 5 configurations
python precompute/train_all.py --p 23 --output ./trained_models --resume

# Step 2: Generate model-based plots (21 PNGs + 7 JSONs)
python precompute/generate_plots.py --p 23 --input ./trained_models --output ./precomputed_results

# Step 3: Generate analytical simulation plots (2 PNGs, no model needed)
python precompute/generate_analytical.py --p 23 --output ./precomputed_results
```

Each modulus produces ~33 files in `precomputed_results/p_XXX/`:
| Category | Files | Description |
|---|---|---|
| Overview (Tab 1) | 2 PNGs + 1 JSON | Loss, IPR, phase scatter |
| Fourier Weights (Tab 2) | 3 PNGs + 1 JSON | DFT heatmaps, cosine fits, neuron spectra |
| Phase Analysis (Tab 3) | 3 PNGs | Phase distribution, alignment, magnitudes |
| Output Logits (Tab 4) | 1 PNG + 1 JSON | Logit heatmap, interactive explorer |
| Lottery Mechanism (Tab 5) | 3 PNGs | Magnitude race, phase convergence, contour |
| Grokking (Tab 6) | 5 PNGs + 3 JSONs | Loss/acc curves, memorization, weight evolution |
| Gradient Dynamics (Tab 7) | 4 PNGs | Phase alignment + DFT for Quad and ReLU |
| Decoupled Simulation (Tab 8) | 2 PNGs | Analytical ODE integration |
| Metadata | 2 JSONs | Config + training log |
Note: Grokking results (Tab 6) require $p \geq 19$. Smaller values of $p$ have too few data points for a meaningful train/test split.
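To see why, note that the dataset for modulus $p$ has $p^2$ input pairs, and the grokking configuration trains on 75% of them. A quick sketch of the resulting split sizes (the exact split logic in the repo may differ):

```python
# Dataset for modulus p: all p*p pairs (a, b); grokking trains on 75%.
def split_sizes(p: int, train_frac: float = 0.75):
    total = p * p
    train = int(total * train_frac)
    return total, train, total - train

print(split_sizes(19))  # (361, 270, 91)
print(split_sizes(7))   # (49, 36, 13) -- too few points to measure generalization
```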
| Config | Activation | Optimizer | LR | Weight Decay | Data | Epochs | Used In |
|---|---|---|---|---|---|---|---|
| `standard` | ReLU | AdamW | 5e-5 | 0 | 100% | 5,000 | Tabs 1–4 |
| `grokking` | ReLU | AdamW | 1e-4 | 2.0 | 75% | 50,000 | Tabs 1, 6 |
| `quad_random` | Quad | AdamW | 5e-5 | 0 | 100% | 5,000 | Tab 5 |
| `quad_single_freq` | Quad | SGD | 0.1 | 0 | 100% | 10,000 | Tab 7 |
| `relu_single_freq` | ReLU | SGD | 0.01 | 0 | 100% | 10,000 | Tab 7 |
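The "Quad" activation in the table is not defined in this README; in related work on modular addition it usually denotes the quadratic activation $\sigma(x) = x^2$, which is the assumption in the sketch below (check `src/model_base.py` for the repo's actual definition):

```python
import numpy as np

# Assumed definition of the "Quad" activation: sigma(x) = x**2
def quad(x):
    return x ** 2

def relu(x):
    return np.maximum(x, 0.0)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(quad(x))  # [4. 1. 0. 1. 4.]
print(relu(x))  # [0. 0. 0. 1. 2.]
```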
For custom experiments outside the pre-computation pipeline:

```bash
cd src

# Train with default config (p=97, d_mlp=1024, ReLU, 5000 epochs)
python module_nn.py

# Train with specific parameters
python module_nn.py --p 23 --d_mlp 512 --num_epochs 5000 --lr 5e-5

# Dry run: see config without training
python module_nn.py --dry_run --p 23 --d_mlp 512
```

Interactive analysis notebooks in `notebooks/`:
| Notebook | Description |
|---|---|
| `empirical_insight_standard.ipynb` | Fourier weight analysis, phase distributions, output logits |
| `empirical_insight_grokk.ipynb` | Grokking stages, weight dynamics, IPR evolution |
| `lottery_mechanism.ipynb` | Neuron specialization, frequency magnitude/phase tracking |
| `interprete_gd_dynamics.ipynb` | Phase alignment under single-frequency initialization |
| `decouple_dynamics_simulation.ipynb` | Analytical gradient flow simulation |
- Python 3.8+
- PyTorch 2.0+
- CUDA-capable GPU (recommended for $p > 50$; CPU works for small $p$)
```bash
git clone https://github.com/Y-Agent/modular-addition-feature-learning.git
cd modular-addition-feature-learning
pip install -r requirements.txt
```

```
modular-addition-feature-learning/
├── src/                       # Core source code
│   ├── module_nn.py           # Training script with CLI
│   ├── nnTrainer.py           # Training loop and optimization
│   ├── model_base.py          # Neural network architecture (EmbedMLP)
│   ├── mechanism_base.py      # Fourier analysis and decomposition
│   ├── utils.py               # Configuration and helpers
│   └── configs.yaml           # Default hyperparameters
├── precompute/                # Batch training and plot generation
│   ├── run_pipeline.sh        # Full pipeline for one modulus
│   ├── run_all.sh             # Batch pipeline for all odd p
│   ├── train_all.py           # Train 5 configurations
│   ├── generate_plots.py      # Generate model-based plots + JSONs
│   └── generate_analytical.py # Analytical ODE simulation plots
│   └── prime_config.py        # Configurations and sizing formula
├── hf_app/                    # Gradio web application
│   └── app.py                 # Interactive visualization app
├── precomputed_results/       # Pre-computed plots and data
│   ├── p_015/                 # Results for p=15
│   ├── p_023/                 # Results for p=23
│   ├── p_029/                 # Results for p=29
│   └── p_031/                 # Results for p=31
├── notebooks/                 # Analysis and visualization notebooks
├── requirements.txt           # Python dependencies
└── README.md
```
```bibtex
@article{he2025modular,
  title={On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking},
  author={He, Jianliang and Wang, Leda and Chen, Siyu and Yang, Zhuoran},
  journal={arXiv preprint arXiv:2602.16849},
  year={2025}
}
```