Awni Altabaa · Siyu Chen · John Lafferty · Zhuoran Yang
Enhancing Transformer architectures with recursive latent space reasoning mechanisms for robust algorithmic generalization
Systematic, compositional generalization beyond the training distribution remains a core challenge in machine learning—and a critical bottleneck for the emergent reasoning abilities of modern language models.
This work investigates out-of-distribution (OOD) generalization in Transformer networks, using a GSM8K-style task of modular arithmetic on computational graphs as a testbed.
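To give a flavor of the testbed, here is a minimal illustration of evaluating modular arithmetic on a computational graph. The exact task format, operations, and modulus used in the paper are assumptions here; this sketch only shows the general shape of such problems.

```python
# Hypothetical example of the task family: evaluate a small computational
# graph where each node applies a modular-arithmetic operation to the
# values of its input nodes, all modulo a prime p.
p = 7
inputs = {"a": 3, "b": 5, "c": 2}

# Each node: (operation, input node names); nodes listed in topological order.
graph = {
    "d": ("add", ["a", "b"]),   # d = (a + b) mod p
    "e": ("mul", ["d", "c"]),   # e = (d * c) mod p
}

def evaluate(graph, values, p):
    """Evaluate all graph nodes in topological order, modulo p."""
    vals = dict(values)
    for node, (op, args) in graph.items():
        x, y = (vals[a] for a in args)
        vals[node] = (x + y) % p if op == "add" else (x * y) % p
    return vals

vals = evaluate(graph, inputs, p)
# d = (3 + 5) mod 7 = 1, then e = (1 * 2) mod 7 = 2
```

OOD generalization is then probed by, for example, training on small graphs and testing on larger or deeper ones.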
We introduce and explore four architectural mechanisms aimed at enhancing OOD generalization:
- 🔄 Input-adaptive recurrence - A recurrent architecture that adapts its computation depth to the input
- 📚 Algorithmic supervision - Structured learning objectives that encode algorithmic knowledge
- ⚓ Anchored latent representations - Discrete bottlenecks for stable feature learning
- 🔧 Explicit error-correction mechanism - Built-in self-correction capabilities
Collectively, these mechanisms yield an architectural approach for native and scalable latent space reasoning in Transformer networks with robust algorithmic generalization capabilities. We complement these empirical results with a detailed mechanistic interpretability analysis that reveals how these mechanisms give rise to robust OOD generalization abilities.
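The first and third mechanisms can be sketched together: a shared block is applied recurrently, with the latent state snapped to a discrete codebook of anchors at each step, and recursion halting adaptively when the discrete state stops changing. All dimensions, the update rule, and the halting criterion below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; stand-ins for a Transformer block and a learned codebook.
D, CODEBOOK = 16, 32
W = rng.normal(scale=0.1, size=(D, D))     # shared-weight recurrent update
codebook = rng.normal(size=(CODEBOOK, D))  # anchor vectors (discrete bottleneck)

def step(h):
    """One recurrent latent update, then quantize to the nearest anchor."""
    h = np.tanh(h @ W + h)                              # illustrative update
    idx = np.argmin(((codebook - h) ** 2).sum(axis=1))  # discrete bottleneck
    return codebook[idx], idx

def reason(x, max_steps=10):
    """Apply the same block repeatedly; halt when the discrete state is stable."""
    h, prev = x, None
    for t in range(1, max_steps + 1):
        h, idx = step(h)
        if idx == prev:        # reached a fixed point in anchor space
            return h, t
        prev = idx
    return h, max_steps

h, steps = reason(rng.normal(size=D))
```

Quantizing to anchors keeps intermediate states on a stable, finite vocabulary of latent "symbols", which is one way the discrete bottleneck can support error correction: a slightly perturbed latent still snaps back to the nearest valid anchor.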
- `experiments/` - Main experimental code for training and evaluating models with the proposed architectural mechanisms
- `experiments/baselines/` - Baseline implementations for comparison, including chain-of-thought models and standard Transformers
- `experiments/evaluation/` - Code for evaluating the algorithmic generalization capabilities of different methods
- `experiments/model_interp/` - Mechanistic interpretability analysis tools and visualizations
- `Simtransformer/` - Helper framework implementing Transformer modules and related utilities
See `experiments/readme.md` for instructions on how to reproduce the experiments in the paper.