Nested Learning (HOPE) - From Scratch Implementation

A from-scratch implementation of Google Research's Nested Learning paper and the HOPE (Hierarchical Optimizing Processing Ensemble) architecture.

🎯 What is Nested Learning?

Nested Learning treats a neural network as a system of nested optimization problems, each updated at its own frequency. The analogy is the human brain, where oscillations at different frequencies are associated with different kinds of memory.

Key Insight

Everything is Associative Memory!

Attention = Associative Memory
Momentum = Associative Memory
Weights = Associative Memory

🏗️ Architecture Overview

HOPE Block

The HOPE block combines three components that cover the full frequency spectrum:

Frequency:  ∞ ←─────────────────────────→ 0
            │                            │
        Attention    CMS modules       MLP
        (recompute)  (various τ)    (static)

Attention - Infinite frequency (recomputed from the current context on every forward pass)
- Fast, immediate context
- O(n²) standard or O(n) linear attention
CMS (Continuum Memory System) - Spectrum of frequencies
- Multiple timescales of memory (τ from 0.01 to 0.99)
- Fast modules forget quickly; slow modules retain information much longer
- This is HOPE's special sauce!
MLP - Zero frequency (fixed after training)
- Static knowledge learned during pre-training
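
The list above mentions standard O(n²) and linear O(n) attention. The sketch below is a dependency-free illustration of why the linear form is possible, not the code in attention.py. It assumes a causal, softmax-free attention with no feature map and no normalization, which is a simplification; the function names are hypothetical. Under that assumption, the pairwise sum can be replaced by a running state that accumulates outer products of keys and values, which is also one way to read attention as an associative memory.

```python
def quadratic_attention(queries, keys, values):
    """Causal, softmax-free attention computed pair by pair: O(n^2) in sequence length."""
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for i in range(t + 1):
            # Similarity between the current query and an earlier key.
            score = sum(a * b for a, b in zip(q, keys[i]))
            for j, v in enumerate(values[i]):
                out[j] += score * v
        outputs.append(out)
    return outputs


def linear_attention(queries, keys, values):
    """Same result from a running state S = sum of outer(k, v): O(n) in sequence length."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # Write: add the outer product of this step's key and value to the state.
        for a in range(d_k):
            for b in range(d_v):
                state[a][b] += k[a] * v[b]
        # Read: project the query through the accumulated state.
        outputs.append([sum(q[a] * state[a][b] for a in range(d_k)) for b in range(d_v)])
    return outputs


queries = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
keys = [[1.0, 1.0], [0.0, 1.0], [2.0, -1.0]]
values = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0], [5.0, 0.5, 0.0]]

slow = quadratic_attention(queries, keys, values)
fast = linear_attention(queries, keys, values)
print(slow)
print(fast)
```

Both functions return the same outputs up to floating-point rounding. The second one touches each token once and keeps a state whose size does not depend on the sequence length.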

📦 Key Components

1. Associative Memory

Core building block that stores key-value pairs and retrieves by similarity.
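
As a rough illustration of "stores key-value pairs and retrieves by similarity", here is a minimal sketch in plain Python. It is not the code in associative.py: the class name, the `write`/`read` methods, and the `sharpness` parameter are assumptions made for this example, and retrieval is assumed to be a softmax over dot-product similarities.

```python
import math


class AssociativeMemory:
    """Toy key-value memory with soft, similarity-based retrieval (illustrative only)."""

    def __init__(self):
        self.keys = []    # stored key vectors (lists of floats)
        self.values = []  # stored value vectors, aligned with self.keys

    def write(self, key, value):
        self.keys.append(list(key))
        self.values.append(list(value))

    def read(self, query, sharpness=1.0):
        if not self.keys:
            raise ValueError("memory is empty")
        # Similarity = scaled dot product between the query and each stored key.
        scores = [sharpness * sum(q * k for q, k in zip(query, key)) for key in self.keys]
        # Softmax, shifted by the max score for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Retrieved value = similarity-weighted average of the stored values.
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values)) for i in range(dim)]


memory = AssociativeMemory()
memory.write([1.0, 0.0], [10.0, 0.0])
memory.write([0.0, 1.0], [0.0, 20.0])

# A query close to the first key returns (almost exactly) the first value.
retrieved = memory.read([1.0, 0.0], sharpness=10.0)
print(retrieved)
```

A larger `sharpness` makes retrieval closer to an exact lookup; a smaller one blends the stored values.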

2. Continuum Memory System (CMS)

Multiple memory modules, each with its own decay parameter τ:

Fast memories (τ ≈ 0.01): React quickly, forget quickly
Slow memories (τ ≈ 0.99): React slowly, retain information for a long time
Medium memories: Various timescales in between
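
To make the timescales concrete, the sketch below assumes that τ is the fraction of a module's state kept at each step, so the update is state = τ · state + (1 − τ) · input. That reading matches the fast/slow descriptions above but is an assumption; cms.py may define τ differently. The helper names are hypothetical.

```python
import math


def ema_step(state, x, tau):
    """One update of a scalar memory; tau is the assumed retention factor."""
    return tau * state + (1.0 - tau) * x


def half_life(tau):
    """Number of steps until a stored value decays to half, given retention tau."""
    return math.log(0.5) / math.log(tau)


taus = [0.01, 0.5, 0.9, 0.99]  # fast -> slow
states = {tau: 0.0 for tau in taus}

# Present a single impulse, then 50 steps of silence.
inputs = [1.0] + [0.0] * 50
for x in inputs:
    for tau in taus:
        states[tau] = ema_step(states[tau], x, tau)

for tau in taus:
    print(f"tau={tau:<4}  half-life={half_life(tau):6.2f} steps  trace after 50 steps={states[tau]:.3e}")
```

Under this assumption, τ = 0.99 corresponds to a half-life of roughly 69 steps, while τ = 0.01 loses almost everything after a single step. The slow module also responds weakly to the impulse itself (it absorbs only 1% of a new input per step), which is the "react slowly" half of the trade-off.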

3. HOPE Model

Complete transformer-like architecture with:

Token and position embeddings
Stacked HOPE blocks
Autoregressive generation support

4. Deep Optimizers

Optimizers that treat momentum as associative memory:

Standard momentum = weighted average of past gradients
Deep momentum = neural network processes gradient history
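
The statement that standard momentum is a weighted average of past gradients can be checked numerically. The sketch below assumes the exponential-moving-average form m = β · m + (1 − β) · g; some libraries drop the (1 − β) factor, which rescales the weights but keeps their geometric decay. The function names are hypothetical, and the learned "deep momentum" variant is not shown.

```python
def momentum_recursive(grads, beta):
    """Run the usual momentum recursion and return the final buffer."""
    m = 0.0
    for g in grads:
        m = beta * m + (1.0 - beta) * g
    return m


def momentum_unrolled(grads, beta):
    """The same quantity written as an explicit weighted sum over past gradients."""
    t = len(grads)
    # The gradient from k steps ago gets weight (1 - beta) * beta**k.
    return sum((1.0 - beta) * beta ** (t - 1 - i) * g for i, g in enumerate(grads))


grads = [0.5, -1.0, 2.0, 0.25, -0.75]
beta = 0.9
recursive = momentum_recursive(grads, beta)
unrolled = momentum_unrolled(grads, beta)
print(recursive, unrolled)
```

The two values agree up to rounding: the momentum buffer is a fixed, hand-designed compression of the gradient history. The deep variant described above replaces that fixed rule with a learned one.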

🚀 Installation

Prerequisites

Python 3.8+
uv package manager

Setup

# Clone the repository
git clone <your-repo-url>
cd NL

# Install dependencies
uv sync

💻 Usage

Training

uv run python train.py

Evaluation

uv run python evaluate.py

Testing Individual Components

# Test HOPE block
uv run python src/nested_learning/models/blocks.py

# Test attention mechanisms
uv run python src/nested_learning/models/attention.py

# Test CMS
uv run python src/nested_learning/memory/cms.py

# Test full HOPE model
uv run python src/nested_learning/models/hope.py

Using the Model

import torch

from nested_learning.models import HOPE

# Create model
model = HOPE(
    vocab_size=32000,
    dim=512,
    num_layers=12,
    num_heads=8,
    num_memory_modules=5,
    use_cms=True,
)

# Forward pass (the same random tokens are passed as both arguments here)
tokens = torch.randint(0, 32000, (2, 128))  # batch=2, seq=128
logits, loss = model(tokens, tokens)

# Generation
prompt = torch.randint(0, 32000, (1, 10))  # batch=1, prompt length=10
generated = model.generate(prompt, max_new_tokens=50, temperature=0.8)
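
The `temperature` argument in the call above usually controls how sharply the next-token distribution is peaked. The sketch below shows the common temperature-scaled softmax sampling scheme in plain Python. It assumes `generate` works this way, which has not been checked against hope.py; the helper names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.8)  # below 1: more peaked
flat = softmax_with_temperature(logits, 2.0)   # above 1: closer to uniform

rng = random.Random(0)  # seeded so the draw is repeatable
token = sample_token(logits, 0.8, rng)
print(sharp, flat, token)
```

Temperatures below 1 concentrate probability on the highest-scoring token, and temperatures above 1 spread it out.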

📁 Project Structure

NL/
├── src/nested_learning/    # Core implementation
│   ├── memory/             # Associative Memory & CMS
│   │   ├── associative.py  # Basic associative memory
│   │   └── cms.py          # Continuum Memory System
│   ├── models/             # HOPE architecture
│   │   ├── hope.py         # Full HOPE model
│   │   ├── blocks.py       # HOPE block implementation
│   │   ├── attention.py    # Multi-head & linear attention
│   │   └── embeddings.py   # Token & position embeddings
│   ├── optimizers/         # Deep Momentum optimizer
│   │   └── deep_momentum.py
│   ├── data/               # Data loading utilities
│   │   ├── dataset.py      # Dataset classes
│   │   └── tokenizer.py    # Tokenization
│   └── utils/              # Helper functions
│       ├── config.py       # Configuration management
│       └── helpers.py      # Utility functions
├── configs/                # Configuration files
│   └── default.yaml
├── tests/                  # Unit tests
├── train.py                # Training script
├── evaluate.py             # Evaluation script
└── README.md               # This file

🔬 Key Features

✅ From-scratch implementation - No external transformer libraries
✅ Modular design - Each component can be used independently
✅ Built-in tests - Each module includes test code that runs when the file is executed directly
✅ Memory system - CMS with multiple timescales
✅ Flexible attention - Standard O(n²) or linear O(n) attention
✅ Generation support - Autoregressive text generation
✅ Deep optimizers - Learnable momentum as associative memory

📚 Theoretical Background

Nested Learning Paper

This implementation is based on the concept that neural networks can be viewed as nested optimization problems with different update frequencies, similar to how the brain processes information at different timescales.

Brain Analogy

Gamma waves (fast): Sensory input - like attention
Beta waves (medium): Active thinking - like CMS modules
Theta waves (slow): Memory consolidation - like slower CMS modules
Delta waves (slowest): Deep, permanent memory - like MLP weights

🧪 Testing

Run tests for individual components:

# Test memory systems
uv run python tests/test_memory.py

# Test individual modules (each has built-in tests)
uv run python src/nested_learning/models/blocks.py
uv run python src/nested_learning/memory/cms.py

📝 Configuration

Edit configs/default.yaml to customize:

Model architecture (dimensions, layers, heads)
Training hyperparameters
Memory system settings
Optimizer settings

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This is an educational implementation built from scratch for learning purposes.

🙏 Acknowledgments

Google Research's Nested Learning paper
The transformer architecture ("Attention Is All You Need")
The PyTorch community

📧 Contact

For questions or issues, please open an issue on GitHub.

Note: This is a proof-of-concept implementation for educational purposes. For production use, consider using established libraries like PyTorch's transformer modules or Hugging Face Transformers.
