
AstralKS/LSTM_FROM_SCRATCH


I Built an LSTM From Scratch

I was curious how LSTMs actually work. So I built one from scratch using only NumPy, then compared it against PyTorch.

[Figure: Model Comparison]


Why My LSTM Behaved Differently From PyTorch

When I finally trained my custom LSTM and compared it to PyTorch models, something unexpected happened.

My model was much smoother.

At first I thought this meant it was worse. But the plot told a different story.

  • My LSTM captured the long-term trend extremely well
  • PyTorch models followed short-term oscillations more aggressively
  • My model was stable, calm, and conservative
  • PyTorch models were reactive and noisy

This forced me to confront a deeper idea:

Models don't just learn from data. They learn from the biases encoded in their implementation.

My small learning rate, summed gradients, momentum, and scalar input all pushed the model toward long-term structure over short-term noise.
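To make the momentum point concrete, here is a minimal sketch of SGD with classical momentum on a single scalar weight, assuming the per-time-step gradients of a sequence are summed before each update. The names `momentum_step`, `lr` and `beta` are illustrative, not taken from this repo. The velocity is a decaying sum of past gradients, so a gradient that flips sign every step mostly cancels itself out, while a steady one keeps pushing in the same direction.

```python
def momentum_step(w, v, step_grads, lr=0.01, beta=0.9):
    """One SGD-with-momentum update for a scalar weight (hypothetical helper).

    step_grads: gradients from each time step of one sequence; they are
    summed, as in backpropagation through time.
    """
    g = sum(step_grads)   # summed gradient over the sequence
    v = beta * v + g      # velocity: decayed memory of past gradients
    w = w - lr * v        # move along the accumulated direction
    return w, v


# A gradient that alternates sign every step (short-term "noise")...
w_noisy, v_noisy = 0.0, 0.0
for t in range(50):
    w_noisy, v_noisy = momentum_step(w_noisy, v_noisy, [(-1) ** t])

# ...versus a steady gradient of the same magnitude (long-term "trend").
w_trend, v_trend = 0.0, 0.0
for t in range(50):
    w_trend, v_trend = momentum_step(w_trend, v_trend, [1.0])

print(round(w_noisy, 4), round(w_trend, 4))
```

After 50 steps the weight driven by the steady signal has moved dozens of times further than the one driven by the alternating signal: momentum acts as a low-pass filter on the gradient.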


What Writing Everything By Hand Taught Me

Writing an LSTM from scratch taught me things no framework tutorial ever did:

  • Why gradients explode before they explode
  • Why momentum is stored motion, not magic
  • Why stability often means high bias, not correctness
  • Why most deep learning bugs are shape and flow bugs, not math bugs
  • Why frameworks feel "easy" only because they hide thousands of careful design decisions
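A classic shape bug that does not crash: mixing a column vector of shape (n, 1) with a flat vector of shape (n,). NumPy broadcasting silently produces an (n, n) matrix and everything downstream keeps running. The snippet below is illustrative and not taken from this repo; `check_shape` is a hypothetical helper showing one cheap guard.

```python
import numpy as np

n_hidden, n_in = 3, 1
W = np.full((n_hidden, n_in), 0.5)
b = np.zeros(n_hidden)          # bias stored as a flat (n_hidden,) vector

x_col = np.array([[1.0]])       # input accidentally kept as a (n_in, 1) column
buggy = W @ x_col + b           # (3, 1) + (3,) broadcasts to (3, 3): no error raised

x_flat = np.array([1.0])        # input as a flat (n_in,) vector
fixed = W @ x_flat + b          # (3,) + (3,) -> (3,)

print(buggy.shape, fixed.shape)  # (3, 3) (3,)


def check_shape(arr, expected):
    """Fail loudly instead of letting broadcasting hide a mistake."""
    assert arr.shape == expected, f"expected {expected}, got {arr.shape}"
    return arr


check_shape(fixed, (n_hidden,))
```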

What I Found Most Interesting

The most interesting part wasn't that the model worked.

It was realizing that:

  • Learning rate is not just speed
  • Smoothness is not just underfitting
  • Correctness and stability are different goals
  • Abstraction hides tradeoffs, not complexity
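One way to see that learning rate is not just speed is plain gradient descent on the toy loss L(w) = w², chosen for illustration rather than taken from this project. The gradient is 2w, so each step multiplies w by (1 - 2·lr). A small rate creeps smoothly toward the minimum, a rate between 0.5 and 1 overshoots and oscillates around it, and a rate above 1 diverges.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimise L(w) = w**2 with plain gradient descent; return the trajectory."""
    w = w0
    path = [w]
    for _ in range(steps):
        grad = 2.0 * w      # dL/dw for L(w) = w**2
        w = w - lr * grad   # equivalent to w *= (1 - 2 * lr)
        path.append(w)
    return path


smooth = gradient_descent(lr=0.05)  # factor 0.9: slow, monotone decay
ringing = gradient_descent(lr=0.9)  # factor -0.8: overshoots, sign flips every step
blowup = gradient_descent(lr=1.1)   # factor -1.2: oscillates and grows

for name, path in [("smooth", smooth), ("ringing", ringing), ("blowup", blowup)]:
    print(name, [round(w, 3) for w in path[:5]])
```

Same loss, same gradient, three qualitatively different behaviours: the step size decides whether the optimizer tracks the valley calmly or bounces across it.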

The Results

Model           MSE (lower is better)
------------    ---------------------
PyTorch CNN     0.000900
PyTorch RNN     0.003673
PyTorch LSTM    0.004645
Custom LSTM     0.006117

The Architecture

┌────────────────────────────────────────────────────────────────┐
│                           LSTM Cell                            │
│                                                                │
│  [xt, ht-1] ──┬──► [Forget Gate] ──► ft                        │
│               ├──► [Input Gate]  ──► it                        │
│               ├──► [C-tilde]     ──► c̃t  ◄── The secret sauce  │
│               └──► [Output Gate] ──► ot                        │
│                                                                │
│  Cell State:   ct = ft ⊙ ct-1  +  it ⊙ c̃t                      │
│  Hidden State: ht = ot ⊙ tanh(ct)                              │
└────────────────────────────────────────────────────────────────┘
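A minimal NumPy sketch of one forward step of the cell in the diagram, assuming the common layout where each gate has its own weight matrix acting on the concatenation [ht-1, xt] plus a bias. The function and parameter names (`lstm_cell_forward`, `init_params`, `W_f`, `b_f`, ...) and the initialisation scheme are illustrative and not necessarily what this repo's code uses.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_forward(x_t, h_prev, c_prev, params):
    """One LSTM time step. Shapes: x_t (n_in,), h_prev and c_prev (n_hidden,)."""
    z = np.concatenate([h_prev, x_t])                     # (n_hidden + n_in,)
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate

    c_t = f_t * c_prev + i_t * c_tilde                    # ct = ft ⊙ ct-1 + it ⊙ c̃t
    h_t = o_t * np.tanh(c_t)                              # ht = ot ⊙ tanh(ct)
    return h_t, c_t


def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (an assumed initialisation scheme)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "c", "o"):
        params[f"W_{gate}"] = rng.normal(0.0, 0.1, size=(n_hidden, n_hidden + n_in))
        params[f"b_{gate}"] = np.zeros(n_hidden)
    return params


# Run a short scalar sequence (n_in = 1) through the cell.
params = init_params(n_in=1, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
for value in [0.1, 0.2, 0.3]:
    h, c = lstm_cell_forward(np.array([value]), h, c, params)
print(h.shape, c.shape)
```

The elementwise `*` is the ⊙ in the diagram; only the four matrix products mix information across units.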

C-Tilde: The Unsung Hero

Everyone talks about the forget gate. But the real elegance is in c̃t — the candidate cell state.

It's a proposal. C-tilde says: "Here's what I think we should remember." But it doesn't write directly to memory — it passes through the input gate first.

The network learns two things separately:

  1. What information to extract (c-tilde)
  2. How much of it to keep (input gate)

Tanh keeps it bounded in (-1, 1), centered at zero, with smooth gradients. This lets c-tilde say "add this" or "subtract this", not just accumulate.
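A tiny scalar walk-through of ct = ft ⊙ ct-1 + it ⊙ c̃t with hand-picked gate values (illustrative numbers, not learned ones). The input gate decides how much of the proposal gets written, and because tanh can be negative, the same write can lower the memory as easily as raise it. `cell_update` is a hypothetical helper.

```python
import math


def cell_update(c_prev, f_t, i_t, candidate_preact):
    """Scalar version of c_t = f_t * c_prev + i_t * tanh(candidate_preact)."""
    c_tilde = math.tanh(candidate_preact)  # the proposal, always in (-1, 1)
    return f_t * c_prev + i_t * c_tilde


c_prev = 0.5
add = cell_update(c_prev, f_t=1.0, i_t=1.0, candidate_preact=2.0)       # proposal ≈ +0.96
subtract = cell_update(c_prev, f_t=1.0, i_t=1.0, candidate_preact=-2.0)  # proposal ≈ -0.96
ignored = cell_update(c_prev, f_t=1.0, i_t=0.0, candidate_preact=-2.0)   # input gate closed

print(round(add, 3), round(subtract, 3), round(ignored, 3))  # 1.464 -0.464 0.5
```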


The Bugs That Humbled Me

Bug #1: Initializing gradients with random values instead of zeros. The model was learning from noise.

Bug #2: Computing gradients perfectly... then never applying them. The optimizer step was missing entirely.

Both bugs produced models that "worked" — they ran, produced outputs, even showed decreasing loss. But they weren't learning.
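Both bugs live in the bookkeeping around the backward pass. Below is a hedged sketch of that bookkeeping for a toy linear model y_hat = W·x, not the repo's actual training loop (`train_step` and the toy data are hypothetical): gradient accumulators start at exactly zero, per-time-step gradients are summed into them, and the update is actually applied. The final assertion is a cheap guard against Bug #2, since parameters that never move mean the optimizer step is missing.

```python
import numpy as np


def train_step(params, xs, ys, lr=0.1):
    """One gradient-descent step on y_hat = W @ x with squared error.

    params: dict of NumPy arrays (here just "W"); xs, ys: sequences of inputs/targets.
    """
    # Guard against Bug #1: accumulators start at zero, never random values.
    grads = {name: np.zeros_like(value) for name, value in params.items()}

    loss = 0.0
    for x, y in zip(xs, ys):
        err = params["W"] @ x - y         # prediction error at this time step
        loss += 0.5 * float(err @ err)
        grads["W"] += np.outer(err, x)    # d(0.5 * ||err||^2)/dW, summed over steps

    # Guard against Bug #2: actually apply the update.
    for name in params:
        params[name] -= lr * grads[name]
    return loss


params = {"W": np.zeros((1, 1))}
xs = [np.array([1.0]), np.array([2.0])]
ys = [np.array([2.0]), np.array([4.0])]  # true relation: y = 2x

before = params["W"].copy()
losses = [train_step(params, xs, ys) for _ in range(20)]
assert not np.allclose(before, params["W"]), "parameters never changed: is the update applied?"
print(losses[0], losses[-1], params["W"])
```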


The Gradient Highway

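Vanilla RNNs fail on long sequences because backpropagation through time multiplies the gradient by the same recurrent weight matrix (and the tanh derivative) at every step. If every eigenvalue has magnitude below 1, gradients shrink exponentially: vanishing. If the largest magnitude is above 1, they can grow exponentially: exploding.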
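A quick way to see the eigenvalue argument is to push a gradient vector backward through many steps of the same recurrent matrix. This sketch treats the RNN as linear, ignoring the tanh derivative (which is at most 1 and so can only shrink things further), and uses made-up diagonal matrices so the eigenvalues are obvious. `backprop_norm` is a hypothetical helper.

```python
import numpy as np


def backprop_norm(W, steps):
    """Norm of a unit gradient after `steps` backward passes through W (tanh' ignored)."""
    g = np.ones(W.shape[0]) / np.sqrt(W.shape[0])  # start from a unit-norm gradient
    for _ in range(steps):
        g = W.T @ g                                # dL/dh_{t-1} = W^T dL/dh_t (linear RNN)
    return np.linalg.norm(g)


shrinking = np.diag([0.9, 0.5])  # all eigenvalues below 1
growing = np.diag([1.1, 0.5])    # largest eigenvalue above 1

for steps in (10, 50, 100):
    print(steps, backprop_norm(shrinking, steps), backprop_norm(growing, steps))
```

After 100 steps the first gradient has shrunk by roughly a factor of 0.9^100 and the second has grown by roughly 1.1^100, from matrices that differ in a single entry.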

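LSTMs fix this with the cell state highway. Along the cell state, the gradient from ct back to ct-1 is just ft applied elementwise (ignoring the indirect paths through ht-1), so that route involves no repeated weight-matrix multiplication, and ft is something the network learns to control. When ft ≈ 1, gradients pass through nearly unchanged. The network learns to keep the highway open.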
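The same experiment along the cell-state path: ignoring the indirect routes through ht-1, the gradient that reaches a cell 100 steps back is scaled by the product of the forget gates in between. The gate values below are hand-picked for illustration, and `cell_path_gradient` is a hypothetical helper.

```python
import math


def cell_path_gradient(forget_gates):
    """Gradient factor along the direct cell-state path: the product of the f_t values."""
    return math.prod(forget_gates)


open_highway = cell_path_gradient([0.99] * 100)  # network keeps f_t close to 1
half_closed = cell_path_gradient([0.5] * 100)    # f_t stuck at 0.5

print(open_highway, half_closed)
```

With ft = 0.99, about a third of the signal survives 100 steps; with ft = 0.5 essentially nothing does. Because ft is computed per unit and per step, the network can keep it near 1 only for the units that need long memory.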


Run It

pip install numpy torch plotly pandas
python testlstm.py

The best way to understand a neural network is to build one that doesn't work, then figure out why.
