I Built an LSTM From Scratch

I was curious how LSTMs actually work. So I built one from scratch using only NumPy, then compared it against PyTorch.

Why My LSTM Behaved Differently From PyTorch

When I finally trained my custom LSTM and compared it to PyTorch models, something unexpected happened.

My model was much smoother.

At first I thought this meant it was worse. But the plot told a different story.

My LSTM captured the long-term trend extremely well
PyTorch models followed short-term oscillations more aggressively
My model was stable, calm, and conservative
PyTorch models were reactive and noisy

This forced me to confront a deeper idea:

Models don't just learn from data. They learn from the biases encoded in their implementation.

My small learning rate, summed gradients, momentum, and scalar input all pushed the model toward long-term structure over short-term noise.

What Writing Everything By Hand Taught Me

Writing an LSTM from scratch taught me things no framework tutorial ever did:

Why gradients start exploding long before the loss shows it
Why momentum is stored motion, not magic
Why stability often means high bias, not correctness
Why most deep learning bugs are shape and flow bugs, not math bugs
Why frameworks feel "easy" only because they hide thousands of careful design decisions
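
The "stored motion" view of momentum fits in a few lines. This is a minimal sketch of classical SGD with momentum, not the code from LSTM.py; the helper name `momentum_step` and the values of `lr` and `beta` are assumptions made for the example.

```python
import numpy as np

def momentum_step(param, grad, velocity, lr=0.01, beta=0.9):
    """One SGD-with-momentum update (hypothetical helper, not from LSTM.py)."""
    # The velocity is a decaying sum of past gradients: the "stored motion".
    velocity = beta * velocity + grad
    # The parameter moves along the accumulated velocity, not the raw gradient.
    param = param - lr * velocity
    return param, velocity

# Feed a constant gradient and watch the velocity build up toward 1 / (1 - beta).
w = np.array([1.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = momentum_step(w, np.array([1.0]), v)
print(v)  # approaches 10.0 for beta = 0.9
```

With a steady gradient the effective step grows to about 1 / (1 - beta) times the raw one, while gradients that flip sign from step to step largely cancel inside the velocity. That cancellation is one plausible reason momentum favors slow trends over fast oscillations.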

What I Found Most Interesting

The most interesting part wasn't that the model worked.

It was realizing that:

Learning rate is not just speed
Smoothness is not just underfitting
Correctness and stability are different goals
Abstraction hides tradeoffs, not complexity

The Results

Model           MSE (lower is better)
PyTorch CNN     0.000900
PyTorch RNN     0.003673
PyTorch LSTM    0.004645
Custom LSTM     0.006117

The Architecture

┌─────────────────────────────────────────────────────────────┐
│                         LSTM Cell                          │
│                                                             │
│   xt ──┬──► [Forget Gate] ──► ft                           │
│        ├──► [Input Gate]  ──► it                           │
│        ├──► [C-tilde]     ──► c̃t  ◄── The secret sauce    │
│        └──► [Output Gate] ──► ot                           │
│                                                             │
│   Cell State:  ct = ft ⊙ ct-1  +  it ⊙ c̃t                 │
│   Hidden State: ht = ot ⊙ tanh(ct)                         │
└─────────────────────────────────────────────────────────────┘
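
The two equations in the diagram translate almost directly into NumPy. The sketch below is an illustration, not the contents of LSTM.py: it assumes the standard formulation in which every gate also sees the previous hidden state, and it stacks the four gate weights into one matrix `W`. The names `lstm_step`, `W`, `b`, and the hidden size of 4 are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step. W: (4 * hidden, hidden + inputs), b: (4 * hidden,)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0 * hidden:1 * hidden])      # forget gate
    i_t = sigmoid(z[1 * hidden:2 * hidden])      # input gate
    c_tilde = np.tanh(z[2 * hidden:3 * hidden])  # candidate cell state
    o_t = sigmoid(z[3 * hidden:4 * hidden])      # output gate
    c_t = f_t * c_prev + i_t * c_tilde           # ct = ft ⊙ ct-1 + it ⊙ c̃t
    h_t = o_t * np.tanh(c_t)                     # ht = ot ⊙ tanh(ct)
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, inputs = 4, 1  # scalar input as in this write-up; hidden size is arbitrary
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in [0.1, 0.2, 0.3]:
    h, c = lstm_step(np.array([x]), h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Stacking the gates into one matrix is a common convenience so that a single matrix multiply produces all four pre-activations; four separate matrices would compute the same thing.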

C-Tilde: The Unsung Hero

Everyone talks about the forget gate. But the real elegance is in c̃t — the candidate cell state.

It's a proposal. C-tilde says: "Here's what I think we should remember." But it doesn't write directly to memory — it passes through the input gate first.

The network learns two things separately:

What information to extract (c-tilde)
How much of it to keep (input gate)

Tanh keeps the candidate bounded in (-1, 1), centered at zero, with smooth gradients. This lets c-tilde say "add this" or "subtract this" rather than only accumulate.
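
A small numeric check makes the "add or subtract" point concrete. The numbers below are made up for illustration, and the sigmoid candidate is a hypothetical alternative shown only for contrast; it is not something this project or a standard LSTM uses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

c_prev = 0.5   # existing memory
f_t = 1.0      # forget gate: keep all of the old memory
i_t = 0.8      # input gate: write most of the proposal
z = -2.0       # pre-activation of the candidate

c_tanh = f_t * c_prev + i_t * np.tanh(z)   # signed proposal: can lower the cell state
c_sigm = f_t * c_prev + i_t * sigmoid(z)   # hypothetical sigmoid candidate: can only raise it

print(c_tanh < c_prev, c_sigm > c_prev)  # True True
```

With a sigmoid candidate, the only way to reduce a stored value would be through the forget gate. The tanh candidate lets the write path itself carry a sign.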

The Bugs That Humbled Me

Bug #1: Initializing gradients with random values instead of zeros. The model was learning from noise.

Bug #2: Computing gradients perfectly... then never applying them. The optimizer step was missing entirely.

Both bugs produced models that "worked" — they ran, produced outputs, even showed decreasing loss. But they weren't learning.
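
Both fixes are easy to show on a toy problem. The loop below is a sketch on a one-weight linear model, not the training loop from testlstm.py; the data, learning rate, and variable names are assumptions chosen to keep the example short.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 32)
y = 2.0 * x                      # toy target: the true weight is 2.0
w = np.array([0.0])
lr = 0.5

for epoch in range(100):
    # Bug #1 fix: start every accumulation from zeros, never from random values.
    grad_w = np.zeros_like(w)
    for x_i, y_i in zip(x, y):
        err = w[0] * x_i - y_i
        grad_w[0] += 2.0 * err * x_i / len(x)   # mean gradient of the squared error
    # Bug #2 fix: actually apply the gradient. Without this line w never changes.
    w -= lr * grad_w

# Cheap guard: the parameter should have moved to the known answer.
print(abs(w[0] - 2.0) < 1e-3)
```

Checking that parameters actually change between steps, and that they converge on a problem with a known answer, is a cheap test that does not rely on the loss curve alone.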

The Gradient Highway

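Vanilla RNNs struggle on long sequences because backpropagation through time multiplies the gradient by the same recurrent weight matrix at every step. Roughly, if the largest eigenvalue magnitude of that matrix is below 1 the gradient vanishes, and if it is above 1 the gradient explodes.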
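
The eigenvalue argument can be checked numerically. This sketch uses a toy linear recurrence with the nonlinearity left out, so it isolates the effect of the repeated matrix product; `grad_norm_after` and the 3x3 scaled identity matrix are assumptions for the demo.

```python
import numpy as np

def grad_norm_after(steps, scale):
    """Norm of a gradient pushed back through `steps` identical linear steps."""
    W = scale * np.eye(3)          # toy recurrent matrix: every eigenvalue equals `scale`
    g = np.ones(3)
    for _ in range(steps):
        g = W.T @ g                # backprop through h_t = W h_{t-1} (nonlinearity omitted)
    return np.linalg.norm(g)

print(grad_norm_after(50, 0.9))   # shrinks toward zero
print(grad_norm_after(50, 1.1))   # grows rapidly
```

A real RNN also multiplies by the derivative of its activation at each step, which for tanh is at most 1, so in practice vanishing tends to be the more common failure.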

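LSTMs mitigate this with the cell state highway. Along the cell state, the gradient passed from one step back to the previous one is scaled elementwise by the forget gate ft, which the network learns to control. When ft ≈ 1, the gradient passes through almost unchanged, so the network can learn to keep the highway open.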
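
Along the direct cell-state path, the gradient of the final cell state with respect to the first one is just the product of the forget gates in between. The sketch below computes that product for two constant gate values; it ignores the indirect paths through the hidden state, and `highway_gradient` is a hypothetical helper.

```python
import numpy as np

def highway_gradient(forget_gates):
    """d c_T / d c_0 along the direct cell-state path: the product of the forget gates."""
    return float(np.prod(forget_gates))

steps = 100
open_gate = highway_gradient(np.full(steps, 0.99))   # roughly 0.37: signal survives
half_gate = highway_gradient(np.full(steps, 0.50))   # roughly 8e-31: signal is gone
print(open_gate, half_gate)
```

Because the gate values are learned and depend on the input, the network can hold them near 1 where long-range memory matters and let them drop elsewhere, which a fixed recurrent matrix cannot do.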

Run It

pip install numpy torch plotly pandas
python testlstm.py

The best way to understand a neural network is to build one that doesn't work, then figure out why.
