A custom implementation of the Vision Transformer (ViT) for digit classification, extended to an encoder-decoder model for digit sequence recognition.
This project reimplements the Vision Transformer architecture from the original ViT paper, trained on the MNIST dataset for handwritten digit recognition. In addition to the standard ViT encoder, the model is extended into an encoder-decoder architecture inspired by the Transformer from Attention Is All You Need; this variant processes a grid of digit images and predicts the corresponding digit sequence. Both implementations are written from scratch (no AI; you will have to trust me) following the original papers.
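A ViT turns each image into a sequence of flattened patches before the encoder sees it. As a rough illustration (not code from this repo), here is how a 28x28 MNIST digit splits into 7x7 patches; the function name `patchify` and the patch size are just illustrative choices:

```python
import numpy as np

def patchify(img, patch=7):
    """Split a square image into non-overlapping flattened patches,
    as in ViT's input embedding. A 28x28 MNIST digit with patch=7
    yields 16 patches of 49 pixels each."""
    h, w = img.shape
    assert h % patch == 0 and w % patch == 0
    return (img.reshape(h // patch, patch, w // patch, patch)
               .transpose(0, 2, 1, 3)       # group the two patch-grid axes
               .reshape(-1, patch * patch)) # one flattened row per patch

img = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)
tokens = patchify(img)  # shape (16, 49): 16 patch tokens of 49 pixels
```

Each row of `tokens` would then be linearly projected to the model dimension and combined with a positional embedding.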
- Encoder: Based on the original ViT paper, using pre-layer norm with skip connections.
- Decoder: Follows post-layer norm as in the original Transformer paper.
- The encoder-decoder model therefore mixes both normalization schemes. Feel free to change either.
Create the val/test data:

```sh
python models/utils.py
```

To train the models:

```sh
# For standard ViT encoder on MNIST
python models/vit_enc.py

# For encoder-decoder on digit grids
python models/vit_enc_dec.py
```

To run the streamlit app:

```sh
streamlit run app.py
```

Alternatively, the Dockerfile is standalone:

```sh
docker build -t vit-mnist-app .
docker run -p 8501:8501 vit-mnist-app
```