This project is a minimal GPT-style language model built entirely from scratch in PyTorch. It follows the core architecture of the Transformer (multi-head self-attention + feed-forward blocks + residual connections + layer norm), trained on a text dataset to generate new sequences.
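The block structure described above (pre-layer-norm self-attention and feed-forward sublayers, each wrapped in a residual connection) can be sketched roughly like this. This is an illustrative sketch using PyTorch's built-in `nn.MultiheadAttention`; the class and layer names in `bigram.py` may differ:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One transformer block: causal self-attention + feed-forward,
    each with a residual connection and pre-layer-norm."""
    def __init__(self, n_embed, n_head, dropout=0.0):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embed)
        self.attn = nn.MultiheadAttention(
            n_embed, n_head, dropout=dropout, batch_first=True
        )
        self.ln2 = nn.LayerNorm(n_embed)
        self.ff = nn.Sequential(
            nn.Linear(n_embed, 4 * n_embed),
            nn.ReLU(),
            nn.Linear(4 * n_embed, n_embed),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Causal mask: each position may attend only to itself and earlier positions
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a = self.ln1(x)
        attn_out, _ = self.attn(a, a, a, attn_mask=mask)
        x = x + attn_out            # residual around attention
        x = x + self.ff(self.ln2(x))  # residual around feed-forward
        return x
```

Stacking `n_layer` of these blocks on top of token and position embeddings, followed by a final layer norm and a linear head over the vocabulary, gives the full model.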
The code follows a tutorial, but to show it isn't just copied blindly I added my own explanations and understanding in a `.ipynb` notebook, which you can check out if you want!

(You can tweak these at the top of the script.)

- `batch_size = 16` → number of sequences per batch
- `block_size = 32` → maximum context length
- `n_embed = 64` → embedding dimension
- `n_head = 4` → number of attention heads
- `n_layer = 4` → number of transformer blocks
- `dropout = 0.0` → dropout rate
- `learning_rate = 1e-3` → optimizer learning rate
- `max_iters = 5000` → training iterations
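Put together, those defaults would look like this at the top of `bigram.py` (a sketch based on the list above; the script itself is authoritative):

```python
# Hyperparameters (defaults from the README; tweak these at the top of bigram.py)
batch_size = 16       # number of sequences processed in parallel
block_size = 32       # maximum context length in tokens
n_embed = 64          # embedding dimension
n_head = 4            # attention heads; each head sees n_embed // n_head = 16 dims
n_layer = 4           # number of stacked transformer blocks
dropout = 0.0         # dropout rate (0.0 = disabled)
learning_rate = 1e-3  # optimizer learning rate
max_iters = 5000      # total training iterations
```

Note that `n_embed` must be divisible by `n_head`, since the embedding is split evenly across attention heads.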
Clone the project:

```shell
git clone https://github.com/Aruniaaa/GPT-from-scratch
```

Go to the project directory:

```shell
cd GPT-from-scratch
```

Install dependencies:

```shell
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu126
```

Run the code:

```shell
python bigram.py
```

NOTE: The `pip install` command for PyTorch may differ on your machine. Check the official PyTorch website for the command matching your OS and CUDA version.
The Google Colab notebook (GPT-from-scratch.ipynb) contains:
A breakdown of the code
A short explanation of how transformers work
I’m still learning, so my explanations might not be perfect, but feedback and corrections are always welcome!
```
├── Del-data.txt           # dataset
├── GPT-from-scratch.ipynb # code + explanation
├── README.md
└── bigram.py              # main training and model script
```