This repository contains a custom implementation of a decoder-only GPT model trained on the Tiny Shakespeare dataset, using the MLX library.
The architecture follows the Transformer decoder from *Attention Is All You Need*:
- Embedding & Positional Encoding: Standard token embeddings with sinusoidal positional encodings.
- Transformer Blocks: 6 layers of decoder blocks, each with:
  - Multi-Head Self-Attention (6 heads, model dimension 384) with causal masking
  - Feed-Forward Network (hidden dimension 4× model dimension)
  - Layer Normalization and residual connections
- Output: Linear projection to the vocabulary size for next-token prediction.
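The sinusoidal positional encodings mentioned above can be sketched as follows (a minimal NumPy illustration of the formulation from *Attention Is All You Need*; the repository's `model.py` may construct them differently):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal position matrix: sin on even dims, cos on odd dims."""
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)  # (seq_len, d_model // 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even indices: sine
    pe[:, 1::2] = np.cos(angles)  # odd indices: cosine
    return pe

# Encodings for this model's configuration: 256 positions, dimension 384
pe = sinusoidal_positional_encoding(256, 384)
print(pe.shape)  # (256, 384)
```

Because the encodings are fixed functions of position, they add no trainable parameters and are simply summed with the token embeddings.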
- Sequence length: 256
- Batch size: 32
- Dropout: 0.1 throughout the model
- Loss function: Cross-Entropy
- Optimizer: AdamW with weight decay = 0.1
- Learning rate schedule: hybrid schedule similar to OneCycle:
  - Linear warmup for 10% of total steps
  - Cosine decay to the final LR for the remaining steps
- Max LR: 3e-4
- Total steps: 10,000
- The best model is saved based on the lowest validation loss during training.
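The warmup-plus-cosine schedule above can be sketched as a plain Python function (the `final_lr` floor of 3e-5 is an assumption for illustration; `train.py` may use a different value):

```python
import math

def lr_schedule(step: int, total_steps: int = 10_000,
                max_lr: float = 3e-4, final_lr: float = 3e-5) -> float:
    """Linear warmup for the first 10% of steps, then cosine decay to final_lr.

    final_lr is a hypothetical floor chosen for this sketch.
    """
    warmup_steps = int(0.1 * total_steps)
    if step < warmup_steps:
        # Linear warmup from ~0 up to max_lr
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay from max_lr down to final_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return final_lr + 0.5 * (max_lr - final_lr) * (1 + math.cos(math.pi * progress))
```

The two phases join continuously: the warmup reaches `max_lr` exactly at step 1,000, where the cosine term starts at its peak.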
- `modules/` – core modules for the project:
  - `dataloader.py` – responsible for creating training batches
  - `model.py` – contains all GPT building blocks and the full model implementation
  - `tokenizer.py` – implements a character-level tokenizer with encoding and decoding
- `analysis.ipynb` – notebook with training and inference analysis
- `train.py` – full training script
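A character-level tokenizer like the one in `tokenizer.py` can be sketched as follows (a minimal illustration; the repository's class name and interface may differ):

```python
class CharTokenizer:
    """Minimal character-level tokenizer: one integer id per unique character."""

    def __init__(self, text: str):
        chars = sorted(set(text))  # vocabulary = unique characters in the corpus
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # string -> id
        self.itos = {i: ch for i, ch in enumerate(chars)}  # id -> string
        self.vocab_size = len(chars)

    def encode(self, s: str) -> list:
        return [self.stoi[ch] for ch in s]

    def decode(self, ids: list) -> str:
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("to be or not to be")
ids = tok.encode("to be")
print(tok.decode(ids))  # "to be"
```

Character-level tokenization keeps the vocabulary tiny (for Tiny Shakespeare, a few dozen symbols), which is why the output projection in the model is small, at the cost of longer token sequences than subword schemes.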
Created by Denys Bondarchuk. Feel free to reach out or contribute to the project.