This repository contains a decoder-only Transformer built from scratch.
The purpose of this repository is to document how I learned and understood the Transformer decoder. Each version may introduce structural changes, add or remove modules, or explore alternative design choices in order to understand the decoder more deeply.
All implementations are kept small-scale and transparent, so that tensor shapes, attention mechanisms, and data flow can be inspected clearly.
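To illustrate the kind of shape-transparent code this repository aims for, here is a minimal sketch of causal (masked) self-attention, the core of a decoder block. This is an illustrative NumPy example, not code from the repository itself; the function and variable names are my own, and it omits multi-head splitting, scaling learned per head, and batching for clarity.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over one sequence.

    x: (T, d) input embeddings; Wq/Wk/Wv: (d, d) projection matrices.
    Returns: (T, d) attended values.
    """
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv          # each (T, d)
    scores = q @ k.T / np.sqrt(d)             # (T, T) similarity matrix
    # Mask out future positions: token t may only attend to tokens <= t.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[future] = -np.inf
    # Row-wise softmax (numerically stabilized by subtracting the row max).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                        # (T, d)

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because of the causal mask, the first token can only attend to itself, so the first output row equals the first row of the value projection; checks like this are the kind of inspection the repository is meant to make easy.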
This repository is not intended for production use. The focus is on understanding, experimentation, and gradual improvement rather than engineering completeness.