A chaotic, evolutionary, dependency-free GPT built from scratch.
AtomGPT is an educational and experimental project that implements a Generative Pre-trained Transformer (GPT) entirely in the Python standard library. No PyTorch, no NumPy, no TensorFlow. Just pure Python logic, from the autograd engine to the transformer blocks.
Beyond a simple implementation, AtomGPT introduces an Evolutionary Forge (forge.py), where models not only learn from data but also evolve their architecture over time—growing layers, adding heads, and pruning weights to survive.
- Zero Dependencies: Runs on pure Python. If you have Python 3, you can run AtomGPT.
- Custom Autograd: A transparent backpropagation engine (`Value`) built from the ground up (sketched below).
- Evolutionary Training: Models compete in a population. The fittest survive, clone, and mutate (add/remove layers, heads, etc.).
- Educational Core: `microgpt.py` contains the entire logic in a single file for easy study.
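To get a feel for the `Value` engine, here is a minimal sketch of a scalar autograd node. The actual class in `microgpt.py` supports more operations and may differ in details.

```python
class Value:
    """A scalar that records how it was computed, so gradients can
    flow backwards through the computation graph."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None  # how to push grad to parents
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# Example: d(a*b + a)/da = b + 1 = 4
a, b = Value(2.0), Value(3.0)
loss = a * b + a
loss.backward()
print(a.grad)  # 4.0
```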
```bash
git clone https://github.com/pronzzz/atomgpt.git
cd atomgpt
```

Optional: Install graphviz if you want to visualize the computation graph (used in `atomgpt/visualizer.py`), but it is not required for the core model.

```bash
pip install graphviz
```

Watch models evolve and generate fantasy names in real time:

```bash
python3 forge.py
```

This script will:
- Initialize a population of small GPT models.
- Train them on a dataset of fantasy names.
- Evolve the population (Select -> Clone -> Mutate).
- Generate new names periodically.
- Save the best names to `generated_names.txt`.
If you want to study the bare-metal implementation:
```bash
python3 microgpt.py
```

This script downloads a dataset (if missing), trains a model, and prints generated samples to the console.
Here is a step-by-step walkthrough of what happens when you run AtomGPT:
When `forge.py` starts, it initializes a `Population` of random GPT models. Each model starts small (e.g., 1 layer, 16-dimensional embeddings) to survive the harsh environment of random initialization.
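In sketch form, the setup looks roughly like this. The class, config fields, and population size here are illustrative assumptions, not forge.py's actual API:

```python
import random

class GPT:
    """Stub standing in for the project's GPT model."""
    def __init__(self, n_layer, n_head, n_embd):
        self.n_layer, self.n_head, self.n_embd = n_layer, n_head, n_embd

POPULATION_SIZE = 8  # assumed; forge.py picks its own size

def random_config():
    # Every model starts tiny so it can survive random initialization.
    return {"n_layer": 1, "n_head": random.choice([1, 2]), "n_embd": 16}

population = [GPT(**random_config()) for _ in range(POPULATION_SIZE)]
```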
In every step, the models are fed a name (e.g., "Drakon"); a condensed version of this loop is sketched after the list.
- Tokenization: "Drakon" is broken down into characters.
- Forward Pass: The characters flow through the user-defined `GPT` architecture:
  - Embeddings lookup.
  - Attention mechanisms weigh relationships between characters.
  - MLPs process the information.
- Loss Calculation: The model predicts the next character. We calculate the negative log-likelihood loss.
- Backward Pass: The custom autograd engine traces the graph backwards, calculating gradients for every weight.
- Update: An Adam-inspired optimizer tweaks the weights to reduce error.
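Condensed into code, one step looks roughly like this. The forward pass is stubbed out with random probabilities, since the real one (embeddings, attention, MLPs) lives in `microgpt.py`:

```python
import math, random

name = "Drakon"
vocab = sorted(set(name))  # toy vocabulary; the real one covers the dataset
stoi = {ch: i for i, ch in enumerate(vocab)}

# Tokenization: break the name into character ids.
tokens = [stoi[ch] for ch in name]

def forward(context):
    """Stub for the GPT forward pass: returns a probability
    distribution over the next character."""
    weights = [random.random() for _ in vocab]
    total = sum(weights)
    return [w / total for w in weights]

# Loss: mean negative log-likelihood of each next character.
loss = 0.0
for i in range(len(tokens) - 1):
    probs = forward(tokens[: i + 1])
    loss += -math.log(probs[tokens[i + 1]])
loss /= len(tokens) - 1
print(f"NLL loss: {loss:.3f}")

# In AtomGPT, loss is a Value node: loss.backward() fills .grad on every
# weight, and an Adam-style update then nudges each weight downhill.
```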
After a set number of steps (a generation), the forge pauses to judge the models; the selection loop is sketched after this list.
- Evaluation: Models are scored based on their Loss (how well they predict) and Efficiency (parameter count).
- Culling: The bottom 50% of models are deleted.
- Reproduction: The survivors are cloned to refill the population.
- Mutation: The clones undergo random mutations:
  - Growth: "I need more power!" -> Adds a layer or attention head.
  - Efficiency: "I am too heavy." -> Prunes small weights or shrinks the embedding dimension.
  - Chaos: Randomly sparsifies a dense layer.
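One generation of the forge, in sketch form. The fitness weighting and mutation methods below are assumptions for illustration, not forge.py's exact code:

```python
import copy, random

class Model:
    """Stub standing in for an evolved GPT; attribute and method
    names are illustrative, not forge.py's actual API."""
    def __init__(self):
        self.loss = random.uniform(2.0, 4.0)
        self.param_count = random.randint(1_000, 10_000)
    def grow(self):     self.param_count += 500  # add a layer/head
    def prune(self):    self.param_count -= 200  # drop small weights
    def sparsify(self): self.param_count -= 100  # zero out a dense layer

def fitness(m):
    # Lower is better: blends prediction quality with efficiency.
    # The exact weighting is an assumption, not forge.py's formula.
    return m.loss + 1e-5 * m.param_count

def evolve(population):
    """One generation: evaluate -> cull -> clone -> mutate."""
    survivors = sorted(population, key=fitness)[: len(population) // 2]
    clones = [copy.deepcopy(random.choice(survivors)) for _ in survivors]
    for clone in clones:
        random.choice([clone.grow, clone.prune, clone.sparsify])()
    return survivors + clones

population = [Model() for _ in range(8)]
population = evolve(population)
```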
Finally, the champion model is used to hallucinate new names. It samples character by character, following the statistical patterns it learned (and evolved to process efficiently).
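Generation reduces to repeatedly converting next-character logits into a distribution and drawing from it. A minimal sketch, assuming a standard softmax with temperature:

```python
import math, random

def sample_index(logits, temperature=1.0):
    """Softmax the logits (with temperature), then draw an index."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    r, acc = random.random(), 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r <= acc:
            return i
    return len(exps) - 1

vocab = ["a", "k", "n", "r"]  # toy vocabulary
print(vocab[sample_index([0.1, 2.0, -1.0, 0.5])])
```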
- Implement more complex mutation operators (e.g., skip connection rewiring).
- Add saving/loading of model "species" (checkpoints).
- Visualize the evolutionary tree.
MIT License. See LICENSE for details.