This blog post is pretty cool
https://www.towardsdeeplearning.com/andrej-karpathy-just-built-an-entire-gpt-in-243-lines-of-python-7d66cfdfa301
It builds a GPT in 243 lines of Python, including the autograd engine.
It learns baby names and then generates new ones.
Q: How is this different from a simple transition matrix that builds probabilities from word transitions? Is it the token size, or how it builds patterns?
I think this could form the foundation of an interesting lecture using Python.
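To make the question concrete, here is a minimal sketch of the transition-matrix baseline it mentions (my own illustration, not from the blog post): a character-level bigram model that counts which character follows which, then samples names from those counts. The training names and the `sample_name` helper are hypothetical. The contrast with a GPT is that this table conditions only on the single previous character, while attention lets the model condition each prediction on the whole preceding context.

```python
import random
from collections import defaultdict

# Toy training set (assumption: any list of lowercase names would do).
names = ["emma", "olivia", "ava", "isabella", "sophia", "mia"]

# counts[a][b] = how often character b follows character a.
# '^' marks the start of a name, '$' marks the end.
counts = defaultdict(lambda: defaultdict(int))
for name in names:
    chars = ["^"] + list(name) + ["$"]
    for a, b in zip(chars, chars[1:]):
        counts[a][b] += 1

def sample_name(rng=random):
    """Walk the transition table from '^' until '$', collecting characters."""
    ch, out = "^", []
    while True:
        next_chars, weights = zip(*counts[ch].items())
        ch = rng.choices(next_chars, weights=weights)[0]
        if ch == "$":
            return "".join(out)
        out.append(ch)

print(sample_name(random.Random(0)))
```

Every character the sampler emits was seen in training, but it has no memory beyond the last character, so it happily produces transitions that never co-occur in any single name.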