Description
Learning Resource
Watch this video: link
This is the core video for how transformers work. I highly recommend following along in a Google Colab notebook.
The video relies on a few broadcasting tricks, which are a common source of confusion. It may be helpful to review broadcasting here: link
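To make the broadcasting point concrete, here is a minimal sketch of the kind of trick that tends to trip people up: normalizing the rows of a lower-triangular matrix to compute a running mean. The variable names and sizes are my own choices for illustration, not taken from the video.

```python
import torch

T = 4  # sequence length (arbitrary, for illustration)

# Lower-triangular matrix of ones: row t has ones in columns 0..t.
tril = torch.tril(torch.ones(T, T))           # shape (T, T)

# keepdim=True keeps the summed dimension, so the result is (T, 1).
row_sums = tril.sum(dim=1, keepdim=True)      # shape (T, 1)

# (T, T) / (T, 1): the size-1 dimension is stretched across the columns,
# so every row is divided by its own sum.
weights = tril / row_sums                     # each row sums to 1

# Without keepdim=True the sums have shape (T,), which broadcasts as (1, T)
# and silently divides each *column* instead of each row.
wrong = tril / tril.sum(dim=1)

# Multiplying by the weights averages over all positions up to and including t.
x = torch.arange(T * 2, dtype=torch.float32).reshape(T, 2)  # shape (T, C)
running_mean = weights @ x                    # row t is the mean of x[0..t]
```

Both divisions run without an error, which is exactly why this class of bug is easy to miss. Printing `.shape` at each step is a cheap way to check what broadcasting is actually doing.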
By the end, you should have replicated the basic Shakespeare model from the video.
You can also use this opportunity to get used to jaxtyping. Despite its name, it also works with PyTorch, and I think it'll help catch many bugs related to tensor shapes: link
Some notes on the repo: because I expect most of this project to happen in Jupyter notebooks or Python scripts, I haven't set up much in terms of project structure. This can change later in the term, but it should be fine for now.