learn: core transformer

## Learning Resource

Watch this video: [link](https://www.youtube.com/watch?v=kCc8FmEb1nY)

This is the core video for how transformers work. I highly recommend following along in a Google Colab notebook.
The video uses some tricks with broadcasting, which is a common source of confusion. It may be helpful to review the broadcasting rules here: [link](https://numpy.org/devdocs/user/basics.broadcasting.html)
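
As a quick illustration of the kind of broadcasting involved, here is a minimal sketch in NumPy (the video itself works in PyTorch, so the variable names `wei`, `row_sums`, `avg`, and `wrong` are just illustrative). It normalizes each row of a lower-triangular matrix so the row sums to 1, and shows how dropping `keepdims=True` silently divides along the wrong axis instead of raising an error.

```python
import numpy as np

T = 4
# Lower-triangular mask: row t has ones in columns 0..t.
wei = np.tril(np.ones((T, T)))

# Keep the row sums as a column of shape (T, 1) so they broadcast across columns.
row_sums = wei.sum(axis=1, keepdims=True)
avg = wei / row_sums  # (T, T) / (T, 1) -> (T, T), each row now sums to 1

# Without keepdims the sums have shape (T,), which aligns with the LAST axis.
# This divides each column by a row sum instead, and no error is raised.
wrong = wei / wei.sum(axis=1)

print(avg.sum(axis=1))    # every entry is 1
print(wrong.sum(axis=1))  # rows no longer sum to 1
```

Checking `.shape` before and after operations like this is usually the fastest way to find a broadcasting mistake.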

By the end of this, you should have replicated the basic Shakespeare model in the video.
You can also use this opportunity to get used to jaxtyping. Despite its name, it also works with PyTorch, and I think it'll help catch many bugs relating to tensor shapes: [link](https://docs.kidger.site/jaxtyping/)

---

Some notes on the repo: because I expect most of this project to happen in Jupyter notebooks or Python scripts, I haven't set up much in terms of project structure. This can change later in the term, but it should be fine for now.
