learn: core transformer #1

@clay-arras

Learning Resource

Watch this video: link

This is the core video on how transformers work. I highly recommend following along in a Google Colab notebook.
The video relies on a few broadcasting tricks, which are a common source of confusion. It may help to review broadcasting first: link
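
To make the broadcasting point concrete, here is a minimal sketch of one place it commonly shows up in transformer code: averaging each token with the tokens before it by multiplying with a normalized lower-triangular matrix. Whether the video uses exactly this form is an assumption, and the names (`B`, `T`, `C`, `weights`, `averaged`) are illustrative.

```python
import torch

torch.manual_seed(0)
B, T, C = 2, 4, 3  # batch, time (sequence length), channels (illustrative sizes)
x = torch.randn(B, T, C)

# Lower-triangular mask: position t may only look at positions <= t.
tril = torch.tril(torch.ones(T, T))

# (T, T) / (T, 1): keepdim=True keeps the row sums as a column, so each row
# is divided by its own sum. Without keepdim the (T,) result would line up
# with the last axis and divide the columns instead.
weights = tril / tril.sum(dim=1, keepdim=True)

# (T, T) @ (B, T, C): the 2-D matrix is broadcast across the batch axis,
# so the result has shape (B, T, C).
averaged = weights @ x

# Reference: the same running mean written as an explicit loop over time.
expected = torch.stack([x[:, : t + 1].mean(dim=1) for t in range(T)], dim=1)
print(torch.allclose(averaged, expected, atol=1e-6))
```

Two broadcasts happen here: the division stretches a `(T, 1)` column across each row, and the matrix multiply reuses the same `(T, T)` weights for every batch element. Dropping `keepdim=True` still runs without error, which is why this kind of bug is easy to miss.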

By the end of this, you should have replicated the basic Shakespeare model in the video.
You can also use this opportunity to get used to jaxtyping. Despite its name, it also works with PyTorch, and I think it'll help catch many bugs related to tensor shapes: link


Some notes on the repo: because I expect most of this project to happen in Jupyter notebooks or Python scripts, I haven't set up much project structure. This can change later in the term, but it should be fine for now.
