Missing matmul of ShapedAttention and Norm #5

@kimborgen

Description

Hi, first off, excellent paper. Thank you for the research!

I am looking to build a standalone version of the SAS-P model, but when I went through your code, I could not find any place where you matmul the Norm output with the output of Shaped Attention, as indicated in Fig. 10 (red circle added for convenience). I am no math guru, but I also can't find this connection in the equations. It would kind of make sense, since you experiment with removing skip connections, so maybe this is just a bug in the figure?
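For reference, here is the sketch I am working from for the shaped-attention sub-block, based purely on my reading of the equations (NumPy, single head, values/projection removed; `alpha`, `beta`, `gamma` are the learned shaping scalars and `C` the uniform "centering" matrix; the function itself and all names are my own, so I may well have misread something):

```python
import numpy as np

def softmax(scores):
    # numerically stable row-wise softmax; masked (-inf) entries become 0
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

def shaped_attention(X, W_q, W_k, alpha=1.0, beta=1.0, gamma=1.0):
    """My sketch of one shaped-attention sub-block (no value/projection matrices)."""
    T, d = X.shape
    causal = np.tril(np.ones((T, T), dtype=bool))
    logits = (X @ W_q) @ (X @ W_k).T / np.sqrt(d)
    logits = np.where(causal, logits, -np.inf)
    A = softmax(logits)                           # standard causal attention matrix
    C = softmax(np.where(causal, 0.0, -np.inf))   # centering matrix: uniform causal attention
    I = np.eye(T)
    # As I read it, shaped attention acts directly on (the already-normalized) X;
    # there is no separate matmul with the Norm output anywhere here.
    return (alpha * I + beta * A - gamma * C) @ X
```

Sanity check: with `beta = gamma = 0` this reduces to `alpha * X`, i.e. the identity path, which matches how I understand the shaping terms at initialization.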

[Screenshot of Fig. 10 with the connection in question circled in red]

I probably missed something, but it would be amazing if you could clarify! :)
