Hi, first off, excellent paper. Thank you for the research!
I am building a standalone version of the SAS-P model. When I went through your code, I could not find anywhere that you matmul the Norm output with the output of Shaped Attention, as indicated in Fig. 10 (red circle added for convenience). I am not a math guru, but I also can't find this connection in the equations. It kind of makes sense that it would be absent, since you experiment with fewer "skip" connections, so maybe this is just an error in the figure?
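To make the question concrete, here is a minimal pure-Python sketch of the operation being asked about: multiplying a shaped attention matrix by the normalized input. It assumes a shaped-attention form A = αI + β·softmax(QK^T/√d) − γC with causal masking, where C is uniform causal attention, and it assumes identity value/projection weights. The function names (`layer_norm`, `shaped_attention`, `attention_branch`) are hypothetical; this is not code from the repository.

```python
import math


def layer_norm(x, eps=1e-5):
    # Normalise each token vector to zero mean and unit variance (no learned gain/bias).
    out = []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out


def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def causal_softmax(scores):
    # Row-wise softmax restricted to positions j <= i; masked entries are 0.
    out = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        m = max(visible)
        e = [math.exp(s - m) for s in visible]
        z = sum(e)
        out.append([v / z for v in e] + [0.0] * (len(row) - i - 1))
    return out


def shaped_attention(xn, wq, wk, alpha=1.0, beta=1.0, gamma=1.0):
    # Hypothetical shaped attention matrix: alpha*I + beta*softmax(QK^T/sqrt(d)) - gamma*C.
    t = len(xn)
    q, k = matmul(xn, wq), matmul(xn, wk)
    d = len(q[0])
    scores = [[v / math.sqrt(d) for v in row] for row in matmul(q, transpose(k))]
    p = causal_softmax(scores)
    c = causal_softmax([[0.0] * t for _ in range(t)])  # uniform causal attention
    return [[alpha * (1.0 if i == j else 0.0) + beta * p[i][j] - gamma * c[i][j]
             for j in range(t)] for i in range(t)]


def attention_branch(x, wq, wk, alpha=1.0, beta=1.0, gamma=1.0):
    # With identity value/projection weights (assumed), the attention matrix
    # multiplies Norm(X) directly: this product is the matmul in question.
    xn = layer_norm(x)
    a = shaped_attention(xn, wq, wk, alpha, beta, gamma)
    return matmul(a, xn)
```

With `beta=0.0` and `gamma=0.0` the matrix reduces to the identity and the branch simply returns Norm(X), which is a quick sanity check.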
I probably missed something, but it would be amazing if you could clarify! :)
