Skip to content

Fix parameter name for values tensor in run_scaled_dot_product_attention#35

Closed
NiccoloSacchi wants to merge 1 commit intostanford-cs336:mainfrom
NiccoloSacchi:patch-1
Closed

Fix parameter name for values tensor in run_scaled_dot_product_attention#35
NiccoloSacchi wants to merge 1 commit intostanford-cs336:mainfrom
NiccoloSacchi:patch-1

Conversation

@NiccoloSacchi
Copy link
Copy Markdown

@NiccoloSacchi NiccoloSacchi commented Sep 15, 2025

I believe this dimension should be the same in V and K: shouldn't the V matrix contain the value embedding for each key?

I believe this dimension should be the same in V and K: the V matrix must contain the value embedding for each key.
@marcelroed
Copy link
Copy Markdown
Member

Yep, good catch, not sure how that slipped through. Will fix this on our end and update this repo before I close the issue.

Comment thread tests/adapters.py
K: Float[Tensor, " ... keys d_k"],
V: Float[Tensor, " ... values d_v"],
V: Float[Tensor, " ... keys d_v"],
mask: Bool[Tensor, " ... queries keys"] | None = None,
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, shouldn't mask be of size "queries keys" instead of "... queries keys"?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The masks can differ for each batch element, so ... is correct.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I considered that but thought not really necessary (thought either all batch samples attend to everything or all batch samples only attend to past tokens). Anyway, thanks for confirming! :)

@marcelroed
Copy link
Copy Markdown
Member

Fixed in 2026 version (releases on Monday)

@marcelroed marcelroed closed this Mar 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants