It was a good idea, but it breaks on all kinds of models, e.g. "Qwen/Qwen3-4B-Instruct-2507" and "zai-org/GLM-4.1V-9B-Thinking", and I can't work out how to fix it easily.
Contributor
Am I correct in thinking that this code is only relevant if we use batch generation?
Contributor (Author)
Perhaps. In truth, I haven't pinned down exactly the cases where this happens, or the best solution. I think some models might add position_ids even with batch_size==1, but I'm not sure.
Contributor (Author)
Another approach here would be to take the attention mask from the inputs if present (rather than working it out from the position_ids), but this also leads to shape errors in some models. I'm not even sure we need to mask the padding tokens if an attention mask is provided, so perhaps this section can be safely removed.
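For illustration, here is a minimal sketch of that alternative, not the actual trainer code: a hypothetical helper that applies the attention_mask supplied with the inputs to the hidden states when the shapes line up, and otherwise leaves the hidden states untouched instead of guessing a mask from position_ids. It assumes hidden_states has shape (batch_size, seq_len, hidden_dim) and attention_mask has shape (batch_size, seq_len); models that reshape the hidden state violate exactly that assumption, which is where the shape errors come from.

```python
import torch


def mask_padding_hidden_states(hidden_states: torch.Tensor, inputs: dict) -> torch.Tensor:
    """Hypothetical helper: zero out padding positions using the inputs' attention_mask.

    Assumes hidden_states is (batch_size, seq_len, hidden_dim) and the
    attention_mask, if present, is (batch_size, seq_len). These are assumptions
    for illustration only; some models reshape the hidden state, in which case
    the shapes no longer line up and we skip masking entirely.
    """
    attention_mask = inputs.get("attention_mask")
    if attention_mask is None or attention_mask.shape != hidden_states.shape[:2]:
        # No usable mask: leave the hidden states untouched rather than trying to
        # reconstruct one from position_ids, whose conventions differ between models.
        return hidden_states
    # Broadcast (batch, seq_len, 1) over the hidden dimension to zero padded positions.
    return hidden_states * attention_mask.unsqueeze(-1).to(hidden_states.dtype)
```

Whether this masking is needed at all when an attention mask was already passed to the model is the open question raised above.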
It was a good idea to mask here, but it breaks on all kinds of models, e.g. "Qwen/Qwen3-4B-Instruct-2507" and "zai-org/GLM-4.1V-9B-Thinking", and I can't work out how to fix it easily (even using the attention mask is complicated, as some models reshape the hidden state and so on). It might be worth disabling it.