When generating image tokens, the attention mask is not extended correctly.
The image has 8 tokens, but the attention mask only grows by 1, which causes a tensor size mismatch.
When I run the task-2 evaluation (text sequence to image sequence), I get this error:
ValueError('Attention mask should be of size (1, 1, 1, 380), but is torch.Size([1, 1, 1, 373])')
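The numbers in the error are consistent with this: 380 - 373 = 7, which is exactly the 8 image tokens minus the 1 position the mask actually gained. Below is a minimal sketch of what I would expect the mask update to do, not the repository's actual code. The helper name `extend_attention_mask`, the 2D `(batch, seq_len)` mask layout, and the starting length of 372 (inferred from 373 - 1) are all assumptions for illustration.

```python
import torch


def extend_attention_mask(attention_mask: torch.Tensor, num_new_tokens: int) -> torch.Tensor:
    """Append one attended position per newly generated token.

    Hypothetical helper: assumes a 2D mask of shape (batch, seq_len)
    where 1 means "attend to this position".
    """
    batch_size = attention_mask.shape[0]
    # new_ones keeps the dtype and device of the existing mask
    new_columns = attention_mask.new_ones((batch_size, num_new_tokens))
    return torch.cat([attention_mask, new_columns], dim=-1)


# Assumed mask length before the image step (inferred from the error: 373 - 1)
mask = torch.ones(1, 372, dtype=torch.long)

# Extending by 1 reproduces the length from the error message (373)
print(extend_attention_mask(mask, 1).shape)

# Extending by the 8 image tokens gives the expected length (380)
print(extend_attention_mask(mask, 8).shape)
```

If the generation loop appends all 8 image tokens in one step, the mask would need to grow by the same amount before the next forward pass.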