Description
Hello, I have been working on https://github.com/qskousen/ggufy, a tool to aid in quantization, and I've been using this node pack to test it in ComfyUI. I took some inspiration for the tool from your quantization tools. Thank you for the work you've done here.
While working on the Lumina2 architecture, I've noticed that BF16, although generally supported by this node pack, does not work for two specific layers: `x_pad_token` and `cap_pad_token`. As a workaround, I am currently forcing these to upcast to F32. If I leave them in BF16, I get this error for both layers:
```
While copying the parameter named "x_pad_token", whose dimensions in the model are torch.Size([3840]) and whose dimensions in the checkpoint are torch.Size([7680]), an exception occurred: ('The size of tensor a (3840) must match the size of tensor b (7680) at non-singleton dimension 0',)
```
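For reference, the workaround I'm using amounts to a per-tensor dtype override during conversion. This is a minimal sketch of the idea, not ggufy's actual code; the function name and the override set are mine, and the two layer names come from the error above:

```python
import torch

# Hypothetical override list: tensors that must stay F32 even when the
# rest of the model is converted to BF16 (names taken from the Lumina2
# error messages above).
F32_OVERRIDES = {"x_pad_token", "cap_pad_token"}


def convert_tensor(name: str, tensor: torch.Tensor) -> torch.Tensor:
    """Cast a tensor to BF16 for storage, except for the problem layers,
    which are upcast/kept at F32 as a workaround."""
    if name in F32_OVERRIDES:
        return tensor.to(torch.float32)
    return tensor.to(torch.bfloat16)


# Example: the pad token ends up F32, an ordinary weight ends up BF16.
pad = convert_tensor("x_pad_token", torch.zeros(3840, dtype=torch.bfloat16))
w = convert_tensor("some.other.weight", torch.zeros(3840, dtype=torch.float32))
```

With this in place the converted file loads cleanly; only the two listed tensors pay the 2x storage cost of F32.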
The workaround works, but I am curious why these layers specifically do not support BF16 while other layers do. I don't know a lot about how stable diffusion itself works, and I am not sure how these layers are used during inference. I have noticed that other GGUF node packs don't support BF16 in GGUF at all.