BF16 not supported for x_pad_token and cap_pad_token #419

@qskousen

Description

Hello, I have been working on https://github.com/qskousen/ggufy, a tool to aid in quantization, and I've been using this node pack to test it in ComfyUI. I took some inspiration for the tool from your quantization tools. Thank you for the work you've done here.

I've noticed while working on Lumina2 architecture that BF16, while supported generally by this node pack, does not work for two specific layers: x_pad_token and cap_pad_token. As a workaround, I am currently forcing these to upcast to F32. If I leave them in BF16, I get this error for both layers:

While copying the parameter named "x_pad_token", whose dimensions in the model are torch.Size([3840]) and whose dimensions in the checkpoint are torch.Size([7680]), an exception occurred: ('The size of tensor a (3840) must match the size of tensor b (7680) at non-singleton dimension 0',)

The workaround works, but I am curious why these layers specifically do not support BF16 while other layers do. I don't know a lot about how stable diffusion itself works, and I am not sure how these layers are used during inference. I have noticed that other GGUF node packs don't support BF16 in GGUF at all.
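For reference, the workaround I'm using is essentially the following sketch. The tensor names are the two from the Lumina2 checkpoint; `pick_dtype` is a hypothetical helper standing in for wherever ggufy decides per-tensor quantization types, and the bit-level upcast is plain NumPy (BF16 is just the high 16 bits of an IEEE-754 float32, so shifting reconstructs the exact value):

```python
import numpy as np

# Tensors that fail to load as BF16, so they are forced to F32 instead.
FORCE_F32 = {"x_pad_token", "cap_pad_token"}

def bf16_bits_to_f32(bits: np.ndarray) -> np.ndarray:
    """Losslessly upcast raw BF16 bit patterns (uint16) to float32.

    BF16 is the top half of a float32, so shifting the bits into the
    high 16 bits of a uint32 and reinterpreting yields the same value.
    """
    return (bits.astype(np.uint32) << 16).view(np.float32)

def pick_dtype(name: str, requested: str) -> str:
    """Keep the requested GGUF type except for the problem tensors."""
    return "F32" if name in FORCE_F32 else requested

# Quick round-trip check: 1.5 has the BF16 bit pattern 0x3FC0.
vals = bf16_bits_to_f32(np.array([0x3FC0], dtype=np.uint16))
print(pick_dtype("x_pad_token", "BF16"), vals[0])  # F32 1.5
```

This is only how I sidestep the error on my end, not a claim about where the size mismatch actually originates.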
