🎉 INT8 is now officially supported in ComfyUI 🎉

https://github.com/Comfy-Org/ComfyUI/commit/1a510f04234e5a213d3985a1a54f65652623f4bc

No, I did not help at all with this and had no involvement. My existing quants are likely not to work due to a quant naming mismatch, but silveroxides' quants are likely to work, as they were quite involved in the process of making this happen.

Existing INT8 fast quants can be converted to the proper native format via this script: https://github.com/BobJohnson24/ComfyUI-INT8-Fast/blob/main/convert_to_comfy.py

python convert_to_comfy.py I8Fast.safetensors I8Comfy.safetensors
or
python convert_to_comfy.py I8Fast.safetensors --inplace

I am glad to retire with a Piña colada in my hands, on the beach. I might slim this node down to one focused exclusively on Pre-Lora in the future, if that does not become a default ComfyUI feature.

Comfy INT8 Acceleration

This node speeds up Flux2, Ideogram4, Chroma, Z-Image, and Ernie Image in ComfyUI by using INT8 quantization, delivering 1.5x to 2x faster inference on my 3090 depending on the model. It should work on any NVIDIA GPU with enough INT8 TOPS. It also appears to be faster than FP8 on 40-Series and above. Works with LoRA and torch compile.
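For readers unfamiliar with the idea, here is a minimal NumPy sketch of symmetric INT8 quantization for a linear layer. It is an illustration only, not this node's Triton kernel: the function names `quantize_int8` and `int8_linear` and the per-row scaling scheme are assumptions made for the example.

```python
import numpy as np

def quantize_int8(x: np.ndarray, axis: int = -1):
    """Symmetric INT8 quantization with one scale per slice along `axis`."""
    absmax = np.max(np.abs(x), axis=axis, keepdims=True)
    # Guard against all-zero slices so we never divide by zero.
    scale = np.where(absmax == 0, 1.0, absmax / 127.0)
    q = np.clip(np.rint(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def int8_linear(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Approximate x @ w.T using INT8 operands and INT32 accumulation."""
    xq, xs = quantize_int8(x, axis=-1)  # one scale per token, shape (tokens, 1)
    wq, ws = quantize_int8(w, axis=-1)  # one scale per output row, shape (out, 1)
    acc = xq.astype(np.int32) @ wq.astype(np.int32).T  # integer matmul
    return acc.astype(np.float32) * xs * ws.T  # rescale back to float

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)  # activations
w = rng.standard_normal((8, 64)).astype(np.float32)  # weights
ref = x @ w.T
approx = int8_linear(x, w)
rel_err = np.linalg.norm(approx - ref) / np.linalg.norm(ref)
print(f"relative error: {rel_err:.4f}")
```

The speed-up on real hardware comes from running the integer matmul on INT8 tensor cores; NumPy only reproduces the arithmetic, not the performance.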

Common GPU related issues:

RTX 20-Series requires one of the following: Triton-Windows on Windows, triton==3.2.0, or a Triton you compile yourself with SM75 support, which was dropped in 3.3.0.

A100 has no possible INT8 speed-up (see #71).

FAQ:

Q: How do I quantize myself?

A: It is not recommended to quantize the human existence. If you would like to quantize a model, see example_workflows/int8_save_convrot_model.json

Q: What is ConvRot?

A: ConvRot is a variant of QuaRot. It basically rotates model weights and activations to eliminate outliers before quantization. This has some inference overhead, but is generally a large quality boost.
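As a rough illustration of the rotation idea, the sketch below uses a generic QuaRot-style Hadamard rotation in NumPy. It is not the ConvRot implementation in convrot.py; the helper names (`hadamard`, `fake_quant`, `rel_err`) and the synthetic outlier channel are assumptions for the example. Because the rotation matrix is orthogonal, rotating both activations and weights leaves the product unchanged, while the outlier is spread across all channels before quantization.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via Sylvester's construction (n a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = np.ones((1, 1))
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def fake_quant(x: np.ndarray) -> np.ndarray:
    """Round-trip through symmetric per-row INT8 to expose the quantization error."""
    scale = np.max(np.abs(x), axis=-1, keepdims=True) / 127.0
    return np.clip(np.rint(x / scale), -127, 127) * scale

def rel_err(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b) / np.linalg.norm(b))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 64))
x[:, 3] *= 50.0                    # one activation channel with large outliers
w = rng.standard_normal((32, 64))
h = hadamard(64)

ref = x @ w.T
rotated_ref = (x @ h) @ (w @ h).T  # same product, because h @ h.T is the identity

plain = rel_err(fake_quant(x) @ fake_quant(w).T, ref)
rotated = rel_err(fake_quant(x @ h) @ fake_quant(w @ h).T, ref)
print(f"plain INT8 error:   {plain:.4f}")
print(f"rotated INT8 error: {rotated:.4f}")
```

The extra `x @ h` on the activations is the inference overhead mentioned above; the weight rotation can be done once, ahead of time.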

Q: What is Pre-Lora?

A: Pre-Lora is a way to merge LoRA weights into a BF16 checkpoint within ComfyUI before you quantize the model. This requires an unquantized base model and on-the-fly quantization to be enabled. It is generally a higher quality way to apply a LoRA.
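The ordering described above (merge first, quantize second) can be sketched as follows. This is a simplified NumPy illustration using the common LoRA update `W + strength * (alpha / rank) * up @ down`, not the code in int8_lora.py; the function names, shapes, and float32 stand-in for BF16 are assumptions for the example.

```python
import numpy as np

def merge_lora(w, lora_down, lora_up, alpha, strength=1.0):
    """Fold a LoRA update into the full-precision weight: W + s * (alpha / r) * up @ down."""
    rank = lora_down.shape[0]
    return w + strength * (alpha / rank) * (lora_up @ lora_down)

def quantize_int8(w):
    """Symmetric per-output-row INT8 quantization of a weight matrix."""
    scale = np.max(np.abs(w), axis=-1, keepdims=True) / 127.0
    q = np.clip(np.rint(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((32, 64)).astype(np.float32)            # base weight (out, in)
down = 0.1 * rng.standard_normal((4, 64)).astype(np.float32)    # LoRA down, rank 4
up = 0.1 * rng.standard_normal((32, 4)).astype(np.float32)      # LoRA up

merged = merge_lora(w, down, up, alpha=4.0)  # step 1: merge in high precision
q, scale = quantize_int8(merged)             # step 2: quantize the merged weight once
max_err = np.max(np.abs(q * scale - merged))
print(f"max dequantization error: {max_err:.5f}")
```

Quantizing after the merge means the scales are computed from the weights the model will actually use, rather than from the base weights alone.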

Q: Torch compile takes forever and I hate it

A: Use the torch compile node from KJ Nodes and ensure you set the disable dynamic VRAM toggle.

Requirements:

Working ComfyKitchen (needs latest comfy and possibly pytorch with cu130)

Triton

Windows untested, but I hear triton-windows exists.

Credits:

dxqb for the entirety of the INT8 code during the very early versions of this node; it would have been impossible without them:

Nerogar/OneTrainer#1034

If you have a 30-Series GPU, OneTrainer is also the fastest current lora trainer thanks to this. Please go check them out!!

newgrit1004 for the base ConvRot code we modified into proper ConvRot

https://github.com/newgrit1004/ComfyUI-ZImage-Triton

silveroxides for providing a base to hack the INT8 conversion code onto.

https://github.com/silveroxides/convert_to_quant

Also silveroxides for showing how to properly register new data types to comfy

https://github.com/silveroxides/ComfyUI-QuantOps

The unholy trinity of AI slopsters I used to glue all this together over the course of multiple months now
