
Fix: Add memory precheck before VAE decode to prevent crash #12109

Open
tvukovic-amd wants to merge 3 commits into Comfy-Org:master from tvukovic-amd:fix/vae-decode-memory-precheck

Conversation

@tvukovic-amd

Adds a memory precheck before VAE decode to prevent Windows 0xC0000005 (access violation) crashes, particularly on devices with limited VRAM.

Problem

VAE decode loading could trigger 0xC0000005 (access violation) crashes when:

  • GPU memory was insufficient for full decode
  • Models couldn't be offloaded to CPU (due to --highvram, --gpu-only, or insufficient CPU RAM)

The existing OOM exception handling couldn't catch these crashes because they occur at the driver/system level before PyTorch can raise an exception.

Solution

Added a proactive memory check (use_tiled_vae_decode()) that evaluates memory conditions before attempting decode:

  1. Check if GPU has enough space for full decode (with reserves)
  2. Check if models can be offloaded to CPU (respects --highvram, --gpu-only flags)
  3. Check if CPU has enough space to receive offloaded models (respects --disable-smart-memory)

If any condition fails, switch to tiled VAE decode preemptively.
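The three checks above can be sketched as follows. This is a minimal illustration with hypothetical names and thresholds, not the PR's actual code; a real caller would derive `decode_bytes` from the VAE's memory estimate and query `free_vram` (e.g. via `torch.cuda.mem_get_info()`) and `cpu_free_bytes` from the system:

```python
GIB = 1024 ** 3
VRAM_RESERVE = 1 * GIB  # illustrative headroom kept for the driver/allocator

def use_tiled_vae_decode(decode_bytes, free_vram, highvram=False,
                         gpu_only=False, smart_memory=True, cpu_free_bytes=0):
    """Return True when tiled VAE decode should be chosen preemptively.

    Sketch only: names, defaults, and the reserve size are assumptions.
    """
    # 1. Enough VRAM for a full decode, keeping a reserve?
    if free_vram - VRAM_RESERVE >= decode_bytes:
        return False  # full decode fits as-is

    # 2. May models be offloaded to CPU at all?
    if highvram or gpu_only:
        return True  # offload forbidden by flags -> tile

    # 3. Does the CPU have room to receive the offloaded models?
    if not smart_memory or cpu_free_bytes < decode_bytes:
        return True  # nowhere to offload -> tile

    return False  # offloading should free enough VRAM for a full decode
```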

@rattus128
Contributor

We have made a lot of VAE VRAM fixes recently so pre-emptively tiling is going to false positive without a fairly major audit of the VAE estimates. Some of them (like LTX) have non-trivial non-linear VRAM consumption patterns.

The comfy-aimdo project is trying to take things the other way and control the allocator under VRAM pressure to get us away from the maintenance of accurate model estimates.

#11845
https://pypi.org/project/comfy-aimdo/

No AMD support yet though.

Is there a path forward on pytorch being able to allocate with clean exception on OOM?

Which VAEs are the worst offenders and how much VRAM are you generally trying to support?

@MeiYi-dev

MeiYi-dev commented Jan 29, 2026

> We have made a lot of VAE VRAM fixes recently so pre-emptively tiling is going to false positive without a fairly major audit of the VAE estimates. Some of them (like LTX) have non-trivial non-linear VRAM consumption patterns.
>
> The comfy-aimdo project is trying to take things the other way and control the allocator under VRAM pressure to get us away from the maintenance of accurate model estimates.
>
> #11845 https://pypi.org/project/comfy-aimdo/
>
> No AMD support yet though.
>
> Is there a path forward on pytorch being able to allocate with clean exception on OOM?
>
> Which VAEs are the worst offenders and how much VRAM are you generally trying to support?

Not the OP, but the most VRAM is consumed by the video VAEs; all the new image VAEs are very efficient by default. On another note, even though you recently reduced the LTX 2 VAE VRAM consumption by 1/3, ComfyUI still offloads the whole model before decoding with the LTX 2 VAE (likely because the VAE memory estimation wasn't changed). This is critical for 16GB VRAM users, since the model and TEs cannot be kept within 32GB RAM and the model spills onto the pagefile. A custom node that sets the amount of model to offload before decoding would be very nice to have, since the LTX 2 VAE takes only about 3GB of VRAM when decoding with tiled decoding, yet ComfyUI still offloads the whole model into RAM and the pagefile.

# NOTE: We don't know what tensors were allocated to stack variables at the time of the
# exception and the exception itself refs them all until we get out of this except block.
# So we just set a flag for tiler fallback so that tensor gc can happen once the
# exception is fully off the books.
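The pattern the comment above describes — recording a flag inside the `except` block and retrying only once the exception is off the books — can be sketched like this. This is a self-contained illustration: `OutOfMemoryError` stands in for `torch.cuda.OutOfMemoryError`, and the VAE method names are hypothetical:

```python
class OutOfMemoryError(RuntimeError):
    """Stand-in for torch.cuda.OutOfMemoryError in this sketch."""

def decode_with_fallback(vae, samples):
    do_tile = False
    try:
        return vae.decode_full(samples)
    except OutOfMemoryError:
        # Do NOT retry here: the in-flight exception still references the
        # stack frames (and thus the tensors) of the failed attempt, so
        # their memory cannot be reclaimed yet. Just set a flag.
        do_tile = True
    # Outside the except block the exception is released, the failed
    # attempt's tensors can be garbage-collected, and the retry has room.
    if do_tile:
        return vae.decode_tiled(samples)
```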
@asagi4
Contributor

asagi4 commented Jan 29, 2026
I think you need to set do_tile = True here to actually do the tiled VAE retry.

I think this patch would be fairly helpful on AMD especially. Some VAE VRAM estimates with AMD seem to be kind of bonkers; the Flux VAE requests 11.6GB of VRAM to decode a 1 megapixel image and somehow I don't think it actually uses anywhere near that much.

EDIT: I just did a quick memory dump after a VAE decode. Torch maximum memory usage was about 6.6GB, and that would probably include the loaded VAE model and anything else that might be in VRAM. I'm not sure how to accurately tell what the actual VAE decoding used, but clearly not 11.6GB
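One way to check an estimate against reality, along the lines of the quick dump above, is to reset PyTorch's peak-memory counter around the decode. A sketch (the measurement itself needs a CUDA/ROCm build, and `decode_fn` is whatever callable runs the decode):

```python
def to_gib(n_bytes):
    """Convert a byte count to GiB for readable reporting."""
    return n_bytes / 2 ** 30

def measure_decode_peak(decode_fn, *args, device="cuda"):
    import torch  # requires a CUDA/ROCm-enabled build at call time

    torch.cuda.synchronize(device)
    torch.cuda.reset_peak_memory_stats(device)
    baseline = torch.cuda.memory_allocated(device)  # weights already resident
    out = decode_fn(*args)
    torch.cuda.synchronize(device)
    peak = torch.cuda.max_memory_allocated(device)
    print(f"decode transient peak: {to_gib(peak - baseline):.2f} GiB "
          f"(absolute peak {to_gib(peak):.2f} GiB)")
    return out
```

Note this reports only what the caching allocator hands out, so it undercounts allocator fragmentation and any memory taken outside PyTorch, but it gives a much tighter bound than the estimate request.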

@tvukovic-amd
Author

You are right, do_tile should be set to True in this case. I pushed a new commit with the appropriate changes.

@0xDELUXA
Contributor

0xDELUXA commented Feb 6, 2026

Has anyone tried enabling PyTorch/SDPA attention in VAE by changing this line in comfy/model_management.py:

def pytorch_attention_enabled_vae():
    if is_amd():
        return False  # enabling pytorch attention on AMD currently causes crash when doing high res
    return pytorch_attention_enabled()

to return True, for example?
A lot of time has passed since ComfyUI introduced this AMD-specific exception.

@0xDELUXA
Contributor

0xDELUXA commented Feb 6, 2026

> We have made a lot of VAE VRAM fixes recently so pre-emptively tiling is going to false positive without a fairly major audit of the VAE estimates. Some of them (like LTX) have non-trivial non-linear VRAM consumption patterns.
>
> The comfy-aimdo project is trying to take things the other way and control the allocator under VRAM pressure to get us away from the maintenance of accurate model estimates.
>
> #11845 https://pypi.org/project/comfy-aimdo/
>
> No AMD support yet though.

Hopefully, AMD support will be available in the future as well.

Nvm, the second-ever PR there is already about it: Comfy-Org/comfy-aimdo#2 XD

@tvukovic-amd
Author

> Has anyone tried enabling PyTorch/SDPA attention in VAE by changing this line in comfy/model_management.py:
>
>     def pytorch_attention_enabled_vae():
>         if is_amd():
>             return False  # enabling pytorch attention on AMD currently causes crash when doing high res
>         return pytorch_attention_enabled()
>
> to return True, for example? A lot of time has passed since ComfyUI introduced this AMD-specific exception.

Tried with this change but VAE decoder still causes 0xC0000005 access violation crash.

@0xDELUXA
Contributor

0xDELUXA commented Feb 9, 2026

> Tried with this change but VAE decoder still causes 0xC0000005 access violation crash.

I see. But then what's the point of using split attention in VAE automatically on all AMD gpus? For me, PyTorch attention is faster / better, and it's the default on NVIDIA as well. I mean, on all gpus except AMD.

@tvukovic-amd
Copy link
Author

tvukovic-amd commented Feb 9, 2026

> > Tried with this change but VAE decoder still causes 0xC0000005 access violation crash.
>
> I see. But then what's the point of using split attention in VAE automatically on all AMD gpus? For me, PyTorch attention is faster / better, and it's the default on NVIDIA as well. I mean, on all gpus except AMD.

The VAE decoder uses the same amount of memory when pytorch_attention_enabled_vae is set to True, except for the internal attention matrix computation, which is a small portion of the total pipeline.
The problem this PR fixes is related to VAE decoder load (which also happens when pytorch_attention_enabled_vae is set to True).

If we use the changes from this PR for the memory precheck (to switch to the tiled VAE decoder when there is not enough memory) and enable PyTorch/SDPA attention in VAE with

def pytorch_attention_enabled_vae():
    if is_amd():
        return True  # no longer crashes when combined with this PR's memory precheck
    return pytorch_attention_enabled()

the models execute correctly.

@lostdisc

> Has anyone tried enabling PyTorch/SDPA attention in VAE by changing this line in comfy/model_management.py:
>
>     def pytorch_attention_enabled_vae():
>         if is_amd():
>             return False  # enabling pytorch attention on AMD currently causes crash when doing high res
>         return pytorch_attention_enabled()
>
> to return True, for example? A lot of time has passed since ComfyUI introduced this AMD-specific exception.

> I see. But then what's the point of using split attention in VAE automatically on all AMD gpus? For me, PyTorch attention is faster / better, and it's the default on NVIDIA as well. I mean, on all gpus except AMD.

On my end, with MIOpen/cudnn off, PyTorch attention uses a little more peak VRAM than split attention during VAE decode. Both take mere seconds as long as they fit in memory, but exceeding memory freezes Windows. (ComfyUI does not detect the danger or auto-fallback to tiled decode; I suspect it might be double-counting shared memory on my iGPU system.) 1600x1280 is sort of on the knife's edge for me, where it usually works but any funny business can put it over the top, so I've stayed with split attention for VAE decode. But for sampler steps, PyTorch attention is appreciably faster.


6 participants