
fix: restore MultiGPU compatibility on current ComfyUI with basic DynamicVRAM support #175

Merged
pollockjj merged 6 commits into main from dvram
Mar 6, 2026
Conversation


pollockjj commented Mar 6, 2026

Summary

This PR ships ComfyUI-MultiGPU 2.6.0.

The focus of this release is compatibility and runtime correctness on current ComfyUI:

  • basic DynamicVRAM compatibility for standard MultiGPU runtime paths
  • correct execution on non-primary CUDA devices such as cuda:1
  • compatibility fixes for the advanced checkpoint loader on current ComfyUI
  • restoration of missing helper/compatibility nodes and inputs
  • startup hardening for optional custom-node integrations

This is not a DistTorch + DynamicVRAM integration PR. That work remains separate.

What changed

Runtime and DynamicVRAM compatibility

  • patched MultiGPU runtime execution so standard loader paths correctly honor the real CUDA device at execution time instead of only swapping logical device globals
  • patched comfy.model_management.current_stream(device) usage so stream lookup stays aligned with the requested CUDA device
  • added CUDA current-device guards around MultiGPU wrapper execution and Comfy sampling entrypoints
  • initialized comfy_aimdo on all visible CUDA devices when DynamicVRAM is enabled, instead of assuming only the startup device matters

Practical result: standard MultiGPU workflows that place UNet execution on a non-primary GPU such as cuda:1 now work with DynamicVRAM both on and off.
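The current-device guard pattern described above can be sketched roughly as follows. This is a minimal, torch-free illustration: `get_current_device`/`set_current_device` are hypothetical stand-ins for `torch.cuda.current_device`/`torch.cuda.set_device`, not the PR's actual implementation.

```python
import contextlib

_current_device = 0  # stand-in for the process-wide CUDA current device


def get_current_device():
    return _current_device


def set_current_device(index):
    global _current_device
    _current_device = index


@contextlib.contextmanager
def cuda_device_guard(index):
    """Pin the current device for the duration of a wrapped call, then restore."""
    previous = get_current_device()
    set_current_device(index)
    try:
        yield
    finally:
        set_current_device(previous)


def run_on_device(index, fn, *args, **kwargs):
    # Wrapper execution and sampling entrypoints run inside the guard, so code
    # that implicitly uses the "current" device sees the intended GPU.
    with cuda_device_guard(index):
        return fn(*args, **kwargs)
```

The key property is restoration in `finally`: even if the wrapped call raises, the process-wide current device is returned to its previous value.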

Checkpoint loader compatibility

  • updated the advanced checkpoint loader patch path to match current ComfyUI expectations
  • accepted and forwarded disable_dynamic
  • selected the correct patcher class for dynamic vs non-dynamic loading
  • updated CLIP construction and related fallback paths to match current ComfyUI behavior

Practical result: the advanced checkpoint loader path is no longer a stale fork of older ComfyUI loader semantics.
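The dynamic vs non-dynamic patcher selection and `disable_dynamic` forwarding can be sketched like this. `StaticPatcher` and `DynamicPatcher` are illustrative stand-ins, not real ComfyUI classes.

```python
class StaticPatcher:
    """Stand-in for the patcher class used when dynamic loading is disabled."""
    kind = "static"


class DynamicPatcher:
    """Stand-in for the patcher class used when DynamicVRAM-style loading is on."""
    kind = "dynamic"


def select_patcher_class(disable_dynamic: bool):
    # The loader picks the class that matches the requested loading mode.
    return StaticPatcher if disable_dynamic else DynamicPatcher


def load_checkpoint(sd, *, disable_dynamic=False):
    # disable_dynamic is accepted at the top and forwarded down to selection.
    patcher_cls = select_patcher_class(disable_dynamic)
    # ... real loading would construct the model/CLIP/VAE here ...
    return patcher_cls.kind
```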

Compatibility and helper fixes

  • hardened custom-node detection across configured custom_nodes roots with normalized module-name matching
  • added a CUDA device guard for comfy_kitchen DLPack export compatibility
  • added a torch.cuda.synchronize() call before empty_cache(), matching the change made to the Comfy Core routine on Feb 3
  • restored DeviceSelectorMultiGPU
  • restored WanVideo vace_model compatibility by treating it as a backward-compatible alias for extra_model
  • lazily registered WanVideo nodes so optional integrations fail later and more narrowly
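The vace_model alias handling above can be sketched as a small input-normalization step. This is a hypothetical illustration of the alias pattern, not the node's actual code.

```python
def resolve_extra_model(inputs):
    """Treat the legacy `vace_model` input as a backward-compatible alias
    for `extra_model`, without clobbering an explicitly supplied value."""
    if "extra_model" not in inputs and "vace_model" in inputs:
        inputs = dict(inputs)  # copy so the caller's dict is untouched
        inputs["extra_model"] = inputs.pop("vace_model")
    return inputs
```

Old workflows that still wire `vace_model` keep working, while workflows that already set `extra_model` are unaffected.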

Tooling and examples

  • synced lint configuration with current ComfyUI-style Ruff/Pylint settings
  • cleaned the affected Python surface to pass lint and syntax checks
  • added a starter MultiGPU example workflow

User-visible fixes

Validation

Validated in local workflow/harness testing and static checks:

  • standard MultiGPU workflow matrix passes with UNet on cuda:0 and cuda:1
  • standard MultiGPU workflow matrix passes with DynamicVRAM on and off
  • advanced checkpoint loader workflow completes successfully on the current branch
  • ruff check . passes
  • py_compile passes
  • pylint passes

Closes

Closes #167
Closes #169
Closes #172

Supersedes #168

Not in scope

Follow-up

The next major follow-up is DistTorch + DynamicVRAM-on integration, with DistTorch retaining placement control and DynamicVRAM handling compute-side transient residency rather than replacing DistTorch's loading policy wholesale.

Your Name and others added 6 commits March 6, 2026 01:35
- Cleaned up unnecessary whitespace and comments in model_management_mgpu.py, nodes.py, wanvideo.py, and wrappers.py for better code clarity.
- Replaced list comprehensions with direct list conversions in nodes.py for efficiency.
- Updated memory logging format in model_management_mgpu.py to streamline data capture.
- Enhanced device management in wanvideo.py by ensuring consistent device setting and loading.
- Added linting configurations in pyproject.toml to enforce code quality standards.
- Removed unused imports and optimized existing ones across multiple files.
pollockjj marked this pull request as ready for review March 6, 2026 15:07
Copilot AI review requested due to automatic review settings March 6, 2026 15:07
pollockjj self-assigned this Mar 6, 2026

Copilot AI left a comment


Pull request overview

Release 2.6.0 focused on restoring runtime correctness and compatibility with current ComfyUI, especially for non-primary CUDA devices and basic DynamicVRAM scenarios, plus updating loader/integration glue to match current upstream behavior.

Changes:

  • Add CUDA current-device/runtime guards and patch Comfy stream + sampling entrypoints to keep execution aligned with the intended GPU.
  • Refresh advanced checkpoint loader behavior (incl. disable_dynamic) and update several integration/compat nodes (WanVideo alias input, restored device selector, safer optional-node registration).
  • Sync tooling/lint config and add a starter example workflow.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
__init__.py Core runtime/device-guard additions; patches Comfy stream/sampling; DynamicVRAM multi-device init; hardened optional node registration.
wrappers.py Adds CUDA current-device guarding for standard MultiGPU wrappers (UNet/VAE paths).
checkpoint_multigpu.py Updates patched checkpoint loading path for current ComfyUI loader semantics and forwards disable_dynamic.
distorch_2.py Refactors/updates DisTorch2 patching and allocation logic for current ComfyUI behavior.
device_utils.py Improves device enumeration and cache-clearing behavior (incl. CUDA synchronize before empty_cache).
nodes.py Restores DeviceSelectorMultiGPU; adjusts UNet LP loader call pattern; minor robustness tweaks.
wanvideo.py Adds vace_model backward-compatible alias for extra_model; trims unused imports/formatting.
model_management_mgpu.py Minor cleanup in memory snapshot logging loop.
pyproject.toml Bumps version to 2.6.0 and adds Ruff/Pylint configuration.
ci/run_workflows.py Switches prints to explicit stdout/stderr writers with flush.
ci/summarize_log.py Switches prints to explicit stdout writer with flush.
ci/extract_allocation.py Switches prints to explicit stdout writer with flush.
example_workflows/ComfyUI-starter_multigpu.json Adds starter workflow demonstrating basic MultiGPU usage.


Comment on lines 33 to +35

```diff
 def patched_load_state_dict_guess_config(sd, output_vae=True, output_clip=True, output_clipvision=False,
                                          embedding_directory=None, output_model=True, model_options={},
-                                         te_model_options={}, metadata=None):
+                                         te_model_options={}, metadata=None, disable_dynamic=False):
```

Copilot AI Mar 6, 2026


patched_load_state_dict_guess_config uses mutable default arguments (model_options={}, te_model_options={}), which can leak state across invocations if ComfyUI mutates these dicts during loading. Use None defaults and create new dicts inside the function (e.g., model_options = model_options or {}) to avoid cross-call contamination.
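One way to apply the suggested fix is None sentinels, keeping the signature from the diff above. The body here is a stub that returns the options purely for illustration; real loading logic would follow the sentinel checks.

```python
def patched_load_state_dict_guess_config(sd, output_vae=True, output_clip=True,
                                         output_clipvision=False,
                                         embedding_directory=None, output_model=True,
                                         model_options=None, te_model_options=None,
                                         metadata=None, disable_dynamic=False):
    # None sentinels: each call gets fresh dicts, so mutations by downstream
    # loading code cannot leak into later invocations.
    model_options = {} if model_options is None else model_options
    te_model_options = {} if te_model_options is None else te_model_options
    # ... actual state-dict loading would go here ...
    return model_options, te_model_options
```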

@pollockjj pollockjj merged commit e64cdf7 into main Mar 6, 2026
4 checks passed
