
fix: restore MultiGPU compatibility on current ComfyUI with basic DynamicVRAM support #175

Merged
pollockjj merged 6 commits into main from dvram
Mar 6, 2026
Conversation


pollockjj commented Mar 6, 2026

Summary

This PR ships ComfyUI-MultiGPU 2.6.0.

The focus of this release is compatibility and runtime correctness on current ComfyUI:

  • basic DynamicVRAM compatibility for standard MultiGPU runtime paths
  • correct execution on non-primary CUDA devices such as cuda:1
  • compatibility fixes for the advanced checkpoint loader on current ComfyUI
  • restoration of missing helper/compatibility nodes and inputs
  • startup hardening for optional custom-node integrations

This is not a DistTorch + DynamicVRAM integration PR. That work remains separate.

What changed

Runtime and DynamicVRAM compatibility

  • patched MultiGPU runtime execution so standard loader paths correctly honor the real CUDA device at execution time instead of only swapping logical device globals
  • patched comfy.model_management.current_stream(device) usage so stream lookup stays aligned with the requested CUDA device
  • added CUDA current-device guards around MultiGPU wrapper execution and Comfy sampling entrypoints
  • initialized comfy_aimdo on all visible CUDA devices when DynamicVRAM is enabled, instead of assuming only the startup device matters

Practical result: standard MultiGPU workflows that place UNet execution on a non-primary GPU such as cuda:1 now work with DynamicVRAM both on and off.
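The current-device guard pattern described above can be sketched roughly as follows. This is a minimal, torch-free illustration: `get_current_device`/`set_current_device` are hypothetical stand-ins for `torch.cuda.current_device`/`torch.cuda.set_device`, not the PR's actual implementation.

```python
import contextlib

_current_device = 0  # stand-in for the process-wide CUDA current device


def get_current_device():
    return _current_device


def set_current_device(index):
    global _current_device
    _current_device = index


@contextlib.contextmanager
def cuda_device_guard(index):
    """Pin the current device for the duration of a wrapped call, then restore."""
    previous = get_current_device()
    set_current_device(index)
    try:
        yield
    finally:
        set_current_device(previous)


def run_on_device(index, fn, *args, **kwargs):
    # Wrapper execution and sampling entrypoints run inside the guard, so code
    # that implicitly uses the "current" device sees the intended GPU.
    with cuda_device_guard(index):
        return fn(*args, **kwargs)
```

The key property is restoration in `finally`: even if the wrapped call raises, the process-wide current device is returned to its previous value.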

Checkpoint loader compatibility

  • updated the advanced checkpoint loader patch path to match current ComfyUI expectations
  • accepted and forwarded disable_dynamic
  • selected the correct patcher class for dynamic vs non-dynamic loading
  • updated CLIP construction and related fallback paths to match current ComfyUI behavior

Practical result: the advanced checkpoint loader path is no longer a stale fork of older ComfyUI loader semantics.
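The dynamic vs non-dynamic patcher selection and `disable_dynamic` forwarding can be sketched like this. `StaticPatcher` and `DynamicPatcher` are illustrative stand-ins, not real ComfyUI classes.

```python
class StaticPatcher:
    """Stand-in for the patcher class used when dynamic loading is disabled."""
    kind = "static"


class DynamicPatcher:
    """Stand-in for the patcher class used when DynamicVRAM-style loading is on."""
    kind = "dynamic"


def select_patcher_class(disable_dynamic: bool):
    # The loader picks the class that matches the requested loading mode.
    return StaticPatcher if disable_dynamic else DynamicPatcher


def load_checkpoint(sd, *, disable_dynamic=False):
    # disable_dynamic is accepted at the top and forwarded down to selection.
    patcher_cls = select_patcher_class(disable_dynamic)
    # ... real loading would construct the model/CLIP/VAE here ...
    return patcher_cls.kind
```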

Compatibility and helper fixes

  • hardened custom-node detection across configured custom_nodes roots with normalized module-name matching
  • added a CUDA device guard for comfy_kitchen DLPack export compatibility
  • added a torch.cuda.synchronize() call before empty_cache(), matching the change made to the Comfy Core routine on Feb 3
  • restored DeviceSelectorMultiGPU
  • restored WanVideo vace_model compatibility by treating it as a backward-compatible alias for extra_model
  • lazily registered WanVideo nodes so optional integrations fail later and more narrowly
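The vace_model alias handling above can be sketched as a small input-normalization step. This is a hypothetical illustration of the alias pattern, not the node's actual code.

```python
def resolve_extra_model(inputs):
    """Treat the legacy `vace_model` input as a backward-compatible alias
    for `extra_model`, without clobbering an explicitly supplied value."""
    if "extra_model" not in inputs and "vace_model" in inputs:
        inputs = dict(inputs)  # copy so the caller's dict is untouched
        inputs["extra_model"] = inputs.pop("vace_model")
    return inputs
```

Old workflows that still wire `vace_model` keep working, while workflows that already set `extra_model` are unaffected.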

Tooling and examples

  • synced lint configuration with current ComfyUI-style Ruff/Pylint settings
  • cleaned the affected Python surface to pass lint and syntax checks
  • added a starter MultiGPU example workflow

User-visible fixes

Validation

Validated in local workflow/harness testing and static checks:

  • standard MultiGPU workflow matrix passes with UNet on cuda:0 and cuda:1
  • standard MultiGPU workflow matrix passes with DynamicVRAM on and off
  • advanced checkpoint loader workflow completes successfully on the current branch
  • ruff check . passes
  • py_compile passes
  • pylint passes

Closes

Closes #167
Closes #169
Closes #172

Supersedes #168

Not in scope

Follow-up

The next major follow-up is DistTorch + DynamicVRAM-on integration, with DistTorch retaining placement control and DynamicVRAM handling compute-side transient residency rather than replacing DistTorch's loading policy wholesale.

Your Name and others added 6 commits March 6, 2026 01:35
- Cleaned up unnecessary whitespace and comments in model_management_mgpu.py, nodes.py, wanvideo.py, and wrappers.py for better code clarity.
- Replaced list comprehensions with direct list conversions in nodes.py for efficiency.
- Updated memory logging format in model_management_mgpu.py to streamline data capture.
- Enhanced device management in wanvideo.py by ensuring consistent device setting and loading.
- Added linting configurations in pyproject.toml to enforce code quality standards.
- Removed unused imports and optimized existing ones across multiple files.
pollockjj marked this pull request as ready for review March 6, 2026 15:07
Copilot AI review requested due to automatic review settings March 6, 2026 15:07
pollockjj self-assigned this Mar 6, 2026

Copilot AI left a comment


Pull request overview

Release 2.6.0 focused on restoring runtime correctness and compatibility with current ComfyUI, especially for non-primary CUDA devices and basic DynamicVRAM scenarios, plus updating loader/integration glue to match current upstream behavior.

Changes:

  • Add CUDA current-device/runtime guards and patch Comfy stream + sampling entrypoints to keep execution aligned with the intended GPU.
  • Refresh advanced checkpoint loader behavior (incl. disable_dynamic) and update several integration/compat nodes (WanVideo alias input, restored device selector, safer optional-node registration).
  • Sync tooling/lint config and add a starter example workflow.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
__init__.py Core runtime/device-guard additions; patches Comfy stream/sampling; DynamicVRAM multi-device init; hardened optional node registration.
wrappers.py Adds CUDA current-device guarding for standard MultiGPU wrappers (UNet/VAE paths).
checkpoint_multigpu.py Updates patched checkpoint loading path for current ComfyUI loader semantics and forwards disable_dynamic.
distorch_2.py Refactors/updates DisTorch2 patching and allocation logic for current ComfyUI behavior.
device_utils.py Improves device enumeration and cache-clearing behavior (incl. CUDA synchronize before empty_cache).
nodes.py Restores DeviceSelectorMultiGPU; adjusts UNet LP loader call pattern; minor robustness tweaks.
wanvideo.py Adds vace_model backward-compatible alias for extra_model; trims unused imports/formatting.
model_management_mgpu.py Minor cleanup in memory snapshot logging loop.
pyproject.toml Bumps version to 2.6.0 and adds Ruff/Pylint configuration.
ci/run_workflows.py Switches prints to explicit stdout/stderr writers with flush.
ci/summarize_log.py Switches prints to explicit stdout writer with flush.
ci/extract_allocation.py Switches prints to explicit stdout writer with flush.
example_workflows/ComfyUI-starter_multigpu.json Adds starter workflow demonstrating basic MultiGPU usage.


Comment on lines 33 to +35

```diff
 def patched_load_state_dict_guess_config(sd, output_vae=True, output_clip=True, output_clipvision=False,
                                          embedding_directory=None, output_model=True, model_options={},
-                                         te_model_options={}, metadata=None):
+                                         te_model_options={}, metadata=None, disable_dynamic=False):
```

Copilot AI Mar 6, 2026


patched_load_state_dict_guess_config uses mutable default arguments (model_options={}, te_model_options={}), which can leak state across invocations if ComfyUI mutates these dicts during loading. Use None defaults and create new dicts inside the function (e.g., model_options = model_options or {}) to avoid cross-call contamination.
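One way to apply the suggested fix is None sentinels, keeping the signature from the diff above. The body here is a stub that returns the options purely for illustration; real loading logic would follow the sentinel checks.

```python
def patched_load_state_dict_guess_config(sd, output_vae=True, output_clip=True,
                                         output_clipvision=False,
                                         embedding_directory=None, output_model=True,
                                         model_options=None, te_model_options=None,
                                         metadata=None, disable_dynamic=False):
    # None sentinels: each call gets fresh dicts, so mutations by downstream
    # loading code cannot leak into later invocations.
    model_options = {} if model_options is None else model_options
    te_model_options = {} if te_model_options is None else te_model_options
    # ... actual state-dict loading would go here ...
    return model_options, te_model_options
```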

@pollockjj pollockjj merged commit e64cdf7 into main Mar 6, 2026
4 checks passed
