- Cleaned up unnecessary whitespace and comments in model_management_mgpu.py, nodes.py, wanvideo.py, and wrappers.py for better code clarity.
- Replaced list comprehensions with direct list conversions in nodes.py for efficiency.
- Updated memory logging format in model_management_mgpu.py to streamline data capture.
- Enhanced device management in wanvideo.py by ensuring consistent device setting and loading.
- Added linting configurations in pyproject.toml to enforce code quality standards.
- Removed unused imports and optimized existing ones across multiple files.
Pull request overview
Release 2.6.0 focused on restoring runtime correctness and compatibility with current ComfyUI, especially for non-primary CUDA devices and basic DynamicVRAM scenarios, plus updating loader/integration glue to match current upstream behavior.
Changes:
- Add CUDA current-device/runtime guards and patch Comfy stream + sampling entrypoints to keep execution aligned with the intended GPU.
- Refresh advanced checkpoint loader behavior (incl. disable_dynamic) and update several integration/compat nodes (WanVideo alias input, restored device selector, safer optional-node registration).
- Sync tooling/lint config and add a starter example workflow.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `__init__.py` | Core runtime/device-guard additions; patches Comfy stream/sampling; DynamicVRAM multi-device init; hardened optional node registration. |
| `wrappers.py` | Adds CUDA current-device guarding for standard MultiGPU wrappers (UNet/VAE paths). |
| `checkpoint_multigpu.py` | Updates patched checkpoint loading path for current ComfyUI loader semantics and forwards `disable_dynamic`. |
| `distorch_2.py` | Refactors/updates DisTorch2 patching and allocation logic for current ComfyUI behavior. |
| `device_utils.py` | Improves device enumeration and cache-clearing behavior (incl. CUDA synchronize before `empty_cache`). |
| `nodes.py` | Restores `DeviceSelectorMultiGPU`; adjusts UNet LP loader call pattern; minor robustness tweaks. |
| `wanvideo.py` | Adds `vace_model` backward-compatible alias for `extra_model`; trims unused imports/formatting. |
| `model_management_mgpu.py` | Minor cleanup in memory snapshot logging loop. |
| `pyproject.toml` | Bumps version to 2.6.0 and adds Ruff/Pylint configuration. |
| `ci/run_workflows.py` | Switches prints to explicit stdout/stderr writers with flush. |
| `ci/summarize_log.py` | Switches prints to explicit stdout writer with flush. |
| `ci/extract_allocation.py` | Switches prints to explicit stdout writer with flush. |
| `example_workflows/ComfyUI-starter_multigpu.json` | Adds starter workflow demonstrating basic MultiGPU usage. |
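The CI changes in the last three rows all apply the same pattern: replace bare `print()` with explicit writes plus an immediate flush, so log lines are not reordered or lost when output is captured through a pipe. A minimal sketch of that pattern (the helper names here are illustrative, not the scripts' actual function names):

```python
import sys


def log_out(msg: str) -> None:
    """Write one line to stdout and flush immediately so CI log capture
    sees output in real time instead of after buffer fill."""
    sys.stdout.write(msg + "\n")
    sys.stdout.flush()


def log_err(msg: str) -> None:
    """Write one line to stderr and flush immediately."""
    sys.stderr.write(msg + "\n")
    sys.stderr.flush()
```

Looking up `sys.stdout` at call time (rather than binding the stream once at import) also keeps the helpers compatible with test harnesses that temporarily redirect the streams.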
```diff
 def patched_load_state_dict_guess_config(sd, output_vae=True, output_clip=True, output_clipvision=False,
                                          embedding_directory=None, output_model=True, model_options={},
-                                         te_model_options={}, metadata=None):
+                                         te_model_options={}, metadata=None, disable_dynamic=False):
```
patched_load_state_dict_guess_config uses mutable default arguments (model_options={}, te_model_options={}), which can leak state across invocations if ComfyUI mutates these dicts during loading. Use None defaults and create new dicts inside the function (e.g., model_options = model_options or {}) to avoid cross-call contamination.
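The hazard the reviewer describes is easy to demonstrate in isolation. In the pared-down sketch below (illustrative names, not the loader's real signature), `fragile` shares one dict object across every call, so state from one invocation leaks into the next; `safe` uses the `None`-default idiom so each call that omits the argument gets a fresh dict.

```python
def fragile(options={}):
    # BUG: the default dict is created once, at function definition time,
    # and the same object is reused by every call that omits `options`.
    options["seen"] = options.get("seen", 0) + 1
    return options["seen"]


def safe(options=None):
    # Fix: create a fresh dict per call; callers who pass their own dict
    # keep ownership of it and see their own mutations only.
    options = {} if options is None else options
    options["seen"] = options.get("seen", 0) + 1
    return options["seen"]
```

Calling `fragile()` twice returns 1 then 2 because the mutation persists in the shared default; `safe()` returns 1 on every no-argument call.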
Summary
This PR ships ComfyUI-MultiGPU 2.6.0. The focus of this release is compatibility and runtime correctness on current ComfyUI, especially for non-primary CUDA devices such as `cuda:1`.

This is not a DistTorch + DynamicVRAM integration PR. That work remains separate.
What changed
Runtime and DynamicVRAM compatibility
- Patch `comfy.model_management.current_stream(device)` usage so stream lookup stays aligned with the requested CUDA device
- Initialize `comfy_aimdo` on all visible CUDA devices when DynamicVRAM is enabled, instead of assuming only the startup device matters

Practical result: standard MultiGPU workflows that place UNet execution on a non-primary GPU such as `cuda:1` now work with DynamicVRAM both on and off.

Checkpoint loader compatibility

- Update the patched checkpoint loading path to current ComfyUI loader semantics, forwarding `disable_dynamic` through the call

Practical result: the advanced checkpoint loader path is no longer a stale fork of older ComfyUI loader semantics.
Compatibility and helper fixes
- Harden optional node registration across `custom_nodes` roots with normalized module-name matching
- Fix `comfy_kitchen` DLPack export compatibility
- Call `torch.cuda.synchronize()` before `empty_cache()` to be consistent with the addition to the Comfy Core routine on Feb 3
- Restore `DeviceSelectorMultiGPU`
- Restore `vace_model` compatibility by treating it as a backward-compatible alias for `extra_model`

Tooling and examples

- Bump the version to 2.6.0 and add Ruff/Pylint configuration
- Switch CI scripts to explicit stdout/stderr writers with flush
- Add a starter example workflow demonstrating basic MultiGPU usage
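The synchronize-before-empty_cache ordering matters because releasing cached allocator blocks while kernels are still in flight can free memory those kernels reference. The sketch below isolates just the ordering, taking the two operations as injected callables so it runs anywhere; with real PyTorch they would be `torch.cuda.synchronize` and `torch.cuda.empty_cache` (this is an illustration of the principle, not the repository's `device_utils.py` code).

```python
from typing import Callable


def clear_cache_in_order(synchronize: Callable[[], None],
                         empty_cache: Callable[[], None]) -> None:
    """Drain in-flight work on the device first, then release cached
    allocator blocks. Reversing these two steps risks returning memory
    that pending kernels are still using."""
    synchronize()
    empty_cache()
```

Wrapping the pair in one helper also guarantees no call site can accidentally empty the cache without synchronizing first.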
User-visible fixes
- `comfy_kitchen` cu130 cross-device DLPack export failure reported in "On cu130: Can't export tensors on a different CUDA device index. cu128 works in light testing" #167
- `vace_model` compatibility reported in "WanVideoModelLoaderMultiGPU doesn't have a vace_model input" #172

Validation
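The `vace_model` fix is a backward-compatible keyword alias: old workflows that still pass `vace_model` keep working, while the new `extra_model` name wins when both are supplied. A hedged sketch of that resolution logic (the helper name is hypothetical, not the actual code in `wanvideo.py`):

```python
def resolve_extra_model(**kwargs):
    """Accept the legacy `vace_model` keyword as an alias for `extra_model`.
    The new name takes precedence when both are provided; either alone is
    honored; absent both, return None."""
    if kwargs.get("extra_model") is not None:
        return kwargs["extra_model"]
    return kwargs.get("vace_model")
```

Keeping the alias at the input-resolution boundary means the rest of the loader only ever sees one canonical parameter name.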
Validated in local workflow/harness testing and static checks:
- Workflows on both `cuda:0` and `cuda:1`
- `ruff check .` passes
- `py_compile` passes
- `pylint` passes

Closes
Closes #167
Closes #169
Closes #172
Supersedes #168
Not in scope
- `access violation 0xC0000005` when… #141

Follow-up
The next major follow-up is DistTorch + DynamicVRAM-on integration, with DistTorch retaining placement control and DynamicVRAM handling compute-side transient residency rather than replacing DistTorch's loading policy wholesale.