[Feat] Add channels-last layout optimization pass for conv-heavy models by themistbeforedawn · Pull Request #36 · SandAI-org/MagiCompiler

themistbeforedawn · 2026-06-22T03:56:50Z

🗂️ PR Category

📝 Description

Motivation

cuDNN's channels-last (NHWC/NDHWC) conv kernels are much faster than contiguous
NC(D)HW on Ampere+, but activations are stored contiguous by default, so cuDNN
pays an internal NCHW⇄NHWC conversion around every conv. Inductor only hoists
this for conv2d (ndim == 2 / the 4D len(...) == 4 gate); 5D conv3d has
no native channels-last path, so conv3d-dense models (e.g. VAE decode) keep
paying that per-conv cost.
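To make the layout difference concrete, here is a minimal pure-Python sketch of how element strides differ between contiguous NC(D)HW and channels-last NHWC/NDHWC storage for the same logical shape. The helper names are illustrative and not part of this PR; the stride convention matches what PyTorch reports for `torch.channels_last` and `torch.channels_last_3d` tensors.

```python
def contiguous_strides(shape):
    """Row-major (NCHW / NCDHW) strides, measured in elements."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= size
    return tuple(reversed(strides))


def channels_last_strides(shape):
    """Strides for the same logical (N, C, *spatial) shape stored as NHWC / NDHWC."""
    n, c, *spatial = shape
    # Physical memory order: batch, spatial dims, then channels innermost.
    physical = contiguous_strides((n, *spatial, c))
    # Report strides back in logical (N, C, *spatial) order.
    return (physical[0], physical[-1], *physical[1:-1])


shape_5d = (2, 8, 4, 16, 16)  # N, C, D, H, W, as seen by a conv3d
print(contiguous_strides(shape_5d))     # (8192, 1024, 256, 16, 1)
print(channels_last_strides(shape_5d))  # (8192, 1, 2048, 128, 8)
```

In the channels-last case the channel stride is 1, so all channels of one voxel sit next to each other in memory, which is the arrangement the channels-last kernels consume without a conversion.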

What this PR adds

ConvChannelsLastPass: an opt-in post-grad ATen pass that brings channels-last
to conv3d (and conv2d) by graph rewriting only — no patching of PyTorch
internals. It sets layout_optimization=False and owns layout itself:

- Inserts aten.clone(memory_format=channels_last(_3d)) on each conv
  input/weight and marks the clone's meta["val"] channels-last. The clone
  lowering ignores memory_format (a TODO in lowering.py), so the signal
  lives purely in the FX meta strides, which constrain_conv_to_fx_strides
  then reads to pin the conv channels-last, so cuDNN skips its internal
  conversions.
- The clone lowers to a FlexibleLayout Pointwise, so the stride freeze is
  zero-cost: the buffer is allocated channels-last directly and fuses into the
  neighboring elementwise kernel (silu/groupnorm) instead of adding a copy.
- Shared inputs/weights convert once (clone_cache); the conversion is hoisted
  through constant_pad_nd to fuse with the upstream producer.
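The clone-insertion and clone_cache behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the PR's code: `Node` and `insert_channels_last_clones` are hypothetical stand-ins for torch.fx nodes and the real pass, and the `channels_last` flag stands in for the strides recorded in `meta["val"]`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a torch.fx node.
    name: str
    op: str                      # e.g. "input", "weight", "conv", "clone"
    args: list = field(default_factory=list)
    channels_last: bool = False  # stand-in for channels-last strides in meta["val"]


def insert_channels_last_clones(nodes):
    """Insert one channels-last clone per distinct conv operand.

    Conv nodes are updated in place; the returned list is the new node order.
    """
    clone_cache = {}  # source node name -> clone node, so shared operands convert once
    out = []
    for node in nodes:
        if node.op == "conv":
            new_args = []
            for src in node.args:
                if src.name not in clone_cache:
                    clone = Node(f"{src.name}_cl", "clone", [src], channels_last=True)
                    clone_cache[src.name] = clone
                    out.append(clone)  # the clone is placed before its first user
                new_args.append(clone_cache[src.name])
            node.args = new_args
            node.channels_last = True
        out.append(node)
    return out


x = Node("x", "input")
w1 = Node("w1", "weight")
w2 = Node("w2", "weight")
conv1 = Node("conv1", "conv", [x, w1])
conv2 = Node("conv2", "conv", [x, w2])  # shares the input x with conv1

rewritten = insert_channels_last_clones([x, w1, w2, conv1, conv2])
print([n.name for n in rewritten])
# ['x', 'w1', 'w2', 'x_cl', 'w1_cl', 'conv1', 'w2_cl', 'conv2']
```

Both convs end up reading the same `x_cl` node, so the shared activation is converted once rather than once per conv.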

Gating

pass_config.enable_conv_channels_last is a boolean:

- True (default): the pass is registered, and its internal heuristics decide at runtime whether to apply it (it fires only on static, conv-heavy graphs).
- False: off (the pass is not registered at all).
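A minimal sketch of this two-level gating follows. Only the flag name `enable_conv_channels_last` and the "static, conv-heavy" condition come from the description; the `PassConfig` class, the `should_rewrite` function, and the `min_conv_nodes` threshold are hypothetical placeholders, since the actual heuristic is not spelled out here.

```python
from dataclasses import dataclass


@dataclass
class PassConfig:
    # Hypothetical stand-in for the real pass_config object.
    enable_conv_channels_last: bool = True


def is_registered(cfg: PassConfig) -> bool:
    """Level 1: the flag decides whether the pass is registered at all."""
    return cfg.enable_conv_channels_last


def should_rewrite(cfg: PassConfig, has_dynamic_shapes: bool,
                   num_conv_nodes: int, min_conv_nodes: int = 8) -> bool:
    """Level 2: a registered pass still skips dynamic or conv-sparse graphs.

    min_conv_nodes is an assumed threshold chosen for illustration only.
    """
    if not is_registered(cfg):
        return False
    if has_dynamic_shapes:
        return False
    return num_conv_nodes >= min_conv_nodes


cfg = PassConfig()
print(should_rewrite(cfg, has_dynamic_shapes=False, num_conv_nodes=40))  # True
print(should_rewrite(cfg, has_dynamic_shapes=True, num_conv_nodes=40))   # False
print(should_rewrite(cfg, has_dynamic_shapes=False, num_conv_nodes=2))   # False
```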

Performance (WAN 2.2 VAE decode, 540p, static)

| torch.compile | magi_compile | Speedup |
| --- | --- | --- |
| 520 ms | 430 ms | ~1.2x |
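For reference, the speedup figure follows directly from the two latencies in the table; the short calculation below also expresses the same result as a latency reduction. The variable names are illustrative.

```python
baseline_ms = 520.0   # torch.compile latency from the table
optimized_ms = 430.0  # magi_compile latency from the table

speedup = baseline_ms / optimized_ms
latency_reduction = (baseline_ms - optimized_ms) / baseline_ms

print(f"{speedup:.2f}x")          # 1.21x
print(f"{latency_reduction:.1%}")  # 17.3%
```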

Tests

- Logic (test_conv_channels_last_switch.py): Verifies the corrected binary gating (static, conv-heavy graphs are rewritten; dynamic or conv-sparse graphs are skipped) and the configuration registration. Refactored to use shared conftest fixtures and to remove fragile integration logic unrelated to the pass.
- Perf (test_conv_channels_last_perf.py): Uses a static, conv-heavy, VAE-decode-like workload; achieves a 1.22x speedup over vanilla torch.compile (asserts >= 1.20x). Uses the centralized VAEDecoderLike and a scoped config_patch to prevent baseline config leakage.

jiahy0825

LGTM

themistbeforedawn requested a review from jiahy0825 June 22, 2026 04:02

jiahy0825 reviewed Jun 22, 2026

Comment thread magi_compiler/passes/piecewise_graph/post_grad_pass_manager.py Outdated

jiahy0825 reviewed Jun 22, 2026

Comment thread tests/perf_tests/test_conv_channels_last_perf.py Outdated

Comment thread magi_compiler/passes/piecewise_graph/conv_channels_last.py Outdated

Comment thread magi_compiler/passes/piecewise_graph/conv_channels_last.py

themistbeforedawn added 2 commits June 23, 2026 16:54

[Feat] Add channels-last layout optimization pass for conv-heavy models

2578491

[chores] update docs

9a1e5c8

themistbeforedawn force-pushed the feat/conv-channels-last-optimization-pass branch from 8cb57dd to 9a1e5c8 Compare June 23, 2026 08:58

[Refactor] refactor codes

e0d3b49

jiahy0825 approved these changes Jun 23, 2026

jiahy0825 merged commit c497615 into SandAI-org:main Jun 23, 2026
2 checks passed

jiahy0825 merged 3 commits into SandAI-org:main from themistbeforedawn:feat/conv-channels-last-optimization-pass
