
Insert contiguous clone before collectives that require it #376

Draft
fmassa wants to merge 1 commit into main from fmassa/contiguous_clone_before_collective

fmassa commented Mar 20, 2026

NCCL collectives like `all_gather_into_tensor` and `reduce_scatter_tensor` require contiguous input tensors, but AP's collective insertion via DTensor redistribute doesn't guarantee this: the input may be non-contiguous after upstream ops like `transpose` or `view`.
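The layout problem can be reproduced in plain PyTorch on CPU, without running any collective or NCCL. This sketch shows that a transpose yields a strided view that is not contiguous, and that `aten.clone` with `memory_format=torch.contiguous_format` materializes a row-major copy with the same values. It also shows the difference from `Tensor.contiguous()`, which hands back the same storage when the input is already contiguous, whereas `clone` always allocates.

```python
import torch

# A row-major 2x3 tensor is contiguous.
x = torch.arange(6, dtype=torch.float32).reshape(2, 3)
print(x.is_contiguous())  # True

# t() only swaps strides; no data moves, so the view is not contiguous.
xt = x.t()
print(xt.stride(), xt.is_contiguous())  # (1, 3) False

# clone(memory_format=contiguous_format) materializes a row-major copy,
# which is the flat layout collectives like all_gather_into_tensor expect.
xc = torch.ops.aten.clone.default(xt, memory_format=torch.contiguous_format)
print(xc.stride(), xc.is_contiguous())  # (2, 1) True
print(torch.equal(xc, xt))  # True: same values, new layout

# contiguous() reuses the storage of an already-contiguous tensor,
# while clone() always allocates a new buffer.
same = x.contiguous().data_ptr() == x.data_ptr()
copied = torch.ops.aten.clone.default(
    x, memory_format=torch.contiguous_format
).data_ptr() != x.data_ptr()
print(same, copied)  # True True
```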

This adds a graph pass that walks the parallel FX graph and inserts `aten.clone(memory_format=torch.contiguous_format)` before each such collective whose input isn't already a contiguous clone. The pass runs after `cleanup_graph` in `api.py`, so it operates on the final parallel graph. When the input is already contiguous, the clone leaves its layout unchanged, although `aten.clone` still allocates a copy unless a later pass removes it.
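A minimal sketch of such a pass with `torch.fx`, assuming the collective's tensor input is its first positional argument. The names `insert_contiguous_clones`, `build_demo`, and `STAND_IN_TARGETS` are hypothetical and this is not the PR's actual code. In a real parallel graph the targets would presumably be the functional collectives (for example `torch.ops._c10d_functional.all_gather_into_tensor.default` and `torch.ops._c10d_functional.reduce_scatter_tensor.default`); so that the sketch runs without NCCL, it uses `aten.view(x, [-1])` as a stand-in, since that op also rejects the transposed layout. The sketch does not populate `node.meta` (e.g. fake-tensor `"val"`), which a real pass over a traced graph would likely need to do.

```python
import torch
import torch.fx as fx

CLONE = torch.ops.aten.clone.default


def _is_contiguous_clone(node) -> bool:
    # True if `node` is already aten.clone(..., memory_format=contiguous_format).
    return (
        isinstance(node, fx.Node)
        and node.op == "call_function"
        and node.target == CLONE
        and node.kwargs.get("memory_format") == torch.contiguous_format
    )


def insert_contiguous_clones(gm: fx.GraphModule, collective_targets) -> int:
    """Hypothetical sketch: clone the first input of every call to a target in
    `collective_targets` into contiguous layout. Returns the number of clones added."""
    graph = gm.graph
    added = 0
    for node in list(graph.nodes):
        if node.op != "call_function" or node.target not in collective_targets:
            continue
        inp = node.args[0]
        # Skip non-tensor inputs and inputs that are already contiguous clones.
        if not isinstance(inp, fx.Node) or _is_contiguous_clone(inp):
            continue
        with graph.inserting_before(node):
            clone = graph.call_function(
                CLONE, (inp,), {"memory_format": torch.contiguous_format}
            )
        # Rewire only this call; other users of `inp` keep the original tensor.
        node.replace_input_with(inp, clone)
        added += 1
    graph.lint()
    gm.recompile()
    return added


# Stand-in target (assumption): aten.view(x, [-1]) fails on the transposed
# layout, mimicking a collective that requires contiguous input.
STAND_IN_TARGETS = {torch.ops.aten.view.default}


def build_demo() -> fx.GraphModule:
    # x -> transpose -> "collective" (stand-in) -> output
    g = fx.Graph()
    x = g.placeholder("x")
    xt = g.call_function(torch.ops.aten.t.default, (x,))
    g.output(g.call_function(torch.ops.aten.view.default, (xt, [-1])))
    return fx.GraphModule(torch.nn.Module(), g)


gm = build_demo()
inp = torch.arange(6, dtype=torch.float32).reshape(2, 3)
try:
    gm(inp)  # fails: the transposed input is not contiguous
except RuntimeError:
    print("non-contiguous input rejected")
print(insert_contiguous_clones(gm, STAND_IN_TARGETS))  # 1
print(gm(inp))  # tensor([0., 3., 1., 4., 2., 5.])
print(insert_contiguous_clones(gm, STAND_IN_TARGETS))  # 0: already a contiguous clone
```

Rewiring with `replace_input_with` on the collective node, rather than `replace_all_uses_with` on its input, keeps any other consumers of the original tensor pointing at the uncloned value, and the already-a-clone check makes the pass safe to run more than once.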

Authored with Claude.

meta-cla bot added the CLA Signed label Mar 20, 2026
fmassa marked this pull request as draft March 21, 2026 09:53