Fuse chained allgathers on different subgroups into single full-mesh allgather by fmassa · Pull Request #472 · meta-pytorch/autoparallel

fmassa · 2026-05-21T12:30:29Z

When the forward and backward use different shardings for a weight, the backward's recomputed allgather chain can decompose into two sequential allgathers on different mesh dimensions — e.g. S(0)S(0) → RS(0) (dp allgather) then RS(0) → RR (tp allgather) — with cancelling permute pairs between them. While the forward path already fuses S(0)S(0) → RR into a single collective via _optimize_same_nd_sharding_as_1d, these recomputed backward chains bypass that optimization and produce two separate NCCL kernel launches.
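To make the equivalence concrete, here is a toy, pure-Python model of a 2-D mesh. It is only a sketch and not the code in this PR. It assumes a row-major layout in which rank `(i, j)` holds chunk `i * cols + j`, and the helper names (`gather_along`, `gather_flat`, `initial_shards`) are hypothetical. Under that assumption, gathering over mesh dim 1 and then over mesh dim 0 gives every rank the same data as one allgather over the flattened mesh, while the opposite order leaves the chunks permuted.

```python
from itertools import product


def gather_along(shards, mesh_shape, dim):
    """Simulate an allgather over a single mesh dimension.

    `shards` maps mesh coordinates to the list of chunks held by that rank.
    Each rank receives the concatenation of the lists held by all ranks that
    differ from it only along `dim`, in increasing coordinate order.
    """
    out = {}
    for coord in shards:
        gathered = []
        for k in range(mesh_shape[dim]):
            peer = coord[:dim] + (k,) + coord[dim + 1:]
            gathered.extend(shards[peer])
        out[coord] = gathered
    return out


def gather_flat(shards, mesh_shape):
    """Simulate one allgather over the flattened (row-major) mesh."""
    order = list(product(*(range(n) for n in mesh_shape)))
    full = [chunk for coord in order for chunk in shards[coord]]
    return {coord: list(full) for coord in shards}


def initial_shards(mesh_shape):
    """Assumed layout: rank (i, j) holds chunk number i * cols + j."""
    rows, cols = mesh_shape
    return {(i, j): [i * cols + j] for i in range(rows) for j in range(cols)}


mesh_shape = (2, 4)
shards = initial_shards(mesh_shape)

# Two chained allgathers, higher mesh dim first (descending dim order).
chained = gather_along(gather_along(shards, mesh_shape, dim=1), mesh_shape, dim=0)
# One allgather on the flattened mesh.
fused = gather_flat(shards, mesh_shape)
# Same two allgathers in ascending dim order, for comparison.
ascending = gather_along(gather_along(shards, mesh_shape, dim=0), mesh_shape, dim=1)
```

In this model `chained == fused` holds on every rank, and `ascending` differs from `fused` only by a permutation of the chunks.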

This adds fuse_chained_allgathers, a graph pass that detects these chains and replaces them with a single allgather on the flattened mesh process group. The pass validates that both allgathers are on known mesh subgroups in descending dim order, their group sizes multiply to the full mesh size, and the intermediate view ops compose to the identity (verified via FakeTensor shape/stride metadata). The pass runs on the partitioned forward and backward graphs during the first compilation and on the inference path, gated on mesh.ndim > 1.
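The validation conditions listed above can be sketched as a single predicate. This is a simplified illustration under stated assumptions, not the implementation in this PR: `GatherInfo`, `TensorMeta`, and `can_fuse` are hypothetical names, and the "view ops compose to the identity" check is approximated by comparing the shape/stride metadata of the tensor entering the intermediate view chain with the one leaving it.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class GatherInfo:
    """Hypothetical summary of one allgather node in the graph."""
    mesh_dim: int    # mesh dimension whose subgroup the allgather runs on
    group_size: int  # number of ranks in that subgroup


@dataclass(frozen=True)
class TensorMeta:
    """Hypothetical stand-in for FakeTensor shape/stride metadata."""
    shape: tuple
    stride: tuple


def can_fuse(first, second, mesh_shape, chain_input, chain_output):
    """Return True if two chained allgathers look safe to fuse (sketch)."""
    # The pass is gated on multi-dimensional meshes.
    if len(mesh_shape) <= 1:
        return False
    # Both allgathers must run on known mesh subgroups.
    for g in (first, second):
        if not 0 <= g.mesh_dim < len(mesh_shape):
            return False
        if g.group_size != mesh_shape[g.mesh_dim]:
            return False
    # The allgathers must appear in descending mesh-dim order.
    if first.mesh_dim <= second.mesh_dim:
        return False
    # Together the two subgroups must cover the full mesh.
    if first.group_size * second.group_size != prod(mesh_shape):
        return False
    # Assumption: the view ops between the allgathers are treated as the
    # identity when shape and stride are unchanged across the chain.
    return chain_input == chain_output


meta = TensorMeta(shape=(8, 16), stride=(16, 1))
ok = can_fuse(GatherInfo(1, 4), GatherInfo(0, 2), (2, 4), meta, meta)
```

On a 2x4 mesh this accepts a dim-1 allgather followed by a dim-0 allgather. It rejects the reverse order, a chain whose view ops change the metadata, and a 3-D mesh where the two groups do not cover every rank.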

Authored with Claude.

…allgather

When weights are placed as `S(0)S(0)` on a multi-dim mesh, `apply_sharding` decomposes the `S(0)S(0) → RR` redistribution into per-dim allgathers: a dp-dim allgather followed by a tp-dim allgather, with cancelling permute pairs between them. Each pair produces two separate NCCL kernel launches when a single full-mesh allgather would suffice.

This adds `fuse_chained_allgathers`, a graph pass that detects these chains and replaces them with a single allgather on the flattened mesh process group. The pass validates that both allgathers are on known mesh subgroups in descending dim order, their group sizes multiply to the full mesh size, and the intermediate view ops compose to the identity (verified via FakeTensor shape/stride metadata). The pass runs on the partitioned forward and backward graphs during the first compilation and on the inference path, gated on `mesh.ndim > 1`.

Authored with Claude.

meta-cla bot added the CLA Signed label May 21, 2026

fmassa added 2 commits May 21, 2026 12:33

Add missing file

ca8936e

Bugfix

bf4c912

fmassa marked this pull request as draft May 29, 2026 06:44

fmassa wants to merge 3 commits into main from fmassa/fuse-allgather
