
Move FSDP recompute tagging to the placement compile path #443

Merged: sanketpurandare merged 1 commit into main from sanketpurandare/stack/5 on May 8, 2026

@sanketpurandare (Contributor) commented May 4, 2026

Stacked PRs:


Move FSDP recompute tagging to the placement compile path

This moves mark_fsdp_all_gather_recomputation out of _apply_placement_common and into apply_placement. The tags are now applied after the sharded graph has been cleaned up, traced, converted from view to reshape, functionalized for fresh index_put_ mutations, written back to the joint descriptors, and prepared for AOT compilation. The common placement helper now only builds and normalizes the parallel graph, while the training compile path applies the FSDP all-gather recomputation tags immediately before invoking aot_compile_joint_with_descriptors.

Keeping the tag insertion at the apply_placement boundary makes the graph mutation order explicit: graph rewrites that affect structure happen first, descriptor state is refreshed, wait_tensor DCE behavior is installed, and then recompute metadata is added to the graph that the joint compiler consumes. This avoids mixing placement graph construction with compile-time recompute metadata and keeps the common helper usable for future placement flows that should not eagerly stamp FSDP recompute tags.
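
One reason this ordering is safer can be shown with a toy rewrite. This is an illustrative assumption, not a claim about the actual passes. Suppose a structural pass (a hypothetical view_to_reshape here) rebuilds nodes rather than editing them in place and does not carry metadata over. Then any tag stamped before that pass is silently lost. Whether a real pass preserves per-node metadata depends on how that pass is written.

```python
from typing import Any, Dict, List

Graph = List[Dict[str, Any]]  # toy node: {"op": str, "meta": dict}


def view_to_reshape(graph: Graph) -> Graph:
    # Hypothetical structural rewrite that emits fresh nodes with empty
    # metadata, as a pass that re-creates nodes might.
    return [{"op": "reshape" if n["op"] == "view" else n["op"], "meta": {}} for n in graph]


def tag_all_gathers(graph: Graph) -> Graph:
    # Toy recompute tagging: annotate all_gather nodes in place.
    for n in graph:
        if n["op"] == "all_gather":
            n["meta"]["recompute"] = "MUST_RECOMPUTE"
    return graph


def fresh_graph() -> Graph:
    return [{"op": "all_gather", "meta": {}}, {"op": "view", "meta": {}}]


def tagged(graph: Graph) -> List[str]:
    return [n["op"] for n in graph if n["meta"].get("recompute")]


tag_first = view_to_reshape(tag_all_gathers(fresh_graph()))
rewrite_first = tag_all_gathers(view_to_reshape(fresh_graph()))
print(tagged(tag_first))      # []
print(tagged(rewrite_first))  # ['all_gather']
```

Applying the metadata only after every structure-changing rewrite removes this failure mode regardless of how each pass treats metadata.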

The compile backend behavior is otherwise unchanged, but the Inductor overlap-scheduling patch set is now centralized in _INDUCTOR_OVERLAP_PATCHES and selected directly when overlap_scheduling is enabled. That keeps autoparallel_backend focused on installing optional functorch AC and Inductor overlap config patches around compile_fx without rebuilding the same overlap dictionary on each backend construction.
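
The backend change amounts to "build the overlap patch dict once, select it by flag, apply it around the compile call". The sketch below uses SimpleNamespace objects as hypothetical stand-ins for the Inductor and functorch config modules, and the patch keys are invented for illustration. The real contents of _INDUCTOR_OVERLAP_PATCHES are not shown in this PR description. PyTorch's config modules (for example torch._inductor.config) expose a comparable patch(...) context manager; a small hand-written one is used here so the example runs without PyTorch.

```python
import contextlib
from types import MappingProxyType, SimpleNamespace
from typing import Any, Callable, Iterator, Mapping, Optional

# Hypothetical stand-ins for config modules; attribute names are illustrative only.
inductor_config = SimpleNamespace(overlap_pass_a=False, overlap_pass_b=False)
functorch_config = SimpleNamespace(ac_option=False)

# Built once at import time and shared by every backend; the read-only view
# keeps one caller from mutating the patches another caller will see.
_INDUCTOR_OVERLAP_PATCHES: Mapping[str, Any] = MappingProxyType(
    {"overlap_pass_a": True, "overlap_pass_b": True}
)


@contextlib.contextmanager
def patch_config(cfg: SimpleNamespace, patches: Mapping[str, Any]) -> Iterator[None]:
    """Temporarily set attributes on cfg; restore the previous values on exit."""
    saved = {key: getattr(cfg, key) for key in patches}
    try:
        for key, value in patches.items():
            setattr(cfg, key, value)
        yield
    finally:
        for key, value in saved.items():
            setattr(cfg, key, value)


def autoparallel_backend_sketch(
    compile_fn: Callable[[Any, list], Any],
    overlap_scheduling: bool,
    ac_patches: Optional[Mapping[str, Any]] = None,
) -> Callable[[Any, list], Any]:
    # Select the shared constant instead of rebuilding a dict per backend.
    inductor_patches = _INDUCTOR_OVERLAP_PATCHES if overlap_scheduling else {}

    def backend(gm: Any, example_inputs: list) -> Any:
        with contextlib.ExitStack() as stack:
            if ac_patches:  # optional functorch AC patches
                stack.enter_context(patch_config(functorch_config, ac_patches))
            stack.enter_context(patch_config(inductor_config, inductor_patches))
            return compile_fn(gm, example_inputs)

    return backend
```

Here compile_fn plays the role of compile_fx. Because the patches are scoped by the context managers, the global config is restored after each compile, whether or not overlap scheduling was requested.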

Authored with Claude.

@meta-cla bot added the CLA Signed label May 4, 2026
@sanketpurandare requested a review from aditvenk May 4, 2026 03:22
@sanketpurandare changed the title from "Export AutoParallel backend compile policy helpers" to "Export AutoParallel backend compile helpers" May 4, 2026
@sanketpurandare changed the title from "Export AutoParallel backend compile helpers" to "Move FSDP recompute tagging to the placement compile path" May 8, 2026
@sanketpurandare merged commit d984c66 into main May 8, 2026 (6 of 10 checks passed)