Trim non-pipeline input grads before caching in bwd_cache by aditvenk · Pull Request #431 · meta-pytorch/autoparallel

aditvenk · 2026-04-22T03:19:22Z

The backward graph may produce gradients for inputs beyond the pipeline activations (e.g. labels when loss is fused into the last stage). get_bwd_send_ops zips bwd_cache with grad_send_info using strict=True, and grad_send_info only has entries for pipeline activation inputs, so extra grads cause a ValueError.

Mirror the trimming that upstream PipelineStage does at torch/distributed/pipelining/stage.py:997.

Authored with Claude.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

aditvenk requested a review from sanketpurandare April 22, 2026 03:19

meta-cla Bot added the CLA Signed label Apr 22, 2026

aditvenk requested a review from xmfan April 22, 2026 03:25

aditvenk closed this Apr 24, 2026

sanketpurandare reopened this Apr 24, 2026

aditvenk wants to merge 1 commit into main from user/avenkataraman/pp-fix
