Bypass aliases for consumers that would redistribute to the producer #479
Open
fmassa wants to merge 2 commits into
When the ILP picks a placement for an `aten.alias.default` node that differs from its producer's placement, any consumer whose `input_spec` matches the producer's placement would redistribute the alias back to the producer's placement at runtime. The original tensor must then stay alive longer than necessary just to feed the redistribution.

The LLaMA-3 backward graph hits this on every transformer block: the gradient at each residual add (`grad_h2`, S(0)S(1)) feeds an alias that the optimizer assigns to S(0)R for two einsum consumers, but a third consumer (the skip-add) wants S(0)S(1), forcing a redistribution from R back to S(1).

`eliminate_alias_round_trips` runs after `get_solution()` and rewires each such consumer directly to the alias's producer. The alias keeps serving any consumer that genuinely needs its placement; if no users remain, the alias is erased from both the graph and the solution dict.

Unit tests in `tests/test_graph_utils.py` cover rewiring, alias erasure, the no-op case (alias placement matches producer), intermediate redistributions (consumer wants a third placement), and repeated inputs (`x + x`-style consumers). On the LLaMA-3 8B example (32 layers, 128 GPUs), the pass eliminates 32 round-trips.

Authored with Claude.
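For reviewers who want the rewiring rule in concrete form, here is a minimal sketch on a toy graph. It is not the code in this PR: the `Node` class, the string placements, the `out_spec`/`in_specs` dicts, and the function name `eliminate_alias_round_trips_sketch` are all hypothetical stand-ins for the real graph nodes and solution dict, and the skip-add is modeled with a single input for brevity.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, as graph nodes usually are
class Node:
    name: str
    op: str
    inputs: list = field(default_factory=list)


def eliminate_alias_round_trips_sketch(nodes, out_spec, in_specs):
    """Toy version of the pass (hypothetical data model, not the PR's code).

    nodes:    list of Node in topological order
    out_spec: node name -> placement chosen for that node's output
    in_specs: consumer name -> placement expected for each input position
    Returns the number of consumer inputs that were rewired.
    """
    rewired = 0
    for alias in [n for n in nodes if n.op == "alias"]:
        producer = alias.inputs[0]
        producer_placement = out_spec[producer.name]
        if out_spec[alias.name] == producer_placement:
            continue  # no-op case: the alias keeps the producer's placement
        for user in nodes:
            for i, inp in enumerate(user.inputs):
                # Rewire only inputs that would redistribute back to the producer.
                if inp is alias and in_specs[user.name][i] == producer_placement:
                    user.inputs[i] = producer
                    rewired += 1
        # Erase the alias from graph and solution if nothing uses it anymore.
        if not any(inp is alias for n in nodes for inp in n.inputs):
            nodes.remove(alias)
            del out_spec[alias.name]
    return rewired


# Shape of the LLaMA-3 case described above: two einsums want S(0)R,
# the skip-add wants the producer's S(0)S(1).
grad_h2 = Node("grad_h2", "producer")
alias = Node("alias", "alias", [grad_h2])
einsum_a = Node("einsum_a", "einsum", [alias])
einsum_b = Node("einsum_b", "einsum", [alias])
skip_add = Node("skip_add", "add", [alias])
nodes = [grad_h2, alias, einsum_a, einsum_b, skip_add]
out_spec = {"grad_h2": "S(0)S(1)", "alias": "S(0)R"}
in_specs = {
    "einsum_a": ["S(0)R"],
    "einsum_b": ["S(0)R"],
    "skip_add": ["S(0)S(1)"],
}

rewired = eliminate_alias_round_trips_sketch(nodes, out_spec, in_specs)
```

In this toy run only `skip_add` is rewired to `grad_h2`; the alias stays in the graph because the two einsum consumers still need its S(0)R placement. A consumer that wants a third placement is left alone, since its input spec does not equal the producer's placement.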