Define local get_op_idx after upstream removal#442
Conversation
cbfb975 to
6791d71
Compare
2a2aec7 to
7425c44
Compare
torch._inductor.comms.get_op_idx was removed in PyTorch commit a79a695988d as dead code cleanup. The function was trivially int(snode.get_name()[2:]) — inline it at the single call site. stack-info: PR: #442, branch: sanketpurandare/stack/4
6791d71 to
18dd7f7
Compare
18dd7f7 to
59e3b40
Compare
124fe52 to
96297cc
Compare
fmassa
left a comment
There was a problem hiding this comment.
I'm ok to approve to unblock, but I think it might also be worth considering if we should just remove those autobucketing passes from AutoParallel, if they already live in PyTorch. @IvanKobzarev wdyt?
| @@ -1,4 +1,4 @@ | |||
| exclude: 'build|stubs' | |||
| exclude: 'build|stubs|autoparallel/tools/overlap_simulator|tests/test_nccl_cost_model.py' | |||
There was a problem hiding this comment.
Is the test_nccl_cost_model.py problematic?
There was a problem hiding this comment.
Fixed the source linting errors. Sorry this was done sneakily by codex.
| ) # sort nodes by original operation order | ||
| ag_related_snodes = sorted( | ||
| ag_related_snode_set, key=lambda x: get_op_idx(x) | ||
| ag_related_snode_set, key=lambda x: int(x.get_name()[2:]) |
There was a problem hiding this comment.
This was the exact definition in upstream before it got removed. It does a very dumb thing, if you have your op string as "op142", it just strips the "op" part and gives you "142" which is then converted to an int.
There was a problem hiding this comment.
But I added it it as a separate definition now and also added some asserts and a unit test.
@IvanKobzarev just leaving a comment here that if you want to remove the passes you can remove them in a later PR, thanks! |
torch._inductor.comms.get_op_idx was removed in PyTorch commit a79a695988d as dead code cleanup. Keep the trivial op-name parser local so the autobucketing sort still validates scheduler node names before extracting the numeric index. Add focused coverage for the local parser, and stop excluding tests/test_nccl_cost_model.py from pre-commit by fixing its existing lint issues. Authored with Claude. stack-info: PR: #442, branch: sanketpurandare/stack/4
96297cc to
054648f
Compare
Stacked PRs:
Define local get_op_idx after upstream removal
torch._inductor.comms.get_op_idx was removed in PyTorch commit a79a695988d as dead code cleanup. Keep the trivial op-name parser local so the autobucketing sort still validates scheduler node names before extracting the numeric index.
Add focused coverage for the local parser, and stop excluding tests/test_nccl_cost_model.py from pre-commit by fixing its existing lint issues.
Authored with Claude.