Enable AllGather Triton Backend #799
mfrancepillois wants to merge 10 commits into ci_maxime_allreduce_triton_rocm_elementwise_rocm from …
Review Summary
This PR extends the collective emitter infrastructure (originally built for AllReduce) to support AllGather via the Triton backend. It adds two kernel implementations (default and swizzled), tuple unpacking for AllGatherStart's …
Key issues found: …
Details in inline comments. Automated review by Claude
Branch updated from 3a41014 to 5b6054a.
Branch updated from 5b6054a to 0db520d.
Wondering: is this branch based on upstream or on xla-0.9.1?
This branch is based on …
```cpp
// group_size_m = min(num_pid_m - first_pid_m, GROUP_SIZE_M)
// pid_m = first_pid_m + ((tile_id % num_pid_in_group) % group_size_m)
// pid_n = (tile_id % num_pid_in_group) // group_size_m
mlir::LogicalResult EmitAllGatherSwizzled(int64_t group_size_m) {
  // … (rest of the function body not shown in this review excerpt)
}
```
Currently, the swizzled kernel is not called, but I'm keeping it until the performance evaluation is complete.
This PR enables the Triton backend for AllGather:
- AllGather op (which returns a tuple)

(This support required the triton-xla atomic operations to be implemented, which is why this branch is based on top of ci_maxime_allreduce_triton_rocm_elementwise_rocm.)