fix: tensor dispatch with TP enabled by hann-wang · Pull Request #23 · AMD-AGI/ALTO

hann-wang · 2026-06-02T04:51:39Z

When TP is enabled, mm/addmm is dispatched via __torch_dispatch__ rather than __torch_function__. We have to call the func manually to make sure the LPT kernels are correctly invoked.

Note: this is a temporary fix! Autograd backward does not work in __torch_dispatch__.

Copilot

Pull request overview

This PR aims to make the training weight wrapper tensor dispatch behave correctly when Tensor Parallelism (TP) is enabled, so that distributed weight movement and GEMM/grouped GEMM paths don’t accidentally drop the wrapper semantics needed for low-precision routing.

Changes:

Preserve the wrapper subclass across c10d.scatter_ (used by TP to distribute weights).
Special-case GEMM-like ops and _grouped_mm in __torch_dispatch__ to avoid the generic unwrap-and-call path.
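
The routing described in the two changes above can be pictured with a small torch-free toy. This is only a sketch under stated assumptions: the names PASSTHROUGH_OPS, GEMM_OPS and route are hypothetical and do not come from ALTO, the contents of GEMM_OPS are a guess (the PR does not list what gemm_ops contains), and the branch order is assumed from the `elif` in the diff.

```python
# Toy, torch-free model of the dispatch routing described in this PR.
# All names (PASSTHROUGH_OPS, GEMM_OPS, route) are hypothetical stand-ins.

# Ops whose result should keep the wrapper subclass (e.g. TP weight scatter).
PASSTHROUGH_OPS = {"scatter_"}

# Assumed contents; the PR does not show what gemm_ops actually holds.
GEMM_OPS = {"mm", "addmm", "matmul"}


def route(op_name: str) -> str:
    """Return the path an op would take in this simplified model."""
    if op_name in PASSTHROUGH_OPS:
        # Keep the wrapper so later GEMMs still see the low-precision weight.
        return "preserve-wrapper"
    if op_name in GEMM_OPS or op_name == "_grouped_mm":
        # Hand the call to the subclass' GEMM overrides without unwrapping.
        return "delegate-to-torch-function"
    # Everything else falls through to the generic path.
    return "unwrap-and-call"


if __name__ == "__main__":
    for name in ("scatter_", "mm", "_grouped_mm", "add"):
        print(f"{name:12s} -> {route(name)}")
```

The point of the sketch is only the ordering: GEMM-like ops are intercepted before the generic unwrap path, so the wrapper semantics are not dropped.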

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

+        elif func.__name__ in gemm_ops or func.__name__ == "_grouped_mm":
+            # Delegate to the subclass' GEMM / grouped_mm overrides without
+            # unwrapping the wrapper tensor, avoiding __torch_dispatch__ recursion.
+            return cls.__torch_function__(func, types, args, kwargs or {})


+    # required for TP - scatter_ is used to distribute weights
+    torch.ops.c10d.scatter_.default,


fix: tensor dispatch with TP enabled

524deed

Copilot AI review requested due to automatic review settings June 2, 2026 04:51

Copilot started reviewing on behalf of hann-wang June 2, 2026 04:51

Copilot AI reviewed Jun 2, 2026

Comment thread alto/kernels/dispatch/tensor.py Outdated

Delegate to the subclass' function

1891b5b

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 2, 2026 05:00

Copilot started reviewing on behalf of hann-wang June 2, 2026 05:00

hann-wang merged commit 47b4886 into main Jun 2, 2026

hann-wang deleted the han/fix-tp-dispatch branch June 2, 2026 05:06

Copilot AI reviewed Jun 2, 2026

hann-wang merged 2 commits into main from han/fix-tp-dispatch

2 participants
