[Pallas] Exclude output-only tensors from pallas_call inputs by norx1991 · Pull Request #1998 · pytorch/helion

norx1991 · 2026-04-09T21:26:25Z

Summary

Builds on #1984 (HBM donate for output-only tensors). Output-only tensors are now excluded from pallas_call inputs entirely, and the launcher returns the pallas_call results as torch tensors. The generated host code captures the return value.

#1984 already eliminated VMEM pressure via HBM in_specs, but the donated tensor still triggered OpSplitMode::kSplitBoth in torch_tpu, inserting an empty.1 broadcast op (~127 us overhead). This PR eliminates that by making input_output_aliases empty for output-only kernels — no donation, no graph split.
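To illustrate why the alias map ends up empty, here is a minimal sketch in plain Python. It is not the code from this PR: `plan_call` and its parameters are hypothetical, and it assumes that `_inplace_indices` marks the outputs the kernel also reads, and that the alias map takes the usual `pallas_call` form of `{input position: output position}`.

```python
def plan_call(num_args, output_indices, inplace_indices, exclude_output_only):
    """Hypothetical helper: decide which args become kernel inputs and
    which inputs are aliased (donated) to outputs."""
    inplace = set(inplace_indices)
    # Args that would be passed to the kernel as inputs.
    input_positions = [
        i
        for i in range(num_args)
        if not (exclude_output_only and i in output_indices and i not in inplace)
    ]
    # Only an arg that is both an input and an output can be aliased.
    aliases = {}
    for out_pos, arg_idx in enumerate(output_indices):
        if arg_idx in input_positions:
            aliases[input_positions.index(arg_idx)] = out_pos
    return input_positions, aliases


# Two args (x, out), with out at index 1 being output-only.
before = plan_call(2, [1], [], exclude_output_only=False)
after = plan_call(2, [1], [], exclude_output_only=True)
print(before)  # ([0, 1], {1: 0})  out is an input and is donated
print(after)   # ([0], {})         out is not an input, nothing to donate
```

Under these assumptions, a tensor that is updated in place would still be passed as an input and aliased; only output-only tensors drop out of the map.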

Generated code before (#1984):

out = torch.empty_like(x)
_launcher(kernel, grid, x, out, _output_indices=[1], _inplace_indices=[], ...)
return out

Generated code after:

out = torch.empty_like(x)
out = _launcher(kernel, grid, x, out, _output_indices=[1], _inplace_indices=[], ...)
return out

The launcher excludes out from pallas_call inputs and returns the result directly. For multiple output-only tensors: out1, out2 = _launcher(...).
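The return convention can be sketched as follows. This is an illustration only, not Helion's launcher: `launcher_sketch`, `run_kernel`, and `double_kernel` are hypothetical stand-ins, plain Python lists stand in for tensors, and the sketch assumes the placeholder `out` is still passed to the launcher only so that it knows how many results to produce and what they should look like.

```python
def launcher_sketch(run_kernel, args, output_indices, inplace_indices):
    """Hypothetical launcher: drop output-only args from the kernel inputs
    and hand the kernel's results back to the caller."""
    inplace = set(inplace_indices)
    output_only = [i for i in output_indices if i not in inplace]
    # Output-only args are never passed as inputs; they act as templates.
    inputs = [a for i, a in enumerate(args) if i not in output_only]
    templates = [args[i] for i in output_only]
    results = run_kernel(inputs, templates)  # stand-in for the pallas_call
    if not results:
        return None
    return results[0] if len(results) == 1 else tuple(results)


def double_kernel(inputs, templates):
    """Toy kernel: every output is 2 * x, computed without reading `out`."""
    x = inputs[0]
    return [[2 * v for v in x] for _ in templates]


x = [1, 2, 3]
out = [0, 0, 0]  # placeholder, like torch.empty_like(x)
out = launcher_sketch(double_kernel, [x, out], [1], [])

out1, out2 = launcher_sketch(double_kernel, [x, [0] * 3, [0] * 3], [1, 2], [])
```

Rebinding `out` to the returned value matters here: the placeholder passed in is never written, so code that ignored the return value would keep reading the uninitialized buffer.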

Authored with Claude Code.

Output-only tensors (allocated with empty/empty_like/new_empty and never read by the kernel) are excluded from pallas_call inputs entirely. The launcher returns the pallas_call results for these tensors, and the generated host code captures the return value. This eliminates both VMEM pressure (output-only tensor not loaded) and the OpSplitMode::kSplitBoth graph split in torch_tpu (empty input_output_aliases for output-only kernels).

meta-cla bot added the CLA Signed label Apr 9, 2026

norx1991 force-pushed the yifeixu/pallas-vmem-fix-v2 branch from 76fcda7 to 6a459f3 on April 9, 2026 21:42

norx1991 force-pushed the yifeixu/pallas-vmem-fix-v2 branch from 6a459f3 to 3a26351 on April 9, 2026 23:06

norx1991 mentioned this pull request Apr 9, 2026

[Pallas] Exclude output-only tensors from Pallas pallas_call inputs to improve performance #1849

Open

norx1991 wants to merge 1 commit into main from yifeixu/pallas-vmem-fix-v2