[TPU][Pallas]Fix example/cross_entropy.py on Pallas TPU by yarongmu-google · Pull Request #2002 · pytorch/helion

yarongmu-google · 2026-04-10T22:39:06Z

The kernels currently have 2 common issues that need support:

1. Long (64-bit integer) types are not supported in Pallas/Mosaic (XLA does support them, but Helion doesn't go through XLA).
2. Directly indexing into vectors in HBM.

Issue 2 is the bigger fix here.
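The first issue can be illustrated with a minimal NumPy sketch of narrowing 64-bit dtypes to their 32-bit counterparts before data reaches a backend that lacks 64-bit support. The names `_NARROW_64_TO_32`, `narrow_dtype`, and `narrow_array` are hypothetical and are not taken from Helion's `backend.py`; the actual mapping in this PR may differ.

```python
import numpy as np

# Hypothetical table for illustration only; not Helion's actual mapping.
_NARROW_64_TO_32 = {
    np.dtype("int64"): np.dtype("int32"),
    np.dtype("uint64"): np.dtype("uint32"),
    np.dtype("float64"): np.dtype("float32"),
}


def narrow_dtype(dtype):
    """Return the 32-bit counterpart of a 64-bit dtype, else the dtype unchanged."""
    dtype = np.dtype(dtype)
    return _NARROW_64_TO_32.get(dtype, dtype)


def narrow_array(x):
    """Cast an array to its narrowed dtype, refusing integer casts that would overflow."""
    target = narrow_dtype(x.dtype)
    if target == x.dtype:
        return x
    if np.issubdtype(x.dtype, np.integer) and x.size:
        info = np.iinfo(target)
        if int(x.min()) < info.min or int(x.max()) > info.max:
            raise OverflowError(f"values do not fit in {target}")
    return x.astype(target)
```

Class labels for cross entropy are bounded by the vocabulary size, so under that assumption narrowing `int64` labels to `int32` loses no information.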

After this PR:

Benchmark Results

| Implementation | Time (ms) | Speedup |
| --- | --- | --- |
| helion | 0.3826 | 1.10x |
| torch | 0.4208 | 1.00x (ref) |

…py.py to avoid unaligned HBM gather

This makes the cross_entropy kernel hardware agnostic. Computing the target logits with a boolean mask over the streamed dense block keeps the work entirely within TensorCore/VMEM on TPU and keeps memory accesses fully coalesced on GPU. It eliminates the unaligned 1D HBM gather, which Pallas TC kernels do not natively support without SC DMA staging.
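The mask-based selection described in this commit message can be sketched in plain NumPy. This is an illustration of the idea only, not the Helion kernel from this PR: the function names and the `block_v` parameter are assumptions, and the Python loop stands in for the kernel's streaming over blocks of the vocabulary dimension.

```python
import numpy as np


def target_logits_gather(logits, labels):
    """Baseline: one scattered element read per row (the gather being avoided)."""
    return logits[np.arange(logits.shape[0]), labels]


def target_logits_masked(logits, labels, block_v=128):
    """Select each row's target logit with a boolean mask over dense blocks."""
    n, v = logits.shape
    out = np.zeros(n, dtype=logits.dtype)
    for start in range(0, v, block_v):
        block = logits[:, start:start + block_v]          # dense, contiguous tile
        cols = np.arange(start, start + block.shape[1])   # column ids of this tile
        mask = cols[None, :] == labels[:, None]           # True only at the target column
        out += np.where(mask, block, 0).sum(axis=1)       # at most one nonzero per row
    return out


def cross_entropy(logits, labels, block_v=128):
    """Mean cross entropy: logsumexp(logits) minus the target logit, per row."""
    row_max = logits.max(axis=1)
    lse = row_max + np.log(np.exp(logits - row_max[:, None]).sum(axis=1))
    return float((lse - target_logits_masked(logits, labels, block_v)).mean())
```

The masked version reads every logit instead of one per row, but the kernel already streams those blocks to compute the log-sum-exp, so the extra cost is a compare and a select on data that is already on chip.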

norx1991 and others added 7 commits April 2, 2026 19:06

[Pallas] Add test for Pallas OOB slice when reduction_loops doesn't divide dim (pytorch#1937)

e40e4e7

Merge branch 'pytorch:main' into main

97b35b3

[Pallas] Add test for Pallas OOB slice when reduction_loops doesn't divide dim (pytorch#1937)

fdbd20b

Merge branch 'pytorch:main' into main

8201686

fix(pallas): add mapping for 64-bit dtypes to 32-bit to avoid Pallas errors and fix zero division in block size calculation

9878a8f

style: apply ruff and pyrefly auto-formatting across project files

a978600

meta-cla bot added the CLA Signed label Apr 10, 2026

yarongmu-google and others added 2 commits April 10, 2026 15:41

Merge branch 'pytorch:main' into main

54a1bdc

Merge branch 'main' into fix-pallas-dtype-mapping to resolve conflicts in backend.py

7c6c52e

yarongmu-google wants to merge 9 commits into pytorch:main from yarongmu-google:fix-pallas-dtype-mapping
