
Add GPU-side Gumbel-max sampling for CUDA graph compatibility #18844

Open
Gasoonjia wants to merge 3 commits into cuda-graph from cuda-graph-sampling

Conversation

@Gasoonjia
Contributor

@Gasoonjia Gasoonjia commented Apr 13, 2026

This PR replaces the CPU sampler with a CUDA sampler and fuses the sampler into the forward method, which both eliminates unnecessary data transfer and improves sampling efficiency. Decode performance increases from 113.8 tokens/s to 119.5 tokens/s (roughly 5%).

Once the device support pipeline lands, we should split sampling back out of the forward method.
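
For readers unfamiliar with the technique named in the title, here is a minimal sketch of Gumbel-max sampling in plain Python. It is not the PR's CUDA implementation: the function names `gumbel_max_sample` and `softmax`, the `temperature` parameter, and the clamp constant are all assumptions made for illustration. The idea is that adding independent Gumbel(0, 1) noise to each scaled logit and taking the argmax yields a draw from `softmax(logits / temperature)`. Because that reduces to elementwise operations plus an argmax, it is a plausible fit for running entirely on the GPU.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Reference distribution that Gumbel-max sampling should reproduce."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_max_sample(logits, temperature=1.0, rng=random):
    """Draw one index from softmax(logits / temperature) via Gumbel-max.

    Hypothetical helper for illustration; not the PR's kernel.
    """
    best_index, best_score = -1, -math.inf
    for i, logit in enumerate(logits):
        # rng.random() is in [0, 1); clamp away from 0 so log(u) is finite.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        score = logit / temperature + gumbel
        if score > best_score:
            best_index, best_score = i, score
    return best_index


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [1.0, 2.0, 3.0]
    draws = 20000
    counts = [0] * len(logits)
    for _ in range(draws):
        counts[gumbel_max_sample(logits, rng=rng)] += 1
    print("empirical:", [c / draws for c in counts])
    print("softmax:  ", softmax(logits))
```

Running the script prints the empirical frequencies next to the softmax probabilities, which should agree closely. A GPU version would presumably generate the noise and take the argmax on-device, so that no logits need to be copied back to the host for sampling.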

@pytorch-bot

pytorch-bot bot commented Apr 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18844

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 7 New Failures, 4 Unrelated Failures

As of commit e207ffc with merge base 2eaa16c:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Apr 13, 2026
@Gasoonjia Gasoonjia force-pushed the cuda-graph-sampling branch from f05ebaa to b4f9eca April 13, 2026 19:13
@Gasoonjia Gasoonjia force-pushed the cuda-graph-sampling branch from 1bf973d to 028894e April 13, 2026 21:26
@Gasoonjia Gasoonjia marked this pull request as ready for review April 13, 2026 23:57

Labels

ciflow/cuda, CLA Signed
