Remove inter-trial device syncs during autotuning. by romerojosh · Pull Request #116 · NVIDIA/cuDecomp

romerojosh · 2026-03-17T22:24:38Z

Currently, the cuDecomp autotuning implementation enforces a device synchronization between trials in order to query timings from recorded cudaEvents. These synchronizations are not strictly necessary, and they can pollute the timings: draining the GPU work queue between trials exposes additional kernel launch latency.

This PR updates the autotuning implementation to remove inter-trial device syncs where possible.
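
For readers unfamiliar with the pattern, here is a minimal, generic sketch of deferred event timing. It is not the code from this PR. `trial_kernel` and `time_trials_deferred` are hypothetical names, and it assumes all trials run on a single stream. Each trial gets its own start/stop event pair, the trials are enqueued back to back, and the host waits once on the last stop event before reading every elapsed time.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort on any CUDA runtime error.
#define CHECK_CUDA(call)                                                   \
  do {                                                                     \
    cudaError_t err_ = (call);                                             \
    if (err_ != cudaSuccess) {                                             \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_));  \
      std::exit(EXIT_FAILURE);                                             \
    }                                                                      \
  } while (0)

// Hypothetical stand-in for the work timed in one autotuning trial.
__global__ void trial_kernel(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Times `ntrials` back-to-back trials on `stream` with a single host-side wait.
// Returns the elapsed time of each trial in milliseconds.
std::vector<float> time_trials_deferred(int ntrials, int n, cudaStream_t stream) {
  if (ntrials <= 0 || n <= 0) return {};

  float* data = nullptr;
  CHECK_CUDA(cudaMalloc(&data, static_cast<size_t>(n) * sizeof(float)));
  CHECK_CUDA(cudaMemsetAsync(data, 0, static_cast<size_t>(n) * sizeof(float), stream));

  // One start/stop event pair per trial, so no timing has to be read early.
  std::vector<cudaEvent_t> starts(ntrials), stops(ntrials);
  for (int t = 0; t < ntrials; ++t) {
    CHECK_CUDA(cudaEventCreate(&starts[t]));
    CHECK_CUDA(cudaEventCreate(&stops[t]));
  }

  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  for (int t = 0; t < ntrials; ++t) {
    CHECK_CUDA(cudaEventRecord(starts[t], stream));
    trial_kernel<<<blocks, threads, 0, stream>>>(data, n);
    CHECK_CUDA(cudaEventRecord(stops[t], stream));
    // Deliberately no cudaDeviceSynchronize() here: the next trial is
    // enqueued while this one may still be running on the GPU.
  }
  CHECK_CUDA(cudaGetLastError());

  // Work in a stream completes in order, so waiting on the last stop event
  // guarantees that every earlier event has completed as well.
  CHECK_CUDA(cudaEventSynchronize(stops[ntrials - 1]));

  std::vector<float> ms(ntrials);
  for (int t = 0; t < ntrials; ++t) {
    CHECK_CUDA(cudaEventElapsedTime(&ms[t], starts[t], stops[t]));
    CHECK_CUDA(cudaEventDestroy(starts[t]));
    CHECK_CUDA(cudaEventDestroy(stops[t]));
  }
  CHECK_CUDA(cudaFree(data));
  return ms;
}
```

A caller would create a stream with `cudaStreamCreate`, call `time_trials_deferred(ntrials, n, stream)`, and compare the returned per-trial times. The trade-off is that event pairs must stay alive until the final wait, so the number of live events grows with the number of trials.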

Signed-off-by: Josh Romero <joshr@nvidia.com>

romerojosh · 2026-03-17T22:38:20Z

/build

github-actions · 2026-03-17T22:38:29Z

🚀 Build workflow triggered! View run

github-actions · 2026-03-17T22:49:08Z

✅ Build workflow passed! View run

romerojosh added 2 commits March 17, 2026 15:14

Remove inter-trial device syncs during autotuning.

005d496

Signed-off-by: Josh Romero <joshr@nvidia.com>

Formatting.

8f3ae34

Signed-off-by: Josh Romero <joshr@nvidia.com>

romerojosh merged commit 2d8e1cb into main Mar 19, 2026
4 checks passed

romerojosh deleted the reduce_autotuning_syncs branch March 23, 2026 21:50

romerojosh merged 2 commits into main from reduce_autotuning_syncs
