Device handling improvements. by romerojosh · Pull Request #100 · NVIDIA/TorchFort

romerojosh · 2025-12-08T18:32:49Z

Related to #99, this PR makes some additional device handling improvements in the TorchFort backend. In particular, it adds:

- Use of CUDAGuard objects within the supervised and RL functions to set the current CUDA device to the expected model device for the duration of the call and restore the previous device afterwards. This does not fix any current issue in the implementation, but it better prepares the code for direct use of CUDA runtime calls (e.g. CUDA graph capture/replay).
- Checks on the user-supplied CUDA stream to ensure it is on the same device as the model.
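The two behaviors described above can be illustrated with a toy model. This is only a sketch of the pattern, not TorchFort's code: the `sim` namespace, `DeviceGuard`, `Stream`, `check_stream_device`, and `train_step` are hypothetical names, and the "current device" is simulated with a thread-local integer so the example compiles without CUDA. In a real backend, the guard role would presumably be played by `c10::cuda::CUDAGuard`, and the stream's device would be queried from the CUDA driver or runtime.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace sim {

// Simulated per-thread "current device" (stand-in for the CUDA runtime state).
inline int& current_device() {
  thread_local int dev = 0;
  return dev;
}

// Hypothetical stream handle that remembers the device it was created on.
struct Stream {
  int device;
};

// RAII guard: make `device` current on construction, restore the previous
// device on destruction (including when an exception unwinds the scope).
class DeviceGuard {
public:
  explicit DeviceGuard(int device) : prev_(current_device()) {
    current_device() = device;
  }
  ~DeviceGuard() { current_device() = prev_; }
  DeviceGuard(const DeviceGuard&) = delete;
  DeviceGuard& operator=(const DeviceGuard&) = delete;

private:
  int prev_;
};

// Reject a user-supplied stream that lives on a different device than the model.
inline void check_stream_device(const Stream& stream, int model_device) {
  if (stream.device != model_device) {
    throw std::invalid_argument(
        "stream is on device " + std::to_string(stream.device) +
        " but the model is on device " + std::to_string(model_device));
  }
}

// Hypothetical training entry point: validate the stream, then run with the
// model's device current. Returns the device seen inside the guarded scope.
inline int train_step(int model_device, const Stream& stream) {
  check_stream_device(stream, model_device);
  DeviceGuard guard(model_device);
  return current_device();
}

} // namespace sim
```

With this sketch, calling `sim::train_step(1, sim::Stream{1})` while device 0 is current runs the step on device 1 and leaves device 0 current afterwards, whereas passing a stream created on device 0 throws before any device switch happens.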

Signed-off-by: Josh Romero <joshr@nvidia.com>


romerojosh · 2025-12-08T18:47:45Z

/build_and_test

github-actions · 2025-12-08T18:47:54Z

🚀 Build workflow triggered! View run

github-actions · 2025-12-08T19:00:02Z

✅ Build workflow passed! View run

azrael417

LGTM. What are the remaining issues we need to take care of?

romerojosh added 6 commits December 8, 2025 10:15

Better handling of non-default GPU/multi-GPU per process use cases.

5525970

Signed-off-by: Josh Romero <joshr@nvidia.com>

Add device context switch checks to supervised learning tests.

281a6d6

Signed-off-by: Josh Romero <joshr@nvidia.com>

Adding tests. Conditional use of cuStreamGetDevice based on CUDA driv…

12fce11

…er version. Signed-off-by: Josh Romero <joshr@nvidia.com>

Update tests.

83ebe32

Signed-off-by: Josh Romero <joshr@nvidia.com>

Update tests.

0963170

Signed-off-by: Josh Romero <joshr@nvidia.com>

Formatting fixes.

9e90507

Signed-off-by: Josh Romero <joshr@nvidia.com>

romerojosh requested a review from azrael417 December 8, 2025 19:01

azrael417 approved these changes Jan 5, 2026

romerojosh merged commit 9c82a64 into master Jan 5, 2026
4 checks passed

romerojosh deleted the device_handling_improvements branch January 6, 2026 17:55
