[fix]: bugfix for RAY_EXPERIMENTAL_NOSET_ASCEND/CUDA_RT_VISIBLE_DEVICES in RL by xiazhahe · Pull Request #151 · ISEEKYAN/mbridge

xiazhahe · 2026-06-11T11:46:36Z

When a reinforcement learning framework such as Verl is used with the environment variable RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES or RAY_EXPERIMENTAL_NOSET_CUDA_RT_VISIBLE_DEVICES set, device visibility changes: instead of each process seeing only its own card (so that every card appears as rank 0), each process sees all cards on the node. However, device = get_device_name() means that tensors are loaded onto rank 0 by default. As a result, every process can end up loading its tensors onto rank 0, causing an OOM (out of memory) error at weight = f.get_tensor(name).

Therefore, this PR loads each tensor onto the current device of the calling process, as reported by torch, instead of onto rank 0 by default.
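The idea can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the helper names `current_device_index` and `indexed_device` are hypothetical, and it assumes the device name is either `cuda` or `npu` (the latter requiring torch_npu to be installed).

```python
from typing import Optional


def current_device_index(device_name: str) -> Optional[int]:
    """Return this process's current accelerator index, or None if unavailable.

    Hypothetical helper for illustration; only "cuda" and "npu" are handled.
    """
    if device_name not in ("cuda", "npu"):
        return None
    try:
        import torch
    except ImportError:
        return None
    # torch.cuda ships with PyTorch; torch.npu is only present with torch_npu.
    backend = getattr(torch, device_name, None)
    if backend is None or not backend.is_available():
        return None
    return backend.current_device()


def indexed_device(device_name: str, index: Optional[int]) -> str:
    """Build an explicit device string such as "cuda:3"; fall back to the bare name."""
    return device_name if index is None else f"{device_name}:{index}"


# Example: resolve the device for the current process before loading weights.
device = indexed_device("cuda", current_device_index("cuda"))
```

With an explicit index in the device string, a loader that accepts a device argument places each tensor on the card owned by the calling process, even when all cards are visible to it.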

bugfix for RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES in RL

9b35aef

xiazhahe changed the title ~~bugfix for RAY_EXPERIMENTAL_NOSET_ASCEND/CUDA_RT_VISIBLE_DEVICES in RL~~ [fix]: bugfix for RAY_EXPERIMENTAL_NOSET_ASCEND/CUDA_RT_VISIBLE_DEVICES in RL Jun 11, 2026

xiazhahe wants to merge 1 commit into ISEEKYAN:main from xiazhahe:main
