Socket timeout during multi-machine, multi-GPU training on Windows #2

@Linxu59

Description

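Hello, I am running your code on Windows with two machines, each with one GPU, and I get the error below. (I changed the backend in the code to gloo, because NCCL is not supported on Windows.) Have you tried multi-machine, multi-GPU training with PyTorch on Windows?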
Traceback (most recent call last):
  File "main.py", line 92, in <module>
    world_size = args.world_size
  File "D:\Software\Anaconda3\envs\torch18\lib\site-packages\torch\distributed\distributed_c10d.py", line 510, in init_process_group
    timeout=timeout))
  File "D:\Software\Anaconda3\envs\torch18\lib\site-packages\torch\distributed\distributed_c10d.py", line 592, in _new_process_group_helper
    timeout=timeout)
RuntimeError: Socket Timeout
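
The traceback shows the timeout being raised inside `init_process_group`, that is, while the processes are still trying to find each other. One possible cause, not confirmed anywhere in this thread, is that the second machine cannot open a TCP connection to the address and port used for rendezvous (for example because a firewall blocks the port). A minimal sketch of a reachability check using only the Python standard library is shown here; the helper name `can_reach` and its parameters are illustrative and are not part of the repository or of PyTorch.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within timeout_s.

    Hypothetical diagnostic helper: it only tests plain TCP reachability and
    says nothing about whether the process group itself is configured correctly.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

To use it, start the rank 0 process first so that it is waiting inside `init_process_group`, then on the other machine call `can_reach` with the same host and port that were passed to `init_method`. A `False` result suggests a network or firewall problem rather than a problem in the training code. A `True` result does not rule out other causes, such as the two machines being started with mismatched `rank` or `world_size` values.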
