Conversation
…ed several functions to multigpu
…g up spawn-based jobs
allevitan
left a comment
So, I finally got a chance to read through the code. All in all I think it's really close, but I'm still seeing too many issues to be comfortable pulling it in now.
The two big overall issues that I'm seeing:
- It doesn't provide a speedup when run on the HPC node I have access to (a 4-GPU node with V100 cards, allocated exclusively). See below.
- It occasionally fails to kill the child processes on control-C.
And there are a few smaller comments as well.
@yoshikisd, I know you won't have time to continue development going forward, so my plan is to take over this branch (if that's okay with you) and work through these comments and issues before trying to fold it in at a later time. Sorry for not having more time to get to the review earlier!
for param in model.parameters():
    if param.requires_grad:
        dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
        param.grad.data /= world_size
Are we sure this should be an average, and not a sum? I would have expected that we'd want to sum gradients from the models across different GPUs.
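For context on the sum-vs-average question: with a plain SGD-style update the two choices differ only by a factor of world_size in the effective learning rate, so averaging keeps the step size independent of the GPU count (adaptive optimizers like Adam complicate this, since they rescale gradients). A pure-Python sketch with made-up numbers, not code from the PR:

```python
# Hypothetical per-worker gradients for a single parameter (illustrative numbers).
worker_grads = [0.2, 0.4, 0.6, 0.8]   # one gradient per GPU/subprocess
world_size = len(worker_grads)
lr = 0.1
w = 1.0

# Averaging: all-reduce SUM, then divide by world_size (what the diff does).
avg_grad = sum(worker_grads) / world_size
w_after_avg = w - lr * avg_grad

# Summing: all-reduce SUM without the divide.
sum_grad = sum(worker_grads)
w_after_sum = w - lr * sum_grad

# The summed step equals an averaged step taken with an effective
# learning rate of lr * world_size.
w_equiv = w - (lr * world_size) * avg_grad
assert abs(w_after_sum - w_equiv) < 1e-12
```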
def sync_loss(loss):
    """
    Synchronizes Rank 0 GPU's learning rate to all participating GPUs.
Typo - this should say "synchronizes loss"
world_size = get_world_size()
t.cuda.set_device(rank)
if rank == 0:
Needs a check for verbosity (and the same for the other print statements).
print('[INFO]: RNG seed synchronized across all subprocesses.')
def cleanup():
add a verbosity flag, default True
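One way to keep the verbosity checks consistent across all the print statements would be a small rank-and-verbosity-gated print helper. A sketch under my own naming (`rank0_print` is not part of the PR):

```python
import sys

def rank0_print(message, rank=0, verbose=True, file=sys.stdout):
    # Hypothetical helper: print only from the rank-0 subprocess,
    # and only when verbose is enabled (defaulting to True, per the review).
    if verbose and rank == 0:
        print(message, file=file)
```

Calls like `print('[INFO]: ...')` in setup/cleanup could then become `rank0_print('[INFO]: ...', rank=rank, verbose=verbose)`.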
backend: str = 'nccl',
timeout: int = 30,
seed: int = None,
verbose: bool = False):
I think a default of "True" will be best
This looks good overall, but the more I think about it, the more I feel the multi-GPU print protections might be more compact if they lived elsewhere. I want to think about it a bit.
# Avg and sync gradients for multi-GPU jobs
if self.multi_gpu_used:
    multigpu.sync_and_avg_grads(model=self.model,
Check on the consequences of averaging vs. summing; look here as well when potentially changing the earlier line.
@@ -0,0 +1,70 @@
import cdtools
Add a comment or docstring indicating how it should be run
I think I want to suggest moving much of the plot protection/save protection somewhere else, but I need to think on it a bit more.
loss_mean_list, loss_std_list, \
    _, _, speed_up_mean_list, speed_up_std_list \
    = multigpu.run_speed_test(fn=reconstruct,
I get an error on this line when running the test:

RuntimeError: torch.multiprocessing.spawn was detected as the launching method, but either rank, world_size, master_addr, or master_port has not been explicitly defined. Please ensure that either these parameters have been explicitly defined, MASTER_ADDR/MASTER_PORT have been defined as environment variables, or launch the multi-GPU job with torchrun.
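Reading the error text, the spawn path apparently needs the rendezvous variables defined before the process group is initialized (torchrun sets them for you). A hedged sketch of what each spawned worker might set; the helper name and default address/port are my own, not from the PR:

```python
import os

def set_spawn_env(rank, world_size,
                  master_addr="127.0.0.1", master_port="29500"):
    # Hypothetical helper: define the environment variables that
    # torch.distributed expects when launching via
    # torch.multiprocessing.spawn rather than torchrun.
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = master_port
    os.environ["RANK"] = str(rank)
    os.environ["WORLD_SIZE"] = str(world_size)
    # ...the real worker would then call
    # torch.distributed.init_process_group(...) after this.
```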
Hey Abe, please go ahead and work your magic with this project! I won't have time in the future to work with this on a GPU cluster.
The spiritual successor to #17, this PR enables reconstruction scripts to be executed as multi-GPU jobs initialized with `torchrun` or `torch.multiprocessing.spawn`. Currently, only AdamReconstructor can take advantage of multi-GPU acceleration. Multi-GPU processing is based on distributed data parallelism, where each subprocess/GPU works on its own portion of the data and the resulting gradients are synchronized across processes.
What's new
- `cdtools.tools.multigpu`: Contains functions that enable CDTools reconstructions to run as multi-GPU jobs.
- `Ptycho2DDataset`, `CDataset`, and `CDIModel` only save/plot if they're running on the 'Rank 0' GPU subprocess (i.e., plots are generated only by one subprocess rather than all subprocesses).
- `torchrun` or `spawn`: Two example scripts, gold_ball_ptycho_torchrun.py and gold_ball_ptycho_spawn.py, show how to set up the scripts accordingly.
- `--runmultigpu` starts up the multi-GPU pytests, which use up to 2 GPUs.
- `cdtools.tools.multigpu.run_speed_test`: The example script gold_ball_ptycho_speedtest.py shows how to set up the test. The implementation is also much simpler than in [feature] Multi-GPU support #17.
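The rank-0 save/plot protection described above can be sketched as a simple guard (illustrative only; the actual hooks live inside `Ptycho2DDataset`/`CDataset`/`CDIModel`, and `maybe_save` is a name I made up):

```python
def maybe_save(save_fn, rank, *args, **kwargs):
    # Hypothetical sketch: invoke a save/plot callable only on the
    # rank-0 subprocess, so one process writes output instead of all
    # of them racing to write the same file or open the same figure.
    if rank == 0:
        return save_fn(*args, **kwargs)
    return None
```

Every other rank becomes a no-op, which is why plots appear once per job rather than once per GPU.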