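PyTorch executes GPU ops asynchronously. When you call `.numpy()`, it has to wait for all queued ops to finish (a host-device sync) and copy the data to the CPU. The result then has to be copied back to the GPU as soon as you call torch ops on it again. Doing this on every loop iteration is extremely slow.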
yy1 = np.maximum(dets[i, 0].to("cpu").numpy(), dets[pos:, 0].to("cpu").numpy())
Soft-NMS/softnms_pytorch.py, line 51 in 95dab79
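A minimal sketch of the alternative is to keep the IoU computation on the device with `torch.maximum`/`torch.minimum` (available since PyTorch 1.7), so the loop never round-trips through `.numpy()`. The helper name `iou_one_vs_rest` is hypothetical. The `[y1, x1, y2, x2]` column layout is inferred from the quoted line (`yy1` comes from column 0), and the `+ 1` pixel-inclusive width convention is an assumption; drop it if your boxes use continuous coordinates.

```python
import torch


def iou_one_vs_rest(dets: torch.Tensor, i: int, pos: int) -> torch.Tensor:
    """IoU of box i against every box in dets[pos:], using only torch ops.

    Hypothetical helper. Assumes dets[:, 0:4] holds [y1, x1, y2, x2].
    """
    # dets[i, k] is a 0-dim tensor; torch.maximum/minimum broadcast it against
    # the 1-D slice dets[pos:, k], and the result stays on dets.device.
    yy1 = torch.maximum(dets[i, 0], dets[pos:, 0])
    xx1 = torch.maximum(dets[i, 1], dets[pos:, 1])
    yy2 = torch.minimum(dets[i, 2], dets[pos:, 2])
    xx2 = torch.minimum(dets[i, 3], dets[pos:, 3])

    # Pixel-inclusive (+1) widths (assumed convention); negative overlap -> 0.
    w = (xx2 - xx1 + 1).clamp(min=0)
    h = (yy2 - yy1 + 1).clamp(min=0)
    inter = w * h

    areas = (dets[:, 3] - dets[:, 1] + 1) * (dets[:, 2] - dets[:, 0] + 1)
    return inter / (areas[i] + areas[pos:] - inter)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dets = torch.tensor(
        [[0.0, 0.0, 9.0, 9.0],
         [0.0, 0.0, 9.0, 9.0],
         [20.0, 20.0, 29.0, 29.0],
         [5.0, 5.0, 14.0, 14.0]],
        device=device,
    )
    print(iou_one_vs_rest(dets, i=0, pos=1))
```

Inside the Soft-NMS loop, this would replace the numpy-based `np.maximum`/`np.minimum` lines, so `dets` never leaves the GPU. Note that any later `.item()` call or Python-level branch on a GPU tensor still forces a sync.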