Saving output from different ranks #62
Conversation
forklady42
left a comment
A few more suggestions. Feel free to merge after addressing.
```diff
- idx = indices[i]
- pred_i = preds[i].numpy()
- np.save(out_dir / f"{idx}.npy", pred_i)
+ np.save(self.out_dir / f"{idx}.npy", preds[i].squeeze(0).cpu().numpy())
```
It would be safer to include `self.global_rank` in addition to the index in the file name. `DistributedSampler` pads the dataset to be evenly divisible across ranks, which can make for weirdness with the added indices.
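For reference, one way to rank-qualify the file name. This is an illustrative sketch, not the PR's actual code; the `pred_filename` helper and the `rank{r}_{idx}.npy` naming scheme are assumptions:

```python
from pathlib import Path

def pred_filename(out_dir: Path, global_rank: int, idx: int) -> Path:
    """Build a rank-qualified output path for one prediction.

    DistributedSampler pads the dataset so every rank sees the same
    number of samples, so the same index can be processed by more than
    one rank; prefixing the rank keeps those writes from colliding.
    """
    return out_dir / f"rank{global_rank}_{idx}.npy"
```

Usage would then look like `np.save(pred_filename(self.out_dir, self.global_rank, idx), preds[i].squeeze(0).cpu().numpy())`, and the padded duplicates end up in separate, identifiable files.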
@forklady42 do we want this to be saved/logged in the metrics (csv) file as well?
I think the global rank is already in the metrics file name? I'm seeing `tmp_csv = self.tmp_dir / f"metrics_batch_{self.global_rank}_{batch_idx}.csv"`.
Yes, I meant carrying it over when combining the temporary files into `metrics.csv`. If several ranks process the same index, that index would appear more than once in the combined file.
Ah, yes, probably good to include the rank to make the duplicates easier to reason about.
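A sketch of what that merge could look like. This is hypothetical, assuming the `metrics_batch_{rank}_{batch_idx}.csv` naming from the thread; the `combine_metrics` helper and the `rank` column name are not part of the PR:

```python
import csv
from pathlib import Path

def combine_metrics(tmp_dir: Path, out_csv: Path) -> None:
    """Merge per-rank temporary CSVs into one metrics file.

    Each temp file is assumed to be named metrics_batch_{rank}_{batch_idx}.csv.
    The rank is recorded as an extra column so that duplicate indices
    (from DistributedSampler padding) are easy to trace back to the
    rank that produced them.
    """
    rows = []
    for f in sorted(tmp_dir.glob("metrics_batch_*_*.csv")):
        # File stem looks like "metrics_batch_0_3" -> rank is field 2.
        rank = int(f.stem.split("_")[2])
        with f.open(newline="") as fh:
            for row in csv.DictReader(fh):
                row["rank"] = rank
                rows.append(row)
    if rows:
        with out_csv.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
```

With the rank column in place, duplicated indices in `metrics.csv` can be filtered or at least attributed to the padded ranks rather than looking like silent double counting.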
hanaol
left a comment
Addressed the comments.
Handles logging/saving the performance metric across multiple ranks.