Can a transformer-based backbone (e.g., ViT) be used when training with WDRO?