I understand from #2 that these losses are intentionally not retrieved during evaluation because they do not exist in the dictionary. Is there a reason why the DINO and CLIP features are not inferred from the evaluation dataset and the respective losses computed?
I am wondering whether I have misunderstood something and some limitation makes this impossible. I would like to look at the CLIP and DINO losses to check whether the feature lifting converges.