I believe that line 112 of SJE/sje.py (reproduced below) is meant to normalize the projected image feature vector before taking its dot product with the class embeddings.
line 112: XW = preprocessing.scale(XW)
However, preprocessing.scale standardizes the elements of XW to zero mean and unit variance; it does not divide the vector by its L2 norm. Subtracting the mean changes the direction of XW itself, so the dot product with the class embeddings is no longer taken along the original projected direction, which makes it meaningless as a compatibility score.
Should this be changed to the following?
line 112: XW = XW / np.linalg.norm(XW)
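To make the difference concrete, here is a minimal NumPy sketch. It assumes XW is a 1-D vector; the values in `xw` are made up for illustration, and `standardize` is a hand-written stand-in for what I understand preprocessing.scale to do on a 1-D input (subtract the mean, divide by the population standard deviation), not the library call itself.

```python
import numpy as np


def standardize(v):
    # Hypothetical stand-in for preprocessing.scale on a 1-D vector:
    # zero mean and unit variance across the elements.
    return (v - v.mean()) / v.std()


def l2_normalize(v):
    # Proposed replacement: divide by the Euclidean (L2) norm.
    return v / np.linalg.norm(v)


def cosine(a, b):
    # Cosine similarity; equals 1.0 when a and b point the same way.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up projected feature vector, for illustration only.
xw = np.array([3.0, 4.0, 5.0])

xw_std = standardize(xw)
xw_l2 = l2_normalize(xw)

print("cosine(xw, L2-normalized):", cosine(xw, xw_l2))
print("cosine(xw, standardized): ", cosine(xw, xw_std))
print("norm of L2-normalized:    ", np.linalg.norm(xw_l2))
print("norm of standardized:     ", np.linalg.norm(xw_std))
```

The L2-normalized vector keeps the direction of `xw` and has unit norm, while the standardized vector points in a different direction and has norm sqrt(d) for a d-dimensional input rather than 1. If XW were a 2-D batch rather than a single vector, np.linalg.norm(XW) without an axis argument would return the Frobenius norm of the whole matrix, so a per-row norm (axis=1, keepdims=True) would be needed instead.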
What's surprising to me is that decent results are achieved even with the original version. I tested a few hyperparameter configurations with the new code (by no means the full grid search) and achieved similar results on AWA2 with a lower margin of 0.25, which makes sense given that XW now actually has unit norm.