Hi author.
A recently published paper proposes additive margin softmax (AM-softmax), which seems easier to train than SphereFace:
https://arxiv.org/abs/1801.05599
I implemented the main part of this paper in TensorFlow here:
https://github.com/Joker316701882/Additive-Margin-Softmax
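For reference, here is a minimal NumPy sketch of the loss as I understand it from the paper. It is not the code from the repo above. The function name `am_softmax_loss` is my own, and the defaults `s=30` and `m=0.35` are my assumption of the values the paper suggests, so please correct me if they differ from what you would use.

```python
import numpy as np

def am_softmax_loss(embeddings, weights, labels, s=30.0, m=0.35):
    """Sketch of AM-softmax: softmax cross-entropy on s * (cos(theta) - m).

    embeddings: (batch, dim) features, weights: (dim, classes),
    labels: (batch,) integer class ids. s and m are assumed defaults.
    """
    # L2-normalize features and class weights so their product is cos(theta).
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = x @ w  # shape (batch, classes)

    # Subtract the margin m from the target-class cosine only, then scale by s.
    onehot = np.eye(w.shape[1])[labels]
    logits = s * (cos - m * onehot)

    # Numerically stable log-softmax followed by mean cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -(log_prob * onehot).sum(axis=1).mean()
```

Setting `m=0` reduces this to a plain normalized softmax loss, which is a quick way to check whether the margin term itself is what changes the training behaviour.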
Using exactly the same hyperparameters as the authors, I can also only reach 98.x% accuracy. So I'm wondering whether the gap comes from differences between TensorFlow's and Caffe's low-level implementations (the optimizer, for example), which would make it hard to reproduce the reported performance even with identical parameters. Have you ever tried the losses from papers such as AM-softmax (CosFace) or ArcFace?
I'd be glad to hear your thoughts on this.