Very nice work on the paper! I have been trying to implement the proposed network over the past few days and ran into several questions.
-
First, I found that the FC layer after the descriptor always hurts the result. On CUB200-2011 I get a top-1 recall of 72 from the final l2 layer, but 76 from the GD(1) layer (under the MG configuration). Could this be strongly influenced by the auxiliary classification branch? Should the ranking loss branch be given more weight?
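To make the weighting question concrete, here is a minimal sketch of what I mean by giving the ranking branch more weight. The names `combined_loss` and `ranking_weight` are hypothetical and not taken from the paper, and the plain weighted sum is only my assumption about how the two branch losses would be combined.

```python
def combined_loss(ranking_loss, cls_loss, ranking_weight=1.0):
    """Weighted sum of the two branch losses.

    Hypothetical helper: `ranking_weight` is an assumed knob, not a
    parameter from the paper. A value above 1.0 emphasizes the ranking branch.
    """
    return ranking_weight * ranking_loss + cls_loss
```

With `ranking_weight=1.0` this is the unweighted sum I currently use; the question is whether a larger value is expected to help.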
-
Second, how many iterations do you train for, and what is your learning-rate schedule? (I train for 4000 iterations with Adam and divide the lr by 10 at iterations 1000, 2000, and 3000.)
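For reference, my schedule corresponds roughly to the sketch below, written in plain Python so it does not depend on any framework. The function name `lr_at` and the `base_lr` argument are placeholders of mine, and I am assuming the drop takes effect at the milestone iteration itself.

```python
def lr_at(iteration, base_lr, milestones=(1000, 2000, 3000), gamma=0.1):
    """Step-decay learning rate: multiply by `gamma` at each milestone.

    `base_lr` is a placeholder for the initial learning rate.
    Assumes the decay applies starting at the milestone iteration itself.
    """
    # Count how many milestones have been reached so far.
    passed = sum(1 for m in milestones if iteration >= m)
    return base_lr * gamma ** passed


# Print the learning rate at the start of each stage of the 4000-iteration run.
for it in (0, 1000, 2000, 3000):
    print(it, lr_at(it, base_lr=1.0))
```

So over the 4000 iterations the lr goes through four stages, ending at one thousandth of its initial value.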
-
Third, do you use a bias term in every FC layer?
-
Last, do you freeze the BN layers in the backbone?