What configuration does the paper use for ResNet-500, i.e. the number of layers per stage? Was this configuration chosen on the basis of an empirical study?