In the file resnet_cbam.py, I think line 31 and line 33 are not consistent with the paper. fc1 and fc2 should be nn.Linear, because the paper says:
Both descriptors are then forwarded to a shared network to produce our channel attention map M_c ∈ R^(C×1×1). The shared network is composed of a multi-layer perceptron (MLP) with one hidden layer. To reduce parameter overhead, the hidden activation size is set to R^(C/r×1×1), where r is the reduction ratio.
May I know why you used Conv2d instead of Linear?
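For context, I suspect the two are numerically equivalent here, since the channel descriptor after pooling has spatial size 1×1, so a 1×1 convolution reduces to a matrix multiply. A quick sketch checking this (the layer names `fc1_conv`, `fc1_lin`, etc. are mine, not from resnet_cbam.py):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C, r = 64, 16  # channels and reduction ratio

# MLP built from 1x1 convolutions (the style used in resnet_cbam.py)
fc1_conv = nn.Conv2d(C, C // r, kernel_size=1, bias=False)
fc2_conv = nn.Conv2d(C // r, C, kernel_size=1, bias=False)

# The same MLP built from nn.Linear, as the paper describes
fc1_lin = nn.Linear(C, C // r, bias=False)
fc2_lin = nn.Linear(C // r, C, bias=False)

# Copy the conv weights into the linear layers: a 1x1 conv weight has
# shape [out, in, 1, 1]; squeezing the spatial dims gives [out, in]
with torch.no_grad():
    fc1_lin.weight.copy_(fc1_conv.weight.squeeze(-1).squeeze(-1))
    fc2_lin.weight.copy_(fc2_conv.weight.squeeze(-1).squeeze(-1))

# A pooled channel descriptor has shape [N, C, 1, 1]
x = torch.randn(2, C, 1, 1)

out_conv = fc2_conv(torch.relu(fc1_conv(x)))           # [N, C, 1, 1]
out_lin = fc2_lin(torch.relu(fc1_lin(x.flatten(1))))   # [N, C]

print(torch.allclose(out_conv.flatten(1), out_lin, atol=1e-5))  # True
```

So if this is right, the conv version computes exactly the same function as the paper's MLP, just without needing an explicit flatten/reshape around the shared network.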