After the Channel Attention Module, it might be better to apply a convolution layer that maps the feature map back to the original number of channels (usually 3) before applying the Spatial Attention Module, rather than applying the Spatial Attention Module immediately. Otherwise, the Spatial Attention Module may not make sense.
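To make the suggested ordering concrete, here is a minimal NumPy sketch: channel attention, then a 1x1 convolution down to 3 channels, then spatial attention. It is only a shape-level illustration under assumptions, not the module's actual implementation. The function names, the reduction ratio `r`, the 7x7 kernel size, and the random untrained weights (`w1`, `w2`, `w_proj`, `kernel`) are all hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Reweight channels of x (C, H, W) using a shared two-layer MLP."""
    avg = x.mean(axis=(1, 2))  # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))    # (C,) max-pooled descriptor

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU between the two layers

    scale = sigmoid(mlp(avg) + mlp(mx))  # (C,) per-channel weights
    return x * scale[:, None, None]


def pointwise_conv(x, w):
    """1x1 convolution: w has shape (C_out, C_in), x has shape (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


def spatial_attention(x, kernel):
    """Reweight spatial positions of x (C, H, W) with a (2, k, k) kernel."""
    # Pool across the channel axis, giving two maps regardless of C.
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (2, H, W, k, k)
    attn = sigmoid(np.einsum("chwij,cij->hw", windows, kernel))  # (H, W)
    return x * attn[None, :, :]


# Hypothetical sizes and random (untrained) weights, for shape checking only.
rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
w_proj = rng.standard_normal((3, C))     # assumed projection to 3 channels
kernel = rng.standard_normal((2, 7, 7))  # assumed 7x7 spatial kernel

after_channel = channel_attention(x, w1, w2)       # still C channels
projected = pointwise_conv(after_channel, w_proj)  # reduced to 3 channels
out = spatial_attention(projected, kernel)         # same shape as projected

print(after_channel.shape, projected.shape, out.shape)
```

In this sketch the spatial step pools over the channel axis first, so it runs on any channel count; whether inserting the projection actually improves results would have to be checked experimentally.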