What do you think about the network's adaptability to training and inference on audio data? Have you tested it, or do you have an opinion on it?