- replace the mask multiplication with a different operation?
- validate pydub on node
- rewrite perceptual quality to only use size and wetness
- annotate data with attributes used for PEAQ?
- use PESQ as an additional label (is speech more important than music for us?)
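import soundfile as sf
import torch
from asteroid.models import ConvTasNet

# Load the mixture as float32; this assumes a mono file, so the array is 1-D.
mix, sr = sf.read("mix.wav", dtype="float32")
# Add a batch dimension and track gradients with respect to the input samples.
mix = torch.from_numpy(mix).unsqueeze(0).requires_grad_()
print(mix.grad)  # None: no backward pass has run yet

model = ConvTasNet.from_pretrained("JorisCos/ConvTasNet_Libri1Mix_enhsingle_16k")
output = model(mix)
# Backpropagate a tensor of ones, i.e. the gradient of output.sum().
output.backward(torch.ones_like(output))
print(mix.grad)  # gradient of the summed output w.r.t. each input sample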