Hi, I was trying to use the same VAE approach to guide a text generation task, but during training the KL loss keeps exploding, and after several thousand steps Z becomes NaN. Could you please share some ideas on how to jointly train a model like this?
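For context, here is a minimal sketch of the KL term in this kind of setup. It assumes a diagonal Gaussian posterior q(z|x) and a standard normal prior. It is not the original model's code, and the function names, the clamp range of [-10, 10], and the 10,000-step warm-up are all hypothetical choices. The sketch shows two common stabilizers. The first clamps the encoder's log-variance so that exp(logvar) cannot overflow and turn Z or the loss into NaN. The second linearly anneals the KL weight from 0 to 1.

```python
import math
import random

def _clamp(x, lo, hi):
    return max(lo, min(hi, x))

def kl_diag_gaussian(mu, logvar, min_logvar=-10.0, max_logvar=10.0):
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dimensions.

    logvar is clamped before exponentiating so exp(logvar) stays finite
    (hypothetical bounds; tune for your model).
    """
    total = 0.0
    for m, lv in zip(mu, logvar):
        lv = _clamp(lv, min_logvar, max_logvar)
        total += -0.5 * (1.0 + lv - m * m - math.exp(lv))
    return total

def reparameterize(mu, logvar, min_logvar=-10.0, max_logvar=10.0, rng=random):
    """Sample z = mu + sigma * eps, applying the same clamp to logvar."""
    z = []
    for m, lv in zip(mu, logvar):
        lv = _clamp(lv, min_logvar, max_logvar)
        z.append(m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0))
    return z

def kl_weight(step, warmup_steps=10000):
    """Linear KL annealing: the weight rises from 0 to 1 over warmup_steps."""
    return min(1.0, step / warmup_steps)

def vae_loss(recon_loss, mu, logvar, step, warmup_steps=10000):
    """Reconstruction loss plus the annealed, clamped KL term."""
    return recon_loss + kl_weight(step, warmup_steps) * kl_diag_gaussian(mu, logvar)

if __name__ == "__main__":
    # An extreme log-variance (500.0) is clamped, so the KL stays finite.
    print(kl_diag_gaussian([0.5, -0.2], [0.1, 500.0]))
    print(kl_weight(2500))  # 0.25 with the default 10,000-step warm-up
```

In a PyTorch model, the same idea would be to apply `torch.clamp` to the encoder's log-variance output. This is often combined with gradient clipping via `torch.nn.utils.clip_grad_norm_`. Whether these are enough to prevent the explosion described above depends on the model.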