Thanks for sharing your wonderful work! The training goes into failure case after several training iterations, and 'Nan' is printed. Though I have set the learning rate to 1e-8, and clip the gradient, the issue still exists. Could you please help me go against the problem. Thank you very much! Best regards.
Thanks for sharing your wonderful work! The training goes into failure case after several training iterations, and 'Nan' is printed. Though I have set the learning rate to 1e-8, and clip the gradient, the issue still exists. Could you please help me go against the problem. Thank you very much! Best regards.