Hello, I really appreciate your nice work.
I hope this message finds you well. I have two questions regarding the calculation of the Hessian matrix in your code. Specifically, I'm looking at the function where you calculate the second-order derivatives for each parameter with respect to all parameters:
row = self.gradient(grad[j], inputs[i:], retain_graph=True)[j:]
(1) Why is only the [j:] part of the result taken? Is it assumed that the derivative has no effect on the preceding parameters?
(2) Additionally, when assigning values, why is the assignment done as follows? Could you please explain the reasoning behind these specific assignments?
out.data[ai, ai:].add_(row.clone().type_as(out).data)  # ai's row
if ai + 1 < n:
    out.data[ai + 1:, ai].add_(row.clone().type_as(out).data[1:])  # ai's column
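To make both questions concrete, here is a minimal sketch of how I currently read the fill pattern. It assumes the intent is to exploit the symmetry of the Hessian, which I have not confirmed from the code. It is not the project's implementation: the toy function, `grad`, `grad_row`, and `hessian_upper_fill` are hypothetical names of mine, and central finite differences stand in for the autograd call.

```python
def grad(x):
    """Analytic gradient of the toy function f(x) = x0^2*x1 + x1*x2^3 + x0*x2."""
    x0, x1, x2 = x
    return [2 * x0 * x1 + x2, x0 ** 2 + x2 ** 3, 3 * x1 * x2 ** 2 + x0]


def grad_row(j, x, eps=1e-5):
    """Row j of the Hessian: central finite difference of grad[j] w.r.t. each x[k].

    This stands in for the autograd call; it is an assumption of this sketch.
    """
    row = []
    for k in range(len(x)):
        hi, lo = list(x), list(x)
        hi[k] += eps
        lo[k] -= eps
        row.append((grad(hi)[j] - grad(lo)[j]) / (2 * eps))
    return row


def hessian_upper_fill(x):
    """Keep only row[j:] and mirror it into column j, assuming H[j][k] == H[k][j]."""
    n = len(x)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        row = grad_row(j, x)[j:]  # entries (j, j), (j, j+1), ..., (j, n-1)
        for offset, value in enumerate(row):
            out[j][j + offset] += value  # row j, columns j..n-1
        for offset, value in enumerate(row[1:], start=1):
            out[j + offset][j] += value  # column j, rows j+1..n-1 (diagonal skipped)
    return out


def hessian_full(x):
    """Reference: compute every row in full, with no symmetry shortcut."""
    return [grad_row(j, x) for j in range(len(x))]


x = [1.0, 2.0, 3.0]
H_fill = hessian_upper_fill(x)
H_full = hessian_full(x)
```

If this reading is right, the entries before index j are skipped because they were already written as column entries while processing earlier rows, not because they are zero, and the `[1:]` in the column assignment avoids adding the diagonal entry twice. Please correct me if the actual reasoning is different.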
Thank you very much for your time and effort in maintaining this project. Your help is greatly appreciated.
Best regards,
Shun Lu