Description
Hello,
I have noticed a discrepancy in the implementation of the reprojection loss in the PyTorch version compared to what is described in the original D3VO paper.
The paper theoretically defines the per-pixel reprojection loss (assuming a Laplacian noise model) as:
L = (r / σ) + log(σ)
where r is the photometric residual and σ is the predicted uncertainty. This formulation ensures that when the predicted uncertainty is low (i.e., the pixel is reliable), the residual is weighted more heavily, and when the uncertainty is high, the residual's impact is reduced.
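To make the weighting concrete, here is a quick numeric check of that formula (the residual value r = 0.5 is arbitrary, chosen only for illustration):

```python
import math

# Numeric check of the paper's per-pixel loss L = r / sigma + log(sigma)
# for a fixed (arbitrary) residual r = 0.5.
r = 0.5
for sigma in (0.1, 1.0, 10.0):
    loss = r / sigma + math.log(sigma)
    print(f"sigma={sigma}: loss={loss:.3f}")
# sigma=0.1:  loss=2.697  (confident pixel: the residual term r/sigma dominates)
# sigma=1.0:  loss=0.500
# sigma=10.0: loss=2.353  (uncertain pixel: the log(sigma) penalty dominates)
```

Note that the loss is minimized at σ = r, so the network cannot drive the loss to zero simply by inflating the uncertainty everywhere.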
However, in the provided PyTorch code, the reprojection loss is computed as follows:
```python
def compute_reprojection_loss(self, pred, target, sigma):
    """Computes reprojection loss between a batch of predicted and target images"""
    abs_diff = torch.abs(target - pred)
    l1_loss = abs_diff.mean(1, True)

    if self.opt.no_ssim:
        reprojection_loss = l1_loss
    else:
        ssim_loss = self.ssim(pred, target).mean(1, True)
        reprojection_loss = 0.85 * ssim_loss + 0.15 * l1_loss

    # Reference: https://github.com/no-Seaweed/Learning-Deep-Learning-1/blob/master/paper_notes/sfm_learner.md
    # transformed_sigma = (10 * sigma + 0.1)
    # Exp 1
    # transformed_sigma = sigma + 0.001
    # reprojection_loss = (reprojection_loss / transformed_sigma) + torch.log(transformed_sigma)
    reprojection_loss = reprojection_loss * sigma
    return reprojection_loss
```
Here, instead of dividing by σ and adding log(σ), the loss is simply multiplied by σ
(i.e., reprojection_loss = reprojection_loss * sigma).
This contradicts the theoretical formulation: when σ is low (indicating high confidence), the term r/σ should weight the residual heavily and produce a large loss, whereas multiplying by σ scales the loss down instead.
Additionally, there are commented-out lines that suggest experimental attempts (e.g., using transformed_sigma = sigma + 0.001) that would be closer to the r/σ + log(σ) formulation, but these are not used in the final code.
Could we please discuss:
The rationale behind opting for a simple multiplication over the theoretically motivated formulation?
Whether the current approach is empirically validated to work better, and if so, what might be the trade-offs?
Suggestions for reconciling the implementation with the original paper's formulation without sacrificing training stability.
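On the last point, one option is to reuse the commented-out transform so that σ stays bounded away from zero before the division and log. A minimal sketch of what that might look like (the function name and the eps default are my suggestion, not from the repo):

```python
import torch

def reprojection_loss_laplacian(reprojection_loss, sigma, eps=1e-3):
    """Sketch of the paper's L = r / sigma + log(sigma).

    sigma is offset by eps (as in the commented-out `sigma + 0.001`
    experiment) so the division and log stay finite even when the
    network predicts sigma near zero.
    """
    transformed_sigma = sigma + eps
    return reprojection_loss / transformed_sigma + torch.log(transformed_sigma)

# A confident pixel (small sigma) now gets its residual amplified,
# while an uncertain pixel mostly pays the log(sigma) penalty.
r = torch.tensor([0.5])
print(reprojection_loss_laplacian(r, torch.tensor([0.099])))  # ~ 0.5/0.1 + log(0.1) ≈ 2.70
print(reprojection_loss_laplacian(r, torch.tensor([9.999])))  # ~ 0.05 + log(10)   ≈ 2.35
```

Whether this trains stably out of the box is an open question; the `10 * sigma + 0.1` variant in the comments suggests the affine transform of σ itself was part of the tuning, so eps would likely need the same treatment.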
Looking forward to your insights on this matter.