
Discrepancy in Reprojection Loss Calculation: Multiplication vs. r/σ + log(σ) #11

@chansoopark98

Description


Hello,

I have noticed a discrepancy in the implementation of the reprojection loss in the PyTorch version compared to what is described in the original D3VO paper.

The paper theoretically defines the per-pixel reprojection loss (assuming a Laplacian noise model) as:

𝐿= (π‘Ÿ / 𝜎) + log(𝜎)

where π‘Ÿ is the photometric residual and 𝜎 represents the predicted uncertainty. This formulation ensures that when the predicted uncertainty is low (i.e., the pixel is reliable), the residual is weighted more heavily, and when the uncertainty is high, the residual's impact is reduced.

However, in the provided PyTorch code, the reprojection loss is computed as follows:

def compute_reprojection_loss(self, pred, target, sigma):
    """Computes reprojection loss between a batch of predicted and target images"""
    abs_diff = torch.abs(target - pred)
    l1_loss = abs_diff.mean(1, True)

    if self.opt.no_ssim:
        reprojection_loss = l1_loss
    else:
        ssim_loss = (self.ssim(pred, target)).mean(1, True)
        reprojection_loss = 0.85 * ssim_loss + 0.15 * l1_loss

        # Reference: https://github.com/no-Seaweed/Learning-Deep-Learning-1/blob/master/paper_notes/sfm_learner.md
        # transformed_sigma = (10 * sigma + 0.1)
        
        # Exp 1 
        # transformed_sigma = sigma + 0.001
        # reprojection_loss = (reprojection_loss / transformed_sigma) + torch.log(transformed_sigma)

        reprojection_loss = (reprojection_loss * sigma)

    return reprojection_loss

Here, instead of dividing by σ and adding log(σ), the loss is simply multiplied by σ
(i.e., reprojection_loss = reprojection_loss * sigma).

This is contrary to the theoretical formulation: when σ is low (indicating high confidence), division by σ amplifies the residual's weight, whereas multiplication by σ scales it down, inverting the intended behavior. Multiplication also drops the log(σ) term, so nothing discourages the network from driving σ toward zero to shrink the loss.
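A quick numeric check (my own illustration, not repo code) makes the inversion concrete: for a fixed residual, the paper's loss grows as σ shrinks, while the multiplicative variant shrinks with it:

```python
import torch

r = torch.tensor(0.5)  # fixed photometric residual
for sigma in (0.1, 2.0):
    s = torch.tensor(sigma)
    paper_loss = r / s + torch.log(s)   # r/sigma + log(sigma)
    repo_loss = r * s                   # multiplication as in the code
    # small sigma -> paper loss goes up, repo loss goes down
    print(f"sigma={sigma}: paper={paper_loss.item():.3f}, repo={repo_loss.item():.3f}")
```

So the two losses rank confident and uncertain pixels in opposite orders, which is exactly the discrepancy this issue is about.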

Additionally, there are commented-out lines that suggest experimental attempts (e.g., using transformed_sigma = sigma + 0.001) that would be closer to the r/σ + log(σ) formulation, but these are not used in the final code.

Could we please discuss:

1. The rationale behind opting for a simple multiplication over the theoretically motivated formulation?

2. Whether the current approach has been empirically validated to work better, and if so, what the trade-offs are?

3. Suggestions for reconciling the implementation with the original paper's formulation without sacrificing training stability.
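On the last point, one common stabilization (a hypothetical sketch on my part, not something this repo implements) is to have the network predict s = log σ instead of σ, rewriting r/σ + log(σ) as r·exp(−s) + s. This removes the division, needs no epsilon floor, and guarantees σ > 0 by construction:

```python
import torch

def stabilized_reprojection_loss(residual, log_sigma):
    # Algebraically identical to r / sigma + log(sigma) with
    # sigma = exp(log_sigma), but free of division and of any
    # positivity constraint on the network output.
    return residual * torch.exp(-log_sigma) + log_sigma
```

This would only require changing what the uncertainty head outputs (a raw, unconstrained value interpreted as log σ) rather than the loss structure itself.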

Looking forward to your insights on this matter.
