Clustering loss becomes negative during training #4

@blblp

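Hello, your research is highly innovative. I would like to ask about a problem I ran into while reproducing it: during training, the magnitude of the inter-class term exceeds that of the intra-class term, so the total loss becomes negative and the gradient update direction becomes erratic.
Specifically, in auxi_loss = torch.mean(self.anomaly_score, dim=1) - 0.1 * torch.sum(self.diff_cos, dim=[1,2]) / (self.center_num * self.center_num - self.center_num), the term 0.1 * torch.sum(self.diff_cos, dim=[1,2]) / (self.center_num * self.center_num - self.center_num) is larger than torch.mean(self.anomaly_score, dim=1). This makes auxi_loss negative, and the model then fails to converge. For example, here is the information from one training step: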
dir_loss.mean(): 0.00891896989196539
auxi_loss.mean(): -0.4987548887729645
anomaly_score mean: 0.0077686607837677
diff_cos sum: 22692.25390625
Final loss: -0.453815221786499
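I did not change the parameters in your config, and I also trained on the MVTec dataset. Did you encounter this problem while debugging, and could you offer any pointers? Thank you very much!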
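
To make the sign condition concrete, here is a minimal sketch of the same arithmetic in plain Python (no PyTorch) for a single sample. It assumes anomaly_score is a list of per-center scores and diff_cos is a K x K matrix for that sample. The function name auxi_loss_terms and all sample values are hypothetical and are not taken from the repository. It returns the two terms separately so that each can be logged on its own.

```python
def auxi_loss_terms(anomaly_score, diff_cos, center_num, weight=0.1):
    """Return (intra, inter, loss) for one sample.

    intra: mean of the anomaly scores
    inter: weight * sum(diff_cos) / (K*K - K), the subtracted term
    loss:  intra - inter, which is negative whenever inter > intra
    """
    intra = sum(anomaly_score) / len(anomaly_score)
    pair_count = center_num * center_num - center_num  # ordered pairs with i != j
    total = sum(sum(row) for row in diff_cos)
    inter = weight * total / pair_count
    return intra, inter, intra - inter


# Hypothetical values: 3 centers, off-diagonal entries of 0.5, small scores.
scores = [0.01, 0.02, 0.03]
diff = [[0.0, 0.5, 0.5],
        [0.5, 0.0, 0.5],
        [0.5, 0.5, 0.0]]
intra, inter, loss = auxi_loss_terms(scores, diff, center_num=3)
print(intra, inter, loss)
```

With these made-up numbers the subtracted term (about 0.05) exceeds the mean score (about 0.02), so the loss comes out negative, which is the same sign pattern as in the training step above.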
