Questions about "NeRAF audio-visual joint training improves vision performance" #7

@yangbang18

Thanks for sharing. Table 2 in the paper shows that NeRAF's joint audio-visual training improves vision performance. After reading the code, I have the following question.

Note that the 3D grid in NeRAF is initialized as a plain tensor rather than as a parameter that requires gradients:

def reset_grid(self, device=None):
    device = self.device if device is None else device
    self.grid = torch.zeros((7, int((self.grid_size[1] - self.grid_size[0]) / self.grid_step),
                             int((self.grid_size[3] - self.grid_size[2]) / self.grid_step),
                             int((self.grid_size[5] - self.grid_size[4]) / self.grid_step)),
                            dtype=torch.float32, device=device)
    # Add coordinates
    grid_coordinates = torch.meshgrid(
        torch.arange(self.grid_size[0] + self.grid_step / 2, self.grid_size[1], self.grid_step),
        torch.arange(self.grid_size[2] + self.grid_step / 2, self.grid_size[3], self.grid_step),
        torch.arange(self.grid_size[4] + self.grid_step / 2, self.grid_size[5], self.grid_step),
        indexing='ij')
    grid_coordinates = torch.stack(grid_coordinates, dim=0)
    self.grid[4:, :, :, :] = grid_coordinates
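
To illustrate what I mean by "a tensor rather than parameters", here is a minimal sketch with a toy module. `GridHolder`, its `field` layer, and the grid shape are hypothetical names and values for illustration, and I am assuming the grid's owner is an `nn.Module`; this is not the NeRAF code.

```python
import torch
from torch import nn


class GridHolder(nn.Module):
    """Hypothetical stand-in for a module that owns a voxel grid."""

    def __init__(self):
        super().__init__()
        # A real trainable layer, registered automatically as parameters.
        self.field = nn.Linear(3, 4)
        # A plain tensor attribute, created the same way as in reset_grid.
        self.grid = torch.zeros((7, 2, 2, 2), dtype=torch.float32)


holder = GridHolder()
param_names = [name for name, _ in holder.named_parameters()]

# Only the Linear layer shows up; the grid is invisible to an optimizer
# built from holder.parameters().
print(param_names)
print(holder.grid.requires_grad)
```

In this sketch an optimizer built from `holder.parameters()` would never update `grid` directly, so the grid can only change through explicit assignments.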

Then, when the 3D grid values are updated during training, the grid is first detached from the computation graph:

self.grid = self.grid.detach()

self.grid[0, xs, ys, zs] = color[:, 0].float().squeeze()
self.grid[1, xs, ys, zs] = color[:, 1].float().squeeze()
self.grid[2, xs, ys, zs] = color[:, 2].float().squeeze()
self.grid[3, xs, ys, zs] = alpha.float().squeeze()

Although this 3D grid guides acoustic modeling, it seems that the audio loss cannot be backpropagated to the NeRF. So why does NeRAF audio-visual joint training improve vision performance?
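
To make the concern concrete, here is a minimal sketch of what `detach()` does to a grid that was filled from a network output. It uses toy tensors only: `color`, the grid shape, and `audio_loss` are hypothetical stand-ins, not the NeRAF pipeline.

```python
import torch


def detach_demo():
    torch.manual_seed(0)
    # Hypothetical stand-in for NeRF colour predictions at 4 voxels.
    color = torch.rand(4, 3, requires_grad=True)
    xs = torch.arange(4)

    # Plain tensor grid, filled in place from the prediction.
    grid = torch.zeros(7, 4)
    grid[0, xs] = color[:, 0]
    tracked_before = grid.requires_grad  # the in-place write is tracked by autograd

    # Same call as in the training code: cut the grid out of the graph.
    grid = grid.detach()

    # Hypothetical audio-side loss computed from the detached grid.
    audio_loss = (grid[0] ** 2).sum()
    return tracked_before, grid.requires_grad, audio_loss.requires_grad


tracked_before, detached_requires_grad, loss_requires_grad = detach_demo()
print(tracked_before, detached_requires_grad, loss_requires_grad)
```

In this sketch the loss computed from the detached grid has no path back to `color`, which is the situation I am asking about. Whether the writes that follow the detach are tracked depends on whether `color` and `alpha` still require gradients at that point, which the excerpt above does not show.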
