I was checking the PCFLayer class and I have a question regarding the gathered_feat calculation:
```python
gathered_feat = (gathered_feat.view(B, -1, self.num_heads, K, M)
                 * guidance_score.permute(0, 3, 2, 1)).view(B, -1, K, M)
```

(ml-pointconvformer/layers.py, lines 384–385 at a6f53d5)
Here, let's assume that `B=4`, `num_heads=8`, `K=16`, `M=128`, and `in_ch=16`. The `.view()` operation transforms `gathered_feat` from shape `(B, in_ch, K, M)` to `(B, -1, num_heads, K, M)`, which with these numbers substituted is `(4, 16, 16, 128) -> (4, 2, 8, 16, 128)`, a 5-dimensional tensor. This tensor is then element-wise multiplied with `guidance_score`, which after the `.permute()` has shape `(B, num_heads, K, M)`, i.e. `(4, 8, 16, 128)`.
So when `gathered_feat = (gathered_feat.view(B, -1, self.num_heads, K, M) * guidance_score.permute(0, 3, 2, 1))` is evaluated, an element-wise multiplication is performed between a 5-dimensional and a 4-dimensional tensor: `(4, 2, 8, 16, 128) * (4, 8, 16, 128)`.
According to https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics this is possible thanks to broadcasting; however, it only works if:

> When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.
In this case (`B=4`) the check fails: aligning the shapes from the trailing dimension, the 1st dimension of `gathered_feat` (size 2) lines up against the 0th dimension of `guidance_score` (size 4), and the two sizes are neither equal nor 1.
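To make the failure mode concrete, here is a small stdlib-only sketch of the trailing-dimension rule quoted above (`broadcastable` is a hypothetical helper for illustration, not part of the repository):

```python
def broadcastable(a, b):
    """Check PyTorch/NumPy-style broadcast compatibility of two shapes.

    Iterating from the trailing dimension, sizes must either be equal,
    one of them must be 1, or one of them must not exist.
    """
    for x, y in zip(reversed(a), reversed(b)):
        if x != y and x != 1 and y != 1:
            return False
    return True

# B = 4: gathered_feat after .view() vs guidance_score after .permute()
print(broadcastable((4, 2, 8, 16, 128), (4, 8, 16, 128)))  # False: 2 vs 4

# B = 1: size-1 and missing leading dimensions broadcast fine
print(broadcastable((1, 2, 8, 16, 128), (1, 8, 16, 128)))  # True
```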
My question is: am I correct that this only works if `B=1`? Or have I misunderstood some part of the code?
In the PCFLayer class description, you mention that the batch size is usually 1, which led me to open this issue, since in that case everything works as expected:

> Note: batch_size is usually 1 since we are using the packed representation packing multiple point clouds into one. However this dimension needs to be there for pyTorch to work properly.

(ml-pointconvformer/layers.py, line 216 at a6f53d5)
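For what it's worth, one way the shapes could be made to line up for any `B` (purely an illustration on my side, not a change taken from the repository) would be to insert a singleton axis after the batch dimension before the multiply. NumPy follows the same broadcasting semantics, so the shape arithmetic can be checked without PyTorch:

```python
import numpy as np

B, num_heads, K, M, in_ch = 4, 8, 16, 128, 16
gathered_feat = np.zeros((B, in_ch, K, M))
guidance_score = np.zeros((B, M, K, num_heads))

# Inserting a singleton axis after the batch dim aligns the two shapes:
# (B, in_ch // num_heads, num_heads, K, M) * (B, 1, num_heads, K, M)
g = gathered_feat.reshape(B, -1, num_heads, K, M)
s = guidance_score.transpose(0, 3, 2, 1)[:, None, :, :, :]
out = (g * s).reshape(B, -1, K, M)
print(out.shape)  # (4, 16, 16, 128)
```

Without the extra axis, the same multiply raises a broadcasting error for `B > 1`, which matches the mismatch described above.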
Thank you in advance for your answer!