Hi! Following in the footsteps of #59, #65 and #70, I would like to ask for some further clarification.
These issues suggest that, within a concrete step/iteration, two operations are allowed to require different strips of the same tensor.
Doesn't that contradict the "unified execution grid" idea, given that we would need to execute the same operations simultaneously on different (possibly overlapping) parts of the same tensor?
Closely related: is it indeed possible for a MatMul operation to have the same tensor as both of its inputs, as asked in #59?
It seems to me that this raises some questions. Consider a MatMul Op0 where both inputs are Tensor0 and the output is Tensor1. Both tensors are 512x512, and we use execution granularity [128, 128, 512] (no split-k). In each step we then need one row strip and one column strip from Tensor0. How would, for example, implicit reuse be defined in this case? Would the row strip be compared to the last row strip, and the column strip to the last column strip, as if they were independent tensors?
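To make the question concrete, here is a minimal Python sketch of the two interpretations I can imagine. Everything here (`Strip`, `reused_per_tensor`, `reused_per_slot`) is my own naming, not anything from the actual framework:

```python
# Minimal sketch of two possible implicit-reuse interpretations for the
# self-MatMul case (Tensor0 @ Tensor0). All names here are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Strip:
    tensor: str               # e.g. "Tensor0"
    rows: tuple[int, int]     # [start, end) row range
    cols: tuple[int, int]     # [start, end) column range

last_per_tensor: dict[str, Strip] = {}
last_per_slot: dict[tuple[str, int], Strip] = {}

def reused_per_tensor(strip: Strip) -> bool:
    # Interpretation A: one "last strip" remembered per tensor. The row strip
    # and the column strip of Tensor0 evict each other every step.
    reused = last_per_tensor.get(strip.tensor) == strip
    last_per_tensor[strip.tensor] = strip
    return reused

def reused_per_slot(op: str, slot: int, strip: Strip) -> bool:
    # Interpretation B: one "last strip" per (op, input slot), i.e. the two
    # strips are tracked as if they came from independent tensors.
    reused = last_per_slot.get((op, slot)) == strip
    last_per_slot[(op, slot)] = strip
    return reused

# Op0: 512x512 output, granularity [128, 128, 512] (no split-k).
for i in range(512 // 128):        # output row-block index
    for j in range(512 // 128):    # output column-block index
        row = Strip("Tensor0", (i * 128, (i + 1) * 128), (0, 512))
        col = Strip("Tensor0", (0, 512), (j * 128, (j + 1) * 128))
        print(f"step ({i},{j}): "
              f"A: {reused_per_tensor(row)}/{reused_per_tensor(col)}  "
              f"B: {reused_per_slot('Op0', 0, row)}/{reused_per_slot('Op0', 1, col)}")
```

Under interpretation A the two strips evict each other, so reuse never fires; under interpretation B the row strip is reused across the inner loop. Which of these (if either) matches the intended semantics?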
In a more complex case, we could have column strips of different widths from the same tensor, e.g.:

```
Tensor0, Tensor1 -> Op0 (MatMul) -> Tensor3
Tensor2, Tensor1 -> Op1 (MatMul) -> Tensor4
Tensor3, Tensor4 -> Op2 (MatMul) -> Tensor5
```
All tensors are 512x512, and the execution granularity is [128, 128, 256]. Op0 uses column strips of width 256 from Tensor1 due to split-k, while Op1 uses column strips of width 128 from Tensor1 due to spatial tiling. How would implicit reuse work here? Is the last tile remembered per shape?
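Again as a sketch only (the keying schemes below are my own guesses, not the framework's), here are three ways the "last tile" could plausibly be keyed when Tensor1 is read with two different strip widths:

```python
# Hypothetical sketch: three candidate keys for the "last tile" when Tensor1
# is read with strips of width 256 (Op0) and width 128 (Op1) in the same step.
def make_tracker():
    last = {}
    def check(key, strip):
        reused = last.get(key) == strip
        last[key] = strip
        return reused
    return check

per_tensor = make_tracker()  # (1) keyed by tensor name
per_shape  = make_tracker()  # (2) keyed by (tensor, strip shape)
per_slot   = make_tracker()  # (3) keyed by (op, input slot)

# Two consecutive steps in which both ops happen to re-read the same strips:
op0_strip = ("Tensor1", (0, 512), (0, 256))   # 512x256 column strip
op1_strip = ("Tensor1", (0, 512), (0, 128))   # 512x128 column strip
for step in range(2):
    print("step", step,
          # (1) the two strips evict each other -> never reported as reused
          per_tensor("Tensor1", op0_strip),
          per_tensor("Tensor1", op1_strip),
          # (2) "last tile remembered per shape" -> reused from step 1 on
          per_shape(("Tensor1", (512, 256)), op0_strip),
          per_shape(("Tensor1", (512, 128)), op1_strip),
          # (3) independent history per consuming operand -> also reused here
          per_slot(("Op0", 1), op0_strip),
          per_slot(("Op1", 1), op1_strip))
```

Options (2) and (3) happen to coincide in this example; they would differ if two operations read same-shaped strips of the same tensor.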
It seems to me that allowing different tiles of the same matrix in the same step may create some ambiguity in this simplified model.
Thanks a lot in advance!