Thanks for the question.
-
Suppose two operations in a subgraph produce output matrices of sizes 256x256 and 128x256, respectively. Can I use an execution granularity of 128x128 and simply mask/silently ignore the execution when the spatial grid exceeds the boundaries of the smaller matrix? -> Yes, this is valid. The execution granularity [w, h, k] sets the tile size, not the grid dimensions. Different operations in the same subgraph may have different tile counts based on their output sizes; each operation's output is fully tiled and computed. This is standard output-stationary scheduling, as shown in several PROBLEM.md examples.
-
Additionally, is it allowed for a single subgraph to have multiple output operations of different types, specifically one being a MatMul and the other a Pointwise operation? -> Yes, as long as your schedule produces every result at least once.
Originally posted by @yarongmu-google in #58
According to the clarification, some tile executions can be masked to respect the tensor boundaries. In that case, how do we specify the iteration order? Do we simply specify it for the tensor with the most iterations (that is, the largest tile grid)?
Also, the clarification in #20 introduced the notion of "sharing a same iteration space", which seems related to the issue raised in #58. Can you clarify exactly what that notion means?
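
To make the first question concrete, here is a minimal sketch of the interpretation I have in mind. It is only my reading, not something taken from PROBLEM.md: the helper names (`tile_grid`, `masked_schedule`), the op names `op_a`/`op_b`, and the row-major traversal are all hypothetical, and I am assuming the sizes are given as rows x columns. The loop walks the tile grid of the largest output and marks an operation as inactive (masked) whenever the tile index falls outside that operation's own grid.

```python
import math

def tile_grid(rows, cols, tile_h, tile_w):
    """Number of tiles along each dimension for one output matrix."""
    return math.ceil(rows / tile_h), math.ceil(cols / tile_w)

def masked_schedule(shapes, tile_h, tile_w):
    """Iterate over the largest tile grid; list which ops are active per tile.

    `shapes` maps a (hypothetical) op name to its output size (rows, cols).
    The row-major order used here is an assumption, not a rule from the spec.
    """
    grids = {name: tile_grid(r, c, tile_h, tile_w)
             for name, (r, c) in shapes.items()}
    max_rows = max(g[0] for g in grids.values())
    max_cols = max(g[1] for g in grids.values())

    schedule = []
    for i in range(max_rows):
        for j in range(max_cols):
            # An op is masked at (i, j) if that tile lies outside its own grid.
            active = [name for name, (gr, gc) in grids.items()
                      if i < gr and j < gc]
            schedule.append(((i, j), active))
    return schedule

# The example from #58: outputs of 256x256 and 128x256, tile size 128x128.
shapes = {"op_a": (256, 256), "op_b": (128, 256)}
schedule = masked_schedule(shapes, tile_h=128, tile_w=128)
for step, active in schedule:
    print(step, active)
```

Under this reading, `op_a` runs on all four tiles and `op_b` only on the two tiles in the first tile row, so the iteration order would be specified once, over the 2x2 grid. Is that the intended behavior?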