
Fix discrepancy in a_stride_m/a_stride_k for transposed dot kernels #9840

Open
copybara-service[bot] wants to merge 1 commit into master from test_892702737

Currently, for transposed kernels, the stride along k (passed as `a_stride_m`, because k is the row dimension when A is transposed) is the stride of `tile_k` values of k rather than of a single value of k. This is inconsistent with several places that assume the stride is for one value of k, and it forces code to multiply or divide strides to make them consistent.
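To make the two conventions concrete, here is a minimal sketch assuming a plain row-major layout for transposed A (K rows of M elements, so k is the row dimension). `offset_old` and `offset_new` are hypothetical helpers, not functions from the code base, and the real packed layout may differ. Under the old convention the stride spans `tile_k` values of k and has to be divided back down to step by one value of k; under the new one it is used directly.

```python
# Hypothetical helpers (not part of the real code base). Assumed layout:
# A is stored transposed as K rows of M elements, row-major, so element
# (k, m) of A^T lives at k * (stride of one k) + m.


def offset_old(k, m, stride_tile_k, tile_k):
    """Old convention: the stride spans tile_k values of k.

    Addressing a single value of k requires dividing the stride back down,
    which is only exact when it is a multiple of tile_k.
    """
    assert stride_tile_k % tile_k == 0, "stride must be a multiple of tile_k"
    return k * (stride_tile_k // tile_k) + m


def offset_new(k, m, stride_k):
    """New convention: the stride is for one value of k and is used as-is."""
    return k * stride_k + m


M, TILE_K = 5, 4
stride_k = M                       # one value of k = one row of M elements
stride_tile_k = TILE_K * stride_k  # what the old convention would pass

# Both conventions address the same elements; only the old one needs rescaling.
for k in range(8):
    for m in range(M):
        assert offset_old(k, m, stride_tile_k, TILE_K) == offset_new(k, m, stride_k)
```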

In particular, `run_dot` multiplies the stride by k, while the kernels do not, so `run_dot` and a kernel cannot use the same stride. This discrepancy makes it hard to refactor `run_dot` to capture the strides it passes to the kernels, which I think is a necessary step toward addressing some issues (packing A/B in the loops of `run_dot`).
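With a per-k stride, the outer loop and the kernel can share a single value. The sketch below assumes the same hypothetical row-major layout (A stored transposed as K x M, B as K x N); `toy_kernel` and `toy_run_dot` are illustrative stand-ins for a kernel and `run_dot`, not their real signatures, and they simply compute C = A @ B on Python lists in blocks of `tile_k` values of k.

```python
# Hypothetical sketch: toy_kernel and toy_run_dot are illustrative names,
# not the real kernel or run_dot signatures. Assumed layout: A is stored
# transposed (K rows of M elements, row-major) and B is K x N, row-major.


def toy_kernel(a, a_offset, a_stride_k, b, b_offset, n, k_count, m, acc):
    """Accumulate k_count values of k into acc (row m of C).

    a_stride_k is the stride of ONE value of k, so the kernel steps with it
    directly, without rescaling.
    """
    for kk in range(k_count):
        a_val = a[a_offset + kk * a_stride_k + m]
        for j in range(n):
            acc[j] += a_val * b[b_offset + kk * n + j]


def toy_run_dot(a_t, b, m_dim, n_dim, k_dim, tile_k):
    """Compute C = A @ B in blocks of tile_k values of k."""
    a_stride_k = m_dim  # stride of one value of k, shared with the kernel
    c = []
    for m in range(m_dim):
        acc = [0] * n_dim
        for k0 in range(0, k_dim, tile_k):
            k_count = min(tile_k, k_dim - k0)  # last block may be partial
            # The caller multiplies the per-k stride by k to find the block
            # start, then passes the SAME stride to the kernel.
            toy_kernel(a_t, k0 * a_stride_k, a_stride_k,
                       b, k0 * n_dim, n_dim, k_count, m, acc)
        c.append(acc)
    return c


# A = [[1, 2, 3, 4], [5, 6, 7, 8]] (M=2, K=4), stored transposed:
a_t = [1, 5, 2, 6, 3, 7, 4, 8]
# B = [[1, 0], [0, 1], [1, 1], [2, -1]] (K=4, N=2), row-major:
b = [1, 0, 0, 1, 1, 1, 2, -1]
print(toy_run_dot(a_t, b, m_dim=2, n_dim=2, k_dim=4, tile_k=3))  # [[12, 1], [28, 5]]
```

Under the old convention, `toy_run_dot` would instead hand the kernel `tile_k * a_stride_k`, and the kernel would have to divide it by `tile_k` before stepping within a block, which is the kind of multiply/divide round trip described above.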

PiperOrigin-RevId: 892702737