We plan to replace the underlying kernel implementation with the newly developed CK tile-programming fmha kernel. It performs much better on MI200 and MI300, with the largest gains on MI300. Once this is done, the current implementation in the main branch will be deprecated.