Hi all, could you kindly explain the difference between auto-tensorize and auto-tensorize-v4? In my amos-gemm benchmarking (results below), the two strategies perform very similarly.

| M | K | N | amos-1000-step-fp16-simple (ms) | amos-1000-step-fp16 (ms) |
|---|---|---|---|---|
| 2 | 2 | 2 | Failed to Run | Failed to Run |
| 4 | 4 | 4 | Failed to Run | Failed to Run |
| 8 | 8 | 8 | Failed to Run | Failed to Run |
| 16 | 16 | 16 | 0.004545906 | 0.003936828 |
| 32 | 32 | 32 | 0.004610093 | 0.004310548 |
| 64 | 64 | 64 | 0.004638971 | 0.004614832 |
| 128 | 128 | 128 | 0.005128772 | 0.005059945 |
| 256 | 256 | 256 | 0.006975747 | 0.007367229 |
| 512 | 512 | 512 | 0.018055338 | 0.016287096 |
| 1024 | 1024 | 1024 | 0.066839093 | 0.071785023 |
| 2048 | 2048 | 2048 | 0.382059749 | 0.336489417 |
| 4096 | 4096 | 4096 | 2.00519422 | 2.252330443 |
| 8192 | 8192 | 8192 | 21.62599663 | 18.10944683 |
| 16384 | 16384 | 16384 | 111.4660256 | 132.6751751 |