For the kernel added in #1823, if we force the config to be:

```python
helion.Config(block_sizes=[32, 64, 16], indexing=['pointer', 'pointer', 'tensor_descriptor', 'pointer', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', 'last', '', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=8, num_stages=4, num_warps=2, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[True, False], range_unroll_factors=[0, 0], range_warp_specializes=[])
```
We get a misaligned address error.
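A TMA descriptor is stricter about memory layout than plain pointer loads. As a sketch, assuming the constraints CUDA documents for `cuTensorMapEncodeTiled` (16-byte-aligned base address, contiguous innermost dimension, outer strides that are multiples of 16 bytes), a quick check of whether an input is even legal for the `tensor_descriptor` path could look like this. `tma_alignment_problems` is a hypothetical helper, and this is a guess at the failure mode, not a confirmed root cause.

```python
TMA_ALIGN_BYTES = 16  # assumed alignment requirement for TMA base address and strides


def tma_alignment_problems(data_ptr, strides, element_size):
    """Return reasons a tensor layout would violate the assumed TMA rules.

    data_ptr: base address in bytes; strides: per-dim strides in elements;
    element_size: bytes per element. An empty list means no problems found.
    """
    problems = []
    # The global base address must be 16-byte aligned.
    if data_ptr % TMA_ALIGN_BYTES != 0:
        problems.append(f"base address {data_ptr:#x} is not {TMA_ALIGN_BYTES}-byte aligned")
    # The innermost dimension must be contiguous (stride of one element).
    if strides[-1] != 1:
        problems.append("innermost dimension is not contiguous")
    # Every outer stride, measured in bytes, must be a multiple of 16.
    for dim, stride in enumerate(strides[:-1]):
        stride_bytes = stride * element_size
        if stride_bytes % TMA_ALIGN_BYTES != 0:
            problems.append(
                f"stride of dim {dim} is {stride_bytes} bytes, not a multiple of {TMA_ALIGN_BYTES}"
            )
    return problems
```

With a PyTorch tensor `t` it would be called as `tma_alignment_problems(t.data_ptr(), t.stride(), t.element_size())`. If a symm-mem buffer handed to the kernel starts at an offset that is not a multiple of 16 bytes, it would be flagged here even though pointer indexing handles it fine.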
Claude pointed at TMA as the likely culprit, and avoiding TMA indeed fixes the issue.
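To reproduce the workaround, here is a minimal sketch, assuming `helion.Config` accepts the same keyword arguments used in the failing config: keep every tuned parameter and only switch `tensor_descriptor` entries in `indexing` back to `pointer`. `without_tma` and `failing_kwargs` are hypothetical names used for illustration.

```python
import copy

# The failing config from this report, expressed as plain keyword arguments.
failing_kwargs = dict(
    block_sizes=[32, 64, 16],
    indexing=["pointer", "pointer", "tensor_descriptor", "pointer", "pointer", "pointer"],
    l2_groupings=[64],
    load_eviction_policies=["last", "last", "", "last"],
    loop_orders=[[0, 1]],
    maxnreg=32,
    num_sm_multiplier=8,
    num_stages=4,
    num_warps=2,
    pid_type="persistent_interleaved",
    range_flattens=[None, None],
    range_multi_buffers=[True, False],
    range_unroll_factors=[0, 0],
    range_warp_specializes=[],
)


def without_tma(config_kwargs):
    """Return a copy of the kwargs with every tensor_descriptor (TMA) access
    switched to plain pointer indexing; all other settings are untouched."""
    kwargs = copy.deepcopy(config_kwargs)
    indexing = kwargs.get("indexing")
    if isinstance(indexing, str):
        # A single indexing mode applied to every load/store.
        kwargs["indexing"] = "pointer" if indexing == "tensor_descriptor" else indexing
    elif indexing is not None:
        # Per-access indexing modes.
        kwargs["indexing"] = [
            "pointer" if mode == "tensor_descriptor" else mode for mode in indexing
        ]
    return kwargs
```

The result would then be passed as `helion.Config(**without_tma(failing_kwargs))`. Changing only `indexing` keeps the comparison clean: any behavior difference should come from the TMA path alone.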
I'm confirming this with the symm-mem folks.