I was able to build and install the flashmm package for "compute_70" without any errors, but running the test fails:
$ python3 test_flash_mm.py
max diff for mm block: tensor(2.3842e-05, device='cuda:0', grad_fn=)
average diff for mm block: tensor(1.7822e-06, device='cuda:0', grad_fn=)
Traceback (most recent call last):
File "test_flash_mm.py", line 182, in
print("max diff:", diff[argmax_diff])
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Flashmm works on another machine whose GPU has compute capability 8.6, but it does not seem to work on sm_70.
Is this expected?
CONDA ENVIRONMENT:
PyTorch Version: 2.1.0+cu121
PyTorch CUDA version: 12.1
PyTorch arch_list: ['sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
PyTorch CUDNN: True 8902
GPU PyTorch Logical Name 0 : Tesla V100S-PCIE-32GB
Capability: (7, 0)
Total memory: 34072559616
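For reference, here is a minimal sketch of how I checked that the device's compute capability is covered by the PyTorch build. The helper name `capability_in_arch_list` is hypothetical, and the values are copied from the environment listing above. On a live system they could be read with `torch.cuda.get_device_capability(0)` and `torch.cuda.get_arch_list()`. Note that this only tells you about PyTorch's own kernels, not about which architectures the flashmm extension was compiled for.

```python
def capability_in_arch_list(capability, arch_list):
    """Return True if a capability such as (7, 0) has a matching 'sm_70' entry."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


# Values copied from the environment listing above (not queried live).
arch_list = ['sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
capability = (7, 0)

print("capability covered by PyTorch build:",
      capability_in_arch_list(capability, arch_list))
```

So the PyTorch build itself does list sm_70, which is why I suspect the problem is in the flashmm kernels rather than in the PyTorch installation.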
Rerunning the test with CUDA_LAUNCH_BLOCKING=1 gives a more precise traceback:
$ CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0 python3 test_flash_mm.py
max diff for mm block: tensor(2.3842e-05, device='cuda:0', grad_fn=)
average diff for mm block: tensor(1.7822e-06, device='cuda:0', grad_fn=)
Traceback (most recent call last):
File "test_flash_mm.py", line 176, in
out = fast_hyena_filter(
File "test_flash_mm.py", line 112, in fast_hyena_filter
k = hyena_filter_fwd(
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Many thanks for your help.