[Issue]: Test failing with ROCm 6.3.1 on MI250X #120

@al-rigazzi

Problem Description

I have built flash-attention in a fresh environment with ROCm 6.3.1, running on MI250X, and I am confused by the test results.

I believe the test file to use is tests/test_flash_attn_ck.py, since a very large portion of the tests in the non-CK file (tests/test_flash_attn.py) fail.

Nevertheless, this is the output of pytest tests/test_flash_attn_ck.py:

FAILED tests/test_flash_attn_ck.py::test_flash_attn_bwd_overflow[5-16-False-dtype0] - AssertionError: assert 0.0750732421875 <= ((5 * 0.01171875) + 0.001)

I have two questions:

  1. Is it normal for this test to fail?
  2. I see that, compared with the standard test_flash_attn.py tests, the tolerance has been raised from a factor of 2 to a factor of 10, with a comment mentioning that bwd needs to be fixed. Does this affect the performance of the library when used in production?
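For reference, the arithmetic behind the failing assertion can be checked by hand. The sketch below only reuses the three numbers printed by pytest. It assumes that the left-hand side is the maximum error of the kernel under test and that 0.01171875 is the error of a baseline implementation; neither is confirmed by the output itself. The names `observed_error`, `baseline_error`, `atol`, `passes` and `min_passing_factor` are mine, not taken from the test file.

```python
# Values copied from the failing assertion in the pytest output.
observed_error = 0.0750732421875  # left-hand side (assumed: error of the kernel under test)
baseline_error = 0.01171875       # assumed: error of a baseline implementation
atol = 0.001                      # additive slack in the assertion


def passes(factor: float) -> bool:
    """Return True if observed_error <= factor * baseline_error + atol."""
    return observed_error <= factor * baseline_error + atol


def min_passing_factor() -> float:
    """Smallest multiplier for which the assertion would hold."""
    return (observed_error - atol) / baseline_error


if __name__ == "__main__":
    for factor in (2, 5, 10):
        bound = factor * baseline_error + atol
        print(f"factor {factor:>2}: bound = {bound:.8f}, passes = {passes(factor)}")
    print(f"minimum passing factor: {min_passing_factor():.3f}")
```

With these numbers the observed error is about 6.4 times the baseline value, so the check fails with the factor of 5 shown in the assertion and would pass with a factor of 10.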

Operating System

SLES 15-SP5

CPU

AMD EPYC 7A53 64-Core Processor

GPU

AMD Instinct MI250X

ROCm Version

ROCm 6.3.0

ROCm Component

No response

Steps to Reproduce

Torch was installed with

python3 -m pip install --no-cache-dir --pre torch==2.7.0.dev20250128+rocm6.3 --index-url https://download.pytorch.org/whl/nightly/rocm6.3

and repo is at

22c0358 (HEAD -> main, tag: v2.7.3-cktile, origin/main, origin/HEAD)

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
