Skip to content

ais-check: Add better HIP detection#216

Closed
derobins wants to merge 7 commits intodevelopfrom
derobins/ais-check_better_hip_detection
Closed

ais-check: Add better HIP detection#216
derobins wants to merge 7 commits intodevelopfrom
derobins/ais-check_better_hip_detection

Conversation

@derobins
Copy link
Copy Markdown
Collaborator

@derobins derobins commented Mar 13, 2026

Adds better HIP detection and output

AIHIPFILE-156

Copilot AI review requested due to automatic review settings March 13, 2026 20:02
@derobins derobins marked this pull request as draft March 13, 2026 20:02
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves ais-check’s HIP runtime detection by scanning common ROCm/HIP installation locations and reporting which discovered HIP runtimes support AIS-related symbols.

Changes:

  • Add filesystem-based discovery for libamdhip64.so across common ROCm/HIP paths.
  • Probe each discovered HIP runtime for AIS symbol availability and store per-library support state.
  • Print a per-HIP-library AIS support summary in non-quiet mode.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@derobins derobins marked this pull request as ready for review March 13, 2026 21:26
Copilot AI review requested due to automatic review settings March 13, 2026 21:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enhances tools/ais-check to improve HIP runtime detection for AMD Infinity Storage (AIS) and to provide more detailed output about which HIP runtime libraries were found and whether each supports AIS.

Changes:

  • Add filesystem/environment-based discovery of libamdhip64.so across multiple common ROCm install locations.
  • Probe AIS symbol availability across all discovered HIP runtime libraries and aggregate results.
  • Print a per-library AIS support summary in the non-quiet output.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +164 to +169
if err != 0:
if symbol_status.value != 1:
symbol = symbol.decode("utf-8")
err_str = hip.hipGetErrorString(err).decode("utf-8")
print(
f"hipGetProcAddress({symbol}) failed with err code"
Comment on lines +64 to +75
# 1. Respect runtime linker paths
for p in os.environ.get("LD_LIBRARY_PATH", "").split(":"):
if p:
# Clean up the path by removing `..`, etc. and getting
# an absolute path.
safe_p = os.path.abspath(os.path.normpath(p))

candidates.append(os.path.join(safe_p, "libamdhip64.so"))

# 2. Environment variables commonly set by ROCm or modules
for var in ("ROCM_HOME", "ROCM_PATH", "HIP_PATH"):
base = os.environ.get(var)
@kurtmcmillan
Copy link
Copy Markdown
Collaborator

I think that ais-check should return success if the current environment supports the fastpath. I think this change breaks that. If the current environment is using ROCm 7.0, but ROCm 7.2 is also installed, would this script return success? If so, I don't think it should.

@jordan-turbofish
Copy link
Copy Markdown
Collaborator

Played around with getting the absolute path when using find_library:

def dlinfo_path(library_name):
    class LinkMap(ctypes.Structure):
        _fields_ = [
            ("l_addr", ctypes.c_void_p),
            ("l_name", ctypes.c_char_p),
        ]

    dl_path = ctypes.util.find_library("dl")
    library_path = ctypes.util.find_library(library_name)
    if dl_path is None or library_path is None:
        return None

    try:
        libdl = ctypes.CDLL(dl_path)
        libwanted = ctypes.CDLL(library_path)
    except OSError:
        return None

    dlinfo = libdl.dlinfo
    dlinfo.argtypes = ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p
    dlinfo.restype = ctypes.c_int

    lmptr = ctypes.POINTER(LinkMap)()
    err = dlinfo(libwanted._handle, 2, ctypes.byref(lmptr))
    if err != 0:
        return None

    abspath = lmptr.contents.l_name.decode('ascii')
    return abspath

print(f"amdhip64 path:{dlinfo_path(library_name='amdhip64')}")

Testing:

> python3 test.py
amdhip64 path:None
> LD_LIBRARY_PATH=/home/jpatters/hip-7.2.0-dbg/lib:/opt/rocm/lib python3 test.py
amdhip64 path:/home/jpatters/hip-7.2.0-dbg/lib/libamdhip64.so.7

@derobins derobins force-pushed the derobins/ais-check_better_hip_detection branch from 6662ca1 to 3715340 Compare April 1, 2026 21:34
Copilot AI review requested due to automatic review settings April 2, 2026 19:13
@derobins derobins force-pushed the derobins/ais-check_better_hip_detection branch from 3715340 to a891ab7 Compare April 2, 2026 19:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves ais-check’s HIP runtime detection by searching common ROCm install locations for libamdhip64.so, probing each candidate for AIS-related symbols, and reporting per-library AIS support to help diagnose mismatched/duplicate HIP runtimes.

Changes:

  • Add find_hip_runtimes() to discover candidate HIP runtime library paths from environment variables and standard ROCm locations.
  • Update HIP AIS probing to iterate over discovered libraries and track per-library AIS support.
  • Extend output to list found HIP libraries and whether each supports AIS.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +123 to +139
hipError_t = ctypes.c_int
hipDriverProcAddressQueryResult = ctypes.c_int

hip.hipGetProcAddress.argtypes = [
ctypes.c_char_p,
ctypes.POINTER(ctypes.c_void_p),
ctypes.c_int,
ctypes.c_uint64,
ctypes.POINTER(hipDriverProcAddressQueryResult),
]
hip.hipGetProcAddress.restype = hipError_t
hip.hipRuntimeGetVersion.argtypes = [ctypes.POINTER(ctypes.c_int)]
hip.hipRuntimeGetVersion.restype = hipError_t

hip.hipGetErrorString.argtypes = [hipError_t]
hip.hipGetErrorString.restype = ctypes.c_char_p
hip.hipGetProcAddress.argtypes = [
ctypes.c_char_p,
ctypes.POINTER(ctypes.c_void_p),
ctypes.c_int,
ctypes.c_uint64,
ctypes.POINTER(hipDriverProcAddressQueryResult),
]
hip.hipGetProcAddress.restype = hipError_t

version = ctypes.c_int()
err = hip.hipRuntimeGetVersion(ctypes.byref(version))
if err != 0:
err_str = hip.hipGetErrorString(err).decode("utf-8")
print(
f"hipRuntimeGetVersion failed with err code {err} ({err_str})",
file=sys.stderr,
)
return False

for symbol in [b"hipAmdFileWrite", b"hipAmdFileRead"]:
func_ptr = ctypes.c_void_p()
symbol_status = hipDriverProcAddressQueryResult()
err = hip.hipGetProcAddress(
symbol,
ctypes.byref(func_ptr),
version.value,
0,
ctypes.byref(symbol_status),
)
hip.hipGetErrorString.argtypes = [hipError_t]
hip.hipGetErrorString.restype = ctypes.c_char_p
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When probing each libamdhip64.so, the code assumes hipRuntimeGetVersion, hipGetProcAddress, and hipGetErrorString exist. If a candidate library loads but is missing one of these symbols, ctypes will raise AttributeError and the script will crash instead of treating that path as “not a HIP runtime”. Wrap the symbol lookups/argtypes setup in a try/except AttributeError (or use getattr checks) and continue to the next candidate.

Copilot uses AI. Check for mistakes.
Comment on lines +164 to +173
if err != 0:
if symbol_status.value != 1:
symbol = symbol.decode("utf-8")
err_str = hip.hipGetErrorString(err).decode("utf-8")
print(
f"hipGetProcAddress({symbol}) failed with err code"
f" {err} ({err_str}) and symbolStatus"
f" {symbol_status.value}",
file=sys.stderr,
)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AIS symbol check only treats err != 0 as a failure. In this repo’s C++ HIP wrapper (src/amd_detail/hip.cpp), hipGetProcAddress can succeed yet return a null pointer for an unavailable symbol (the wrapper checks for nullptr). In that case, this script would incorrectly mark the runtime as AIS-supported. After hipGetProcAddress, also verify the returned pointer (e.g., func_ptr.value) and/or symbol_status indicates the symbol is found before keeping supported = True.

Suggested change
if err != 0:
if symbol_status.value != 1:
symbol = symbol.decode("utf-8")
err_str = hip.hipGetErrorString(err).decode("utf-8")
print(
f"hipGetProcAddress({symbol}) failed with err code"
f" {err} ({err_str}) and symbolStatus"
f" {symbol_status.value}",
file=sys.stderr,
)
if err != 0 or func_ptr.value is None or symbol_status.value != 1:
symbol_name = symbol.decode("utf-8")
if err != 0:
err_str = hip.hipGetErrorString(err).decode("utf-8")
else:
err_str = "success"
print(
f"hipGetProcAddress({symbol_name}) failed with err code"
f" {err} ({err_str}), symbolStatus {symbol_status.value},"
f" func_ptr {func_ptr.value}",
file=sys.stderr,
)

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 2, 2026 20:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +84 to +92
# 3. Standard ROCm install paths
candidates += [
"/opt/rocm/lib/libamdhip64.so",
"/opt/rocm/lib64/libamdhip64.so",
]

# 4. Versioned installs (/opt/rocm-5.x, etc.)
for d in glob.glob("/opt/rocm*/lib*/libamdhip64.so"):
candidates.append(d)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_hip_runtimes() only looks for an unversioned libamdhip64.so in a small set of locations (/opt/rocm*, LD_LIBRARY_PATH, ROCM_* / HIP_PATH). This drops the previous ctypes.util.find_library('amdhip64') behavior and can miss systems where the HIP runtime is discoverable via the default loader paths (e.g., /usr/lib*) or where only a versioned SONAME exists (e.g., libamdhip64.so.6 without the unversioned symlink). Consider adding a fallback candidate from ctypes.util.find_library('amdhip64') and/or globbing for libamdhip64.so.* in the searched directories to avoid false negatives.

Copilot uses AI. Check for mistakes.
@ammallya
Copy link
Copy Markdown
Collaborator

ammallya commented Apr 9, 2026

Imported to ROCm/rocm-systems pull 4862

@ammallya ammallya closed this Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants