Conversation
There was a problem hiding this comment.
Pull request overview
Improves ais-check’s HIP runtime detection by scanning common ROCm/HIP installation locations and reporting which discovered HIP runtimes support AIS-related symbols.
Changes:
- Add filesystem-based discovery for
libamdhip64.soacross common ROCm/HIP paths. - Probe each discovered HIP runtime for AIS symbol availability and store per-library support state.
- Print a per-HIP-library AIS support summary in non-quiet mode.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
Enhances tools/ais-check to improve HIP runtime detection for AMD Infinity Storage (AIS) and to provide more detailed output about which HIP runtime libraries were found and whether each supports AIS.
Changes:
- Add filesystem/environment-based discovery of
libamdhip64.soacross multiple common ROCm install locations. - Probe AIS symbol availability across all discovered HIP runtime libraries and aggregate results.
- Print a per-library AIS support summary in the non-quiet output.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| if err != 0: | ||
| if symbol_status.value != 1: | ||
| symbol = symbol.decode("utf-8") | ||
| err_str = hip.hipGetErrorString(err).decode("utf-8") | ||
| print( | ||
| f"hipGetProcAddress({symbol}) failed with err code" |
| # 1. Respect runtime linker paths | ||
| for p in os.environ.get("LD_LIBRARY_PATH", "").split(":"): | ||
| if p: | ||
| # Clean up the path by removing `..`, etc. and getting | ||
| # an absolute path. | ||
| safe_p = os.path.abspath(os.path.normpath(p)) | ||
|
|
||
| candidates.append(os.path.join(safe_p, "libamdhip64.so")) | ||
|
|
||
| # 2. Environment variables commonly set by ROCm or modules | ||
| for var in ("ROCM_HOME", "ROCM_PATH", "HIP_PATH"): | ||
| base = os.environ.get(var) |
|
I think that ais-check should return success if the current environment supports the fastpath. I think this change breaks that. If the current environment is using ROCm 7.0, but ROCm 7.2 is also installed, would this script return success? If so, I don't think it should. |
|
Played around with getting the absolute path when using find_library: Testing: |
6662ca1 to
3715340
Compare
3715340 to
a891ab7
Compare
There was a problem hiding this comment.
Pull request overview
This PR improves ais-check’s HIP runtime detection by searching common ROCm install locations for libamdhip64.so, probing each candidate for AIS-related symbols, and reporting per-library AIS support to help diagnose mismatched/duplicate HIP runtimes.
Changes:
- Add
find_hip_runtimes()to discover candidate HIP runtime library paths from environment variables and standard ROCm locations. - Update HIP AIS probing to iterate over discovered libraries and track per-library AIS support.
- Extend output to list found HIP libraries and whether each supports AIS.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| hipError_t = ctypes.c_int | ||
| hipDriverProcAddressQueryResult = ctypes.c_int | ||
|
|
||
| hip.hipGetProcAddress.argtypes = [ | ||
| ctypes.c_char_p, | ||
| ctypes.POINTER(ctypes.c_void_p), | ||
| ctypes.c_int, | ||
| ctypes.c_uint64, | ||
| ctypes.POINTER(hipDriverProcAddressQueryResult), | ||
| ] | ||
| hip.hipGetProcAddress.restype = hipError_t | ||
| hip.hipRuntimeGetVersion.argtypes = [ctypes.POINTER(ctypes.c_int)] | ||
| hip.hipRuntimeGetVersion.restype = hipError_t | ||
|
|
||
| hip.hipGetErrorString.argtypes = [hipError_t] | ||
| hip.hipGetErrorString.restype = ctypes.c_char_p | ||
| hip.hipGetProcAddress.argtypes = [ | ||
| ctypes.c_char_p, | ||
| ctypes.POINTER(ctypes.c_void_p), | ||
| ctypes.c_int, | ||
| ctypes.c_uint64, | ||
| ctypes.POINTER(hipDriverProcAddressQueryResult), | ||
| ] | ||
| hip.hipGetProcAddress.restype = hipError_t | ||
|
|
||
| version = ctypes.c_int() | ||
| err = hip.hipRuntimeGetVersion(ctypes.byref(version)) | ||
| if err != 0: | ||
| err_str = hip.hipGetErrorString(err).decode("utf-8") | ||
| print( | ||
| f"hipRuntimeGetVersion failed with err code {err} ({err_str})", | ||
| file=sys.stderr, | ||
| ) | ||
| return False | ||
|
|
||
| for symbol in [b"hipAmdFileWrite", b"hipAmdFileRead"]: | ||
| func_ptr = ctypes.c_void_p() | ||
| symbol_status = hipDriverProcAddressQueryResult() | ||
| err = hip.hipGetProcAddress( | ||
| symbol, | ||
| ctypes.byref(func_ptr), | ||
| version.value, | ||
| 0, | ||
| ctypes.byref(symbol_status), | ||
| ) | ||
| hip.hipGetErrorString.argtypes = [hipError_t] | ||
| hip.hipGetErrorString.restype = ctypes.c_char_p |
There was a problem hiding this comment.
When probing each libamdhip64.so, the code assumes hipRuntimeGetVersion, hipGetProcAddress, and hipGetErrorString exist. If a candidate library loads but is missing one of these symbols, ctypes will raise AttributeError and the script will crash instead of treating that path as “not a HIP runtime”. Wrap the symbol lookups/argtypes setup in a try/except AttributeError (or use getattr checks) and continue to the next candidate.
| if err != 0: | ||
| if symbol_status.value != 1: | ||
| symbol = symbol.decode("utf-8") | ||
| err_str = hip.hipGetErrorString(err).decode("utf-8") | ||
| print( | ||
| f"hipGetProcAddress({symbol}) failed with err code" | ||
| f" {err} ({err_str}) and symbolStatus" | ||
| f" {symbol_status.value}", | ||
| file=sys.stderr, | ||
| ) |
There was a problem hiding this comment.
The AIS symbol check only treats err != 0 as a failure. In this repo’s C++ HIP wrapper (src/amd_detail/hip.cpp), hipGetProcAddress can succeed yet return a null pointer for an unavailable symbol (the wrapper checks for nullptr). In that case, this script would incorrectly mark the runtime as AIS-supported. After hipGetProcAddress, also verify the returned pointer (e.g., func_ptr.value) and/or symbol_status indicates the symbol is found before keeping supported = True.
| if err != 0: | |
| if symbol_status.value != 1: | |
| symbol = symbol.decode("utf-8") | |
| err_str = hip.hipGetErrorString(err).decode("utf-8") | |
| print( | |
| f"hipGetProcAddress({symbol}) failed with err code" | |
| f" {err} ({err_str}) and symbolStatus" | |
| f" {symbol_status.value}", | |
| file=sys.stderr, | |
| ) | |
| if err != 0 or func_ptr.value is None or symbol_status.value != 1: | |
| symbol_name = symbol.decode("utf-8") | |
| if err != 0: | |
| err_str = hip.hipGetErrorString(err).decode("utf-8") | |
| else: | |
| err_str = "success" | |
| print( | |
| f"hipGetProcAddress({symbol_name}) failed with err code" | |
| f" {err} ({err_str}), symbolStatus {symbol_status.value}," | |
| f" func_ptr {func_ptr.value}", | |
| file=sys.stderr, | |
| ) |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # 3. Standard ROCm install paths | ||
| candidates += [ | ||
| "/opt/rocm/lib/libamdhip64.so", | ||
| "/opt/rocm/lib64/libamdhip64.so", | ||
| ] | ||
|
|
||
| # 4. Versioned installs (/opt/rocm-5.x, etc.) | ||
| for d in glob.glob("/opt/rocm*/lib*/libamdhip64.so"): | ||
| candidates.append(d) |
There was a problem hiding this comment.
find_hip_runtimes() only looks for an unversioned libamdhip64.so in a small set of locations (/opt/rocm*, LD_LIBRARY_PATH, ROCM_* / HIP_PATH). This drops the previous ctypes.util.find_library('amdhip64') behavior and can miss systems where the HIP runtime is discoverable via the default loader paths (e.g., /usr/lib*) or where only a versioned SONAME exists (e.g., libamdhip64.so.6 without the unversioned symlink). Consider adding a fallback candidate from ctypes.util.find_library('amdhip64') and/or globbing for libamdhip64.so.* in the searched directories to avoid false negatives.
|
Imported to ROCm/rocm-systems pull 4862 |
Adds better HIP detection and output
AIHIPFILE-156