cuda.core.system: Better checks for when we expect APIs to be unsupported #1510
Conversation
Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.
Force-pushed from 7cddcb4 to 2df0c31.
Pull request overview
This PR improves test coverage for NVML APIs in cuda.core.system by implementing better checks for when APIs are expected to be unsupported based on device architecture. The changes replace blanket exception catching with architecture-aware validation using a new unsupported_before context manager.
Changes:
- Introduced an `unsupported_before` context manager to conditionally skip or assert on API support based on device architecture (a minimal usage sketch follows below)
- Renamed the `DeviceArchitecture` class to `DeviceArch` to align with the underlying NVML enum
- Simplified test logic by removing complex skip-reason collection and replacing it with the new context manager pattern
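A minimal usage sketch, assuming a `device` fixture and that the conftest exposes `unsupported_before` as a fixture; the attribute queried and the Volta threshold are illustrative, not the PR's actual code:

```python
from cuda.core import system


def test_example_query(device, unsupported_before):
    # Assumption for illustration: the queried attribute is documented as
    # unsupported before Volta. On older architectures the context manager
    # expects NotSupportedError and skips; on Volta and newer the call must
    # succeed rather than being silently skipped.
    with unsupported_before(device, system.DeviceArch.VOLTA):
        _ = device.example_attribute  # hypothetical attribute
```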
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 35 comments.
Show a summary per file
| File | Description |
|---|---|
| cuda_core/tests/system/conftest.py | Added unsupported_before context manager for architecture-aware test skipping |
| cuda_bindings/tests/nvml/conftest.py | Added unsupported_before context manager for NVML bindings tests |
| cuda_core/tests/system/test_system_device.py | Refactored tests to use unsupported_before and updated to use DeviceArch |
| cuda_bindings/tests/nvml/test_pynvml.py | Updated tests to use unsupported_before |
| cuda_bindings/tests/nvml/test_gpu.py | Updated tests to use unsupported_before |
| cuda_bindings/tests/nvml/test_compute_mode.py | Updated tests to use unsupported_before |
| cuda_core/cuda/core/system/_device.pyx | Renamed DeviceArchitecture to DeviceArch and moved functionality to use NVML enum directly |
| cuda_core/docs/source/api.rst | Updated documentation to reference DeviceArch instead of DeviceArchitecture |
Comments suppressed due to low confidence (1)
cuda_core/cuda/core/system/_device.pyx:1050
- The docstring still references the old class name "DeviceArchitecture" instead of the new "DeviceArch". The documentation should be updated to match the renamed class.
"""
return DeviceEvents(self._handle, events)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
/ok to test
The following comment is attached to this diff hunk:

```python
    return pci_info


@contextmanager
```
The below came from Cursor (after discarding my `cuda.bindings.tests.helpers` idea I posted offline).
The idea is that you can give this to Cursor on your end to play with the options.
Suggestion: deduplicating `unsupported_before`
I noticed we now have two nearly identical `unsupported_before` context managers:
- `cuda_bindings/tests/nvml/conftest.py`
- `cuda_core/tests/system/conftest.py`
The logic is the same, but they use different API surfaces:
| Aspect | nvml conftest | cuda.core conftest |
|---|---|---|
| Get arch | `nvml.device_get_architecture(device)` | `device.arch` |
| Get name | `nvml.device_get_name(device)` | `device.name` |
| Arch enum | `nvml.DeviceArch` | `system.DeviceArch` |
| Exception | `nvml.NotSupportedError` | `system.NotSupportedError` |
Options to consider
Option 1: Factory pattern in `cuda_python_test_helpers` (recommended)

Keep `cuda_python_test_helpers` CUDA-agnostic but share the logic via dependency injection:
```python
# cuda_python_test_helpers/__init__.py
from contextlib import contextmanager


def make_unsupported_before(*, get_arch, get_name, arch_enum, not_supported_error):
    """Factory to create an unsupported_before context manager with injected dependencies."""

    @contextmanager
    def unsupported_before(device, expected_device_arch):
        device_arch = get_arch(device)
        if isinstance(expected_device_arch, arch_enum):
            expected_device_arch_int = int(expected_device_arch)
        elif expected_device_arch == "FERMI":
            expected_device_arch_int = 1
        else:
            expected_device_arch_int = 0
        if (
            expected_device_arch is None
            or expected_device_arch == "HAS_INFOROM"
            or device_arch == arch_enum.UNKNOWN
        ):
            try:
                yield
            except not_supported_error:
                import pytest

                pytest.skip(
                    f"Unsupported call for device architecture {arch_enum(device_arch).name} "
                    f"on device '{get_name(device)}'"
                )
        elif int(device_arch) < expected_device_arch_int:
            import pytest

            with pytest.raises(not_supported_error):
                yield
            pytest.skip(f"Unsupported before {expected_device_arch.name}, got {get_name(device)}")
        else:
            yield

    return unsupported_before
```

Then in each conftest:
```python
# cuda_bindings/tests/nvml/conftest.py
from cuda_python_test_helpers import make_unsupported_before

from cuda.bindings import _nvml as nvml

unsupported_before = make_unsupported_before(
    get_arch=nvml.device_get_architecture,
    get_name=nvml.device_get_name,
    arch_enum=nvml.DeviceArch,
    not_supported_error=nvml.NotSupportedError,
)
```
from cuda_python_test_helpers import make_unsupported_before
from cuda.core import system
unsupported_before = make_unsupported_before(
get_arch=lambda d: d.arch,
get_name=lambda d: d.name,
arch_enum=system.DeviceArch,
not_supported_error=system.NotSupportedError,
)Pros: Single source of truth for the logic, cuda_python_test_helpers stays CUDA-agnostic.
Cons: Slightly more indirection.
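For completeness, a hypothetical test consuming the factory-built manager; this assumes the conftest exposes `unsupported_before` as a pytest fixture and that `device` yields an NVML device handle (the test body is illustrative):

```python
from cuda.bindings import _nvml as nvml


def test_device_architecture(device, unsupported_before):
    # KEPLER is the oldest named architecture in the NVML enum, so on real
    # hardware this should always take the "supported" path (illustrative).
    with unsupported_before(device, nvml.DeviceArch.KEPLER):
        arch = nvml.device_get_architecture(device)
        assert arch != nvml.DeviceArch.UNKNOWN
```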
Option 2: Let `cuda_python_test_helpers` depend on `cuda-bindings`

Provide a single NVML-based implementation; `cuda.core` tests would extract device handles.
Pros: Simpler API.
Cons: Reverses the direction of commit 6afdd5c; ties test helpers to CUDA packages.
Option 3: Accept the duplication
~40 lines duplicated in 2 files. The implementations use different abstraction layers, so some might argue they're legitimately different.
Pros: No new abstractions.
Cons: Bug fixes / enhancements need to be applied twice.
I'd lean toward Option 1 since it keeps the CUDA-agnostic design while eliminating the duplication. Happy to pair on this if you want to explore it further.
Prior to this PR, unit tests in `cuda.core.system` would always skip the test if an API returned a `NotSupportedError`. This meant that a test could be skipped even when the documentation suggests it should in fact work. This PR makes such a skip acceptable only when the API is documented as unsupported for the device's architecture.

There is some nuance, though. Some of these APIs still fail as not supported even when the documentation says they should work, possibly because some other aspect of the device (not just the architecture) makes them unsupported. Unfortunately, the NVML headers don't offer any clues there, but at least now we know where all of these cases are (they are noted with `unsupported_before(device, None)`), and we can follow up by filing documentation bugs with NVML and refining the skip criteria even further.

This also removes the behavior where some tests collected a different skip reason for each device and reported all of them. That added a lot of complexity for little benefit -- it is probably very rare for a system to have two devices of different architectures, especially on systems we would use for testing -- so I took all of that out to simplify things.
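To make the `unsupported_before(device, None)` convention concrete, a hedged sketch (the API shown is a stand-in, not necessarily one of the affected calls):

```python
from cuda.bindings import _nvml as nvml


def test_query_with_unknown_support_criteria(device, unsupported_before):
    # device_get_name stands in for an affected API; None tells
    # unsupported_before to skip (rather than fail) if NotSupportedError is
    # raised, since we cannot yet predict support from the architecture alone.
    with unsupported_before(device, None):
        _ = nvml.device_get_name(device)
```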