Conversation
Replace self-hosted GPU runners with runs-on g4dn.xlarge spot instances, matching the approach in mlverse/torch#1439. Also modernizes the workflow:
- Action versions: checkout@v4, setup-python@v5, setup-r@v2, etc.
- Fix deprecated ::set-output → $GITHUB_OUTPUT
- Container: ubuntu18.04 → ubuntu20.04 (18.04 is EOL)
- Add --runtime=nvidia to container options
- Add concurrency groups with cancel-in-progress
- Simplify matrix to a single config (CUDA 11.2.1, cuML 21.12, R release)
- Drop ASAN matrix dimension
- Build Docker image with cuda.ml pre-installed on ubuntu-latest (free)
- Run tests on runs-on g4dn.xlarge GPU runner using the pre-built image
- Add .github/docker/Dockerfile following the same pattern as mlverse/torch
- Make CMAKE_CUDA_ARCHITECTURES configurable via env var (defaults to NATIVE) so cross-compilation works on runners without a GPU (targets T4 = SM 75)
- Remove miniconda install (no longer needed for reticulate tests)
The 'sklearn' PyPI package is deprecated in favor of 'scikit-learn'. Also switch from py_install() to py_require(), the modern reticulate API for declaring Python dependencies.
- Move download_libcuml() before normalizePath() so the directory exists
- Reference CUML_STUB_HEADERS_DIR in both Treelite found/not-found branches so cmake doesn't warn about an unused variable
SVD components are only defined up to sign, so different implementations can produce sign-flipped vectors that are mathematically equivalent. Align signs before comparing components and transformed data.
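The alignment step can be sketched as follows. This is a minimal numpy illustration of the idea, not the package's actual R test code; component vectors are assumed to be columns, and `align_signs` is a hypothetical helper name:

```python
import numpy as np

def align_signs(ref, other):
    """Flip any column of `other` whose dominant entry disagrees in
    sign with the corresponding column of `ref`."""
    flipped = other.copy()
    for j in range(ref.shape[1]):
        # Compare signs at the entry of largest magnitude, which is
        # robust to near-zero entries whose sign is numerically noisy.
        i = int(np.argmax(np.abs(ref[:, j])))
        if np.sign(ref[i, j]) != np.sign(flipped[i, j]):
            flipped[:, j] *= -1.0
    return flipped

# Two sign-flipped but mathematically equivalent component matrices:
a = np.array([[1.0, 2.0],
              [3.0, -4.0]])
b = a * np.array([1.0, -1.0])  # negate the second component
aligned = align_signs(a, b)
```

After alignment, `aligned` and `a` compare equal elementwise even though `b` differed in the sign of one component.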
Modern sklearn strictly validates that max_iter is an int. R's default numeric type is double, which reticulate passes as a Python float. Using 10000L ensures it's passed as a Python int.
Runs R CMD check --as-cran on ubuntu-latest with R release and devel. No nvcc/CUDA available, so the package builds with stub headers — matching what CRAN would see.
- Escape literal braces in roxygen comments across R source files and
templates (e.g. {cuda.ml} -> \{cuda.ml\}, {"opt1",...} -> \{"opt1",...\})
- Regenerate all affected Rd files via devtools::document()
- Skip test_check() when cuML is not linked (CRAN-like environments)
- Use R CMD check directly in CRAN job (avoids rcmdcheck NOT_CRAN=true)
- Revert brace escaping inside @examples blocks (R code, not Rd markup)
- Define cuda_ml_can_predict_class_probabilities methods as proper functions so roxygen registers them as S3method() in NAMESPACE
Build infrastructure:
- Dockerfile: CUDA 12.8.1 + Ubuntu 22.04 base image
- libcuml_versions.R: add 26.04 entry pointing to PyPI libcuml-cu12 wheel
- cuml.R: handle pip wheel extraction (lib64/ layout, .whl extension)
- configure.R: handle lib64/ vs lib/ for pip wheels
- CMakeLists.txt.in: C++17, rapids-cmake branch-26.04
- Workflow: target cuML 26.04

C++ API changes for cuML 26.04:
- svm_serde.h: namespace alias MLCommon::Matrix -> ML::matrix for KernelParams and KernelType (header renamed kernelparams.h -> kernel_params.hpp)
- fil.cu, fil_utils.h, fil_utils.cu: disable FIL on 26.04 with stubs (fil.h replaced by modular headers; full adaptation TODO)
- random_projection.cu: disable on 26.04 with stubs (C++ API removed)
- knn.cu: disable on 26.04 with stubs (raft::spatial::knn types removed)
- random_forest_classifier.cu, random_forest_regressor.cu: guard FIL prediction paths for 26.04

Backward compatible: cuML 21.x with CUDA 11 still works.
- Dockerfile: accept CUDA_IMAGE as build arg for different base images
- Workflow: matrix over cuML 21.12 (CUDA 11.2) and 26.04 (CUDA 12.8)
- Each version gets its own build-image and test-gpu job
- CMakeLists.txt.in: template RAPIDS_CMAKE_TAG and CMAKE_CXX_STANDARD so they adapt to the cuML version being built against
- configure.R: set rapids-cmake tag (v26.04.00 for 26.x, branch-21.10 for 21.x) and C++ standard (17 for 26.x, 14 for 21.x)
- cuml.R: don't create premature lib symlink in download_libcuml()
Use vYY.MM.00 for cuML >= 23.02 (stable tags), vYY.MM.00a for older versions (only alpha tags available).
rapids-cmake v26.04 needs cmake >= 3.30.4. The existing auto-download logic handles this, but the min version threshold was hardcoded to 3.21.1. Now it's 3.30.4 for cuML >= 23.02, 3.21.1 for older versions.
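The version-dependent threshold amounts to a simple lookup (hypothetical Python sketch; the real check lives in the R configure logic):

```python
def min_cmake_version(cuml_version):
    """Minimum cmake version required before auto-download kicks in.

    rapids-cmake used by cuML >= 23.02 needs cmake >= 3.30.4; older
    cuML branches were satisfied by 3.21.1.
    """
    major, minor = (int(x) for x in cuml_version.split(".")[:2])
    return "3.30.4" if (major, minor) >= (23, 2) else "3.21.1"
```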
- Download libraft-cu12 and librmm-cu12 wheels alongside libcuml-cu12 (cuml headers include raft/rmm headers, which are in separate packages)
- Merge raft/rmm headers into libcuml/include/ during download
- Remove static_assert(CUML_VERSION_MAJOR == 21) to allow 26+
- Guard raft::mr::device::allocator (removed in raft 26.x) with version conditionals in device_allocator.cu/.h and stream_allocator.cu
- Use raft/core/handle.hpp instead of raft/handle.hpp for v26+
- Add tools/config/utils/pypi.R with resolve_native_deps(), which walks the PyPI dependency tree for a package and returns download URLs for all native C++ dependencies (libraft, librmm, rapids-logger, cccl, etc.)
- libcuml_versions.R: cuML 26.04 entry is now just "libcuml-cu12" (the PyPI package name), not a hardcoded URL
- cuml.R: download_libcuml() detects PyPI package names vs direct URLs, resolves the full dep tree, downloads all wheels, and merges their include/ directories into libcuml/include/
- configure.R: load pypi.R utility
- Uses jsonlite for PyPI JSON API parsing
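A minimal Python sketch of the resolution strategy (the real implementation is R code in tools/config/utils/pypi.R; the prefix filter, the choice of the first wheel URL, and the field handling here are illustrative assumptions, not the package's exact behavior):

```python
import json
import re
from urllib.request import urlopen

# Heuristic: dependencies whose wheels ship native C++ payloads.
# (Illustrative prefix list; the real filter may differ.)
NATIVE_PREFIXES = ("libcuml", "libraft", "librmm", "rapids-logger", "nvidia-cccl")

def fetch_metadata(package):
    """Fetch package metadata from the PyPI JSON API."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        return json.load(resp)

def resolve_native_deps(package, fetch=fetch_metadata, seen=None):
    """Walk requires_dist recursively, collecting one wheel URL per
    native dependency; `seen` prevents revisiting shared deps."""
    seen = set() if seen is None else seen
    if package in seen:
        return {}
    seen.add(package)
    meta = fetch(package)
    urls = {package: meta["urls"][0]["url"]}
    for req in meta["info"].get("requires_dist") or []:
        # Strip version pins/markers: "libraft-cu12==25.12.*" -> "libraft-cu12"
        name = re.split(r"[ ;<>=!~(\[]", req, maxsplit=1)[0]
        if name.startswith(NATIVE_PREFIXES):
            urls.update(resolve_native_deps(name, fetch, seen))
    return urls

# Exercise the walk against canned metadata instead of the live API:
fake = {
    "libcuml-cu12": {"info": {"requires_dist": ["libraft-cu12==25.12.*", "numpy>=1.21"]},
                     "urls": [{"url": "https://example.invalid/libcuml.whl"}]},
    "libraft-cu12": {"info": {"requires_dist": None},
                     "urls": [{"url": "https://example.invalid/libraft.whl"}]},
}
deps = resolve_native_deps("libcuml-cu12", fetch=fake.__getitem__)
```

Injecting the fetcher keeps the tree walk testable offline; pure-Python dependencies like numpy are filtered out before recursing.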
RMM 26.04 headers require CCCL >= 3.3 at compile time, but CCCL is not a pip dependency (it's normally bundled with the CUDA toolkit). CUDA 12.x ships CCCL 2.x which is too old. Download CCCL v3.3.0 from GitHub releases (header-only, ~2MB) and merge into libcuml/include/. Also handle pip wheels that extract to nested dirs like nvidia/<subpackage>/include/.
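Locating the headers in both layouts can be sketched like this (hypothetical Python helper; the real merge logic is R/shell in the build scripts, and the demo directory names are made up):

```python
import tempfile
from pathlib import Path

def find_include_dirs(wheel_root):
    """Return include/ directories in an extracted wheel, whether at
    the top level or nested as nvidia/<subpackage>/include/."""
    root = Path(wheel_root)
    candidates = [root / "include", *sorted(root.glob("nvidia/*/include"))]
    return [d for d in candidates if d.is_dir()]

# Demo on a throwaway tree mimicking a nested wheel layout:
tmp = Path(tempfile.mkdtemp())
(tmp / "nvidia" / "cuda_cccl" / "include").mkdir(parents=True)
dirs = find_include_dirs(tmp)
```

Each directory found this way would then be copied into libcuml/include/ so the compiler sees a single merged header tree.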
CCCL 3.3 headers (bundled in libcuml/include/) must take precedence over the CUDA 12 toolkit's older CCCL 2.x headers. Swap include order so cuml/raft/rmm/cccl headers are found first.
- Use the RAPIDS-pinned CCCL commit (CUDA 12 compatible) instead of the v3.3.0 release tag, which includes CUDA 13-only code
- pinned_host_vector.h: guard thrust::cuda::experimental::pinned_allocator (removed in CCCL 3.x); use plain host_vector on v26+
- handle_utils.cu: raft::handle_t no longer has set_stream(); reconstruct with stream_view via constructor on v26+
cuML 26.04's rmm headers require CCCL >= 3.3, which conflicts with the CUDA 12.x toolkit's CCCL 2.x. cuML 25.12 vendors its own CCCL in librmm/include/rapids/ and has no CCCL version check, giving clean CUDA 12 compatibility.
- Target cuML 25.12 instead of 26.04
- Version guards: >= 26 -> >= 25 (same API changes apply)
- Re-enable KNN (knn.hpp exists in 25.12 with same API)
- Remove CCCL GitHub download (not needed)
- Update PyPI resolver to handle version pins (==25.12.*)
RMM headers require this define (normally set automatically by RMM's cmake config, but we're using headers directly from the pip wheel).
All RAPIDS 25.x+ pip wheels require CCCL 3.x headers which are incompatible with CUDA 12's bundled CCCL 2.x. No version of libcuml-cu12 can be compiled against a stock CUDA 12 toolkit. Revert to cuML 21.12 as the default for now. Supporting newer cuML will require either CUDA 13 or a custom build environment.
Summary
- Replace [self-hosted, gpu] runners with runs-on g4dn.xlarge spot instances (same approach as mlverse/torch#1439, "Use runs-on GPU runners for CI")
- Fix deprecated ::set-output → $GITHUB_OUTPUT
- Add --runtime=nvidia and concurrency groups with cancel-in-progress

Test plan
- nvidia-smi / GPU access works inside the container