
Use runs-on GPU runners for CI #212

Merged
dfalbel merged 28 commits into main from ci/runs-on-gpu-runners
Apr 29, 2026

Conversation

@dfalbel (Member) commented Apr 28, 2026

Summary

  • Replace [self-hosted, gpu] runners with runs-on g4dn.xlarge spot instances (same approach as mlverse/torch#1439; see the workflow sketch after this list)
  • Update all action versions to latest (checkout@v4, setup-python@v5, setup-r@v2, setup-pandoc@v2, setup-r-dependencies@v2, upload-artifact@v4)
  • Fix deprecated ::set-output → $GITHUB_OUTPUT
  • Update container base from ubuntu18.04 (EOL) → ubuntu20.04
  • Add --runtime=nvidia and concurrency groups with cancel-in-progress
  • Simplify matrix to single config: CUDA 11.2.1, cuML 21.12, R release (down from 12 jobs)
  • Drop ASAN matrix dimension (can be re-added as a scheduled workflow later)
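As a rough illustration of how these pieces fit together, here is a minimal sketch of the GPU test job. The runs-on selector labels, image name, and trigger list are assumptions for illustration, not copied from the actual workflow in this PR:

```yaml
name: gpu-tests

on: [push, pull_request]

# Cancel superseded runs on the same ref
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test-gpu:
    # runs-on selector labels (illustrative):
    runs-on:
      - runs-on
      - family=g4dn.xlarge
      - spot=true
      - run-id=${{ github.run_id }}
    container:
      image: ghcr.io/mlverse/cuda-ml-test:latest   # hypothetical pre-built image
      options: --runtime=nvidia
    steps:
      - uses: actions/checkout@v4
      - name: Verify GPU access
        run: nvidia-smi
```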

Test plan

  • Verify the GPU runner spins up and the CUDA container starts
  • Verify nvidia-smi / GPU access works inside the container
  • Verify R CMD check passes with cuML 21.12

dfalbel and others added 28 commits April 28, 2026 14:46

Replace self-hosted GPU runners with runs-on g4dn.xlarge spot instances,
matching the approach in mlverse/torch#1439. Also modernizes the workflow:

- Action versions: checkout@v4, setup-python@v5, setup-r@v2, etc.
- Fix deprecated ::set-output → $GITHUB_OUTPUT (see the snippet after this list)
- Container: ubuntu18.04 → ubuntu20.04 (18.04 is EOL)
- Add --runtime=nvidia to container options
- Add concurrency groups with cancel-in-progress
- Simplify matrix to single config (CUDA 11.2.1, cuML 21.12, R release)
- Drop ASAN matrix dimension
- Build Docker image with cuda.ml pre-installed on ubuntu-latest (free)
- Run tests on runs-on g4dn.xlarge GPU runner using the pre-built image
- Add .github/docker/Dockerfile following the same pattern as mlverse/torch
- Make CMAKE_CUDA_ARCHITECTURES configurable via env var (defaults to NATIVE)
  so cross-compilation works on runners without a GPU (targets T4 = SM 75)
- Remove miniconda install (no longer needed for reticulate tests)
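For reference, the ::set-output migration follows GitHub's documented replacement; the step and output name below are illustrative:

```yaml
# Before (deprecated; GitHub emits a workflow warning):
- run: echo "::set-output name=r-version::${R_VERSION}"
# After: append key=value to the file GitHub provides via $GITHUB_OUTPUT
- run: echo "r-version=${R_VERSION}" >> "$GITHUB_OUTPUT"
```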
The 'sklearn' PyPI package is deprecated in favor of 'scikit-learn'.
Also switch from py_install() to py_require() which is the modern
reticulate API for declaring Python dependencies.
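A minimal sketch of the change, assuming the test setup imports sklearn via reticulate (the exact call sites in this PR may differ):

```r
# Before: imperative install of the deprecated 'sklearn' shim package
# reticulate::py_install("sklearn")

# After: declare the real package; reticulate resolves it when Python starts
reticulate::py_require("scikit-learn")
sklearn <- reticulate::import("sklearn")
```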
- Move download_libcuml() before normalizePath() so the directory exists
- Reference CUML_STUB_HEADERS_DIR in both Treelite found/not-found branches
  so cmake doesn't warn about unused variable

SVD components are only defined up to sign, so different implementations
can produce sign-flipped vectors that are mathematically equivalent.
Align signs before comparing components and transformed data.
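A sketch of the sign-alignment idea, with hypothetical helper and matrix names (the actual test code may differ). Each column of a singular-vector matrix can be negated without changing the decomposition, so flip any column whose dot product with the reference is negative:

```r
align_signs <- function(ref, other) {
  # One flip per column: +1 if the columns already point the same way, -1 otherwise
  flips <- sign(colSums(ref * other))
  flips[flips == 0] <- 1
  sweep(other, 2, flips, `*`)
}

# e.g. comparing cuML components against sklearn's (hypothetical objects):
testthat::expect_equal(align_signs(sk_components, cuml_components),
                       sk_components, tolerance = 1e-6)
```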
Modern sklearn strictly validates that max_iter is an int. R's default
numeric type is double, which reticulate passes as a Python float.
Using 10000L ensures it's passed as a Python int.
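The int/float distinction in two lines (the estimator here is illustrative):

```r
lm <- reticulate::import("sklearn.linear_model")
lm$LogisticRegression(max_iter = 10000)   # R double -> Python float; strict sklearn rejects it
lm$LogisticRegression(max_iter = 10000L)  # R integer -> Python int; accepted
```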
Runs R CMD check --as-cran on ubuntu-latest with R release and devel.
No nvcc/CUDA available, so the package builds with stub headers — matching
what CRAN would see.

- Escape literal braces in roxygen comments across R source files and
  templates (e.g. {cuda.ml} -> \{cuda.ml\}, {"opt1",...} -> \{"opt1",...\})
- Regenerate all affected Rd files via devtools::document()
- Skip test_check() when cuML is not linked (CRAN-like environments)
- Use R CMD check directly in CRAN job (avoids rcmdcheck NOT_CRAN=true)
- Revert brace escaping inside @examples blocks (R code, not Rd markup)
- Define cuda_ml_can_predict_class_probabilities methods as proper
  functions so roxygen registers them as S3method() in NAMESPACE
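The escaping rule in miniature (the roxygen lines and function call are illustrative):

```r
# Rd treats '{' and '}' as markup, so literal braces in roxygen text must
# be escaped:
#' Fast machine learning on GPUs with \{cuda.ml\}.
#'
#' @examples
#' # ...but @examples blocks are R code, not Rd markup, so braces stay bare:
#' if (FALSE) {
#'   model <- cuda_ml_rand_forest(mpg ~ ., mtcars)
#' }
```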
Build infrastructure:
- Dockerfile: CUDA 12.8.1 + Ubuntu 22.04 base image
- libcuml_versions.R: add 26.04 entry pointing to PyPI libcuml-cu12 wheel
- cuml.R: handle pip wheel extraction (lib64/ layout, .whl extension)
- configure.R: handle lib64/ vs lib/ for pip wheels
- CMakeLists.txt.in: C++17, rapids-cmake branch-26.04
- Workflow: target cuML 26.04

C++ API changes for cuML 26.04:
- svm_serde.h: namespace alias MLCommon::Matrix -> ML::matrix for
  KernelParams and KernelType (header renamed kernelparams.h ->
  kernel_params.hpp)
- fil.cu, fil_utils.h, fil_utils.cu: disable FIL on 26.04 with stubs
  (fil.h replaced by modular headers; full adaptation TODO)
- random_projection.cu: disable on 26.04 with stubs (C++ API removed)
- knn.cu: disable on 26.04 with stubs (raft::spatial::knn types removed)
- random_forest_classifier.cu, random_forest_regressor.cu: guard FIL
  prediction paths for 26.04

Backward compatible: cuML 21.x with CUDA 11 still works.
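A condensed sketch of the stub pattern these guards use; the header path for CUML_VERSION_MAJOR and the function signature are assumptions, not this PR's actual code:

```cpp
#include <Rcpp.h>
#include <cuml/version_config.hpp>   // assumed location of CUML_VERSION_MAJOR

#if CUML_VERSION_MAJOR >= 26
// fil.h was split into modular headers in 26.04; stub the entry point
// until the binding is ported to the new API.
Rcpp::List fil_load_model(std::string const& model_path) {
  (void)model_path;
  Rcpp::stop("FIL is unavailable when cuda.ml is built against cuML >= 26.04");
  return Rcpp::List();  // unreachable; keeps the signature well-formed
}
#else
#include <cuml/fil/fil.h>
// ... real implementation against the legacy FIL API ...
#endif
```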
- Dockerfile: accept CUDA_IMAGE as build arg for different base images
- Workflow: matrix over cuML 21.12 (CUDA 11.2) and 26.04 (CUDA 12.8)
- Each version gets its own build-image and test-gpu job
- CMakeLists.txt.in: template RAPIDS_CMAKE_TAG and CMAKE_CXX_STANDARD
  so they adapt to the cuML version being built against
- configure.R: set rapids-cmake tag (v26.04.00 for 26.x, branch-21.10
  for 21.x) and C++ standard (17 for 26.x, 14 for 21.x)
- cuml.R: don't create premature lib symlink in download_libcuml()
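The two-version matrix described above could look roughly like this, assuming these CUDA base-image tags are available (illustrative, not the PR's exact YAML):

```yaml
strategy:
  fail-fast: false
  matrix:
    include:
      - cuml: "21.12"
        cuda_image: "nvidia/cuda:11.2.1-devel-ubuntu20.04"
      - cuml: "26.04"
        cuda_image: "nvidia/cuda:12.8.1-devel-ubuntu22.04"
# ...each build job then passes the base image through to the Dockerfile:
# docker build --build-arg CUDA_IMAGE=${{ matrix.cuda_image }} ...
```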
Use vYY.MM.00 for cuML >= 23.02 (stable tags), vYY.MM.00a for older
versions (only alpha tags available).

rapids-cmake v26.04 needs cmake >= 3.30.4. The existing auto-download
logic handles this, but the min version threshold was hardcoded to 3.21.1.
Now it's 3.30.4 for cuML >= 23.02, 3.21.1 for older versions.
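A sketch of the threshold selection (hypothetical helper; the real logic lives in the configure scripts):

```r
# rapids-cmake v26.04 needs CMake >= 3.30.4; older cuML branches build with 3.21.1.
required_cmake_version <- function(cuml_version) {
  if (numeric_version(cuml_version) >= "23.2") "3.30.4" else "3.21.1"
}

required_cmake_version("25.12")  # "3.30.4" -> triggers the cmake auto-download
required_cmake_version("21.12")  # "3.21.1"
```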
- Download libraft-cu12 and librmm-cu12 wheels alongside libcuml-cu12
  (cuml headers include raft/rmm headers which are in separate packages)
- Merge raft/rmm headers into libcuml/include/ during download
- Remove static_assert(CUML_VERSION_MAJOR == 21) — allow 26+
- Guard raft::mr::device::allocator (removed in raft 26.x) with version
  conditionals in device_allocator.cu/.h and stream_allocator.cu
- Use raft/core/handle.hpp instead of raft/handle.hpp for v26+

- Add tools/config/utils/pypi.R with resolve_native_deps() that walks
  the PyPI dependency tree for a package and returns download URLs for
  all native C++ dependencies (libraft, librmm, rapids-logger, cccl, etc.)
- libcuml_versions.R: cuML 26.04 entry is now just "libcuml-cu12"
  (the PyPI package name), not a hardcoded URL
- cuml.R: download_libcuml() detects PyPI package names vs direct URLs,
  resolves the full dep tree, downloads all wheels, and merges their
  include/ directories into libcuml/include/
- configure.R: load pypi.R utility
- Uses jsonlite for PyPI JSON API parsing
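A simplified sketch of the dependency walk described above; the real resolve_native_deps() in tools/config/utils/pypi.R likely differs in detail, and the name filter below is a heuristic for this sketch. PyPI's JSON API lives at https://pypi.org/pypi/<package>/json:

```r
library(jsonlite)

pypi_info <- function(pkg) fromJSON(sprintf("https://pypi.org/pypi/%s/json", pkg))

# Walk requires_dist recursively, keeping only native RAPIDS/CUDA packages.
native_deps <- function(pkg, seen = character()) {
  if (pkg %in% seen) return(seen)
  seen <- c(seen, pkg)
  reqs <- pypi_info(pkg)$info$requires_dist
  deps <- sub("[ (<>=!~;].*$", "", reqs)  # "libraft-cu12 (==25.12.*)" -> "libraft-cu12"
  deps <- grep("^(lib|rapids|nvidia)", deps, value = TRUE)
  for (d in deps) seen <- native_deps(d, seen)
  seen
}

# Wheel download URLs for every package in the tree:
wheel_urls <- function(pkg) {
  unlist(lapply(native_deps(pkg), function(p) {
    urls <- pypi_info(p)$urls
    urls$url[grepl("\\.whl$", urls$url)]
  }))
}
```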
RMM 26.04 headers require CCCL >= 3.3 at compile time, but CCCL is not
a pip dependency (it's normally bundled with the CUDA toolkit). CUDA 12.x
ships CCCL 2.x which is too old. Download CCCL v3.3.0 from GitHub
releases (header-only, ~2MB) and merge into libcuml/include/.

Also handle pip wheels that extract to nested dirs like
nvidia/<subpackage>/include/.

CCCL 3.3 headers (bundled in libcuml/include/) must take precedence
over the CUDA 12 toolkit's older CCCL 2.x headers. Swap include order
so cuml/raft/rmm/cccl headers are found first.
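One way to express that ordering, assuming the include paths are wired up in CMakeLists.txt.in (target and variable names here are hypothetical):

```cmake
# Vendored CCCL 3.x under libcuml/include must shadow the CUDA 12
# toolkit's CCCL 2.x, so it goes first in the include search order.
target_include_directories(cuda_ml BEFORE PRIVATE
  "${LIBCUML_DIR}/include"          # cuml/raft/rmm + vendored CCCL 3.x
)
target_include_directories(cuda_ml PRIVATE
  "${CUDAToolkit_INCLUDE_DIRS}"     # toolkit headers, incl. CCCL 2.x
)
```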
- Use RAPIDS-pinned CCCL commit (CUDA 12 compatible) instead of v3.3.0
  release tag which includes CUDA 13-only code
- pinned_host_vector.h: guard thrust::cuda::experimental::pinned_allocator
  (removed in CCCL 3.x); use plain host_vector on v26+
- handle_utils.cu: raft::handle_t no longer has set_stream(); reconstruct
  with stream_view via constructor on v26+
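The set_stream() removal in one sketch; names are illustrative and the actual handle_utils.cu change may differ:

```cpp
#include <memory>
#include <cuml/version_config.hpp>    // assumed location of CUML_VERSION_MAJOR
#include <raft/core/handle.hpp>       // v26+ header path per the commit above
#include <rmm/cuda_stream_view.hpp>

std::unique_ptr<raft::handle_t> make_handle(cudaStream_t stream) {
#if CUML_VERSION_MAJOR >= 25
  // raft::handle_t no longer exposes set_stream(); bind the stream
  // at construction time instead.
  return std::make_unique<raft::handle_t>(rmm::cuda_stream_view{stream});
#else
  auto handle = std::make_unique<raft::handle_t>();
  handle->set_stream(stream);
  return handle;
#endif
}
```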
cuML 26.04's rmm headers require CCCL >= 3.3 which conflicts with
CUDA 12.x toolkit's CCCL 2.x. cuML 25.12 vendors its own CCCL in
librmm/include/rapids/ and has no CCCL version check — clean CUDA 12
compatibility.

- Target cuML 25.12 instead of 26.04
- Version guards: >= 26 -> >= 25 (same API changes apply)
- Re-enable KNN (knn.hpp exists in 25.12 with same API)
- Remove CCCL GitHub download (not needed)
- Update PyPI resolver to handle version pins (==25.12.*)

RMM headers require this define (normally set automatically by RMM's
cmake config, but we're using headers directly from the pip wheel).

All RAPIDS 25.x+ pip wheels require CCCL 3.x headers which are
incompatible with CUDA 12's bundled CCCL 2.x. No version of
libcuml-cu12 can be compiled against a stock CUDA 12 toolkit.

Revert to cuML 21.12 as the default for now. Supporting newer cuML
will require either CUDA 13 or a custom build environment.

@dfalbel merged commit 8070dec into main on Apr 29, 2026
4 checks passed