Conversation
Replace self-hosted GPU runners with runs-on g4dn.xlarge spot instances, matching the approach in mlverse/torch#1439. Also modernizes the workflow:
- Action versions: checkout@v4, setup-python@v5, setup-r@v2, etc.
- Fix deprecated ::set-output → $GITHUB_OUTPUT
- Container: ubuntu18.04 → ubuntu20.04 (18.04 is EOL)
- Add --runtime=nvidia to container options
- Add concurrency groups with cancel-in-progress
- Simplify matrix to a single config (CUDA 11.2.1, cuML 21.12, R release)
- Drop ASAN matrix dimension
- Build Docker image with cuda.ml pre-installed on ubuntu-latest (free)
- Run tests on runs-on g4dn.xlarge GPU runner using the pre-built image
- Add .github/docker/Dockerfile following the same pattern as mlverse/torch
- Make CMAKE_CUDA_ARCHITECTURES configurable via env var (defaults to NATIVE) so cross-compilation works on runners without a GPU (targets T4 = SM 75)
- Remove miniconda install (no longer needed for reticulate tests)
The 'sklearn' PyPI package is deprecated in favor of 'scikit-learn'. Also switch from py_install() to py_require(), the modern reticulate API for declaring Python dependencies.
- Move download_libcuml() before normalizePath() so the directory exists
- Reference CUML_STUB_HEADERS_DIR in both Treelite found/not-found branches so cmake doesn't warn about an unused variable
SVD components are only defined up to sign, so different implementations can produce sign-flipped vectors that are mathematically equivalent. Align signs before comparing components and transformed data.
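The alignment step can be sketched as follows. This is a minimal numpy illustration of the idea, not the package's actual R test code; component vectors are assumed to be columns, and `align_signs` is a hypothetical helper name:

```python
import numpy as np

def align_signs(ref, other):
    """Flip any column of `other` whose dominant entry disagrees in
    sign with the corresponding column of `ref`."""
    flipped = other.copy()
    for j in range(ref.shape[1]):
        # Compare signs at the entry of largest magnitude, which is
        # robust to near-zero entries whose sign is numerically noisy.
        i = int(np.argmax(np.abs(ref[:, j])))
        if np.sign(ref[i, j]) != np.sign(flipped[i, j]):
            flipped[:, j] *= -1.0
    return flipped

# Two sign-flipped but mathematically equivalent component matrices:
a = np.array([[1.0, 2.0],
              [3.0, -4.0]])
b = a * np.array([1.0, -1.0])  # negate the second component
aligned = align_signs(a, b)
```

After alignment, `aligned` and `a` compare equal elementwise even though `b` differed in the sign of one component.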
Modern sklearn strictly validates that max_iter is an int. R's default numeric type is double, which reticulate passes as a Python float. Using 10000L ensures it's passed as a Python int.
Runs R CMD check --as-cran on ubuntu-latest with R release and devel. No nvcc/CUDA available, so the package builds with stub headers — matching what CRAN would see.
- Escape literal braces in roxygen comments across R source files and
templates (e.g. {cuda.ml} -> \{cuda.ml\}, {"opt1",...} -> \{"opt1",...\})
- Regenerate all affected Rd files via devtools::document()
- Skip test_check() when cuML is not linked (CRAN-like environments)
- Use R CMD check directly in CRAN job (avoids rcmdcheck NOT_CRAN=true)
- Revert brace escaping inside @examples blocks (R code, not Rd markup)
- Define cuda_ml_can_predict_class_probabilities methods as proper functions so roxygen registers them as S3method() in NAMESPACE
Build infrastructure:
- Dockerfile: CUDA 12.8.1 + Ubuntu 22.04 base image
- libcuml_versions.R: add 26.04 entry pointing to PyPI libcuml-cu12 wheel
- cuml.R: handle pip wheel extraction (lib64/ layout, .whl extension)
- configure.R: handle lib64/ vs lib/ for pip wheels
- CMakeLists.txt.in: C++17, rapids-cmake branch-26.04
- Workflow: target cuML 26.04

C++ API changes for cuML 26.04:
- svm_serde.h: namespace alias MLCommon::Matrix -> ML::matrix for KernelParams and KernelType (header renamed kernelparams.h -> kernel_params.hpp)
- fil.cu, fil_utils.h, fil_utils.cu: disable FIL on 26.04 with stubs (fil.h replaced by modular headers; full adaptation TODO)
- random_projection.cu: disable on 26.04 with stubs (C++ API removed)
- knn.cu: disable on 26.04 with stubs (raft::spatial::knn types removed)
- random_forest_classifier.cu, random_forest_regressor.cu: guard FIL prediction paths for 26.04

Backward compatible: cuML 21.x with CUDA 11 still works.
- Dockerfile: accept CUDA_IMAGE as build arg for different base images
- Workflow: matrix over cuML 21.12 (CUDA 11.2) and 26.04 (CUDA 12.8)
- Each version gets its own build-image and test-gpu job
- CMakeLists.txt.in: template RAPIDS_CMAKE_TAG and CMAKE_CXX_STANDARD so they adapt to the cuML version being built against
- configure.R: set rapids-cmake tag (v26.04.00 for 26.x, branch-21.10 for 21.x) and C++ standard (17 for 26.x, 14 for 21.x)
- cuml.R: don't create premature lib symlink in download_libcuml()
Use vYY.MM.00 for cuML >= 23.02 (stable tags), vYY.MM.00a for older versions (only alpha tags available).
rapids-cmake v26.04 needs cmake >= 3.30.4. The existing auto-download logic handles this, but the min version threshold was hardcoded to 3.21.1. Now it's 3.30.4 for cuML >= 23.02, 3.21.1 for older versions.
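The version-dependent threshold amounts to a simple lookup (hypothetical Python sketch; the real check lives in the R configure logic):

```python
def min_cmake_version(cuml_version):
    """Minimum cmake version required before auto-download kicks in.

    rapids-cmake used by cuML >= 23.02 needs cmake >= 3.30.4; older
    cuML branches were satisfied by 3.21.1.
    """
    major, minor = (int(x) for x in cuml_version.split(".")[:2])
    return "3.30.4" if (major, minor) >= (23, 2) else "3.21.1"
```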
- Download libraft-cu12 and librmm-cu12 wheels alongside libcuml-cu12 (cuml headers include raft/rmm headers, which are in separate packages)
- Merge raft/rmm headers into libcuml/include/ during download
- Remove static_assert(CUML_VERSION_MAJOR == 21) to allow 26+
- Guard raft::mr::device::allocator (removed in raft 26.x) with version conditionals in device_allocator.cu/.h and stream_allocator.cu
- Use raft/core/handle.hpp instead of raft/handle.hpp for v26+
- Add tools/config/utils/pypi.R with resolve_native_deps(), which walks the PyPI dependency tree for a package and returns download URLs for all native C++ dependencies (libraft, librmm, rapids-logger, cccl, etc.)
- libcuml_versions.R: cuML 26.04 entry is now just "libcuml-cu12" (the PyPI package name), not a hardcoded URL
- cuml.R: download_libcuml() detects PyPI package names vs direct URLs, resolves the full dep tree, downloads all wheels, and merges their include/ directories into libcuml/include/
- configure.R: load pypi.R utility
- Uses jsonlite for PyPI JSON API parsing
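A minimal Python sketch of the resolution strategy (the real implementation is R code in tools/config/utils/pypi.R; the prefix filter, the choice of the first wheel URL, and the field handling here are illustrative assumptions, not the package's exact behavior):

```python
import json
import re
from urllib.request import urlopen

# Heuristic: dependencies whose wheels ship native C++ payloads.
# (Illustrative prefix list; the real filter may differ.)
NATIVE_PREFIXES = ("libcuml", "libraft", "librmm", "rapids-logger", "nvidia-cccl")

def fetch_metadata(package):
    """Fetch package metadata from the PyPI JSON API."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        return json.load(resp)

def resolve_native_deps(package, fetch=fetch_metadata, seen=None):
    """Walk requires_dist recursively, collecting one wheel URL per
    native dependency; `seen` prevents revisiting shared deps."""
    seen = set() if seen is None else seen
    if package in seen:
        return {}
    seen.add(package)
    meta = fetch(package)
    urls = {package: meta["urls"][0]["url"]}
    for req in meta["info"].get("requires_dist") or []:
        # Strip version pins/markers: "libraft-cu12==25.12.*" -> "libraft-cu12"
        name = re.split(r"[ ;<>=!~(\[]", req, maxsplit=1)[0]
        if name.startswith(NATIVE_PREFIXES):
            urls.update(resolve_native_deps(name, fetch, seen))
    return urls

# Exercise the walk against canned metadata instead of the live API:
fake = {
    "libcuml-cu12": {"info": {"requires_dist": ["libraft-cu12==25.12.*", "numpy>=1.21"]},
                     "urls": [{"url": "https://example.invalid/libcuml.whl"}]},
    "libraft-cu12": {"info": {"requires_dist": None},
                     "urls": [{"url": "https://example.invalid/libraft.whl"}]},
}
deps = resolve_native_deps("libcuml-cu12", fetch=fake.__getitem__)
```

Injecting the fetcher keeps the tree walk testable offline; pure-Python dependencies like numpy are filtered out before recursing.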
RMM 26.04 headers require CCCL >= 3.3 at compile time, but CCCL is not a pip dependency (it's normally bundled with the CUDA toolkit). CUDA 12.x ships CCCL 2.x which is too old. Download CCCL v3.3.0 from GitHub releases (header-only, ~2MB) and merge into libcuml/include/. Also handle pip wheels that extract to nested dirs like nvidia/<subpackage>/include/.
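Locating the headers in both layouts can be sketched like this (hypothetical Python helper; the real merge logic is R/shell in the build scripts, and the demo directory names are made up):

```python
import tempfile
from pathlib import Path

def find_include_dirs(wheel_root):
    """Return include/ directories in an extracted wheel, whether at
    the top level or nested as nvidia/<subpackage>/include/."""
    root = Path(wheel_root)
    candidates = [root / "include", *sorted(root.glob("nvidia/*/include"))]
    return [d for d in candidates if d.is_dir()]

# Demo on a throwaway tree mimicking a nested wheel layout:
tmp = Path(tempfile.mkdtemp())
(tmp / "nvidia" / "cuda_cccl" / "include").mkdir(parents=True)
dirs = find_include_dirs(tmp)
```

Each directory found this way would then be copied into libcuml/include/ so the compiler sees a single merged header tree.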
CCCL 3.3 headers (bundled in libcuml/include/) must take precedence over the CUDA 12 toolkit's older CCCL 2.x headers. Swap include order so cuml/raft/rmm/cccl headers are found first.
- Use the RAPIDS-pinned CCCL commit (CUDA 12 compatible) instead of the v3.3.0 release tag, which includes CUDA 13-only code
- pinned_host_vector.h: guard thrust::cuda::experimental::pinned_allocator (removed in CCCL 3.x); use plain host_vector on v26+
- handle_utils.cu: raft::handle_t no longer has set_stream(); reconstruct with stream_view via constructor on v26+
cuML 26.04's rmm headers require CCCL >= 3.3, which conflicts with the CUDA 12.x toolkit's CCCL 2.x. cuML 25.12 vendors its own CCCL in librmm/include/rapids/ and has no CCCL version check, giving clean CUDA 12 compatibility.
- Target cuML 25.12 instead of 26.04
- Version guards: >= 26 -> >= 25 (same API changes apply)
- Re-enable KNN (knn.hpp exists in 25.12 with same API)
- Remove CCCL GitHub download (not needed)
- Update PyPI resolver to handle version pins (==25.12.*)
RMM headers require this define (normally set automatically by RMM's cmake config, but we're using headers directly from the pip wheel).
All RAPIDS 25.x+ pip wheels require CCCL 3.x headers which are incompatible with CUDA 12's bundled CCCL 2.x. No version of libcuml-cu12 can be compiled against a stock CUDA 12 toolkit. Revert to cuML 21.12 as the default for now. Supporting newer cuML will require either CUDA 13 or a custom build environment.
Summary
- Replace [self-hosted, gpu] runners with runs-on g4dn.xlarge spot instances (same approach as mlverse/torch#1439, "Use runs-on GPU runners for CI")
- Fix deprecated ::set-output → $GITHUB_OUTPUT
- Add --runtime=nvidia and concurrency groups with cancel-in-progress

Test plan
- nvidia-smi / GPU access works inside the container