Native row-major PCA by aamijar · Pull Request #2125 · rapidsai/cuvs

aamijar · 2026-05-26T01:02:28Z

Currently PCA only supports col-major in the cpp layer. If a consumer's input dataset is in row-major it must be copied as col-major. With the introduction of RAFT native row-major support in PCA rapidsai/raft#3036, users can seamlessly pass in row-major datasets.
This PR exposes the cpp PCA row-major APIs and updates the C and Python APIs accordingly.

Resolves #2115. Depends on rapidsai/raft#3036.

aamijar · 2026-05-26T01:11:31Z

-        FORK                     ${RAFT_FORK}
-        PINNED_TAG               ${RAFT_PINNED_TAG}
+        FORK                     aamijar
+        PINNED_TAG               pca-row-major


Revert later

coderabbitai · 2026-05-26T01:12:06Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- PCA operations now accept and preserve both row-major (C) and column-major (F) layouts across C, C++, and Python APIs; outputs match input layout.
Documentation
- Updated API docs and docstrings to describe layout detection and layout-matching behavior.
Tests
- Added tests validating row-major path, no-copy behavior, and numerical agreement between layouts.
Chores
- Minor build script metadata update.

Walkthrough

This PR adds native row-major (C-contiguous) layout support to PCA across C, C++, and Python bindings by detecting DLPack strides and preserving/dispatching on C- vs F-contiguity instead of forcing col-major conversions.

Changes

PCA Row-Major Layout Support

Layer / File(s)	Summary
C API layout detection and dispatch `c/include/cuvs/preprocessing/pca.h`, `c/src/preprocessing/pca.cpp`	Documentation updated to describe C- or F-contiguous layout detection from DLPack strides. Internal helper functions templated on `LayoutT` and exported entry points dispatch based on contiguity checks, replacing hard-coded col-major conversion logic.
C++ API overloads and detail layer generalization `cpp/include/cuvs/preprocessing/pca.hpp`, `cpp/src/preprocessing/pca/detail/pca.cuh`	New row-major overloads declared for `fit`, `fit_transform`, `transform`, and `inverse_transform`. Detail API generalized to accept `LayoutPolicy` template parameter instead of requiring `raft::col_major`.
C++ implementation wrapper instantiation `cpp/src/preprocessing/pca/pca.cu`	Wrapper macros updated to use `LayoutT` template parameter. Explicit instantiations added for both `raft::col_major` and `raft::row_major` layouts (`float`, `int64_t`).
C++ layout testing and validation `cpp/tests/preprocessing/pca.cu`	`pca_roundtrip` helper generalized with `LayoutT` parameter. New `PcaRowMajorTest` class verifies row-major roundtrip and validates numerical agreement between col-major and row-major executions.
Python API layout preservation implementation `python/cuvs/cuvs/preprocessing/pca/pca.pyx`	Refactored to preserve input layout instead of forcing Fortran conversion. Added `_ensure_device_contiguous`, `_validate_pca_input`, and `_match_layout` helpers. Updated `fit`, `fit_transform`, `transform`, and `inverse_transform` to detect, preserve, and align layouts accordingly.
Python layout testing and validation `python/cuvs/cuvs/tests/test_pca.py`	Added `_as_order` layout helper. Parametrized existing tests over C/F-contiguous layouts. Added `test_row_major_no_copy` to verify no silent conversion, and `test_layouts_agree_numerically` to validate numerical equivalence across layouts.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

dantegd
robertmaynard
jameslamb

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 30.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Native row-major PCA' directly and clearly summarizes the primary change—adding native row-major support to PCA APIs.
Description check	✅ Passed	The description explains the motivation (row-major inputs currently require copying) and solution (exposing C++ row-major APIs via RAFT support), relating directly to the changeset.
Linked Issues check	✅ Passed	The PR implements native row-major PCA support across C++ (detail/public APIs), C, and Python layers [`#2115`], eliminating unnecessary data copies for row-major inputs as requested.
Out of Scope Changes check	✅ Passed	All changes are scoped to PCA preprocessing across C, C++, and Python layers. The CMake file update (copyright years and RAFT source pinning) directly supports enabling RAFT row-major PCA.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

python/cuvs/cuvs/tests/test_pca.py (1)
131-149: ⚡ Quick win

This test can't detect an internal layout-conversion copy.

pca.fit(params, X) cannot change X.data.ptr, so this still passes even if the implementation creates a temporary reordered buffer and uses that internally. That makes the regression signal much weaker than the test name suggests. A more reliable check would spy on _ensure_device_contiguous() or another observable copy path instead of re-reading the caller's original pointer.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/tests/test_pca.py` around lines 131 - 149, The test
test_row_major_no_copy is currently ineffective because pca.fit(params, X)
cannot mutate X.data.ptr even if the implementation copies to a temporary;
instead, spy on the internal copy path: patch or mock
pca._ensure_device_contiguous (or the actual helper that performs device-side
reordering) before calling pca.fit and assert that the helper was not invoked;
keep the existing setup (C-contiguous cp.ndarray, original_ptr) but replace the
final pointer assertion with an assertion that
pca._ensure_device_contiguous.assert_not_called() (or equivalent), so the test
fails if the implementation performs an internal layout-conversion copy.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cuvs/cuvs/preprocessing/pca/pca.pyx`:
- Around line 152-154: The docstring currently claims "no internal
copy/transpose is performed" which is false in general; update the public
docstrings in pca.pyx to narrow the guarantee to state that zero-copy is only
guaranteed for already-contiguous float32 device (GPU) inputs and that
_ensure_device_contiguous will perform copies for host inputs, dtype coercion,
and non-contiguous device views (i.e., hidden allocations may occur). Edit both
occurrences noted to explicitly call out these cases and the potential for
allocations, and mention _ensure_device_contiguous by name so callers know where
coercion/copies happen.
- Around line 300-309: Replace use of _match_layout() on caller-supplied
writable buffers (trans_input / output) in transform() and inverse_transform()
so we never allocate and return a new array instead of writing into the caller
buffer; instead, check that the provided buffers have dtype float32 and the
required memory order (C or F) up front and either raise a clear error if they
do not match or perform an explicit in-place copy/transpose into the provided
buffer before calling the C++ layer. Remove or avoid calling _match_layout() on
writable-output code paths and add explicit dtype validation (float32) and
contiguous-order checks where _match_layout() was used; keep _match_layout() for
read-only normalization only. Also update the fit() / fit_transform() docstrings
to stop claiming "no internal copy/transpose is performed" and explicitly note
that _ensure_device_contiguous() may coerce to float32 and/or make a contiguous
copy when needed.

---

Nitpick comments:
In `@python/cuvs/cuvs/tests/test_pca.py`:
- Around line 131-149: The test test_row_major_no_copy is currently ineffective
because pca.fit(params, X) cannot mutate X.data.ptr even if the implementation
copies to a temporary; instead, spy on the internal copy path: patch or mock
pca._ensure_device_contiguous (or the actual helper that performs device-side
reordering) before calling pca.fit and assert that the helper was not invoked;
keep the existing setup (C-contiguous cp.ndarray, original_ptr) but replace the
final pointer assertion with an assertion that
pca._ensure_device_contiguous.assert_not_called() (or equivalent), so the test
fails if the implementation performs an internal layout-conversion copy.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5c9a4634-1195-46d5-bdd1-e9066dc65a14

📥 Commits

Reviewing files that changed from the base of the PR and between 4addf4e and d877a92.

📒 Files selected for processing (8)

c/include/cuvs/preprocessing/pca.h
c/src/preprocessing/pca.cpp
cpp/include/cuvs/preprocessing/pca.hpp
cpp/src/preprocessing/pca/detail/pca.cuh
cpp/src/preprocessing/pca/pca.cu
cpp/tests/preprocessing/pca.cu
python/cuvs/cuvs/preprocessing/pca/pca.pyx
python/cuvs/cuvs/tests/test_pca.py

coderabbitai · 2026-05-26T01:12:09Z

+    if ary.flags.c_contiguous:
+        return ary, "C"
+    if ary.flags.f_contiguous:
+        return ary, "F"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In CuPy 13.x, can a 2D array with a singleton dimension be both C-contiguous and F-contiguous, and can the actual memory layout be distinguished from its strides?

💡 Result:

Yes. In CuPy 13.x, a 2D array with a singleton dimension (e.g., shape (1, N) or (N, 1)) can be both C-contiguous and F-contiguous. And in that degenerate singleton-dimension case, you generally cannot (reliably) distinguish the “actual memory layout” (row-major vs column-major in the usual sense) from the strides alone: when an axis has length 1, the contiguity checks effectively ignore/relax the corresponding stride components, and a contiguous array may report strides that are not uniquely determined by physical adjacency. Key evidence: 1) NumPy flags explicitly document that “Even for contiguous arrays a stride for a given dimension arr.strides[dim] may be arbitrary if arr.shape[dim] == 1”. They also state that arrays can be both C- and F-contiguous simultaneously.[1] 2) NumPy’s contiguity-flag implementation explains that with relaxed stride checking, it becomes possible for e.g. a 10x1 array to be both C- and F-contiguous, because strides for size-1 dimensions are ignored/relaxed; this breaks assumptions that certain strides always equal itemsize for contiguous arrays.[2] 3) CuPy’s ndarray.flags in CuPy exposes only c_contiguous and f_contiguous (plus owndata) as the memory-layout information, rather than implying that strides uniquely determine physical layout in all cases.[3] 4) CuPy dev discussion/bugfix notes for “both C and F-contiguous” arrays indicate that strides may differ or change even when contiguity flags indicate the array is contiguous in both senses, and it cautions that users should not rely on strides for contiguity-dependent behavior.[4] Practical takeaway for CuPy 13.x: use a.flags.c_contiguous / a.flags.f_contiguous (and/or perform access/traversal tests), but do not expect a.strides to uniquely encode which “order” (C vs F) you’d normally infer for non-degenerate shapes.

Citations:

1: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flags.html

2: https://github.com/numpy/numpy/blob/23c05e6f1b392f80f749dbb4668b5e7b52aef014/numpy/core/src/multiarray/flagsobject.c

3: https://docs.cupy.dev/en/stable/reference/generated/cupy.ndarray.html

4: BUG: Fix new strides when array is both C and F-contiguous cupy/cupy#7438

🏁 Script executed:

#!/bin/bash set -euo pipefail ls -la rg -n "return ary, \"C\"|return ary, \"F\"|c_contiguous|f_contiguous" python/cuvs/cuvs/preprocessing/pca/pca.pyx sed -n '90,170p' python/cuvs/cuvs/preprocessing/pca/pca.pyx

Repository: rapidsai/cuvs

Length of output: 4849

🏁 Script executed:

sed -n '170,360p' python/cuvs/cuvs/preprocessing/pca/pca.pyx

Repository: rapidsai/cuvs

Length of output: 7327

🏁 Script executed:

sed -n '360,560p' python/cuvs/cuvs/preprocessing/pca/pca.pyx

Repository: rapidsai/cuvs

Length of output: 4118

🏁 Script executed:

rg -n "cuvsPcaFit\\b|cuvsPcaFitTransform\\b|cuvsPcaInverseTransform\\b" -S cpp python/cuvs | head -n 50

Repository: rapidsai/cuvs

Length of output: 696

🏁 Script executed:

sed -n '1,120p' python/cuvs/cuvs/preprocessing/pca/pca.pxd

Repository: rapidsai/cuvs

Length of output: 2948

🏁 Script executed:

ls -la c* && find cpp -maxdepth 4 -type f -name "pca.h" -o -name "*pca*" | head

Repository: rapidsai/cuvs

Length of output: 3561

🏁 Script executed:

fd -i "pca.h" -a . | head fd -i "preprocessing/pca.h" -a . | head rg -n "DLManagedTensor|strides|layout|contiguous|c_contiguous|f_contiguous|row|col|order" cpp/src/preprocessing/pca -S ls -la cpp/include/cuvs/preprocessing || true ls -la cpp/include/cuvs/preprocessing/pca* || true

Repository: rapidsai/cuvs

Length of output: 1331

🏁 Script executed:

sed -n '1,220p' c/include/cuvs/preprocessing/pca.h

Repository: rapidsai/cuvs

Length of output: 8710

🏁 Script executed:

sed -n '220,520p' c/include/cuvs/preprocessing/pca.h sed -n '1,260p' cpp/src/preprocessing/pca/pca.cu sed -n '1,240p' cpp/include/cuvs/preprocessing/pca.hpp

Repository: rapidsai/cuvs

Length of output: 17615

🏁 Script executed:

ls -la cpp/src/preprocessing/pca && ls -la cpp/src/preprocessing/pca/detail || true rg -n "strides|stride|DLManagedTensor|ndim|shape|dl_tensor|row_major|col_major|contig|c_contiguous|f_contiguous" cpp/src/preprocessing/pca -S

Repository: rapidsai/cuvs

Length of output: 1212

🏁 Script executed:

sed -n '1,220p' cpp/src/preprocessing/pca/detail/pca.cuh sed -n '1,260p' cpp/src/preprocessing/pca/pca.cu

Repository: rapidsai/cuvs

Length of output: 11150

Stop always classifying singleton-dimension inputs as "C" when both C- and F-contiguous flags are set.

In CuPy 13.x, shapes like (1, N) / (N, 1) can report both ary.flags.c_contiguous and ary.flags.f_contiguous, but for size-1 axes the strides don’t uniquely determine an original row/col-major “order”. The current _ensure_device_contiguous logic returns "C" whenever c_contiguous is true, so PCA outputs (e.g., components, trans_input) will consistently use "C" layout for this degenerate case, which conflicts with the stated “layout preserved natively” expectation. Either implement a best-effort preservation using the incoming array’s CUDA/DLPack stride metadata, or document that C/F is ambiguous for singleton-dimension inputs and that "C" is the chosen convention.

coderabbitai · 2026-05-26T01:12:10Z

+    The input layout (C-contiguous / row-major or F-contiguous / col-major)
+    is preserved natively; no internal copy/transpose is performed. Output
+    arrays use the same layout as the input.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The public docs overpromise zero-copy behavior.

_ensure_device_contiguous() still copies for host inputs, dtype coercion, and non-contiguous device views, so "no internal copy/transpose is performed" is only true for already-contiguous float32 device inputs. Please narrow the wording so callers are not surprised by hidden allocations. As per coding guidelines, "Document missing or incorrect docstrings for public methods and classes, including parameter documentation and usage examples for memory resources."

Also applies to: 226-227

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs/cuvs/preprocessing/pca/pca.pyx` around lines 152 - 154, The docstring currently claims "no internal copy/transpose is performed" which is false in general; update the public docstrings in pca.pyx to narrow the guarantee to state that zero-copy is only guaranteed for already-contiguous float32 device (GPU) inputs and that _ensure_device_contiguous will perform copies for host inputs, dtype coercion, and non-contiguous device views (i.e., hidden allocations may occur). Edit both occurrences noted to explicitly call out these cases and the potential for allocations, and mention _ensure_device_contiguous by name so callers know where coercion/copies happen.

coderabbitai · 2026-05-26T01:12:10Z

+def _match_layout(arr, order):
+    """Return ``arr`` in the requested layout (``"C"`` or ``"F"``).
+
+    Avoids a copy when the array is already in the target layout."""
+    arr = cp.asarray(arr)
+    if order == "C" and arr.flags.c_contiguous:
+        return arr
+    if order == "F" and arr.flags.f_contiguous:
+        return arr
+    return cp.asarray(arr, order=order)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

For CuPy 13.x, does cupy.asarray(arr, order="C")allocate a new array whenarr is F-contiguous or otherwise not already C-contiguous, leaving the original array object unchanged?

💡 Result:

For CuPy 13.x, cupy.asarray(arr, order="C") does not guarantee zero-copy when arr is not already C-contiguous. What CuPy promises from the API level: - cupy.asarray is defined as equivalent to array(a, dtype, copy=False, order=order) [1]. - The order argument is a request for the desired memory layout (e.g., C-style vs Fortran-style) [1]. - CuPy returns “An array on the current device” and performs a copy if needed (the “no copy is performed” guarantee applies only when a is already on the device; for host arrays, data movement/allocation is inherently required) [1]. What happens for C vs F contiguous inputs: - Because order="C" is explicitly requesting a C-style memory representation, if the input provided by arr is F-contiguous (or otherwise not C-contiguous), producing a C-contiguous result requires creating a new array (or otherwise changing strides/layout). CuPy’s docs indicate asarray uses array(..., copy=False, order=order) semantics, i.e., it will only avoid copying when it can satisfy the requested order without needing a new array [1]. Therefore, when arr is F-contiguous or otherwise not C-contiguous, CuPy 13.x will allocate a new GPU array to satisfy order="C", and the original array object (the input arr) is left unchanged (it is not mutated in-place).

Citations:

1: https://docs.cupy.dev/en/v13.0.0/reference/generated/cupy.asarray.html

🏁 Script executed:

#!/bin/bash set -euo pipefail # Locate and print the relevant parts of pca.pyx around the cited lines and usages. FILE="python/cuvs/cuvs/preprocessing/pca/pca.pyx" # Basic context around _match_layout nl -ba "$FILE" | sed -n '260,340p' echo "-----" # Show fit/fit_transform/transform/inverse_transform sections where outputs are created or normalized nl -ba "$FILE" | sed -n '340,520p' echo "-----" # Find all usages of _match_layout in this file rg -n "_match_layout\(" "$FILE"

Repository: rapidsai/cuvs

Length of output: 100

🌐 Web query:

In CuPy (v13.x), how do arr.flags.c_contiguousandarr.flags.f_contiguousbehave for singleton dimensions (e.g., shape(1, n)or(n, 1))? Can an array be both C- and F-contiguous, and how does this affect selecting order="C"vsorder="F"?

💡 Result:

In CuPy 13.x, arr.flags.c_contiguous and arr.flags.f_contiguous follow NumPy’s contiguity flag semantics, including special handling for singleton dimensions. 1) Singleton dimensions and the meaning of the flags - For a 2+ dimensional array, C- and F-contiguity are normally mutually exclusive (“can be C- or F- contiguous, or neither, but not both”) [1]. - However, there is a relaxed case when dimensions have size 1: strides along size-1 axes do not affect the effective memory layout because those indices are always 0. In NumPy docs, if self.shape[k] == 1, then self.strides[k] is arbitrary without affecting offsets, which implies arrays with such singleton dimensions can satisfy both C and F contiguity checks under the relaxed/appropriate definition [2]. Concrete consequences: - Shape (1, n): - c_contiguous is True (standard row-major layout for the non-singleton dimension). - f_contiguous is also typically True because the singleton first dimension makes the column-major layout compatible with the same physical memory (the only dimension that matters is the length-n one, and the contiguity checks can both pass) [2]. - Shape (n, 1): - f_contiguous is True. - c_contiguous is also typically True for the same reason: the singleton last dimension makes the stride along that axis irrelevant for contiguity, allowing both checks to pass [2]. So for the common “one singleton, one non-singleton” 2D cases (1, n) and (n, 1), arrays can be both C- and F-contiguous (both flags True) under the normal semantics used by NumPy/CuPy. 2) Can an array be both C- and F-contiguous? - Yes. NumPy’s contiguous definitions allow both flags to be true in the presence of singleton (size-1) dimensions; the NumPy docs explicitly discuss that singleton dimensions make the stride along them arbitrary and that this can result in both contiguity flags being set simultaneously, even for higher-dimensional arrays [2]. 3) How this affects selecting order="C" vs order="F" - CuPy functions that take an order argument (e.g., in array constructors or asarray/conversion-style APIs) generally use the requested memory layout; for order='A', CuPy chooses Fortran if the array is Fortran-contiguous, otherwise C [3]. - If you have an array that is both C- and F-contiguous (common for (1, n) / (n, 1)), then “Fortran-contiguous” is true as well as “C-contiguous”. Therefore, any decision logic that checks only one flag (or uses A) may pick either layout but will usually be no-ops in terms of data movement because the array already satisfies both contiguity properties [3][2]. Practical takeaway: - For arrays with singleton dimensions where both flags are True, choosing order='C' vs order='F' should generally not force a copy/reorder, because both contiguity conditions hold already [2]. - For arrays with no singleton/degenerate dimensions, C and F contiguity are generally not both true for 2+ dimensions, so order='C' vs order='F' typically distinguishes whether a copy/reformat is needed [1].

Citations:

1: https://github.com/numpy/numpy/blob/23c05e6f1b392f80f749dbb4668b5e7b52aef014/numpy/core/src/multiarray/flagsobject.c

2: https://github.com/numpy/numpy/blob/main/doc/source/reference/arrays.ndarray.rst

3: https://docs.cupy.dev/en/stable/reference/generated/cupy.ndarray.html

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="python/cuvs/cuvs/preprocessing/pca/pca.pyx" test -f "$FILE" || { echo "Missing file: $FILE"; exit 1; } echo "=== _match_layout (around line ~300) ===" sed -n '285,330p' "$FILE" echo echo "=== _match_layout call sites ===" rg -n "_match_layout$" "$FILE" || true echo echo "=== fit / fit_transform / transform / inverse_transform (broad) ===" rg -n "def (fit|fit_transform|transform|inverse_transform)\b" "$FILE" || true # Dump nearby output-buffer handling lines mentioned in the review comment echo echo "=== Lines around 360-390 ===" sed -n '350,410p' "$FILE" echo echo "=== Lines around 430-470 ===" sed -n '420,480p' "$FILE" echo echo "=== Docstrings for public entry points (snippets) ===" # Print the beginning of each public method to see what docstrings claim python3 - <<'PY' import re, itertools, pathlib p = pathlib.Path("python/cuvs/cuvs/preprocessing/pca/pca.pyx") txt = p.read_text() for name in ["fit", "fit_transform", "transform", "inverse_transform"]: m = re.search(rf"def\s+{name}\b[^(]*\([^)]*$:", txt) if not m: continue start = m.start() # take next 1200 chars after signature snippet = txt[m.start(): m.start()+1200] print("\n---", name, "---") print(snippet.splitlines()[:60]) PY

Repository: rapidsai/cuvs

Length of output: 11941

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="python/cuvs/cuvs/preprocessing/pca/pca.pyx" echo "=== _ensure_device_contiguous ===" rg -n "def _ensure_device_contiguous" "$FILE" sed -n '1,120p' "$FILE" sed -n '120,200p' "$FILE" echo echo "=== _validate_pca_input ===" rg -n "def _validate_pca_input" "$FILE" sed -n '200,270p' "$FILE" echo echo "=== fit() / fit_transform() bodies (validation + array creation) ===" sed -n '120,270p' "$FILE" echo echo "=== transform() / inverse_transform() signatures area already shown; now include rest of each for dtype checks ===" sed -n '330,480p' "$FILE"

Repository: rapidsai/cuvs

Length of output: 20048

Stop allocating/replacing caller-supplied PCA output buffers (trans_input / output)
transform() and inverse_transform() treat trans_input/output as pre-allocated outputs, but when they’re not already in the requested layout they’re normalized via _match_layout(), which ultimately calls cp.asarray(arr, order=order)—so CuPy can return a new array while leaving the caller’s original buffer untouched. _match_layout() also does not validate float32, so incorrect dtypes can reach the C++ layer. Remove _match_layout() from writable-output paths; instead validate dtype/layout and fail fast (or explicitly write into the provided buffer).

Docstrings: fit() / fit_transform() claim “no internal copy/transpose is performed” even though _ensure_device_contiguous() coerces to float32 and can make a contiguous copy when needed; adjust wording to reflect when copying can occur.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs/cuvs/preprocessing/pca/pca.pyx` around lines 300 - 309, Replace use of _match_layout() on caller-supplied writable buffers (trans_input / output) in transform() and inverse_transform() so we never allocate and return a new array instead of writing into the caller buffer; instead, check that the provided buffers have dtype float32 and the required memory order (C or F) up front and either raise a clear error if they do not match or perform an explicit in-place copy/transpose into the provided buffer before calling the C++ layer. Remove or avoid calling _match_layout() on writable-output code paths and add explicit dtype validation (float32) and contiguous-order checks where _match_layout() was used; keep _match_layout() for read-only normalization only. Also update the fit() / fit_transform() docstrings to stop claiming "no internal copy/transpose is performed" and explicitly note that _ensure_device_contiguous() may coerce to float32 and/or make a contiguous copy when needed.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/cmake/thirdparty/get_raft.cmake`:
- Around line 63-64: Replace the temporary hard-coded RAFT fork/tag values in
the get_raft.cmake snippet with the project variables so the build uses the
configured defaults: change the FORK and PINNED_TAG entries that currently read
as literal "aamijar" and "pca-row-major" to reference ${RAFT_FORK} and
${RAFT_PINNED_TAG} respectively; ensure this change is applied where FORK and
PINNED_TAG are defined in the same block so the build picks up the
RAFT_FORK/RAFT_PINNED_TAG variables (after confirming the upstream RAFT PR `#3036`
is merged).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bf425a06-8e61-4d15-be24-486dc2b6cf53

📥 Commits

Reviewing files that changed from the base of the PR and between d877a92 and 6408c8e.

📒 Files selected for processing (1)

cpp/cmake/thirdparty/get_raft.cmake

coderabbitai · 2026-05-26T01:13:51Z

+        FORK                     aamijar
+        PINNED_TAG               pca-row-major


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Revert to default RAFT fork/tag before merge.

These hard-coded values point to a temporary development branch and must be reverted before this PR is merged. Merging as-is would break the build for all users once the pca-row-major branch is removed.

Required before merge:

Ensure RAFT PR #3036 is merged to the main RAFT repository

Revert these lines to use the default variables:
FORK ${RAFT_FORK} PINNED_TAG ${RAFT_PINNED_TAG}

This configuration is appropriate for testing during development, but the author's own comment "Revert later" confirms it must not reach main.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/cmake/thirdparty/get_raft.cmake` around lines 63 - 64, Replace the temporary hard-coded RAFT fork/tag values in the get_raft.cmake snippet with the project variables so the build uses the configured defaults: change the FORK and PINNED_TAG entries that currently read as literal "aamijar" and "pca-row-major" to reference ${RAFT_FORK} and ${RAFT_PINNED_TAG} respectively; ensure this change is applied where FORK and PINNED_TAG are defined in the same block so the build picks up the RAFT_FORK/RAFT_PINNED_TAG variables (after confirming the upstream RAFT PR `#3036` is merged).

pca-row-major

d877a92

aamijar requested review from a team as code owners May 26, 2026 01:02

github-project-automation Bot added this to Unstructured Data Processing May 26, 2026

aamijar moved this to In Progress in Unstructured Data Processing May 26, 2026

aamijar self-assigned this May 26, 2026

aamijar added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels May 26, 2026

pin raft

6408c8e

aamijar requested a review from a team as a code owner May 26, 2026 01:11

aamijar commented May 26, 2026

View reviewed changes

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

aamijar added the AlloyDB label May 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Native row-major PCA#2125

Native row-major PCA#2125
aamijar wants to merge 2 commits into
rapidsai:mainfrom
aamijar:pca-row-major

aamijar commented May 26, 2026

Uh oh!

aamijar May 26, 2026

Uh oh!

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 26, 2026

Uh oh!

coderabbitai Bot May 26, 2026

Uh oh!

coderabbitai Bot May 26, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aamijar commented May 26, 2026

Uh oh!

aamijar May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 26, 2026 •

edited

Loading