Skip to content

Native row-major PCA#2125

Open
aamijar wants to merge 2 commits into
rapidsai:mainfrom
aamijar:pca-row-major
Open

Native row-major PCA#2125
aamijar wants to merge 2 commits into
rapidsai:mainfrom
aamijar:pca-row-major

Conversation

@aamijar
Copy link
Copy Markdown
Member

@aamijar aamijar commented May 26, 2026

Currently PCA only supports col-major in the cpp layer. If a consumer's input dataset is in row-major it must be copied as col-major. With the introduction of RAFT native row-major support in PCA rapidsai/raft#3036, users can seamlessly pass in row-major datasets.
This PR exposes the cpp PCA row-major APIs and updates the C and Python APIs accordingly.

Resolves #2115. Depends on rapidsai/raft#3036.

@aamijar aamijar requested review from a team as code owners May 26, 2026 01:02
@aamijar aamijar moved this to In Progress in Unstructured Data Processing May 26, 2026
@aamijar aamijar self-assigned this May 26, 2026
@aamijar aamijar added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels May 26, 2026
@aamijar aamijar requested a review from a team as a code owner May 26, 2026 01:11
FORK ${RAFT_FORK}
PINNED_TAG ${RAFT_PINNED_TAG}
FORK aamijar
PINNED_TAG pca-row-major
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert later

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • PCA operations now accept and preserve both row-major (C) and column-major (F) layouts across C, C++, and Python APIs; outputs match input layout.
  • Documentation
    • Updated API docs and docstrings to describe layout detection and layout-matching behavior.
  • Tests
    • Added tests validating row-major path, no-copy behavior, and numerical agreement between layouts.
  • Chores
    • Minor build script metadata update.

Walkthrough

This PR adds native row-major (C-contiguous) layout support to PCA across C, C++, and Python bindings by detecting DLPack strides and preserving/dispatching on C- vs F-contiguity instead of forcing col-major conversions.

Changes

PCA Row-Major Layout Support

Layer / File(s) Summary
C API layout detection and dispatch
c/include/cuvs/preprocessing/pca.h, c/src/preprocessing/pca.cpp
Documentation updated to describe C- or F-contiguous layout detection from DLPack strides. Internal helper functions templated on LayoutT and exported entry points dispatch based on contiguity checks, replacing hard-coded col-major conversion logic.
C++ API overloads and detail layer generalization
cpp/include/cuvs/preprocessing/pca.hpp, cpp/src/preprocessing/pca/detail/pca.cuh
New row-major overloads declared for fit, fit_transform, transform, and inverse_transform. Detail API generalized to accept LayoutPolicy template parameter instead of requiring raft::col_major.
C++ implementation wrapper instantiation
cpp/src/preprocessing/pca/pca.cu
Wrapper macros updated to use LayoutT template parameter. Explicit instantiations added for both raft::col_major and raft::row_major layouts (float, int64_t).
C++ layout testing and validation
cpp/tests/preprocessing/pca.cu
pca_roundtrip helper generalized with LayoutT parameter. New PcaRowMajorTest class verifies row-major roundtrip and validates numerical agreement between col-major and row-major executions.
Python API layout preservation implementation
python/cuvs/cuvs/preprocessing/pca/pca.pyx
Refactored to preserve input layout instead of forcing Fortran conversion. Added _ensure_device_contiguous, _validate_pca_input, and _match_layout helpers. Updated fit, fit_transform, transform, and inverse_transform to detect, preserve, and align layouts accordingly.
Python layout testing and validation
python/cuvs/cuvs/tests/test_pca.py
Added _as_order layout helper. Parametrized existing tests over C/F-contiguous layouts. Added test_row_major_no_copy to verify no silent conversion, and test_layouts_agree_numerically to validate numerical equivalence across layouts.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • dantegd
  • robertmaynard
  • jameslamb
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Native row-major PCA' directly and clearly summarizes the primary change—adding native row-major support to PCA APIs.
Description check ✅ Passed The description explains the motivation (row-major inputs currently require copying) and solution (exposing C++ row-major APIs via RAFT support), relating directly to the changeset.
Linked Issues check ✅ Passed The PR implements native row-major PCA support across C++ (detail/public APIs), C, and Python layers [#2115], eliminating unnecessary data copies for row-major inputs as requested.
Out of Scope Changes check ✅ Passed All changes are scoped to PCA preprocessing across C, C++, and Python layers. The CMake file update (copyright years and RAFT source pinning) directly supports enabling RAFT row-major PCA.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
python/cuvs/cuvs/tests/test_pca.py (1)

131-149: ⚡ Quick win

This test can't detect an internal layout-conversion copy.

pca.fit(params, X) cannot change X.data.ptr, so this still passes even if the implementation creates a temporary reordered buffer and uses that internally. That makes the regression signal much weaker than the test name suggests. A more reliable check would spy on _ensure_device_contiguous() or another observable copy path instead of re-reading the caller's original pointer.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/tests/test_pca.py` around lines 131 - 149, The test
test_row_major_no_copy is currently ineffective because pca.fit(params, X)
cannot mutate X.data.ptr even if the implementation copies to a temporary;
instead, spy on the internal copy path: patch or mock
pca._ensure_device_contiguous (or the actual helper that performs device-side
reordering) before calling pca.fit and assert that the helper was not invoked;
keep the existing setup (C-contiguous cp.ndarray, original_ptr) but replace the
final pointer assertion with an assertion that
pca._ensure_device_contiguous.assert_not_called() (or equivalent), so the test
fails if the implementation performs an internal layout-conversion copy.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cuvs/cuvs/preprocessing/pca/pca.pyx`:
- Around line 152-154: The docstring currently claims "no internal
copy/transpose is performed" which is false in general; update the public
docstrings in pca.pyx to narrow the guarantee to state that zero-copy is only
guaranteed for already-contiguous float32 device (GPU) inputs and that
_ensure_device_contiguous will perform copies for host inputs, dtype coercion,
and non-contiguous device views (i.e., hidden allocations may occur). Edit both
occurrences noted to explicitly call out these cases and the potential for
allocations, and mention _ensure_device_contiguous by name so callers know where
coercion/copies happen.
- Around line 300-309: Replace use of _match_layout() on caller-supplied
writable buffers (trans_input / output) in transform() and inverse_transform()
so we never allocate and return a new array instead of writing into the caller
buffer; instead, check that the provided buffers have dtype float32 and the
required memory order (C or F) up front and either raise a clear error if they
do not match or perform an explicit in-place copy/transpose into the provided
buffer before calling the C++ layer. Remove or avoid calling _match_layout() on
writable-output code paths and add explicit dtype validation (float32) and
contiguous-order checks where _match_layout() was used; keep _match_layout() for
read-only normalization only. Also update the fit() / fit_transform() docstrings
to stop claiming "no internal copy/transpose is performed" and explicitly note
that _ensure_device_contiguous() may coerce to float32 and/or make a contiguous
copy when needed.

---

Nitpick comments:
In `@python/cuvs/cuvs/tests/test_pca.py`:
- Around line 131-149: The test test_row_major_no_copy is currently ineffective
because pca.fit(params, X) cannot mutate X.data.ptr even if the implementation
copies to a temporary; instead, spy on the internal copy path: patch or mock
pca._ensure_device_contiguous (or the actual helper that performs device-side
reordering) before calling pca.fit and assert that the helper was not invoked;
keep the existing setup (C-contiguous cp.ndarray, original_ptr) but replace the
final pointer assertion with an assertion that
pca._ensure_device_contiguous.assert_not_called() (or equivalent), so the test
fails if the implementation performs an internal layout-conversion copy.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5c9a4634-1195-46d5-bdd1-e9066dc65a14

📥 Commits

Reviewing files that changed from the base of the PR and between 4addf4e and d877a92.

📒 Files selected for processing (8)
  • c/include/cuvs/preprocessing/pca.h
  • c/src/preprocessing/pca.cpp
  • cpp/include/cuvs/preprocessing/pca.hpp
  • cpp/src/preprocessing/pca/detail/pca.cuh
  • cpp/src/preprocessing/pca/pca.cu
  • cpp/tests/preprocessing/pca.cu
  • python/cuvs/cuvs/preprocessing/pca/pca.pyx
  • python/cuvs/cuvs/tests/test_pca.py

Comment on lines +129 to +132
if ary.flags.c_contiguous:
return ary, "C"
if ary.flags.f_contiguous:
return ary, "F"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In CuPy 13.x, can a 2D array with a singleton dimension be both C-contiguous and F-contiguous, and can the actual memory layout be distinguished from its strides?

💡 Result:

Yes. In CuPy 13.x, a 2D array with a singleton dimension (e.g., shape (1, N) or (N, 1)) can be both C-contiguous and F-contiguous. And in that degenerate singleton-dimension case, you generally cannot (reliably) distinguish the “actual memory layout” (row-major vs column-major in the usual sense) from the strides alone: when an axis has length 1, the contiguity checks effectively ignore/relax the corresponding stride components, and a contiguous array may report strides that are not uniquely determined by physical adjacency. Key evidence: 1) NumPy flags explicitly document that “Even for contiguous arrays a stride for a given dimension arr.strides[dim] may be arbitrary if arr.shape[dim] == 1”. They also state that arrays can be both C- and F-contiguous simultaneously.[1] 2) NumPy’s contiguity-flag implementation explains that with relaxed stride checking, it becomes possible for e.g. a 10x1 array to be both C- and F-contiguous, because strides for size-1 dimensions are ignored/relaxed; this breaks assumptions that certain strides always equal itemsize for contiguous arrays.[2] 3) CuPy’s ndarray.flags in CuPy exposes only c_contiguous and f_contiguous (plus owndata) as the memory-layout information, rather than implying that strides uniquely determine physical layout in all cases.[3] 4) CuPy dev discussion/bugfix notes for “both C and F-contiguous” arrays indicate that strides may differ or change even when contiguity flags indicate the array is contiguous in both senses, and it cautions that users should not rely on strides for contiguity-dependent behavior.[4] Practical takeaway for CuPy 13.x: use a.flags.c_contiguous / a.flags.f_contiguous (and/or perform access/traversal tests), but do not expect a.strides to uniquely encode which “order” (C vs F) you’d normally infer for non-degenerate shapes.

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

ls -la
rg -n "return ary, \"C\"|return ary, \"F\"|c_contiguous|f_contiguous" python/cuvs/cuvs/preprocessing/pca/pca.pyx
sed -n '90,170p' python/cuvs/cuvs/preprocessing/pca/pca.pyx

Repository: rapidsai/cuvs

Length of output: 4849


🏁 Script executed:

sed -n '170,360p' python/cuvs/cuvs/preprocessing/pca/pca.pyx

Repository: rapidsai/cuvs

Length of output: 7327


🏁 Script executed:

sed -n '360,560p' python/cuvs/cuvs/preprocessing/pca/pca.pyx

Repository: rapidsai/cuvs

Length of output: 4118


🏁 Script executed:

rg -n "cuvsPcaFit\\b|cuvsPcaFitTransform\\b|cuvsPcaInverseTransform\\b" -S cpp python/cuvs | head -n 50

Repository: rapidsai/cuvs

Length of output: 696


🏁 Script executed:

sed -n '1,120p' python/cuvs/cuvs/preprocessing/pca/pca.pxd

Repository: rapidsai/cuvs

Length of output: 2948


🏁 Script executed:

ls -la c* && find cpp -maxdepth 4 -type f -name "pca.h" -o -name "*pca*" | head

Repository: rapidsai/cuvs

Length of output: 3561


🏁 Script executed:

fd -i "pca.h" -a . | head
fd -i "preprocessing/pca.h" -a . | head
rg -n "DLManagedTensor|strides|layout|contiguous|c_contiguous|f_contiguous|row|col|order" cpp/src/preprocessing/pca -S
ls -la cpp/include/cuvs/preprocessing || true
ls -la cpp/include/cuvs/preprocessing/pca* || true

Repository: rapidsai/cuvs

Length of output: 1331


🏁 Script executed:

sed -n '1,220p' c/include/cuvs/preprocessing/pca.h

Repository: rapidsai/cuvs

Length of output: 8710


🏁 Script executed:

sed -n '220,520p' c/include/cuvs/preprocessing/pca.h
sed -n '1,260p' cpp/src/preprocessing/pca/pca.cu
sed -n '1,240p' cpp/include/cuvs/preprocessing/pca.hpp

Repository: rapidsai/cuvs

Length of output: 17615


🏁 Script executed:

ls -la cpp/src/preprocessing/pca && ls -la cpp/src/preprocessing/pca/detail || true
rg -n "strides|stride|DLManagedTensor|ndim|shape|dl_tensor|row_major|col_major|contig|c_contiguous|f_contiguous" cpp/src/preprocessing/pca -S

Repository: rapidsai/cuvs

Length of output: 1212


🏁 Script executed:

sed -n '1,220p' cpp/src/preprocessing/pca/detail/pca.cuh
sed -n '1,260p' cpp/src/preprocessing/pca/pca.cu

Repository: rapidsai/cuvs

Length of output: 11150


Stop always classifying singleton-dimension inputs as "C" when both C- and F-contiguous flags are set.

In CuPy 13.x, shapes like (1, N) / (N, 1) can report both ary.flags.c_contiguous and ary.flags.f_contiguous, but for size-1 axes the strides don’t uniquely determine an original row/col-major “order”. The current _ensure_device_contiguous logic returns "C" whenever c_contiguous is true, so PCA outputs (e.g., components, trans_input) will consistently use "C" layout for this degenerate case, which conflicts with the stated “layout preserved natively” expectation. Either implement a best-effort preservation using the incoming array’s CUDA/DLPack stride metadata, or document that C/F is ambiguous for singleton-dimension inputs and that "C" is the chosen convention.

Comment on lines +152 to +154
The input layout (C-contiguous / row-major or F-contiguous / col-major)
is preserved natively; no internal copy/transpose is performed. Output
arrays use the same layout as the input.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The public docs overpromise zero-copy behavior.

_ensure_device_contiguous() still copies for host inputs, dtype coercion, and non-contiguous device views, so "no internal copy/transpose is performed" is only true for already-contiguous float32 device inputs. Please narrow the wording so callers are not surprised by hidden allocations. As per coding guidelines, "Document missing or incorrect docstrings for public methods and classes, including parameter documentation and usage examples for memory resources."

Also applies to: 226-227

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/preprocessing/pca/pca.pyx` around lines 152 - 154, The
docstring currently claims "no internal copy/transpose is performed" which is
false in general; update the public docstrings in pca.pyx to narrow the
guarantee to state that zero-copy is only guaranteed for already-contiguous
float32 device (GPU) inputs and that _ensure_device_contiguous will perform
copies for host inputs, dtype coercion, and non-contiguous device views (i.e.,
hidden allocations may occur). Edit both occurrences noted to explicitly call
out these cases and the potential for allocations, and mention
_ensure_device_contiguous by name so callers know where coercion/copies happen.

Comment on lines +300 to +309
def _match_layout(arr, order):
"""Return ``arr`` in the requested layout (``"C"`` or ``"F"``).

Avoids a copy when the array is already in the target layout."""
arr = cp.asarray(arr)
if order == "C" and arr.flags.c_contiguous:
return arr
if order == "F" and arr.flags.f_contiguous:
return arr
return cp.asarray(arr, order=order)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

For CuPy 13.x, does cupy.asarray(arr, order="C")allocate a new array whenarr is F-contiguous or otherwise not already C-contiguous, leaving the original array object unchanged?

💡 Result:

For CuPy 13.x, cupy.asarray(arr, order="C") does not guarantee zero-copy when arr is not already C-contiguous. What CuPy promises from the API level: - cupy.asarray is defined as equivalent to array(a, dtype, copy=False, order=order) [1]. - The order argument is a request for the desired memory layout (e.g., C-style vs Fortran-style) [1]. - CuPy returns “An array on the current device” and performs a copy if needed (the “no copy is performed” guarantee applies only when a is already on the device; for host arrays, data movement/allocation is inherently required) [1]. What happens for C vs F contiguous inputs: - Because order="C" is explicitly requesting a C-style memory representation, if the input provided by arr is F-contiguous (or otherwise not C-contiguous), producing a C-contiguous result requires creating a new array (or otherwise changing strides/layout). CuPy’s docs indicate asarray uses array(..., copy=False, order=order) semantics, i.e., it will only avoid copying when it can satisfy the requested order without needing a new array [1]. Therefore, when arr is F-contiguous or otherwise not C-contiguous, CuPy 13.x will allocate a new GPU array to satisfy order="C", and the original array object (the input arr) is left unchanged (it is not mutated in-place).

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and print the relevant parts of pca.pyx around the cited lines and usages.
FILE="python/cuvs/cuvs/preprocessing/pca/pca.pyx"

# Basic context around _match_layout
nl -ba "$FILE" | sed -n '260,340p'

echo "-----"

# Show fit/fit_transform/transform/inverse_transform sections where outputs are created or normalized
nl -ba "$FILE" | sed -n '340,520p'

echo "-----"

# Find all usages of _match_layout in this file
rg -n "_match_layout\(" "$FILE"

Repository: rapidsai/cuvs

Length of output: 100


🌐 Web query:

In CuPy (v13.x), how do arr.flags.c_contiguousandarr.flags.f_contiguousbehave for singleton dimensions (e.g., shape(1, n)or(n, 1))? Can an array be both C- and F-contiguous, and how does this affect selecting order="C"vsorder="F"?

💡 Result:

In CuPy 13.x, arr.flags.c_contiguous and arr.flags.f_contiguous follow NumPy’s contiguity flag semantics, including special handling for singleton dimensions. 1) Singleton dimensions and the meaning of the flags - For a 2+ dimensional array, C- and F-contiguity are normally mutually exclusive (“can be C- or F- contiguous, or neither, but not both”) [1]. - However, there is a relaxed case when dimensions have size 1: strides along size-1 axes do not affect the effective memory layout because those indices are always 0. In NumPy docs, if self.shape[k] == 1, then self.strides[k] is arbitrary without affecting offsets, which implies arrays with such singleton dimensions can satisfy both C and F contiguity checks under the relaxed/appropriate definition [2]. Concrete consequences: - Shape (1, n): - c_contiguous is True (standard row-major layout for the non-singleton dimension). - f_contiguous is also typically True because the singleton first dimension makes the column-major layout compatible with the same physical memory (the only dimension that matters is the length-n one, and the contiguity checks can both pass) [2]. - Shape (n, 1): - f_contiguous is True. - c_contiguous is also typically True for the same reason: the singleton last dimension makes the stride along that axis irrelevant for contiguity, allowing both checks to pass [2]. So for the common “one singleton, one non-singleton” 2D cases (1, n) and (n, 1), arrays can be both C- and F-contiguous (both flags True) under the normal semantics used by NumPy/CuPy. 2) Can an array be both C- and F-contiguous? - Yes. NumPy’s contiguous definitions allow both flags to be true in the presence of singleton (size-1) dimensions; the NumPy docs explicitly discuss that singleton dimensions make the stride along them arbitrary and that this can result in both contiguity flags being set simultaneously, even for higher-dimensional arrays [2]. 3) How this affects selecting order="C" vs order="F" - CuPy functions that take an order argument (e.g., in array constructors or asarray/conversion-style APIs) generally use the requested memory layout; for order='A', CuPy chooses Fortran if the array is Fortran-contiguous, otherwise C [3]. - If you have an array that is both C- and F-contiguous (common for (1, n) / (n, 1)), then “Fortran-contiguous” is true as well as “C-contiguous”. Therefore, any decision logic that checks only one flag (or uses A) may pick either layout but will usually be no-ops in terms of data movement because the array already satisfies both contiguity properties [3][2]. Practical takeaway: - For arrays with singleton dimensions where both flags are True, choosing order='C' vs order='F' should generally not force a copy/reorder, because both contiguity conditions hold already [2]. - For arrays with no singleton/degenerate dimensions, C and F contiguity are generally not both true for 2+ dimensions, so order='C' vs order='F' typically distinguishes whether a copy/reformat is needed [1].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="python/cuvs/cuvs/preprocessing/pca/pca.pyx"

test -f "$FILE" || { echo "Missing file: $FILE"; exit 1; }

echo "=== _match_layout (around line ~300) ==="
sed -n '285,330p' "$FILE"

echo
echo "=== _match_layout call sites ==="
rg -n "_match_layout\(" "$FILE" || true

echo
echo "=== fit / fit_transform / transform / inverse_transform (broad) ==="
rg -n "def (fit|fit_transform|transform|inverse_transform)\b" "$FILE" || true

# Dump nearby output-buffer handling lines mentioned in the review comment
echo
echo "=== Lines around 360-390 ==="
sed -n '350,410p' "$FILE"

echo
echo "=== Lines around 430-470 ==="
sed -n '420,480p' "$FILE"

echo
echo "=== Docstrings for public entry points (snippets) ==="
# Print the beginning of each public method to see what docstrings claim
python3 - <<'PY'
import re, itertools, pathlib
p = pathlib.Path("python/cuvs/cuvs/preprocessing/pca/pca.pyx")
txt = p.read_text()
for name in ["fit", "fit_transform", "transform", "inverse_transform"]:
    m = re.search(rf"def\s+{name}\b[^(]*\([^)]*\):", txt)
    if not m:
        continue
    start = m.start()
    # take next 1200 chars after signature
    snippet = txt[m.start(): m.start()+1200]
    print("\n---", name, "---")
    print(snippet.splitlines()[:60])
PY

Repository: rapidsai/cuvs

Length of output: 11941


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="python/cuvs/cuvs/preprocessing/pca/pca.pyx"

echo "=== _ensure_device_contiguous ==="
rg -n "def _ensure_device_contiguous" "$FILE"
sed -n '1,120p' "$FILE"
sed -n '120,200p' "$FILE"

echo
echo "=== _validate_pca_input ==="
rg -n "def _validate_pca_input" "$FILE"
sed -n '200,270p' "$FILE"

echo
echo "=== fit() / fit_transform() bodies (validation + array creation) ==="
sed -n '120,270p' "$FILE"

echo
echo "=== transform() / inverse_transform() signatures area already shown; now include rest of each for dtype checks ==="
sed -n '330,480p' "$FILE"

Repository: rapidsai/cuvs

Length of output: 20048


Stop allocating/replacing caller-supplied PCA output buffers (trans_input / output)
transform() and inverse_transform() treat trans_input/output as pre-allocated outputs, but when they’re not already in the requested layout they’re normalized via _match_layout(), which ultimately calls cp.asarray(arr, order=order)—so CuPy can return a new array while leaving the caller’s original buffer untouched. _match_layout() also does not validate float32, so incorrect dtypes can reach the C++ layer. Remove _match_layout() from writable-output paths; instead validate dtype/layout and fail fast (or explicitly write into the provided buffer).

  • Docstrings: fit() / fit_transform() claim “no internal copy/transpose is performed” even though _ensure_device_contiguous() coerces to float32 and can make a contiguous copy when needed; adjust wording to reflect when copying can occur.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/preprocessing/pca/pca.pyx` around lines 300 - 309, Replace
use of _match_layout() on caller-supplied writable buffers (trans_input /
output) in transform() and inverse_transform() so we never allocate and return a
new array instead of writing into the caller buffer; instead, check that the
provided buffers have dtype float32 and the required memory order (C or F) up
front and either raise a clear error if they do not match or perform an explicit
in-place copy/transpose into the provided buffer before calling the C++ layer.
Remove or avoid calling _match_layout() on writable-output code paths and add
explicit dtype validation (float32) and contiguous-order checks where
_match_layout() was used; keep _match_layout() for read-only normalization only.
Also update the fit() / fit_transform() docstrings to stop claiming "no internal
copy/transpose is performed" and explicitly note that
_ensure_device_contiguous() may coerce to float32 and/or make a contiguous copy
when needed.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/cmake/thirdparty/get_raft.cmake`:
- Around line 63-64: Replace the temporary hard-coded RAFT fork/tag values in
the get_raft.cmake snippet with the project variables so the build uses the
configured defaults: change the FORK and PINNED_TAG entries that currently read
as literal "aamijar" and "pca-row-major" to reference ${RAFT_FORK} and
${RAFT_PINNED_TAG} respectively; ensure this change is applied where FORK and
PINNED_TAG are defined in the same block so the build picks up the
RAFT_FORK/RAFT_PINNED_TAG variables (after confirming the upstream RAFT PR `#3036`
is merged).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bf425a06-8e61-4d15-be24-486dc2b6cf53

📥 Commits

Reviewing files that changed from the base of the PR and between d877a92 and 6408c8e.

📒 Files selected for processing (1)
  • cpp/cmake/thirdparty/get_raft.cmake

Comment on lines +63 to +64
FORK aamijar
PINNED_TAG pca-row-major
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Revert to default RAFT fork/tag before merge.

These hard-coded values point to a temporary development branch and must be reverted before this PR is merged. Merging as-is would break the build for all users once the pca-row-major branch is removed.

Required before merge:

  1. Ensure RAFT PR #3036 is merged to the main RAFT repository
  2. Revert these lines to use the default variables:
    FORK                     ${RAFT_FORK}
    PINNED_TAG               ${RAFT_PINNED_TAG}

This configuration is appropriate for testing during development, but the author's own comment "Revert later" confirms it must not reach main.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/cmake/thirdparty/get_raft.cmake` around lines 63 - 64, Replace the
temporary hard-coded RAFT fork/tag values in the get_raft.cmake snippet with the
project variables so the build uses the configured defaults: change the FORK and
PINNED_TAG entries that currently read as literal "aamijar" and "pca-row-major"
to reference ${RAFT_FORK} and ${RAFT_PINNED_TAG} respectively; ensure this
change is applied where FORK and PINNED_TAG are defined in the same block so the
build picks up the RAFT_FORK/RAFT_PINNED_TAG variables (after confirming the
upstream RAFT PR `#3036` is merged).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AlloyDB improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

[FEA] row-major PCA

1 participant