Native row-major PCA#2125
Conversation
| FORK ${RAFT_FORK} | ||
| PINNED_TAG ${RAFT_PINNED_TAG} | ||
| FORK aamijar | ||
| PINNED_TAG pca-row-major |
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR adds native row-major (C-contiguous) layout support to PCA across C, C++, and Python bindings by detecting DLPack strides and preserving/dispatching on C- vs F-contiguity instead of forcing col-major conversions. ChangesPCA Row-Major Layout Support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
python/cuvs/cuvs/tests/test_pca.py (1)
131-149: ⚡ Quick winThis test can't detect an internal layout-conversion copy.
pca.fit(params, X)cannot changeX.data.ptr, so this still passes even if the implementation creates a temporary reordered buffer and uses that internally. That makes the regression signal much weaker than the test name suggests. A more reliable check would spy on_ensure_device_contiguous()or another observable copy path instead of re-reading the caller's original pointer.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs/cuvs/tests/test_pca.py` around lines 131 - 149, The test test_row_major_no_copy is currently ineffective because pca.fit(params, X) cannot mutate X.data.ptr even if the implementation copies to a temporary; instead, spy on the internal copy path: patch or mock pca._ensure_device_contiguous (or the actual helper that performs device-side reordering) before calling pca.fit and assert that the helper was not invoked; keep the existing setup (C-contiguous cp.ndarray, original_ptr) but replace the final pointer assertion with an assertion that pca._ensure_device_contiguous.assert_not_called() (or equivalent), so the test fails if the implementation performs an internal layout-conversion copy.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@python/cuvs/cuvs/preprocessing/pca/pca.pyx`:
- Around line 152-154: The docstring currently claims "no internal
copy/transpose is performed" which is false in general; update the public
docstrings in pca.pyx to narrow the guarantee to state that zero-copy is only
guaranteed for already-contiguous float32 device (GPU) inputs and that
_ensure_device_contiguous will perform copies for host inputs, dtype coercion,
and non-contiguous device views (i.e., hidden allocations may occur). Edit both
occurrences noted to explicitly call out these cases and the potential for
allocations, and mention _ensure_device_contiguous by name so callers know where
coercion/copies happen.
- Around line 300-309: Replace use of _match_layout() on caller-supplied
writable buffers (trans_input / output) in transform() and inverse_transform()
so we never allocate and return a new array instead of writing into the caller
buffer; instead, check that the provided buffers have dtype float32 and the
required memory order (C or F) up front and either raise a clear error if they
do not match or perform an explicit in-place copy/transpose into the provided
buffer before calling the C++ layer. Remove or avoid calling _match_layout() on
writable-output code paths and add explicit dtype validation (float32) and
contiguous-order checks where _match_layout() was used; keep _match_layout() for
read-only normalization only. Also update the fit() / fit_transform() docstrings
to stop claiming "no internal copy/transpose is performed" and explicitly note
that _ensure_device_contiguous() may coerce to float32 and/or make a contiguous
copy when needed.
---
Nitpick comments:
In `@python/cuvs/cuvs/tests/test_pca.py`:
- Around line 131-149: The test test_row_major_no_copy is currently ineffective
because pca.fit(params, X) cannot mutate X.data.ptr even if the implementation
copies to a temporary; instead, spy on the internal copy path: patch or mock
pca._ensure_device_contiguous (or the actual helper that performs device-side
reordering) before calling pca.fit and assert that the helper was not invoked;
keep the existing setup (C-contiguous cp.ndarray, original_ptr) but replace the
final pointer assertion with an assertion that
pca._ensure_device_contiguous.assert_not_called() (or equivalent), so the test
fails if the implementation performs an internal layout-conversion copy.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5c9a4634-1195-46d5-bdd1-e9066dc65a14
📒 Files selected for processing (8)
c/include/cuvs/preprocessing/pca.hc/src/preprocessing/pca.cppcpp/include/cuvs/preprocessing/pca.hppcpp/src/preprocessing/pca/detail/pca.cuhcpp/src/preprocessing/pca/pca.cucpp/tests/preprocessing/pca.cupython/cuvs/cuvs/preprocessing/pca/pca.pyxpython/cuvs/cuvs/tests/test_pca.py
| if ary.flags.c_contiguous: | ||
| return ary, "C" | ||
| if ary.flags.f_contiguous: | ||
| return ary, "F" |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In CuPy 13.x, can a 2D array with a singleton dimension be both C-contiguous and F-contiguous, and can the actual memory layout be distinguished from its strides?
💡 Result:
Yes. In CuPy 13.x, a 2D array with a singleton dimension (e.g., shape (1, N) or (N, 1)) can be both C-contiguous and F-contiguous. And in that degenerate singleton-dimension case, you generally cannot (reliably) distinguish the “actual memory layout” (row-major vs column-major in the usual sense) from the strides alone: when an axis has length 1, the contiguity checks effectively ignore/relax the corresponding stride components, and a contiguous array may report strides that are not uniquely determined by physical adjacency. Key evidence: 1) NumPy flags explicitly document that “Even for contiguous arrays a stride for a given dimension arr.strides[dim] may be arbitrary if arr.shape[dim] == 1”. They also state that arrays can be both C- and F-contiguous simultaneously.[1] 2) NumPy’s contiguity-flag implementation explains that with relaxed stride checking, it becomes possible for e.g. a 10x1 array to be both C- and F-contiguous, because strides for size-1 dimensions are ignored/relaxed; this breaks assumptions that certain strides always equal itemsize for contiguous arrays.[2] 3) CuPy’s ndarray.flags in CuPy exposes only c_contiguous and f_contiguous (plus owndata) as the memory-layout information, rather than implying that strides uniquely determine physical layout in all cases.[3] 4) CuPy dev discussion/bugfix notes for “both C and F-contiguous” arrays indicate that strides may differ or change even when contiguity flags indicate the array is contiguous in both senses, and it cautions that users should not rely on strides for contiguity-dependent behavior.[4] Practical takeaway for CuPy 13.x: use a.flags.c_contiguous / a.flags.f_contiguous (and/or perform access/traversal tests), but do not expect a.strides to uniquely encode which “order” (C vs F) you’d normally infer for non-degenerate shapes.
Citations:
- 1: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flags.html
- 2: https://github.com/numpy/numpy/blob/23c05e6f1b392f80f749dbb4668b5e7b52aef014/numpy/core/src/multiarray/flagsobject.c
- 3: https://docs.cupy.dev/en/stable/reference/generated/cupy.ndarray.html
- 4: BUG: Fix new strides when array is both C and F-contiguous cupy/cupy#7438
🏁 Script executed:
#!/bin/bash
set -euo pipefail
ls -la
rg -n "return ary, \"C\"|return ary, \"F\"|c_contiguous|f_contiguous" python/cuvs/cuvs/preprocessing/pca/pca.pyx
sed -n '90,170p' python/cuvs/cuvs/preprocessing/pca/pca.pyxRepository: rapidsai/cuvs
Length of output: 4849
🏁 Script executed:
sed -n '170,360p' python/cuvs/cuvs/preprocessing/pca/pca.pyxRepository: rapidsai/cuvs
Length of output: 7327
🏁 Script executed:
sed -n '360,560p' python/cuvs/cuvs/preprocessing/pca/pca.pyxRepository: rapidsai/cuvs
Length of output: 4118
🏁 Script executed:
rg -n "cuvsPcaFit\\b|cuvsPcaFitTransform\\b|cuvsPcaInverseTransform\\b" -S cpp python/cuvs | head -n 50Repository: rapidsai/cuvs
Length of output: 696
🏁 Script executed:
sed -n '1,120p' python/cuvs/cuvs/preprocessing/pca/pca.pxdRepository: rapidsai/cuvs
Length of output: 2948
🏁 Script executed:
ls -la c* && find cpp -maxdepth 4 -type f -name "pca.h" -o -name "*pca*" | headRepository: rapidsai/cuvs
Length of output: 3561
🏁 Script executed:
fd -i "pca.h" -a . | head
fd -i "preprocessing/pca.h" -a . | head
rg -n "DLManagedTensor|strides|layout|contiguous|c_contiguous|f_contiguous|row|col|order" cpp/src/preprocessing/pca -S
ls -la cpp/include/cuvs/preprocessing || true
ls -la cpp/include/cuvs/preprocessing/pca* || trueRepository: rapidsai/cuvs
Length of output: 1331
🏁 Script executed:
sed -n '1,220p' c/include/cuvs/preprocessing/pca.hRepository: rapidsai/cuvs
Length of output: 8710
🏁 Script executed:
sed -n '220,520p' c/include/cuvs/preprocessing/pca.h
sed -n '1,260p' cpp/src/preprocessing/pca/pca.cu
sed -n '1,240p' cpp/include/cuvs/preprocessing/pca.hppRepository: rapidsai/cuvs
Length of output: 17615
🏁 Script executed:
ls -la cpp/src/preprocessing/pca && ls -la cpp/src/preprocessing/pca/detail || true
rg -n "strides|stride|DLManagedTensor|ndim|shape|dl_tensor|row_major|col_major|contig|c_contiguous|f_contiguous" cpp/src/preprocessing/pca -SRepository: rapidsai/cuvs
Length of output: 1212
🏁 Script executed:
sed -n '1,220p' cpp/src/preprocessing/pca/detail/pca.cuh
sed -n '1,260p' cpp/src/preprocessing/pca/pca.cuRepository: rapidsai/cuvs
Length of output: 11150
Stop always classifying singleton-dimension inputs as "C" when both C- and F-contiguous flags are set.
In CuPy 13.x, shapes like (1, N) / (N, 1) can report both ary.flags.c_contiguous and ary.flags.f_contiguous, but for size-1 axes the strides don’t uniquely determine an original row/col-major “order”. The current _ensure_device_contiguous logic returns "C" whenever c_contiguous is true, so PCA outputs (e.g., components, trans_input) will consistently use "C" layout for this degenerate case, which conflicts with the stated “layout preserved natively” expectation. Either implement a best-effort preservation using the incoming array’s CUDA/DLPack stride metadata, or document that C/F is ambiguous for singleton-dimension inputs and that "C" is the chosen convention.
| The input layout (C-contiguous / row-major or F-contiguous / col-major) | ||
| is preserved natively; no internal copy/transpose is performed. Output | ||
| arrays use the same layout as the input. |
There was a problem hiding this comment.
The public docs overpromise zero-copy behavior.
_ensure_device_contiguous() still copies for host inputs, dtype coercion, and non-contiguous device views, so "no internal copy/transpose is performed" is only true for already-contiguous float32 device inputs. Please narrow the wording so callers are not surprised by hidden allocations. As per coding guidelines, "Document missing or incorrect docstrings for public methods and classes, including parameter documentation and usage examples for memory resources."
Also applies to: 226-227
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/cuvs/cuvs/preprocessing/pca/pca.pyx` around lines 152 - 154, The
docstring currently claims "no internal copy/transpose is performed" which is
false in general; update the public docstrings in pca.pyx to narrow the
guarantee to state that zero-copy is only guaranteed for already-contiguous
float32 device (GPU) inputs and that _ensure_device_contiguous will perform
copies for host inputs, dtype coercion, and non-contiguous device views (i.e.,
hidden allocations may occur). Edit both occurrences noted to explicitly call
out these cases and the potential for allocations, and mention
_ensure_device_contiguous by name so callers know where coercion/copies happen.
| def _match_layout(arr, order): | ||
| """Return ``arr`` in the requested layout (``"C"`` or ``"F"``). | ||
|
|
||
| Avoids a copy when the array is already in the target layout.""" | ||
| arr = cp.asarray(arr) | ||
| if order == "C" and arr.flags.c_contiguous: | ||
| return arr | ||
| if order == "F" and arr.flags.f_contiguous: | ||
| return arr | ||
| return cp.asarray(arr, order=order) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
For CuPy 13.x, does cupy.asarray(arr, order="C")allocate a new array whenarr is F-contiguous or otherwise not already C-contiguous, leaving the original array object unchanged?
💡 Result:
For CuPy 13.x, cupy.asarray(arr, order="C") does not guarantee zero-copy when arr is not already C-contiguous. What CuPy promises from the API level: - cupy.asarray is defined as equivalent to array(a, dtype, copy=False, order=order) [1]. - The order argument is a request for the desired memory layout (e.g., C-style vs Fortran-style) [1]. - CuPy returns “An array on the current device” and performs a copy if needed (the “no copy is performed” guarantee applies only when a is already on the device; for host arrays, data movement/allocation is inherently required) [1]. What happens for C vs F contiguous inputs: - Because order="C" is explicitly requesting a C-style memory representation, if the input provided by arr is F-contiguous (or otherwise not C-contiguous), producing a C-contiguous result requires creating a new array (or otherwise changing strides/layout). CuPy’s docs indicate asarray uses array(..., copy=False, order=order) semantics, i.e., it will only avoid copying when it can satisfy the requested order without needing a new array [1]. Therefore, when arr is F-contiguous or otherwise not C-contiguous, CuPy 13.x will allocate a new GPU array to satisfy order="C", and the original array object (the input arr) is left unchanged (it is not mutated in-place).
Citations:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and print the relevant parts of pca.pyx around the cited lines and usages.
FILE="python/cuvs/cuvs/preprocessing/pca/pca.pyx"
# Basic context around _match_layout
nl -ba "$FILE" | sed -n '260,340p'
echo "-----"
# Show fit/fit_transform/transform/inverse_transform sections where outputs are created or normalized
nl -ba "$FILE" | sed -n '340,520p'
echo "-----"
# Find all usages of _match_layout in this file
rg -n "_match_layout\(" "$FILE"Repository: rapidsai/cuvs
Length of output: 100
🌐 Web query:
In CuPy (v13.x), how do arr.flags.c_contiguousandarr.flags.f_contiguousbehave for singleton dimensions (e.g., shape(1, n)or(n, 1))? Can an array be both C- and F-contiguous, and how does this affect selecting order="C"vsorder="F"?
💡 Result:
In CuPy 13.x, arr.flags.c_contiguous and arr.flags.f_contiguous follow NumPy’s contiguity flag semantics, including special handling for singleton dimensions. 1) Singleton dimensions and the meaning of the flags - For a 2+ dimensional array, C- and F-contiguity are normally mutually exclusive (“can be C- or F- contiguous, or neither, but not both”) [1]. - However, there is a relaxed case when dimensions have size 1: strides along size-1 axes do not affect the effective memory layout because those indices are always 0. In NumPy docs, if self.shape[k] == 1, then self.strides[k] is arbitrary without affecting offsets, which implies arrays with such singleton dimensions can satisfy both C and F contiguity checks under the relaxed/appropriate definition [2]. Concrete consequences: - Shape (1, n): - c_contiguous is True (standard row-major layout for the non-singleton dimension). - f_contiguous is also typically True because the singleton first dimension makes the column-major layout compatible with the same physical memory (the only dimension that matters is the length-n one, and the contiguity checks can both pass) [2]. - Shape (n, 1): - f_contiguous is True. - c_contiguous is also typically True for the same reason: the singleton last dimension makes the stride along that axis irrelevant for contiguity, allowing both checks to pass [2]. So for the common “one singleton, one non-singleton” 2D cases (1, n) and (n, 1), arrays can be both C- and F-contiguous (both flags True) under the normal semantics used by NumPy/CuPy. 2) Can an array be both C- and F-contiguous? - Yes. NumPy’s contiguous definitions allow both flags to be true in the presence of singleton (size-1) dimensions; the NumPy docs explicitly discuss that singleton dimensions make the stride along them arbitrary and that this can result in both contiguity flags being set simultaneously, even for higher-dimensional arrays [2]. 3) How this affects selecting order="C" vs order="F" - CuPy functions that take an order argument (e.g., in array constructors or asarray/conversion-style APIs) generally use the requested memory layout; for order='A', CuPy chooses Fortran if the array is Fortran-contiguous, otherwise C [3]. - If you have an array that is both C- and F-contiguous (common for (1, n) / (n, 1)), then “Fortran-contiguous” is true as well as “C-contiguous”. Therefore, any decision logic that checks only one flag (or uses A) may pick either layout but will usually be no-ops in terms of data movement because the array already satisfies both contiguity properties [3][2]. Practical takeaway: - For arrays with singleton dimensions where both flags are True, choosing order='C' vs order='F' should generally not force a copy/reorder, because both contiguity conditions hold already [2]. - For arrays with no singleton/degenerate dimensions, C and F contiguity are generally not both true for 2+ dimensions, so order='C' vs order='F' typically distinguishes whether a copy/reformat is needed [1].
Citations:
- 1: https://github.com/numpy/numpy/blob/23c05e6f1b392f80f749dbb4668b5e7b52aef014/numpy/core/src/multiarray/flagsobject.c
- 2: https://github.com/numpy/numpy/blob/main/doc/source/reference/arrays.ndarray.rst
- 3: https://docs.cupy.dev/en/stable/reference/generated/cupy.ndarray.html
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="python/cuvs/cuvs/preprocessing/pca/pca.pyx"
test -f "$FILE" || { echo "Missing file: $FILE"; exit 1; }
echo "=== _match_layout (around line ~300) ==="
sed -n '285,330p' "$FILE"
echo
echo "=== _match_layout call sites ==="
rg -n "_match_layout\(" "$FILE" || true
echo
echo "=== fit / fit_transform / transform / inverse_transform (broad) ==="
rg -n "def (fit|fit_transform|transform|inverse_transform)\b" "$FILE" || true
# Dump nearby output-buffer handling lines mentioned in the review comment
echo
echo "=== Lines around 360-390 ==="
sed -n '350,410p' "$FILE"
echo
echo "=== Lines around 430-470 ==="
sed -n '420,480p' "$FILE"
echo
echo "=== Docstrings for public entry points (snippets) ==="
# Print the beginning of each public method to see what docstrings claim
python3 - <<'PY'
import re, itertools, pathlib
p = pathlib.Path("python/cuvs/cuvs/preprocessing/pca/pca.pyx")
txt = p.read_text()
for name in ["fit", "fit_transform", "transform", "inverse_transform"]:
m = re.search(rf"def\s+{name}\b[^(]*\([^)]*\):", txt)
if not m:
continue
start = m.start()
# take next 1200 chars after signature
snippet = txt[m.start(): m.start()+1200]
print("\n---", name, "---")
print(snippet.splitlines()[:60])
PYRepository: rapidsai/cuvs
Length of output: 11941
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="python/cuvs/cuvs/preprocessing/pca/pca.pyx"
echo "=== _ensure_device_contiguous ==="
rg -n "def _ensure_device_contiguous" "$FILE"
sed -n '1,120p' "$FILE"
sed -n '120,200p' "$FILE"
echo
echo "=== _validate_pca_input ==="
rg -n "def _validate_pca_input" "$FILE"
sed -n '200,270p' "$FILE"
echo
echo "=== fit() / fit_transform() bodies (validation + array creation) ==="
sed -n '120,270p' "$FILE"
echo
echo "=== transform() / inverse_transform() signatures area already shown; now include rest of each for dtype checks ==="
sed -n '330,480p' "$FILE"Repository: rapidsai/cuvs
Length of output: 20048
Stop allocating/replacing caller-supplied PCA output buffers (trans_input / output)
transform() and inverse_transform() treat trans_input/output as pre-allocated outputs, but when they’re not already in the requested layout they’re normalized via _match_layout(), which ultimately calls cp.asarray(arr, order=order)—so CuPy can return a new array while leaving the caller’s original buffer untouched. _match_layout() also does not validate float32, so incorrect dtypes can reach the C++ layer. Remove _match_layout() from writable-output paths; instead validate dtype/layout and fail fast (or explicitly write into the provided buffer).
- Docstrings:
fit()/fit_transform()claim “no internal copy/transpose is performed” even though_ensure_device_contiguous()coerces tofloat32and can make a contiguous copy when needed; adjust wording to reflect when copying can occur.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/cuvs/cuvs/preprocessing/pca/pca.pyx` around lines 300 - 309, Replace
use of _match_layout() on caller-supplied writable buffers (trans_input /
output) in transform() and inverse_transform() so we never allocate and return a
new array instead of writing into the caller buffer; instead, check that the
provided buffers have dtype float32 and the required memory order (C or F) up
front and either raise a clear error if they do not match or perform an explicit
in-place copy/transpose into the provided buffer before calling the C++ layer.
Remove or avoid calling _match_layout() on writable-output code paths and add
explicit dtype validation (float32) and contiguous-order checks where
_match_layout() was used; keep _match_layout() for read-only normalization only.
Also update the fit() / fit_transform() docstrings to stop claiming "no internal
copy/transpose is performed" and explicitly note that
_ensure_device_contiguous() may coerce to float32 and/or make a contiguous copy
when needed.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/cmake/thirdparty/get_raft.cmake`:
- Around line 63-64: Replace the temporary hard-coded RAFT fork/tag values in
the get_raft.cmake snippet with the project variables so the build uses the
configured defaults: change the FORK and PINNED_TAG entries that currently read
as literal "aamijar" and "pca-row-major" to reference ${RAFT_FORK} and
${RAFT_PINNED_TAG} respectively; ensure this change is applied where FORK and
PINNED_TAG are defined in the same block so the build picks up the
RAFT_FORK/RAFT_PINNED_TAG variables (after confirming the upstream RAFT PR `#3036`
is merged).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: bf425a06-8e61-4d15-be24-486dc2b6cf53
📒 Files selected for processing (1)
cpp/cmake/thirdparty/get_raft.cmake
| FORK aamijar | ||
| PINNED_TAG pca-row-major |
There was a problem hiding this comment.
Critical: Revert to default RAFT fork/tag before merge.
These hard-coded values point to a temporary development branch and must be reverted before this PR is merged. Merging as-is would break the build for all users once the pca-row-major branch is removed.
Required before merge:
- Ensure RAFT PR
#3036is merged to the main RAFT repository - Revert these lines to use the default variables:
FORK ${RAFT_FORK} PINNED_TAG ${RAFT_PINNED_TAG}
This configuration is appropriate for testing during development, but the author's own comment "Revert later" confirms it must not reach main.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/cmake/thirdparty/get_raft.cmake` around lines 63 - 64, Replace the
temporary hard-coded RAFT fork/tag values in the get_raft.cmake snippet with the
project variables so the build uses the configured defaults: change the FORK and
PINNED_TAG entries that currently read as literal "aamijar" and "pca-row-major"
to reference ${RAFT_FORK} and ${RAFT_PINNED_TAG} respectively; ensure this
change is applied where FORK and PINNED_TAG are defined in the same block so the
build picks up the RAFT_FORK/RAFT_PINNED_TAG variables (after confirming the
upstream RAFT PR `#3036` is merged).
Currently PCA only supports col-major in the cpp layer. If a consumer's input dataset is in row-major it must be copied as col-major. With the introduction of RAFT native row-major support in PCA rapidsai/raft#3036, users can seamlessly pass in row-major datasets.
This PR exposes the cpp PCA row-major APIs and updates the C and Python APIs accordingly.
Resolves #2115. Depends on rapidsai/raft#3036.