# 5. Code Reference
This page documents the Python, R, and MATLAB APIs for gcPCA.
This reference is intended as a technical lookup for functions, arguments, and outputs.
## Python: gcPCA

Main class for generalized contrastive PCA.
```python
from contrastive_methods import gcPCA

gcPCA(
    method='v4',
    Ncalc=np.inf,
    Nshuffle=0,
    normalize_flag=True,
    alpha=1,
    alpha_null=0.975,
    cond_number=1e13
)
```

method
gcPCA version to use:
- 'v1' — cPCA
- 'v2' — Ra / Rb
- 'v3' — (Ra − Rb) / Rb
- 'v4' — (Ra − Rb) / (Ra + Rb) (recommended)

Orthogonal variants: 'v2.1', 'v3.1', 'v4.1'
Ncalc
Number of components to compute (used for orthogonal versions)
Nshuffle
Number of shuffles for null distribution
normalize_flag
- True → z-score and L2 normalization
- False → user provides normalized data
alpha
Used only for v1; corresponds to the alpha of cPCA:
argmax_x xᵀ (CA − α CB) x
subject to xᵀx = 1
cond_number
Numerical stability parameter; determines whether a matrix is treated as full rank or needs correction.
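For v1, the objective above is a plain symmetric eigenproblem: the maximizing x is the top eigenvector of CA − αCB. A minimal NumPy sketch of that fact (illustrative only; the data, covariances, and alpha below are made-up stand-ins, not package internals):

```python
import numpy as np

rng = np.random.default_rng(1)
Za = rng.standard_normal((50, 4))   # condition A, samples x features
Zb = rng.standard_normal((60, 4))   # condition B, samples x features
CA, CB = np.cov(Za.T), np.cov(Zb.T)

alpha = 1.0
# CA - alpha*CB is symmetric, so use eigh; eigenvalues come back ascending
vals, vecs = np.linalg.eigh(CA - alpha * CB)
x = vecs[:, -1]                     # unit-norm maximizer of x' (CA - alpha*CB) x
```

The constraint xᵀx = 1 is satisfied automatically because eigh returns orthonormal eigenvectors.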
Fit the gcPCA model.

```python
model.fit(Ra, Rb)
```

- Ra — matrix (samples × features)
- Rb — matrix (samples × features)
Both datasets must:
- Have the same number of features
- Have samples in rows
Project data onto the gcPCs.

```python
model.transform(Ra, Rb)
```

Outputs: Ra_transformed_, Rb_transformed_
Fit and transform in one step.

```python
model.fit_transform(Ra, Rb)
```

After fitting:
```python
model.loadings_
model.Ra_scores_
model.Rb_scores_
model.Ra_values_
model.Rb_values_
model.objective_values_
model.objective_function_
```

loadings_ — gcPC loadings

Shape: features × components
Ra_scores_, Rb_scores_ — unit-normalized scores for each dataset
Shape:
samples × components
The scores returned by gcPCA are unit normalized.
This means:
- Ra_scores_ and Rb_scores_ have unit magnitude
- Values may appear small compared to PCA scores
To recover the full magnitude:
```python
Ra_full = Ra_scores_ * Ra_values_
Rb_full = Rb_scores_ * Rb_values_
```

Where:
- Ra_values_ — magnitude for each component
- Rb_values_ — magnitude for each component
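A toy NumPy illustration of this rescaling, with made-up numbers in place of real gcPCA output:

```python
import numpy as np

# hypothetical unit-magnitude score columns (3 samples, 2 components)
Ra_scores_ = np.array([[0.6,  0.8],
                       [0.8, -0.6],
                       [0.0,  0.0]])
Ra_values_ = np.array([5.0, 2.0])   # hypothetical per-component magnitudes

Ra_full = Ra_scores_ * Ra_values_   # broadcasting scales each component column
```

After rescaling, each column norm of Ra_full equals the corresponding entry of Ra_values_.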
objective_values_ — gcPCA values for each component
Interpretation depends on method:
For v4:
- +1 → variance only in Ra
- -1 → variance only in Rb
- 0 → equal variance
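The v4 objective can be viewed as a generalized eigenproblem: maximize xᵀ(CA − CB)x / xᵀ(CA + CB)x over directions x. The NumPy sketch below (illustrative only, not the package's code; it assumes the default z-score + L2 column normalization and synthetic data) shows why the resulting values fall in [-1, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
Ra = rng.standard_normal((200, 5))
Ra[:, 1] = Ra[:, 0] + 0.1 * rng.standard_normal(200)  # extra correlation in A only
Rb = rng.standard_normal((150, 5))

def normalize(Z):
    # z-score columns, then L2-normalize them (mimics normalize_flag=True)
    Z = (Z - Z.mean(0)) / Z.std(0)
    return Z / np.linalg.norm(Z, axis=0)

Za, Zb = normalize(Ra), normalize(Rb)
CA, CB = Za.T @ Za, Zb.T @ Zb

# whiten by (CA + CB)^(-1/2), then solve the symmetric problem
d, V = np.linalg.eigh(CA + CB)
W = V @ np.diag(d ** -0.5) @ V.T
vals, U = np.linalg.eigh(W @ (CA - CB) @ W)   # vals lie in [-1, 1]
loadings = W @ U[:, ::-1]                      # features x components, descending
```

Because CA and CB are both positive semidefinite, the numerator can never exceed the denominator in magnitude, which pins the values to [-1, 1]; the artificially correlated direction in Ra shows up as a clearly positive top value.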
## Python: sparse_gcPCA

Sparse version of gcPCA.

```python
from contrastive_methods import sparse_gcPCA

sparse_gcPCA(
    method='v4',
    Nsparse=np.inf,
    lasso_penalty=...,
    ridge_penalty=0
)
```

After fitting:
```python
model.sparse_loadings_
model.Ra_scores_
model.Rb_scores_
model.Ra_values_
model.Rb_values_
```

## R: gcPCA

Main function for generalized contrastive PCA in R.
```r
library(gcpca)

fit <- gcPCA(
  Ra,
  Rb,
  method = "v4",
  Ncalc = NULL
)
```

Ra
Matrix for condition A (samples × features)
Rb
Matrix for condition B (samples × features)
Both datasets must:
- Have the same number of features
- Have samples in rows
- Contain numeric values
method
gcPCA version to use:
- "v1" — cPCA
- "v2" — Ra / Rb
- "v3" — (Ra − Rb) / Rb
- "v4" — (Ra − Rb) / (Ra + Rb) (recommended)

Orthogonal variants: "v2.1", "v3.1", "v4.1"
Ncalc
Number of components to compute; applies only to the orthogonal versions.
If not specified, all available components are returned.
Project data onto the gcPCs.

```r
pred <- predict(fit, Ra = Ra, Rb = Rb)
```

Outputs: Ra_scores, Rb_scores
After prediction:

```r
pred$Ra_scores
pred$Rb_scores
```
Shape:
samples × components
Scores extracted from predict() retain their magnitude within each dataset (Ra/Rb).
Scores extracted from the gcPCA() fit are unit-normalized, as in the Python implementation. See more below.
This means:
- Ra_scores and Rb_scores are normalized
- Magnitude information is stored separately
To recover full magnitude:
```r
Ra_full <- fit$Ra_scores * fit$Ra_values
Rb_full <- fit$Rb_scores * fit$Rb_values
```

Where:

- Ra_values — magnitude for each component
- Rb_values — magnitude for each component
[NOTE] This is only for scores returned from the gcPCA function, not predict()
```r
coef(fit)
```

Returns:
gcPC loadings
Shape:
features × components
```r
summary(fit)
```

Displays:
- Method used
- Number of components
- Objective values
```r
library(gcpca)

Ra <- matrix(rnorm(40 * 5), ncol = 5)
Rb <- matrix(rnorm(35 * 5), ncol = 5)

fit <- gcPCA(Ra, Rb, method = "v4", Ncalc = 3)
pred <- predict(fit, Ra = Ra, Rb = Rb)

pred$Ra_scores
pred$Rb_scores
```

## R: sparse_gcPCA

Sparse version of gcPCA in R.
Use this when you want gcPC loadings with feature selection across a range of lasso penalties. The main gcPCA arguments are the same as above. The most important sparse-specific inputs are the ones controlling sparsity.
```r
fit_sparse <- sparse_gcPCA(
  Ra,
  Rb,
  method = "v4",
  Nsparse = 2,
  lasso_penalty = c(0.05, 0.1),
  ridge_penalty = 0
)
```

Nsparse
Number of sparse gcPCs to estimate
lasso_penalty
Numeric vector of lasso penalties (also referred to as lambda)
- Larger values give sparser loadings
- Multiple values fit a full sparsity path
- One set of outputs is returned for each lambda value
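The shrinkage mechanism behind lasso_penalty can be illustrated with the soft-thresholding (lasso proximal) operator, the usual building block of sparse PCA-style solvers. This is a language-agnostic sketch written in Python with a made-up loading vector, not the package's R code:

```python
import numpy as np

def soft_threshold(v, lam):
    # lasso proximal step: shrink entries toward zero, zeroing the small ones
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

w = np.array([0.9, -0.4, 0.08, -0.03])   # a hypothetical loading vector
sparse_w = {lam: soft_threshold(w, lam) for lam in (0.05, 0.1, 0.2)}
```

Larger penalties zero out more entries, which is why a vector of penalties traces a path from dense to sparse loadings.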
ridge_penalty
Optional ridge penalty used in sparse optimization
The fitted object stores sparse loadings and projections across the full lasso path.
```r
fit_sparse$sparse_loadings
fit_sparse$Ra_scores
fit_sparse$Rb_scores
fit_sparse$Ra_values
fit_sparse$Rb_values
fit_sparse$lasso_penalty
```

For sparse_gcPCA(), outputs are returned as lists, with one element for each value in lasso_penalty.
This means:
- fit_sparse$sparse_loadings[[i]] is the loading matrix for the i-th lambda
- fit_sparse$Ra_scores[[i]] and fit_sparse$Rb_scores[[i]] are the scores for that lambda
- fit_sparse$Ra_values[[i]] and fit_sparse$Rb_values[[i]] are the corresponding magnitudes
Prediction outputs also follow the same structure: they are returned as lists across lasso_penalty values, not as a single matrix.
So if multiple lasso penalties are used, the predicted scores are grouped by lambda.
```r
fit_sparse <- sparse_gcPCA(
  Ra,
  Rb,
  method = "v4",
  Nsparse = 2,
  lasso_penalty = c(0.05, 0.1, 0.2)
)

fit_sparse$sparse_loadings[[1]]
fit_sparse$Ra_scores[[1]]
fit_sparse$Rb_scores[[1]]
```

## MATLAB: gcPCA

Main MATLAB function for generalized contrastive PCA.
```matlab
[B, S, X] = gcPCA(Za, Zb, gcPCAversion)
```

The MATLAB implementation decomposes the two datasets into:

- condition-specific scores in B
- magnitude and objective information in S
- shared loadings in X
Unlike PCA, the scores for the two conditions are stored separately.
```matlab
[B, S, X] = gcPCA(Ra, Rb, gcPCAversion, varargin)
```

Ra
Condition A matrix with shape:
samples × features
Rb
Condition B matrix with shape:
samples × features
Both datasets must:
- have the same number of features
- have samples in rows
- contain numeric values
gcPCAversion
Method to use:
- 1 — cPCA
- 2 — Ra / Rb
- 3 — (Ra − Rb) / Rb
- 4 — (Ra − Rb) / (Ra + Rb)

Orthogonal variants: 2.1, 3.1, 4.1
For most use cases, start with version 4.
Additional parameters are passed through varargin.
'Nshuffle'
Number of shuffle instances for the null distribution
'normalize'
Logical flag controlling normalization
- true → z-score and L2-normalize columns
- false → use custom preprocessing
If normalization is turned off, the data should still be at least centered.
'Ncalc'
Maximum number of gcPCs to calculate
This is mainly relevant for orthogonal versions (2.1, 3.1, 4.1), which are iterative.
'alpha'
Used only for version 1 (cPCA)
argmax_x xᵀ (CA − α CB) x
subject to xᵀx = 1
'maxcond'
Maximum allowed condition number for denominator regularization
This is used for numerical stability.
'rPCAkeep'
Controls how much variance from the original PCA space is retained before gcPCA is computed. Ranges from 0 to 1 and corresponds to the proportion of variance kept.
- 1 keeps the full data
- values less than 1 remove low-variance and potentially unstable PCA dimensions
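The idea behind this kind of variance retention can be sketched in NumPy: keep the leading PCA dimensions until a target fraction of variance is reached (an illustrative sketch on synthetic data, not the toolbox code):

```python
import numpy as np

rng = np.random.default_rng(2)
# six features with rapidly decaying column scales
Z = rng.standard_normal((100, 6)) * np.array([3, 2, 1, 0.5, 0.1, 0.01])
Z = Z - Z.mean(0)

U, s, Vt = np.linalg.svd(Z, full_matrices=False)
var_ratio = s**2 / (s**2).sum()

keep = 0.99                               # target proportion of variance
k = np.searchsorted(np.cumsum(var_ratio), keep) + 1
Z_reduced = (U[:, :k] * s[:k]) @ Vt[:k]   # low-variance dimensions removed
```

Projecting back with only the first k components discards the near-degenerate directions that would otherwise destabilize the contrastive step.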
```matlab
[B, S, X] = gcPCA(Ra, Rb, 4, 'normalize', true, 'Nshuffle', 0);
```

Shared gcPC loadings
Shape:
features × components
These are the dimensions that differ most between Za and Zb.
Structure containing unit-normalized scores for each condition.
```matlab
B.a
B.b
B.gcPCAversion
```

- B.a — scores for condition A
- B.b — scores for condition B
Shape of B.a and B.b:
samples × components
Structure containing objective values and per-condition magnitudes.
```matlab
S.objective
S.objval
S.a
S.b
S.gcPCAversion
```

- S.objective — text description of the objective function
- S.objval — gcPCA objective value for each component
- S.a — magnitudes for condition A
- S.b — magnitudes for condition B
As in the Python and R implementations, MATLAB returns unit-normalized scores in B.a and B.b.
This means:
- B.a and B.b are normalized
- the component magnitudes are stored separately in S.a and S.b
If users expect PCA-like score magnitudes, this is the main difference.
To recover full-magnitude projections:
```matlab
Ba_full = B.a .* S.a'
Bb_full = B.b .* S.b'
```

Where:

- S.a contains the magnitudes for condition A
- S.b contains the magnitudes for condition B
S.objval contains the gcPCA value for each component.
For version 4:
- +1 → variance only in Za
- -1 → variance only in Zb
- 0 → equal variance in both conditions
This is the most directly interpretable version.
If 'Nshuffle' > 0, S also contains shuffle-based outputs:
```matlab
S.objval_shuf
S.a_shuf
S.b_shuf
```

These store objective values and magnitudes computed on shuffled data.
- MATLAB expects samples in rows and features in columns
- Ra and Rb must have matching feature dimensions
- Orthogonal versions (2.1, 3.1, 4.1) are slower because they are computed iteratively
- Version 4 is usually the best starting point for symmetric comparisons
## MATLAB: sparse_gcPCA

Sparse version of gcPCA in MATLAB.
Use this when you want sparse loadings with feature selection controlled by lasso penalties.
```matlab
[B, S, X] = sparse_gcPCA(Ra, Rb, 4, 'Nsparse', 2, 'lasso_penalty', [0.05 0.1 0.2]);
```

'Nsparse'
Number of sparse gcPCs to estimate
'lasso_penalty'
Vector of lasso penalties
- larger values produce sparser loadings
- multiple values fit a sparsity path
'ridge_penalty'
Optional ridge penalty for sparse optimization
'maxiter'
Maximum number of optimization iterations
'tol'
Convergence tolerance
Sparse outputs are returned per lasso penalty.
- B.a{idx} and B.b{idx} contain scores for the idx-th penalty
- S.a{idx} and S.b{idx} contain magnitudes for the idx-th penalty
- X{idx} contains the sparse loadings for the idx-th penalty
So unlike standard gcPCA(), the sparse MATLAB function returns results as cell arrays across lambda values.
```matlab
[B, S, X] = sparse_gcPCA(Za, Zb, 4, ...
    'Nsparse', 2, ...
    'lasso_penalty', [0.05 0.1 0.2], ...
    'ridge_penalty', 0);
```

## Cross-language summary

All implementations:
- Expect samples in rows
- Require matching feature dimensions
- Support v1–v4 variants
- Support orthogonal versions (.1)
| Feature | Python | R | MATLAB |
|---|---|---|---|
| Class-based | ✓ | ✗ | ✗ |
| Sparse version | ✓ | ✓ | ✓ |
| Orthogonal versions | ✓ | ✓ | ✓ |
| Null distribution | ✓ | ✓ | ✓ |