# 5. Code Reference
This page documents the Python, R, and MATLAB APIs for gcPCA.
This reference is intended as a technical lookup for functions, arguments, and outputs.
## Python: gcPCA

Main class for generalized contrastive PCA.
```python
from contrastive_methods import gcPCA

gcPCA(
    method='v4',
    Ncalc=np.inf,
    Nshuffle=0,
    normalize_flag=True,
    alpha=1,
    alpha_null=0.975,
    cond_number=1e13
)
```

method
gcPCA version to use:
- 'v1' — cPCA
- 'v2' — Ra / Rb
- 'v3' — (Ra − Rb) / Rb
- 'v4' — (Ra − Rb) / (Ra + Rb) (recommended)

Orthogonal variants: 'v2.1', 'v3.1', 'v4.1'
Ncalc
Number of components to compute (used for orthogonal versions)
Nshuffle
Number of shuffles for null distribution
normalize_flag
- True → z-score and L2 normalization
- False → user provides normalized data
alpha
Used only for v1; corresponds to the alpha of cPCA:
argmax_x xᵀ (CA − α CB) x
subject to xᵀx = 1
cond_number
Numerical stability parameter; determines whether a matrix is treated as full rank or needs correction.
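For v1, the objective above is a plain symmetric eigenproblem: the maximizing x is the top eigenvector of CA − αCB. A minimal NumPy sketch of that fact (illustrative only; the data, covariances, and alpha below are made-up stand-ins, not package internals):

```python
import numpy as np

rng = np.random.default_rng(1)
Za = rng.standard_normal((50, 4))   # condition A, samples x features
Zb = rng.standard_normal((60, 4))   # condition B, samples x features
CA, CB = np.cov(Za.T), np.cov(Zb.T)

alpha = 1.0
# CA - alpha*CB is symmetric, so use eigh; eigenvalues come back ascending
vals, vecs = np.linalg.eigh(CA - alpha * CB)
x = vecs[:, -1]                     # unit-norm maximizer of x' (CA - alpha*CB) x
```

The constraint xᵀx = 1 is satisfied automatically because eigh returns orthonormal eigenvectors.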
Fit the gcPCA model.

```python
model.fit(Ra, Rb)
```

- Ra — matrix (samples × features)
- Rb — matrix (samples × features)
Both datasets must:
- Have the same number of features
- Have samples in rows
Project data onto the gcPCs.

```python
model.transform(Ra, Rb)
```

Outputs: Ra_transformed_, Rb_transformed_
Fit and transform in one step.

```python
model.fit_transform(Ra, Rb)
```

After fitting:
```python
model.loadings_
model.Ra_scores_
model.Rb_scores_
model.Ra_values_
model.Rb_values_
model.objective_values_
model.objective_function_
```

loadings_ — gcPC loadings

Shape: features × components
Ra_scores_, Rb_scores_ — unit-normalized scores for each dataset
Shape:
samples × components
The scores returned by gcPCA are unit normalized.
This means:
- Ra_scores_ and Rb_scores_ have unit magnitude
- Values may appear small compared to PCA scores
To recover the full magnitude:
```python
Ra_full = Ra_scores_ * Ra_values_
Rb_full = Rb_scores_ * Rb_values_
```

Where:
- Ra_values_ — magnitude for each component
- Rb_values_ — magnitude for each component
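A toy NumPy illustration of this rescaling, with made-up numbers in place of real gcPCA output:

```python
import numpy as np

# hypothetical unit-magnitude score columns (3 samples, 2 components)
Ra_scores_ = np.array([[0.6,  0.8],
                       [0.8, -0.6],
                       [0.0,  0.0]])
Ra_values_ = np.array([5.0, 2.0])   # hypothetical per-component magnitudes

Ra_full = Ra_scores_ * Ra_values_   # broadcasting scales each component column
```

After rescaling, each column norm of Ra_full equals the corresponding entry of Ra_values_.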
objective_values_ — gcPCA values for each component
Interpretation depends on method:
For v4:
- +1 → variance only in Ra
- -1 → variance only in Rb
- 0 → equal variance
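The v4 objective can be viewed as a generalized eigenproblem: maximize xᵀ(CA − CB)x / xᵀ(CA + CB)x over directions x. The NumPy sketch below (illustrative only, not the package's code; it assumes the default z-score + L2 column normalization and synthetic data) shows why the resulting values fall in [-1, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
Ra = rng.standard_normal((200, 5))
Ra[:, 1] = Ra[:, 0] + 0.1 * rng.standard_normal(200)  # extra correlation in A only
Rb = rng.standard_normal((150, 5))

def normalize(Z):
    # z-score columns, then L2-normalize them (mimics normalize_flag=True)
    Z = (Z - Z.mean(0)) / Z.std(0)
    return Z / np.linalg.norm(Z, axis=0)

Za, Zb = normalize(Ra), normalize(Rb)
CA, CB = Za.T @ Za, Zb.T @ Zb

# whiten by (CA + CB)^(-1/2), then solve the symmetric problem
d, V = np.linalg.eigh(CA + CB)
W = V @ np.diag(d ** -0.5) @ V.T
vals, U = np.linalg.eigh(W @ (CA - CB) @ W)   # vals lie in [-1, 1]
loadings = W @ U[:, ::-1]                      # features x components, descending
```

Because CA and CB are both positive semidefinite, the numerator can never exceed the denominator in magnitude, which pins the values to [-1, 1]; the artificially correlated direction in Ra shows up as a clearly positive top value.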
## Python: sparse_gcPCA

Sparse version of gcPCA.

```python
from contrastive_methods import sparse_gcPCA

sparse_gcPCA(
    method='v4',
    Nsparse=np.inf,
    lasso_penalty=...,
    ridge_penalty=0
)
```

After fitting:
```python
model.sparse_loadings_
model.Ra_scores_
model.Rb_scores_
model.Ra_values_
model.Rb_values_
```

## R: gcPCA

Main function for generalized contrastive PCA in R.
```r
library(gcpca)

fit <- gcPCA(
  Ra,
  Rb,
  method = "v4",
  Ncalc = NULL
)
```

Ra
Matrix for condition A (samples × features)
Rb
Matrix for condition B (samples × features)
Both datasets must:
- Have the same number of features
- Have samples in rows
- Contain numeric values
method
gcPCA version to use:
- "v1" — cPCA
- "v2" — Ra / Rb
- "v3" — (Ra − Rb) / Rb
- "v4" — (Ra − Rb) / (Ra + Rb) (recommended)

Orthogonal variants: "v2.1", "v3.1", "v4.1"
Ncalc
Number of components to compute; applies only to the orthogonal versions.
If not specified, all available components are returned.
Project data onto the gcPCs.

```r
pred <- predict(fit, Ra = Ra, Rb = Rb)
```

Outputs: Ra_scores, Rb_scores
After prediction:

```r
pred$Ra_scores
pred$Rb_scores
```
Shape:
samples × components
Scores extracted from predict() retain their magnitude within each dataset (Ra/Rb).
Scores extracted from the gcPCA() fit are unit-normalized, as in the Python implementation. See more below.
This means:
- Ra_scores and Rb_scores are normalized
- Magnitude information is stored separately
To recover full magnitude:
```r
Ra_full <- fit$Ra_scores * fit$Ra_values
Rb_full <- fit$Rb_scores * fit$Rb_values
```

Where:

- Ra_values — magnitude for each component
- Rb_values — magnitude for each component
[NOTE] This is only for scores returned from the gcPCA function, not predict()
```r
coef(fit)
```

Returns:
gcPC loadings
Shape:
features × components
```r
summary(fit)
```

Displays:
- Method used
- Number of components
- Objective values
```r
library(gcpca)

Ra <- matrix(rnorm(40 * 5), ncol = 5)
Rb <- matrix(rnorm(35 * 5), ncol = 5)

fit <- gcPCA(Ra, Rb, method = "v4", Ncalc = 3)
pred <- predict(fit, Ra = Ra, Rb = Rb)

pred$Ra_scores
pred$Rb_scores
```

## R: sparse_gcPCA

Sparse version of gcPCA in R.
Use this when you want gcPC loadings with feature selection across a range of lasso penalties. The main gcPCA arguments are the same as above. The most important sparse-specific inputs are the ones controlling sparsity.
```r
fit_sparse <- sparse_gcPCA(
  Ra,
  Rb,
  method = "v4",
  Nsparse = 2,
  lasso_penalty = c(0.05, 0.1),
  ridge_penalty = 0
)
```

Nsparse
Number of sparse gcPCs to estimate
lasso_penalty
Numeric vector of lasso penalties (also referred to as lambda)
- Larger values give sparser loadings
- Multiple values fit a full sparsity path
- One set of outputs is returned for each lambda value
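The shrinkage mechanism behind lasso_penalty can be illustrated with the soft-thresholding (lasso proximal) operator, the usual building block of sparse PCA-style solvers. This is a language-agnostic sketch written in Python with a made-up loading vector, not the package's R code:

```python
import numpy as np

def soft_threshold(v, lam):
    # lasso proximal step: shrink entries toward zero, zeroing the small ones
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

w = np.array([0.9, -0.4, 0.08, -0.03])   # a hypothetical loading vector
sparse_w = {lam: soft_threshold(w, lam) for lam in (0.05, 0.1, 0.2)}
```

Larger penalties zero out more entries, which is why a vector of penalties traces a path from dense to sparse loadings.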
ridge_penalty
Optional ridge penalty used in sparse optimization
The fitted object stores sparse loadings and projections across the full lasso path.
```r
fit_sparse$sparse_loadings
fit_sparse$Ra_scores
fit_sparse$Rb_scores
fit_sparse$Ra_values
fit_sparse$Rb_values
fit_sparse$lasso_penalty
```

For sparse_gcPCA(), outputs are returned as lists, with one element for each value in lasso_penalty.
This means:
- fit_sparse$sparse_loadings[[i]] is the loading matrix for the i-th lambda
- fit_sparse$Ra_scores[[i]] and fit_sparse$Rb_scores[[i]] are the scores for that lambda
- fit_sparse$Ra_values[[i]] and fit_sparse$Rb_values[[i]] are the corresponding magnitudes
Prediction outputs also follow the same structure: they are returned as lists across lasso_penalty values, not as a single matrix.
So if multiple lasso penalties are used, the predicted scores are grouped by lambda.
```r
fit_sparse <- sparse_gcPCA(
  Ra,
  Rb,
  method = "v4",
  Nsparse = 2,
  lasso_penalty = c(0.05, 0.1, 0.2)
)

fit_sparse$sparse_loadings[[1]]
fit_sparse$Ra_scores[[1]]
fit_sparse$Rb_scores[[1]]
```

## MATLAB: gcPCA

Main MATLAB function for generalized contrastive PCA.
```matlab
[B, S, X] = gcPCA(Za, Zb, gcPCAversion)
```

The MATLAB implementation decomposes the two datasets into:

- condition-specific scores in B
- magnitude and objective information in S
- shared loadings in X
Unlike PCA, the scores for the two conditions are stored separately.
```matlab
[B, S, X] = gcPCA(Ra, Rb, gcPCAversion, varargin)
```

Ra
Condition A matrix with shape:
samples × features
Rb
Condition B matrix with shape:
samples × features
Both datasets must:
- have the same number of features
- have samples in rows
- contain numeric values
gcPCAversion
Method to use:
- 1 — cPCA
- 2 — Ra / Rb
- 3 — (Ra − Rb) / Rb
- 4 — (Ra − Rb) / (Ra + Rb)

Orthogonal variants: 2.1, 3.1, 4.1
For most use cases, start with version 4.
Additional parameters are passed through varargin.
'Nshuffle'
Number of shuffle instances for the null distribution
'normalize'
Logical flag controlling normalization
- true → z-score and L2-normalize columns
- false → use custom preprocessing
If normalization is turned off, the data should still be at least centered.
'Ncalc'
Maximum number of gcPCs to calculate
This is mainly relevant for orthogonal versions (2.1, 3.1, 4.1), which are iterative.
'alpha'
Used only for version 1 (cPCA)
argmax_x xᵀ (CA − α CB) x
subject to xᵀx = 1
'maxcond'
Maximum allowed condition number for denominator regularization
This is used for numerical stability.
'rPCAkeep'
Controls how much variance from the original PCA space is retained before gcPCA is computed. Ranges from 0 to 1 and corresponds to the proportion of variance kept.
- 1 keeps the full data
- values less than 1 remove low-variance and potentially unstable PCA dimensions
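The idea behind this kind of variance retention can be sketched in NumPy: keep the leading PCA dimensions until a target fraction of variance is reached (an illustrative sketch on synthetic data, not the toolbox code):

```python
import numpy as np

rng = np.random.default_rng(2)
# six features with rapidly decaying column scales
Z = rng.standard_normal((100, 6)) * np.array([3, 2, 1, 0.5, 0.1, 0.01])
Z = Z - Z.mean(0)

U, s, Vt = np.linalg.svd(Z, full_matrices=False)
var_ratio = s**2 / (s**2).sum()

keep = 0.99                               # target proportion of variance
k = np.searchsorted(np.cumsum(var_ratio), keep) + 1
Z_reduced = (U[:, :k] * s[:k]) @ Vt[:k]   # low-variance dimensions removed
```

Projecting back with only the first k components discards the near-degenerate directions that would otherwise destabilize the contrastive step.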
```matlab
[B, S, X] = gcPCA(Ra, Rb, 4, 'normalize', true, 'Nshuffle', 0);
```

Shared gcPC loadings
Shape:
features × components
These are the dimensions that differ most between Za and Zb.
Structure containing unit-normalized scores for each condition.
```matlab
B.a
B.b
B.gcPCAversion
```

- B.a — scores for condition A
- B.b — scores for condition B
Shape of B.a and B.b:
samples × components
Structure containing objective values and per-condition magnitudes.
```matlab
S.objective
S.objval
S.a
S.b
S.gcPCAversion
```

- S.objective — text description of the objective function
- S.objval — gcPCA objective value for each component
- S.a — magnitudes for condition A
- S.b — magnitudes for condition B
As in the Python and R implementations, MATLAB returns unit-normalized scores in B.a and B.b.
This means:
- B.a and B.b are normalized
- the component magnitudes are stored separately in S.a and S.b
If users expect PCA-like score magnitudes, this is the main difference.
To recover full-magnitude projections:
```matlab
Ba_full = B.a .* S.a'
Bb_full = B.b .* S.b'
```

Where:

- S.a contains the magnitudes for condition A
- S.b contains the magnitudes for condition B
S.objval contains the gcPCA value for each component.
For version 4:
- +1 → variance only in Za
- -1 → variance only in Zb
- 0 → equal variance in both conditions
This is the most directly interpretable version.
If 'Nshuffle' > 0, S also contains shuffle-based outputs:
```matlab
S.objval_shuf
S.a_shuf
S.b_shuf
```

These store objective values and magnitudes computed on shuffled data.
- MATLAB expects samples in rows and features in columns
- Ra and Rb must have matching feature dimensions
- Orthogonal versions (2.1, 3.1, 4.1) are slower because they are computed iteratively
- Version 4 is usually the best starting point for symmetric comparisons
## MATLAB: sparse_gcPCA

Sparse version of gcPCA in MATLAB.
Use this when you want sparse loadings with feature selection controlled by lasso penalties.
```matlab
[B, S, X] = sparse_gcPCA(Ra, Rb, 4, 'Nsparse', 2, 'lasso_penalty', [0.05 0.1 0.2]);
```

'Nsparse'
Number of sparse gcPCs to estimate
'lasso_penalty'
Vector of lasso penalties
- larger values produce sparser loadings
- multiple values fit a sparsity path
'ridge_penalty'
Optional ridge penalty for sparse optimization
'maxiter'
Maximum number of optimization iterations
'tol'
Convergence tolerance
Sparse outputs are returned per lasso penalty.
- B.a{idx} and B.b{idx} contain scores for the idx-th penalty
- S.a{idx} and S.b{idx} contain magnitudes for the idx-th penalty
- X{idx} contains the sparse loadings for the idx-th penalty
So unlike standard gcPCA(), the sparse MATLAB function returns results as cell arrays across lambda values.
```matlab
[B, S, X] = sparse_gcPCA(Za, Zb, 4, ...
    'Nsparse', 2, ...
    'lasso_penalty', [0.05 0.1 0.2], ...
    'ridge_penalty', 0);
```

## Cross-language summary

All implementations:
- Expect samples in rows
- Require matching feature dimensions
- Support v1–v4 variants
- Support orthogonal versions (.1)
| Feature | Python | R | MATLAB |
|---|---|---|---|
| Class-based | ✓ | ✗ | ✗ |
| Sparse version | ✓ | ✓ | ✓ |
| Orthogonal versions | ✓ | ✓ | ✓ |
| Null distribution | ✓ | ✓ | ✓ |