5. Code Reference

Eliezyer edited this page Mar 29, 2026 · 3 revisions

Code Reference

This page documents the Python, R, and MATLAB APIs for gcPCA.
This reference is intended as a technical lookup for functions, arguments, and outputs.


Python API

gcPCA Class

Main class for generalized contrastive PCA.

Import

from contrastive_methods import gcPCA

Initialization

gcPCA(
    method='v4',
    Ncalc=np.inf,
    Nshuffle=0,
    normalize_flag=True,
    alpha=1,
    alpha_null=0.975,
    cond_number=1e13
)

Parameters

method
gcPCA version to use:

  • 'v1' — cPCA
  • 'v2' — Ra / Rb
  • 'v3' — (Ra − Rb) / Rb
  • 'v4' — (Ra − Rb) / (Ra + Rb) (recommended)

Orthogonal variants:

  • 'v2.1'
  • 'v3.1'
  • 'v4.1'
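As a rough illustration of what the shorthand objectives above mean, the sketch below evaluates each one along a single unit direction x. It assumes that "Ra" and "Rb" in the list stand for the variance of each dataset projected onto x (xᵀ CA x and xᵀ CB x). The helper name objective_along and the toy covariances are hypothetical and are not part of contrastive_methods.

```python
import numpy as np

def objective_along(x, Ca, Cb, method="v4", alpha=1.0):
    """Evaluate a gcPCA-style objective along direction x (illustrative only)."""
    x = x / np.linalg.norm(x)   # enforce the unit-norm constraint
    va = x @ Ca @ x             # variance of condition A along x
    vb = x @ Cb @ x             # variance of condition B along x
    if method == "v1":
        return va - alpha * vb
    if method == "v2":
        return va / vb
    if method == "v3":
        return (va - vb) / vb
    if method == "v4":
        return (va - vb) / (va + vb)
    raise ValueError(f"unknown method: {method}")

# Toy covariances: A varies mostly along axis 0, B mostly along axis 1.
Ca = np.diag([4.0, 1.0])
Cb = np.diag([1.0, 4.0])
x0 = np.array([1.0, 0.0])
x1 = np.array([0.0, 1.0])

v4_axis0 = objective_along(x0, Ca, Cb, "v4")   # (4 - 1) / (4 + 1)
v4_axis1 = objective_along(x1, Ca, Cb, "v4")   # (1 - 4) / (1 + 4)
v2_axis0 = objective_along(x0, Ca, Cb, "v2")   # 4 / 1
```

Under this reading, the v4 value is bounded between -1 and +1, whereas v2 and v3 are unbounded when the variance in Rb is small along x.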

Ncalc
Number of gcPCs to compute. Used by the orthogonal versions (v2.1, v3.1, v4.1).


Nshuffle
Number of shuffles used to build the null distribution


normalize_flag

  • True → the data are z-scored and L2-normalized before fitting
  • False → the data are used as provided, so normalize them yourself beforehand
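If you set normalize_flag=False and want to preprocess the data yourself, a minimal sketch of "z-score and L2 normalization" could look like the following. It assumes the normalization is applied per feature (column); the helper name zscore_l2 is hypothetical, and details such as the degrees of freedom used for the standard deviation are not stated on this page.

```python
import numpy as np

def zscore_l2(data):
    """Z-score each feature (column), then scale each column to unit L2 norm."""
    centered = data - data.mean(axis=0)
    z = centered / data.std(axis=0)          # assumes no constant columns
    return z / np.linalg.norm(z, axis=0)

rng = np.random.default_rng(0)
Ra = rng.normal(loc=5.0, scale=3.0, size=(40, 5))   # samples x features

Ra_norm = zscore_l2(Ra)
col_means = Ra_norm.mean(axis=0)              # close to 0 for every feature
col_norms = np.linalg.norm(Ra_norm, axis=0)   # close to 1 for every feature
```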

alpha

Used only for v1. It corresponds to the α parameter of cPCA:

argmax_x  xᵀ (CA − α CB) x
subject to xᵀx = 1
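For a symmetric matrix, this constrained maximization is solved by the eigenvector with the largest eigenvalue. The sketch below shows that with plain NumPy on synthetic data; it is not the library's internal code, and the variable names are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: A has extra variance on feature 0, B on feature 1.
Ra = rng.normal(size=(200, 4)) * np.array([3.0, 1.0, 1.0, 1.0])
Rb = rng.normal(size=(150, 4)) * np.array([1.0, 3.0, 1.0, 1.0])

Ca = np.cov(Ra, rowvar=False)   # features x features
Cb = np.cov(Rb, rowvar=False)
alpha = 1.0

# eigh returns eigenvalues of a symmetric matrix in ascending order,
# so the last eigenvector maximizes x' (Ca - alpha * Cb) x with x'x = 1.
contrast = Ca - alpha * Cb
evals, evecs = np.linalg.eigh(contrast)
x_best = evecs[:, -1]
best_value = x_best @ contrast @ x_best
```

Larger values of alpha penalize variance in Rb more strongly, which changes which direction wins.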

cond_number

Numerical stability parameter. It sets the condition-number threshold used to decide whether a matrix is treated as full rank or needs correction.
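To see what such a threshold distinguishes, the sketch below compares the condition number of a covariance matrix with and without a redundant feature. The helper is_well_conditioned is hypothetical; how gcPCA corrects an ill-conditioned matrix is not described on this page, so only the check is shown.

```python
import numpy as np

def is_well_conditioned(C, cond_number=1e13):
    """Return True if the condition number of C is below the threshold."""
    return bool(np.linalg.cond(C) < cond_number)

rng = np.random.default_rng(0)
full = rng.normal(size=(50, 3))
# Repeat the first feature so that the covariance becomes rank deficient.
deficient = np.column_stack([full, full[:, 0]])

ok_full = is_well_conditioned(np.cov(full, rowvar=False))
ok_deficient = is_well_conditioned(np.cov(deficient, rowvar=False))
```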


Methods

fit

Fit gcPCA model.

model.fit(Ra, Rb)

Inputs

Ra — matrix (samples × features)
Rb — matrix (samples × features)

Both datasets must:

  • Have same number of features
  • Have samples in rows
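A small pre-flight check can catch shape problems before calling fit. The function check_inputs below is a hypothetical helper, not part of contrastive_methods, and it only verifies the two requirements listed above.

```python
import numpy as np

def check_inputs(Ra, Rb):
    """Verify that Ra and Rb are 2-D and share the same number of features."""
    Ra = np.asarray(Ra, dtype=float)
    Rb = np.asarray(Rb, dtype=float)
    if Ra.ndim != 2 or Rb.ndim != 2:
        raise ValueError("Ra and Rb must be 2-D arrays (samples x features)")
    if Ra.shape[1] != Rb.shape[1]:
        raise ValueError(
            f"feature mismatch: Ra has {Ra.shape[1]}, Rb has {Rb.shape[1]}"
        )
    return Ra, Rb

rng = np.random.default_rng(0)
# The number of samples may differ; the number of features may not.
Ra, Rb = check_inputs(rng.normal(size=(40, 5)), rng.normal(size=(35, 5)))
```

If the features ended up in rows, transposing with Ra.T before the check restores the expected layout.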

transform

Project data onto gcPCs

model.transform(Ra, Rb)

Outputs:

  • Ra_transformed_
  • Rb_transformed_

fit_transform

Fit and transform in one step

model.fit_transform(Ra, Rb)

Output Attributes

After fitting:

model.loadings_
model.Ra_scores_
model.Rb_scores_
model.Ra_values_
model.Rb_values_
model.objective_values_
model.objective_function_

loadings_

gcPC loadings

Shape:

features × components


Ra_scores_ / Rb_scores_

Unit-normalized scores for each dataset

Shape:

samples × components


Important: Unit Scores vs Magnitude

The scores returned by gcPCA are unit normalized.

This means:

  • Ra_scores_ and Rb_scores_ have unit magnitude
  • Values may appear small compared to PCA scores

To recover the full magnitude:

Ra_full = model.Ra_scores_ * model.Ra_values_
Rb_full = model.Rb_scores_ * model.Rb_values_

Where:

  • Ra_values_ — magnitude for each component
  • Rb_values_ — magnitude for each component
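The relationship between unit scores and magnitudes can be reproduced with NumPy alone. The sketch assumes that "unit normalized" means each score column has unit L2 norm and that the values hold those column norms; the random orthonormal matrix is only a stand-in for loadings_.

```python
import numpy as np

rng = np.random.default_rng(0)
Ra = rng.normal(size=(40, 5))                        # samples x features
# Stand-in for loadings_: 5 features x 3 components, orthonormal columns.
loadings = np.linalg.qr(rng.normal(size=(5, 3)))[0]

projection = Ra @ loadings                     # full-magnitude projection
values = np.linalg.norm(projection, axis=0)    # one magnitude per component
scores = projection / values                   # unit-norm columns

# Broadcasting scales column j of scores by values[j].
recovered = scores * values
```

Because each column is divided by its norm, the unit scores shrink as the number of samples grows, which is why they can look small next to PCA scores.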

objective_values_

gcPCA values for each component

Interpretation depends on method:

For v4:

  • +1 → variance only in Ra
  • -1 → variance only in Rb
  • 0 → equal variance

sparse_gcPCA Class

Sparse version of gcPCA.

Import

from contrastive_methods import sparse_gcPCA

Initialization

sparse_gcPCA(
    method='v4',
    Nsparse=np.inf,
    lasso_penalty=...,
    ridge_penalty=0
)

Outputs

After fitting:

model.sparse_loadings_
model.Ra_scores_
model.Rb_scores_
model.Ra_values_
model.Rb_values_

R API

gcPCA Function

Main function for generalized contrastive PCA in R.

library(gcpca)

Initialization

fit <- gcPCA(
  Ra,
  Rb,
  method = "v4",
  Ncalc = NULL
)

Parameters

Ra
Matrix for condition A (samples × features)


Rb
Matrix for condition B (samples × features)

Both datasets must:

  • Have the same number of features
  • Have samples in rows
  • Contain numeric values

method
gcPCA version to use:

  • "v1" — cPCA
  • "v2" — Ra / Rb
  • "v3" — (Ra − Rb) / Rb
  • "v4" — (Ra − Rb) / (Ra + Rb) (recommended)

Orthogonal variants:

  • "v2.1"
  • "v3.1"
  • "v4.1"

Ncalc
Number of components to compute. Applies only to the orthogonal versions.

If not specified, all available components are returned.


Methods

predict

Project data onto gcPCs

pred <- predict(fit, Ra = Ra, Rb = Rb)

Outputs:

  • Ra_scores
  • Rb_scores

Output Attributes

After prediction:

pred$Ra_scores
pred$Rb_scores

Ra_scores / Rb_scores

Scores stored in the object returned by gcPCA() are unit-normalized for each dataset. Scores computed by predict() keep their full magnitude in each dataset.

Shape:

samples × components


Important: Unit Scores vs Magnitude

Scores returned by predict() keep their full magnitude in each dataset (Ra/Rb).

Scores stored in the model fit are unit-normalized, as in the Python implementation.

This means:

  • Ra_scores and Rb_scores are normalized
  • Magnitude information is stored separately

To recover full magnitude:

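# Assumes Ra_values and Rb_values are vectors with one magnitude per component.
# sweep() scales each column; a plain `*` would recycle the vector down the rows.
Ra_full <- sweep(fit$Ra_scores, 2, fit$Ra_values, "*")
Rb_full <- sweep(fit$Rb_scores, 2, fit$Rb_values, "*")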

Where:

  • Ra_values — magnitude for each component
  • Rb_values — magnitude for each component

[NOTE] This applies only to scores returned by the gcPCA function, not to scores from predict().


Extracting Loadings

coef(fit)

Returns:

gcPC loadings

Shape:

features × components


Model Summary

summary(fit)

Displays:

  • Method used
  • Number of components
  • Objective values

Example

library(gcpca)

Ra <- matrix(rnorm(40 * 5), ncol = 5)
Rb <- matrix(rnorm(35 * 5), ncol = 5)

fit <- gcPCA(Ra, Rb, method = "v4", Ncalc = 3)

pred <- predict(fit, Ra = Ra, Rb = Rb)

pred$Ra_scores
pred$Rb_scores

sparse_gcPCA Function

Sparse version of gcPCA in R.

Use this when you want gcPC loadings with feature selection across a range of lasso penalties. The main gcPCA arguments are the same as above. The most important sparse-specific inputs are the ones controlling sparsity.

fit_sparse <- sparse_gcPCA(
  Ra,
  Rb,
  method = "v4",
  Nsparse = 2,
  lasso_penalty = c(0.05, 0.1),
  ridge_penalty = 0
)

Sparse-Specific Parameters

Nsparse
Number of sparse gcPCs to estimate


lasso_penalty
Numeric vector of lasso penalties (also referred to as lambda)

  • Larger values give sparser loadings
  • Multiple values fit a full sparsity path
  • One set of outputs is returned for each lambda value
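For intuition about why larger penalties give sparser loadings, the sketch below applies soft-thresholding, the standard proximal step for an L1 (lasso) penalty, to a toy loading vector. It is written in Python (NumPy) for consistency with the other examples. Whether sparse_gcPCA uses this exact update is not stated on this page, and the vector w is made up.

```python
import numpy as np

def soft_threshold(w, lam):
    """Shrink each entry toward zero by lam and zero out entries below lam."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([0.80, -0.30, 0.12, -0.07, 0.04])   # toy loading vector

# Count the surviving (non-zero) features along a small penalty path.
nonzeros = {
    lam: int(np.count_nonzero(soft_threshold(w, lam)))
    for lam in (0.05, 0.1, 0.2)
}
# nonzeros -> {0.05: 4, 0.1: 3, 0.2: 2}
```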

ridge_penalty
Optional ridge penalty used in sparse optimization


Outputs

The fitted object stores sparse loadings and projections across the full lasso path.

fit_sparse$sparse_loadings
fit_sparse$Ra_scores
fit_sparse$Rb_scores
fit_sparse$Ra_values
fit_sparse$Rb_values
fit_sparse$lasso_penalty

Important: Outputs Are Returned Per Lasso Penalty Value

For sparse_gcPCA(), outputs are returned as lists, with one element for each value in lasso_penalty.

This means:

  • fit_sparse$sparse_loadings[[i]] is the loading matrix for the i-th lambda
  • fit_sparse$Ra_scores[[i]] and fit_sparse$Rb_scores[[i]] are the scores for that lambda
  • fit_sparse$Ra_values[[i]] and fit_sparse$Rb_values[[i]] are the corresponding magnitudes

predict

Prediction outputs also follow the same structure: they are returned as lists across lasso_penalty values, not as a single matrix.

So if multiple lasso penalties are used, the predicted scores are grouped by lambda.


Example

fit_sparse <- sparse_gcPCA(
  Ra,
  Rb,
  method = "v4",
  Nsparse = 2,
  lasso_penalty = c(0.05, 0.1, 0.2)
)

fit_sparse$sparse_loadings[[1]]
fit_sparse$Ra_scores[[1]]
fit_sparse$Rb_scores[[1]]

MATLAB API

gcPCA Function

Main MATLAB function for generalized contrastive PCA.

[B, S, X] = gcPCA(Ra, Rb, gcPCAversion)

The MATLAB implementation decomposes the two datasets into:

  • condition-specific scores in B
  • magnitude and objective information in S
  • shared loadings in X

Unlike PCA, the scores for the two conditions are stored separately.


Function Signature

[B, S, X] = gcPCA(Ra, Rb, gcPCAversion, varargin)

Required Inputs

Ra
Condition A matrix with shape:

samples × features


Rb
Condition B matrix with shape:

samples × features

Both datasets must:

  • have the same number of features
  • have samples in rows
  • contain numeric values

gcPCAversion
Method to use:

  • 1 — cPCA
  • 2 — Ra / Rb
  • 3 — (Ra - Rb) / Rb
  • 4 — (Ra - Rb) / (Ra + Rb)

Orthogonal variants:

  • 2.1
  • 3.1
  • 4.1

For most use cases, start with version 4.


Optional Inputs

Additional parameters are passed through varargin.

Common options

'Nshuffle'
Number of shuffles used to build the null distribution


'normalize'
Logical flag controlling normalization

  • true → z-score and L2-normalize columns
  • false → use custom preprocessing

If normalization is turned off, the data should still be at least centered.


'Ncalc'
Maximum number of gcPCs to calculate

This is mainly relevant for orthogonal versions (2.1, 3.1, 4.1), which are iterative.


'alpha'
Used only for version 1 (cPCA)

argmax_x  xᵀ (CA − α CB) x
subject to xᵀx = 1

'maxcond'
Maximum allowed condition number for denominator regularization

This is used for numerical stability.


'rPCAkeep'
Proportion of variance (between 0 and 1) retained from the original PCA space before gcPCA is computed.

  • 1 keeps the full data
  • values less than 1 remove low-variance and potentially unstable PCA dimensions
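One common way to turn a variance proportion into a number of PCA dimensions is to keep components until their cumulative explained variance reaches the requested value. The sketch below shows that rule in Python (NumPy) for consistency with the other examples. The exact rule used by the MATLAB code is not specified on this page, and n_components_to_keep is a hypothetical helper.

```python
import numpy as np

def n_components_to_keep(data, keep=1.0):
    """Number of PCA dimensions needed to retain a proportion `keep` of variance."""
    centered = data - data.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)   # singular values, descending
    if keep >= 1.0:
        return len(s)                               # 1 keeps the full data
    cumulative = np.cumsum(s**2) / np.sum(s**2)     # cumulative explained variance
    n_keep = int(np.searchsorted(cumulative, keep)) + 1
    return min(n_keep, len(s))

# Toy data with orthogonal, zero-mean columns of very different variance.
data = np.array([
    [ 3.0,  0.0,  0.1],
    [-3.0,  0.0,  0.1],
    [ 0.0,  1.0, -0.1],
    [ 0.0, -1.0, -0.1],
])

keep_80 = n_components_to_keep(data, keep=0.80)    # first dimension is enough
keep_95 = n_components_to_keep(data, keep=0.95)    # needs two dimensions
keep_all = n_components_to_keep(data, keep=1.0)    # all three dimensions
```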

Example

[B, S, X] = gcPCA(Ra, Rb, 4, 'normalize', true, 'Nshuffle', 0);

Outputs

X

Shared gcPC loadings

Shape:

features × components

These are the dimensions that differ most between Ra and Rb.


B

Structure containing unit-normalized scores for each condition.

B.a
B.b
B.gcPCAversion

  • B.a — scores for condition A
  • B.b — scores for condition B

Shape of B.a and B.b:

samples × components


S

Structure containing objective values and per-condition magnitudes.

S.objective
S.objval
S.a
S.b
S.gcPCAversion

  • S.objective — text description of the objective function
  • S.objval — gcPCA objective value for each component
  • S.a — magnitudes for condition A
  • S.b — magnitudes for condition B

Important: Unit Scores vs Magnitude

As in the Python and R implementations, MATLAB returns unit-normalized scores in B.a and B.b.

This means:

  • B.a and B.b are normalized
  • the component magnitudes are stored separately in S.a and S.b

If you expect PCA-like score magnitudes, this is the main difference to keep in mind.

To recover full-magnitude projections:

Ba_full = B.a .* S.a'
Bb_full = B.b .* S.b'

Where:

  • S.a contains the magnitudes for condition A
  • S.b contains the magnitudes for condition B

Interpreting S.objval

S.objval contains the gcPCA value for each component.

For version 4:

  • +1 → variance only in Ra
  • -1 → variance only in Rb
  • 0 → equal variance in both conditions

This is the most directly interpretable version.


Shuffle Outputs

If 'Nshuffle' > 0, S also contains shuffle-based outputs:

S.objval_shuf
S.a_shuf
S.b_shuf

These store objective values and magnitudes computed on shuffled data.


Notes

  • MATLAB expects samples in rows and features in columns
  • Ra and Rb must have matching feature dimensions
  • Orthogonal versions (2.1, 3.1, 4.1) are slower because they are computed iteratively
  • Version 4 is usually the best starting point for symmetric comparisons

sparse_gcPCA Function

Sparse version of gcPCA in MATLAB.

Use this when you want sparse loadings with feature selection controlled by lasso penalties.

[B, S, X] = sparse_gcPCA(Ra, Rb, 4, 'Nsparse', 2, 'lasso_penalty', [0.05 0.1 0.2]);

Sparse-Specific Inputs

'Nsparse'
Number of sparse gcPCs to estimate


'lasso_penalty'
Vector of lasso penalties

  • larger values produce sparser loadings
  • multiple values fit a sparsity path

'ridge_penalty'
Optional ridge penalty for sparse optimization


'maxiter'
Maximum number of optimization iterations


'tol'
Convergence tolerance


Sparse Outputs

Sparse outputs are returned per lasso penalty.

  • B.a{idx} and B.b{idx} contain scores for the idx-th penalty
  • S.a{idx} and S.b{idx} contain magnitudes for the idx-th penalty
  • X{idx} contains the sparse loadings for the idx-th penalty

So unlike standard gcPCA(), the sparse MATLAB function returns results as cell arrays across lambda values.


Sparse Example

[B, S, X] = sparse_gcPCA(Ra, Rb, 4, ...
    'Nsparse', 2, ...
    'lasso_penalty', [0.05 0.1 0.2], ...
    'ridge_penalty', 0);

Shared Notes

All implementations:

  • Expect samples in rows
  • Require matching feature dimensions
  • Support v1–v4 variants
  • Support orthogonal versions (.1)

Cross-Language Differences

Feature Python R MATLAB
Class-based
Sparse version
Orthogonal versions
Null distribution

Links to Other Pages

1. Quickstart Guide
2. Installation
3. Conceptual Overview
4. Mathematical Formulation
6. Input Data Guidelines
7. Interpreting Results
