This project provides a from-scratch implementation of Principal Component Analysis (PCA), a widely used method for dimensionality reduction in high-dimensional data analysis. The implementation relies solely on fundamental linear algebra operations and basic statistical tools.
PCA projects high-dimensional data onto a lower-dimensional linear subspace by identifying the directions of maximum variance. These directions correspond to the eigenvectors of the data's covariance matrix associated with the largest eigenvalues.
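This correspondence can be checked numerically. The sketch below (illustrative, not part of the project's code) builds a small correlated dataset and verifies that the variance of the data projected onto the covariance matrix's top eigenvector equals the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(42)
# Correlated 2-D data: variance is largest along an oblique direction
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 1.0]])
Xc = X - X.mean(axis=0)

C = np.cov(Xc, rowvar=False)          # 2 x 2 empirical covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
v_top = eigvecs[:, -1]                # eigenvector of the largest eigenvalue

# Variance of the data projected onto v_top equals the largest eigenvalue
proj_var = np.var(Xc @ v_top, ddof=1)
```

Here `proj_var` matches `eigvals[-1]` up to floating-point error, confirming that the top eigenvector is the direction of maximum variance.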
Given a data matrix $X \in \mathbb{R}^{n \times d}$ with $n$ samples and $d$ features, the algorithm proceeds as follows:

- Center the data by subtracting the empirical column-wise mean: $X_c = X - \mu$
- Compute the empirical covariance matrix: $C = \frac{1}{n - 1} X_c^\top X_c$
- Extract the top $k$ eigenvectors $V_k \in \mathbb{R}^{d \times k}$ of $C$.
- Project the centered data onto the subspace spanned by the principal components: $Y = X_c V_k \in \mathbb{R}^{n \times k}$
- Optionally, restore the column means to the reduced data for interpretation.
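The steps above can be sketched in NumPy as follows; the function name and variables are illustrative, not the project's actual API:

```python
import numpy as np

def pca(X, k):
    """Project an (n, d) data matrix onto its top-k principal components."""
    mu = X.mean(axis=0)                 # empirical column-wise mean
    Xc = X - mu                         # centered data
    C = (Xc.T @ Xc) / (X.shape[0] - 1)  # empirical covariance, (d, d)
    # eigh is the right choice for symmetric matrices; it returns
    # eigenvalues in ascending order, so reverse to put the top-k first
    eigvals, eigvecs = np.linalg.eigh(C)
    Vk = eigvecs[:, ::-1][:, :k]        # (d, k) principal directions
    Y = Xc @ Vk                         # (n, k) projected data
    return Y, Vk, mu

# Example: reduce synthetic data, then map back and restore the mean
# for interpretation (the optional last step)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y, Vk, mu = pca(X, k=2)
X_approx = Y @ Vk.T + mu                # (n, d) rank-k approximation of X
```

With `k = d` the reconstruction `Y @ Vk.T + mu` recovers `X` exactly (up to floating-point error), since the eigenvectors form an orthonormal basis of $\mathbb{R}^{d}$.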
By performing these steps, the resulting low-dimensional representation retains as much of the data's variance as possible among all $k$-dimensional linear projections.