A comprehensive comparison of linear (PCA) and non-linear (Autoencoder) dimensionality reduction techniques applied to clothing image recognition and clustering using the Fashion-MNIST dataset.
This project evaluates two dimensionality reduction approaches on Fashion-MNIST clothing images (28×28 pixels, 784 features) across 10 categories. The reduced-dimension representations are assessed through classification (k-NN) and clustering (k-means, GMM) tasks to determine which method better preserves meaningful information in the latent space.
Dataset:
- Source: Fashion-MNIST dataset
- Images: 28×28 grayscale clothing images (784 features)
- Categories: 10 clothing types (T-shirt, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle Boot)
- Sample size: 150 images per category (1,500 total)
- Split: 80% training (1,200), 20% test (300) with stratified sampling
Preprocessing:
- Normalization: Min-max scaling from [0, 255] to [0, 1] pixel values
- Stratified split: Maintains class balance across train/test sets
- Validation: 20% of training data (240 images) used for autoencoder validation
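The preprocessing pipeline above can be sketched with scikit-learn. Random pixels stand in for the actual Fashion-MNIST images here so the snippet is self-contained; the split sizes and random seed are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the Fashion-MNIST pixels (1,500 images, 150 per category)
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(1500, 784)).astype("float32")
y = np.repeat(np.arange(10), 150)

# Min-max scaling from [0, 255] to [0, 1]
X = X / 255.0

# Stratified 80/20 split preserves the 10-way class balance in both sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Hold out 20% of the training set (240 images) for autoencoder validation
X_fit, X_val, y_fit, y_val = train_test_split(
    X_tr, y_tr, test_size=0.2, stratify=y_tr, random_state=42)
```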
PCA:
- Type: Linear transformation
- Dimensions tested: 50, 25, 10
- Best performance: 50D with MAE = 0.067
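A minimal sketch of the PCA evaluation loop, measuring reconstruction MAE at each tested dimension. Random arrays stand in for the scaled pixel data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data in place of the scaled Fashion-MNIST pixels
rng = np.random.default_rng(0)
X_tr = rng.random((1200, 784)).astype("float32")
X_te = rng.random((300, 784)).astype("float32")

mae = {}
for d in (50, 25, 10):
    pca = PCA(n_components=d).fit(X_tr)
    Z = pca.transform(X_te)             # 784 -> d latent codes
    X_hat = pca.inverse_transform(Z)    # d -> 784 reconstruction
    mae[d] = float(np.mean(np.abs(X_te - X_hat)))
```

Because the principal subspaces are nested, reconstruction error can only improve as more components are kept, which matches the 50D > 25D > 10D ranking reported above.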
Autoencoder:
- Architecture:
- Encoder: ReLU activation (hidden layers), Linear (output)
- Decoder: ReLU activation (hidden layers), Sigmoid (output)
- Dimensions tested: 50, 25, 10
- Performance: Consistent MAE ≈ 0.073-0.075 across all dimensions
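The architecture can be sketched in Keras as below. The report only specifies the activations, so the 128-unit hidden layers, single-hidden-layer depth, Adam optimizer, and MAE loss are assumptions; random pixels stand in for the data:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(latent_dim, input_dim=784):
    # Encoder: ReLU hidden layer, linear bottleneck (hidden width assumed)
    encoder = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(latent_dim, activation=None),
    ])
    # Decoder: ReLU hidden layer, sigmoid output to match [0, 1] pixels
    decoder = keras.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(input_dim, activation="sigmoid"),
    ])
    ae = keras.Sequential([encoder, decoder])
    ae.compile(optimizer="adam", loss="mae")  # MAE as in the results table
    return ae, encoder, decoder

# Quick smoke run on random stand-in pixels
X = np.random.rand(64, 784).astype("float32")
ae, encoder, decoder = build_autoencoder(10)
ae.fit(X, X, epochs=1, batch_size=32, verbose=0)
Z = encoder.predict(X, verbose=0)       # 784 -> 10 latent codes
X_hat = ae.predict(X, verbose=0)        # full reconstruction
```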
Reconstruction error (test-set MAE) by latent dimension:
| Method | 10D | 25D | 50D |
|---|---|---|---|
| Autoencoder | 0.075 | 0.073 | 0.073 |
| PCA | 0.100 | 0.082 | 0.067 |
Key Findings:
- PCA reconstruction quality improves significantly with higher dimensions
- Autoencoder performance remains stable regardless of latent dimension
- Autoencoders effectively encode information even with only 10 features
Leave-one-out cross-validation with k=1 to 9 neighbors:
| Method | Mean Accuracy | Std | Best Configuration |
|---|---|---|---|
| Original (784D) | 0.740 | ±0.020 | k=4: 0.760 |
| PCA-50 | 0.757 | ±0.011 | k=3: 0.773 |
| PCA-25 | 0.754 | ±0.013 | k=3: 0.770 |
| PCA-10 | 0.717 | ±0.009 | k=9: 0.730 |
| AE-50 | 0.718 | ±0.023 | k=7: 0.740 |
| AE-25 | 0.710 | ±0.028 | k=7: 0.730 |
| AE-10 | 0.713 | ±0.023 | k=7: 0.750 |
Best Overall Performance: PCA-50 with k=3 achieving 77.3% accuracy
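The leave-one-out protocol above can be sketched as follows; a small synthetic feature matrix stands in for the PCA-50 codes, so the accuracies themselves are not meaningful:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in features in place of the PCA-50 codes
rng = np.random.default_rng(0)
Z = rng.random((100, 50))
y = rng.integers(0, 10, size=100)

acc = {}
for k in range(1, 10):                   # k = 1..9 neighbors
    knn = KNeighborsClassifier(n_neighbors=k)
    # LeaveOneOut: each sample is scored by a model fit on all the others
    acc[k] = float(cross_val_score(knn, Z, y, cv=LeaveOneOut()).mean())

best_k = max(acc, key=acc.get)
# Mean/std across k, as reported in the table above
mean_acc, std_acc = np.mean(list(acc.values())), np.std(list(acc.values()))
```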
Using PCA-50 features with k=10 clusters:
| Method | NMI Score |
|---|---|
| k-means | 0.5235 |
| GMM | 0.5655 |
Clustering Insights:
- Trousers: Best separated category (100% precision in classification)
- Challenging pairs: Sneaker-Sandal, Coat-Pullover-Shirt show high visual similarity
- t-SNE visualization confirms these similarity patterns
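A sketch of the clustering comparison, again with synthetic stand-in features in place of the PCA-50 codes (so the NMI values here are not the ones reported above):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score
from sklearn.mixture import GaussianMixture

# Stand-in features in place of the PCA-50 codes, with known labels
rng = np.random.default_rng(0)
Z = rng.random((300, 50))
y = rng.integers(0, 10, size=300)

km_labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z)
gmm_labels = GaussianMixture(n_components=10, random_state=0).fit(Z).predict(Z)

# NMI compares cluster assignments against the true categories
nmi_km = normalized_mutual_info_score(y, km_labels)
nmi_gmm = normalized_mutual_info_score(y, gmm_labels)
```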
Two generative approaches were tested in the 10D autoencoder latent space:
- Per-class Gaussian: fit one Gaussian per category (mean and covariance of that category's latent codes), then sample new codes from it
- Gaussian Mixture Model: fit a single 10-component GMM over the latent codes of all categories and sample from the mixture
Both methods successfully generate realistic clothing images from latent representations.
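Both sampling strategies can be sketched as below. Random codes stand in for the trained encoder's output, and the category index 3 is an arbitrary example:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in 10D latent codes in place of the trained encoder's output
rng = np.random.default_rng(0)
Z = rng.random((500, 10))
y = rng.integers(0, 10, size=500)

# Approach 1: per-class Gaussian -- mean/covariance of one category's codes
cls_codes = Z[y == 3]
mu = cls_codes.mean(axis=0)
cov = np.cov(cls_codes, rowvar=False)
z_per_class = rng.multivariate_normal(mu, cov, size=5)

# Approach 2: a single 10-component GMM over all categories' codes
gmm = GaussianMixture(n_components=10, random_state=0).fit(Z)
z_gmm, _ = gmm.sample(5)

# Either set of sampled codes would then be passed through the trained
# decoder to produce 28x28 images; the decoder step is omitted here.
```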
Conclusions:
- PCA superiority for classification: Despite autoencoders' better low-dimensional reconstruction, PCA features achieve superior classification accuracy
- Dimension efficiency: Autoencoders maintain consistent performance across dimensions, while PCA benefits from higher dimensions
- Feature interpretability: PCA's linear combinations preserve class-discriminative information better than autoencoder's non-linear features
- Reconstruction vs. classification trade-off: Better reconstruction doesn't guarantee better downstream task performance
Recommendations:
- For classification tasks: PCA with sufficient dimensions (≥25) recommended
- For reconstruction/generation: Autoencoders effective even with minimal dimensions
- For clustering: GMM outperforms k-means on reduced representations
- Computational efficiency: Autoencoders more parameter-efficient for very low dimensions
Requirements:
- Python 3.7+
- TensorFlow/Keras
- scikit-learn
- numpy
- matplotlib
- seaborn
Author: Konstantinos Vardakas
This project demonstrates the trade-offs between linear and non-linear dimensionality reduction techniques and their impact on downstream machine learning tasks.