This repository contains the Jupyter notebooks used for my Master's thesis project at the University of Lisbon, titled "Identifying subtypes of Mild Cognitive Impairment due to Alzheimer’s disease in cerebrospinal fluid proteomics: a Machine Learning approach".

The work focused on applying supervised and unsupervised Machine Learning methods to omics data for the identification of Alzheimer’s disease subtypes. The implemented pipelines include steps for:
- Data preprocessing and cleaning
- Feature ranking and selection using multiple strategies:
  - Univariate Selection
  - Single Score Methods
  - Attribute Weights from ML models
  - Decrease of Accuracy
  - Recursive Feature Addition (RFA)
  - Recursive Feature Elimination (RFE)
- Ensemble feature ranking, which aggregates multiple selection results to build a more robust and stable feature set
- Supervised modeling and evaluation of classification algorithms such as:
  - Support Vector Machines (SVM)
  - Naïve Bayes
  - k-Nearest Neighbors (kNN)
  - Decision Trees
  - Logistic Regression
  - Random Forests
  - XGBoost
- Unsupervised clustering of patients using Non-negative Matrix Factorization (NMF) to identify molecular subtypes
- Visualization and interpretation of model outputs to extract biologically meaningful insights

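
To give a feel for how the feature-ranking, ensemble-ranking and NMF steps listed above fit together, here is a minimal sketch using scikit-learn. It is not the code from the notebooks: the data is synthetic, the helper name `ensemble_ranking` is hypothetical, averaging ranks across methods is only one possible aggregation scheme (an assumption, not necessarily the one used in the thesis), and the choice of 10 features and 3 NMF components is arbitrary.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.datasets import make_classification
from sklearn.decomposition import NMF
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for a samples x proteins matrix (NOT the thesis data).
X, y = make_classification(
    n_samples=120, n_features=30, n_informative=5, random_state=0
)
# Scale to [0, 1]; NMF requires non-negative input.
X = MinMaxScaler().fit_transform(X)


def ensemble_ranking(X, y):
    """Hypothetical helper: average per-method ranks (1 = best feature)."""
    # Univariate selection: ANOVA F-score for each feature.
    f_scores, _ = f_classif(X, y)
    # Attribute weights from an ML model: random forest importances.
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    # RFE: with n_features_to_select=1, ranking_ orders all features (1 = kept last).
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=1).fit(X, y)
    ranks = np.vstack([
        rankdata(-f_scores),                     # higher score -> lower (better) rank
        rankdata(-forest.feature_importances_),  # higher importance -> better rank
        rfe.ranking_,
    ])
    # Assumed aggregation: simple mean of the ranks across methods.
    return ranks.mean(axis=0)


mean_rank = ensemble_ranking(X, y)
top_k = np.argsort(mean_rank)[:10]  # indices of the 10 best-ranked features

# NMF factorizes X ~ W @ H; each sample is assigned to the component
# with the largest weight in its row of W.
nmf = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=1000)
W = nmf.fit_transform(X[:, top_k])
subtype = W.argmax(axis=1)

print("Selected feature indices:", top_k)
print("Samples per cluster:", np.bincount(subtype, minlength=3))
```

In practice the number of NMF components would be chosen with a stability or reconstruction-error criterion rather than fixed up front, and the ranking would be computed inside cross-validation to avoid selection bias.
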
The findings from this project are published in: