Multi-Vertex Pattern and Network Analysis (MVPNA) is a two-step machine learning framework that jointly identifies vertex clusters and interactions between vertex clusters in vertex-based cortical thickness data from sMRI (transformed to a FreeSurfer atlas).
Step 1 is a fused sparse group lasso (FSGL) regression model that performs binary classification between two experimental paradigms (e.g., Disease and Control) on vertex-based cortical thickness data.
FSGL uses the adjacency structure of each subject's vertex data (fused lasso penalty) and the predefined Desikan-Killiany anatomical regions of interest (ROIs) (group lasso penalty) to learn a clusterable coefficient map through supervised classification of subjects into their experimental paradigm.
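To make the two structural penalties concrete, here is a minimal sketch of how an FSGL penalty could be evaluated for one coefficient vector. It is an illustration, not the repository's implementation: the function name `fsgl_penalty`, the three weights, the square-root group-size scaling, and the separate L1 term are all assumptions.

```python
import numpy as np

def fsgl_penalty(beta, edges, groups, lam_l1, lam_group, lam_fused):
    """Evaluate a fused sparse group lasso penalty (hypothetical helper).

    beta   : (n_vertices,) coefficient vector
    edges  : (n_edges, 2) index pairs of adjacent vertices on the mesh
    groups : (n_vertices,) integer ROI label for each vertex
    """
    beta = np.asarray(beta, dtype=float)
    edges = np.asarray(edges, dtype=int)
    groups = np.asarray(groups)

    # Sparsity term: sum of absolute coefficients.
    l1 = np.abs(beta).sum()
    # Group term: L2 norm per ROI, scaled by sqrt(ROI size) (assumed scaling).
    group = sum(
        np.sqrt((groups == g).sum()) * np.linalg.norm(beta[groups == g])
        for g in np.unique(groups)
    )
    # Fused term: absolute difference between coefficients of adjacent vertices.
    fused = np.abs(beta[edges[:, 0]] - beta[edges[:, 1]]).sum()
    return lam_l1 * l1 + lam_group * group + lam_fused * fused

# Toy example: 4 vertices in a chain, split into two ROIs.
beta = np.array([1.0, 1.0, 0.0, -2.0])
edges = np.array([[0, 1], [1, 2], [2, 3]])
groups = np.array([0, 0, 1, 1])
penalty = fsgl_penalty(beta, edges, groups, 1.0, 1.0, 1.0)
```

The fused term rewards neighbouring vertices for sharing similar coefficients, and the group term encourages whole ROIs to drop out together, which is what makes the resulting map clusterable.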
Step 2 is an elastic net model. It z-score thresholds the Step 1 coefficient map to obtain vertex clusters with high coefficient magnitudes, then uses the cluster mean thicknesses and their interactions to learn an interpretable solution for binary classification. This solution can be interpreted to understand which vertex clusters and interactions between vertex clusters are associated with thickness changes in the paradigm being studied.
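The thresholding and feature construction in Step 2 can be sketched roughly as follows. This assumes that a cluster is a connected set of supra-threshold vertices on the mesh and that an interaction is the product of two cluster means; both are assumptions for illustration, and `threshold_clusters` and `cluster_features` are hypothetical names.

```python
import numpy as np
from collections import deque
from itertools import combinations

def threshold_clusters(coef, edges, z_thresh):
    """Label connected vertices whose |z-scored coefficient| exceeds z_thresh.

    Returns one integer label per vertex; -1 marks vertices in no cluster.
    Assumes the coefficients are not all identical (non-zero std).
    """
    coef = np.asarray(coef, dtype=float)
    z = (coef - coef.mean()) / coef.std()
    keep = np.abs(z) > z_thresh
    # Keep only edges whose two endpoints both pass the threshold.
    neighbours = [[] for _ in range(coef.shape[0])]
    for i, j in edges:
        if keep[i] and keep[j]:
            neighbours[i].append(j)
            neighbours[j].append(i)
    labels = np.full(coef.shape[0], -1)
    current = 0
    for start in np.flatnonzero(keep):
        if labels[start] != -1:
            continue
        # Breadth-first search over one connected component.
        labels[start] = current
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in neighbours[v]:
                if labels[w] == -1:
                    labels[w] = current
                    queue.append(w)
        current += 1
    return labels

def cluster_features(thickness, labels):
    """Per-subject cluster mean thicknesses followed by pairwise products."""
    n_clusters = int(labels.max()) + 1
    if n_clusters == 0:
        return np.empty((thickness.shape[0], 0))
    means = np.column_stack(
        [thickness[:, labels == c].mean(axis=1) for c in range(n_clusters)]
    )
    inter = [means[:, a] * means[:, b] for a, b in combinations(range(n_clusters), 2)]
    return np.column_stack([means] + inter) if inter else means

# Toy example: 10 vertices in a chain, 6 subjects.
coef = np.array([0, 0, 5, 5, 0, 0, 0, -5, 0, 0], dtype=float)
edges = [(i, i + 1) for i in range(9)]
labels = threshold_clusters(coef, edges, z_thresh=1.0)
thickness = np.random.default_rng(0).normal(2.5, 0.3, size=(6, 10))
features = cluster_features(thickness, labels)
```

In the toy example, vertices 2 and 3 form one cluster and vertex 7 forms another, so each subject gets two cluster means and one interaction term.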
- vertex_data_processing.ipynb
- optuna_search_FSGL.py
- S1
- S2
Input data for data processing should be represented as a pandas DataFrame with the following columns, in order: the groups for classification, age, sex, site, the right hemisphere vertices, and the left hemisphere vertices. Site is a placeholder and is not used. If the input data comes from multiple sites, it is recommended to check for site effects and harmonize the vertex data to remove them. Neither smoothing nor any other spatial operation should be applied to the vertices.
The medial wall vertices should be set to zero. They are masked in the models so that their coefficients are zero and no vertex clusters are defined within the medial wall of the cortex.
The Step 1 FSGL model requires each subject's feature vector to be in the following order: right hemisphere vertices, left hemisphere vertices, age, and sex.
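As a rough illustration of moving from the input column order to the FSGL feature order, the sketch below builds a tiny DataFrame and reorders it by position. The column names (`group`, `rh_0`, and so on) and the toy hemisphere sizes are hypothetical; only the column order comes from the description above.

```python
import numpy as np
import pandas as pd

# Toy input in the documented order: group, age, sex, site, RH vertices, LH vertices.
n_rh, n_lh = 3, 3  # toy sizes; real hemispheres have far more vertices
rh_cols = [f"rh_{i}" for i in range(n_rh)]
lh_cols = [f"lh_{i}" for i in range(n_lh)]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(2.5, 0.3, size=(4, n_rh + n_lh)), columns=rh_cols + lh_cols)
df.insert(0, "site", 0)  # placeholder, not used
df.insert(0, "sex", [0, 1, 0, 1])
df.insert(0, "age", [34.0, 51.0, 47.0, 29.0])
df.insert(0, "group", [0, 1, 0, 1])

# Select by position so the code does not depend on the column names.
y = df.iloc[:, 0].to_numpy()   # classification labels
covariates = df.iloc[:, [1, 2]]  # age, sex (site at position 3 is skipped)
vertices = df.iloc[:, 4:]        # RH vertices followed by LH vertices

# FSGL order: RH vertices, LH vertices, age, sex.
X = pd.concat([vertices, covariates], axis=1)
```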
Data should be standardized and split into equal-sized folds for model training.
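A minimal sketch of these two preparation steps is shown below. It assumes standardization means z-scoring each feature across subjects and that folds are balanced by group; the helper names are hypothetical. Columns with zero variance, such as medial wall vertices set to zero, are left at zero instead of being divided by zero.

```python
import numpy as np

def standardize(X):
    """Z-score each column; constant columns (e.g. medial wall) stay at zero."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero
    return (X - mean) / safe_std

def stratified_folds(y, n_folds, seed=0):
    """Assign each subject a fold index, keeping group proportions per fold."""
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        fold_of[idx] = np.arange(len(idx)) % n_folds
    return fold_of

# Toy example: 20 subjects, 4 features, column 1 plays the role of a medial wall vertex.
rng = np.random.default_rng(1)
X = rng.normal(2.5, 0.3, size=(20, 4))
X[:, 1] = 0.0
y = np.repeat([0, 1], 10)

Xs = standardize(X)
fold_of = stratified_folds(y, n_folds=5)
```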
The current configuration uses a 5x2-fold cross validation (CV) uniform search to optimize the FSGL hyperparameters (HPs) and learn a clusterable best solution for binary classification of groups.
After HP search, a final FSGL model is fitted to all 5 folds of data, and the coefficient map is used in Step 2.
In Step 2, the cluster z-score threshold is a hyperparameter applied to the Step 1 (S1) coefficients to define clusters. For each z-score threshold, an elastic net regression is trained on binary classification of the groups using the vertex cluster mean thicknesses and mean thickness interactions, with a 5-fold CV grid search to optimize the elastic net hyperparameters.
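For a single z-score threshold, the grid search could look roughly like the sketch below. It assumes scikit-learn's `LogisticRegression` with an elastic net penalty stands in for the elastic net classifier, and the grid values and synthetic features are made up for illustration. In practice this would be repeated for each candidate threshold and the threshold with the best cross-validated AUC would be kept.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the cluster means and interactions at one threshold.
rng = np.random.default_rng(0)
n_subjects = 60
y = np.repeat([0, 1], n_subjects // 2)
features = rng.normal(size=(n_subjects, 3))
features[:, 0] += 1.5 * y  # make one synthetic feature informative

# Elastic net penalized logistic regression (assumed model form).
model = LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000)
grid = {"C": [0.1, 1.0, 10.0], "l1_ratio": [0.2, 0.5, 0.8]}  # illustrative values

search = GridSearchCV(
    model,
    grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(features, y)

best_auc = search.best_score_         # mean AUC across the 5 folds
best_model = search.best_estimator_   # refit on all data by default
```

Because `GridSearchCV` refits the best configuration on the full data by default, `best_model.coef_` holds the coefficients that would be inspected for interpretation.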
The Step 2 model with the best AUC is fit on all of the data, and the resulting coefficients can be interpreted to understand which vertex clusters and vertex cluster interactions are associated with thickness changes in the groups.