This analysis investigated EEG frequency patterns distinguishing Alzheimer's Disease (AD) patients from age-matched healthy controls, with attention to methodological approaches and their implications.
After analyzing all available frequency bands and regions, I employed mutual information (MI) analysis to systematically identify the most discriminative features. Starting with over 50 potential frequency-region combinations, I selected the features that provided the strongest and most complementary diagnostic signals. MI scores range from 0 to 1, where 0 indicates no mutual information (independence) and 1 indicates perfect mutual information (dependence).
| Feature Band | Frequency Range | Region | MI Score |
|---|---|---|---|
| Theta (slow range) | 3-5 Hz | Global | 0.112 |
| Delta | 1-3 Hz | Temporal | 0.104 |
| Alpha (fast range) | 10-13 Hz | Central | 0.099 |
| Beta (slow range) | 13-20 Hz | Parietal | 0.082 |
| Beta (fast range) | 21-30 Hz | Global | 0.037 |
The MI scores obtained reveal a clear hierarchical pattern of frequency band importance in AD diagnosis. The slow-wave frequencies show the strongest associations, with Theta (MI = 0.112) and Delta (MI = 0.104) bands together accounting for over 56% of the relative importance. These findings align with previous literature showing increased slow-wave activity in AD. The analysis also reveals region-specific importance, particularly in temporal and central areas for Delta and Alpha bands respectively.
The relatively close MI scores among the top features (0.082-0.112) suggest that AD detection benefits from considering multiple frequency bands rather than relying on a single discriminative feature.
MI was chosen for its ability to:
- Capture non-linear relationships between frequency patterns and AD diagnosis
- Identify subtle alterations in frequency distributions characteristic of AD
- Provide interpretable importance measures that are robust to scaling and transformations
- Quantify both linear and non-linear dependencies without assuming a particular distribution
Using Optuna, I evaluated multiple classifier types:
- XGBoost
- CatBoost
- Gradient Boosting
- Random Forest
- Support Vector Machines
CatBoost emerged as the best performing model.
| Parameter | Value |
|---|---|
| Iterations | 415 |
| Learning Rate | 1.76e-4 |
| Depth | 10 |
| L2 Leaf Regularization | 4.75e-8 |
I explored two approaches, each with different implications for clinical application:
- Conservative Approach (Downsampling)
- Sensitivity: 77% (ability to correctly identify AD patients)
- Specificity: 81% (ability to correctly identify healthy controls)
- Oversampling Approach
- Sensitivity: 82% (ability to correctly identify AD patients)
- Specificity: 85% (ability to correctly identify healthy controls)
My exploration revealed a fundamental trade-off in detection approaches that has potential clinical implications:
- High-Detection Priority (Oversampling)
- Achieves higher AD detection rate (82% sensitivity). Done by oversampling AD cases to match larger amount of healthy patients in the dataset.
- This prioritizes catching potential AD cases, but at the potential cost of more false positives.
- Best suited for screening contexts where follow-up testing is readily available
- Reliability Priority (Conservative Approach)
- Shows slightly lower detection rate (77% sensitivity) but bases all training and predictions on real AD patient data
- Prioritizes prediction confidence over catch-all detection, potentially missing some early cases but providing more reliable positive predictions
- Better suited for contexts where false positives are costly or could cause stress.
While oversampling achieved the highest performance metrics, I recommend the more conservative downsampling approach for clinical implementation. This choice prioritizes reliability and generalizability over raw performance metrics.