Stroke-Risk-Classification

This project is my first hands-on exploration of machine learning. It uses stroke-risk prediction as a case study for evaluating and comparing how well classification models perform. Following IBM's Data Science Methodology, I implemented a complete Python workflow: data preprocessing, feature engineering, class-imbalance correction, model training, and evaluation.

Key steps included undersampling the majority class to address the class imbalance, extracting features (age, glucose levels), and scaling them with StandardScaler to prepare the data for model development. I evaluated three classification algorithms (K-Nearest Neighbors (KNN), Decision Trees, and Random Forests) using both hold-out validation and 10-fold cross-validation for a more reliable performance assessment.
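
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the data is synthetic, the `undersample_majority` helper is hypothetical, the hyperparameters (`n_neighbors=5`, `max_depth=4`, `n_estimators=100`) are arbitrary choices, and it assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def undersample_majority(X, y, seed=0):
    """Hypothetical helper: randomly drop rows until all classes are the same size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]


# Synthetic stand-in data (NOT the project's dataset): two features playing
# the role of age and glucose level, with roughly 5% positive cases.
rng = np.random.default_rng(42)
n = 2000
y = (rng.random(n) < 0.05).astype(int)
age = rng.normal(45, 15, n) + 20 * y
glucose = rng.normal(100, 25, n) + 30 * y
X = np.column_stack([age, glucose])

X_bal, y_bal = undersample_majority(X, y)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 10-fold cross-validation that preserves the class ratio in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_means = {}
for name, model in models.items():
    # Scaling lives inside the pipeline so it is refit on each training fold.
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipeline, X_bal, y_bal, cv=cv, scoring="accuracy")
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Placing `StandardScaler` inside the pipeline keeps the scaling statistics from leaking between training and validation folds. Whether the project's scripts are structured this way is not stated here.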

The KNN model achieved the best results, with a mean accuracy of 73% and improved F1-score, precision, and recall, gains driven by both preprocessing and feature selection. I paid special attention to reducing false negatives, because missing an at-risk patient is a critical error in medical decision support. I also generated Decision Tree visualizations to make the model easier to explain. This project gave me foundational experience in building machine learning pipelines for real-world health applications, with an emphasis on model transparency, performance trade-offs, and rigorous evaluation.
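
As a rough illustration of the evaluation described above, the sketch below counts false negatives on a hold-out split, reports precision, recall, and F1, and prints a fitted decision tree as text rules. It is an assumption-laden example rather than this project's code: the data is synthetic, the feature name `avg_glucose_level` is hypothetical, and `export_text` stands in for whichever visualization method the project actually used.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic, already balanced stand-in data (NOT the project's dataset).
rng = np.random.default_rng(0)
n = 400
y = np.repeat([0, 1], n // 2)
age = rng.normal(45, 15, n) + 20 * y
glucose = rng.normal(100, 25, n) + 30 * y
X = np.column_stack([age, glucose])

# Hold-out validation: keep 25% of the rows aside for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
y_pred = tree.predict(X_test)

# For binary labels {0, 1}, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
recall = recall_score(y_test, y_pred)
print(f"false negatives: {fn} of {fn + tp} actual positives")
print(
    f"precision {precision_score(y_test, y_pred):.2f}, "
    f"recall {recall:.2f}, F1 {f1_score(y_test, y_pred):.2f}"
)

# Text rendering of the learned splits (feature names are illustrative).
rules = export_text(tree, feature_names=["age", "avg_glucose_level"])
print(rules)
```

Recall equals `tp / (tp + fn)`, so every false negative removed raises it directly. That makes recall the natural metric to watch when missed positive cases matter most.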
