This project focused on predicting diabetes using Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF) models. After model comparison and hyperparameter tuning, Random Forest achieved the best overall performance on the test set, with the highest recall (67.9%), accuracy (78.8%), and ROC-AUC (0.860). This makes it the most reliable of the three models for identifying diabetic patients.
This project highlights how machine learning can enhance early diabetes detection by minimizing false negatives and improving diagnostic accuracy.
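Recall is the metric that tracks false negatives directly. Let $TP$ be the number of diabetic patients the model correctly flags and $FN$ the number it misses. Then

$$
\text{Recall} = \frac{TP}{TP + FN}
$$

Every missed diabetic patient increases $FN$ and lowers recall. This is why recall is weighed alongside accuracy when choosing a model for screening.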
The task is to use the Pima Indians Diabetes dataset to develop a binary classification model with Logistic Regression that predicts whether a patient is diabetic.
From the task, we can identify that:
- It is a supervised learning task.
- It is a binary classification task.
- The data requires the necessary preprocessing.
- The model must be evaluated using accuracy, precision, recall, and ROC-AUC.
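The task above (binary Logistic Regression scored by accuracy, precision, recall, and ROC-AUC) can be illustrated with a minimal from-scratch sketch. It is not the project's own code. Library implementations such as scikit-learn's `LogisticRegression` and `sklearn.metrics.roc_auc_score` would normally be used instead. All function names and the toy data are hypothetical. Preprocessing of the Pima features is not shown, and the sketch assumes the inputs are already numeric and roughly standardised. Label `1` is assumed to mean "diabetic".

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the linear score is (p - y)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Estimated P(diabetic) for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def predict(X, w, b, threshold=0.5):
    """Hard 0/1 labels; lowering the threshold typically raises recall
    at the cost of precision."""
    return [1 if p >= threshold else 0 for p in predict_proba(X, w, b)]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall from confusion-matrix counts (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outscores a random
    negative (ties count as 1/2); O(P*N) pairs, fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data with one standardised feature, not the Pima dataset
    X_train, y_train = [[-1.5], [-0.5], [0.5], [1.5]], [0, 0, 1, 1]
    X_test, y_test = [[-1.0], [0.2], [1.0], [-0.3]], [0, 1, 1, 1]
    w, b = fit_logistic_regression(X_train, y_train)
    print(classification_metrics(y_test, predict(X_test, w, b)))
    print("ROC-AUC:", roc_auc(y_test, predict_proba(X_test, w, b)))
```

In the toy test set, the diabetic case at `-0.3` falls below the 0.5 threshold and becomes a false negative, which pulls recall below 1. ROC-AUC depends only on how the scores rank positives against negatives, so it stays perfect. This is one reason the two metrics are reported side by side.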
- Loh Chia Heung (Leader), 2301684
- Tong Yu Shan, 2301157
- Low Jia Hao, 2302161