UCCD1033_Group2

This project focused on predicting diabetes using Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF) models. Through model comparison and hyperparameter tuning, Random Forest achieved the best overall performance with the highest recall (67.9%), accuracy (78.8%), and ROC-AUC (86.0%) on the test set, making it the most reliable for identifying diabetic patients.
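The comparison and tuning step described above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn, and the helper names `candidate_searches` and `tune_and_compare`, the parameter grids, and the choice of recall as the tuning metric are all hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def candidate_searches(seed=42):
    """Return {name: (estimator, param_grid)}; the grids are illustrative assumptions."""
    return {
        # LR and SVM are scale-sensitive, so they are wrapped with a scaler.
        "LR": (
            make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
            {"logisticregression__C": [0.1, 1.0, 10.0]},
        ),
        "SVM": (
            make_pipeline(StandardScaler(), SVC(random_state=seed)),
            {"svc__C": [0.1, 1.0, 10.0]},
        ),
        # Tree ensembles do not need feature scaling.
        "RF": (
            RandomForestClassifier(random_state=seed),
            {"n_estimators": [50, 100], "max_depth": [None, 5]},
        ),
    }


def tune_and_compare(X_train, y_train, cv=5, seed=42):
    """Grid-search each candidate on cross-validated recall and collect the best result."""
    results = {}
    for name, (estimator, grid) in candidate_searches(seed).items():
        search = GridSearchCV(estimator, grid, scoring="recall", cv=cv)
        search.fit(X_train, y_train)
        results[name] = (search.best_score_, search.best_params_)
    return results
```

Calling `tune_and_compare(X_train, y_train)` returns the best cross-validated recall and parameters per model; the final comparison would still be made on a held-out test set.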

This project highlights how machine learning can enhance early diabetes detection by minimizing false negatives and improving diagnostic accuracy.
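To make the link between false negatives and recall concrete, here is a small sketch that computes the metrics from confusion-matrix counts. The function name `classification_metrics` and the counts in the example are hypothetical and are not results from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall falls as false negatives (missed diabetic patients) rise.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=130)
print(metrics)
```

With these example counts, accuracy is 0.85 while recall is only about 0.67, which shows why accuracy alone can hide missed positive cases.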

Task:

Using the Pima Indians Diabetes dataset, develop a binary classification model using Logistic Regression to predict whether a patient is diabetic.

From the task, we can identify that:

It is a supervised learning task.
It is a binary classification task.

Guideline:

Perform necessary data preprocessing.
Evaluate the model using accuracy, precision, recall and ROC-AUC.
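The two guideline steps could look roughly like the sketch below. It assumes scikit-learn and pandas and uses the column names from the Kaggle `diabetes.csv`. Treating zeros in certain clinical columns as missing values, median imputation, and the 80/20 stratified split are assumptions made for illustration, not necessarily what the notebook does. The function name `train_and_evaluate` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a value of 0 in these columns is implausible and means "missing".
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def train_and_evaluate(df, target="Outcome", seed=42):
    """Preprocess, fit Logistic Regression, and return test-set metrics."""
    df = df.copy()
    df[ZERO_AS_MISSING] = df[ZERO_AS_MISSING].replace(0, np.nan)

    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Imputation and scaling are fitted on the training split only.
    model = make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, y_prob),
    }
```

A typical call would be `train_and_evaluate(pd.read_csv("diabetes.csv"))`, which returns a dictionary with the four metrics named in the guideline.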

Dataset:

https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database/

Members:

Loh Chia Heung (Leader), 2301684
Tong Yu Shan, 2301157
Low Jia Hao, 2302161
