JHU-CDHAI/pima

Diabetes Prediction with the Pima Indians Dataset

This repository contains teaching materials and code examples for demonstrating how to conduct exploratory data analysis (EDA), data preprocessing, and machine learning model training on the classic Pima Indians Diabetes Dataset.

📚 Objective

The goal is to predict the onset of diabetes based on diagnostic measurements. The target variable is Outcome (0 or 1), and features include:

  • Pregnancies
  • Glucose
  • BloodPressure
  • SkinThickness
  • Insulin
  • BMI
  • DiabetesPedigreeFunction
  • Age

πŸ› οΈ Techniques Covered

  • Exploratory Data Analysis (EDA) using pandas and seaborn
  • Data cleaning and imputation (handling 0s in medical data)
  • Feature scaling
  • Logistic Regression, Decision Trees, Random Forest
  • Model evaluation using accuracy, confusion matrix, ROC-AUC
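Two of the steps above, handling 0s and evaluating predictions, can be sketched in a few lines. This is a minimal illustration in plain Python rather than the pandas and scikit-learn code used in the teaching materials, so the logic stays visible. It assumes that a 0 in Glucose, BloodPressure, SkinThickness, Insulin, or BMI marks a missing measurement, and that the median of the non-zero values is an acceptable fill. The function names and the toy rows are hypothetical and are not taken from the repository or the dataset.

```python
from statistics import median

# Assumption: a 0 in these columns is physiologically implausible and is
# treated as "missing". A 0 in Pregnancies is a valid value and is left alone.
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def impute_zeros_with_median(rows, columns=ZERO_AS_MISSING):
    """Return copies of `rows` with 0s in `columns` replaced by the
    median of the non-zero values in the same column."""
    imputed = [dict(row) for row in rows]  # do not mutate the input
    for col in columns:
        observed = [row[col] for row in rows if row[col] != 0]
        if not observed:
            continue  # no non-zero values to estimate a median from
        fill = median(observed)
        for row in imputed:
            if row[col] == 0:
                row[col] = fill
    return imputed


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / len(y_true)


# Made-up rows for illustration only; these are not records from the dataset.
toy_rows = [
    {"Pregnancies": 0, "Glucose": 100, "BMI": 30.0},
    {"Pregnancies": 2, "Glucose": 0, "BMI": 0},
    {"Pregnancies": 1, "Glucose": 140, "BMI": 25.0},
]
cleaned = impute_zeros_with_median(toy_rows, columns=("Glucose", "BMI"))
```

In a real train/test workflow the fill value would normally be computed on the training split only and then reused for the test split, so that no information from the test rows leaks into preprocessing.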

πŸ—‚οΈ Dataset License

This dataset is released into the public domain under the CC0 license. No attribution is required.

📎 Citation

Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261–265). IEEE Computer Society Press.

🔗 Credits

The dataset was provided by the National Institute of Diabetes and Digestive and Kidney Diseases via the UCI Machine Learning Repository and is hosted on Kaggle.
