This project implements a Decision Tree classifier to predict the presence of heart disease based on various patient health attributes. The workflow includes exploratory data analysis, model training, decision tree visualization, and evaluation of classification performance.
The objective of this project is to build a machine learning model capable of predicting heart disease using a Decision Tree algorithm. The model helps identify patterns associated with heart disease and provides insights for early risk assessment.
The dataset contains medical attributes related to patients, including:
- Age
- Sex
- Chest pain type
- Resting blood pressure
- Cholesterol level
- Maximum heart rate achieved
- Exercise-induced angina
- ST depression
- Other clinical measurements
- Target variable indicating heart disease presence
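A minimal sketch of the loading and EDA step. It assumes the file is `dataset.csv` from the repository layout and that the label column is named `target`, which is a hypothetical name, so adjust it to the actual header. The helper `explore` is illustrative and is not the notebook's code.

```python
import pandas as pd


def explore(df: pd.DataFrame, target: str = "target") -> dict:
    """Collect quick EDA summaries for a heart disease DataFrame.

    `target` is an assumed name for the heart disease label column.
    """
    return {
        "shape": df.shape,                                      # (rows, columns)
        "missing": df.isna().sum().to_dict(),                   # missing values per column
        "class_balance": df[target].value_counts().to_dict(),   # rows per label value
        "summary": df.describe(),                               # count, mean, std, quartiles per numeric column
    }
```

Usage: `info = explore(pd.read_csv("dataset.csv"))`. Checking `class_balance` early shows whether the classes are skewed, which affects how accuracy should be read later.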
The project uses the following tools and libraries:
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-Learn
- Decision Tree Classifier
- Graphviz
- Pydotplus
The following steps were carried out:
- Loaded and explored the heart disease dataset.
- Performed exploratory data analysis to understand feature distributions.
- Identified input features and target variable.
- Split the dataset into training and testing sets.
- Built a Decision Tree classification model.
- Trained the model using Scikit-Learn.
- Visualized the decision tree structure.
- Evaluated the model on unseen data.
- Interpreted important factors influencing heart disease prediction.
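The split and training steps above can be sketched as follows. This assumes every feature column is already numeric (categorical attributes such as chest pain type stored as integer codes) and that the label column is named `target`. The helper `train_tree`, the 80/20 split and `random_state=42` are illustrative choices, not taken from the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_tree(df: pd.DataFrame, target: str = "target", test_size: float = 0.2,
               max_depth=None, random_state: int = 42):
    """Split features/label, hold out a test set, and fit a decision tree."""
    # Feature and target separation (`target` is an assumed column name).
    X = df.drop(columns=[target])
    y = df[target]
    # Stratified split keeps the share of heart disease cases equal in both sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    # max_depth=None grows the tree until leaves are pure (prone to overfitting).
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=random_state)
    model.fit(X_train, y_train)
    return model, X_train, X_test, y_train, y_test
```

Fixing `random_state` makes both the split and the tree reproducible between notebook runs.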
The notebook is organized into these sections:
- Data Loading
- Exploratory Data Analysis (EDA)
- Data Preprocessing
- Feature and Target Separation
- Train-Test Split
- Decision Tree Model Building
- Model Training
- Tree Visualization
- Model Evaluation
- Interpretation of Results
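For the tree visualization step, scikit-learn's `export_graphviz` can produce DOT source that Graphviz and Pydotplus then render. A sketch, assuming a fitted `DecisionTreeClassifier` and that label `0` means no disease and `1` means disease (an assumption about the dataset's encoding); both helper names are hypothetical.

```python
from sklearn.tree import export_graphviz, export_text


def tree_to_dot(model, feature_names, class_names=("No disease", "Disease")):
    """Return the fitted tree as a Graphviz DOT string."""
    # out_file=None makes export_graphviz return the DOT source instead of writing a file.
    return export_graphviz(
        model,
        out_file=None,
        feature_names=list(feature_names),
        class_names=list(class_names),  # ordered like model.classes_ (assumed 0, 1)
        filled=True,                    # colour nodes by majority class
        rounded=True,
    )


def tree_to_text(model, feature_names):
    """Plain-text rendering of the same decision rules; needs no Graphviz install."""
    return export_text(model, feature_names=list(feature_names))
```

`pydotplus.graph_from_dot_data(dot).write_png("images/tree.png")` would then render the DOT string to an image; this requires the Graphviz binaries, and the output path is only an example.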
The model performance was evaluated using:
- Comparison of performance on the training and testing sets
- Prediction analysis
- Classification metrics
- Decision tree visualization
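A sketch of the evaluation step with standard scikit-learn metrics. The exact metrics the notebook reports are not listed, so accuracy, the confusion matrix and the classification report used here are assumptions, and `evaluate` is a hypothetical helper.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def evaluate(model, X_train, X_test, y_train, y_test):
    """Compare train/test accuracy and summarize test-set errors."""
    y_pred = model.predict(X_test)
    return {
        "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
        "test_accuracy": accuracy_score(y_test, y_pred),
        # Rows are true labels, columns are predicted labels.
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        # Per-class precision, recall and F1 as a formatted string.
        "report": classification_report(y_test, y_pred),
    }
```

A large gap between training and test accuracy is the usual sign that an unpruned tree has overfit; for heart disease screening, recall on the positive class is often worth reading alongside accuracy.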
The repository is organized as follows:

```
decision-tree-classification
│
├── decision_tree_classification.ipynb
├── README.md
├── requirements.txt
├── dataset.csv
└── images/
```
The Decision Tree model learned patterns from the patient health data and produced heart disease predictions for the unseen test set. Visualizing the tree made its decision rules interpretable and showed which attributes contribute to the predictions.
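To rank which attributes drive the predictions, the fitted tree's impurity-based `feature_importances_` can be sorted. A minimal sketch; the helper name is hypothetical.

```python
import pandas as pd


def feature_importance_table(model, feature_names) -> pd.Series:
    """Impurity-based importances of a fitted tree, largest first (they sum to 1)."""
    return (
        pd.Series(model.feature_importances_, index=list(feature_names))
        .sort_values(ascending=False)
    )
```

Impurity-based importances can favour features with many distinct values, so `sklearn.inspection.permutation_importance` on the test set is a common cross-check.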
Possible future improvements:
- Perform hyperparameter tuning to improve model performance.
- Compare Decision Tree with Random Forest and XGBoost models.
- Apply cross-validation techniques.
- Deploy the model using Streamlit or Flask.
- Create interactive visualizations for better analysis.
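The hyperparameter tuning and cross-validation items could be combined with `GridSearchCV`. A sketch under assumed search ranges; the grid values and `tune_tree` are illustrative and not tuned for this dataset.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def tune_tree(X, y, cv_folds: int = 5, random_state: int = 42):
    """Grid-search tree hyperparameters with stratified k-fold cross-validation."""
    # Assumed search space: shallower trees and larger leaves reduce overfitting.
    param_grid = {
        "max_depth": [3, 5, 7, None],
        "min_samples_leaf": [1, 5, 10],
        "criterion": ["gini", "entropy"],
    }
    cv = StratifiedKFold(n_splits=cv_folds, shuffle=True, random_state=random_state)
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=random_state),
        param_grid,
        cv=cv,
        scoring="accuracy",
    )
    search.fit(X, y)
    # best_score_ is the mean cross-validated accuracy of the best combination.
    return search.best_params_, search.best_score_
```

Running the search on the training split only and keeping the test set for a final check avoids leaking test data into model selection.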
Deebesh Sundar
Machine Learning & Data Science | NLP Practitioner
GitHub: https://github.com/DeebeshS-ML