KaranGupta2005/Breast_Cancer_ML

Breast_Cancer_ML

🧠 Breast Cancer Detection Using ML (Logistic Regression, KNN, SVM)

This project uses machine learning algorithms to classify tumors as Malignant (M) or Benign (B) using the Breast Cancer Wisconsin dataset. The steps include data preprocessing, exploratory data analysis (EDA), feature selection, scaling, and model building with evaluation.

📂 Dataset

Source: breast_cancer.csv

Total entries: 569

Diagnosis labels:

M = Malignant (encoded as 1)

B = Benign (encoded as 0)

πŸ” Workflow Overview

  1. Data Preprocessing

Removed the unnecessary id column.

Encoded target variable: M → 1, B → 0.

Checked for missing values using missingno.

Standardized features (zero mean, unit variance) using StandardScaler.
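The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the tiny inline DataFrame stands in for breast_cancer.csv, and the column names `diagnosis`, `radius_mean`, and `texture_mean` are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny stand-in for breast_cancer.csv (the real file has 569 rows).
# Column names other than "id" are assumed for illustration.
df = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "diagnosis": ["M", "B", "B", "M"],
    "radius_mean": [17.99, 13.54, 12.45, 20.57],
    "texture_mean": [10.38, 14.36, 15.70, 17.77],
})

# Drop the identifier column, which carries no predictive signal.
df = df.drop(columns=["id"])

# Encode the target: M -> 1, B -> 0.
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})

# Count missing values per column (missingno visualizes the same information).
missing_per_column = df.isnull().sum()

# Separate features from the target and standardize the features.
X = df.drop(columns=["diagnosis"])
y = df["diagnosis"]
X_scaled = StandardScaler().fit_transform(X)
```

When a train/test split is involved, fitting the scaler on the training portion only and reusing it on the test portion avoids leaking test statistics into training.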

  2. Exploratory Data Analysis

Visualized class distribution.

Plotted distribution of all features.

Used a heatmap to observe correlation among features.
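The numbers behind the class distribution plot and the correlation heatmap can be computed as in the sketch below. It assumes scikit-learn's bundled copy of the Wisconsin diagnostic dataset as a stand-in for breast_cancer.csv. That copy encodes malignant as 0, so the labels are flipped to match the M → 1, B → 0 convention used here.

```python
from sklearn.datasets import load_breast_cancer

# Stand-in for breast_cancer.csv: scikit-learn's bundled copy of the dataset.
data = load_breast_cancer(as_frame=True)
features = data.data

# scikit-learn uses 0 = malignant, 1 = benign; flip so that M -> 1, B -> 0.
diagnosis = (1 - data.target).rename("diagnosis")

# Class distribution: number of samples per label.
class_counts = diagnosis.value_counts()

# Pairwise Pearson correlation between all features (input for a heatmap).
corr = features.corr()

print(class_counts)
print(corr.shape)
```

For the plots themselves, `class_counts.plot(kind="bar")` and `seaborn.heatmap(corr)` are one way to render these two results.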

  3. Feature Selection

Removed features with pairwise correlation > 0.92 to reduce multicollinearity.

Reduced feature count from 32 → 23.
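One common way to implement a correlation filter like this is sketched below. The helper name `drop_correlated` and the synthetic demo columns are hypothetical, and the README does not say whether absolute correlation or this exact drop rule was used, so treat it as one plausible reading of the step.

```python
import numpy as np
import pandas as pd


def drop_correlated(features: pd.DataFrame, threshold: float = 0.92) -> pd.DataFrame:
    """Drop the later feature of every pair whose absolute correlation exceeds threshold."""
    corr = features.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)


# Synthetic demo: "perimeter" is almost a scaled copy of "radius".
rng = np.random.default_rng(0)
radius = rng.normal(size=200)
demo = pd.DataFrame({
    "radius": radius,
    "perimeter": radius * 6.28 + rng.normal(scale=0.01, size=200),
    "texture": rng.normal(size=200),
})

reduced = drop_correlated(demo)
print(list(reduced.columns))
```

This rule keeps the first feature of each highly correlated pair and drops the later one, so the result depends on column order.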

  4. Modeling & Evaluation

| Model | Train Accuracy | Test Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|
| Logistic Regression | 98.9% | 96.5% | 0.96–0.98 | 0.94–0.99 | 0.96 |
| KNN (default) | 96.7% | 95.6% | 0.94–0.98 | 0.91–0.99 | 0.95 |
| SVC (GridSearchCV) | ~100% | 97.4% | 0.96–1.00 | 0.96–0.99 | 0.97 |
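A comparison like the one in the table could be set up roughly as follows. This is a sketch under stated assumptions: it uses scikit-learn's bundled copy of the dataset instead of breast_cancer.csv, skips the correlation filter, and the 80/20 split, `random_state`, and parameter grid are illustrative choices that the README does not specify. Its scores will therefore not match the table exactly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # scikit-learn uses 0 = malignant; flip to match M -> 1, B -> 0

# Assumed 80/20 stratified split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Each model is wrapped in a pipeline so scaling is fit on training data only.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN (default)": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVC (GridSearchCV)": GridSearchCV(
        make_pipeline(StandardScaler(), SVC()),
        # Assumed grid; make_pipeline names the SVC step "svc".
        param_grid={"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]},
        cv=5,
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train={scores[name][0]:.3f}, test={scores[name][1]:.3f}")
```

Putting the scaler inside each pipeline means the grid search re-fits it within every cross-validation fold, which keeps validation folds out of the scaling statistics.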


🔧 Libraries Used

pandas, numpy, matplotlib, seaborn, sklearn, missingno

📈 Key Visualizations

Class Distribution Plot

Feature Histograms

Heatmap for Feature Correlation

Missing Data Matrix using missingno

🧪 Model Evaluation Metrics

Confusion Matrix

Accuracy Score

Classification Report

Precision

Recall

F1-score
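The metrics listed above are all available in `sklearn.metrics`. The short sketch below computes them on a small set of made-up labels and predictions (not project results), with 1 = Malignant as the positive class.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels for illustration: 1 = Malignant, 0 = Benign.
y_true = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(cm)
print(classification_report(y_true, y_pred, target_names=["Benign", "Malignant"]))
```

In a screening setting, recall on the malignant class is usually the number to watch, since a false negative means a missed cancer.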

✅ Conclusion

All three models perform well, with SVC giving the highest test accuracy (97.4%) and the best balance between precision and recall.

Correlation-based feature selection reduced redundancy among the features, which may have helped generalization.

The project can be extended using:

Cross-validation

Ensemble models (e.g., Random Forest, XGBoost)

Feature engineering

SHAP/LIME for explainability

About

This project is focused on building a machine learning model that can effectively detect whether a tumor is benign or malignant based on medical data.
