💳 Credit Card Fraud Detection Using Machine Learning

A supervised machine learning project to detect fraudulent credit card transactions using Random Forest Classification on a highly imbalanced dataset.

Table of Contents

Project Overview
Dataset
Project Structure
Tech Stack
Workflow
Key Results
Visualizations
How to Run
Possible Improvements
Author

📖 Project Overview

Credit card fraud is a major global financial threat. This project builds a machine learning pipeline that classifies transactions as legitimate (0) or fraudulent (1) using historical transaction data. The core challenge is extreme class imbalance: fraud cases account for only ~0.17% of all transactions.

Goal: Maximize fraud detection (Recall) while keeping false alarms (False Positives) low.

📊 Dataset

Property	Details
Source	Kaggle – Credit Card Fraud Detection
Total Records	284,807 transactions
Features	31 columns (Time, V1–V28, Amount, Class)
Target Column	`Class` → 0 = Legitimate, 1 = Fraud
Fraud Cases	492 (~0.17%)
Legitimate Cases	284,315 (~99.83%)
Missing Values	None
Duplicates	1,081 rows (removed during cleaning)

Note: Features V1–V28 are the result of PCA transformation to protect user confidentiality. Only Time and Amount are in their original form.
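The figures in the table above can be checked with a few lines of pandas. The following is a minimal sketch, not the notebook's actual code: `summarize_classes` is a hypothetical helper, and the small `demo` frame is hand-made stand-in data so the example runs without the Kaggle file.

```python
import pandas as pd


def summarize_classes(df: pd.DataFrame, target: str = "Class") -> dict:
    """Return row, missing-value, duplicate, and fraud counts for a frame."""
    counts = df[target].value_counts()
    fraud = int(counts.get(1, 0))
    return {
        "rows": len(df),
        "missing": int(df.isna().sum().sum()),
        "duplicates": int(df.duplicated().sum()),
        "fraud": fraud,
        "fraud_pct": round(100 * fraud / len(df), 2),
    }


# Tiny hand-made frame standing in for creditcard.csv (illustrative values only).
demo = pd.DataFrame({
    "Time":   [0, 1, 1, 2, 3],
    "V1":     [0.1, -1.2, -1.2, 0.4, 2.3],
    "Amount": [10.0, 250.0, 250.0, 5.5, 99.9],
    "Class":  [0, 1, 1, 0, 0],
})
summary = summarize_classes(demo)
print(summary)

# With the real file (assumed to sit in the project root):
# summary = summarize_classes(pd.read_csv("creditcard.csv"))
```

Run against the real dataset, the same helper should report the row, fraud, and duplicate counts listed in the table.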

📁 Project Structure

credit-card-fraud-detection/
│
├── creditcard.csv                          # Raw dataset (download from Kaggle)
├── Credit_Card_Fraud_Detection.ipynb       # Main Jupyter Notebook
├── README.md                               # Project documentation
└── requirements.txt                        # Python dependencies

🛠️ Tech Stack

Category	Libraries / Tools
Language	Python 3.10+
Data Manipulation	`pandas`, `numpy`
Visualization	`matplotlib`, `seaborn`
Machine Learning	`scikit-learn`
Environment	Jupyter Notebook / Google Colab

🔄 Workflow

1. Import Libraries
       ↓
2. Load & Explore Dataset
   └── shape, head, tail, info, describe
       ↓
3. Data Cleaning
   ├── 3a. Handle Missing Values  → None found
   └── 3b. Remove Duplicate Rows → 1,081 removed
       ↓
4. Exploratory Data Analysis (EDA)
   ├── Q1: Class distribution (Fraud vs Legitimate %)
   ├── Q2: Fraud transaction amount distribution
   └── Q3: Amount comparison across both classes
       ↓
5. Model Development
   ├── 5a. Feature/Target split (X, y)
   ├── 5b. Train-Test split (80/20, stratified)
   └── 5c. Train Random Forest Classifier
       ↓
6. Model Evaluation
   ├── Classification Report
   ├── Confusion Matrix (numeric + heatmap)
   └── Feature Importance Plot
       ↓
7. Summary & Conclusions
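
Steps 3 to 6 of the workflow can be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's code: the data is a synthetic imbalanced set from `make_classification` standing in for `creditcard.csv`, the `V1`...`V10` column names are placeholders, and reusing `random_state=41` for the split is a guess (the README only states it for the model).

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for creditcard.csv (roughly 2% positives).
X_arr, y_arr = make_classification(
    n_samples=3000, n_features=10, n_informative=5,
    weights=[0.98, 0.02], random_state=41,
)
df = pd.DataFrame(X_arr, columns=[f"V{i}" for i in range(1, 11)])
df["Class"] = y_arr

# Step 3b: remove duplicate rows.
df = df.drop_duplicates()

# Step 5a: feature/target split.
X = df.drop(columns="Class")
y = df["Class"]

# Step 5b: 80/20 split, stratified so both sets keep the fraud ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=41
)

# Step 5c: train the Random Forest with the parameters listed under Key Results.
model = RandomForestClassifier(n_estimators=100, random_state=41)
model.fit(X_train, y_train)

# Step 6: classification report and confusion matrix.
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, zero_division=0))
```

To run it on the real data, replace the synthetic block with `df = pd.read_csv("creditcard.csv")`; the remaining steps stay the same.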

📈 Key Results

Model: Random Forest Classifier (`n_estimators=100`, `random_state=41`)

Metric	Legitimate (0)	Fraud (1)
Precision	1.00	0.90
Recall	1.00	0.80
F1-Score	1.00	0.85
Support	56,671	75

Overall Accuracy: ~99.96% (56,724 of 56,746 test transactions classified correctly)

Confusion Matrix

                  Predicted
                  Legit   Fraud
Actual  Legit  [ 56664     7  ]
        Fraud  [    15    60  ]

Outcome	Count
True Negatives (Legit correctly identified)	56,664
False Positives (Legit flagged as Fraud)	7
False Negatives (Fraud missed)	15
True Positives (Fraud correctly detected)	60

Out of 75 fraud cases in the test set, 60 were correctly detected (80% recall) and 15 were missed. Of the 67 transactions flagged as fraud, 7 were false alarms.
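
The fraud-class metrics in the table follow directly from the four confusion-matrix counts. A short sketch of the arithmetic, using only the numbers reported above:

```python
# Counts taken from the confusion matrix above.
tn, fp, fn, tp = 56_664, 7, 15, 60

precision = tp / (tp + fp)            # share of fraud alerts that were real fraud
recall = tp / (tp + fn)               # share of real fraud that was caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tn + fp + fn + tp)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.4%}")
```

Accuracy is dominated by the 56,671 legitimate transactions, which is why precision and recall on the fraud class are the more useful numbers here.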

📊 Visualizations

The notebook includes the following plots:

Pie Chart — Class distribution (Fraud vs. Legitimate %)
Histogram — Fraud transaction amount distribution
Side-by-side Histograms — Amount comparison (Fraud vs. Legitimate)
Confusion Matrix Heatmap — Visual breakdown of predictions
Feature Importance Bar Chart — Top 15 most predictive features
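
As an illustration of how the data behind the feature importance chart might be computed, here is a minimal sketch. It assumes synthetic data with placeholder column names `V1`...`V20` instead of the real dataset, and it leaves the plotting call as a comment so the snippet runs outside a notebook; the notebook's own plotting code may differ.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; column names are placeholders, not the real features.
X_arr, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=41
)
X = pd.DataFrame(X_arr, columns=[f"V{i}" for i in range(1, 21)])

model = RandomForestClassifier(n_estimators=100, random_state=41).fit(X, y)

# Impurity-based importances, highest first, limited to the top 15.
top15 = (
    pd.Series(model.feature_importances_, index=X.columns)
    .sort_values(ascending=False)
    .head(15)
)
print(top15)

# In a notebook with matplotlib installed, a horizontal bar chart could be drawn with:
# top15.sort_values().plot.barh(title="Top 15 feature importances")
```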

▶️ How to Run

1. Clone or Download the Repository

git clone https://github.com/your-username/credit-card-fraud-detection.git
cd credit-card-fraud-detection

2. Install Dependencies

pip install -r requirements.txt

Or manually:

pip install pandas numpy matplotlib seaborn scikit-learn jupyter

3. Download the Dataset

Download creditcard.csv from Kaggle and place it in the project root directory.

4. Launch the Notebook

jupyter notebook Credit_Card_Fraud_Detection.ipynb

🚀 Possible Improvements

Technique	Purpose
SMOTE / Undersampling	Handle severe class imbalance
XGBoost / LightGBM	Potentially better performance
Hyperparameter Tuning	`GridSearchCV` or `RandomizedSearchCV`
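ROC Curve / ROC-AUC	Threshold-independent view of how well the model ranks fraud above legitimate transactions
Precision-Recall Curve	More informative than ROC when the positive class is rare, as in fraud detection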
Cross-Validation	More robust model evaluation
Feature Scaling	Normalize `Amount` and `Time` columns (matters mainly for scale-sensitive models; tree ensembles are largely unaffected)
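
Two of the ideas in the table, cross-validation and ranking-based metrics, can be combined in a few lines of scikit-learn. The sketch below is one possible approach, not part of the current notebook: it uses synthetic imbalanced data in place of `creditcard.csv`, and the fold count and seeds are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in data (roughly 5% positives).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=41,
)

model = RandomForestClassifier(n_estimators=100, random_state=41)

# Stratified folds keep the fraud ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=41)

scores = cross_validate(
    model, X, y, cv=cv, scoring=["roc_auc", "average_precision"]
)
roc_auc = scores["test_roc_auc"]
pr_auc = scores["test_average_precision"]  # summarizes the precision-recall curve

print(f"ROC-AUC: {roc_auc.mean():.3f} +/- {roc_auc.std():.3f}")
print(f"PR-AUC:  {pr_auc.mean():.3f} +/- {pr_auc.std():.3f}")
```

Reporting the mean and spread across folds gives a more stable estimate than a single 80/20 split, which matters when the test set holds only a few dozen fraud cases.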

👨‍💻 Author

Rushikesh Sangamnere

Email: rushikeshsangamnere4561@gmail.com

Phone: +91 9096506345

📄 License

This project is for educational purposes. The dataset is publicly available on Kaggle under its own license terms.
