Fraud Detection Machine Learning Project

A machine learning solution for detecting fraudulent financial transactions, using an XGBoost classifier on a highly imbalanced dataset of more than 6.3 million transactions.

🔗 Quick Access

Google Colab Notebook: View & Run in Colab

📊 Project Overview

This project implements a fraud detection system that analyzes financial transaction patterns to identify fraudulent activity, reaching a ROC-AUC of 0.9999 on the test set. The model uses balance changes, transaction types, and engineered features, and is intended for real-time scoring scenarios.

Key Statistics

Dataset Size: 6,362,620 transactions
Fraud Rate: 0.129% (highly imbalanced)
Time Period: 744 hourly steps (about 31 days)
Model Performance: ROC-AUC of 0.9999

🎯 Features

Data Analysis

Comprehensive exploratory data analysis (EDA)
Missing values and outlier analysis
Transaction type distribution analysis
Fraud pattern visualization

Feature Engineering

errorBalanceOrig: Balance discrepancy detection on sender side
errorBalanceDest: Balance discrepancy detection on receiver side
isHighRiskType: Flags TRANSFER and CASH_OUT transactions
isOriginDrained: Identifies completely emptied accounts
isDestEmpty: Detects transactions to new/empty accounts
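
The engineered features above could be derived roughly as follows. This is a minimal sketch, not the notebook's exact code: the formulas (in particular the sign convention of the two error terms and the definitions of "drained" and "empty") are assumptions, and `add_engineered_features` is a hypothetical helper name.

```python
import pandas as pd

HIGH_RISK_TYPES = ["TRANSFER", "CASH_OUT"]

def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with five engineered columns (assumed formulas)."""
    out = df.copy()
    # Assumed: the sender's books balance when new balance + amount == old balance.
    out["errorBalanceOrig"] = out["newbalanceOrig"] + out["amount"] - out["oldbalanceOrg"]
    # Assumed: the receiver's books balance when old balance + amount == new balance.
    out["errorBalanceDest"] = out["oldbalanceDest"] + out["amount"] - out["newbalanceDest"]
    out["isHighRiskType"] = out["type"].isin(HIGH_RISK_TYPES).astype(int)
    # Assumed: "drained" means the sender had funds before and has none after.
    out["isOriginDrained"] = (
        (out["oldbalanceOrg"] > 0) & (out["newbalanceOrig"] == 0)
    ).astype(int)
    # Assumed: "empty" means the receiver's balance was zero before the transaction.
    out["isDestEmpty"] = (out["oldbalanceDest"] == 0).astype(int)
    return out

# Two made-up rows for illustration only.
sample = pd.DataFrame({
    "type": ["TRANSFER", "PAYMENT"],
    "amount": [1000.0, 50.0],
    "oldbalanceOrg": [1000.0, 200.0],
    "newbalanceOrig": [0.0, 150.0],
    "oldbalanceDest": [0.0, 0.0],
    "newbalanceDest": [0.0, 0.0],
})
features = add_engineered_features(sample)
print(features.iloc[:, -5:])
```

In this toy frame the first row is a TRANSFER that empties the sender while the receiver's balance never changes, so it is flagged on every feature except `errorBalanceOrig`.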

Machine Learning Model

Algorithm: XGBoost Classifier
Handling Imbalance: Undersampling of the majority class (10:1 ratio)
Performance Metrics:
- ROC-AUC Score: 0.9999
- Average Precision: 0.9985
- Fraud Recall: 99.75%
- Fraud Precision: 87%
- F1-Score: 0.93
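
The undersampling step can be illustrated with pandas. This is a sketch under assumptions: that the 10:1 ratio means ten non-fraud rows kept per fraud row, and that rows are drawn at random without replacement. The helper `undersample` and its parameters are hypothetical, not taken from the notebook.

```python
import pandas as pd

def undersample(df: pd.DataFrame, target: str = "isFraud",
                ratio: int = 10, seed: int = 42) -> pd.DataFrame:
    """Keep every fraud row and at most `ratio` non-fraud rows per fraud row."""
    fraud = df[df[target] == 1]
    legit = df[df[target] == 0]
    n_keep = min(len(legit), ratio * len(fraud))
    legit_sample = legit.sample(n=n_keep, random_state=seed)
    # Shuffle so the two classes are not stored in contiguous blocks.
    combined = pd.concat([fraud, legit_sample])
    return combined.sample(frac=1, random_state=seed).reset_index(drop=True)

# Made-up data: 5 fraud rows among 1,000 transactions.
toy = pd.DataFrame({"amount": range(1000), "isFraud": [1] * 5 + [0] * 995})
balanced = undersample(toy)
print(balanced["isFraud"].value_counts())
```

Resampling of this kind is normally applied to the training split only, so that evaluation metrics reflect the original class balance.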

🔍 Key Findings

Top Fraud Indicators

errorBalanceOrig (50.44%) - Balance inconsistencies on sender side
isOriginDrained (46.09%) - Complete account drainage
Transaction Amount (1.32%) - Avg fraud: ₹1.46M vs normal: ₹178K
Transaction Type - Only TRANSFER and CASH_OUT show fraud activity

Fraud Patterns Discovered

98.1% of fraud cases completely drain the origin account
65.2% of fraud destinations are empty accounts
Fraudulent transactions average about 8x the amount of normal ones (₹1.46M vs ₹178K)
Only TRANSFER and CASH_OUT transaction types exhibit fraud
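
Pattern statistics of this kind can be computed with a few pandas aggregations. The sketch below runs on a tiny made-up frame, so its numbers do not reproduce the figures above; it only shows the shape of the calculation, and the definition of a "drained" account (new origin balance equal to zero) is an assumption.

```python
import pandas as pd

# Five made-up transactions for illustration only.
toy = pd.DataFrame({
    "type": ["TRANSFER", "CASH_OUT", "PAYMENT", "CASH_OUT", "DEBIT"],
    "amount": [900.0, 700.0, 100.0, 200.0, 100.0],
    "oldbalanceOrg": [900.0, 700.0, 500.0, 400.0, 300.0],
    "newbalanceOrig": [0.0, 100.0, 400.0, 200.0, 200.0],
    "isFraud": [1, 1, 0, 0, 0],
})
fraud = toy[toy["isFraud"] == 1]
legit = toy[toy["isFraud"] == 0]

# Which transaction types contain any fraud at all?
fraud_by_type = toy.groupby("type")["isFraud"].sum()
types_with_fraud = sorted(fraud_by_type[fraud_by_type > 0].index)

# Share of fraud cases that leave the origin account at zero.
drained_share = (fraud["newbalanceOrig"] == 0).mean()

# How much larger the average fraudulent amount is than the average normal one.
amount_ratio = fraud["amount"].mean() / legit["amount"].mean()

print(types_with_fraud, drained_share, amount_ratio)
```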

🛠️ Technologies Used

Python 3.x
Data Processing: NumPy, Pandas
Visualization: Matplotlib, Seaborn
Machine Learning: Scikit-learn, XGBoost
Environment: Google Colab

📁 Project Structure

Fraud-Detection-ML/
│
├── accredian_python_file.ipynb    # Main Jupyter notebook
├── Data_Dictionary.txt             # Dataset documentation
├── README.md                       # Project documentation
└── requirements.txt                # Python dependencies (if applicable)

📈 Model Performance

Confusion Matrix Results

True Negatives: 1,270,642
False Positives: 239
False Negatives: 4
True Positives: 1,639
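
As a cross-check, the fraud-class metrics can be recomputed directly from these four counts using the standard definitions. The variable names are illustrative.

```python
# Counts copied from the confusion matrix above.
tn, fp, fn, tp = 1_270_642, 239, 4, 1_639

precision = tp / (tp + fp)            # share of flagged transactions that are fraud
recall = tp / (tp + fn)               # share of fraud cases that are caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.4f} recall={recall:.4f} "
      f"f1={f1:.4f} accuracy={accuracy:.4f}")
```

The results match the reported precision, recall, F1, and accuracy to within rounding. Because only about 0.13% of test transactions are fraudulent, accuracy alone says little here; precision and recall on the fraud class are the more informative figures.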

Classification Report

Non-Fraud Precision: 1.00
Fraud Recall: 1.00 (99.75% before rounding to two decimals)
Overall Accuracy: 99.98%

💡 Business Applications

Prevention Strategies

Real-time Transaction Scoring - Deploy model as API endpoint
Velocity Checks - Monitor rapid account drainage patterns
Destination Verification - Enhanced KYC for new accounts
Transaction Limits - Smart thresholds based on model predictions
Behavioral Analytics - Customer baseline profiling
Time-based Monitoring - Increased sensitivity during off-peak hours
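
One way to turn model scores into actions for real-time scoring and threshold-based limits is a small decision function. This is a sketch only: the function name `decide`, the three actions, and the cut-off values 0.5 and 0.9 are hypothetical and would need tuning against the cost of false positives and missed fraud.

```python
def decide(fraud_probability: float,
           review_at: float = 0.5,
           block_at: float = 0.9) -> str:
    """Map a model's fraud probability to an action (hypothetical thresholds)."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    if fraud_probability >= block_at:
        return "block"    # stop the transaction outright
    if fraud_probability >= review_at:
        return "review"   # hold for manual or step-up verification
    return "allow"

for score in (0.02, 0.65, 0.97):
    print(score, decide(score))
```

Keeping a middle "review" band is a design choice that trades some operational cost for fewer blocked legitimate customers.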

Success Metrics

Reduction in fraud detection time
Decreased false positive rates
Lower financial losses from fraud
Improved customer satisfaction through reduced friction

🚀 Getting Started

Prerequisites

numpy
pandas
matplotlib
seaborn
scikit-learn
xgboost
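
Before running the notebook outside Colab, a quick check like the one below can report which of these packages are missing. It is a convenience sketch, not part of the project; note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages listed above.
REQUIRED_IMPORTS = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn", "xgboost"]

def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if find_spec(name) is None]

missing = missing_packages(REQUIRED_IMPORTS)
print("Missing packages:", missing or "none")
```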

Running the Notebook

Open the Google Colab link
Run all cells sequentially
Review visualizations and model performance metrics

📊 Dataset Information

The dataset contains synthetic financial transaction data with the following features:

step: Time unit (1 step = 1 hour)
type: Transaction type (CASH_IN, CASH_OUT, DEBIT, PAYMENT, TRANSFER)
amount: Transaction amount in local currency
nameOrig: Customer initiating the transaction
oldbalanceOrg: Initial balance before transaction
newbalanceOrig: New balance after transaction
nameDest: Recipient of the transaction
oldbalanceDest: Recipient's initial balance
newbalanceDest: Recipient's new balance
isFraud: Fraud indicator (target variable)
isFlaggedFraud: Business-rule flag for attempts to transfer more than 200,000 in a single transaction
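
A loaded file can be checked against this column list before any analysis. The sketch below parses a one-row made-up CSV string rather than the real dataset, and `missing_columns` is a hypothetical helper.

```python
import io
import pandas as pd

EXPECTED_COLUMNS = [
    "step", "type", "amount", "nameOrig", "oldbalanceOrg", "newbalanceOrig",
    "nameDest", "oldbalanceDest", "newbalanceDest", "isFraud", "isFlaggedFraud",
]

def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected columns that are absent from df, in schema order."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]

# Made-up single transaction, for illustration only.
csv_text = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,TRANSFER,181.0,C001,181.0,0.0,C002,0.0,0.0,1,0\n"
)
df = pd.read_csv(io.StringIO(csv_text))
print(missing_columns(df))
```

Note the dataset's own spelling inconsistency: the sender's old balance is `oldbalanceOrg`, while the new balance is `newbalanceOrig`.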

📝 Analysis Questions Answered

The project comprehensively addresses:

Data cleaning methodology (missing values, outliers, multicollinearity)
Model selection and justification (XGBoost)
Variable selection process and feature engineering
Model performance evaluation
Key fraud prediction factors
Logical validation of predictive features
Prevention strategies and recommendations
Effectiveness measurement frameworks

👤 Author

Sujal Kumar Nayak

LinkedIn: linkedin.com/in/sujal-kumar-nayak
Email: nayaksujalkumar@gmail.com

📄 License

This project is created for educational and portfolio purposes.

🙏 Acknowledgments

Dataset: Synthetic financial transaction data
Inspiration: Real-world fraud detection systems used by PayPal, Stripe, and major banks
Tools: Google Colab for cloud computing resources

Note: This project demonstrates advanced machine learning techniques for fraud detection. The model achieves near-perfect performance on the test set and provides actionable insights for real-world fraud prevention systems.
