This project builds a reusable machine learning pipeline that detects fraudulent transactions from any structured CSV file. Whether the data comes from PayPal, Stripe, or internal logs, the system validates the input, engineers meaningful features, runs multiple models, and outputs fraud risk scores.
- Accepts any transaction CSV file
- Validates column structure, data types, and missing values
- Cleans and transforms the data (encoding, scaling, feature creation)
- Trains and compares four ML models:
  - Logistic Regression
  - Decision Tree
  - Random Forest
  - K-Nearest Neighbors (KNN)
- Outputs fraud probability per transaction
- Summarizes model performance using Accuracy, Precision, Recall, and F1
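The four summary metrics can all be derived from a confusion matrix. A minimal, stdlib-only sketch (the labels below are made-up illustrations, not the project's data):

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives/negatives for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def summarize(y_true, y_pred):
    """Return Accuracy, Precision, Recall, and F1 for one model's predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Example: 6 transactions, 2 actual frauds, one caught and one missed
y_true = [0, 0, 1, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 0]
print(summarize(y_true, y_pred))
```

Running `summarize` once per classifier gives the side-by-side comparison table described above.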
- Input Validation
  - Checks schema, nulls, types, duplicates
- Feature Engineering
  - Encodes categoricals, scales numerics, creates derived features
- Model Training & Evaluation
  - Benchmarks four classifiers
  - Compares metrics across models
- Scoring & Output
  - Generates fraud scores
  - Produces model comparison summary
- Optional Deployment
  - Streamlit or Flask interface for CSV upload and scoring
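The Input Validation stage could be sketched like this; the required column names and rules below are illustrative assumptions, not the project's actual schema:

```python
import csv
import io

REQUIRED_COLUMNS = {"transaction_id", "amount", "timestamp"}  # assumed schema

def validate_rows(reader):
    """Check schema, nulls, a numeric type, and duplicate IDs; return a list of issues."""
    issues = []
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")
        return issues
    seen_ids = set()
    for i, row in enumerate(reader, start=1):
        if any(row[c] in ("", None) for c in REQUIRED_COLUMNS):
            issues.append(f"row {i}: null/empty value")
        try:
            float(row["amount"])
        except ValueError:
            issues.append(f"row {i}: non-numeric amount {row['amount']!r}")
        if row["transaction_id"] in seen_ids:
            issues.append(f"row {i}: duplicate transaction_id")
        seen_ids.add(row["transaction_id"])
    return issues

sample = "transaction_id,amount,timestamp\n1,12.50,2024-01-01\n1,oops,2024-01-02\n"
print(validate_rows(csv.DictReader(io.StringIO(sample))))
```

An empty issue list means the CSV is safe to hand to the cleaning and feature-engineering stages.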
- Modular Python pipeline
- Fraud scores per transaction
- Model comparison dashboard
- Ready-to-integrate output for analysts or downstream systems
Fraud patterns evolve. Static rules fail. ML adapts. This project turns raw data into actionable insight—fast, scalable, and production-ready.
Just run the notebook `04_model_training.ipynb`.
fraud_scoring_service/
├── data/
│ └── raw/
│ └── synthetic_fraud_dataset.csv # the untouched CSV of transactions from Kaggle
├── notebooks/
│ ├── _main_.ipynb
│ ├── 01_knn_baseline.ipynb
│ ├── 02_knn_with_scaling_and_onehot.ipynb
│ ├── 03_data_preparation.ipynb # validate schema/types, clean data, engineer features
│ ├── 04_model_training.ipynb # train classifiers and compare metrics
│ └── with_functions.ipynb
├── lib/
│ ├── functions.py # load functions
│ ├── validate_input.py # check for required columns, dtypes, nulls/duplicates
│ ├── clean_data.py # fill missing values, convert/fix Timestamp, drop duplicates
│ ├── feature_engineering.py # encode categoricals, scale numerics, derive new features
├── README.md
├── .gitattributes # Git attributes for tracked paths
├── .gitignore # files and folders excluded from Git commits
├── .config.yaml # configuration file (data paths, model parameters)
├── pyproject.toml # project metadata and dependency definitions
└── uv.lock # locked versions of all dependencies
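The transformations in `feature_engineering.py` (encode categoricals, scale numerics, derive features) might look roughly like this sketch; the field names, categories, and scaling bounds are illustrative assumptions only:

```python
def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over known categories."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max_scale(x, lo, hi):
    """Scale a numeric value into [0, 1] given the training min/max."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def engineer(tx):
    """Turn a raw transaction dict into a numeric feature vector."""
    features = []
    features.extend(one_hot(tx["channel"], ["web", "pos", "atm"]))   # encoded categorical
    features.append(min_max_scale(tx["amount"], 0.0, 1000.0))        # scaled numeric
    # derived feature: amount relative to account balance
    features.append(tx["amount"] / tx["balance"] if tx["balance"] else 1.0)
    return features

tx = {"channel": "web", "amount": 12.50, "balance": 150.0}
print(engineer(tx))
```

In the real pipeline the categories and min/max bounds must be learned from the training set and reused verbatim at scoring time, otherwise train and inference features drift apart.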
- [Presentation](https://docs.google.com/presentation/d/1DCRTmxcjTngXTsZA_19D8t6RzMZyaBRoeajUfCeMsb8/edit?usp=sharing)
- [Streamlit app](https://w7pjprojectfraudscoringservice-rrp8ex7co5rxczfmvvjnkx.streamlit.app/)
Try it yourself: enter a transaction with amount 12.50, balance 150, operation day 1, no previous fraud, on a weekend. With LogisticOversample the risk score comes out at 0.92; switch to LogisticRegression and it drops to about 36%. The model adaptation examples show how the models learn from inputs like these.
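Why do two logistic models disagree so much on one transaction? Each one maps the feature vector to a probability through a sigmoid over its own learned weights, and oversampling the fraud class shifts those weights toward higher scores. A minimal sketch of the scoring math (the weights and bias below are made up for illustration, not the trained models'):

```python
import math

def fraud_probability(features, weights, bias):
    """Logistic regression score: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical transaction: amount 12.50, balance 150, weekend flag 1, no prior fraud
features = [12.50, 150.0, 1.0, 0.0]
weights = [0.05, -0.01, 0.8, 2.0]   # illustrative weights only
print(round(fraud_probability(features, weights, bias=-1.0), 3))
```

The same transaction scored under a different weight vector (e.g. one fitted on oversampled fraud cases) lands at a different probability, which is exactly the LogisticOversample vs LogisticRegression gap seen in the demo.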