Machine learning prototype for detecting fraudulent insurance claims.
Live Demo: fraud-detection-demo.symfa.com
Fraud Detection is a machine learning prototype for identifying potentially fraudulent insurance claims. Based on the 2023 Travelers NESS Statathon Kaggle Competition, it helps insurance companies reduce financial losses, streamline investigations, and allocate resources more efficiently through automated prediction and SHAP-based explainability.
- Fraud Prediction: Classify claims as fraudulent or legitimate using trained AutoGluon models
- Probability Scoring: Output fraud probability and binary decision (configurable threshold)
- Explainability: SHAP-based feature contributions for each prediction
- Feature Importance: Global feature importance visualization from model training
- Summary Generation: Natural language summaries of prediction reasoning
- Interactive UI: Next.js dashboard for exploring predictions and feature impacts
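As a rough illustration of how probability scoring, the configurable threshold, and summary generation could fit together, the sketch below uses plain Python. The names `Decision`, `decide`, `top_contributions`, and `summarize` are hypothetical and not taken from this repository, and the default threshold of 0.5 is an assumption. In the prototype, the per-feature contributions would come from SHAP rather than being supplied by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    probability: float  # estimated fraud probability in [0, 1]
    is_fraud: bool      # binary decision after applying the threshold


def decide(probability: float, threshold: float = 0.5) -> Decision:
    """Turn a fraud probability into a binary decision (hypothetical helper)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return Decision(probability, probability >= threshold)


def top_contributions(
    contributions: dict[str, float], k: int = 3
) -> list[tuple[str, float]]:
    """Rank per-feature contributions (e.g. SHAP values) by absolute size."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:k]


def summarize(decision: Decision, contributions: dict[str, float], k: int = 3) -> str:
    """Build a one-line, human-readable summary of a prediction."""
    label = "fraudulent" if decision.is_fraud else "legitimate"
    parts = [
        f"{name} ({'raises' if value >= 0 else 'lowers'} the score by {abs(value):.2f})"
        for name, value in top_contributions(contributions, k)
    ]
    return (
        f"Claim classified as {label} (p={decision.probability:.2f}). "
        f"Top factors: {'; '.join(parts)}."
    )
```

For example, `decide(0.62, threshold=0.7)` yields a legitimate decision, while the same probability with the default threshold is flagged as fraudulent. Raising the threshold trades fewer false alarms for more missed fraud, which is why it is worth keeping configurable.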
Claims analysts, fraud investigators, and operations teams who need to identify and prioritize potentially fraudulent claims for review.
| Category | Technologies |
|---|---|
| Backend | Python 3.13, FastAPI |
| Frontend | TypeScript, Next.js, Node.js |
| AI/ML | AutoGluon, scikit-learn, SHAP |
| Data Validation | Pydantic |
| Package Management | uv (backend), pnpm (frontend) |
| Deployment | Docker |
The dataset contains insurance claim records from the Travelers NESS Statathon competition:
| Policyholder feature | Description |
|---|---|
| `age_of_driver` | Age of the driver |
| `gender` | Gender of the driver (M/F) |
| `marital_status` | Marital status indicator |
| `annual_income` | Annual income of the policyholder |
| `high_education_ind` | Higher education indicator |
| `living_status` | Living status (Own/Rent) |
| `zip_code` | ZIP code of the policyholder |
| Claim feature | Description |
|---|---|
| `claim_number` | Unique claim identifier |
| `claim_date` | Date of the claim |
| `claim_day_of_week` | Day of the week when claim was filed |
| `accident_site` | Location type of the accident |
| `past_num_of_claims` | Number of past claims |
| `witness_present_ind` | Whether a witness was present |
| `liab_prct` | Liability percentage |
| `channel` | Claim submission channel |
| `policy_report_filed_ind` | Whether a policy report was filed |
| `claim_est_payout` | Estimated claim payout amount |
| Vehicle feature | Description |
|---|---|
| `age_of_vehicle` | Age of the vehicle |
| `vehicle_category` | Category of the vehicle |
| `vehicle_price` | Price of the vehicle |
| `vehicle_color` | Color of the vehicle |
| `vehicle_weight` | Weight of the vehicle |
| `safty_rating` | Safety rating of the vehicle |
| Target | Description |
|---|---|
| `fraud` | Target (1 = Fraudulent, 0 = Legitimate) |
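To make the column descriptions above more concrete, here is a minimal sketch of a typed claim record covering a subset of the columns. It uses a standard-library dataclass instead of Pydantic to stay dependency-free, and the class name `ClaimRecord`, the field types, and the 0-100 range for `liab_prct` are assumptions for illustration, not the schema used by the backend. Only the allowed values for `gender` and `living_status` are taken from the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimRecord:
    """Hypothetical, partial view of one row of the dataset."""

    claim_number: int         # unique claim identifier (type assumed)
    age_of_driver: int
    gender: str               # "M" or "F", per the feature table
    living_status: str        # "Own" or "Rent", per the feature table
    past_num_of_claims: int
    witness_present_ind: int  # assumed to be a 0/1 indicator
    liab_prct: float          # assumed to lie in the range 0-100
    claim_est_payout: float
    age_of_vehicle: int
    safty_rating: int         # spelled as in the dataset column

    def __post_init__(self) -> None:
        # Basic sanity checks; real validation rules may differ.
        if self.gender not in {"M", "F"}:
            raise ValueError(f"unexpected gender: {self.gender!r}")
        if self.living_status not in {"Own", "Rent"}:
            raise ValueError(f"unexpected living_status: {self.living_status!r}")
        if self.witness_present_ind not in {0, 1}:
            raise ValueError("witness_present_ind must be 0 or 1")
        if not 0.0 <= self.liab_prct <= 100.0:
            raise ValueError("liab_prct must be between 0 and 100")
        if self.past_num_of_claims < 0 or self.age_of_vehicle < 0:
            raise ValueError("counts and ages must be non-negative")
```

Rejecting malformed rows at the boundary keeps bad inputs from reaching the model; with Pydantic, the same checks would typically be expressed as field constraints.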
fraud-detection/
├── backend/                  # Python backend (FastAPI)
│   ├── Dockerfile            # Backend container
│   ├── src/fraud_detection/  # Application code
│   ├── models/               # Trained ML model artifacts
│   ├── notebooks/            # Jupyter notebooks (EDA, experiments)
│   ├── scripts/              # Training & preprocessing scripts
│   ├── data/                 # Datasets
│   └── pyproject.toml        # Backend dependencies
│
├── frontend/                 # Next.js frontend application
│   └── Dockerfile            # Frontend container
│
├── pyproject.toml            # UV workspace definition
├── uv.lock                   # Lockfile
└── README.md
# Clone the repository
git clone https://github.com/Symfa-Inc/fraud-detection.git
cd fraud-detection
# Install backend dependencies
uv sync
# Install frontend dependencies
cd frontend
pnpm install
Backend:
uv run uvicorn fraud_detection.main:app --port 8000 --reload
Frontend:
cd frontend
pnpm run dev
The backend API will be available at http://localhost:8000 and the frontend at http://localhost:3000.
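Once both processes are started, a quick way to confirm they are listening is a plain TCP check against the two ports mentioned above. This is a minimal sketch using only the standard library; the helper name `is_listening` is hypothetical, and it only verifies that something accepts connections on the port, not that the application behind it is healthy.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the local run instructions.
    for name, port in (("backend", 8000), ("frontend", 3000)):
        status = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{name} (http://localhost:{port}): {status}")
```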
Backend (from backend/ directory):
cd backend
docker build -t fraud-detection-backend .
docker run -p 8000:8000 fraud-detection-backend
Frontend (from frontend/ directory):
cd frontend
docker build -t fraud-detection-frontend .
docker run -p 3000:3000 -e API_URL=http://localhost:8000 fraud-detection-frontend
Set API_URL to your backend URL when the frontend runs on a different host or in a different container.