A comprehensive fraud detection system that combines database-driven transaction analysis with Graph Neural Networks (GNN) to identify fraudulent financial transactions by analyzing relationships between accounts, devices, and merchants.
- Database-Driven Architecture: PostgreSQL-based storage for structured transaction data
- Graph Neural Networks: Advanced GNN models (GraphSAGE/GAT/R-GCN) for relational fraud detection
- Real-time Analysis: Fraud risk scoring and pattern detection
- Interactive Dashboard: Streamlit-based visualization and analytics
- IEEE-CIS Dataset: Using real-world anonymized transaction data
IEEE-CIS Fraud Detection Dataset
- Anonymized online transaction data
- User identity and device metadata
- Transaction amounts, card details, timestamps
- Fraud labels for supervised learning
FRAUD/
├── data/ # Raw and processed datasets
│ ├── raw/ # Original IEEE-CIS data
│ └── processed/ # Cleaned and transformed data
├── database/ # Database related files
│ ├── schema/ # SQL schema definitions
│ ├── migrations/ # Database migration scripts
│ └── queries/ # Common SQL queries
├── src/ # Source code
│ ├── config/ # Configuration files
│ ├── database/ # Database connection & utilities
│ ├── preprocessing/ # Data cleaning & transformation
│ ├── graph/ # Graph construction modules
│ ├── models/ # GNN model architectures
│ ├── training/ # Training & evaluation pipelines
│ └── visualization/ # Dashboard & plotting utilities
├── notebooks/ # Jupyter notebooks for exploration
├── tests/ # Unit and integration tests
├── docs/ # Documentation
│ ├── ER_diagram.md # Entity-Relationship diagram
│ └── graph_schema.md # Graph structure documentation
├── checkpoints/ # Model checkpoints
├── logs/ # Training and system logs
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
└── README.md # This file
- ACCOUNT: User account metadata and risk profiles
- MERCHANT: Merchant information and categories
- DEVICE: Device records and metadata
- TRANSACTION: Transaction records with fraud labels
- SHARED_DEVICE: Device-sharing patterns between accounts
| Component | Technology |
|---|---|
| Database | PostgreSQL |
| ML Framework | PyTorch, PyTorch Geometric |
| Graph Processing | NetworkX, DGL |
| Visualization | Streamlit, Plotly, Matplotlib |
| Data Processing | Pandas, NumPy |
| Environment | Python 3.9+ |
- Python 3.9 or higher
- PostgreSQL 13 or higher
- 8GB+ RAM recommended
- GPU (optional but recommended for training)
cd FRAUD
python -m venv venv
.\venv\Scripts\Activate.ps1 # Windows PowerShell
pip install -r requirements.txt# Copy environment template
cp .env.example .env
# Edit .env with your PostgreSQL credentials
# Then create database
python src/database/setup_db.py# Download IEEE-CIS dataset (place in data/raw/)
# Then run preprocessing
python src/preprocessing/load_data.pypython src/graph/build_graph.pypython src/training/train.py --config src/config/train_config.yamlstreamlit run src/visualization/dashboard.py- Precision: Accuracy of fraud predictions
- Recall: Coverage of actual fraud cases
- F1-Score: Harmonic mean of precision and recall
- ROC-AUC: Area under ROC curve
- Comparison: Baseline vs GNN performance
- ✅ Functional PostgreSQL database with fraud transaction records
- ✅ GNN-based fraud classification model
- ✅ Interactive dashboard with:
- Fraud cluster visualization
- Risk score trends
- Network relationship maps
- Real-time transaction monitoring
This is an academic/research project. For modifications or improvements, please document your changes thoroughly.
- IEEE-CIS Fraud Detection Dataset (Kaggle)
- PyTorch Geometric Documentation
- Graph Neural Networks for Fraud Detection (Research Papers)
Status: 🚧 Under Active Development