Predictive Modeling on 8M+ Real Crime Records using PySpark on Databricks
_By Yogesh Sivakumar_
Built a binary classifier to predict whether a reported crime in Chicago was domestic or non-domestic, using 8M+ records from 2000–2025. Leveraged big-data tooling on Databricks with PySpark, SparkSQL, and SparkML, and applied feature engineering, statistical tests, and performance optimizations to handle the large dataset effectively.
- Predict domestic crimes using historical data
- Handle big data using scalable tools (PySpark & SparkSQL)
- Evaluate model performance and provide actionable insights
- Platform: Databricks Community Edition
- Tech Stack: PySpark, SparkSQL, SparkML
- Model: Random Forest Classifier
- Metrics: Accuracy, Precision, Recall, F1-Score
- Optimization: Caching, Repartitioning
| Metric | Value |
|---|---|
| Accuracy | 88.7% |
| Precision | 87.9% |
| Recall | 88.0% |
| F1-Score | 87.9% |
- Chicago Crime Data Analysis and Domestic Crime Prediction.pdf – Full project report
- Chicago_Crime_Analysis.html – Databricks notebook (HTML export)
- View on Databricks
- Hyperparameter tuning and cross-validation
- Use advanced models (e.g., GBT, XGBoost)
- Apply data balancing techniques (SMOTE)
- Integrate MLflow for tracking and reproducibility