A machine learning project to analyze district-wise crime data (2010–2020) and classify regions into risk levels 🚨
This project analyzes crime data across Indian districts and assigns a risk score based on different types of crimes.
The workflow includes:
- Collecting raw district-wise crime data (2010–2020) 🗂️
- Assigning weights to different crime categories (e.g., murder > theft) ⚖️
- Normalizing by population → per capita crime 📐
- Scaling by median values for comparability 📊
- Computing risk scores for each district
- Training a logistic regression model to classify each district into a risk level (0 = Safe → 4 = Very High Risk)
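
The scoring steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the weight values, the per-100,000 rate and the choice to divide each category by its median are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical severity weights (murder > robbery > theft).
# The values actually used in the project are not listed in this README.
WEIGHTS = {"murder": 10.0, "robbery": 5.0, "theft": 1.0}


def compute_risk_scores(df: pd.DataFrame, weights: dict) -> pd.Series:
    """Return one weighted, median-scaled per capita score per district."""
    crimes = list(weights)
    # Per capita rate, expressed here as crimes per 100,000 residents.
    per_capita = df[crimes].div(df["population"], axis=0) * 100_000
    # Divide each category by its median across districts so that
    # frequent crimes (theft) and rare ones (murder) share a scale.
    # Note: a category whose median is 0 would need special handling.
    scaled = per_capita / per_capita.median()
    # Weighted sum across categories gives the district's risk score.
    return (scaled * pd.Series(weights)).sum(axis=1)


# Tiny made-up dataset, only to show the shape of the input.
df = pd.DataFrame(
    {
        "district": ["A", "B", "C"],
        "population": [100_000, 200_000, 50_000],
        "murder": [10, 10, 10],
        "robbery": [20, 80, 5],
        "theft": [100, 400, 25],
    }
).set_index("district")

scores = compute_risk_scores(df, WEIGHTS)
print(scores.sort_values(ascending=False))
```

In this toy data, district C has the smallest population and therefore the highest murder rate per capita, so it ends up with the highest score even though its raw counts are the lowest.
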
- Python 🐍 – Data analysis & ML
- Pandas – Data processing
- NumPy – Numerical operations
- Matplotlib / Seaborn – Visualization
- Scikit-learn – ML model (Logistic Regression)
- Source: District-wise crimes committed (IPC), India (2010–2020)
- Data contains:
  - Yearly district-wise crime counts
  - Population data
- Features created:
  - Per capita crime rates
  - Weighted scores based on severity
- Files are stored in: `data/` 📁
- Model used: Logistic Regression
- Why Logistic Regression?
  - ✅ Simple & interpretable
  - ✅ Handles multi-class targets such as the five risk levels (0–4)
  - ✅ Easy to explain coefficients
- Why not SVM / Random Forest?
  - ❌ SVM → better suited to complex, non-linear data; slower to train with kernels, and typically handles multiple classes through one-vs-one or one-vs-rest decomposition rather than a single multinomial model
  - ❌ Random Forest → powerful but computationally heavier & less interpretable
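
As a rough illustration of the modelling step, the sketch below trains a multi-class logistic regression with scikit-learn. It assumes synthetic data in place of the real district features, hypothetical feature names and weights, and quintile binning of the risk score into levels 0–4; the README does not say how the project actually derives its labels.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for per capita crime rates (not real data).
X = pd.DataFrame(
    rng.gamma(shape=2.0, scale=1.0, size=(500, 3)),
    columns=["murder_rate", "robbery_rate", "theft_rate"],
)

# Hypothetical weighted risk score, then an assumed labelling rule:
# split the score into five equal-sized bins, 0 (Safe) to 4 (Very High Risk).
risk_score = 10 * X["murder_rate"] + 5 * X["robbery_rate"] + X["theft_rate"]
y = pd.qcut(risk_score, q=5, labels=False)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Standardise the features, then fit the logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
# One row of coefficients per risk level, one column per feature.
coef = model.named_steps["logisticregression"].coef_
print(f"Test accuracy: {accuracy:.2f}")
print(pd.DataFrame(coef, columns=X.columns))
```

The coefficient table is what makes the model easy to explain: each row shows how strongly each crime rate pushes a district towards that risk level.
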
```bash
# Clone the repository
git clone https://github.com/your-username/crime-risk-scoring.git
cd crime-risk-scoring

# Create virtual environment (optional)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the notebook / script
jupyter notebook analysis.ipynb
```