Project: Natural Language Processing (NLP) Pipeline for Disaster Tweet Classification
CrisisView is an end-to-end NLP pipeline designed for Emergency Response stakeholders (e.g., Red Cross, FEMA). In the wake of a disaster, social media is often flooded with noise—metaphors ("This party is on fire"), movie reviews, and spam. This tool filters out that noise to identify real-time, actionable disaster alerts.
Objective: Classify tweets as Real Disaster (1) or Not a Real Disaster (0) with high precision.
- Source: Kaggle - Natural Language Processing with Disaster Tweets
- Size: ~7,600 Tweets
- Classes: Binary (Real Disaster vs. Fake/Metaphorical)
We implemented a robust cleaning pipeline based on Exploratory Data Analysis (EDA):
- Noise Removal: Targeted removal of HTML artifacts (`&amp;`), news-sharing terms ("via"), and platform-specific noise.
- Normalization: Lowercasing and Lemmatization (WordNet) to reduce sparsity.
- Privacy: Automated stripping of URLs and user mentions (`@user`) to prevent overfitting to specific handles.
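The cleaning steps above can be sketched with the standard library alone. This is an illustrative version, not the project's exact regexes, and it omits the WordNet lemmatization step (which requires NLTK):

```python
# Minimal sketch of the cleaning pipeline: HTML decoding, URL and
# mention stripping, "via" removal, lowercasing, whitespace collapse.
# The real pipeline also applies WordNet lemmatization via NLTK.
import html
import re

def clean_tweet(text: str) -> str:
    text = html.unescape(text)                       # decode &amp; etc.
    text = re.sub(r"https?://\S+", "", text)         # strip URLs (privacy)
    text = re.sub(r"@\w+", "", text)                 # strip user mentions
    text = re.sub(r"\bvia\b", "", text, flags=re.I)  # news-sharing term
    text = text.lower()                              # normalize case
    return re.sub(r"\s+", " ", text).strip()         # collapse whitespace

print(clean_tweet("Forest fire near La Ronge &amp; area via @CBCNews http://t.co/x"))
# -> forest fire near la ronge & area
```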
We compared three distinct vectorization strategies:
- TF-IDF (Sparse): Captures explicit keyword signals (e.g., "Hiroshima", "flood").
- Word2Vec (Dense - Custom): Trained from scratch on the dataset (demonstrates limitations of small-data embeddings).
- GloVe (Dense - Pre-trained): Utilized pre-trained GloVe Twitter embeddings (`glove.twitter.27B`, 100d) to leverage Transfer Learning from roughly two billion tweets.
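To see why the sparse TF-IDF features carry strong keyword signal, here is a hand-rolled sketch of the basic weighting formula (standard library only, toy corpus): rare disaster terms get high weights, ubiquitous words get zero. The project's sklearn `TfidfVectorizer` adds smoothing and normalization on top of this:

```python
# Basic TF-IDF: tf(w, d) * log(N / df(w)) on a toy three-tweet corpus.
import math
from collections import Counter

docs = [
    "flood warning issued for the city",
    "the party is on fire",
    "fire crews battle the flood downtown",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)
# Document frequency: number of docs each word appears in.
df = Counter(w for toks in tokenized for w in set(toks))

def tfidf(word: str, toks: list[str]) -> float:
    tf = toks.count(word) / len(toks)
    idf = math.log(N / df[word])
    return tf * idf

# "warning" (1 doc) outweighs "flood" (2 docs); "the" (all docs) scores 0.
for w in ["warning", "flood", "the"]:
    print(w, round(tfidf(w, tokenized[0]), 3))
```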
We moved beyond baseline defaults by implementing rigorous experimental controls:
- Baseline: Multinomial Naive Bayes.
- Classical ML (Tuned): Logistic Regression optimized via GridSearchCV (5-Fold Cross-Validation) to tune Regularization (`C`) and Solvers.
- Deep Learning: A custom Neural Network (Keras/TensorFlow) architecture featuring:
- Frozen GloVe Embedding Layer
- GlobalAveragePooling1D
- Dense Layers with ReLU activation & Dropout for regularization.
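The frozen-embedding-plus-pooling front end of the network can be shown in plain NumPy: each tweet becomes the mean of its (non-trainable) word vectors, which is exactly what `GlobalAveragePooling1D` computes over an embedding lookup. The toy 4-d matrix here stands in for the 100-d GloVe vectors:

```python
# Frozen embedding lookup + global average pooling, in NumPy.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 6, 4
embeddings = rng.normal(size=(vocab_size, embed_dim))  # frozen lookup table

def encode(token_ids: list[int]) -> np.ndarray:
    # Lookup -> (seq_len, embed_dim), then mean over the sequence axis,
    # matching GlobalAveragePooling1D on a frozen Embedding layer.
    return embeddings[token_ids].mean(axis=0)

vec = encode([1, 3, 5])
print(vec.shape)  # -> (4,)
```

The pooled vector then feeds the trainable Dense/ReLU/Dropout stack.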
The project compared Generative vs. Discriminative models and Sparse vs. Dense features.
| Model | Feature Set | Optimization | Performance Notes |
|---|---|---|---|
| Logistic Regression | TF-IDF | GridSearchCV | Top Performer. Excellent balance of Precision/Recall. |
| Deep Learning | GloVe (Transfer Learning) | Adam Optimizer | Competitive. Captures semantic meaning but computationally heavier. |
| Logistic Regression | Word2Vec (Custom) | Default | Underperformed due to small training corpus size. |
| Naive Bayes | TF-IDF | Default | Strong baseline but struggles with context/sarcasm. |
Key Insight: While Deep Learning is powerful, TF-IDF with Tuned Logistic Regression proved highly effective for this specific dataset size, highlighting that complex models are not always better for short-text classification.
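The winning setup can be sketched as a scikit-learn pipeline, assuming sklearn is installed. The corpus and labels here are toy examples, and 2-fold CV keeps it runnable; the project used the full dataset with 5-fold CV:

```python
# TF-IDF + Logistic Regression tuned with GridSearchCV (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

texts = [
    "flood warning issued downtown", "earthquake hits the coast",
    "wildfire spreads near homes", "storm causes major damage",
    "this party is on fire", "that movie was a disaster",
    "my mixtape is straight fire", "what a bomb outfit",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = real disaster, 0 = metaphor

pipe = Pipeline([("tfidf", TfidfVectorizer()),
                 ("clf", LogisticRegression(max_iter=1000))])
grid = GridSearchCV(pipe,
                    {"clf__C": [0.1, 1.0, 10.0],
                     "clf__solver": ["liblinear", "lbfgs"]},
                    cv=2)  # project used cv=5 on the full dataset
grid.fit(texts, labels)
print(grid.best_params_)
```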
- ✅ Text Cleaning: Regex + HTML decoding + Custom Stopwords.
- ✅ Visualization: WordClouds, N-Gram Bar Charts, and Embedding PCA Clusters.
- ✅ Transfer Learning: Integration of pre-trained GloVe vectors.
- ✅ Hyperparameter Tuning: `GridSearchCV` for optimal model configuration.
- ✅ Generative AI: Simple Markov Chain text generator (Bonus Task).
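The bonus Markov Chain generator works like this toy bigram version (standard library only, illustrative seed corpus): record each word's observed successors, then sample a chain:

```python
# Bigram Markov chain: map each word to its observed successors,
# then walk the chain by sampling the next word at random.
import random
from collections import defaultdict

corpus = "flood warning issued . fire crews respond . flood waters rise".split()
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(start: str, length: int = 5, seed: int = 42) -> list[str]:
    random.seed(seed)  # seeded for reproducibility
    out = [start]
    for _ in range(length - 1):
        options = transitions.get(out[-1])
        if not options:  # dead end: word has no observed successor
            break
        out.append(random.choice(options))
    return out

print(" ".join(generate("flood")))
```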
Ensure you have the dependencies installed:

```bash
pip install -r ./requirements.txt
```