🚫 Spam Email Detection Web App

A machine learning project that classifies text messages or emails as "Spam" (unwanted or malicious) or "Ham" (legitimate). It includes raw data preprocessing, exploratory data analysis with interactive visualizations, and a Streamlit web application powered by a custom-trained Scikit-Learn Naive Bayes classifier.

✅ Project Overview & Goals

The primary goal of this application is to serve as both an educational Data Science toolkit and a practical, user-friendly utility for detecting spam.

Preprocess and clean raw text data.
Explore characteristics like message length and word frequency.
Train and evaluate traditional Machine Learning models (MultinomialNB).
Expose the trained model via a Streamlit Web Dashboard.
Ensure non-technical users can interact easily using automation scripts.
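
As a rough illustration of the first two goals, here is a minimal sketch of text cleaning, message length, and word frequency using only the standard library. The sample messages and the helper names `clean_text` and `word_frequencies` are made up for this example; the project's notebooks may clean and explore the data differently.

```python
import re
from collections import Counter


def clean_text(message: str) -> str:
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    lowered = message.lower()
    alnum_only = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return re.sub(r"\s+", " ", alnum_only).strip()


def word_frequencies(messages: list[str]) -> Counter:
    """Count how often each cleaned word appears across the given messages."""
    counts = Counter()
    for message in messages:
        counts.update(clean_text(message).split())
    return counts


# Made-up sample rows (label, text); not taken from data/spam.csv.
samples = [
    ("spam", "WIN a FREE prize!!! Call now"),
    ("ham", "Are we still meeting for lunch today?"),
    ("spam", "Free entry: text WIN to claim your prize"),
]

spam_words = word_frequencies([text for label, text in samples if label == "spam"])
lengths = [(label, len(text)) for label, text in samples]

print(spam_words.most_common(3))  # most frequent words in the spam samples
print(lengths)                    # raw message length per sample
```

Comparing the word counts and lengths per label is the kind of signal the exploratory notebooks look at before any model is trained.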

🚀 Quick Start (Running the App)

Clickable scripts are provided so anyone can run the web app without reading the underlying code.

On Windows: Double-click run_app.bat inside the project folder. It installs the requirements, trains the ML model, and opens the app in your browser.

On Mac / Linux: Open your terminal, navigate to the project folder, and run:

bash run_app.sh

📦 Build & Deployment (For Developers)

If you are a developer looking to explore the code, modify the data, or deploy this project to a server, follow these steps.

Local Development Setup:

Clone the repository:

git clone https://github.com/PiYuSh7-2/spam-email-detection.git
cd spam-email-detection

Create and activate a Python virtual environment:

# Windows
python -m venv venv
venv\Scripts\activate

# Mac/Linux
python3 -m venv venv
source venv/bin/activate

Install required packages:
```
pip install -r requirements.txt
```
Train the machine learning model. This reads the CSV, extracts features, and saves the .pkl files:
```
python train_model.py
```
Launch the Streamlit application locally:
```
streamlit run app.py
```
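
To make the training step more concrete, the following is a minimal sketch of what a script like train_model.py might do: read labelled rows, fit a `CountVectorizer` and a `MultinomialNB`, and pickle both. The inline CSV, its column names (`label`, `text`), the helper `train_and_save`, and the use of `pickle` are assumptions for illustration; the real script and dataset layout may differ.

```python
import csv
import io
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Stand-in for data/spam.csv. The column names are hypothetical.
RAW_CSV = """label,text
spam,Win a free prize now
spam,Free cash offer call now
ham,See you at lunch today
ham,Can you send me the report
"""


def train_and_save(csv_text: str, models_dir: Path) -> None:
    """Fit the vectorizer and classifier, then write both as .pkl files."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    texts = [row["text"] for row in rows]
    labels = [1 if row["label"] == "spam" else 0 for row in rows]  # 1 = Spam, 0 = Ham

    vectorizer = CountVectorizer()
    features = vectorizer.fit_transform(texts)  # bag-of-words counts
    model = MultinomialNB()
    model.fit(features, labels)

    models_dir.mkdir(parents=True, exist_ok=True)
    with open(models_dir / "count_vectorizer.pkl", "wb") as f:
        pickle.dump(vectorizer, f)
    with open(models_dir / "spam_classifier_model.pkl", "wb") as f:
        pickle.dump(model, f)


# Write into a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "models"
    train_and_save(RAW_CSV, out_dir)
    saved = sorted(p.name for p in out_dir.iterdir())

print(saved)
```

Saving the fitted vectorizer alongside the model matters because the app has to map new text onto exactly the vocabulary the model was trained on.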

Production Deployment Notes:

To deploy this application publicly (e.g., via Heroku, Render, or Streamlit Cloud):

The requirements.txt file is intended to work as-is in standard deployment environments.
Either commit the trained artifacts (models/spam_classifier_model.pkl and models/count_vectorizer.pkl) to your repository, or run train_model.py as a pre-build step before the server starts.
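
One way to cover both options is a small startup check that trains only when an artifact is missing. The sketch below is an assumption about how such a check could look, not part of the repository; `ensure_artifacts`, `missing_artifacts`, and the `fake_train` stand-in are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Callable

# File names taken from the deployment note above.
ARTIFACT_NAMES = ("spam_classifier_model.pkl", "count_vectorizer.pkl")


def missing_artifacts(models_dir: Path) -> list[str]:
    """Return the artifact file names that are not present in models_dir."""
    return [name for name in ARTIFACT_NAMES if not (models_dir / name).is_file()]


def ensure_artifacts(models_dir: Path, train: Callable[[], None]) -> bool:
    """Call train() only if an artifact is missing. Returns True if training ran."""
    if not missing_artifacts(models_dir):
        return False
    train()
    still_missing = missing_artifacts(models_dir)
    if still_missing:
        raise FileNotFoundError(f"Training did not produce: {still_missing}")
    return True


# Demo with a fake trainer that just creates empty files in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp) / "models"

    def fake_train() -> None:
        models_dir.mkdir(exist_ok=True)
        for name in ARTIFACT_NAMES:
            (models_dir / name).write_bytes(b"")

    first_run = ensure_artifacts(models_dir, fake_train)   # artifacts missing, trains
    second_run = ensure_artifacts(models_dir, fake_train)  # artifacts present, skips

print(first_run, second_run)
```

In a real deployment the `train` callable could wrap `subprocess.run([sys.executable, "train_model.py"], check=True)`, which is also what the FileNotFoundError entry in the FAQ below amounts to.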

📌 Tech Stack & Architecture

| Component | Tool/Library | Purpose |
| --- | --- | --- |
| Core Logic | Python | Main programming language |
| Data Handling | Pandas | Data loading, manipulation, and CSV parsing |
| Machine Learning | Scikit-learn | Feature extraction (`CountVectorizer`), classification (`MultinomialNB`) |
| Front-End/UI | Streamlit | Rapid web application interface |
| Visualizations | Seaborn, Matplotlib, Plotly | Static and interactive charts |
| Text Viz | WordCloud | Highlighting common spam vs ham vocabulary |

Architectural Flow: app.py acts as the frontend controller. It loads the pre-trained .pkl artifacts (saved locally by train_model.py). When a user types text in the UI, it passes the text through the CountVectorizer object to create a feature array, which is then fed to the MultinomialNB model object to predict 1 (Spam) or 0 (Ham).
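
The prediction path described above can be sketched as follows. To stay self-contained, the example fits a tiny model on made-up messages and round-trips it through `pickle` in memory instead of reading the real .pkl files; the `classify` helper is a hypothetical name, and app.py may structure this differently.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training messages; 1 = Spam, 0 = Ham.
train_texts = [
    "win a free prize now",
    "free cash offer call now",
    "claim your free prize today",
    "see you at lunch today",
    "can you send me the report",
    "are we still meeting tomorrow",
]
train_labels = [1, 1, 1, 0, 0, 0]

vectorizer = CountVectorizer().fit(train_texts)
model = MultinomialNB().fit(vectorizer.transform(train_texts), train_labels)

# Round-trip through pickle to mimic loading the saved artifacts.
vectorizer = pickle.loads(pickle.dumps(vectorizer))
model = pickle.loads(pickle.dumps(model))


def classify(message: str) -> str:
    """Vectorize one message and map the predicted class to a label."""
    features = vectorizer.transform([message])  # 1 x vocabulary-size count matrix
    prediction = model.predict(features)[0]     # 1 (Spam) or 0 (Ham)
    return "Spam" if prediction == 1 else "Ham"


print(classify("free prize call now"))
print(classify("see you at the meeting tomorrow"))
```

Note that the vectorizer is only asked to `transform` at prediction time; refitting it on user input would change the vocabulary and break the mapping the model expects.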

📂 Project Structure Overview

spam-email-detection/
│
├── data/
│   ├── spam.csv                 # Raw dataset
│   └── cleaned_data.csv         # Processed dataset (generated via notebooks)
│
├── models/                      # Generated automatically by train_model.py
│   ├── count_vectorizer.pkl     # Saved feature extractor
│   └── spam_classifier_model.pkl  # Saved Naive Bayes model
│
├── notebooks/                   # Jupyter exploratory notebooks
│   ├── data_preprocessing_and_visualization.ipynb
│   └── visualizations_and_storytelling.ipynb
│
├── interactive/                 # Exported interactive dashboard files
│   └── interactive_plot.html
│
├── app.py                       # Main Streamlit web application
├── train_model.py               # Script to train ML model & save .pkl files
├── run_app.bat                  # Automation script for Windows
├── run_app.sh                   # Automation script for Mac/Linux
├── requirements.txt             # Dependency definitions
└── README.md                    # Project documentation

🧪 Testing

There is currently no formal test suite (e.g., pytest) for the front-end components. You can, however, check the model's performance at training time: running python train_model.py logs the test set accuracy metrics to the terminal. The baseline Naive Bayes model achieves ~98.5% accuracy.

To run informal tests on the ML pipeline:

python train_model.py
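
For readers curious how a test set accuracy figure of this kind is typically computed, here is a minimal sketch with a held-out split. The messages are made up and far too few to say anything about the real model; the split size and `random_state` are arbitrary choices, and train_model.py may evaluate differently.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up messages; 1 = Spam, 0 = Ham.
texts = [
    "win a free prize now",
    "free cash offer call now",
    "claim your free prize today",
    "urgent free offer claim cash",
    "see you at lunch today",
    "can you send me the report",
    "are we still meeting tomorrow",
    "thanks for the report see you tomorrow",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out a quarter of the data, keeping the spam/ham ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Fit the vectorizer on the training split only, so the test split stays unseen.
vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(X_train), y_train)

predictions = model.predict(vectorizer.transform(X_test))
accuracy = accuracy_score(y_test, predictions)
print(f"Test set accuracy: {accuracy:.3f}")
```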

🤝 Contribution Guidelines & Code of Conduct

Fork the Repository and clone it locally.
Create a Feature Branch (git checkout -b feature/AmazingFeature).
Commit your changes. Focus on clear, concise, descriptive commit messages.
Push to the Branch (git push origin feature/AmazingFeature).
Open a Pull Request referencing the issue you are fixing.

Code of Conduct: Please maintain a friendly and collaborative environment. Be respectful when creating issues or participating in Pull Request reviews.

📝 Known Issues & FAQ

Q: "I am getting Command 'python' not found when running the .sh script on Linux." Fix: You may have python3 installed rather than python. Open run_app.sh in a text editor and change occurrences of python to python3.

Q: "The app crashes on startup saying FileNotFoundError: No such file or directory: 'models/spam_classifier_model.pkl'." Fix: The app cannot find the trained model. Run python train_model.py at least once before executing streamlit run app.py (the .bat/.sh scripts handle this automatically).

Q: "Does this filter catch modern phishing links?" A: Not reliably. The current dataset consists mostly of older SMS spam and short promotional emails, so highly sophisticated, context-aware modern phishing emails may slip through.

🔄 Migration Note

v2.0 (Current): The project has moved from a collection of Jupyter Notebooks for exploratory data analysis to a standalone Streamlit web application that loads persisted Scikit-Learn .pkl model artifacts.

Authors & Licensing

Built and maintained by Piyushxgit & Contributors.

Thank you for exploring!
