📧 Email Spam Classifier

📌 Project Overview

The Email Spam Classifier is a Machine Learning project that detects whether an email/message is Spam or Ham (Not Spam) using Natural Language Processing (NLP) techniques.

This project uses:

Python for implementation
Pandas & NumPy for data handling
TF-IDF Vectorization for text feature extraction
Logistic Regression for classification
Scikit-learn for machine learning utilities

The model is trained on a dataset containing labeled email messages and predicts whether a given message is spam or not.

🎯 Objectives

Preprocess email/message text data
Convert text into numerical features using TF-IDF
Train a machine learning model for spam detection
Evaluate the model accuracy
Predict whether new messages are spam or ham

🛠️ Technologies Used

Python
Jupyter Notebook
NumPy
Pandas
Scikit-learn

📂 Project Structure

Email-Spam-Classifier/
│
├── Email Spam Classifier.ipynb   # Main project notebook
├── mail_data.csv                 # Dataset used for training
└── README.md                     # Project documentation

📊 Dataset Information

The dataset contains:

Category → Label indicating whether the message is spam or ham
Message → The actual email/text message

Example:

Category	Message
ham	Hello, how are you?
spam	Congratulations! You won a prize.

⚙️ Workflow of the Project

1. Import Required Libraries

The project imports libraries for:

Data manipulation
Machine learning
Text vectorization
Model evaluation
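As a rough guide, the imports below collect the libraries named in this README into one cell. The grouping and the `np` alias are assumptions; the notebook's actual import cell may be ordered differently.

```python
# Data manipulation
import numpy as np
import pandas as pd

# Text vectorization
from sklearn.feature_extraction.text import TfidfVectorizer

# Machine learning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Model evaluation
from sklearn.metrics import accuracy_score
```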

2. Load Dataset

The dataset is loaded using Pandas.

import pandas as pd

df = pd.read_csv('mail_data.csv')

3. Data Preprocessing

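Replace missing values with empty strings
Convert the Category labels to integers:
- spam → 0
- ham → 1

Under this encoding, ham (not spam) is the positive class, 1.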

data = df.where((pd.notnull(df)), '')
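The line above covers only the missing-value step. A minimal sketch of the label conversion follows, assuming a tiny hand-made DataFrame in place of mail_data.csv; the rows and the use of `Series.map` are illustrative assumptions, not necessarily what the notebook does.

```python
import pandas as pd

# Hypothetical stand-in for mail_data.csv with the same two columns.
df = pd.DataFrame({
    "Category": ["ham", "spam", "ham"],
    "Message": ["Hello, how are you?", "Congratulations! You won a prize.", None],
})

# Replace missing values with empty strings.
data = df.where(pd.notnull(df), "")

# Encode labels using the README's convention: spam -> 0, ham -> 1.
data["Category"] = data["Category"].map({"spam": 0, "ham": 1})

X = data["Message"]   # raw message text
Y = data["Category"]  # integer labels

print(Y.tolist())
```

Here `X` and `Y` are assumed to be the names that the later split step expects.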

4. Feature Extraction

Text messages are converted into numerical vectors using TF-IDF Vectorizer.

from sklearn.feature_extraction.text import TfidfVectorizer

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
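Creating the vectorizer does not yet produce any features. The sketch below shows one common way to apply it, assuming toy message lists in place of the real training and test text: fit on the training messages only, then reuse the fitted vocabulary for the test messages. The name `X_test_features` is an assumption chosen to mirror the `X_train_features` name used in the training step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy messages standing in for the real training and test text.
X_train = [
    "Win a free prize now",
    "Are we still meeting for lunch",
    "Free entry to win cash",
]
X_test = ["Claim your free prize"]

feature_extraction = TfidfVectorizer(min_df=1, stop_words="english", lowercase=True)

# Learn the vocabulary and IDF weights from the training text only.
X_train_features = feature_extraction.fit_transform(X_train)

# Reuse the fitted vocabulary for the test text (no refitting).
X_test_features = feature_extraction.transform(X_test)

print(X_train_features.shape, X_test_features.shape)
```

Fitting on the training text alone keeps information about the test messages out of the learned vocabulary and weights.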

5. Split Dataset

The messages (X) and their labels (Y) are divided into training and testing data, with 20% held out for testing.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=3)
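To make the effect of `test_size=0.2` concrete, here is a small sketch on made-up data. The ten placeholder messages and alternating labels are assumptions for illustration only.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 messages with alternating labels (0 = spam, 1 = ham).
X = [f"message {i}" for i in range(10)]
Y = [i % 2 for i in range(10)]

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=3
)

print(len(X_train), len(X_test))  # 8 2
```

Fixing `random_state` makes the split reproducible. Spam datasets are often imbalanced, so passing `stratify=Y` to keep the spam/ham ratio similar in both sets may be worth considering; the call shown in this README does not use it.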

6. Train the Model

The project fits a Logistic Regression model on the TF-IDF features of the training messages (X_train_features).

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train_features, Y_train)

7. Model Evaluation

Accuracy, the fraction of messages classified correctly, is calculated using accuracy_score:

from sklearn.metrics import accuracy_score
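The import alone does not show how the score is computed. A minimal sketch with made-up labels follows; `y_true` and `y_pred` are hypothetical names used only for this example.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels using the README's encoding: 1 = ham, 0 = spam.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]  # one ham message misclassified as spam

accuracy = accuracy_score(y_true, y_pred)
print(accuracy)  # 0.75
```

In the notebook, the same call would presumably compare `Y_test` with the model's predictions on the transformed test messages. Comparing that score with the accuracy on the training data gives a quick check for overfitting.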

8. Prediction

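The trained model predicts whether a new email is spam or ham. A new message must first be transformed with the same fitted TF-IDF vectorizer used for training, so that its features line up with what the model learned.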

🚀 How to Run the Project

Step 1: Clone the Repository

git clone https://github.com/your-username/email-spam-classifier.git

Step 2: Navigate to the Project Folder

cd email-spam-classifier

Step 3: Install Required Libraries

pip install numpy pandas scikit-learn jupyter

Step 4: Run the Jupyter Notebook

jupyter notebook

Open:

Email Spam Classifier.ipynb

📈 Machine Learning Model Used

Logistic Regression

Logistic Regression is a supervised machine learning algorithm used for binary classification problems.
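For intuition, the sketch below shows how a binary logistic regression turns a linear score into a probability with the sigmoid function. The two-feature toy data is an assumption and is unrelated to the spam dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 2-feature toy data with two arbitrary classes, 0 and 1.
X = np.array([[0.0, 1.0], [0.2, 0.8], [0.9, 0.1], [1.0, 0.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X, y)

# Linear score (weights . features + intercept) for each row.
scores = model.decision_function(X)

# The sigmoid squashes each score into a probability of class 1.
probabilities = 1.0 / (1.0 + np.exp(-scores))

# For a binary problem this matches scikit-learn's own probability output.
matches = np.allclose(probabilities, model.predict_proba(X)[:, 1])
print(matches)
```

A row is assigned class 1 when its probability exceeds 0.5, which is the same as its linear score being positive.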

Advantages:

Simple and efficient
Fast training
Good performance for text classification
Works well with TF-IDF features

✅ Expected Output

For each input message, the model predicts one of two labels:

Spam Mail

OR

Ham Mail

📌 Future Improvements

Use advanced NLP techniques
Implement Deep Learning models
Build a web application interface
Improve accuracy with larger datasets
Add real-time email filtering

📷 Sample Prediction

input_mail = ["Congratulations! You have won a free ticket"]

Output:

Spam Mail
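The sample above can be reproduced end to end with a small sketch. The six training messages are a made-up stand-in for mail_data.csv, and the "Ham Mail" string is an assumption mirroring the "Spam Mail" output shown above; labels follow this README's encoding (spam → 0, ham → 1).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training set; 0 = spam, 1 = ham.
messages = [
    "Congratulations! You won a free prize",
    "Win a free ticket now",
    "Free cash prize, claim now",
    "Are we meeting for lunch today?",
    "Please send me the report",
    "See you at home tonight",
]
labels = [0, 0, 0, 1, 1, 1]

# Fit the vectorizer and the model on the training messages.
feature_extraction = TfidfVectorizer(min_df=1, stop_words="english", lowercase=True)
X_train_features = feature_extraction.fit_transform(messages)

model = LogisticRegression()
model.fit(X_train_features, labels)

# Transform the new message with the already fitted vectorizer, then predict.
input_mail = ["Congratulations! You have won a free ticket"]
input_features = feature_extraction.transform(input_mail)
prediction = model.predict(input_features)

result = "Ham Mail" if prediction[0] == 1 else "Spam Mail"
print(result)
```

With so little training data the result only illustrates the flow; the real notebook trains on the full dataset.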

🤝 Contribution

Contributions are welcome.

Steps:

Fork the repository
Create a new branch
Make changes
Submit a pull request
