
ravali6132/Spam_Email_Classifier


📧 Email Spam Classifier

📌 Project Overview

The Email Spam Classifier is a machine learning project that uses natural language processing (NLP) techniques to classify an email or text message as spam or ham (not spam).

This project uses:

  • Python for implementation
  • Pandas & NumPy for data handling
  • TF-IDF Vectorization for text feature extraction
  • Logistic Regression for classification
  • Scikit-learn for the vectorizer, the classifier, the train/test split, and evaluation

The model is trained on a dataset containing labeled email messages and predicts whether a given message is spam or not.


🎯 Objectives

  • Preprocess email/message text data
  • Convert text into numerical features using TF-IDF
  • Train a machine learning model for spam detection
  • Evaluate the model's accuracy
  • Predict whether new messages are spam or ham

🛠️ Technologies Used

  • Python
  • Jupyter Notebook
  • NumPy
  • Pandas
  • Scikit-learn

📂 Project Structure

Email-Spam-Classifier/
│
├── Email Spam Classifier.ipynb   # Main project notebook
├── mail_data.csv                 # Dataset used for training
├── README.md                     # Project documentation

📊 Dataset Information

The dataset contains:

  • Category → Label indicating whether the message is spam or ham
  • Message → The actual email/text message

Example:

Category   Message
ham        Hello, how are you?
spam       Congratulations! You won a prize.

⚙️ Workflow of the Project

1. Import Required Libraries

The project imports libraries for:

  • Data manipulation
  • Machine learning
  • Text vectorization
  • Model evaluation
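The notebook's exact import cell is not reproduced in this README. A plausible version covering those four groups, using only the libraries named above, might look like this:

```python
# Data manipulation
import numpy as np
import pandas as pd

# Text vectorization
from sklearn.feature_extraction.text import TfidfVectorizer

# Machine learning: data splitting and the classifier
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Model evaluation
from sklearn.metrics import accuracy_score
```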

2. Load Dataset

The dataset is loaded using Pandas.

import pandas as pd

df = pd.read_csv('mail_data.csv')
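A quick way to confirm the file loaded as expected is to inspect its shape, column names, and class balance. This sketch substitutes a hypothetical in-memory CSV built from the two example rows in the dataset section so it runs on its own; in the project, pass 'mail_data.csv' to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for mail_data.csv, built from the two example rows
# (the real file contains many more messages)
csv_text = """Category,Message
ham,"Hello, how are you?"
spam,Congratulations! You won a prize.
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                       # (number of rows, number of columns)
print(df.columns.tolist())            # ['Category', 'Message']
print(df["Category"].value_counts())  # how many ham vs. spam messages
```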

3. Data Preprocessing

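  • Handle missing values

  • Convert labels:

    • spam → 0
    • ham → 1

Missing values are replaced with empty strings:

# Replace NaN entries with empty strings so every message is a string
data = df.where(pd.notnull(df), '')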
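The label conversion listed above is not shown in code. One way to do it, assuming the columns are named Category and Message as in the dataset description, is Series.map, which yields an integer column that scikit-learn accepts directly. The notebook may use a different approach (for example, assigning with .loc). This sketch also defines the X and Y used in the train/test split step; the three-row DataFrame is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for the preprocessed DataFrame `data`
data = pd.DataFrame({
    "Category": ["ham", "spam", "ham"],
    "Message": ["Hello, how are you?", "Congratulations! You won a prize.", ""],
})

# Encode labels as described above: spam -> 0, ham -> 1
data["Category"] = data["Category"].map({"spam": 0, "ham": 1})

# Separate the message text (features) from the labels (target)
X = data["Message"]
Y = data["Category"]

print(Y.tolist())  # [1, 0, 1]
```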

4. Feature Extraction

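Text messages are converted into numerical vectors using scikit-learn's TfidfVectorizer. All text is lowercased, common English stop words are removed, and min_df=1 keeps every term that appears in at least one message.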
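For reference, scikit-learn's TfidfVectorizer with its default weighting settings (smooth_idf=True, sublinear_tf=False, norm='l2') scores term t in message d as:

$$
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t),
\qquad
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1
$$

where tf(t, d) is the number of times t occurs in d, n is the number of messages the vectorizer was fitted on, and df(t) is how many of those messages contain t. Each message vector is then divided by its Euclidean (L2) norm, so long and short messages end up on the same scale.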

from sklearn.feature_extraction.text import TfidfVectorizer

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
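Creating the vectorizer does not compute any features yet. A common pattern, sketched below on hypothetical messages, is to fit it on the training messages only and reuse the learned vocabulary for the test messages, so no information from the test set leaks into training. This assumes the train/test split has already produced X_train and X_test. Note that min_df=1 and lowercase=True are scikit-learn's defaults, so stop_words='english' is the only setting here that changes the default behavior.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical messages standing in for X_train and X_test
X_train = ["Congratulations! You won a free prize", "Are we still meeting for lunch today?"]
X_test = ["Claim your free prize now"]

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)

# Learn the vocabulary and IDF weights from the training messages only...
X_train_features = feature_extraction.fit_transform(X_train)

# ...then reuse them for the test messages (words unseen in training are ignored)
X_test_features = feature_extraction.transform(X_test)

print(X_train_features.shape)  # (2, size of the learned vocabulary)
```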

5. Split Dataset

The messages (X) and their labels (Y) are split into a training set (80%) and a test set (20%); random_state=3 makes the split reproducible.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=3)
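The effect of test_size and random_state can be seen on a small hypothetical dataset of ten messages:

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 messages with alternating labels (0 = spam, 1 = ham)
X = [f"message {i}" for i in range(10)]
Y = [i % 2 for i in range(10)]

# 20% of the rows go to the test set; random_state fixes the shuffle,
# so rerunning the split yields the same partition
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=3)

print(len(X_train), len(X_test))  # 8 2
```

Passing stratify=Y (not shown in the original code) keeps the spam/ham ratio the same in both sets, which can matter when one class is much rarer than the other.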

6. Train the Model

The classifier is a Logistic Regression model, trained on the TF-IDF features of the training messages (X_train_features).

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train_features, Y_train)
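The two lines above assume X_train_features already holds the TF-IDF matrix of the training messages. A self-contained sketch on hypothetical toy messages, using scikit-learn's defaults for LogisticRegression (L2 regularization, lbfgs solver):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training set (0 = spam, 1 = ham, matching the label encoding above)
X_train = [
    "Congratulations! You won a free prize, claim now",
    "Win cash now, click this link to claim",
    "Are we still meeting for lunch tomorrow?",
    "Can you send me the notes from class?",
]
Y_train = [0, 0, 1, 1]

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
X_train_features = feature_extraction.fit_transform(X_train)

# LogisticRegression accepts the sparse TF-IDF matrix directly
model = LogisticRegression()
model.fit(X_train_features, Y_train)

print(model.classes_)  # [0 1]
```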

7. Model Evaluation

Accuracy, the fraction of messages classified correctly, is calculated with scikit-learn's accuracy_score:

from sklearn.metrics import accuracy_score
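A minimal sketch of the calculation on hypothetical labels. In the notebook the predictions would come from model.predict on the test features; the variable names here are illustrative.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions (0 = spam, 1 = ham)
Y_test = [1, 1, 0, 1, 0]
prediction_on_test_data = [1, 0, 0, 1, 0]

# 4 of the 5 predictions match the true labels
accuracy_on_test_data = accuracy_score(Y_test, prediction_on_test_data)
print(accuracy_on_test_data)  # 0.8
```

Comparing accuracy on the training and test sets helps spot overfitting. If spam is the minority class in the dataset, scikit-learn's classification_report (per-class precision and recall) gives a fuller picture than accuracy alone.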

8. Prediction

The trained model predicts whether a new email is spam or ham.
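A minimal sketch of this step, assuming a fitted vectorizer named feature_extraction and a fitted model named model from the earlier steps (recreated here on hypothetical toy data so the block runs on its own). The helper name classify and the label string "Ham Mail" are illustrative; this README only shows "Spam Mail" as an output.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training data (0 = spam, 1 = ham, as in the label encoding above)
X_train = [
    "Congratulations! You won a free prize, claim it now",
    "Win cash today, click the link to claim your reward",
    "Are we still meeting for lunch tomorrow?",
    "Can you send me the notes from class?",
]
Y_train = [0, 0, 1, 1]

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
model = LogisticRegression()
model.fit(feature_extraction.fit_transform(X_train), Y_train)


def classify(message: str) -> str:
    """Return 'Spam Mail' or 'Ham Mail' for one message (hypothetical helper)."""
    # Reuse the already-fitted vectorizer; calling fit_transform here would build
    # a new vocabulary that no longer matches the model's coefficients
    input_data_features = feature_extraction.transform([message])
    prediction = model.predict(input_data_features)[0]
    return "Ham Mail" if prediction == 1 else "Spam Mail"


input_mail = ["Congratulations! You have won a free ticket"]
print(classify(input_mail[0]))
```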


🚀 How to Run the Project

Step 1: Clone the Repository

git clone https://github.com/ravali6132/Spam_Email_Classifier.git

Step 2: Navigate to the Project Folder

cd Spam_Email_Classifier

Step 3: Install Required Libraries

pip install numpy pandas scikit-learn jupyter

Step 4: Run the Jupyter Notebook

jupyter notebook

Then open Email Spam Classifier.ipynb in the Jupyter interface and run the cells in order.

📈 Machine Learning Model Used

Logistic Regression

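Logistic Regression is a supervised machine learning algorithm for binary classification. It applies the logistic (sigmoid) function to a weighted sum of the input features, here the TF-IDF values of a message, to estimate the probability that the message belongs to class 1 (ham, under this project's label encoding).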
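For a message vector x, the model computes P(y = 1 | x) = σ(w·x + b), where σ(z) = 1 / (1 + e^(-z)), w holds one learned weight per vocabulary term, and b is the intercept. The sketch below checks this formula against scikit-learn's predict_proba on hypothetical toy messages.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data (0 = spam, 1 = ham)
X_train = ["win a free prize now", "claim your cash reward",
           "see you at lunch", "notes from today's class"]
Y_train = [0, 0, 1, 1]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(X_train)
model = LogisticRegression().fit(X, Y_train)

# Linear score z = w·x + b for each message (densified for clarity on toy data)
z = X.toarray() @ model.coef_.ravel() + model.intercept_[0]

# The logistic (sigmoid) function maps each score to P(ham | message)
p_ham = 1.0 / (1.0 + np.exp(-z))

print(np.allclose(p_ham, model.predict_proba(X)[:, 1]))  # True
```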

Advantages:

  • Simple and efficient
  • Fast to train, even with thousands of TF-IDF features
  • A strong baseline for text classification
  • Handles high-dimensional, sparse features such as TF-IDF vectors well

✅ Expected Output

The model labels each message as one of:

  • Spam Message
  • Ham Message

📌 Future Improvements

  • Use more advanced NLP preprocessing (for example, stemming, lemmatization, or n-gram features)
  • Implement Deep Learning models
  • Build a web application interface
  • Improve accuracy with larger datasets
  • Add real-time email filtering

📷 Sample Prediction

input_mail = ["Congratulations! You have won a free ticket"]

Output:

Spam Mail

🤝 Contribution

Contributions are welcome.

Steps:

  1. Fork the repository
  2. Create a new branch
  3. Make changes
  4. Submit a pull request
