SMS Spam Classification using NLP

Overview

This project builds a Natural Language Processing (NLP) model that classifies SMS messages as Spam or Ham (non-spam).

It demonstrates the text preprocessing and feature extraction techniques commonly used in NLP applications.

Problem Statement

Spam messages are unsolicited messages that degrade the user experience and can pose security risks. The objective of this project is to build a machine learning model that automatically identifies whether an SMS message is spam or ham.

Objectives

Perform text preprocessing.
Clean and normalize text data.
Convert text into numerical features.
Train machine learning models.
Classify messages into Spam or Ham categories.

NLP Techniques Used

Text Preprocessing

Lowercase conversion
Tokenization
Stopword Removal
Stemming
Lemmatization
Regular Expressions
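
The first few steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the notebook's actual code: the `preprocess` function and the tiny `STOPWORDS` set are hypothetical stand-ins, and a real pipeline would more likely use NLTK's stopword list and tokenizer. Stemming and lemmatization are left out here because they depend on NLTK.

```python
import re

# Tiny illustrative stopword set (an assumption for this sketch);
# NLTK's English stopword list is considerably longer.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "you", "your",
             "for", "of", "and", "in", "on", "now"}


def preprocess(message: str) -> list[str]:
    """Lowercase, strip non-letters with a regex, tokenize, drop stopwords."""
    text = message.lower()                   # lowercase conversion
    text = re.sub(r"[^a-z\s]", " ", text)    # regex: keep only letters and whitespace
    tokens = text.split()                    # simple whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]  # stopword removal


print(preprocess("WINNER!! You have won a FREE prize. Call 08001234567 now"))
# ['winner', 'have', 'won', 'free', 'prize', 'call']
```

Note that the regex also discards digits, so the phone number disappears. Whether that is desirable depends on the task, since the presence of a number can itself be a useful spam signal.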

Feature Engineering

Bag of Words
CountVectorizer
TF-IDF Vectorizer
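
To make the difference between these representations concrete, here is a small hand-rolled sketch in plain Python. The toy `docs` and the helper names `bag_of_words` and `tfidf` are made up for illustration. The IDF term uses the smoothed formula that scikit-learn documents as its default, ln((1 + n) / (1 + df)) + 1, but the L2 row normalization that `TfidfVectorizer` also applies by default is omitted, so the numbers will not match scikit-learn's output exactly.

```python
import math
from collections import Counter

# Toy corpus of already-tokenized messages (fabricated for illustration).
docs = [
    ["free", "prize", "call"],
    ["call", "me", "later"],
    ["free", "free", "entry"],
]

# Vocabulary: sorted unique tokens, one vector column per term.
vocab = sorted({tok for doc in docs for tok in doc})


def bag_of_words(doc):
    """Raw term counts per vocabulary entry (what a count vectorizer produces)."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def tfidf(doc):
    """Term count weighted by smoothed inverse document frequency (no normalization)."""
    n = len(docs)
    counts = Counter(doc)
    row = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)   # documents containing the term
        idf = math.log((1 + n) / (1 + df)) + 1   # smoothed IDF
        row.append(counts[term] * idf)
    return row


print(vocab)                  # ['call', 'entry', 'free', 'later', 'me', 'prize']
print(bag_of_words(docs[2]))  # [0, 1, 2, 0, 0, 0]
print(tfidf(docs[2]))
```

In the third message, "entry" appears in only one document and so gets a higher IDF than "free", which appears in two. TF-IDF therefore down-weights terms that are common across the corpus, while Bag of Words treats all counts equally.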

Libraries Used

Python
Pandas
NumPy
NLTK
Scikit-Learn
Matplotlib
Seaborn

Workflow

Data Cleaning
Exploratory Data Analysis
Text Preprocessing
Tokenization
Stemming and Lemmatization
Feature Extraction
Model Building
Model Evaluation
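
The last two steps, Model Building and Model Evaluation, can be illustrated with a small self-contained sketch. This README does not say which classifier the notebook uses, so the multinomial Naive Bayes below is an assumption, chosen because it is a common baseline for text classification. The toy data and the helpers `fit`, `predict`, and `precision_recall` are hypothetical. In practice the scikit-learn equivalents would replace this hand-written code.

```python
import math
from collections import Counter

# Toy labeled messages, already tokenized (fabricated for illustration only).
train = [
    (["free", "prize", "call", "now"], "spam"),
    (["win", "free", "cash"], "spam"),
    (["call", "me", "later"], "ham"),
    (["see", "you", "at", "lunch"], "ham"),
]


def fit(samples):
    """Count documents per class and token occurrences per class."""
    class_counts = Counter(label for _, label in samples)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in samples:
        word_counts[label].update(tokens)
    vocab = {tok for tokens, _ in samples for tok in tokens}
    return class_counts, word_counts, vocab


def predict(tokens, model):
    """Pick the class with the highest log-posterior, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall(y_true, y_pred, positive="spam"):
    """Precision and recall for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


model = fit(train)
held_out = [(["free", "cash", "now"], "spam"), (["lunch", "later"], "ham")]
y_true = [label for _, label in held_out]
y_pred = [predict(tokens, model) for tokens, _ in held_out]
print(y_pred, precision_recall(y_true, y_pred))
```

Precision and recall on the spam class are usually more informative than accuracy alone. A ham message wrongly flagged as spam (low precision) and a spam message that slips through (low recall) carry different costs for the user.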

Project Structure

sms-spam-classification-nlp
│
├── data/
├── sms_spam_classifier.ipynb
├── requirements.txt
├── README.md
└── images/

Skills Demonstrated

Natural Language Processing
Text Cleaning
Tokenization
Regular Expressions
Stemming
Lemmatization
Count Vectorization
TF-IDF Vectorization
Machine Learning
Feature Engineering

Applications

Email Spam Filtering
SMS Spam Detection
Chatbots
Sentiment Analysis
Text Classification
Information Retrieval

Future Improvements

Word2Vec Embeddings
GloVe Embeddings
LSTM Models
BERT Transformers
Hyperparameter Tuning
Model Deployment using Streamlit

Author

Deebesh Sundar

Machine Learning & Data Science Practitioner
