This project builds a Natural Language Processing (NLP) model that classifies SMS messages as Spam or Ham (non-spam).
It demonstrates text preprocessing techniques and feature extraction methods commonly used in NLP applications.
Spam messages are unwanted messages that can harm user experience and security. The objective of this project is to build a machine learning model that automatically labels each SMS message as spam or ham.
- Perform text preprocessing.
- Clean and normalize text data.
- Convert text into numerical features.
- Train machine learning models.
- Classify messages into Spam or Ham categories.
- Lowercase Conversion
- Tokenization
- Stopword Removal
- Stemming
- Lemmatization
- Regular Expressions
- Bag of Words
- CountVectorizer
- TF-IDF Vectorizer
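
Several of the techniques listed above (lowercase conversion, regular expressions, tokenization, stopword removal, and Bag of Words) can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the function names `preprocess` and `bag_of_words` are hypothetical, and the small `STOPWORDS` set is an assumption standing in for a fuller list such as the one NLTK provides. Stemming and lemmatization are left out because they need a library such as NLTK.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (an assumption); a real list is much larger.
STOPWORDS = {"a", "an", "the", "to", "you", "your", "is", "for", "on", "of", "and", "have"}


def preprocess(message):
    """Lowercase, strip non-letters with a regex, tokenize, drop stopwords."""
    text = message.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # replace digits and punctuation with spaces
    tokens = text.split()                  # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(messages):
    """Return a sorted vocabulary and one row of token counts per message."""
    docs = [preprocess(m) for m in messages]
    vocab = sorted({token for doc in docs for token in doc})
    counts = [Counter(doc) for doc in docs]
    matrix = [[c[token] for token in vocab] for c in counts]
    return vocab, matrix


vocab, matrix = bag_of_words(["Free prize, free entry", "Call me for the prize"])
print(vocab)   # ['call', 'entry', 'free', 'me', 'prize']
print(matrix)  # [[0, 1, 2, 0, 1], [1, 0, 0, 1, 1]]
```

Each row of the matrix is the count vector for one message, which is the same representation that Scikit-Learn's `CountVectorizer` produces in sparse form.
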
- Python
- Pandas
- NumPy
- NLTK
- Scikit-Learn
- Matplotlib
- Seaborn
- Data Cleaning
- Exploratory Data Analysis
- Text Preprocessing
- Tokenization
- Stemming and Lemmatization
- Feature Extraction
- Model Building
- Model Evaluation
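
The feature extraction, model building, and model evaluation steps above could look roughly like the following Scikit-Learn sketch. It is an assumption-laden illustration: the toy messages are invented for the example rather than taken from the project's dataset, the helper name `train_spam_classifier` is hypothetical, and Multinomial Naive Bayes is one common choice for text classification, not necessarily the model used in the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy data for illustration only (not the project's dataset).
train_messages = [
    "WIN a free prize now, claim your cash reward",
    "Free entry! Text WIN to claim your prize",
    "Urgent! You have won cash, call now to claim",
    "Claim your free ringtone now",
    "Are we still meeting for lunch today?",
    "I'll call you when I get home",
    "Can you pick up milk on the way home",
    "See you at the meeting tomorrow",
]
train_labels = ["spam"] * 4 + ["ham"] * 4


def train_spam_classifier(messages, labels):
    """Fit a TF-IDF + Multinomial Naive Bayes pipeline on raw text."""
    # TfidfVectorizer lowercases and tokenizes by default.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(messages, labels)
    return model


model = train_spam_classifier(train_messages, train_labels)

# Evaluate on messages the model has not seen during training.
test_messages = ["claim your free prize", "see you at lunch"]
test_labels = ["spam", "ham"]
predictions = model.predict(test_messages)
accuracy = accuracy_score(test_labels, predictions)
print(list(predictions), accuracy)
```

On a real dataset, the held-out set would come from a split such as `sklearn.model_selection.train_test_split`, and precision and recall for the spam class are usually more informative than accuracy because spam is typically the minority class.
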
sms-spam-classification-nlp
│
├── data/
├── sms_spam_classifier.ipynb
├── requirements.txt
├── README.md
└── images/
- Natural Language Processing
- Text Cleaning
- Tokenization
- Regular Expressions
- Stemming
- Lemmatization
- Count Vectorization
- TF-IDF Vectorization
- Machine Learning
- Feature Engineering
- Email Spam Filtering
- SMS Spam Detection
- Chatbots
- Sentiment Analysis
- Text Classification
- Information Retrieval
- Word2Vec Embeddings
- GloVe Embeddings
- LSTM Models
- BERT Transformers
- Hyperparameter Tuning
- Model Deployment using Streamlit
Deebesh Sundar
Machine Learning & Data Science Practitioner