This project is a spam classification web app built using:
- Google Pretrained Word2Vec (300d)
- Average Word Embeddings
- XGBoost Classifier
- Streamlit Web Interface
Features:
- Semantic word embeddings using pretrained Word2Vec
- XGBoost model for classification
- Clean modular architecture
- Confidence score display
- Streamlit-based UI
How it works:
- Text preprocessing (tokenization, stopword removal, lemmatization)
- Convert words → Average Word2Vec embedding
- XGBoost prediction
- Spam / Ham output
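The preprocessing and embedding steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `preprocessing.py` or `embeddings.py`: the tiny stopword set, the 3-dimensional `TOY_VECTORS` table, and the function names are all assumptions made so the example runs without the 300-d GoogleNews vectors, and lemmatization is left out for brevity.

```python
import re

# Toy stand-ins (assumptions): a real run would use the 300-d GoogleNews
# vectors, a full stopword list, and a lemmatizer.
STOPWORDS = {"a", "an", "the", "to", "you", "is", "now"}
TOY_VECTORS = {
    "free": [1.0, 0.0, 0.0],
    "prize": [0.0, 1.0, 0.0],
    "claim": [0.0, 0.0, 1.0],
}
DIM = 3


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def average_embedding(tokens, vectors=TOY_VECTORS, dim=DIM):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) walks the vectors column by column (one column per dimension).
    return [sum(column) / len(known) for column in zip(*known)]


features = average_embedding(tokenize("Claim the FREE prize now!"))
print(features)
```

The resulting fixed-length vector is what a classifier such as XGBoost would receive as input; words missing from the vocabulary are simply skipped, so a message with no known words maps to the zero vector.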
Model performance:
Accuracy: 96%
Precision: 98%
Recall: 98%
F1 Score: 98%
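For reference, the four metrics above are related through the confusion matrix. The sketch below shows the standard formulas; the counts passed in are made-up illustration values, not this project's results, and the README does not say which class is treated as positive or how the test split was made.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
print(metrics)
```

Because spam datasets are usually imbalanced, precision, recall, and F1 tend to be more informative than accuracy alone.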
Project structure:
├── SMS Spam Classification.ipynb
├── SMSSpamCollection.txt
├── app.py
├── embeddings.py
├── google_w2v_model.pkl
├── model.py
├── preprocessing.py
└── requirements.txt
Installation:
git clone https://github.com/Master-45-vic/SMS-Spam-Detection-.git
cd SMS-Spam-Detection-
pip install -r requirements.txt
Download the pretrained GoogleNews Word2Vec model separately and place it in the project root folder:
GoogleNews-vectors-negative300.bin
(Note: File is ~1.5GB and not included in repository)
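Once the file is in place, it could be loaded along these lines. This is a sketch under the assumption that the vectors are read with gensim's `KeyedVectors.load_word2vec_format`; the README does not say which loader the project uses, and `load_word2vec` and `W2V_PATH` are hypothetical names. The existence check gives a clearer error when the download step was skipped.

```python
from pathlib import Path

# Assumed location: the project root, as described above.
W2V_PATH = Path("GoogleNews-vectors-negative300.bin")


def load_word2vec(path=W2V_PATH):
    """Load the pretrained vectors, failing early if the file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download the GoogleNews vectors and place "
            "the file in the project root folder."
        )
    # Imported lazily so the check above works even without gensim installed.
    from gensim.models import KeyedVectors

    # binary=True because the GoogleNews file uses the binary word2vec format.
    return KeyedVectors.load_word2vec_format(str(path), binary=True)
```

With the file present, `vectors = load_word2vec()` followed by `vectors["free"]` would return the 300-dimensional vector for that word. Loading a file of this size takes noticeable time and memory, so it is worth doing once at startup rather than per request.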
Run the app:
streamlit run app.py
Dataset: SMS Spam Collection Dataset from Kaggle.
Author: Prashanth M
⭐ If you like this project, give it a star on GitHub!