Detecting fake and genuine reviews using Natural Language Processing (NLP) and Machine Learning to improve trust and reliability in online platforms.
In today's digital world, online reviews play an important role in influencing customer decisions. However, many platforms contain fake or spam reviews that mislead users.
This project aims to build a system that can automatically classify reviews as Real (Genuine) or Fake (Spam) using Natural Language Processing (NLP) and Machine Learning / Deep Learning techniques.
- Detect fake and genuine reviews automatically
- Apply NLP techniques to process text data
- Convert text into numerical features using TF-IDF
- Train ML/DL models for classification
- Evaluate model performance using standard metrics
Dataset: Amazon Reviews Dataset
Fields:
- Review Text
- Rating (1–5 stars)
- Reviewer ID
- Product ID
- Verified Purchase
Labels:
- Real Review (0)
- Fake Review (1)
Dataset → Preprocessing → Feature Extraction → Model Training → Testing → Prediction
Raw reviews are cleaned before feature extraction.
- Convert text to lowercase
- Remove punctuation and special characters
- Tokenization (split into words)
- Remove stopwords (is, the, and, etc.)
- Stemming / Lemmatization
Input:
"This product is AMAZING!!!"
Output:
product amazing
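The cleaning steps can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual code: the function name `preprocess` and the tiny `STOPWORDS` set are assumptions, and stemming / lemmatization is left out because it would normally come from NLTK or spaCy.

```python
import re

# Tiny illustrative stopword list (an assumption); a real project would
# more likely use the full lists shipped with NLTK or spaCy.
STOPWORDS = {"a", "an", "and", "are", "is", "it", "of", "the", "this", "to"}


def preprocess(text: str) -> str:
    """Lowercase, strip punctuation, tokenize, and drop stopwords."""
    text = text.lower()
    # Replace punctuation and special characters with spaces.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Whitespace tokenization.
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(tokens)


print(preprocess("This product is AMAZING!!!"))  # product amazing
```

Swapping in a library stopword list only changes the `STOPWORDS` definition; the rest of the function stays the same.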
TF-IDF converts text into numerical vectors.
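- TF (Term Frequency): how often a word occurs in a document
- IDF (Inverse Document Frequency): how rare a word is across all documents; rarer words score higher
TF-IDF = TF × IDF
- Words that are frequent in one review but rare elsewhere get higher weight
- Words that appear in most reviews get lower weight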
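To make the weighting concrete, here is a small hand-rolled sketch of one common TF-IDF variant: TF is the word count divided by the document length, and IDF is log(N / df), where N is the number of documents and df is the number of documents containing the word. The function name `tf_idf` and the sample documents are made up for illustration. Libraries such as scikit-learn use smoothed and normalized variants, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy documents, already tokenized (illustrative only).
docs = [
    ["great", "product", "great", "price"],
    ["product", "arrived", "late"],
    ["product", "battery", "average"],
]
for row in tf_idf(docs):
    print(row)
```

In this toy corpus "product" occurs in every document, so its IDF is log(3/3) = 0 and its weight is zero, while "great" (frequent in one document, absent from the others) gets the largest weight.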
- Logistic Regression
- Naive Bayes
- Support Vector Machine (SVM)
- Recurrent Neural Network (RNN)
- Long Short-Term Memory (LSTM)
- BERT (Bidirectional Encoder Representations from Transformers)
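- Input: TF-IDF vectors (for Logistic Regression, Naive Bayes, and SVM; RNN, LSTM, and BERT typically take token sequences instead)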
- Output: Real / Fake label
- Data split: 80% training, 20% testing
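A minimal sketch of this setup with scikit-learn is shown below, assuming a TF-IDF plus Logistic Regression pipeline. The toy reviews, the `predict_review` helper, and the `stratify` / `random_state` settings are assumptions made for illustration; the project's own training script may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy data invented for illustration only; labels: 0 = real, 1 = fake.
texts = [
    "Excellent product!!! Must buy!!!",
    "Best product ever!!! Amazing!!!",
    "Amazing amazing amazing, buy now!!!",
    "Perfect!!! Everyone must buy this!!!",
    "Great great great product, best ever!!!",
    "Used it for two weeks, battery is good but camera is average",
    "Delivery was slow, though the build quality is decent for the price",
    "The screen is sharp but the speakers are weaker than I expected",
    "Works fine after a month, the strap feels a bit cheap",
    "Good value overall, setup took longer than the manual suggests",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# 80% training, 20% testing; stratify keeps both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# TF-IDF features feed directly into the classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)


def predict_review(text: str) -> tuple[str, float]:
    """Return the predicted label and the model's probability for it."""
    proba = model.predict_proba([text])[0]  # columns follow model.classes_
    idx = int(proba.argmax())
    label = "Fake" if model.classes_[idx] == 1 else "Real"
    return label, float(proba[idx])


print(len(X_train), len(X_test))  # 8 2
print(predict_review("Excellent product!!! Must buy!!!"))
```

With only ten toy reviews the predicted probabilities are not meaningful; the point is the shape of the workflow. Replacing `LogisticRegression()` with `MultinomialNB()` or `LinearSVC()` would cover the other classical models, although `LinearSVC` has no `predict_proba`.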
- Accuracy
- Precision
- Recall
- F1-score
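| Actual | Predicted | Result |
|---|---|---|
| Fake | Fake | Correct (true positive) |
| Real | Fake | Wrong (false positive) |
| Fake | Real | Wrong (false negative) |
| Real | Real | Correct (true negative) |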
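The four metrics can be derived from counts of correct and wrong predictions. The sketch below assumes Fake (1) is the positive class; the function name `classification_metrics` and the sample labels are illustrative. In practice `sklearn.metrics.classification_report` gives the same figures.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 with fake (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # fake predicted fake
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # real predicted fake
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # fake predicted real
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # real predicted real
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Precision answers "of the reviews flagged as fake, how many really were?", while recall answers "of the fake reviews, how many were caught?".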
The trained model predicts whether a new review is real or fake.
Input:
"Excellent product!!! Must buy!!!"
Output:
Fake (Probability: 0.87)
Input:
"I used this product for 2 weeks, battery is good but camera is average"
Output:
Real (Probability: 0.91)
Common signs of a fake review:
- Repetitive words
- Too many exclamation marks
- Generic statements
- No real experience
Common signs of a genuine review:
- Detailed explanation
- Balanced opinion (pros + cons)
- Natural writing style
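Some of these signals can be turned into simple numeric features alongside TF-IDF. The sketch below is a hypothetical illustration, not part of the project: the function name `surface_signals`, the feature names, and the list of contrast words are all assumptions.

```python
import re
from collections import Counter


def surface_signals(review: str) -> dict[str, float]:
    """Compute a few hand-crafted signals; names and choices are illustrative."""
    words = re.findall(r"[a-z']+", review.lower())
    counts = Counter(words)
    top_count = counts.most_common(1)[0][1] if counts else 0
    contrast_words = ("but", "however", "although", "though")
    return {
        # Many exclamation marks are a possible sign of a fake review.
        "exclamation_marks": review.count("!"),
        # Longer reviews tend to carry more detail.
        "word_count": len(words),
        # Share of the text taken up by its single most repeated word.
        "top_word_share": top_count / len(words) if words else 0.0,
        # A contrast word hints at a balanced opinion (pros + cons).
        "has_contrast": float(any(w in counts for w in contrast_words)),
    }


print(surface_signals("Excellent product!!! Must buy!!!"))
print(surface_signals(
    "I used this product for 2 weeks, battery is good but camera is average"
))
```

These are weak heuristics on their own; they would most plausibly serve as extra input columns to a classifier rather than as decision rules.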
- Python
- Scikit-learn
- TensorFlow / Keras
- NLTK / SpaCy
- Pandas
- NumPy
<img width="338" height="435" alt="{8CB86153-51A2-4306-A959-861FB53A84EC}" src="https://github.com/user-attachments/assets/183209a7-06ba-4aa5-863e-e25ba4a5ab1c" />
git clone https://github.com/komalkhatod1105/fake-review-detection.git
pip install -r requirements.txt
python main.py
git clone https://github.com/komalkhatod1105/Fake_Review_Detection_Project.git
cd Fake_Review_Detection_Project
Windows:
python -m venv venv
venv\Scripts\activate
macOS / Linux:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
If requirements.txt is not available, install manually:
pip install pandas numpy scikit-learn matplotlib seaborn textblob nltk joblib
python -m textblob.download_corpora
python train.py
- Use BERT for higher accuracy
- Add web interface (React / MERN stack)
- Real-time review detection
- Deploy on cloud (AWS / Heroku)
- Add multilingual support
- Dataset quality affects accuracy
- Fake reviews becoming more realistic
- Model may misclassify borderline cases
This project demonstrates how NLP and machine learning can be used to detect fake reviews effectively. It improves reliability in e-commerce platforms by filtering out spam reviews and helping users make better decisions.