komalkhatod1105/Fake_Review_Detection_Project


Fake Review Detection using NLP & Deep Learning

Detecting fake and genuine reviews using Natural Language Processing (NLP) and Machine Learning to improve trust and reliability in online platforms.



1. Introduction

In today's digital world, online reviews play an important role in influencing customer decisions. However, many platforms contain fake or spam reviews that mislead users.

This project aims to build a system that can automatically classify reviews as Real (Genuine) or Fake (Spam) using Natural Language Processing (NLP) and Machine Learning / Deep Learning techniques.


2. Objectives

  • Detect fake and genuine reviews automatically
  • Apply NLP techniques to process text data
  • Convert text into numerical features using TF-IDF
  • Train ML/DL models for classification
  • Evaluate model performance using standard metrics

3. Dataset

🔹 Source:

Amazon Reviews Dataset

🔹 Features:

  • Review Text
  • Rating (1–5 stars)
  • Reviewer ID
  • Product ID
  • Verified Purchase

🔹 Labels:

  • Real Review (0)
  • Fake Review (1)

4. System Architecture / Pipeline


Dataset → Preprocessing → Feature Extraction → Model Training → Testing → Prediction


5. Preprocessing (Text Cleaning)

Raw reviews are cleaned before processing.

Steps:

  • Convert text to lowercase
  • Remove stopwords (is, the, and, etc.)
  • Tokenization (split into words)
  • Stemming / Lemmatization
  • Remove punctuation and special characters

Example:

Input:


"This product is AMAZING!!!"

Output:


product amazing
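The cleaning steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the function name `clean_review` and the tiny `STOPWORDS` set are assumptions made for the example, and stemming / lemmatization is left out for brevity. A real pipeline would more likely take its stopword list from NLTK or spaCy.

```python
import re

# Tiny illustrative stopword list (an assumption for this sketch);
# a real pipeline would more likely use the full NLTK or spaCy list.
STOPWORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "to"}


def clean_review(text):
    """Lowercase, strip punctuation, tokenize, and drop stopwords."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()  # simple whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(tokens)


print(clean_review("This product is AMAZING!!!"))  # product amazing
```

Punctuation is removed before tokenization here so that "amazing!!!" and "amazing" end up as the same token.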


6. Feature Extraction (TF-IDF)

TF-IDF converts text into numerical vectors.

Concept:

  • TF (Term Frequency): how often a word occurs in a single review (document)
  • IDF (Inverse Document Frequency): how rare a word is across all reviews; words that appear in fewer reviews get a higher IDF

Formula:

TF-IDF = TF × IDF

Benefit:

  • Words that are frequent in one review but rare across the dataset get higher weight
  • Words that appear in most reviews get lower weight
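To make the formula concrete, here is a small pure-Python sketch that uses one common textbook definition: TF = (count of the term in the review) / (number of tokens in the review) and IDF = ln(N / df), where N is the number of reviews and df is the number of reviews containing the term. The function name `tf_idf` and the three toy reviews are made up for illustration. scikit-learn's `TfidfVectorizer` uses a smoothed IDF and normalizes each vector by default, so its numbers will differ from this sketch.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, already-tokenized reviews (illustrative only).
docs = [
    ["good", "product", "fast", "delivery"],
    ["good", "battery", "average", "camera"],
    ["good", "product", "poor", "packaging"],
]
weights = tf_idf(docs)
for term, weight in sorted(weights[0].items()):
    print(f"{term}: {weight:.3f}")
```

In this toy corpus "good" appears in every review, so its IDF is ln(3/3) = 0 and its weight vanishes, while "fast" (one review) outweighs "product" (two reviews).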

7. Model Training

Machine Learning Models:

  • Logistic Regression
  • Naive Bayes
  • Support Vector Machine (SVM)

Deep Learning Models:

  • Recurrent Neural Network (RNN)
  • Long Short-Term Memory (LSTM)
  • BERT (Bidirectional Encoder Representations from Transformers)

Training Process:

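  • Input: TF-IDF vectors for the classical ML models; the sequence models (RNN, LSTM, BERT) typically consume token sequences or embeddings rather than TF-IDF vectors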
  • Output: Real / Fake label
  • Data split: 80% training, 20% testing
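The sketch below illustrates the 80/20 split and one of the listed models (Naive Bayes) without any external dependencies. It is a simplified stand-in, not the project's training script: it works on raw word counts instead of TF-IDF vectors, and `split_train_test`, `NaiveBayes`, and the ten toy reviews are all hypothetical names and data invented for the example. Labels follow the dataset convention above (0 = real, 1 = fake).

```python
import math
import random
from collections import Counter


def split_train_test(samples, test_ratio=0.2, seed=42):
    """Shuffle a copy of the samples and return (train, test)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for tokens, label in samples:
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, tokens):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for t in tokens:
                if t in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][t] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy, already-cleaned reviews: (tokens, label), 1 = fake, 0 = real.
data = [
    (["best", "product", "must", "buy"], 1),
    (["amazing", "deal", "must", "buy"], 1),
    (["excellent", "item", "must", "buy"], 1),
    (["great", "great", "must", "buy"], 1),
    (["perfect", "product", "must", "buy"], 1),
    (["used", "two", "weeks", "battery"], 0),
    (["camera", "average", "after", "weeks"], 0),
    (["screen", "cracked", "within", "weeks"], 0),
    (["works", "fine", "several", "weeks"], 0),
    (["battery", "drains", "few", "weeks"], 0),
]

train, test = split_train_test(data)  # 8 training samples, 2 test samples
model = NaiveBayes().fit(train)
correct = sum(model.predict(tokens) == label for tokens, label in test)
print(f"test accuracy: {correct}/{len(test)}")
```

With scikit-learn installed, the same idea is usually expressed with `train_test_split`, `TfidfVectorizer`, and a classifier such as `MultinomialNB` or `LogisticRegression`.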

8. Model Evaluation

Metrics Used:

  • Accuracy
  • Precision
  • Recall
  • F1-score

Example:

| Actual | Predicted | Result  |
|--------|-----------|---------|
| Fake   | Fake      | Correct |
| Real   | Fake      | Wrong   |
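As a worked example, the four metrics can be computed directly from actual and predicted labels. The sketch below treats Fake (1) as the positive class and applies the standard definitions to the two rows of the example; the function name `classification_metrics` is an assumption for illustration.

```python
def classification_metrics(actual, predicted, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # fake caught
    fp = sum(a != positive and p == positive for a, p in pairs)  # real flagged as fake
    fn = sum(a == positive and p != positive for a, p in pairs)  # fake missed
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# The two rows from the example: (Fake, Fake) and (Real, Fake).
actual = [1, 0]
predicted = [1, 1]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

For these two rows there is one true positive and one false positive, so accuracy and precision are both 0.5, recall is 1.0, and F1 is 2/3.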

9. Prediction Phase

The trained model predicts whether a new review is real or fake.

Example:

Input:


"Excellent product!!! Must buy!!!"

Output:


Fake (Probability: 0.87)

Input:


"I used this product for 2 weeks, battery is good but camera is average"

Output:


Real (Probability: 0.91)
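One way to turn a model's score into the output format shown above is sketched here. It assumes the model returns the probability that a review is fake, that the reported number is the probability of the predicted class, and that the decision threshold is 0.5. The source states none of these, so `format_prediction` and its defaults are hypothetical.

```python
def format_prediction(p_fake, threshold=0.5):
    """Map P(fake) to a label plus the probability of that label."""
    if p_fake >= threshold:
        return f"Fake (Probability: {p_fake:.2f})"
    return f"Real (Probability: {1 - p_fake:.2f})"


print(format_prediction(0.87))  # Fake (Probability: 0.87)
print(format_prediction(0.09))  # Real (Probability: 0.91)
```

Raising the threshold trades recall for precision: fewer genuine reviews are flagged, but more fake ones slip through.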


10. Detection Logic

Fake Reviews:

  • Repetitive words
  • Too many exclamation marks
  • Generic statements
  • No real experience

Real Reviews:

  • Detailed explanation
  • Balanced opinion (pros + cons)
  • Natural writing style
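Some of these signals can be measured directly from the raw text and used as extra features or as a sanity check on model output. The sketch below counts exclamation marks, review length, and word repetition. The function name `style_signals` and the choice of signals are assumptions for illustration, not the project's actual detection rules.

```python
import re
from collections import Counter


def style_signals(review):
    """Return simple surface-level signals for a raw review string."""
    words = re.findall(r"[a-z']+", review.lower())
    counts = Counter(words)
    n = len(words)
    return {
        "exclamations": review.count("!"),
        "word_count": n,
        # Share of words that repeat an earlier word (0.0 means no repetition).
        "repetition_ratio": 1 - len(counts) / n if n else 0.0,
    }


short = style_signals("Excellent product!!! Must buy!!!")
detailed = style_signals(
    "I used this product for 2 weeks, battery is good but camera is average"
)
print(short)
print(detailed)
```

On their own such signals are weak evidence, since a genuine reviewer can also be brief and enthusiastic; they are better treated as inputs to the classifier than as hard rules.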

11. Technologies Used

  • Python
  • Scikit-learn
  • TensorFlow / Keras
  • NLTK / SpaCy
  • Pandas
  • NumPy

12. Project Structure


<img width="338" height="435" alt="{8CB86153-51A2-4306-A959-861FB53A84EC}" src="https://github.com/user-attachments/assets/183209a7-06ba-4aa5-863e-e25ba4a5ab1c" />



13. How to Run the Project

Step 1: Clone Repository


git clone https://github.com/komalkhatod1105/Fake_Review_Detection_Project.git

Step 2: Install Dependencies


pip install -r requirements.txt

Step 3: Run the Project


python main.py

📦 Installation

1️⃣ Clone the Repository

git clone https://github.com/komalkhatod1105/Fake_Review_Detection_Project.git
cd Fake_Review_Detection_Project

2️⃣ Create Virtual Environment (Recommended)

Windows

python -m venv venv
venv\Scripts\activate

Linux / macOS

python3 -m venv venv
source venv/bin/activate

3️⃣ Install Required Libraries

pip install -r requirements.txt

If requirements.txt is not available, install manually:

pip install pandas numpy scikit-learn matplotlib seaborn textblob nltk joblib

4️⃣ Download TextBlob/NLTK Resources

python -m textblob.download_corpora

5️⃣ Run the Project

python train.py

14. Future Improvements

  • Use BERT for higher accuracy
  • Add web interface (React / MERN stack)
  • Real-time review detection
  • Deploy on cloud (AWS / Heroku)
  • Add multilingual support

15. Limitations

  • Dataset quality affects accuracy
  • Fake reviews becoming more realistic
  • Model may misclassify borderline cases

16. Conclusion

This project demonstrates how NLP and machine learning can be used to detect fake reviews. Filtering out spam reviews makes e-commerce platforms more reliable and helps users make better decisions.


About

Built a Fake Review Detection System using Machine Learning and Natural Language Processing (NLP) to detect deceptive online reviews. The system preprocesses review text, extracts relevant features, and classifies reviews as genuine or fake to improve the reliability of e-commerce platforms
