amit4009/Sentiment_Analysis

Sentiment Analysis with Neural Networks

Overview

This project builds a sentiment analysis system for classifying IMDb movie reviews as positive or negative. The goal is to compare multiple neural network architectures and select the best-performing model for sentiment classification.

The project compares:

  • Simple Neural Network
  • Convolutional Neural Network (CNN)
  • Long Short-Term Memory Network (LSTM)

The LSTM model achieved the best test performance (87.31% accuracy, 0.3073 loss) and was saved for prediction on unseen IMDb reviews.

Business Problem

Companies collect large volumes of customer reviews, product feedback, and user-generated text. Manually reading and classifying this feedback does not scale. Sentiment analysis helps convert unstructured text into structured signals that can support customer experience monitoring, product feedback analysis, and review classification.

This project demonstrates how neural networks can classify review sentiment and automate basic text interpretation.

Dataset

  • Dataset: IMDb Movie Reviews Dataset
  • Records: 50,000 reviews
  • Target Variable: Sentiment
  • Classes: Positive, Negative
  • Source: Kaggle IMDb Dataset of 50K Movie Reviews

The dataset contains movie reviews labeled as positive or negative. The text is cleaned, tokenized, padded, and converted into numerical sequences for neural network training.
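
As a rough illustration of the "tokenized, padded, and converted into numerical sequences" step, the sketch below maps words to integer indices and pads each review to a fixed length. It is a pure-Python stand-in, not the notebook's code: the function names, the toy reviews, and the choices to pad on the right and to skip unknown words are all assumptions made for illustration.

```python
from collections import Counter

def build_vocab(texts):
    """Map each word to an integer index, most frequent first. Index 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def encode(text, vocab):
    """Convert a text to a list of word indices, skipping words not in the vocabulary."""
    return [vocab[word] for word in text.split() if word in vocab]

def pad(seq, maxlen):
    """Truncate to maxlen tokens, then right-pad with zeros (padding side is an assumption)."""
    return seq[:maxlen] + [0] * (maxlen - len(seq))

# Toy reviews (hypothetical, already cleaned)
texts = ["great movie great cast", "boring movie"]
vocab = build_vocab(texts)
padded = [pad(encode(text, vocab), maxlen=4) for text in texts]
print(padded)
```

Every review ends up as a list of the same length, which is what lets the reviews be stacked into a single batch for the embedding layer.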

Methodology

  • Loaded the IMDb review dataset
  • Cleaned review text by removing HTML tags, special characters, numbers, and noise
  • Converted all text to lowercase
  • Removed stopwords using NLTK
  • Encoded sentiment labels into numerical values
  • Tokenized reviews and padded sequences to a fixed length
  • Loaded pretrained GloVe word embeddings
  • Built an embedding matrix for the dataset vocabulary
  • Trained three neural network architectures: Simple NN, CNN, and LSTM
  • Compared test accuracy and loss across all models
  • Selected the LSTM model as the best-performing model
  • Saved the trained LSTM model and tokenizer for future predictions
  • Predicted sentiment scores for unseen IMDb review samples
  • Built a basic Flask web app for sentiment prediction
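
The cleaning steps listed above (HTML removal, lowercasing, dropping non-letter characters, stopword removal) could look roughly like the sketch below. The regular expressions, the function name `clean_review`, and the tiny `STOPWORDS` set are assumptions for illustration; the project itself uses NLTK's English stopword list, and its exact cleaning rules are in the notebook.

```python
import re

# Tiny stand-in stopword set (assumption); the project uses NLTK's English stopwords.
STOPWORDS = {"a", "an", "the", "is", "it", "was", "this", "and", "of", "to", "in"}

TAG_RE = re.compile(r"<[^>]+>")         # HTML tags such as <br />
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, other symbols

def clean_review(text):
    """Strip HTML tags, lowercase, drop non-letters, and remove stopwords."""
    text = TAG_RE.sub(" ", text)
    text = text.lower()
    text = NON_LETTER_RE.sub(" ", text)
    tokens = [token for token in text.split() if token not in STOPWORDS]
    return " ".join(tokens)

print(clean_review("This movie was GREAT!<br /><br />10/10, loved it."))
# prints: movie great loved
```

One point worth checking when reusing this step: NLTK's English stopword list includes negations such as "not", so removing stopwords can turn "not good" into "good".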

Word Embeddings

This project uses pretrained GloVe embeddings to convert words into dense vector representations. The embedding layer helps the model capture semantic relationships between words instead of treating every token as an isolated feature.
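
A minimal sketch of how an embedding matrix can be built from GloVe vectors is shown below. GloVe text files store one word per line followed by its vector components, separated by spaces. The function names and the 3-dimensional toy vectors are assumptions for illustration (the project uses 100-dimensional vectors), and plain lists are used here where a real pipeline would typically use a NumPy array.

```python
def parse_glove(lines):
    """Parse GloVe-format lines ('word v1 v2 ... vd') into a dict of word -> vector."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word with index i.

    Row 0 (padding) and rows for words missing from GloVe stay all zeros.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix

# Toy 3-dimensional vectors standing in for a real GloVe file (hypothetical values)
glove_lines = ["good 0.1 0.2 0.3", "bad -0.1 -0.2 -0.3"]
word_index = {"good": 1, "bad": 2, "zzzz": 3}  # tokenizer vocabulary (hypothetical)
matrix = build_embedding_matrix(word_index, parse_glove(glove_lines), dim=3)
print(matrix)
```

The resulting matrix is what initializes the embedding layer, so that token index `i` looks up row `i`.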

[Figure: Text to Numbers]

[Figure: Word Embedding]

Model Architectures

1. Simple Neural Network

The baseline neural network uses an embedding layer, flattening layer, dense layer, dropout, and sigmoid output for binary classification.

2. Convolutional Neural Network

The CNN model uses convolution and pooling layers to detect local text patterns that are useful for sentiment classification.
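
To make "local text patterns" concrete, the sketch below slides one convolution filter over a short sequence of word vectors and keeps the strongest response with max pooling. It is a hand-rolled toy, not the project's Keras layers: the 2-dimensional word vectors, the filter weights, and the use of global max pooling are all assumptions for illustration.

```python
def conv1d(seq, kernel, bias=0.0):
    """Slide `kernel` (width x dim) over `seq` (length x dim).

    Returns one ReLU activation per window position.
    """
    width = len(kernel)
    activations = []
    for start in range(len(seq) - width + 1):
        total = bias
        for offset in range(width):
            word_vec = seq[start + offset]
            total += sum(x * w for x, w in zip(word_vec, kernel[offset]))
        activations.append(max(0.0, total))  # ReLU
    return activations

def global_max_pool(activations):
    """Keep only the strongest filter response, wherever it occurred."""
    return max(activations)

# Toy 2-dimensional embeddings (hypothetical values)
FILM, NOT, GOOD = [0, 0], [1, 0], [0, 1]
sequence = [FILM, NOT, GOOD, FILM]

# A width-2 filter that responds to "not" followed by "good"
kernel = [[1, 0], [0, 1]]

activations = conv1d(sequence, kernel)
print(activations)                   # [0.0, 2.0, 0.0]
print(global_max_pool(activations))  # 2.0
```

The filter fires only on the window containing the two-word pattern, and pooling makes the result independent of where in the review that pattern appears.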

3. Long Short-Term Memory Network

The LSTM model captures sequence patterns in review text and performed best among the three tested architectures.

[Figure: Model Architecture]

[Figure: Neural Network Architecture]

Training Setup

  • Train/Test Split: 80/20
  • Batch Size: 128
  • Epochs: 20
  • Loss Function: Binary Crossentropy
  • Optimizer: Adam
  • Embedding: GloVe 100-dimensional word vectors
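
For readers unfamiliar with the loss function, the sketch below computes binary cross-entropy by hand and works out how many batches one epoch contains under the settings above. The helper names are assumptions, and the batch count assumes all 40,000 training reviews (80% of 50,000) are used for fitting with no further validation split, which this README does not state.

```python
import math

def sigmoid(z):
    """Squash a raw model output into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Guessing 0.5 for everything gives ln(2), about 0.693
print(binary_crossentropy([1, 0], [0.5, 0.5]))

# Confident, correct predictions give a much lower loss
print(binary_crossentropy([1, 0], [0.9, 0.1]))

# Batches per epoch, assuming 40,000 training reviews and batch size 128
train_size = int(50_000 * 0.8)
print(math.ceil(train_size / 128))  # 313
```

The value ln(2) ≈ 0.693 is a useful reference point when reading test losses: a binary classifier that always outputs 0.5 scores exactly that.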

Model Evaluation

Model                          Test Accuracy   Test Loss
Simple Neural Network          75.46%          0.5868
Convolutional Neural Network   84.55%          0.4196
LSTM                           87.31%          0.3073

Selected Model: LSTM

The LSTM model achieved the highest test accuracy and lowest test loss, making it the best-performing model in this experiment.

Prediction on Unseen Reviews

The trained LSTM model was used to predict sentiment scores for unseen IMDb reviews. The predictions were saved to a CSV file for review and downstream use.
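
The step from sentiment scores to a saved CSV could be sketched as follows. The 0.5 threshold, the column names, and the example reviews and scores are assumptions for illustration; in the project, the scores come from the trained LSTM model and the output is written to a file on disk rather than an in-memory buffer.

```python
import csv
import io

def label_from_score(score, threshold=0.5):
    """Turn a sigmoid score into a class label (threshold is an assumption)."""
    return "positive" if score >= threshold else "negative"

def write_predictions(reviews, scores, out_file):
    """Write one row per review: text, score, and predicted label."""
    writer = csv.writer(out_file)
    writer.writerow(["review", "score", "predicted_sentiment"])
    for review, score in zip(reviews, scores):
        writer.writerow([review, f"{score:.4f}", label_from_score(score)])

# Hypothetical reviews and model scores
reviews = ["Loved it", "Dull, predictable plot"]
scores = [0.93, 0.12]

buffer = io.StringIO()  # stand-in for open("predictions.csv", "w", newline="")
write_predictions(reviews, scores, buffer)
print(buffer.getvalue())
```

Keeping the raw score next to the label makes it easy to revisit the threshold later without rerunning the model.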

[Figure: Predicted Reviews]

Web Application

A basic Flask application was built to serve sentiment predictions through a web interface.

[Figure: Web App]

Business Impact

This project demonstrates how neural networks can automate sentiment classification for large volumes of text. A similar workflow could be applied to customer reviews, product feedback, support tickets, app reviews, or social media comments to help teams identify positive and negative sentiment patterns faster.

Tech Stack

Python, Pandas, NumPy, TensorFlow, Keras, NLTK, GloVe Embeddings, CNN, LSTM, Flask, Jupyter Notebook

Repository Structure

Sentiment_Analysis/
├── images/
├── templates/
├── IMDB_Dataset.csv
├── IMDb_Unseen_Reviews.csv
├── IMDB_Review_sentiment_Analysis.ipynb
├── lstm_model.h5
├── tokenizer.pickle
├── app.py
├── requirements.txt
└── README.md

How to Run

git clone https://github.com/amit4009/Sentiment_Analysis.git
cd Sentiment_Analysis
pip install -r requirements.txt
jupyter notebook

Then open:

IMDB_Review_sentiment_Analysis.ipynb

Run the Flask App

python app.py

Limitations

  • The project uses IMDb movie reviews, so results may not generalize directly to product reviews, support tickets, or social media text.
  • The model performs binary sentiment classification only: positive vs. negative.
  • Neutral sentiment is not modeled.
  • The saved model uses the legacy .h5 format.
  • No transformer-based models such as BERT or DistilBERT are compared.
  • The project does not include model monitoring or production deployment infrastructure.

Future Improvements

  • Add neutral sentiment as a third class
  • Compare LSTM against BERT or DistilBERT
  • Save the model using the newer .keras format
  • Add confusion matrix, precision, recall, and F1-score
  • Add explainability using SHAP or LIME
  • Improve the Flask app UI and add API endpoint examples
  • Add model monitoring for sentiment drift over time
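
As a starting point for the confusion matrix, precision, recall, and F1-score item above, the sketch below computes these metrics from binary labels in plain Python. The function names and the example labels are hypothetical, and 1 is assumed to encode the positive class; a library such as scikit-learn would normally be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical true labels and thresholded predictions
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 2)
print(precision_recall_f1(y_true, y_pred))
```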
