Sentiment Analysis with Neural Networks

Overview

This project builds a sentiment analysis system for classifying IMDb movie reviews as positive or negative. The goal is to compare multiple neural network architectures and select the best-performing model for sentiment classification.

The project compares:

Simple Neural Network
Convolutional Neural Network (CNN)
Long Short-Term Memory Network (LSTM)

The LSTM model achieved the best performance and was saved for prediction on unseen IMDb reviews.

Business Problem

Companies collect large volumes of customer reviews, product feedback, and user-generated text. Manually reading and classifying this feedback does not scale. Sentiment analysis helps convert unstructured text into structured signals that can support customer experience monitoring, product feedback analysis, and review classification.

This project demonstrates how neural networks can classify review sentiment and automate basic text interpretation.

Dataset

Dataset: IMDb Movie Reviews Dataset
Records: 50,000 reviews
Target Variable: Sentiment
Classes: Positive, Negative
Source: Kaggle IMDb Dataset of 50K Movie Reviews

The dataset contains movie reviews labeled as positive or negative. The text is cleaned, tokenized, padded, and converted into numerical sequences for neural network training.
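The cleaning step can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code. The two regular expressions are assumptions about what "HTML tags, special characters, numbers, and noise" covers, and STOPWORDS is a tiny hypothetical stand-in for NLTK's English list, used so the example runs without downloading NLTK data.

```python
import re

# Hypothetical, deliberately tiny stopword set. The project uses NLTK's English
# stopword list instead, which requires downloading NLTK data first.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "this", "it", "of", "to"}

HTML_TAG = re.compile(r"<[^>]+>")      # e.g. the "<br />" breaks common in IMDb text
NON_LETTER = re.compile(r"[^a-zA-Z]")   # digits, punctuation and other symbols


def clean_review(text: str) -> str:
    """Strip HTML tags and non-letters, lowercase, then drop stopwords."""
    text = HTML_TAG.sub(" ", text)
    text = NON_LETTER.sub(" ", text)
    text = text.lower()
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(clean_review("This movie was GREAT!<br /><br />10/10, would watch again."))
# movie great would watch again
```

One design point worth checking: NLTK's English stopword list includes negations such as "no" and "not", so removing stopwords can turn "not good" into "good", which matters for sentiment.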

Methodology

Loaded the IMDb review dataset
Cleaned review text by removing HTML tags, special characters, numbers, and noise
Converted all text to lowercase
Removed stopwords using NLTK
Encoded sentiment labels into numerical values
Tokenized reviews and padded sequences to a fixed length
Loaded pretrained GloVe word embeddings
Built an embedding matrix for the dataset vocabulary
Trained three neural network architectures: Simple NN, CNN, and LSTM
Compared test accuracy and loss across all models
Selected the LSTM model as the best-performing model
Saved the trained LSTM model and tokenizer for future predictions
Predicted sentiment scores for unseen IMDb review samples
Built a basic Flask web app for sentiment prediction
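Tokenizing, padding, and encoding labels are usually done with Keras' Tokenizer and pad_sequences. The sketch below reimplements their default conventions in plain Python so it runs without TensorFlow: word ids are ranked by frequency starting at 1, id 0 is reserved for padding, unknown words are dropped, and sequences are padded and truncated at the front. The function names, the toy reviews, and the negative=0 / positive=1 mapping are assumptions, not taken from the project.

```python
from collections import Counter


def fit_vocab(texts):
    """Map each word to an integer id, most frequent first; id 0 stays free for padding."""
    counts = Counter(word for text in texts for word in text.split())
    # sorted() is stable, so ties keep first-seen order (Counter preserves insertion order).
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {word: idx for idx, word in enumerate(ranked, start=1)}


def texts_to_sequences(texts, vocab):
    """Replace words by ids; words missing from the vocabulary are skipped."""
    return [[vocab[w] for w in text.split() if w in vocab] for text in texts]


def pad(seqs, maxlen):
    """Keep the last maxlen ids and left-pad shorter sequences with 0."""
    out = []
    for seq in seqs:
        seq = seq[-maxlen:]
        out.append([0] * (maxlen - len(seq)) + seq)
    return out


# Hypothetical label encoding for the binary target.
LABELS = {"negative": 0, "positive": 1}

reviews = ["great movie great cast", "boring movie"]
vocab = fit_vocab(reviews)
X = pad(texts_to_sequences(reviews, vocab), maxlen=3)
y = [LABELS[s] for s in ["positive", "negative"]]
print(vocab)  # {'great': 1, 'movie': 2, 'cast': 3, 'boring': 4}
print(X)      # [[2, 1, 3], [0, 4, 2]]
print(y)      # [1, 0]
```

Pickling the fitted vocabulary (or the Keras Tokenizer object) is what allows unseen reviews to be mapped to the same ids at prediction time.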

Word Embeddings

This project uses pretrained GloVe embeddings to convert words into dense vector representations. The embedding layer helps the model capture semantic relationships between words instead of treating every token as an isolated feature.
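Building the embedding matrix amounts to looking up every vocabulary word in the GloVe file. The sketch below assumes GloVe's plain-text format (a word followed by its vector components on each line) and a vocabulary dict that maps words to integer ids starting at 1. load_glove and build_embedding_matrix are hypothetical names, and the toy 3-dimensional vectors stand in for the 100-dimensional ones used in this project.

```python
def load_glove(lines):
    """Parse GloVe text format: one 'word v1 v2 ... vd' entry per line."""
    vectors = {}
    for line in lines:
        word, *values = line.split()
        vectors[word] = [float(v) for v in values]
    return vectors


def build_embedding_matrix(vocab, vectors, dim):
    """Row i holds the vector of the word with id i.

    Row 0 (padding) and words that GloVe does not cover stay all-zero.
    """
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
    return matrix


glove = load_glove(["good 0.1 0.2 0.3", "bad -0.4 0.0 0.5"])
matrix = build_embedding_matrix({"good": 1, "bad": 2, "meh": 3}, glove, dim=3)
print(matrix)
# [[0.0, 0.0, 0.0], [0.1, 0.2, 0.3], [-0.4, 0.0, 0.5], [0.0, 0.0, 0.0]]
```

The resulting matrix is typically used to initialize the model's embedding layer; whether the project keeps those weights frozen during training is not stated.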

Model Architectures

1. Simple Neural Network

The baseline neural network uses an embedding layer, a flattening layer, a dense layer, dropout, and a sigmoid output unit for binary classification.
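To make the data flow concrete, here is an inference-time forward pass of that architecture in plain Python. It is a sketch, not the project's Keras model: the ReLU on the hidden dense layer, the layer sizes, and the weights are all hypothetical, and dropout is omitted because it is only active during training.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simple_nn_forward(token_ids, embedding, hidden_w, hidden_b, out_w, out_b):
    """Embedding -> flatten -> dense (ReLU, assumed) -> sigmoid, at inference time."""
    # Embedding lookup, then flatten: maxlen vectors of size dim -> one long vector.
    flat = [x for tid in token_ids for x in embedding[tid]]
    # Hidden dense layer: one weight row per unit, each as long as the flat vector.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, flat)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    # Single sigmoid unit: probability that the review is positive.
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)


# Toy setup: 3-word vocabulary (id 0 = padding), 2-d embeddings, 1 hidden unit.
embedding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
p = simple_nn_forward([0, 1, 2], embedding,
                      hidden_w=[[1.0] * 6], hidden_b=[0.0],
                      out_w=[1.0], out_b=-2.0)
print(p)  # 0.5
```

Because flattening concatenates one vector per position, the number of weights in the first dense layer grows with sequence length times embedding size, and the same word contributes through different weights depending on where it appears.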

2. Convolutional Neural Network

The CNN model uses convolution and pooling layers to detect local text patterns that are useful for sentiment classification.
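A single convolutional filter can be pictured as a learned n-gram detector that slides over the sequence of word vectors. The sketch assumes one filter, "valid" padding, stride 1, a ReLU activation, and global max pooling; the README does not specify the project's filter count, kernel size, or pooling type, and the toy embeddings are hypothetical.

```python
def conv1d_global_max(seq, kernel, bias=0.0):
    """One Conv1D filter (valid padding, stride 1, ReLU) plus global max pooling.

    seq:    list of word vectors (length L, each of size dim)
    kernel: list of k weight vectors (each of size dim), i.e. a k-gram detector
    """
    k = len(kernel)
    activations = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # Dot product of the whole k x dim kernel with the k x dim window.
        z = bias + sum(w * x
                       for w_vec, x_vec in zip(kernel, window)
                       for w, x in zip(w_vec, x_vec))
        activations.append(max(0.0, z))  # ReLU (assumed activation)
    # Keep the strongest response anywhere in the review; 0.0 if the review is too short.
    return max(activations, default=0.0)


# Hypothetical 2-d embeddings and a kernel tuned to the bigram "not good".
emb = {"not": [1.0, 0.0], "good": [0.0, 1.0], "movie": [0.0, 0.0]}
kernel = [[1.0, 0.0], [0.0, 1.0]]
seq1 = [emb[w] for w in "movie not good".split()]
seq2 = [emb[w] for w in "good not movie".split()]
print(conv1d_global_max(seq1, kernel))  # 2.0
print(conv1d_global_max(seq2, kernel))  # 1.0
```

Max pooling keeps only the strongest match, so the filter responds to the phrase wherever it appears in the review, while word order inside the kernel window still matters.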

3. Long Short-Term Memory Network

The LSTM model captures sequence patterns in review text and performed best among the three tested architectures.
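The sequence modelling comes from the LSTM cell, which updates a hidden state h and a cell state c once per word. The sketch below implements one step of a standard LSTM cell in plain Python. The parameter layout (a (W, U, b) tuple per gate) and the toy weights are assumptions for illustration, and the project's layer size is not given in the README.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _affine(W, U, b, x, h):
    """W @ x + U @ h + b for one gate, written with plain lists."""
    return [sum(wi * xi for wi, xi in zip(w_row, x)) +
            sum(ui * hi for ui, hi in zip(u_row, h)) + bi
            for w_row, u_row, bi in zip(W, U, b)]


def lstm_step(x, h, c, params):
    """One time step; params maps gate name ('i', 'f', 'o', 'g') to (W, U, b)."""
    i = [_sigmoid(z) for z in _affine(*params["i"], x, h)]   # input gate
    f = [_sigmoid(z) for z in _affine(*params["f"], x, h)]   # forget gate
    o = [_sigmoid(z) for z in _affine(*params["o"], x, h)]   # output gate
    g = [math.tanh(z) for z in _affine(*params["g"], x, h)]  # candidate values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def lstm_encode(sequence, params, units):
    """Run the cell over a review's word vectors and return the final hidden state."""
    h, c = [0.0] * units, [0.0] * units
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


# Toy parameters: 2-d inputs, 1 hidden unit, all weights zero, candidate bias 1.
def zeros():
    return ([[0.0, 0.0]], [[0.0]], [0.0])


params = {"i": zeros(), "f": zeros(), "o": zeros(),
          "g": ([[0.0, 0.0]], [[0.0]], [1.0])}
print(lstm_encode([[1.0, 2.0], [0.5, -1.0]], params, units=1))
```

The cell state is updated additively (f * c + i * g), which is what lets the network carry information, such as an early negation, across many time steps.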

Training Setup

Train/Test Split: 80/20
Batch Size: 128
Epochs: 20
Loss Function: Binary Crossentropy
Optimizer: Adam
Embedding: GloVe 100-dimensional word vectors
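The loss and the batch arithmetic implied by this setup can be checked with a few lines of Python. The sketch assumes the 80/20 split is applied to all 50,000 reviews with no further validation split carved out of the training data. binary_crossentropy is a hypothetical helper that clips probabilities to avoid log(0), in the spirit of what Keras does internally.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


n_reviews, test_fraction, batch_size, epochs = 50_000, 0.2, 128, 20
n_train = round(n_reviews * (1 - test_fraction))   # 40,000 training reviews
steps_per_epoch = math.ceil(n_train / batch_size)  # 313; the last batch holds 64 reviews
print(n_train, steps_per_epoch, steps_per_epoch * epochs)  # 40000 313 6260

# A 0.5 prediction costs log(2) per review, whatever the true label.
print(binary_crossentropy([1, 0], [0.5, 0.5]))
```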

Model Evaluation

Model	Test Accuracy	Test Loss
Simple Neural Network	75.46%	0.5868
Convolutional Neural Network	84.55%	0.4196
LSTM	87.31%	0.3073

Selected Model: LSTM

The LSTM model achieved the highest test accuracy and lowest test loss, making it the best-performing model in this experiment.
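Test accuracy for a sigmoid output is usually computed by thresholding the predicted probability at 0.5. The sketch below does that and also derives the confusion-matrix counts, precision, recall, and F1 listed under Future Improvements. The evaluate helper and the toy labels are hypothetical, and the sketch does not reproduce the numbers in the table.

```python
def evaluate(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1 and confusion counts for a sigmoid classifier."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are true classes (negative, positive); columns are predictions.
        "confusion": [[tn, fp], [fn, tp]],
    }


# Toy labels (1 = positive) and model scores.
report = evaluate([1, 1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1, 0.6])
print(report["accuracy"], report["confusion"])  # 0.6 [[1, 1], [1, 2]]
```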

Prediction on Unseen Reviews

The trained LSTM model was used to predict sentiment scores for unseen IMDb reviews. The predictions were saved to a CSV file for review and downstream use.
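Writing scored reviews to CSV needs only the standard library. In the sketch below the scores are hard-coded stand-ins for the LSTM's sigmoid outputs. The column names, the 0.5 threshold for the label, and write_predictions itself are assumptions rather than the project's actual output schema.

```python
import csv
import io


def write_predictions(reviews, scores, out, threshold=0.5):
    """Write one row per review: text, sentiment score and derived label."""
    writer = csv.writer(out)
    writer.writerow(["review", "sentiment_score", "predicted_label"])
    for text, score in zip(reviews, scores):
        label = "positive" if score > threshold else "negative"
        writer.writerow([text, f"{score:.4f}", label])


# Stand-in scores; in the project these would come from the saved LSTM model
# applied to cleaned, tokenized and padded unseen reviews.
buffer = io.StringIO()
write_predictions(["A moving, well-acted film.", "Dull and far too long."],
                  [0.93, 0.08], buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with newline="" (for example open("predictions.csv", "w", newline="")), as the csv module documentation recommends; the file name here is hypothetical.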

Web Application

A basic Flask application was built to serve sentiment predictions through a web interface.

Business Impact

This project demonstrates how neural networks can automate sentiment classification for large volumes of text. A similar workflow could be applied to customer reviews, product feedback, support tickets, app reviews, or social media comments to help teams identify positive and negative sentiment patterns faster.

Tech Stack

Python, Pandas, NumPy, TensorFlow, Keras, NLTK, GloVe Embeddings, CNN, LSTM, Flask, Jupyter Notebook

Repository Structure

Sentiment_Analysis/
├── images/
├── templates/
├── IMDB_Dataset.csv
├── IMDb_Unseen_Reviews.csv
├── IMDB_Review_sentiment_Analysis.ipynb
├── lstm_model.h5
├── tokenizer.pickle
├── app.py
├── requirements.txt
└── README.md

How to Run

git clone https://github.com/amit4009/Sentiment_Analysis.git
cd Sentiment_Analysis
pip install -r requirements.txt
jupyter notebook

Then open:

IMDB_Review_sentiment_Analysis.ipynb

Run the Flask App

python app.py

Limitations

The project uses IMDb movie reviews, so results may not generalize directly to product reviews, support tickets, or social media text.
The model performs binary sentiment classification only: positive vs. negative.
Neutral sentiment is not modeled.
The saved model uses the legacy .h5 format.
No transformer-based models such as BERT or DistilBERT are compared.
The project does not include model monitoring or production deployment infrastructure.

Future Improvements

Add neutral sentiment as a third class
Compare LSTM against BERT or DistilBERT
Save the model using the newer .keras format
Add confusion matrix, precision, recall, and F1-score
Add explainability using SHAP or LIME
Improve the Flask app UI and add API endpoint examples
Add model monitoring for sentiment drift over time
