This project builds a sentiment analysis system for classifying IMDb movie reviews as positive or negative. The goal is to compare multiple neural network architectures and select the best-performing model for sentiment classification.
The project compares:
- Simple Neural Network
- Convolutional Neural Network (CNN)
- Long Short-Term Memory Network (LSTM)
The LSTM model achieved the best performance and was saved for prediction on unseen IMDb reviews.
Companies collect large volumes of customer reviews, product feedback, and user-generated text. Manually reading and classifying this feedback does not scale. Sentiment analysis helps convert unstructured text into structured signals that can support customer experience monitoring, product feedback analysis, and review classification.
This project demonstrates how neural networks can classify review sentiment and automate basic text interpretation.
- Dataset: IMDb Movie Reviews Dataset
- Records: 50,000 reviews
- Target Variable: Sentiment
- Classes: Positive, Negative
- Source: Kaggle IMDb Dataset of 50K Movie Reviews
The dataset contains movie reviews labeled as positive or negative. The text is cleaned, tokenized, padded, and converted into numerical sequences for neural network training.
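As a rough illustration of what "tokenized, padded, and converted into numerical sequences" means, here is a dependency-free sketch. The helper names (`build_word_index`, `texts_to_sequences`, `pad_to_length`) are hypothetical stand-ins written for this example and are not the notebook's code, which presumably relies on the Keras tokenizer saved as `tokenizer.pickle`. Padding on the right is an arbitrary choice made here.

```python
from collections import Counter


def build_word_index(texts, num_words=None):
    """Map each word to an integer id, most frequent first. Id 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common(num_words))}


def texts_to_sequences(texts, word_index):
    """Replace each known word by its id; words outside the vocabulary are dropped."""
    return [[word_index[w] for w in text.split() if w in word_index] for text in texts]


def pad_to_length(sequences, maxlen):
    """Truncate each sequence to maxlen, or right-pad it with zeros."""
    return [seq[:maxlen] + [0] * (maxlen - len(seq)) for seq in sequences]


texts = ["great movie great cast", "bad movie"]
word_index = build_word_index(texts)
sequences = texts_to_sequences(texts, word_index)
padded = pad_to_length(sequences, maxlen=3)
print(padded)  # [[1, 2, 1], [4, 2, 0]]
```

Every review ends up as a list of the same length, which is what lets the reviews be stacked into a single batch for training.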
- Loaded the IMDb review dataset
- Cleaned review text by removing HTML tags, special characters, numbers, and noise
- Converted all text to lowercase
- Removed stopwords using NLTK
- Encoded sentiment labels into numerical values
- Tokenized reviews and padded sequences to a fixed length
- Loaded pretrained GloVe word embeddings
- Built an embedding matrix for the dataset vocabulary
- Trained three neural network architectures: Simple NN, CNN, and LSTM
- Compared test accuracy and loss across all models
- Selected the LSTM model as the best-performing model
- Saved the trained LSTM model and tokenizer for future predictions
- Predicted sentiment scores for unseen IMDb review samples
- Built a basic Flask web app for sentiment prediction
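The cleaning steps in the list above (HTML tags, special characters, numbers, lowercasing, stopwords) could be sketched roughly as follows. This is a minimal sketch, not the notebook's implementation: `clean_review` is a hypothetical name, the regular expressions are one possible choice, and the tiny `STOPWORDS` set only stands in for the NLTK English stopword list that the project actually uses.

```python
import re

# Tiny stand-in list for illustration; the project uses NLTK's English stopwords.
STOPWORDS = {"a", "an", "the", "is", "was", "this", "and", "of", "to", "it"}

TAG_RE = re.compile(r"<[^>]+>")         # HTML tags such as <br />
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, other symbols


def clean_review(text, stopwords=STOPWORDS):
    """Strip tags, lowercase, drop non-letters, and remove stopwords."""
    text = TAG_RE.sub(" ", text)
    text = text.lower()
    text = NON_LETTER_RE.sub(" ", text)
    words = [w for w in text.split() if w not in stopwords]
    return " ".join(words)


cleaned = clean_review("This movie was GREAT!<br /><br />10/10, loved it.")
print(cleaned)  # movie great loved
```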
This project uses pretrained GloVe embeddings to convert words into dense vector representations. The embedding layer helps the model capture semantic relationships between words instead of treating every token as an isolated feature.
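To show how pretrained vectors are lined up with the dataset vocabulary, here is a small sketch that parses GloVe's plain-text format (a word followed by its vector components, separated by spaces) and fills an embedding matrix. The function names, the toy 3-dimensional vectors, and the use of plain Python lists are assumptions made to keep the example self-contained; the project uses 100-dimensional vectors and would normally hold the matrix in a NumPy array.

```python
import io


def load_glove(lines):
    """Parse GloVe text format into a dict of word -> list of floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word with id i.

    Row 0 (padding) and words missing from GloVe stay all zeros.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = vec
    return matrix


# Toy data standing in for a real GloVe file and a real tokenizer vocabulary.
toy_glove = io.StringIO("good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")
toy_word_index = {"good": 1, "bad": 2, "zzyzx": 3}
embedding_matrix = build_embedding_matrix(toy_word_index, load_glove(toy_glove), dim=3)
```

The matrix has one row per vocabulary id plus the padding row, so it can be used directly as the initial weights of an embedding layer.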
The baseline neural network uses an embedding layer, flattening layer, dense layer, dropout, and sigmoid output for binary classification.
The CNN model uses convolution and pooling layers to detect local text patterns that are useful for sentiment classification.
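To make "local text patterns" concrete, the toy sketch below slides a single filter of width 2 over a sequence of word vectors and then applies global max pooling. Everything here is illustrative: the 2-dimensional vectors and filter weights are made up, a trained model learns many filters, and the source does not state which pooling variant or filter sizes the notebook uses.

```python
def conv1d_single_filter(sequence, kernel, bias=0.0):
    """Slide one filter over word vectors (no padding, stride 1), then apply ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias
        for vec, weights in zip(window, kernel):
            total += sum(v * w for v, w in zip(vec, weights))
        outputs.append(max(0.0, total))
    return outputs


def global_max_pool(feature_map):
    """Keep only the strongest response, wherever in the review it occurred."""
    return max(feature_map)


# Four made-up word vectors; the first two form the pattern the filter looks for.
sequence = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]

feature_map = conv1d_single_filter(sequence, kernel)
pooled = global_max_pool(feature_map)
print(feature_map, pooled)  # [2.0, 0.0, 0.0] 2.0
```

The filter responds strongly only where its two-word pattern appears, and pooling passes that response on regardless of its position in the review.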
The LSTM model captures sequence patterns in review text and performed best among the three tested architectures.
- Train/Test Split: 80/20
- Batch Size: 128
- Epochs: 20
- Loss Function: Binary Crossentropy
- Optimizer: Adam
- Embedding: GloVe 100-dimensional word vectors
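For a sense of scale, the settings above translate into the following counts. This is back-of-the-envelope arithmetic that assumes all training reviews are used for weight updates (no separate validation hold-out) and that all 20 epochs run to completion; the source does not confirm either point.

```python
import math

total_reviews = 50_000
test_percent = 20
batch_size = 128
epochs = 20

test_size = total_reviews * test_percent // 100        # 10,000 reviews held out
train_size = total_reviews - test_size                 # 40,000 reviews for training
steps_per_epoch = math.ceil(train_size / batch_size)   # last batch is partial
total_updates = steps_per_epoch * epochs

print(train_size, test_size, steps_per_epoch, total_updates)
# 40000 10000 313 6260
```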
| Model | Test Accuracy | Test Loss |
|---|---|---|
| Simple Neural Network | 75.46% | 0.5868 |
| Convolutional Neural Network | 84.55% | 0.4196 |
| LSTM | 87.31% | 0.3073 |
Selected Model: LSTM
The LSTM model achieved the highest test accuracy and lowest test loss, making it the best-performing model in this experiment.
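The two columns in the results table measure different things: accuracy counts correct labels after thresholding, while binary cross-entropy also penalizes confident mistakes. The sketch below computes both on four made-up predictions. The function names, the 0.5 threshold, and the small `eps` guard against `log(0)` are assumptions for illustration and are not taken from the notebook.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    correct = sum(1 for y, p in zip(y_true, y_prob) if (p >= threshold) == bool(y))
    return correct / len(y_true)


y_true = [1, 0, 1, 0]            # 1 = positive, 0 = negative (assumed encoding)
y_prob = [0.9, 0.2, 0.4, 0.1]    # made-up sigmoid outputs

acc = accuracy(y_true, y_prob)
loss = binary_crossentropy(y_true, y_prob)
print(acc, round(loss, 4))  # 0.75 0.3375
```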
The trained LSTM model was used to predict sentiment scores for unseen IMDb reviews. The predictions were saved to a CSV file for review and downstream use.
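A minimal sketch of turning sentiment scores into a CSV file is shown below. The scores are hard-coded placeholders rather than real model output, and the column names, the 0.5 threshold, and the assumption that higher scores mean "positive" are illustrative choices, not details confirmed by the source. An in-memory buffer is used so the example runs without touching the filesystem.

```python
import csv
import io


def write_predictions(reviews, scores, out_file, threshold=0.5):
    """Write one row per review with its score and a thresholded label."""
    writer = csv.writer(out_file)
    writer.writerow(["review", "score", "label"])
    for review, score in zip(reviews, scores):
        label = "positive" if score >= threshold else "negative"
        writer.writerow([review, f"{score:.4f}", label])


reviews = ["Loved it, would watch again", "Dull and far too long"]
scores = [0.93, 0.12]  # placeholder values, not real model output

buffer = io.StringIO()
write_predictions(reviews, scores, buffer)
csv_text = buffer.getvalue()
rows = list(csv.reader(io.StringIO(csv_text)))
```

To write to disk instead, the same function can be passed a file opened with `open(path, "w", newline="")`.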
A basic Flask application was built to serve sentiment predictions through a web interface.
This project demonstrates how neural networks can automate sentiment classification for large volumes of text. A similar workflow could be applied to customer reviews, product feedback, support tickets, app reviews, or social media comments to help teams identify positive and negative sentiment patterns faster.
Python, Pandas, NumPy, TensorFlow, Keras, NLTK, GloVe Embeddings, CNN, LSTM, Flask, Jupyter Notebook
Sentiment_Analysis/
├── images/
├── templates/
├── IMDB_Dataset.csv
├── IMDb_Unseen_Reviews.csv
├── IMDB_Review_sentiment_Analysis.ipynb
├── lstm_model.h5
├── tokenizer.pickle
├── app.py
├── requirements.txt
└── README.md
git clone https://github.com/amit4009/Sentiment_Analysis.git
cd Sentiment_Analysis
pip install -r requirements.txt
jupyter notebook
Then open:
IMDB_Review_sentiment_Analysis.ipynb
python app.py
- The project uses IMDb movie reviews, so results may not generalize directly to product reviews, support tickets, or social media text.
- The model performs binary sentiment classification only: positive vs. negative.
- Neutral sentiment is not modeled.
- The saved model uses the legacy `.h5` format.
- No transformer-based models such as BERT or DistilBERT are compared.
- The project does not include model monitoring or production deployment infrastructure.
- Add neutral sentiment as a third class
- Compare LSTM against BERT or DistilBERT
- Save the model using the newer `.keras` format
- Add confusion matrix, precision, recall, and F1-score
- Add explainability using SHAP or LIME
- Improve the Flask app UI and add API endpoint examples
- Add model monitoring for sentiment drift over time





