Sentiment Analysis with BiLSTM

📌 Project Overview

This project implements Sentiment Analysis using a Bidirectional LSTM (BiLSTM) neural network. The goal is to classify text data into positive or negative sentiment, leveraging the ability of BiLSTMs to capture contextual information from both past and future tokens in a sequence.

The pipeline includes:

  • Data preprocessing and train/validation/test splitting.
  • Model training with early stopping to prevent overfitting.
  • Performance visualization (accuracy and loss).
  • Evaluation on test data.
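The early-stopping step above follows the usual pattern: halt training once the validation loss has not improved for a fixed number of epochs. A minimal plain-Python sketch of that logic (the `patience` value is illustrative; the actual callback configuration lives in `train.py`):

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch index at which training should stop,
    i.e. after `patience` epochs with no improvement in validation loss."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch  # stop here; best weights were seen earlier
    return len(val_losses) - 1  # trained to the end

# Validation loss improves for three epochs, then plateaus:
# training stops `patience` epochs after the best epoch.
stop = early_stopping([0.9, 0.7, 0.6, 0.65, 0.66, 0.67, 0.68], patience=3)
```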

βš™οΈ Installation and Setup

1. Clone the repository

git clone https://github.com/anniemburu/Sentimental-Analysis-with-BiLSTM

2. Create and activate a virtual environment (Anaconda or Miniconda recommended)

conda create -n myenv python=3.9

conda activate myenv

3. Install dependencies

All dependencies are listed in requirements.txt. Install them with:

pip install -r requirements.txt

4. Data setup

The processed dataset is expected at datasets/processed/sentiment_data.csv. The data used for this project was sourced from the Movie Review Polarity Dataset. You can modify data_split in src/data/preprocessing.py if you wish to use a different dataset.
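The load-and-split step can be sketched with the standard library alone (a simplified illustration; the column names and 80/10/10 ratios here are assumptions, and the split actually used by the pipeline is defined in src/data/preprocessing.py):

```python
import csv
import random

def load_and_split(path, train=0.8, val=0.1, seed=42):
    """Read the sentiment CSV and split its rows into
    train/validation/test portions after a seeded shuffle."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return (rows[:n_train],                    # training set
            rows[n_train:n_train + n_val],     # validation set
            rows[n_train + n_val:])            # test set
```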

🚀 Training the Model

Run the training pipeline with:

python train.py

This will:

  • Train the BiLSTM model on the training data.
  • Validate it on the validation set.
  • Save the trained model to src/models/model_final.h5.
  • Generate training performance plots at src/results/model_performance.png.

📊 Data Source

The dataset consists of labeled text samples with binary sentiment labels (0 = Negative, 1 = Positive).

Source: Movie Review Polarity Dataset.

Preprocessing: The data has been tokenized, padded to fixed sequence length, and split into training, validation, and test sets.
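The tokenize-and-pad step can be illustrated in plain Python (a simplified sketch: the real pipeline builds its vocabulary in src/data/preprocessing.py, and the fixed sequence length used below is an arbitrary example):

```python
def build_vocab(texts):
    """Map each distinct token to an integer id; 0 is reserved for padding."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab) + 1)
    return vocab

def encode_and_pad(text, vocab, maxlen=20):
    """Convert text to token ids, then pad (or truncate) to a fixed length."""
    ids = [vocab.get(tok, 0) for tok in text.lower().split()][:maxlen]
    return ids + [0] * (maxlen - len(ids))

vocab = build_vocab(["a great movie", "a terrible movie"])
# → {"a": 1, "great": 2, "movie": 3, "terrible": 4}
padded = encode_and_pad("a great movie", vocab, maxlen=5)
# → [1, 2, 3, 0, 0]
```

Fixed-length sequences like `padded` are what the BiLSTM consumes, reading them left-to-right and right-to-left to capture context from both directions.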

📂 Project Structure

├── datasets/
│   └── processed/
│       └── sentiment_data.csv
├── src/
│   ├── data/
│   │   ├── preprocessing.py
│   │   └── data_loader.py
│   ├── models/
│   │   ├── base_model.py   # BiLSTM model architecture
│   │   └── model_final.h5   # Saved model
│   └── results/
│       └── model_performance.png
├── train.py                # Training pipeline
├── requirements.txt
└── README.md

🔎 Findings & Results

TBA
