This project implements Sentiment Analysis using a Bidirectional LSTM (BiLSTM) neural network. The goal is to classify text data into positive or negative sentiment, leveraging the ability of BiLSTMs to capture contextual information from both past and future tokens in a sequence.
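To make the "both past and future tokens" idea concrete, here is a toy sketch in plain Python. It is not the project's model: it uses a single scalar recurrent unit with made-up weights (`w_in`, `w_rec`) rather than a real LSTM cell, and the function names are hypothetical. It only shows how a second, right-to-left pass gives each position access to later tokens.

```python
import math

def run_rnn(inputs, w_in=0.5, w_rec=0.8):
    """Toy scalar recurrent pass: each state mixes the current input with the previous state."""
    state, states = 0.0, []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states

def bidirectional(inputs):
    """Pair each position's forward state (past context) with its backward state (future context)."""
    forward = run_rnn(inputs)
    backward = run_rnn(inputs[::-1])[::-1]  # run right-to-left, then realign to positions
    return list(zip(forward, backward))

# Two sequences that differ only in the last token.
a = bidirectional([1.0, 0.0, 1.0])
b = bidirectional([1.0, 0.0, -1.0])

# At position 0 the forward state is identical, but the backward state differs,
# because the backward pass has already processed the later tokens.
print(a[0][0] == b[0][0])  # True
print(a[0][1] == b[0][1])  # False
```

A unidirectional model would see only the forward half of each pair, so the first position could not reflect a change at the end of the sentence.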
The pipeline includes:
- Data preprocessing and train/validation/test splitting.
- Model training with early stopping to prevent overfitting.
- Performance visualization (accuracy and loss).
- Evaluation on test data.
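The early stopping step in the list above can be sketched as follows. This is a minimal, framework-free illustration, not the code in train.py: the function name, the `patience` value, and the loss values are all assumptions chosen for the example.

```python
def early_stopping_epochs(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    val_losses stands in for the losses a real training loop would compute.
    Training stops once the loss has failed to improve for `patience` epochs in a row.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Illustrative (made-up) validation losses: improvement stalls after epoch 2.
losses = [0.69, 0.55, 0.48, 0.50, 0.52, 0.47]
stop_epoch, best_epoch = early_stopping_epochs(losses, patience=2)
print(stop_epoch, best_epoch)  # 4 2
```

Note that the run stops at epoch 4 and never reaches the lower loss at epoch 5, which is the usual trade-off when choosing a small patience. If the project is built on Keras (an assumption suggested by the .h5 model file), the `EarlyStopping` callback provides the same behavior through its `patience` and `restore_best_weights` arguments.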
git clone https://github.com/anniemburu/Sentimental-Analysis-with-BiLSTM
conda create -n myenv python=3.9
conda activate myenv
All dependencies are listed in requirements.txt. Install them with:
pip install -r requirements.txt
The processed dataset is expected at:
datasets/processed/sentiment_data.csv
The data used in this project was sourced from the Movie Review, Polarity Dataset. You can modify data_split in src/data/preprocessing.py if you wish to use a different dataset.
Run the training pipeline with:
python train.py
This will:
- Train the BiLSTM model on the training data.
- Validate it on the validation set.
- Save the trained model to src/models/model_final.h5.
- Generate training performance plots at src/results/model_performance.png.
The dataset consists of labeled text samples with binary sentiment labels (0 = Negative, 1 = Positive).
Source: Movie Review, Polarity Dataset.
Preprocessing: The data has been tokenized, padded to a fixed sequence length, and split into training, validation, and test sets.
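The preprocessing steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the contents of src/data/preprocessing.py: the helper names, whitespace tokenization, right-padding with 0, and the 80/10/10 split ratios are all hypothetical choices.

```python
import random

def build_vocab(texts):
    """Map each word to an integer id; 0 is reserved for padding, 1 for unknown words."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab, max_len):
    """Convert text to ids, then truncate or right-pad with 0 to exactly max_len."""
    ids = [vocab.get(word, 1) for word in text.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))

def split_dataset(samples, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle a copy of the samples and split it into train/validation/test lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

texts = ["A great movie", "Terrible plot and bad acting"]
vocab = build_vocab(texts)
print(encode("A great movie", vocab, max_len=5))  # [2, 3, 4, 0, 0]

# Ten toy (sequence, label) pairs, with 1 = Positive and 0 = Negative.
data = [(encode(t, vocab, 5), label) for t, label in zip(texts * 5, [1, 0] * 5)]
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))  # 8 1 1
```

Fixing the shuffle seed keeps the three sets stable across runs, so test samples do not leak into training when the script is rerun.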
Project Structure
├── datasets/
│   └── processed/
│       └── sentiment_data.csv
├── src/
│   ├── data/
│   │   ├── preprocessing.py
│   │   └── data_loader.py
│   ├── models/
│   │   ├── base_model.py      # BiLSTM model architecture
│   │   └── model_final.h5     # Saved model
│   └── results/
│       └── model_performance.png
├── train.py                   # Training pipeline
├── requirements.txt
└── README.md
TBA