A comparative NLP project that analyzes sentiment in textual reviews using both lexicon-based (VADER) and transformer-based (roBERTa) approaches.
This project compares two sentiment analysis techniques on a large corpus of text reviews:
| Approach | Model | Type |
|---|---|---|
| VADER | Valence Aware Dictionary and sEntiment Reasoner | Lexicon-based, rule-based |
| roBERTa | Robustly optimized BERT pretraining approach | Transformer-based, deep learning |
Both models produce positive, negative, and neutral polarity scores, so their outputs can be compared directly. VADER additionally reports a single compound score.
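To make the VADER output concrete, here is a minimal sketch using the `vaderSentiment` package. `polarity_scores` returns a dict with `neg`, `neu`, `pos`, and `compound` keys. The helper `label_from_compound` and its 0.05 cutoff are assumptions added for illustration (the cutoff follows the convention suggested in the VADER documentation); neither is taken from the notebook.

```python
def vader_scores(text, analyzer=None):
    """Return VADER's polarity dict with keys 'neg', 'neu', 'pos', 'compound'."""
    # Imported lazily so label_from_compound works without the package installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    # When scoring many reviews, build the analyzer once and pass it in.
    analyzer = analyzer or SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)


def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse label (hypothetical helper)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Example (requires vaderSentiment to be installed):
#   scores = vader_scores("The product arrived quickly and works great!")
#   label = label_from_compound(scores["compound"])
```

Because VADER is rule-based, scoring needs no GPU and no model download, which is part of what makes it a useful baseline.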
- Size: ~500,000 textual reviews
- Task: Sentiment classification and polarity scoring
⚠️ The dataset is too large to host on GitHub (even when compressed). Please obtain the review dataset from your course materials or the original source.
├── Sentiment.ipynb # Main notebook with implementation and analysis
└── README.md
The notebook includes:
- Data preprocessing and exploratory analysis
- Data distribution visualizations
- VADER sentiment scoring and polarity outputs
- roBERTa sentiment scoring and polarity outputs
- Model comparison and polarity score analysis
- Limitations of each approach
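The roBERTa scoring step listed above could look roughly like the sketch below. This README does not name a checkpoint, so `cardiffnlp/twitter-roberta-base-sentiment` and its negative/neutral/positive label order are assumptions; substitute whatever model the notebook actually loads. The function `roberta_scores` is a hypothetical name, not one from the notebook.

```python
import math

# Assumed label order for the assumed checkpoint below.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def roberta_scores(text, model_name="cardiffnlp/twitter-roberta-base-sentiment"):
    """Score one review; model_name is an assumption, not taken from the README."""
    # Lazy imports: torch and transformers are only needed when this runs.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    # Truncate long reviews so they fit the model's input window.
    encoded = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**encoded).logits[0].tolist()
    return dict(zip(LABELS, softmax(logits)))
```

For a corpus of this size, load the tokenizer and model once outside the function and score in batches. Loading them per call, as done here for brevity, would be very slow.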
Install the dependencies with `pip install pandas numpy transformers torch vaderSentiment`, then:
- Add your review dataset to the project directory
- Open `Sentiment.ipynb` in Jupyter
- Update the data path in the notebook
- Run all cells to preprocess, analyze, and compare sentiment scores
- Comparative performance of lexicon-based vs. transformer-based sentiment analysis
- Polarity score outputs from both VADER and roBERTa
- Discussion of strengths and limitations of each approach
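One simple way to compare the two sets of polarity scores is to reduce each to a single label and measure how often the labels agree. The sketch below is an illustration under assumptions, not the notebook's method: the function names, the 0.05 compound cutoff, and the dict layout of the roBERTa scores are all hypothetical, and the sample values in the usage comment are made up.

```python
def vader_label(compound, threshold=0.05):
    """Coarse label from a VADER compound score (assumed 0.05 cutoff)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def roberta_label(scores):
    """Pick the label with the highest probability from a score dict."""
    return max(scores, key=scores.get)


def agreement_rate(vader_compounds, roberta_score_dicts):
    """Fraction of reviews where both models assign the same label."""
    pairs = list(zip(vader_compounds, roberta_score_dicts))
    if not pairs:
        raise ValueError("no reviews to compare")
    matches = sum(vader_label(c) == roberta_label(s) for c, s in pairs)
    return matches / len(pairs)


# Usage with made-up scores for three reviews:
#   agreement_rate(
#       [0.8, -0.6, 0.0],
#       [
#           {"negative": 0.1, "neutral": 0.2, "positive": 0.7},
#           {"negative": 0.2, "neutral": 0.5, "positive": 0.3},
#           {"negative": 0.1, "neutral": 0.8, "positive": 0.1},
#       ],
#   )
```

Agreement alone does not say which model is right. Reviews where the labels differ are good candidates for manual inspection when discussing each approach's limitations.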
Ivaylo Papazov
This project is available for educational purposes.
⭐ If you find this project useful, please consider giving it a star!