Title: Improving Sentiment Analysis Accuracy (Currently 90%) – Suggestions Needed
Hello,
I built a movie sentiment analysis model using TF-IDF + Logistic Regression that reaches 90% accuracy on the IMDB dataset.
Current pipeline:
- Text cleaning (HTML removal, contractions, punctuation removal)
- Stopword removal (keeping negations)
- Lemmatization with POS tagging
- TF-IDF (max_features=45000, ngram_range=(1,2))
- Models tried: Naive Bayes, KNN, Logistic Regression, SVM, Decision Tree
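For reference, here is a minimal scikit-learn sketch of the pipeline above, assuming scikit-learn's `TfidfVectorizer` and `LogisticRegression`. `CONTRACTIONS`, `NEGATIONS` and `clean_text` are hypothetical stand-ins for the notebook's own cleaning code. Lemmatization with POS tagging is left out because it needs NLTK or spaCy models. The toy reviews are placeholders, not IMDB data.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical, deliberately incomplete contraction map (the notebook's may differ).
CONTRACTIONS = {"didn't": "did not", "don't": "do not", "isn't": "is not",
                "wasn't": "was not", "can't": "cannot", "won't": "will not"}

# Hypothetical negation set, removed from the stopword list so "not good" survives.
NEGATIONS = {"no", "not", "nor", "never", "cannot"}
STOP_WORDS = sorted(ENGLISH_STOP_WORDS - NEGATIONS)


def clean_text(text: str) -> str:
    """Lowercase, strip HTML tags, expand contractions, drop punctuation."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)          # e.g. IMDB's "<br />" tags
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = re.sub(r"[^a-z0-9\s]", " ", text)      # punctuation -> space
    return " ".join(text.split())                 # collapse whitespace


pipe = Pipeline([
    ("tfidf", TfidfVectorizer(
        preprocessor=clean_text,   # replaces the default lowercasing step
        stop_words=STOP_WORDS,     # removed before bigrams are formed
        max_features=45000,
        ngram_range=(1, 2),
    )),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Placeholder data; swap in the IMDB reviews and their 0/1 labels.
reviews = [
    "<br />This movie was great, I loved it!",
    "I didn't like this film. It was not good.",
    "An excellent, moving story.",
    "Terrible plot and bad acting; not worth it.",
]
labels = [1, 0, 1, 0]

pipe.fit(reviews, labels)
print(pipe.predict(["It was not good at all."]))
```

Keeping negations out of the stopword list matters mostly because of the bigrams: `TfidfVectorizer` drops stop words before it builds n-grams, so with the default English list "not good" would collapse to just "good".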
Goal: Improve accuracy to roughly 92–95% or higher
What I’ve tried:
- Ensemble methods (did not improve significantly)
- Hyperparameter tuning
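One way to sketch the tuning step is `GridSearchCV` over the whole pipeline. This way the TF-IDF vocabulary is refit inside each fold instead of on the full training set. The grid values and toy data below are assumptions for illustration, not the notebook's actual search space.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy stand-in data; replace with the IMDB training split.
texts = ["great movie", "loved the acting", "wonderful film", "an excellent story",
         "terrible movie", "hated the acting", "awful film", "a boring story"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical search space: vectorizer and classifier settings tuned jointly,
# addressed as "<step name>__<parameter>".
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__sublinear_tf": [False, True],
    "clf__C": [0.1, 1.0, 10.0],
}

# cv=2 only because the toy set is tiny; use 5 or more folds on real data.
search = GridSearchCV(pipe, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```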
Questions:
- Are there better feature engineering techniques I should try?
- Would word embeddings (Word2Vec, GloVe) help here?
- Any suggestions for better handling tricky cases such as negations?
Here is my notebook:
https://github.com/Varunkumar2516/IMDb-Sentiment-Analysis-NLP-Project/blob/master/1%20IMDB_Sentiment_Analyzer_Notebook%20.ipynb
Any suggestions or feedback would be really helpful. Thanks!