Performance Improvement Suggestions

**Title:** Improving Sentiment Analysis Accuracy (Currently 90%) – Suggestions Needed

Hello,

I built a movie sentiment analysis model using TF-IDF + Logistic Regression and achieved 90% accuracy on the IMDB dataset.

**Current pipeline:**

* Text cleaning (HTML removal, contractions, punctuation removal)
* Stopword removal (keeping negations)
* Lemmatization with POS tagging
* TF-IDF (max_features=45000, ngram_range=(1,2))
* Models tried: Naive Bayes, KNN, Logistic Regression, SVM, Decision Tree
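
For reference, the first two steps above can be sketched roughly as follows, using only the standard library. This is a simplified illustration rather than the exact notebook code: the `CONTRACTIONS`, `STOPWORDS`, and `NEGATIONS` tables and the function names are hypothetical and deliberately tiny, and lemmatization and TF-IDF are left out.

```python
import html
import re

# Hypothetical, deliberately tiny lookup tables (assumptions for illustration);
# a real pipeline would use much fuller lists.
CONTRACTIONS = {"don't": "do not", "isn't": "is not", "can't": "can not", "it's": "it is"}
STOPWORDS = {"the", "a", "an", "is", "it", "this", "was", "do", "and", "of", "not", "no", "never"}
NEGATIONS = {"not", "no", "never"}


def clean_text(text: str) -> str:
    """Strip HTML, expand contractions, and remove punctuation."""
    text = html.unescape(text)                # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)      # drop tags such as <br />
    text = text.lower().replace("’", "'")     # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


def remove_stopwords(text: str) -> list[str]:
    """Drop stopwords but keep negation words, which carry sentiment."""
    return [tok for tok in text.split() if tok not in STOPWORDS or tok in NEGATIONS]


review = "This movie isn't good.<br />Don't watch it!"
print(remove_stopwords(clean_text(review)))
# ['movie', 'not', 'good', 'not', 'watch']
```

Expanding contractions before removing punctuation is what lets "isn't" survive as the token `not` instead of being split into fragments.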

**Goal:** Improve accuracy to roughly 92–95% or higher.

**What I’ve tried:**

* Ensemble methods (did not improve significantly)
* Hyperparameter tuning

**Questions:**

1. Are there better feature engineering techniques I should try?
2. Would word embeddings (Word2Vec, GloVe) help here?
3. Any suggestions for handling tricky cases like negations better?


Here is my notebook:
https://github.com/Varunkumar2516/IMDb-Sentiment-Analysis-NLP-Project/blob/master/1%20IMDB_Sentiment_Analyzer_Notebook%20.ipynb

Any suggestions or feedback would be really helpful. Thanks!

