Sentiment Analysis on Tweets
Overview
This project performs sentiment analysis on the Sentiment140 dataset of tweets. Each tweet carries a sentiment label (0: Negative, 4: Positive), and the goal is to clean, analyze, and visualize the sentiment distribution.
The dataset can be downloaded from http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip
Features
Data preprocessing (removal of unnecessary columns, text length calculations)
Exploratory Data Analysis (EDA)
Sentiment classification using basic machine learning techniques
Visualization of results
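As a rough illustration of the classification feature listed above, the sketch below trains a simple text classifier with scikit-learn. The choice of TF-IDF features with logistic regression is an assumption (this README only says "basic machine learning techniques"), the helper name build_classifier is hypothetical, and the handful of example tweets are made up purely for demonstration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_classifier():
    """Return an untrained pipeline: TF-IDF features fed into logistic regression.

    Assumption: the notebook may use a different model or feature extractor.
    """
    return make_pipeline(
        TfidfVectorizer(lowercase=True),
        LogisticRegression(max_iter=1000),
    )


# Made-up sample tweets using the dataset's label scheme (0: Negative, 4: Positive).
texts = [
    "I hate waiting in traffic",
    "this is the worst day ever",
    "feeling sad and tired",
    "I love this sunny weather",
    "what a wonderful concert",
    "so happy with my new phone",
]
labels = [0, 0, 0, 4, 4, 4]

model = build_classifier()
model.fit(texts, labels)

# Predict the label of an unseen tweet; the result is either 0 or 4.
prediction = model.predict(["what a wonderful day"])
print(prediction[0])
```

On the real dataset you would fit on a training split and evaluate on held-out tweets rather than on a handful of examples.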
Prerequisites
Ensure you have the following installed before running the notebook:
Python 3.x
Jupyter Notebook
Required Python libraries:
pip install pandas numpy matplotlib seaborn scikit-learn
Dataset
You need to download the dataset manually since it is not included in the repository.
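Download the Sentiment140 dataset archive from http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip and extract the training CSV (called training.csv here; the file inside the archive may have a longer name)
Place the file where the notebook expects it, or update the file path in the notebook to match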
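Once the file is in place, loading and preprocessing might look like the minimal sketch below. It assumes the CSV has no header row, six columns, and a non-UTF-8 encoding, as is commonly the case for Sentiment140. The column names, the function name load_tweets, and the decision to keep only the label and text columns are illustrative assumptions, not necessarily what the notebook does.

```python
import pandas as pd

# Assumed column order; the names are our own, since the file has no header row.
COLUMNS = ["target", "id", "date", "query", "user", "text"]


def load_tweets(csv_path):
    """Load the raw CSV, keep the label and text, and add a text-length column."""
    df = pd.read_csv(
        csv_path,
        encoding="latin-1",  # assumption: the file is not UTF-8 encoded
        header=None,
        names=COLUMNS,
    )
    # Drop metadata columns that are not needed for sentiment analysis.
    df = df[["target", "text"]].copy()
    # Number of characters in each tweet.
    df["text_length"] = df["text"].str.len()
    return df
```

For example, `df = load_tweets("training.csv")` followed by `df.head()` shows a few preprocessed rows; adjust the path to wherever you placed the file.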
How to Run
Clone this repository:
git clone https://github.com/yourusername/sentiment-analysis.git
cd sentiment-analysis
Open Jupyter Notebook:
jupyter notebook
Open Sentiment Analysis on Tweets.ipynb and run the cells in order, from top to bottom.
Expected Outputs
Summary statistics of sentiment classes
Preprocessed tweet samples
Visualizations of sentiment distribution
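The outputs listed above can be approximated with a few lines of pandas and matplotlib. This is a sketch under assumptions: it expects a DataFrame with hypothetical columns target (0 or 4) and text_length, and the function names sentiment_summary and plot_sentiment_distribution are illustrative only. The notebook's actual plots may differ.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Label scheme from the dataset description (0: Negative, 4: Positive).
LABEL_NAMES = {0: "Negative", 4: "Positive"}


def sentiment_summary(df):
    """Return the tweet count and mean text length for each sentiment class."""
    named = df.assign(sentiment=df["target"].map(LABEL_NAMES))
    return named.groupby("sentiment")["text_length"].agg(["count", "mean"])


def plot_sentiment_distribution(summary):
    """Draw a bar chart of tweet counts per sentiment class and return the Axes."""
    fig, ax = plt.subplots()
    ax.bar(list(summary.index), summary["count"])
    ax.set_xlabel("Sentiment")
    ax.set_ylabel("Number of tweets")
    ax.set_title("Sentiment distribution")
    return ax
```

In a notebook, calling `plot_sentiment_distribution(sentiment_summary(df))` renders the chart inline; in a plain script, call `plt.show()` afterwards.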
Contributing
Feel free to submit pull requests or report issues.