Web Scraping and Sentiment Analysis on Digikala Reviews

Project Overview

This project involves web scraping reviews from the online shopping platform Digikala. The reviews are extracted for various products and analyzed based on their ratings to classify them into positive and negative sentiments.

Dataset

The dataset is created by scraping customer reviews and ratings from Digikala. The data includes:

Rating: The customer’s rating of the product, ranging from 1 to 5 stars.
Text: The review text written by the customer.
Sentiment: The sentiment of the review, classified as:
- 1 for positive sentiment (rating >= 4 stars)
- 0 for negative sentiment (rating < 4 stars)

The data is stored in a CSV file with three columns:

rating: Numeric value of the product rating (float).
text: The review written by the customer (string).
sentiment: The sentiment label (0 for negative, 1 for positive).

Methodology

Web Scraping: I used the Selenium library with an undetected Chrome driver to navigate through Digikala's product pages and extract review information.
Review Extraction:
- Product reviews are scraped from product pages using specific XPaths to locate the review elements.
- For each review, the rating and text are extracted.
- Based on the rating, reviews are labeled as positive or negative sentiment.
Iterative Scraping: To extract reviews across multiple pages, the code iterates through available pages and appends the reviews to a CSV file.
Sentiment Analysis: After the data is collected, the reviews are labeled with sentiment based on their ratings.

How to Run the Project

Make sure to have ChromeDriver installed and compatible with your Chrome browser version.

Example Output

A sample of the scraped data might look like this:

Rating	Text	Sentiment
5.0	"Great product! Very happy with the purchase."	1
2.0	"The quality was lower than expected, not worth the money."	0
4.5	"Good value for money, but the delivery was late."	1
3.0	"It's okay, but I've seen better products for the price."	0

Notes

The project is designed to scrape up to a certain number of reviews and pages, but can be adjusted for larger datasets if necessary.
Make sure to respect the terms of service of the website when scraping data.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
DigiKalaWebScraping.ipynb		DigiKalaWebScraping.ipynb
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Web Scraping and Sentiment Analysis on Digikala Reviews

Project Overview

Dataset

Methodology

How to Run the Project

Example Output

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Web Scraping and Sentiment Analysis on Digikala Reviews

Project Overview

Dataset

Methodology

How to Run the Project

Example Output

Notes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages