A Python project to scrape book information from Books to Scrape, a website designed for practicing web scraping. This project extracts book titles, prices, availability, and star ratings using BeautifulSoup, and applies machine learning models for classification and regression tasks on the scraped data.
- Scrapes book details from multiple pages
- Extracts price, availability, and star rating
- Applies classification to predict star rating categories
- Uses regression to analyze or predict book prices
- Saves data to CSV for further analysis
- Easy-to-read and beginner-friendly code
- Python
- BeautifulSoup: HTML parsing
- Requests: fetching web content
- Pandas: data manipulation
- Scikit-learn: classification and regression models
- Jupyter Notebook
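
A minimal sketch of the parsing step, assuming each book on a catalogue page sits in an `article.product_pod` element containing `p.price_color`, `p.availability`, and `p.star-rating` children. The names `parse_books`, `RATING_WORDS`, and `SAMPLE_HTML` are illustrative and not taken from the project code.

```python
import re

from bs4 import BeautifulSoup

# Assumed mapping from the rating word in the CSS class to a number.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_books(html):
    """Return one dict per book found in the given catalogue page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for pod in soup.select("article.product_pod"):
        price_text = pod.select_one("p.price_color").get_text(strip=True)
        # "class" is multi-valued, e.g. ["star-rating", "Three"].
        rating_classes = pod.select_one("p.star-rating")["class"]
        rating_word = next(c for c in rating_classes if c in RATING_WORDS)
        books.append({
            "title": pod.h3.a["title"],
            # Keep only digits and the decimal point, dropping the currency sign.
            "price": float(re.search(r"[\d.]+", price_text).group()),
            "availability": pod.select_one("p.availability").get_text(strip=True),
            "rating": RATING_WORDS[rating_word],
        })
    return books


# Hand-written stand-in for one book entry, so the sketch runs offline.
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="#" title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
  <p class="instock availability">In stock</p>
</article>
"""

books = parse_books(SAMPLE_HTML)
print(books)
```

In a real run, the HTML would come from `requests.get(url).text` for each catalogue page, and the combined list could be written out with `pandas.DataFrame(books).to_csv(...)`.
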
- Goal: Predict book rating category (e.g., ★★★☆☆, ★★★★★) using features like price, availability, and title length.
- Model Used: Random Forest / Logistic Regression (customizable)
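
The classification task above might be set up roughly as follows. This is a sketch under assumptions: the column names (`price`, `in_stock`, `title_length`, `rating`) are hypothetical, and the rows are randomly generated placeholders rather than scraped data, so the reported accuracy carries no meaning beyond showing the workflow.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder rows; replace with the scraped CSV in practice.
rng = np.random.default_rng(0)
n = 200
books = pd.DataFrame({
    "price": rng.uniform(10, 60, n).round(2),
    "in_stock": rng.integers(0, 2, n),       # 1 = in stock, 0 = out of stock
    "title_length": rng.integers(5, 80, n),  # characters in the title
    "rating": rng.integers(1, 6, n),         # star rating, 1 to 5
})

X = books[["price", "in_stock", "title_length"]]
y = books["rating"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Random Forest is one of the two model options the project lists.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy on held-out rows: {accuracy:.2f}")
```

Swapping in `sklearn.linear_model.LogisticRegression` only changes the line that constructs `clf`.
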
- Goal: Predict the price of a book based on its features (e.g., rating, title features, availability)
- Model Used: Linear Regression / Decision Tree Regressor
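
A comparable sketch for the regression task, fitting both listed models and comparing them by mean absolute error. The column names are again hypothetical and the data is randomly generated, so the error values only demonstrate the mechanics.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic placeholder rows; replace with the scraped CSV in practice.
rng = np.random.default_rng(1)
n = 200
books = pd.DataFrame({
    "rating": rng.integers(1, 6, n),
    "in_stock": rng.integers(0, 2, n),
    "title_length": rng.integers(5, 80, n),
    "price": rng.uniform(10, 60, n).round(2),
})

X = books[["rating", "in_stock", "title_length"]]
y = books["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

models = {
    "linear_regression": LinearRegression(),
    # max_depth=3 is an arbitrary choice to keep the tree small.
    "decision_tree": DecisionTreeRegressor(max_depth=3, random_state=1),
}

errors = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    errors[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {errors[name]:.2f}")
```
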
- "Books with higher star ratings tend to be priced slightly higher on average."
- "Out-of-stock books have lower average ratings."
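
Observations like these can be checked with a simple group-by on the scraped table. The sketch below assumes columns named `price`, `rating`, and `availability`; the four rows are made up purely to show the computation and are not results from the project.

```python
import pandas as pd

# Made-up rows for illustration only.
books = pd.DataFrame({
    "price": [20.0, 30.0, 40.0, 50.0],
    "rating": [1, 1, 5, 5],
    "availability": ["In stock", "Out of stock", "In stock", "In stock"],
})

# Average price for each star rating.
mean_price_by_rating = books.groupby("rating")["price"].mean()

# Average star rating for in-stock versus out-of-stock books.
mean_rating_by_availability = books.groupby("availability")["rating"].mean()

print(mean_price_by_rating)
print(mean_rating_by_availability)
```
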