This project focuses on extracting book data from Books to Scrape through web scraping, storing it in a CSV file, deriving SQL-based insights, and conducting Exploratory Data Analysis (EDA) with visualizations.
- Web Scraping: Extracted book details (Title, Price, Availability, Rating) and saved them in Cleaned_Build_week_project1.csv.
- SQL Insights: Loaded the dataset into MySQL Workbench and generated insights using SQL queries.
- EDA & Visualization: Analyzed data trends and visualized them using Python.

Cleaned_Build_week_project1.csv contains the scraped book data with the following columns:
- Title: Book title
- Price: Price in GBP (£)
- Availability: Stock status
- Rating: Book rating (1 to 5 stars)

| File Name | Description |
|---|---|
| BW_Web_scraping.ipynb | Jupyter notebook for web scraping |
| Cleaned_Build_week_project1.csv | Scraped book data in CSV format |
| BW_SQL_Queries.sql | SQL queries for analysis |
| BW_EDA_Data_Visualization.ipynb | Jupyter notebook for EDA & visualizations |
| BW_Insights | Presentation images summarizing insights |

Run BW_Web_scraping.ipynb to scrape book data and save it as Cleaned_Build_week_project1.csv.
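
The notebook's own scraping code is not reproduced here. As a rough illustration of the extraction step, the sketch below pulls the four CSV columns out of one listing block using only Python's standard `html.parser`. The markup it expects (`article.product_pod`, `p.star-rating`, `p.price_color`, `p.availability`) is an assumption about the Books to Scrape listing pages, the class name `BookListingParser` is hypothetical, and the sample HTML is hand-written for the example rather than fetched from the site.

```python
from html.parser import HTMLParser

# Maps the word in the star-rating CSS class to a number (assumed markup).
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


class BookListingParser(HTMLParser):
    """Hypothetical helper: collects one dict per <article class="product_pod">."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._current = None  # book currently being filled in
        self._field = None    # text field currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "article" and "product_pod" in classes:
            self._current = {"Title": "", "Price": None,
                             "Availability": "", "Rating": None}
        elif self._current is None:
            return
        elif tag == "p" and "star-rating" in classes:
            word = next((c for c in classes if c in RATING_WORDS), None)
            self._current["Rating"] = RATING_WORDS.get(word)
        elif tag == "a" and attrs.get("title"):
            # The full title is assumed to live in the link's title attribute.
            self._current["Title"] = attrs["title"]
        elif tag == "p" and "price_color" in classes:
            self._field = "Price"
        elif tag == "p" and "availability" in classes:
            self._field = "Availability"

    def handle_data(self, data):
        text = data.strip()
        if self._current is None or self._field is None or not text:
            return
        if self._field == "Price":
            self._current["Price"] = float(text.lstrip("£"))
        else:
            self._current["Availability"] = text

    def handle_endtag(self, tag):
        if tag == "p":
            self._field = None
        elif tag == "article" and self._current is not None:
            self.books.append(self._current)
            self._current = None


# Hand-written sample block, for illustration only.
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="#" title="Example Book Title">Example Book ...</a></h3>
  <p class="price_color">£12.34</p>
  <p class="instock availability"><i class="icon-ok"></i> In stock </p>
</article>
"""

parser = BookListingParser()
parser.feed(SAMPLE_HTML)
print(parser.books)
```

The resulting list of dicts could then be written out with `csv.DictWriter`, using the same four column names as the CSV file.
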
- Create a database and table using BW_SQL_Queries.sql.
- Import Cleaned_Build_week_project1.csv into the database.
Execute queries in BW_SQL_Queries.sql to generate insights:
- Total books in stock
- Top 5 expensive books
- Average book rating
- Distribution of books by rating
- Books categorized by price
- Longest book title
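
The contents of BW_SQL_Queries.sql are not shown here, so the following is only a sketch of what queries for the insights listed above might look like. To stay self-contained it uses Python's built-in `sqlite3` with an in-memory database instead of MySQL. The table name `books`, the price-band thresholds, and the sample rows are all assumptions made for the example, not values from the real dataset.

```python
import sqlite3

# Placeholder rows for illustration only; not taken from the real CSV.
SAMPLE_ROWS = [
    ("Book A", 12.50, "In stock", 1),
    ("A Much Longer Book Title", 55.00, "In stock", 5),
    ("Book C", 30.00, "In stock", 3),
    ("Book D", 45.25, "In stock", 1),
]

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the four CSV columns.
conn.execute(
    "CREATE TABLE books (Title TEXT, Price REAL, Availability TEXT, Rating INTEGER)"
)
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

# Total books in stock
total_in_stock = conn.execute(
    "SELECT COUNT(*) FROM books WHERE Availability = 'In stock'"
).fetchone()[0]

# Top 5 most expensive books
top_expensive = conn.execute(
    "SELECT Title, Price FROM books ORDER BY Price DESC LIMIT 5"
).fetchall()

# Average book rating
avg_rating = conn.execute("SELECT AVG(Rating) FROM books").fetchone()[0]

# Distribution of books by rating
rating_dist = conn.execute(
    "SELECT Rating, COUNT(*) FROM books GROUP BY Rating ORDER BY Rating"
).fetchall()

# Books grouped into price bands (thresholds are assumed, not from the source)
price_bands = conn.execute(
    """
    SELECT CASE
             WHEN Price < 20 THEN 'Low'
             WHEN Price < 40 THEN 'Medium'
             ELSE 'High'
           END AS band,
           COUNT(*)
    FROM books
    GROUP BY band
    ORDER BY band
    """
).fetchall()

# Longest book title
longest_title = conn.execute(
    "SELECT Title FROM books ORDER BY LENGTH(Title) DESC LIMIT 1"
).fetchone()[0]

print(total_in_stock, avg_rating, rating_dist, price_bands, longest_title)
```

The same statements should carry over to MySQL with little change. One difference to keep in mind is that MySQL's `LENGTH` counts bytes, so `CHAR_LENGTH` is the closer match when titles contain non-ASCII characters.
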
Run BW_EDA_Data_Visualization.ipynb to:
- Analyze data distributions
- Generate visualizations:
  - Bar chart for book ratings
  - Histogram for price distribution
  - Pie chart for stock availability
  - Heat map for price and rating correlation
- Book Ratings: A significant proportion of books (226) have a 1-star rating, indicating possible quality concerns.
- Price Distribution: Prices range from £10 to £60, with distinct price categories.
- Stock Status: All books are in stock, suggesting either efficient inventory management or low demand.
- Price vs. Rating: A near-zero correlation (0.03) suggests no significant impact of price on ratings.
- GitHub Repository: https://github.com/wajiha-khanam/Build_Week_Project
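
For readers who want to see how a price-versus-rating correlation such as the one reported above is computed, here is a minimal sketch of the Pearson coefficient in plain Python. The EDA notebook may well use a library routine instead. The function name `pearson` and the sample lists are hypothetical and do not come from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only.
sample_prices = [10.0, 20.0, 30.0, 40.0]
sample_ratings = [3, 1, 1, 3]
r = pearson(sample_prices, sample_ratings)
print(r)
```

A value near 0 means the two columns show no linear relationship, values near 1 or -1 mean a strong positive or negative one.
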