sophiachann/WebScrapingProject-EcommercesAnalysis

Web Scraping Ecommerce sites using Selenium & Python

This repository contains the code to extract and analyse product data from HKTVmall and Amazon, two of the largest e-commerce platforms in Hong Kong and the US respectively, in order to explore potential business opportunities.

Business Value of this Project

  • Identifying new Business Opportunities
    • Are there in-demand products that are under-supplied? Are there new audiences to target?
  • Understanding Customer Needs
    • What do our customers want? How can we better cater to their needs?
  • Determining Growth Factors
    • What are the biggest drivers of e-commerce sales?

Project Overview

  • Web scraped over 50,000 skincare products from HKTVmall and Amazon and preprocessed the data for analysis
  • Constructed data frames and visualisations which identified gaps in the market and analysed consumer spending habits such as price elasticity of demand and effective types of promotions
  • Made appropriate business recommendations according to the data-driven market insights

What is in this repo?

  • Notebooks numbered 01 to 04 cover the complete workflow: web scraping, data preprocessing, merging datasets, and visualization.
  • dataframes contains the CSV files that store the cleaned product data.
  • img contains the illustrations for this README.md.
  • EcommerceAnalysis.pdf is a slide deck that illustrates the complete framework of this project.

Data Collection & Preprocessing

We collected the product data from HKTVmall and Amazon by web scraping with Selenium and Beautiful Soup. HKTVmall's API is not available for public use, so obtaining complete data through it would have been even harder.
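
The general pattern of rendering a page with Selenium and parsing it with Beautiful Soup can be sketched as follows. This is not the code from the notebooks: the CSS selectors (`.product-card`, `.product-name`, `.product-price`) and the function names are hypothetical placeholders, and the real pages need their own selectors. Third-party imports are kept inside the functions that use them, so the price helper works without a browser driver installed.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Extract the first number from a price string such as 'HK$ 1,299.00'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def fetch_html(url: str, wait_css: str, timeout: int = 10) -> str:
    """Load a JavaScript-rendered page in headless Chrome and return its HTML."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one element matching wait_css has been rendered.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_products(html: str) -> list:
    """Turn product cards into dicts. The selectors are hypothetical placeholders."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select(".product-card"):
        name = card.select_one(".product-name")
        price = card.select_one(".product-price")
        if name is None or price is None:
            continue  # skip cards that lack a name or a price
        products.append(
            {"name": name.get_text(strip=True), "price": parse_price(price.get_text())}
        )
    return products
```

A typical call would be `parse_products(fetch_html(url, ".product-card"))`, repeated over the listing pages of each category.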

We executed some common preprocessing steps, and engineered new features at a later point to improve our analytical accuracy.
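
A minimal sketch of what such preprocessing and feature engineering might look like with pandas is shown below. The column names (`platform`, `name`, `list_price`, `sale_price`) and the engineered `discount_pct` feature are assumptions for illustration. The actual steps are in the notebooks.

```python
import pandas as pd


def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean scraped product rows and add a discount feature (assumed columns)."""
    df = raw.copy()
    df["name"] = df["name"].str.strip()

    # Strip currency symbols and thousands separators, then convert to numbers.
    for col in ("list_price", "sale_price"):
        df[col] = pd.to_numeric(
            df[col].astype(str).str.replace(r"[^\d.]", "", regex=True),
            errors="coerce",
        )

    # Drop rows without a name or a price, then duplicate listings per platform.
    df = df.dropna(subset=["name", "sale_price"])
    df = df.drop_duplicates(subset=["platform", "name"])

    # Treat a missing list price as "no promotion".
    df["list_price"] = df["list_price"].fillna(df["sale_price"])

    # Engineered feature: discount as a percentage of the list price.
    df["discount_pct"] = (
        (df["list_price"] - df["sale_price"]) / df["list_price"] * 100
    ).round(1)
    return df.reset_index(drop=True)
```

Because every platform goes through the same function, the cleaned frames end up with matching columns and can then be combined with `pd.concat`.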

Key Findings

Business Recommendations
