shivamkumar-ds/ecommerce_sales_data

Amazon Sales Data Analysis – Google Capstone Project

Project Overview

This project was completed as part of the Google Data Analytics Professional Certificate Capstone.

For this capstone, dataset selection and end-to-end analysis design were done independently.

The objective was to analyze an Amazon-style e-commerce dataset and generate structured business insights using tools learned during the course.


Dataset

The dataset contains 100,000 synthetic e-commerce transactions including order details, customer information, pricing, discounts, taxes, shipping costs, and order status.

Note: The dataset is synthetically generated but structured to resemble real-world e-commerce transactions.


Google Sheets Dashboard

Interactive Google Sheets Dashboard:
https://docs.google.com/spreadsheets/d/1SCZmVAkjSMPFf8gaN37kjMjwelTbC08lk5Fl7mdi9vU/edit?usp=sharing

The dashboard includes:

  • KPI Summary Cards
  • Monthly Revenue Trend
  • Category-wise Revenue Distribution
  • Country-wise Sales
  • Brand-wise Contribution
  • Order Status Breakdown
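The aggregations behind panels such as the monthly revenue trend and the category-wise distribution can be sketched in pandas. This is a minimal illustration on a made-up four-row sample, not the queries used in the project; the column names `order_date`, `category`, and `total_amount` are assumptions and may not match the dataset's actual schema.

```python
import pandas as pd

# Made-up sample rows; column names are hypothetical, not the dataset's schema.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-14"]
    ),
    "category": ["Electronics", "Books", "Electronics", "Home"],
    "total_amount": [120.0, 30.0, 200.0, 50.0],
})

# Monthly revenue trend: sum of total_amount per calendar month.
monthly_revenue = (
    orders.groupby(orders["order_date"].dt.to_period("M"))["total_amount"].sum()
)

# Category-wise revenue distribution: each category's share of total revenue.
category_revenue = orders.groupby("category")["total_amount"].sum()
category_share = category_revenue / category_revenue.sum()

print(monthly_revenue)
print(category_share.round(3))
```

The resulting small tables can be exported with `to_csv` and pasted into a spreadsheet to drive charts.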

Tools & Technologies Used

Google Sheets

  • Data cleaning
  • Pivot tables
  • KPI calculations
  • Static dashboard creation

Google Cloud BigQuery

  • SQL-based data cleaning
  • Aggregation queries
  • KPI generation
  • Revenue and operational analysis

Python (Pandas, Matplotlib, Seaborn)

  • Statistical analysis
  • Revenue distribution analysis
  • Correlation analysis
  • Discount vs Revenue evaluation
  • Data validation

Project Structure

amazon-sales-capstone/
│
├── sql/
│   └── amazon_analysis.sql
│
├── python/
│   └── python.ipynb
│
├── dashboard/
│   └── ecommerce_sales_dashboard.png
│
└── README.md

Key Business Metrics Identified

Gross Metrics

  • Total Orders
  • Total Revenue
  • Total Quantity Sold
  • Average Order Value (AOV)

Net Metrics (Delivered Orders)

  • Total Delivered Orders
  • Total Delivered Revenue

Risk Metrics

  • Cancellation Rate
  • Return Rate
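As a rough illustration of how these metrics relate to one another, the sketch below computes them on a made-up five-row sample. It assumes one row per order, hypothetical column names (`order_id`, `quantity`, `total_amount`, `order_status`), and hypothetical status labels (`Delivered`, `Cancelled`, `Returned`); the project's own definitions may differ.

```python
import pandas as pd

# Made-up sample; column names and status labels are assumptions.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "quantity": [2, 1, 3, 1, 4],
    "total_amount": [100.0, 50.0, 150.0, 40.0, 160.0],
    "order_status": ["Delivered", "Cancelled", "Delivered", "Returned", "Delivered"],
})


def compute_kpis(df: pd.DataFrame) -> dict:
    """Compute gross, net (delivered-only), and risk metrics, one row per order."""
    total_orders = df["order_id"].nunique()
    total_revenue = df["total_amount"].sum()
    delivered = df[df["order_status"] == "Delivered"]
    return {
        # Gross metrics: all orders regardless of status.
        "total_orders": total_orders,
        "total_revenue": total_revenue,
        "total_quantity": int(df["quantity"].sum()),
        "aov": total_revenue / total_orders,
        # Net metrics: delivered orders only.
        "delivered_orders": len(delivered),
        "delivered_revenue": delivered["total_amount"].sum(),
        # Risk metrics: share of orders with each status.
        "cancellation_rate": (df["order_status"] == "Cancelled").mean(),
        "return_rate": (df["order_status"] == "Returned").mean(),
    }


kpis = compute_kpis(orders)
for name, value in kpis.items():
    print(f"{name}: {value}")
```

Separating gross from delivered-only figures keeps cancelled and returned orders from inflating revenue.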

Key Insights

  • Revenue distribution is right-skewed: most orders fall in the lower-to-mid value range, with a long tail of high-value orders.
  • Discount and Total Amount show a weak negative correlation (approximately -0.10).
  • Unit Price and Quantity are primary drivers of Total Revenue.
  • Revenue distribution across categories is relatively balanced.
  • Order status analysis provides insight into operational efficiency.
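The skewness and correlation checks behind the first two insights can be reproduced with standard pandas calls. The sketch below uses a made-up six-row sample with hypothetical column names (`discount`, `total_amount`), so its numbers will not match the values reported above; it only shows how such figures could be obtained.

```python
import pandas as pd

# Made-up sample with one large order to create a long right tail.
# Column names are assumptions, not the dataset's actual schema.
orders = pd.DataFrame({
    "discount": [0.0, 0.05, 0.10, 0.15, 0.20, 0.25],
    "total_amount": [40.0, 45.0, 50.0, 55.0, 60.0, 400.0],
})

# Positive skewness indicates a right-skewed distribution.
skewness = orders["total_amount"].skew()

# In a right-skewed distribution the mean is pulled above the median.
mean_amount = orders["total_amount"].mean()
median_amount = orders["total_amount"].median()

# Pearson correlation between discount and order total.
corr = orders["discount"].corr(orders["total_amount"])

print(f"skewness: {skewness:.2f}")
print(f"mean: {mean_amount:.2f}, median: {median_amount:.2f}")
print(f"discount vs total_amount correlation: {corr:.2f}")
```

Comparing the mean with the median is a quick sanity check on the direction of the skew before plotting a histogram.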

Learning Outcomes

Through this project, I gained practical experience in:

  • Designing a complete analytics workflow
  • Cleaning and transforming data using SQL
  • Executing structured queries in BigQuery
  • Performing statistical analysis using Python
  • Building KPI-driven dashboards
  • Interpreting business-focused insights from structured data

This project strengthened both technical proficiency and analytical thinking.


Reproducibility

To replicate this project:

  1. Download the dataset from its Kaggle source.
  2. Upload the dataset to Google BigQuery.
  3. Execute queries from the /sql folder.
  4. Run the Python notebook for analysis.
  5. Use aggregated outputs to build the dashboard.

Author

Shivam Kumar

Live Preview

Google Sheets dashboard:
https://docs.google.com/spreadsheets/d/1SCZmVAkjSMPFf8gaN37kjMjwelTbC08lk5Fl7mdi9vU/edit?gid=571667289#gid=571667289

About

Ecommerce Sales Analytics using SQL, Python & Google Sheets dashboards (Google Capstone Project)
