This project was completed as part of the Google Data Analytics Professional Certificate Capstone.
For this capstone, dataset selection and end-to-end analysis design were done independently.
The objective was to analyze an Amazon-style e-commerce dataset and generate structured business insights using tools learned during the course.
- Dataset Name: Amazon Sales Dataset
- Source: Kaggle
- Link: https://www.kaggle.com/datasets/rohiteng/amazon-sales-dataset
The dataset contains 100,000 synthetic e-commerce transactions including order details, customer information, pricing, discounts, taxes, shipping costs, and order status.
Note: The dataset is synthetically generated but structured to resemble real-world e-commerce transactions.
Interactive Google Sheets Dashboard:
https://docs.google.com/spreadsheets/d/1SCZmVAkjSMPFf8gaN37kjMjwelTbC08lk5Fl7mdi9vU/edit?usp=sharing
The dashboard includes:
- KPI Summary Cards
- Monthly Revenue Trend
- Category-wise Revenue Distribution
- Country-wise Sales
- Brand-wise Contribution
- Order Status Breakdown
- Data cleaning
- Pivot tables
- KPI calculations
- Static dashboard creation
- SQL-based data cleaning
- Aggregation queries
- KPI generation
- Revenue and operational analysis
- Statistical analysis
- Revenue distribution analysis
- Correlation analysis
- Discount vs Revenue evaluation
- Data validation
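The SQL cleaning and aggregation steps listed above can be sketched locally with Python's built-in `sqlite3` module. This is only an illustration, not the queries in `amazon_analysis.sql`: the table name, column names, and sample rows are all hypothetical, and SQLite's dialect differs from BigQuery's (in BigQuery the month would more likely come from a date function such as `FORMAT_DATE`).

```python
import sqlite3

# In-memory database standing in for a BigQuery table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id TEXT, order_date TEXT, category TEXT, "
    "total_amount REAL, status TEXT)"
)

# Hypothetical sample rows, including one duplicate and one missing amount.
rows = [
    ("A1", "2023-01-05", "Electronics", 120.0, "Delivered"),
    ("A2", "2023-01-17", "Books", 30.0, "Delivered"),
    ("A2", "2023-01-17", "Books", 30.0, "Delivered"),
    ("A3", "2023-02-02", "Electronics", None, "Cancelled"),
    ("A4", "2023-02-09", "Books", 50.0, "Returned"),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", rows)

# Cleaning step: drop exact duplicate rows and rows with no amount.
conn.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT DISTINCT order_id, order_date, category, total_amount, status
    FROM orders
    WHERE total_amount IS NOT NULL
    """
)

# Aggregation step: revenue and order count per month (YYYY-MM).
monthly = conn.execute(
    """
    SELECT substr(order_date, 1, 7) AS month,
           ROUND(SUM(total_amount), 2) AS revenue,
           COUNT(*) AS order_count
    FROM orders_clean
    GROUP BY month
    ORDER BY month
    """
).fetchall()

for month, revenue, order_count in monthly:
    print(month, revenue, order_count)
```

The same pattern (clean into a separate table, then aggregate from the clean table) keeps the raw upload untouched, which makes the cleaning rules easy to revise later.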
amazon-sales-capstone/
│
├── sql/
│   └── amazon_analysis.sql
│
├── python/
│   └── python.ipynb
│
├── dashboard/
│   └── ecommerce_sales_dashboard.png
│
└── README.md
- Total Orders
- Total Revenue
- Total Quantity Sold
- Average Order Value (AOV)
- Total Delivered Orders
- Total Delivered Revenue
- Cancellation Rate
- Return Rate
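As a rough illustration of how the KPIs above could be derived, here is a minimal Python sketch. The field names (`total_amount`, `quantity`, `status`), the status labels, and the sample rows are assumptions, not the dataset's actual schema. The definitions used (AOV as revenue divided by order count, rates as a share of all orders) are also one reasonable choice rather than the formulas confirmed by this project.

```python
from typing import Dict, List, Union

Order = Dict[str, Union[float, int, str]]

# Hypothetical rows; field names and status labels are assumptions.
SAMPLE_ORDERS: List[Order] = [
    {"total_amount": 120.0, "quantity": 2, "status": "Delivered"},
    {"total_amount": 80.0, "quantity": 1, "status": "Cancelled"},
    {"total_amount": 200.0, "quantity": 4, "status": "Delivered"},
    {"total_amount": 100.0, "quantity": 1, "status": "Returned"},
]


def compute_kpis(orders: List[Order]) -> Dict[str, float]:
    """Compute the headline KPIs from a list of order records."""
    total_orders = len(orders)
    if total_orders == 0:
        raise ValueError("at least one order is required")

    total_revenue = sum(o["total_amount"] for o in orders)
    delivered = [o for o in orders if o["status"] == "Delivered"]
    cancelled = sum(1 for o in orders if o["status"] == "Cancelled")
    returned = sum(1 for o in orders if o["status"] == "Returned")

    return {
        "total_orders": total_orders,
        "total_revenue": total_revenue,
        "total_quantity": sum(o["quantity"] for o in orders),
        # Assumed definition: revenue divided by number of orders.
        "aov": total_revenue / total_orders,
        "delivered_orders": len(delivered),
        "delivered_revenue": sum(o["total_amount"] for o in delivered),
        # Assumed definition: share of all orders with that status.
        "cancellation_rate": cancelled / total_orders,
        "return_rate": returned / total_orders,
    }


kpis = compute_kpis(SAMPLE_ORDERS)
for name, value in kpis.items():
    print(f"{name}: {value}")
```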
- Revenue distribution is right-skewed: most orders fall in the low-to-mid value range, with a long tail of high-value orders.
- Weak negative correlation between Discount and Total Amount (~ -0.10).
- Unit Price and Quantity are primary drivers of Total Revenue.
- Revenue distribution across categories is relatively balanced.
- Order status analysis provides insight into operational efficiency.
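The skewness and correlation findings above rest on two standard statistics. The sketch below shows one way to compute them in plain Python; it is not the notebook's code, and the sample values are made up for illustration, so they do not reproduce the figures reported here. It assumes a Pearson correlation and the moment-based skewness coefficient.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def skewness(values: Sequence[float]) -> float:
    """Moment-based skewness; positive values indicate a right-skewed sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant sample")
    return m3 / m2 ** 1.5


# Made-up values for illustration only (not taken from the dataset).
discounts = [0.0, 0.05, 0.10, 0.15, 0.20, 0.30]
amounts = [120.0, 95.0, 110.0, 90.0, 100.0, 85.0]
order_values = [20.0, 25.0, 30.0, 35.0, 40.0, 200.0]

print("discount vs amount r:", round(pearson(discounts, amounts), 3))
print("order value skewness:", round(skewness(order_values), 3))
```

A coefficient near zero, like the roughly -0.10 reported above, means discount level explains very little of the variation in order totals.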
Through this project, I gained practical experience in:
- Designing a complete analytics workflow
- Cleaning and transforming data using SQL
- Executing structured queries in BigQuery
- Performing statistical analysis using Python
- Building KPI-driven dashboards
- Interpreting business-focused insights from structured data
This project strengthened both technical proficiency and analytical thinking.
To replicate this project:
- Download the dataset from the Kaggle link above.
- Upload the dataset to Google BigQuery.
- Execute the queries from the `/sql` folder.
- Run the Python notebook for analysis.
- Use aggregated outputs to build the dashboard.
Shivam Kumar
Google Sheets dashboard:
https://docs.google.com/spreadsheets/d/1SCZmVAkjSMPFf8gaN37kjMjwelTbC08lk5Fl7mdi9vU/edit?gid=571667289#gid=571667289