📊 Data Analysis Project: E-Commerce Public Dataset 💻

Name: Namira Ra’ufa Dayyana
Email: naufayaa@gmail.com
ID Dicoding: naufaya

🚀 Business Problem

In this project, I explore the E-Commerce Public Dataset to address the following questions:

Which customer segments contribute the most to the company’s revenue?
Which products or product categories generate the most revenue?

📦 Data Wrangling

Data wrangling is an essential part of any analysis, and here’s how I prepared the data for this project:

A. Load Data

I loaded the following datasets:

customers_dataset.csv
geolocation_dataset.csv
order_items_dataset.csv
order_payments_dataset.csv
order_reviews_dataset.csv
orders_dataset.csv
product_category_name_translation.csv
products_dataset.csv
sellers_dataset.csv
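A minimal pandas sketch of this loading step. It assumes all nine CSV files sit together in one folder. The `load_datasets` helper and its short dictionary keys are illustrative names, not taken from the project notebook.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd

# File names as listed above; the folder that holds them is an assumption.
DATASET_FILES = [
    "customers_dataset.csv",
    "geolocation_dataset.csv",
    "order_items_dataset.csv",
    "order_payments_dataset.csv",
    "order_reviews_dataset.csv",
    "orders_dataset.csv",
    "product_category_name_translation.csv",
    "products_dataset.csv",
    "sellers_dataset.csv",
]


def load_datasets(data_dir: str) -> dict[str, pd.DataFrame]:
    """Read every CSV into a dict keyed by its file name without '.csv' and '_dataset'."""
    frames = {}
    for file_name in DATASET_FILES:
        key = file_name[: -len(".csv")]
        if key.endswith("_dataset"):
            key = key[: -len("_dataset")]
        frames[key] = pd.read_csv(Path(data_dir) / file_name)
    return frames
```

Usage, with a hypothetical folder name: `frames = load_datasets("data")`, then `frames["orders"].head()`.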

B. Merge and Clean Data

Merged order-related datasets (order_items, order_payments, order_reviews) for a complete order overview.
Combined product data with product categories to get better insights.
Dropped columns that were not needed for the analysis, such as timestamps and unneeded product details.
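The merge and cleanup steps above could look roughly like this in pandas. This is a sketch, not the notebook's code: it assumes the Olist column names `order_id` and `product_category_name`, and the helper names are hypothetical.

```python
import pandas as pd


def build_order_overview(order_items, order_payments, order_reviews):
    """Join item, payment and review tables on their shared order_id column."""
    # An order can have several items and several payments, so joining both on
    # order_id alone can multiply rows; aggregate first if exact totals matter.
    overview = order_items.merge(order_payments, on="order_id", how="left")
    return overview.merge(order_reviews, on="order_id", how="left")


def add_english_category(products, translation):
    """Attach the English category name to each product."""
    return products.merge(translation, on="product_category_name", how="left")


def drop_unused(df, columns):
    """Drop the listed columns, silently skipping any that are not present."""
    return df.drop(columns=columns, errors="ignore")
```

Left joins keep every order item even when it has no payment or review row, so missing values show up explicitly for the next cleaning step.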

🔍 Exploratory Data Analysis (EDA)

I performed a thorough analysis to understand the data’s structure and identify potential issues:

A. Data Info

Investigated the data types, missing values, and duplicates.
Dropped rows with missing or duplicate data to ensure consistency.
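A short pandas sketch of these checks, assuming each table is already a DataFrame; `quality_report` and `drop_missing_and_duplicates` are hypothetical helper names.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: its dtype and how many values are missing."""
    return pd.DataFrame({"dtype": df.dtypes.astype(str), "missing": df.isna().sum()})


def drop_missing_and_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows with any missing value, then exact duplicate rows."""
    return df.dropna().drop_duplicates().reset_index(drop=True)
```

The number of duplicate rows can be checked with `df.duplicated().sum()` before cleaning.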

B. Outlier Detection

Outliers were detected with the Interquartile Range (IQR) method, which flags extreme data points that might distort the analysis.

Some outliers were retained as they may indicate valuable business insights, such as premium customers or one-off large transactions.
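The IQR rule can be sketched as a boolean mask, which makes it easy to inspect or keep outliers instead of deleting them. The multiplier `k = 1.5` is the conventional choice and an assumption here; the README does not state which value was used.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)
```

For example, `df[iqr_outlier_mask(df["payment_value"])]` lists the extreme payments for review, while the rows themselves stay in `df`.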

📈 Visualization & Explanatory Analysis

With the cleaned data, I created several visualizations to answer the business questions:

Customer Segments Contributing to Revenue

I identified the top 10 customers by revenue.

Insight: A small number of customers account for a significant portion of revenue. The top 2 customers alone contribute approximately 30% of the total revenue!
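A sketch of how the top-customer ranking and each customer's share of revenue might be computed. It assumes a table with one row per payment, carrying `customer_unique_id` and `payment_value` columns; both column names are assumptions based on the public Olist schema.

```python
import pandas as pd


def top_customers(payments: pd.DataFrame, n: int = 10,
                  customer_col: str = "customer_unique_id",
                  value_col: str = "payment_value") -> pd.DataFrame:
    """Top-n customers by total spend, with each one's share of all revenue."""
    revenue = payments.groupby(customer_col)[value_col].sum()
    top = revenue.nlargest(n).to_frame("revenue")
    top["share_of_total"] = top["revenue"] / revenue.sum()
    return top
```

`top_customers(df)["share_of_total"].iloc[:2].sum()` then gives the combined share of the two biggest customers.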

Best-Selling Product Categories

By grouping the data by product categories, I found that:

Top categories such as bed_bath_table and health_beauty generate the most revenue.
Low-performing categories such as auto and garden_tools generate under $900K in revenue.
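One way to reproduce the category ranking, sketched under two assumptions: revenue is measured as the item `price` in the order-items table, and English names come from the translation table. Including freight or using payment values instead would change the totals.

```python
import pandas as pd


def revenue_by_category(order_items: pd.DataFrame, products: pd.DataFrame,
                        translation: pd.DataFrame) -> pd.Series:
    """Total item price per English category name, highest first."""
    merged = (
        order_items.merge(products, on="product_id", how="left")
        .merge(translation, on="product_category_name", how="left")
    )
    # groupby drops items whose category is missing (NaN key) by default.
    revenue = merged.groupby("product_category_name_english")["price"].sum()
    return revenue.sort_values(ascending=False)
```

Low performers can then be filtered with a threshold, e.g. `rev[rev < 900_000]`.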

Revenue by City

The analysis revealed that Sao Paulo leads in sales with $2.7M in revenue, followed by Rio de Janeiro with $1.5M.
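City-level revenue needs a join chain, because payments reference orders and only the customers table holds the city. A sketch, assuming the Olist columns `order_id`, `customer_id`, `customer_city` and `payment_value`:

```python
import pandas as pd


def revenue_by_city(orders: pd.DataFrame, customers: pd.DataFrame,
                    payments: pd.DataFrame) -> pd.Series:
    """Total payment value per customer city, highest first."""
    merged = (
        payments.merge(orders[["order_id", "customer_id"]], on="order_id", how="inner")
        .merge(customers[["customer_id", "customer_city"]], on="customer_id", how="inner")
    )
    revenue = merged.groupby("customer_city")["payment_value"].sum()
    return revenue.sort_values(ascending=False)
```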

💡 RFM Analysis

RFM (Recency, Frequency, Monetary) analysis is a powerful technique for understanding customer behavior. In this project, I performed the following steps:

Recency: Calculated how many days have passed since a customer’s last purchase.
Frequency: Counted the number of orders each customer made.
Monetary: Measured how much each customer has spent.
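The three RFM metrics above can be sketched with a single groupby. Assumptions: an order-level table with `customer_unique_id`, `order_id`, `order_purchase_timestamp` and `payment_value`, and a reference date that defaults to one day after the latest purchase (a common convention, not confirmed by the project).

```python
import pandas as pd


def rfm_table(orders: pd.DataFrame, snapshot_date=None,
              customer_col: str = "customer_unique_id",
              date_col: str = "order_purchase_timestamp",
              value_col: str = "payment_value") -> pd.DataFrame:
    """Recency (days), frequency (distinct orders) and monetary (total spend) per customer."""
    df = orders.copy()
    df[date_col] = pd.to_datetime(df[date_col])
    if snapshot_date is None:
        snapshot_date = df[date_col].max() + pd.Timedelta(days=1)
    rfm = df.groupby(customer_col).agg(
        last_purchase=(date_col, "max"),
        frequency=("order_id", "nunique"),  # nunique ignores repeated payment rows
        monetary=(value_col, "sum"),
    )
    rfm["recency"] = (pd.Timestamp(snapshot_date) - rfm["last_purchase"]).dt.days
    return rfm[["recency", "frequency", "monetary"]]
```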

The RFM scores were then used to segment customers into groups based on their buying behavior, which enables targeted marketing strategies.
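A sketch of one common scoring scheme: each metric is split into quintiles (1 to 5), with recent customers scoring high on R. The segment names and cut-offs below are hypothetical examples, not the segments used in the project.

```python
import pandas as pd


def score_rfm(rfm: pd.DataFrame, bins: int = 5) -> pd.DataFrame:
    """Add R, F, M quantile scores (1..bins) and an illustrative segment label."""
    scored = rfm.copy()
    labels = list(range(1, bins + 1))
    # rank(method="first") breaks ties so qcut always finds distinct bin edges.
    # Recency is ranked descending so that the most recent buyers get R = bins.
    r_rank = scored["recency"].rank(method="first", ascending=False)
    scored["R"] = pd.qcut(r_rank, bins, labels=labels).astype(int)
    scored["F"] = pd.qcut(scored["frequency"].rank(method="first"), bins, labels=labels).astype(int)
    scored["M"] = pd.qcut(scored["monetary"].rank(method="first"), bins, labels=labels).astype(int)
    # Hypothetical segment rules for illustration only.
    scored["segment"] = "regular"
    scored.loc[scored["R"] <= 2, "segment"] = "at_risk"
    scored.loc[(scored[["R", "F", "M"]] >= 4).all(axis=1), "segment"] = "best"
    return scored
```

Ranking before `pd.qcut` matters for this dataset, where most customers have a frequency of 1: without it, duplicate bin edges would make `qcut` raise an error.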

🔨 Tools & Libraries

This project was built using Python and the following libraries:

pandas
numpy
matplotlib
seaborn

💬 Conclusions & Insights

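The analysis showed that revenue is concentrated: a small number of customers account for a large share of it, categories such as bed_bath_table and health_beauty lead product revenue, and Sao Paulo and Rio de Janeiro lead among cities.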
Strategies for increasing revenue could focus on retaining high-value customers, expanding successful product categories, and targeting specific cities with tailored marketing efforts.

📂 Project Files

The project includes the following files:

Dataset: CSV files used for analysis.
Jupyter/Colab Notebook: The code used to perform the analysis.
Streamlit Dashboard: Interactive dashboard showcasing key insights.
requirements.txt: List of dependencies required to run the project.
README.md: Documentation for the project.

🚀 How to Run the Streamlit Dashboard

To run the Streamlit dashboard, follow these steps:

Clone or Download the Project
- Clone the repository to your local machine or download the ZIP file.
Install Dependencies
- Make sure you have Python 3.8 or higher installed.
- Install the necessary dependencies listed in the requirements.txt file:
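```
# Create (if needed) a pipenv virtual environment for the project
pipenv install
# Activate the environment in the current shell
pipenv shell
# Install the dependencies listed in requirements.txt
pip install -r requirements.txt
```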
Run the Streamlit Dashboard
- In your terminal, navigate to the project directory and run the following command:
```
streamlit run dashboard/dashboard.py
```
Access the Dashboard
- After running the command, your browser should automatically open the Streamlit dashboard. If it doesn't, you can manually go to http://localhost:8501 in your web browser.

💡 Note:

The dashboard includes interactive elements to visualize the key insights, such as customer segments contributing to revenue and the best-selling product categories.

Thank you for checking out my project! I hope you found the analysis insightful. Feel free to reach out for any questions or collaborations. 🚀

🔗 Find me online
LinkedIn: Namira Ra'ufa Dayyana
Portfolio: Namira's Data Science Portfolio
