This project focuses on cleaning and analyzing sales data of mobile phones from different regions.
Main tasks include:
- Handling missing values in `Quantity`, `Price`, and `Order ID`.
- Converting `Date` to the standard `YYYY-MM-DD` format.
- Checking for duplicate `Order ID` values.
- Calculating `Total Sales` for each order.
- Summarizing sales by region.
- Load data from `data/sales_data.csv`.
- Check for duplicate `Order ID` values and print warnings.
- Handle missing values:
  - Fill `Quantity` and `Price` with their column averages.
  - Drop rows missing `Order ID`.
- Format `Date` to `YYYY-MM-DD`.
- Compute `Total Sales` = `Quantity * Price`.
- Group and sum sales by region.
- Save the cleaned data to the `output/` folder.
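
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function name `clean_sales`, the `Region` column name, and the sample rows are assumptions made for the example. The data is built in memory so the snippet runs without the CSV file.

```python
import pandas as pd


def clean_sales(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Clean a raw sales frame and return (cleaned rows, total sales per region)."""
    df = df.copy()

    # Report duplicate Order IDs without removing them.
    dup_mask = df["Order ID"].notna() & df["Order ID"].duplicated(keep=False)
    if dup_mask.any():
        print("Warning: duplicate Order IDs:", df.loc[dup_mask, "Order ID"].unique().tolist())

    # Fill numeric gaps with the column mean, then drop rows with no Order ID.
    for col in ("Quantity", "Price"):
        df[col] = pd.to_numeric(df[col], errors="coerce")
        df[col] = df[col].fillna(df[col].mean())
    df = df.dropna(subset=["Order ID"])

    # Normalise dates to YYYY-MM-DD strings; unparseable dates become NaN.
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce").dt.strftime("%Y-%m-%d")

    df["Total Sales"] = df["Quantity"] * df["Price"]
    # "Region" is an assumed column name for the grouping key.
    summary = df.groupby("Region")["Total Sales"].sum()
    return df, summary


# Made-up sample rows standing in for data/sales_data.csv.
raw = pd.DataFrame({
    "Order ID": ["A101", "A102", "A102", None],
    "Date": ["2024/01/05", "2024/01/06", "2024/01/06", "2024/01/07"],
    "Region": ["North", "South", "South", "North"],
    "Quantity": [2, None, 4, 3],
    "Price": [100.0, 200.0, None, 300.0],
})
cleaned, summary = clean_sales(raw)
print(summary)
```

With the real data, `raw` would come from `pd.read_csv("data/sales_data.csv")`, and `cleaned.to_csv(...)` would write the result to a file under `output/`. Note that in this sketch the column averages are computed before rows without an `Order ID` are dropped, matching the order of the steps listed above.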
- Duplicate `Order ID` values are only printed, not removed (removal can be enabled in code).
- NumPy is used for basic validation tasks.
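
As a rough illustration of these two notes, the sketch below shows one way duplicate removal could be switched on and what a NumPy-based check might look like. Both helpers (`drop_duplicate_orders`, `is_valid_numeric`) are hypothetical names, and the validation rule (finite, non-negative values) is an assumption, since the exact checks are not described here.

```python
import numpy as np
import pandas as pd


def drop_duplicate_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first row for each Order ID (hypothetical opt-in step)."""
    return df.drop_duplicates(subset="Order ID", keep="first")


def is_valid_numeric(values: pd.Series) -> bool:
    """Return True if every value is finite and non-negative (assumed rule)."""
    arr = values.to_numpy(dtype=float)
    return bool(np.all(np.isfinite(arr) & (arr >= 0)))


# Made-up sample rows for illustration.
orders = pd.DataFrame({
    "Order ID": ["A101", "A102", "A102"],
    "Quantity": [2, 3, 4],
    "Price": [100.0, 200.0, 200.0],
})
deduped = drop_duplicate_orders(orders)
quantity_ok = is_valid_numeric(deduped["Quantity"])
bad_price_ok = is_valid_numeric(pd.Series([100.0, float("nan"), -5.0]))
```

Keeping the first occurrence is only one possible policy. Keeping the last row, or dropping every duplicated order, may suit other datasets better.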
- Python 3.x
- pandas
- numpy
Install dependencies with:
pip install -r requirements.txt
Kavin Kishore
B.Tech Student, DTU
Built as a real-time data engineering mini project.