# DataAlchemy

Transforming messy data into meaningful business and research insights. This repository is a curated collection of data analysis projects, each tracing the workflow from raw data to a clean, analysis-ready dataset with business-relevant insights.
## Tech Stack

- Python: pandas, numpy
- Excel / Google Sheets
- Jupyter Notebooks
- Data Visualization: matplotlib, seaborn, plotly, Power BI
- Version Control: Git & GitHub
| Project | Description | Raw Dataset | Cleaned Dataset | Key Insights |
|---|---|---|---|---|
| Inventory Data Cleaning | Standardized product data, resolved duplicates, normalized formats | `messy_warehouse_data.csv` | `cleaned_warehouse_data.csv` | Dataset ready for reporting, dashboarding, and predictive modeling |
| Upcoming Project | Short description | Link | Link | Summary of insights |
## Inventory Data Cleaning: Before vs After

### 🔍 Initial Data Audit & Key Issues
- Text & Categorical Fields: Inconsistent naming, casing, and whitespace → unreliable grouping and pivot tables.
- Quantity Column: Mixed types, text numerics, missing values → numeric aggregation errors.
- Price Column: Missing values, text formatting → revenue calculations distorted.
- Date Column: Mixed formats and text dates → time-based analysis blocked.
- Duplicates: Repeated Product IDs → violates entity uniqueness, inflates counts.
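The audit above can be reproduced with a few pandas checks. This is a minimal sketch on an inline sample that mimics the raw export (the real input is `messy_warehouse_data.csv`); the column names and values here are illustrative assumptions, not the actual schema.

```python
import pandas as pd

# Inline sample standing in for messy_warehouse_data.csv;
# column names and values are illustrative assumptions.
df = pd.DataFrame({
    "Product ID":   ["P001", "P002", "P002", "P003"],
    "Product Name": [" widget", "Widget ", "Widget ", "GADGET"],
    "Quantity":     ["10", "five", None, "7"],
    "Price":        ["$3.50", None, None, "4.00"],
    "Date":         ["2023-01-05", "05/01/2023", "05/01/2023", "Jan 6, 2023"],
})

print(df.dtypes)                                   # all object: numerics stored as text
print(df.isna().sum())                             # missing values per column
print(df.duplicated(subset=["Product ID"]).sum())  # repeated Product IDs
print(df["Product Name"].unique())                 # casing/whitespace inconsistencies
```

Running `dtypes`, `isna().sum()`, and `duplicated()` up front surfaces every issue class listed above before any transformation is attempted.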
### Cleaning Approach & Actions
- Text Normalization: Trim whitespace, unify capitalization, standardize names → reliable grouping.
- Quantity Column: Convert to numeric, median imputation → aggregation-safe.
- Price Column: Numeric coercion, median imputation → accurate financial metrics.
- Date Column: Convert text to consistent date format → time-series analysis enabled.
- Duplicates: Detect and remove invalid duplicates → maintain referential integrity.
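The five actions above can be sketched in pandas as follows. This is a hedged illustration on a sample frame, not the project's actual script; column names, the `$` currency prefix, and imputation targets are assumptions.

```python
import pandas as pd

# Sample raw frame (illustrative stand-in for messy_warehouse_data.csv)
df = pd.DataFrame({
    "Product ID":   ["P001", "P002", "P002", "P003"],
    "Product Name": [" widget", "Widget ", "Widget ", "GADGET"],
    "Quantity":     ["10", "five", None, "7"],
    "Price":        ["$3.50", None, None, "4.00"],
    "Date":         ["2023-01-05", "05/01/2023", "05/01/2023", "Jan 6, 2023"],
})

# Text normalization: trim whitespace, unify capitalization
df["Product Name"] = df["Product Name"].str.strip().str.title()

# Quantity: coerce text numerics to numeric, impute missing with the median
df["Quantity"] = pd.to_numeric(df["Quantity"], errors="coerce")
df["Quantity"] = df["Quantity"].fillna(df["Quantity"].median())

# Price: strip currency symbols, coerce, median imputation
df["Price"] = pd.to_numeric(df["Price"].str.replace("$", "", regex=False),
                            errors="coerce")
df["Price"] = df["Price"].fillna(df["Price"].median())

# Dates: parse each value individually so mixed formats all resolve
df["Date"] = df["Date"].apply(lambda s: pd.to_datetime(s, errors="coerce"))

# Duplicates: keep the first occurrence of each Product ID
df = df.drop_duplicates(subset=["Product ID"], keep="first")
```

Note that `errors="coerce"` turns unparseable values into `NaN`/`NaT` rather than raising, which is what makes the subsequent median imputation well-defined.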
### Final Dataset Improvements
- Numeric fields strictly numeric, no "NaN" text values
- Standardized categorical fields
- Dates fully normalized
- Missing data handled strategically
- Logical validation checks applied
- Structured as an Excel Table for scalability

**Outcome:** The dataset is now ready for EDA, dashboarding (Power BI/Tableau), SQL imports, and predictive modeling.
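The "logical validation checks" mentioned above can be expressed as simple assertions over the cleaned frame. This sketch uses an assumed schema mirroring `cleaned_warehouse_data.csv`; the column names and sample values are illustrative, not taken from the real file.

```python
import pandas as pd

# Assumed cleaned frame (illustrative stand-in for cleaned_warehouse_data.csv)
clean = pd.DataFrame({
    "Product ID": ["P001", "P002", "P003"],
    "Quantity":   [10.0, 8.5, 7.0],
    "Price":      [3.50, 3.75, 4.00],
    "Date":       pd.to_datetime(["2023-01-05", "2023-05-01", "2023-01-06"]),
})

# Entity uniqueness: one row per Product ID
assert clean["Product ID"].is_unique, "duplicate Product IDs remain"

# Numeric fields strictly numeric and complete
assert clean["Quantity"].notna().all() and clean["Price"].notna().all()

# Business-rule sanity checks
assert clean["Quantity"].ge(0).all(), "negative quantities found"
assert clean["Price"].gt(0).all(), "non-positive prices found"

# Dates fully normalized to a datetime dtype
assert pd.api.types.is_datetime64_any_dtype(clean["Date"])
```

Encoding checks like these as assertions means any regression in an upstream export fails loudly before reaching a dashboard or model.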
### Skills Demonstrated
- Data auditing & quality assessment
- Data type coercion & validation
- Median-based imputation strategy
- Text normalization & categorical standardization
- Date normalization & validation logic
- Duplicate detection & entity integrity validation
- Analytical thinking applied to business data
**Reflection:** Data cleaning is analytical decision-making. Each transformation weighed business meaning, statistical impact, and downstream effects to ensure accurate, actionable insights.