📈 Data Automation Demo: Messy CSV to Clean SQL

This repository demonstrates the process of taking inconsistent, unstructured data from a client-provided CSV file and transforming it into clean, reliable SQL INSERT statements ready for direct database ingestion.

💡 Problem Solved

Clients often receive data from different sources (marketing tools, legacy systems, external vendors) with inconsistent formatting:

Mixed date formats (MM/DD/YY vs. DD-MM-YYYY vs. YYYY-MM-DD)
Non-standard email addresses (AT instead of @)
Unnecessary text prefixes in IDs (ID1234)
Inconsistent number formatting.

⚙️ The Solution (data_cleaner.py)

This Python script, using the pandas library, performs the following cleaning steps:

Date Standardization: Converts mixed date formats into a unified YYYY-MM-DD standard.
Email Standardization: Uses regex to replace non-standard characters (e.g., AT) with the standard @.
ID Normalization: Removes text prefixes (ID) and converts the field to a clean integer.
Price Cleaning: Ensures all price data is a reliable float (.00) format.

▶️ Output Example

The script's final output is a series of clean SQL statements ready to be executed:

INSERT INTO customer_orders (customer_name, customer_email, order_id, total_price, sale_date) VALUES ('ACME, Inc.', 'support@acme.com', '9876', '123.55', '2025-12-14');
INSERT INTO customer_orders (customer_name, customer_email, order_id, total_price, sale_date) VALUES ('Globex CORP', 'globex@corp.org', '5544', '321.00', '2025-10-12');
-- etc.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.gitignore		.gitignore
README.md		README.md
data_cleaner.py		data_cleaner.py
messy_data.csv		messy_data.csv
requirements.txt		requirements.txt
test_it.txt		test_it.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📈 Data Automation Demo: Messy CSV to Clean SQL

💡 Problem Solved

⚙️ The Solution (data_cleaner.py)

▶️ Output Example

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

📈 Data Automation Demo: Messy CSV to Clean SQL

💡 Problem Solved

⚙️ The Solution (data_cleaner.py)

▶️ Output Example

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages