ETL Pipeline - IBM Data Engineering Foundations
This project was developed as part of the IBM Data Engineering Foundations course. It demonstrates a complete ETL (Extract, Transform, Load) process in Python using World Bank GDP data.
Project Overview
Extract: Load raw GDP data from a CSV file.
Transform:
Rename and normalize columns
Handle missing values
Convert currencies and standardize values
Aggregate GDP data by country
Load: Save the processed dataset into a SQLite database for structured storage and analysis.
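The three stages above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. The column names (`Country Name`, `Year`, `GDP`), the fixed exchange rate, the table name `gdp_by_country`, the sample rows, and the choice to average GDP per country are all assumptions made for the example.

```python
import io
import sqlite3

import pandas as pd


def extract(source):
    """Extract: read raw GDP data from a CSV path or file-like object."""
    return pd.read_csv(source)


def transform(df, rate_to_usd=1.0):
    """Transform: clean and aggregate the data (column names are hypothetical)."""
    # Rename and normalize columns: trim, lowercase, spaces -> underscores
    df = df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
    # Handle missing values: coerce GDP to numeric, drop rows without a value
    df["gdp"] = pd.to_numeric(df["gdp"], errors="coerce")
    df = df.dropna(subset=["country_name", "gdp"])
    # Convert currency with an assumed fixed rate and express in billions
    df = df.assign(gdp_usd_billions=(df["gdp"] * rate_to_usd / 1e9).round(2))
    # Aggregate by country (taking the mean across years is an assumption)
    return df.groupby("country_name", as_index=False)["gdp_usd_billions"].mean()


def load(df, conn, table="gdp_by_country"):
    """Load: write the processed frame to a SQLite table, replacing old data."""
    df.to_sql(table, conn, if_exists="replace", index=False)


# Made-up sample data standing in for data.csv
raw_csv = io.StringIO(
    "Country Name,Year,GDP\n"
    "Freedonia,2020,1000000000\n"
    "Freedonia,2021,3000000000\n"
    "Sylvania,2020,\n"
    "Sylvania,2021,5000000000\n"
)

conn = sqlite3.connect(":memory:")  # use "etl.db" to write to a file instead
load(transform(extract(raw_csv)), conn)
rows = conn.execute(
    "SELECT country_name, gdp_usd_billions FROM gdp_by_country "
    "ORDER BY country_name"
).fetchall()
print(rows)
```

Keeping each stage in its own function makes it possible to test the transform step on a small in-memory sample before pointing the pipeline at the real CSV file.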
Technologies Used
Python 3
Jupyter Notebook
Pandas
NumPy
SQLite3
Project Structure
├── ETL.ipynb    # Jupyter Notebook containing the ETL pipeline
├── data.csv     # Raw GDP dataset (source)
├── etl.db       # SQLite database with cleaned data (output)
└── README.md    # Project documentation
How to Run
Clone the repository:
git clone https://github.com/alopezmoreira1989/IBM_ETL_Proyect.git
cd IBM_ETL_Proyect
Install dependencies:
pip install pandas numpy notebook
Open the notebook:
jupyter notebook ETL.ipynb
Run all cells to execute the ETL pipeline.
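After the notebook has run, the output database can be inspected from Python. The sketch below uses only the standard-library `sqlite3` module. The helper name `summarize_database` is hypothetical, and because the table names written by the notebook are not stated here, it lists whatever tables exist instead of assuming one.

```python
import sqlite3


def summarize_database(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        # Skip SQLite's internal bookkeeping tables
        if not row[0].startswith("sqlite_")
    ]
    # Names come straight from sqlite_master, so they are quoted as identifiers
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


if __name__ == "__main__":
    # etl.db is the output file listed in the project structure.
    # Note: sqlite3.connect creates an empty file if etl.db does not exist yet.
    conn = sqlite3.connect("etl.db")
    try:
        for table, count in summarize_database(conn).items():
            print(f"{table}: {count} rows")
    finally:
        conn.close()
```

An empty result most likely means the notebook has not been run yet, or that it wrote to a different path.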
Learning Outcomes
Built an end-to-end ETL pipeline with Python
Gained hands-on experience in data wrangling and cleaning
Learned how to store structured data in a relational database
Applied data engineering best practices to real-world data
Author
Created by Alejandro López Moreira during the IBM Data Engineering Foundations course.