ETL Pipeline – IBM Data Engineering Foundations

This project was developed as part of the IBM Data Engineering Foundations course. It demonstrates a complete ETL (Extract, Transform, Load) process in Python using World Bank GDP data.

📌 Project Overview

Extract: Load raw GDP data from a CSV file.

Transform:

Rename and normalize columns

Handle missing values

Convert currencies and standardize values

Aggregate GDP data by country

Load: Save the processed dataset into a SQLite database for structured storage and analysis.
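The three stages above can be sketched in a few lines of pandas. This is a minimal illustration, not the notebook's actual code: the column names (`Country Name`, `Year`, `GDP (USD)`), the sample rows, the `USD_TO_EUR` rate, the `gdp_by_country` table name, and the choice to drop missing values and average by country are all assumptions made for the example. It reads from an in-memory string and writes to an in-memory SQLite database so it runs without `data.csv` or `etl.db`.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample standing in for data.csv; the real columns may differ.
RAW_CSV = """Country Name,Year,GDP (USD)
Spain,2020,1280.0
Spain,2021,1430.0
France,2020,2640.0
France,2021,
"""

USD_TO_EUR = 0.9  # assumed illustrative rate, not a real exchange rate


def extract(source):
    """Read raw GDP rows from a CSV path or file-like object."""
    return pd.read_csv(source)


def transform(df, rate=USD_TO_EUR):
    """Rename columns, drop missing GDP values, convert currency, aggregate."""
    df = df.rename(
        columns={"Country Name": "country", "Year": "year", "GDP (USD)": "gdp_usd"}
    )
    # One possible way to handle missing values: discard incomplete rows.
    df = df.dropna(subset=["gdp_usd"])
    df = df.assign(gdp_eur=(df["gdp_usd"] * rate).round(2))
    # Aggregate to one row per country (mean over the available years).
    return df.groupby("country", as_index=False)[["gdp_usd", "gdp_eur"]].mean()


def load(df, conn, table="gdp_by_country"):
    """Write the cleaned frame to a SQLite table, replacing any old copy."""
    df.to_sql(table, conn, if_exists="replace", index=False)


conn = sqlite3.connect(":memory:")  # use "etl.db" to write a file instead
result = transform(extract(io.StringIO(RAW_CSV)))
load(result, conn)
rows = conn.execute(
    "SELECT country, gdp_usd, gdp_eur FROM gdp_by_country ORDER BY country"
).fetchall()
print(rows)
```

Keeping each stage in its own function makes it straightforward to swap the missing-value strategy or the aggregation without touching the extract and load steps.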

πŸ› οΈ Technologies Used

Python 3

Jupyter Notebook

Pandas

NumPy

SQLite3

📂 Project Structure

├── ETL.ipynb    # Jupyter Notebook containing the ETL pipeline
├── data.csv     # Raw GDP dataset (source)
├── etl.db       # SQLite database with cleaned data (output)
└── README.md    # Project documentation

🚀 How to Run

Clone the repository:

git clone https://github.com/alopezmoreira1989/IBM_ETL_Proyect.git
cd IBM_ETL_Proyect

Install dependencies:

pip install pandas numpy notebook

Open the notebook:

jupyter notebook ETL.ipynb

Run all cells to execute the ETL pipeline.
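After the notebook finishes, one way to confirm that the load step wrote something is to list the tables in the output database. The helper below is a hypothetical sketch using only the standard-library `sqlite3` module; it discovers table names from `sqlite_master` rather than assuming what the notebook calls them.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: row_count} for every table in a SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the database itself, so quoting them is enough.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling `list_tables("etl.db")` from the repository root should return a non-empty dictionary once the pipeline has run. Note that `sqlite3.connect` creates an empty file if the path does not exist, so an empty result may simply mean the path is wrong.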

📖 Learning Outcomes

Built an end-to-end ETL pipeline with Python

Gained hands-on experience in data wrangling and cleaning

Learned how to store structured data in a relational database

Applied data engineering best practices to real-world data

πŸ‘¨β€πŸ’» Author

Created by Alejandro LΓ³pez Moreira during the IBM Data Engineering Foundations course.
