Welcome to my portfolio!
These projects highlight my work across data engineering, cloud deployment, visualization, and API development — all built around a shared PostgreSQL database.
Interactive climate, agricultural and economic trends explorer.
This project creates the PostgreSQL database used by the following two related projects. The database is hosted on AWS RDS and accessed by a public Streamlit app deployed on Google Cloud Run.
Visual narrative of climate, agricultural and economic patterns using curated datasets.
This project uses the same underlying data, exported into five CSV files for use in Tableau. (Direct PostgreSQL connections in Tableau require a paid subscription.) These CSVs serve as the data sources for the public Tableau story.
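As a rough illustration of the export step, the sketch below writes each table to its own CSV file with a header row. It is not the project's actual script: the table names, columns, and demo rows are made up, and an in-memory `sqlite3` database stands in for PostgreSQL so the example runs without credentials.

```python
import csv
import sqlite3
from pathlib import Path

# Hypothetical table names; the real project exports five CSVs not listed here.
TABLES = ["climate", "agriculture", "economy"]


def make_demo_db():
    """Build an in-memory SQLite stand-in for the shared PostgreSQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE climate (country TEXT, year INTEGER, avg_temp_c REAL)")
    conn.execute("CREATE TABLE agriculture (country TEXT, year INTEGER, crop_yield REAL)")
    conn.execute("CREATE TABLE economy (country TEXT, year INTEGER, gdp_usd REAL)")
    # Made-up demo rows, for illustration only.
    conn.execute("INSERT INTO climate VALUES ('Kenya', 2020, 25.1)")
    conn.execute("INSERT INTO agriculture VALUES ('Kenya', 2020, 1.7)")
    conn.execute("INSERT INTO economy VALUES ('Kenya', 2020, 1.0e11)")
    return conn


def export_tables(conn, tables, out_dir):
    """Write each table to <out_dir>/<table>.csv and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for table in tables:
        # Table names come from the fixed list above, never from user input.
        cursor = conn.execute(f"SELECT * FROM {table}")
        header = [col[0] for col in cursor.description]
        path = out_dir / f"{table}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(cursor)
        written.append(path)
    return written
```

Calling `export_tables(make_demo_db(), TABLES, "tableau_data")` would produce one CSV per table, which Tableau can then load as a text-file data source.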
Lightweight API exposing climate, agricultural and economic data.
This project uses the shared PostgreSQL database to power a Flask-based REST API. The service is deployed publicly on Heroku.
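To give a feel for what such a service looks like, here is a minimal Flask sketch of a single read-only endpoint. The route, field names, and sample records are assumptions made for illustration; the deployed API reads from the shared PostgreSQL database and may expose different endpoints.

```python
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Made-up sample records standing in for rows read from the shared database.
SAMPLE_ROWS = [
    {"country": "Kenya", "year": 2020, "avg_temp_c": 25.1},
    {"country": "Kenya", "year": 2021, "avg_temp_c": 25.3},
    {"country": "Peru", "year": 2020, "avg_temp_c": 19.8},
]


@app.route("/api/climate/<country>")  # hypothetical route
def climate_by_country(country):
    """Return all sample rows for one country, matched case-insensitively."""
    rows = [r for r in SAMPLE_ROWS if r["country"].lower() == country.lower()]
    if not rows:
        abort(404, description=f"No data for {country}")
    return jsonify(rows)
```

Run locally with `flask --app <module_name> run`, then request `/api/climate/kenya` to get a JSON list of matching rows.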
PySpark ETL Pipeline: ingest, validate, and transform a multi-region dataset
A medallion-style Bronze → Silver → Gold pipeline built on Databricks, enriching a 10-region YouTube Trending Videos dataset (Kaggle) with engagement metrics and regional analysis.
graph TD
A[Raw CSVs & JSON] --> B[Bronze: Ingest & Validate]
B --> C[Silver: Clean & Deduplicate]
C --> D[Gold: Enrich & Engineer]
D --> E[(youtube_gold)]
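The stages in the diagram can be sketched in plain Python to show how they hand off to one another. This is a simplified illustration and not the Databricks code: it uses lists of dicts instead of PySpark DataFrames, the column names are assumed, and the engagement formula (likes plus comments, divided by views) is one plausible definition chosen for the example.

```python
# Assumed column names; the real dataset schema may differ.
REQUIRED_FIELDS = {"video_id", "region", "views", "likes", "comments"}
COUNT_FIELDS = ("views", "likes", "comments")


def bronze(raw_rows):
    """Ingest and validate: keep only rows that carry every required field."""
    return [row for row in raw_rows if REQUIRED_FIELDS.issubset(row)]


def silver(rows):
    """Clean and deduplicate: cast counts to int, one row per (video_id, region)."""
    latest = {}
    for row in rows:
        try:
            cleaned = {**row, **{f: int(row[f]) for f in COUNT_FIELDS}}
        except (TypeError, ValueError):
            continue  # drop rows whose counts cannot be parsed
        # Later rows overwrite earlier ones with the same key.
        latest[(cleaned["video_id"], cleaned["region"])] = cleaned
    return list(latest.values())


def gold(rows):
    """Enrich: add an engagement rate (assumed formula, see lead-in)."""
    enriched = []
    for row in rows:
        views = row["views"]
        rate = (row["likes"] + row["comments"]) / views if views else 0.0
        enriched.append({**row, "engagement_rate": rate})
    return enriched


def run_pipeline(raw_rows):
    """Chain the three stages: Bronze -> Silver -> Gold."""
    return gold(silver(bronze(raw_rows)))
```

Keeping each stage as a separate function mirrors the medallion idea: every layer has one job, and its output can be inspected or persisted before the next layer runs.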
If you'd like to learn more about these projects or discuss my work, feel free to reach out through my GitHub profile.




