Databricks Bootcamp 2026

Welcome to the Databricks Data Lakehouse Project by Data With Baraa.

This repository contains a complete, real-world Data Lakehouse implementation built on Databricks, including datasets, notebooks, SQL examples, and exercises. Everything here is designed to help you understand how modern data teams use Databricks in practice, from data ingestion and transformation to analytics-ready data products.


⚠️ Important Note

Build this project on your own first using the Notion roadmap.
Use this repository only as a reference if you get stuck.

Before starting, watch the Databricks Bootcamp, where I explain the architecture and decisions behind this project.


🏗️ Architecture

This project follows the Medallion Architecture:

🥉 Bronze Layer

  • Raw data ingestion
  • Schema inference and storage as Delta tables

🥈 Silver Layer

  • Data cleaning and standardization
  • Type casting and validation

🥇 Gold Layer

  • Dimensional Data Model (Business Transformation)
  • Ready for BI and analysis
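
To make the three layers concrete, here is a minimal plain-Python sketch of the same flow. It is only an illustration under stated assumptions: the sample data, column names, and the functions `to_bronze`, `to_silver`, and `to_gold` are hypothetical and do not come from this repository, and the project's own notebooks work with Spark and Delta tables rather than Python lists.

```python
import csv
import io
from datetime import date, datetime, timezone

# Hypothetical raw export; the columns are invented for illustration.
RAW_CSV = """order_id,customer,order_date,amount
1, alice ,2026-01-05,19.90
2,BOB,2026-01-06,5.00
3,alice,not-a-date,7.50
"""


def to_bronze(raw_text, source_name):
    """Bronze: keep every raw row untouched (all strings) and add load metadata."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    reader = csv.DictReader(io.StringIO(raw_text))
    return [dict(row, _source=source_name, _loaded_at=loaded_at) for row in reader]


def to_silver(bronze_rows):
    """Silver: standardize text, cast types, and drop rows that fail validation."""
    silver = []
    for row in bronze_rows:
        try:
            silver.append({
                "order_id": int(row["order_id"]),
                "customer": row["customer"].strip().title(),
                "order_date": date.fromisoformat(row["order_date"]),
                "amount": float(row["amount"]),
            })
        except ValueError:
            # Assumption: invalid rows are skipped; a real pipeline might quarantine them.
            continue
    return silver


def to_gold(silver_rows):
    """Gold: split clean rows into a tiny star schema (one dimension, one fact table)."""
    names = sorted({row["customer"] for row in silver_rows})
    dim_customer = [
        {"customer_key": key, "customer_name": name}
        for key, name in enumerate(names, start=1)
    ]
    key_by_name = {d["customer_name"]: d["customer_key"] for d in dim_customer}
    fact_orders = [
        {
            "order_id": row["order_id"],
            "customer_key": key_by_name[row["customer"]],
            "order_date": row["order_date"],
            "amount": row["amount"],
        }
        for row in silver_rows
    ]
    return dim_customer, fact_orders


bronze = to_bronze(RAW_CSV, "orders.csv")
silver = to_silver(bronze)
dim_customer, fact_orders = to_gold(silver)
print(len(bronze), len(silver), dim_customer)
```

Each function only reads the output of the layer before it, which is the main idea to carry over: Bronze preserves the raw input, Silver is where typing and validation happen, and Gold reshapes clean data into a model that BI tools can query.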

🛠️ Technologies Used

  • Databricks
  • Apache Spark
  • PySpark
  • Spark SQL
  • Delta Lake
  • Unity Catalog

Prerequisites

  • Basic SQL and Python, plus some PySpark knowledge
  • No prior Databricks experience required

☕ Stay Connected

🌍 Connect With Me

YouTube LinkedIn Website Newsletter


🎓 Courses (Structured & Certified)


▶️ Free YouTube Courses


🛡️ License

This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.

🌟 About Me

Hi, I’m Baraa Khatib Salkini, also known as Data With Baraa. I’m a senior data professional and educator with over 17 years of industry experience, working across data engineering, analytics, and modern data platforms. I’ve led large-scale data projects in real companies and now focus on teaching practical, real-world data skills through my courses, YouTube content, and bootcamps. My goal is simple: help you understand how data actually works in real systems, not just how to write code.