This project implements a Medallion Architecture (Bronze, Silver, Gold) to build a cloud-based data lakehouse from raw transactional data. The solution automates the ingestion, cleaning, and aggregation of AdventureWorks sales data to deliver actionable business intelligence.
The data flow follows these stages:
- Ingestion (Bronze): Raw CSV files are moved from source to ADLS Gen2 via Azure Data Factory.
- Transformation (Silver): Data is cleaned, standardized, and converted to Parquet format using PySpark in Databricks.
- Serving (Gold): Business-level aggregates are created in Azure Synapse Analytics via Serverless SQL Pools for reporting.
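The Bronze stage maps naturally to an ADF Copy activity. A trimmed, illustrative pipeline export in the spirit of this project (pipeline and dataset names here are hypothetical, not taken from the repo):

```json
{
  "name": "pl_ingest_sales_bronze",
  "properties": {
    "activities": [
      {
        "name": "CopySalesCsvToBronze",
        "type": "Copy",
        "inputs":  [{ "referenceName": "ds_source_sales_csv", "type": "DatasetReference" }],
        "outputs": [{ "referenceName": "ds_adls_bronze_csv", "type": "DatasetReference" }],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink":   { "type": "DelimitedTextSink" }
        }
      }
    ]
  }
}
```

The Copy activity lands the raw CSVs in ADLS Gen2 unmodified, which keeps the Bronze layer a faithful, replayable record of the source.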
- Orchestration: Azure Data Factory (ADF)
- Data Lake: Azure Data Lake Storage (ADLS) Gen2
- Compute: Azure Databricks (Spark 3.x)
- Data Warehouse: Azure Synapse Analytics (Serverless SQL)
- Visualization: Power BI
- Security: Azure Key Vault & Managed Identities (SMI)
Implemented recursive file reads in PySpark to handle the varying sales data schemas from 2015 to 2017, and used the mergeSchema option to produce a consistent DataFrame across years.
Converted the large CSV files into Snappy-compressed Parquet files in the Silver layer, reducing the storage footprint and improving query performance in the Gold layer by roughly 10x.
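A minimal sketch of a Gold-layer serving object in Synapse Serverless, querying the Silver Parquet files directly with OPENROWSET (the storage URL, schema, and view name are placeholders):

```sql
-- Assumes a `gold` schema already exists in the serverless database.
CREATE VIEW gold.vw_sales AS
SELECT *
FROM OPENROWSET(
    BULK 'https://<account>.dfs.core.windows.net/silver/sales/',
    FORMAT = 'PARQUET'
) AS sales;
```

Because serverless SQL pools store no data of their own, the Parquet files are scanned on demand whenever Power BI queries the view, which is why the columnar Parquet format matters for Gold-layer performance.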
Configured Service Principals and Azure Key Vault for secure authentication between Databricks and ADLS, eliminating the need for hard-coded access keys.
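That wiring can be sketched as a small helper; the secret key names, storage account, and tenant placeholders are hypothetical, and in a real notebook `conf` would be `spark.conf` while `get_secret` would wrap `dbutils.secrets.get` against a Key Vault-backed scope:

```python
# Sketch of service-principal (OAuth) auth from Databricks to ADLS Gen2.
# These fs.azure.* keys are the standard Spark configs for abfss:// access.

def configure_adls_oauth(conf, get_secret, account: str, tenant_id: str) -> None:
    """Set the Spark OAuth configs for one ADLS Gen2 storage account."""
    suffix = f"{account}.dfs.core.windows.net"
    conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
    conf.set(f"fs.azure.account.oauth.provider.type.{suffix}",
             "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    conf.set(f"fs.azure.account.oauth2.client.id.{suffix}",
             get_secret("sp-client-id"))        # pulled from Azure Key Vault
    conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}",
             get_secret("sp-client-secret"))    # never hard-coded in the notebook
    conf.set(f"fs.azure.account.oauth2.client.endpoint.{suffix}",
             f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
```

In a notebook the call would look like `configure_adls_oauth(spark.conf, lambda key: dbutils.secrets.get("kv-scope", key), "<account>", "<tenant-id>")`, keeping every credential in Key Vault rather than in source control.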
- /pipelines/: ADF JSON exports for ingestion logic.
- /notebooks/: PySpark notebooks for Silver & Gold transformations.
- /sql/: Synapse Serverless SQL scripts for views and CETAS.
- /docs/: Architecture diagrams and data dictionary.
The final pipeline serves a refined Gold layer accessible via Synapse Serverless SQL, enabling interactive Power BI dashboarding with zero infrastructure management.
Sharad Jadhav Data Engineer | Azure Specialist LinkedIn | Portfolio
