Skip to content

Sukrut10k/sql-data-warehouse

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQL Data Warehouse - Medallion Architecture

Enterprise data warehouse implementing Medallion Architecture (Bronze-Silver-Gold) to integrate multi-source ERP and CRM sales data with automated ETL pipelines.

Architecture

CSV Sources → Bronze (Raw) → Silver (Cleansed) → Gold (Star Schema)

Architecture

ETL Architecture

Tech Stack

Database: MySQL 8.0 | Orchestration: Apache Airflow 2.7 | Deployment: Docker

Key Features

  • Automated ETL pipelines with Apache Airflow orchestration
  • Dimensional star schema (fact & dimension tables)
  • Data quality validation and deduplication
  • Multi-source data integration (CRM + ERP)

Quick Start

# Clone and navigate
git clone <repo-url>
cd SQL-DATA-WAREHOUSE

# Set environment
echo "AIRFLOW_UID=50000" > .env
mkdir -p dags logs plugins

# Start services
docker-compose up -d

# Access Airflow UI
# URL: http://localhost:8080
# Login: admin / admin

# Trigger ETL pipeline in Airflow UI

Project Structure

├── datasets/          # Source CSV files (CRM & ERP)
├── scripts/           # SQL DDL & stored procedures
│   ├── bronze/        # Raw data layer
│   ├── silver/        # Cleansed data layer
│   └── gold/          # Business layer (star schema)
├── dags/              # Airflow DAG for ETL orchestration
├── tests/             # Data quality validation scripts
├── Dockerfile         # Airflow container configuration
└── docker-compose.yml # Multi-container orchestration

Data Pipeline

Bronze Layer: Raw data ingestion from CSV files
Silver Layer: Data cleansing, standardization, deduplication
Gold Layer: Dimensional model with dim_customers, dim_products, and fact_sales

Star Schema (Gold Layer)

Star Schema Design

Sample Query

SELECT 
    p.category,
    SUM(f.sales_amount) as total_sales
FROM gold_fact_sales f
JOIN gold_dim_products p ON f.product_key = p.product_key
GROUP BY p.category;

Quality Checks

  • Primary key uniqueness and null validation
  • Referential integrity between fact and dimensions
  • Data accuracy (sales = quantity × price)
  • Date range and consistency validations

License

MIT License

Built with MySQL, Apache Airflow, and Docker

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors