Enterprise data warehouse implementing Medallion Architecture (Bronze-Silver-Gold) to integrate multi-source ERP and CRM sales data with automated ETL pipelines.
CSV Sources → Bronze (Raw) → Silver (Cleansed) → Gold (Star Schema)
Database: MySQL 8.0 | Orchestration: Apache Airflow 2.7 | Deployment: Docker
- Automated ETL pipelines with Apache Airflow orchestration
- Dimensional star schema (fact & dimension tables)
- Data quality validation and deduplication
- Multi-source data integration (CRM + ERP)
# Clone and navigate
git clone <repo-url>
cd SQL-DATA-WAREHOUSE
# Set environment
echo "AIRFLOW_UID=50000" > .env
mkdir -p dags logs plugins
# Start services
docker-compose up -d
# Access Airflow UI
# URL: http://localhost:8080
# Login: admin / admin
# Trigger ETL pipeline in Airflow UI├── datasets/ # Source CSV files (CRM & ERP)
├── scripts/ # SQL DDL & stored procedures
│ ├── bronze/ # Raw data layer
│ ├── silver/ # Cleansed data layer
│ └── gold/ # Business layer (star schema)
├── dags/ # Airflow DAG for ETL orchestration
├── tests/ # Data quality validation scripts
├── Dockerfile # Airflow container configuration
└── docker-compose.yml # Multi-container orchestration
Bronze Layer: Raw data ingestion from CSV files
Silver Layer: Data cleansing, standardization, deduplication
Gold Layer: Dimensional model with dim_customers, dim_products, and fact_sales
SELECT
p.category,
SUM(f.sales_amount) as total_sales
FROM gold_fact_sales f
JOIN gold_dim_products p ON f.product_key = p.product_key
GROUP BY p.category;- Primary key uniqueness and null validation
- Referential integrity between fact and dimensions
- Data accuracy (sales = quantity × price)
- Date range and consistency validations
MIT License
Built with MySQL, Apache Airflow, and Docker

