
πŸ”„ Azure Data Factory: Dynamic Pivot Transformation Pipeline


πŸ“– Business Scenario

In retail data processing, transactional records are often stored in a "Tall" format (one row per transaction). To enable per-customer performance analysis, this data must be transformed into a "Wide" format.

Goal: Automate the transformation of multi-row customer transactions into a single, consolidated record using ADF Mapping Data Flows.


πŸ— ETL Logic & Transformation

Ingestion

  • Source: Delimited text files (CSV/TXT) stored in the input container of Azure Blob Storage.
  • Dataset: Parameterized DelimitedText dataset to handle incoming sales records.

Transformation (Mapping Data Flow)

  1. Pivot Logic: Grouped by CustomerID and pivoted on the Product key.
  2. Dynamic Aggregations:
    • Calculated SUM(Quantity) with prefix Qty_
    • Calculated SUM(Amount) with prefix Amt_
  3. Type Casting: Converted string inputs to integers within the data flow expression builder to ensure mathematical accuracy.

Loading (Sink)

  • Target: Optimized output stored as a single partitioned CSV in the output container for direct ingestion by BI tools.

πŸ›  Technical Implementation Details

  • ADF Version: V2
  • Compute: Auto-resolve Integration Runtime (Central India)
  • Optimization: Used a Single Partition setting in the Sink to ensure a consolidated output file for reporting compatibility.
  • Security: Public network access restricted to specific Azure services.

πŸ“Š Sample Data Transformation

From (Transactional):

| CustomerID | Product  | Month | Quantity | Amount |
|------------|----------|-------|----------|--------|
| 101        | Pen      | Jan   | 10       | 100    |
| 101        | Notebook | Jan   | 5        | 250    |

To (Analytical):

| CustomerID | Qty_Notebook | Qty_Pen | Amt_Notebook | Amt_Pen |
|------------|--------------|---------|--------------|---------|
| 101        | 5            | 10      | 250          | 100     |

πŸ‘¨β€πŸ’» Project by

Sharad Jadhav | LinkedIn


Azure Data Factory Transformation Project πŸš€

πŸ“Œ Business Use Case

  • Raw data from external sources is stored in an Azure Storage Account.
  • Requirement: Transform multiple rows → a single row per customer using the ADF Pivot Transformation.
  • Store the transformed data in a separate target storage for reporting & analytics.

βš™οΈ Architecture

  1. Source β†’ Raw data (CSV/TXT) stored in Azure Blob / Data Lake.
  2. Data Flow in ADF
    • Pivot Transformation β†’ Reshape rows into columns per CustomerID.
    • Derived Column (optional) β†’ Calculate totals or new metrics.
  3. Sink β†’ Final transformed dataset stored in output container.

πŸ“‚ Input File

sales_raw_data.txt

| CustomerID | Product  | Month | Quantity | Amount |
|------------|----------|-------|----------|--------|
| 101        | Pen      | Jan   | 10       | 100    |
| 101        | Notebook | Jan   | 5        | 250    |
| 101        | Pencil   | Feb   | 20       | 200    |
| 101        | Pen      | Feb   | 15       | 150    |
| 102        | Pen      | Jan   | 8        | 80     |
| 102        | Notebook | Jan   | 12       | 600    |
| 102        | Pencil   | Feb   | 10       | 100    |
| 102        | Pen      | Mar   | 20       | 200    |
| 103        | Notebook | Jan   | 7        | 350    |
| 103        | Pencil   | Feb   | 15       | 150    |
| 103        | Pen      | Feb   | 12       | 120    |
| 103        | Notebook | Mar   | 9        | 450    |
| 104        | Pen      | Jan   | 25       | 250    |
| 104        | Pencil   | Jan   | 30       | 300    |
| 104        | Notebook | Feb   | 20       | 1000   |
| 104        | Pen      | Mar   | 10       | 100    |
| 105        | Notebook | Jan   | 18       | 900    |
| 105        | Pen      | Feb   | 22       | 220    |
| 105        | Pencil   | Mar   | 12       | 120    |
| 105        | Notebook | Mar   | 10       | 500    |

βœ… Expected Output File

| CustomerID | Qty_Notebook | Qty_Pen | Qty_Pencil | Amt_Notebook | Amt_Pen | Amt_Pencil |
|------------|--------------|---------|------------|--------------|---------|------------|
| 101        | 5            | 25      | 20         | 250          | 250     | 200        |
| 104        | 20           | 35      | 30         | 1000         | 350     | 300        |
| 102        | 12           | 28      | 10         | 600          | 280     | 100        |
| 103        | 16           | 12      | 15         | 800          | 120     | 150        |
| 105        | 28           | 22      | 12         | 1400         | 220     | 120        |
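The pivot logic can be sketched locally with pandas. This is not part of the ADF pipeline itself, just an illustrative equivalent of the same group-by / pivot / SUM logic (shown here on the rows for customers 101 and 102):

```python
import pandas as pd
from io import StringIO

# A subset of sales_raw_data.txt, in the raw "tall" format.
raw = """CustomerID,Product,Month,Quantity,Amount
101,Pen,Jan,10,100
101,Notebook,Jan,5,250
101,Pencil,Feb,20,200
101,Pen,Feb,15,150
102,Pen,Jan,8,80
102,Notebook,Jan,12,600
102,Pencil,Feb,10,100
102,Pen,Mar,20,200"""

df = pd.read_csv(StringIO(raw), dtype=str)

# Cast string measures to integers before aggregating (mirrors toInteger()).
df["Quantity"] = df["Quantity"].astype(int)
df["Amount"] = df["Amount"].astype(int)

# Group by CustomerID, pivot on Product, and SUM both measures.
wide = df.pivot_table(index="CustomerID",
                      columns="Product",
                      values=["Quantity", "Amount"],
                      aggfunc="sum",
                      fill_value=0)

# Flatten the resulting MultiIndex into Qty_/Amt_ prefixed columns.
prefix = {"Quantity": "Qty", "Amount": "Amt"}
wide.columns = [f"{prefix[measure]}_{product}" for measure, product in wide.columns]
wide = wide.reset_index()
print(wide)
```

Running this reproduces the expected output rows for customers 101 and 102 above, one "wide" row per customer.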

πŸ›  Step-by-Step Implementation

πŸ”Ή 1. Create Storage Account

  • Name: sharadstorageaccount
  • Region: Asia Pacific (South India)
  • Performance: Standard
  • Redundancy: Locally Redundant Storage (LRS)
  • Access Tier: Hot
  • Networking: Public network access β†’ Enabled

πŸ”Ή 2. Create Containers

  • input β†’ For raw data (sales_raw_data.txt)
  • output β†’ For transformed data

πŸ”Ή 3. Setup Azure Data Factory

  • Name: SharadDataFactory1
  • Region: Central India
  • Version: V2
  • Launch ADF Studio.

πŸ”Ή 4. Create Dataset (Source)

  • Type: Azure Blob Storage β†’ DelimitedText
  • Linked Service: Connected to sharadstorageaccount
  • File Path: input/sales_raw_data.txt
  • Verified with Test Connection βœ…

πŸ”Ή 5. Build Data Flow

  • Source β†’ Dataset from step 4.
  • Data Preview β†’ Verified data ingestion.

Pivot Transformation

  • Group By β†’ CustomerID
  • Pivot Key β†’ Product
  • Aggregations:
    • SUM(toInteger(Quantity)) β†’ prefix Qty_
    • SUM(toInteger(Amount)) β†’ prefix Amt_

Sink

  • Dataset: Azure Blob Storage β†’ DelimitedText
  • Linked Service: Same as source
  • File Path: output/
  • Optimized with Single Partition β†’ single output file.

πŸ”Ή 6. Pipeline Setup

  • Created a new pipeline.
  • Dragged the Data Flow activity onto the pipeline canvas.
  • Published → Trigger Now.
  • Monitor tab → verified successful pipeline execution.

πŸ”Ή 7. Validate Output

  • Final transformed file stored in output container.
  • Verified correct pivoted structure.

πŸ“Έ Screenshots

  1. Storage Account Creation
     1img 2img 3img 4img
  2. Container Setup
     5img 6img
  3. Azure Data Factory Creation
     7img 8img 9img 10img 11img 12img
  4. Dataset Configuration
     13img 14img 15img 16img 17img 18img 19img 20img 21img 22img 23img
  5. Data Flow Creation
     24img 25img 26img 27img 28img 29img 30img 31img 32img 33img 34img 35img
  6. Pipeline Creation
     36img 37img
  7. Triggering
     38img 39img
  8. Monitoring
     40img 41img
  9. Destination
     42img 43img

🎯 Key Learnings

  • Azure Storage Account setup (Blob & Data Lake Gen2).
  • Dataset creation and Linked Service in ADF.
  • Pivot Transformation for reshaping data.
  • End-to-end pipeline execution & monitoring.
  • Data validation in Sink.

🏁 Conclusion

This project demonstrates how to use Azure Data Factory to transform raw transactional data into a customer-centric summary using pivot transformations.
It can be extended for real-world analytics pipelines such as sales aggregation, reporting dashboards, and data warehouse ETL.


About

End-to-end Azure Data Factory project transforming raw sales data into customer-level insights using pivot transformation and storing results in Blob Storage.
