In retail data processing, transactional records are often stored in a "Tall" format (one row per transaction). To enable per-customer performance analysis, this data must be transformed into a "Wide" format.
Goal: Automate the transformation of multi-row customer transactions into a single, consolidated record using ADF Mapping Data Flows.
- Source: Delimited text files (CSV/TXT) stored in the `input` container of Azure Blob Storage.
- Dataset: Parameterized DelimitedText dataset to handle incoming sales records.
- Pivot Logic: Grouped by `CustomerID` and pivoted on the `Product` key (see the pandas sketch after the example tables below).
- Dynamic Aggregations:
  - Calculated `SUM(Quantity)` with prefix `Qty_`
  - Calculated `SUM(Amount)` with prefix `Amt_`
- Type Casting: Converted string inputs to integers in the data flow expression builder so the SUM aggregations operate on numeric values rather than text.
- Target: Optimized output stored as a single partitioned CSV in the `output` container for direct ingestion by BI tools.
- ADF Version: V2
- Compute: Auto-resolve Integration Runtime (Central India)
- Optimization: Used a Single Partition setting in the Sink to ensure a consolidated output file for reporting compatibility.
- Security: Public network access restricted to specific Azure services.
From (Transactional):
| CustomerID | Product | Month | Quantity | Amount |
|---|---|---|---|---|
| 101 | Pen | Jan | 10 | 100 |
| 101 | Notebook | Jan | 5 | 250 |
To (Analytical):
| CustomerID | Qty_Notebook | Qty_Pen | Amt_Notebook | Amt_Pen |
|---|---|---|---|---|
| 101 | 5 | 10 | 250 | 100 |
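Before building the data flow, the same tall-to-wide reshape can be prototyped locally. The following is a minimal pandas sketch of the logic (illustration only; the production transformation runs entirely inside the ADF Mapping Data Flow), assuming the raw file is comma-delimited with a header row:

```python
# Minimal local prototype of the ADF pivot: tall transactions -> wide per-customer row.
import pandas as pd

df = pd.read_csv("sales_raw_data.txt")  # columns: CustomerID, Product, Month, Quantity, Amount

# Mirror the data flow's toInteger() casts so sums are numeric, not string concatenation.
df[["Quantity", "Amount"]] = df[["Quantity", "Amount"]].astype(int)

wide = df.pivot_table(
    index="CustomerID",      # Group By
    columns="Product",       # Pivot Key
    values=["Quantity", "Amount"],
    aggfunc="sum",
    fill_value=0,
)

# Flatten the (measure, product) column pairs into Qty_<Product> / Amt_<Product>.
prefix = {"Quantity": "Qty", "Amount": "Amt"}
wide.columns = [f"{prefix[measure]}_{product}" for measure, product in wide.columns]
print(wide.reset_index())
```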
- Raw data from external sources is stored in an Azure Storage Account.
- Requirement: Transform multiple rows → a single row per customer using the ADF Pivot Transformation.
- Store the transformed data into a separate target storage for reporting & analytics.
- Source → Raw data (CSV/TXT) stored in Azure Blob / Data Lake.
- Data Flow in ADF
- Pivot Transformation → Reshape rows into columns per `CustomerID`.
- Derived Column (optional) → Calculate totals or new metrics.
- Sink → Final transformed dataset stored in the `output` container.
Input (`sales_raw_data.txt`):
| CustomerID | Product | Month | Quantity | Amount |
|---|---|---|---|---|
| 101 | Pen | Jan | 10 | 100 |
| 101 | Notebook | Jan | 5 | 250 |
| 101 | Pencil | Feb | 20 | 200 |
| 101 | Pen | Feb | 15 | 150 |
| 102 | Pen | Jan | 8 | 80 |
| 102 | Notebook | Jan | 12 | 600 |
| 102 | Pencil | Feb | 10 | 100 |
| 102 | Pen | Mar | 20 | 200 |
| 103 | Notebook | Jan | 7 | 350 |
| 103 | Pencil | Feb | 15 | 150 |
| 103 | Pen | Feb | 12 | 120 |
| 103 | Notebook | Mar | 9 | 450 |
| 104 | Pen | Jan | 25 | 250 |
| 104 | Pencil | Jan | 30 | 300 |
| 104 | Notebook | Feb | 20 | 1000 |
| 104 | Pen | Mar | 10 | 100 |
| 105 | Notebook | Jan | 18 | 900 |
| 105 | Pen | Feb | 22 | 220 |
| 105 | Pencil | Mar | 12 | 120 |
| 105 | Notebook | Mar | 10 | 500 |
Expected Output:
| CustomerID | Qty_Notebook | Qty_Pen | Qty_Pencil | Amt_Notebook | Amt_Pen | Amt_Pencil |
|---|---|---|---|---|---|---|
| 101 | 5 | 25 | 20 | 250 | 250 | 200 |
| 104 | 20 | 35 | 30 | 1000 | 350 | 300 |
| 102 | 12 | 28 | 10 | 600 | 280 | 100 |
| 103 | 16 | 12 | 15 | 800 | 120 | 150 |
| 105 | 28 | 22 | 12 | 1400 | 220 | 120 |
- Name: `sharadstorageaccount`
- Region: Asia Pacific (South India)
- Performance: Standard
- Redundancy: Locally Redundant Storage (LRS)
- Access Tier: Hot
- Networking: Public network access → Enabled
Containers:
- `input` → For raw data (`sales_raw_data.txt`)
- `output` → For transformed data
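The container setup and raw-file upload can also be scripted. Below is a sketch using the `azure-storage-blob` SDK; the connection-string environment variable is an assumed placeholder, not part of the original project:

```python
# Sketch: create the input/output containers and upload the raw file.
# Assumes AZURE_STORAGE_CONNECTION_STRING is set for sharadstorageaccount (placeholder).
import os
from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])

for name in ("input", "output"):
    try:
        service.create_container(name)
    except ResourceExistsError:
        pass  # container already present

with open("sales_raw_data.txt", "rb") as f:
    service.get_blob_client(container="input", blob="sales_raw_data.txt").upload_blob(f, overwrite=True)
```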
- Name: `SharadDataFactory1`
- Region: Central India
- Version: V2
- Launch ADF Studio. (A programmatic alternative to these portal steps is sketched below.)
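For completeness, the portal steps above have an equivalent in the `azure-mgmt-datafactory` management SDK. A minimal sketch, assuming placeholder subscription and resource-group values:

```python
# Sketch: create the data factory programmatically (mirrors the portal steps above).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<subscription-id>"  # placeholder
resource_group = "<resource-group>"    # placeholder

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)
factory = adf_client.factories.create_or_update(
    resource_group, "SharadDataFactory1", Factory(location="centralindia")
)
print(factory.provisioning_state)
```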
- Type: Azure Blob Storage → DelimitedText
- Linked Service: Connected to `sharadstorageaccount`
- File Path: `input/sales_raw_data.txt`
- Verified with Test Connection ✅ (a programmatic spot check is sketched below)
- Source → Dataset from step 4.
- Data Preview → Verified data ingestion.
Pivot Transformation
- Group By → `CustomerID`
- Pivot Key → `Product`
- Aggregations:
  - `sum(toInteger(Quantity))` → prefix `Qty_`
  - `sum(toInteger(Amount))` → prefix `Amt_`
Sink
- Dataset: Azure Blob Storage → DelimitedText
- Linked Service: Same as source
- File Path: `output/`
- Optimized with Single Partition → single output file.
- Created a new pipeline.
- Dragged & dropped the Data Flow activity into the pipeline.
- Publish → Trigger Now.
- Monitor tab → Verified successful pipeline execution. (An SDK-based trigger-and-monitor sketch follows.)
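Trigger Now and the Monitor tab have SDK equivalents. A sketch, assuming the pipeline was saved as `PivotPipeline` (hypothetical name) and the same placeholder subscription and resource group as above:

```python
# Sketch: trigger the pipeline and poll its run status via the management SDK.
import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"  # placeholder
resource_group = "<resource-group>"    # placeholder

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)
run = adf_client.pipelines.create_run(resource_group, "SharadDataFactory1", "PivotPipeline")  # hypothetical pipeline name

while True:
    status = adf_client.pipeline_runs.get(resource_group, "SharadDataFactory1", run.run_id).status
    print("Pipeline run status:", status)
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(15)
```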
- Final transformed file stored in the `output` container.
- Verified the correct pivoted structure (see the validation sketch below).
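The structural check can also be automated. A small validation sketch, again assuming the placeholder connection string, pulls the single output file and asserts the pivoted schema:

```python
# Sketch: download the pivoted CSV from the output container and check its columns.
import io
import os
import pandas as pd
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], container_name="output"
)
blob_name = next(iter(container.list_blobs())).name  # Single Partition -> one file
result = pd.read_csv(io.BytesIO(container.download_blob(blob_name).readall()))

expected = {"CustomerID", "Qty_Notebook", "Qty_Pen", "Qty_Pencil",
            "Amt_Notebook", "Amt_Pen", "Amt_Pencil"}
assert expected <= set(result.columns), "pivoted columns missing"
print(result.sort_values("CustomerID").to_string(index=False))
```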
- Storage Account Creation
- Container Setup
- Azure Data Factory Creation
- Dataset Configuration
- Data Flow Creation
- Pipeline Creation
- Triggering
- Monitoring
- Destination
- Azure Storage Account setup (Blob & Data Lake Gen2).
- Dataset and Linked Service creation in ADF.
- Pivot Transformation for reshaping data.
- End-to-end pipeline execution & monitoring.
- Data validation in Sink.
This project demonstrates how to use Azure Data Factory to transform raw transactional data into a customer-centric summary using pivot transformations.
It can be extended for real-world analytics pipelines such as sales aggregation, reporting dashboards, and data warehouse ETL.