ecommerce_analytics.yaml
# E-commerce Orders Pipeline
# Load in Streamlit UI → YAML Editor, or trigger via AI Agent
#
# Uses the bundled demo dataset (ecommerce_orders.csv).
# Run `make demo-data` first to load demo data into containers.
pipeline:
  name: ecommerce_analytics
  description: >
    E-commerce order analytics pipeline: extract orders data,
    validate quality and completeness, detect price outliers,
    fill missing numeric values with median, and save as Parquet.
  steps:
    - id: extract
      service: extract_csv
      params:
        file_path: /app/data/ecommerce_demo/data.csv
    - id: quality
      service: data_quality
      params:
        rules:
          min_rows: 10
          check_null_ratio: true
          threshold_null_ratio: 0.3
          check_duplicates: true
          check_completeness: true
      depends_on: [extract]
    - id: outliers
      service: outlier_detection
      params:
        column: TotalAmount
        z_threshold: 3.0
      depends_on: [quality]
    - id: clean
      service: clean_nan
      params:
        strategy: fill_median
        columns:
          - Quantity
          - UnitPrice
          - Discount
          - TotalAmount
      depends_on: [outliers]
    - id: save
      service: load_data
      params:
        format: parquet
      depends_on: [clean]
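#
# Reading the numeric parameters above. This is a sketch under assumptions:
# the services' implementations are not part of this file, so the behavior
# described here is the conventional meaning of each setting, not confirmed.
#   - z_threshold: 3.0 presumably applies the usual z-score rule, flagging a
#     row when |TotalAmount - mean(TotalAmount)| / std(TotalAmount) > 3.0.
#   - threshold_null_ratio: 0.3 presumably flags a column when more than 30%
#     of its values are null.
#   - strategy: fill_median presumably replaces each missing value with the
#     median of the non-missing values in the same column.
# The depends_on entries form a linear chain:
#   extract -> quality -> outliers -> clean -> save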