
Commit 887aa88

feat(monitoring): add Grafana provisioning, cAdvisor, Airflow admin user fix, and technical docs
- docker-compose.yml: add cAdvisor service (port 8088), Grafana provisioning volume mounts, Prometheus 15d retention + lifecycle API + healthchecks
- prometheus/prometheus.yml: add self-scrape, cAdvisor scrape job
- prometheus/grafana/: auto-provision Prometheus datasource (uid: prometheus-etl) and ETL overview dashboard (uid: etl-monitoring-v1, 15 panels) with correct PromQL metric names ({__name__=~'.*_requests_total', job=~'.+-service'})
- airflow/Dockerfile: idempotent admin user creation in CMD (fixes cold start where no users existed)
- docs/access-credentials.md: central reference for all service URLs, credentials, env vars, security checklist
- docs/architecture.md: technical architecture guide: Arrow IPC rationale, 4 communication patterns, Preparator SDK design, Kahn topological layering, parallel ThreadPoolExecutor, Gunicorn multi-worker implications, observability, XCom file-based pattern, security design, trade-offs
1 parent ff1f5d0 commit 887aa88

8 files changed

Lines changed: 1390 additions & 1 deletion


airflow/Dockerfile

Lines changed: 2 additions & 1 deletion
```diff
@@ -14,4 +14,5 @@ RUN mkdir -p /opt/airflow/dags /opt/airflow/logs /opt/airflow/plugins
 ENV PYTHONPATH="/opt/airflow/"
 
 # CMD to start the services (db migrate ensures the schema exists on first boot)
-CMD ["bash", "-c", "airflow db migrate && (airflow webserver & airflow scheduler)"]
+# Creates admin user on first boot only (idempotent: skips if already exists)
+CMD ["bash", "-c", "airflow db migrate && (airflow users list | grep -q admin || airflow users create --username admin --password admin --firstname Admin --lastname ETL --role Admin --email admin@etl.local) && (airflow webserver & airflow scheduler)"]
```
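The idempotency check in the new `CMD` uses `airflow users list | grep -q admin`. That test succeeds if *any* line of the listing contains the substring `admin`. For example, an existing user named `dataadmin` would make the check pass, and the real `admin` user would never be created. A slightly stricter variant is sketched below. It assumes the listing prints the username as a standalone word. `airflow_user_listed` is a hypothetical helper name and is not part of the Airflow CLI.

```bash
# Hypothetical helper: succeeds only if the given user name appears as a whole
# word in the listing read from stdin (e.g. piped from `airflow users list`).
airflow_user_listed() {
  local username="$1"
  # -w rejects substring hits such as "dataadmin"; -F matches the name literally.
  grep -qwF -- "$username"
}
```

In the `CMD`, this helper would replace the `grep -q admin` step: `airflow users list | airflow_user_listed admin || airflow users create ...`, keeping the same `create` arguments as above.
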

docker-compose.yml

Lines changed: 43 additions & 0 deletions
```diff
@@ -65,6 +65,17 @@ services:
     networks:
       - etl-network
     restart: always
+    healthcheck:
+      test: ["CMD-SHELL", "wget -qO- http://localhost:9090/-/healthy || exit 1"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 15s
+    command:
+      - --config.file=/etc/prometheus/prometheus.yml
+      - --storage.tsdb.path=/etc/prometheus/data
+      - --storage.tsdb.retention.time=15d
+      - --web.enable-lifecycle
 
   grafana:
     image: grafana/grafana:latest
```
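The hunk above adds `--web.enable-lifecycle`, so Prometheus now accepts `POST /-/reload`. This lets you apply configuration changes without restarting the container, assuming `prometheus/prometheus.yml` is mounted at the `--config.file` path. The sketch below wraps that call. `prom_reload` and the `PROM_URL` override variable are hypothetical names.

```bash
# Hypothetical helper: ask Prometheus to re-read its configuration file.
# Requires --web.enable-lifecycle; PROM_URL is an assumed override variable.
prom_reload() {
  local base="${PROM_URL:-http://localhost:9090}"
  # -f turns an HTTP error (for example when the lifecycle API is not enabled)
  # into a non-zero exit status.
  curl -fsS -X POST "${base}/-/reload"
}
```

The same flag also enables `POST /-/quit`. Prometheus has no authentication here and port 9090 is reachable from the host, so anyone who can reach that port can reload or stop the server.
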
```diff
@@ -74,11 +85,43 @@ services:
       - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD:-change-me-strong-password}
     volumes:
       - etl-grafana-data:/var/lib/grafana
+      - ./prometheus/grafana/provisioning:/etc/grafana/provisioning:ro
+      - ./prometheus/grafana/dashboards:/etc/grafana/provisioned-dashboards:ro
     ports:
       - "3000:3000"
     networks:
       - etl-network
+    depends_on:
+      - prometheus
     restart: always
+    healthcheck:
+      test: ["CMD-SHELL", "wget -qO- http://localhost:3000/api/health || exit 1"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 30s
+
+  cadvisor:
+    image: gcr.io/cadvisor/cadvisor:latest
+    container_name: cadvisor
+    privileged: true
+    devices:
+      - /dev/kmsg:/dev/kmsg
+    volumes:
+      - /:/rootfs:ro
+      - /var/run:/var/run:ro
+      - /sys:/sys:ro
+      - /var/lib/docker/:/var/lib/docker:ro
+      - /cgroup:/cgroup:ro
+    ports:
+      - "8088:8080"
+    networks:
+      - etl-network
+    command:
+      - --docker_only=true
+      - --housekeeping_interval=15s
+      - --disable_metrics=advtcp,cpu_topology,cpuset,hugetlb,memory_numa,process,referenced_memory,resctrl,sched,tcp,udp
+    restart: unless-stopped
 
   # Microservices
   extract-csv-service:
```
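The new `healthcheck` blocks only change the status Docker reports for each container. The short `depends_on: - prometheus` form waits only for the Prometheus container to start, not for it to become healthy; the Compose long syntax with `condition: service_healthy` is needed for that. Scripts such as smoke tests also need their own wait. The sketch below is a generic polling loop that reuses the same health endpoints. `wait_for` is a hypothetical name, and the retry budget is an arbitrary choice.

```bash
# Hypothetical helper: run a command up to <attempts> times, sleeping <interval>
# seconds between failures. Returns 0 on the first success, 1 if all attempts fail.
wait_for() {
  local attempts="$1" interval="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the last failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: wait up to ~30 s for each service, using the healthcheck URLs above.
# wait_for 10 3 wget -qO- http://localhost:9090/-/healthy &&
#   wait_for 10 3 wget -qO- http://localhost:3000/api/health
```
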

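Once Prometheus scrapes cAdvisor, you can query per-container resource usage through the Prometheus HTTP API (`GET /api/v1/query`). The commit message mentions a cAdvisor scrape job in `prometheus/prometheus.yml`, but that diff is not shown here. `prom_query` is a hypothetical helper. The example query assumes cAdvisor's standard `container_cpu_usage_seconds_total` metric, with the Docker container name in the `name` label.

```bash
# Hypothetical helper: run an instant PromQL query against the Prometheus HTTP API
# and print the raw JSON response.
prom_query() {
  local base="${PROM_URL:-http://localhost:9090}"
  # -G sends the --data-urlencode payload as URL query parameters of a GET request.
  curl -fsS -G "${base}/api/v1/query" --data-urlencode "query=$1"
}

# Example: per-container CPU usage (in cores) over the last 5 minutes.
# prom_query 'sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m]))'
```
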
docs/access-credentials.md

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@

# Access & Credentials: ETL Microservices Platform

> **Security note:** The values shown here are the **development defaults** defined in `.env.example`.
> In production, override all of them in the `.env` file (never commit `.env` to Git).

---

## Quick Reference: Service URLs

| Service | URL | Credentials |
|---|---|---|
| **Grafana** (monitoring dashboards) | http://localhost:3000 | `admin` / `change-me-strong-password` |
| **Prometheus** (raw metrics) | http://localhost:9090 | none (no auth) |
| **Airflow** (DAG orchestration) | http://localhost:8080 | `admin` / `admin` (created on first boot) |
| **Streamlit UI** (AI pipeline builder) | http://localhost:8501 | none (no auth) |
| **cAdvisor** (container metrics) | http://localhost:8088 | none (no auth) |
| **PostgreSQL** (Airflow internal DB) | `localhost:5432` | `airflow` / `change-me-strong-password` |
| **statsd-exporter** (Airflow → Prometheus metrics) | http://localhost:9102/metrics | none (no auth) |

---

## ETL Microservices: Ports & Health Endpoints

All services expose `GET /health` and `GET /metrics`.

| Container | Port | Health check |
|---|---|---|
| `extract-csv-service` | 5001 | http://localhost:5001/health |
| `clean-nan-service` | 5002 | http://localhost:5002/health |
| `delete-columns-service` | 5004 | http://localhost:5004/health |
| `extract-sql-service` | 5005 | http://localhost:5005/health |
| `extract-api-service` | 5006 | http://localhost:5006/health |
| `extract-excel-service` | 5007 | http://localhost:5007/health |
| `join-datasets-service` | 5008 | http://localhost:5008/health |
| `load-data-service` | 5009 | http://localhost:5009/health |
| `data-quality-service` | 5010 | http://localhost:5010/health |
| `outlier-detection-service` | 5011 | http://localhost:5011/health |
| `text-completion-llm-service` | 5012 | http://localhost:5012/health |

> **Port 5003** is unassigned (historical gap). The next available port for a new service is **5013**.

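Every service exposes `GET /health` on the ports listed above, so you can check the whole stack from the host by looping over those ports. The sketch below assumes `curl` is installed on the host and hard-codes the port list from the table. `check_etl_health` and `ETL_PORTS` are hypothetical names. The function prints one line per service, and its exit status is the number of failing endpoints.

```bash
# Hypothetical helper: probe GET /health on every ETL microservice port.
# Port list copied from the table above (5003 is intentionally absent).
ETL_PORTS="5001 5002 5004 5005 5006 5007 5008 5009 5010 5011 5012"

check_etl_health() {
  local host="${1:-localhost}" port failed=0
  for port in $ETL_PORTS; do
    # -f counts HTTP 4xx/5xx as failures; --max-time avoids hanging on a dead service.
    if curl -fsS --max-time 5 -o /dev/null "http://${host}:${port}/health"; then
      echo "OK   ${port}"
    else
      echo "FAIL ${port}"
      failed=$((failed + 1))
    fi
  done
  # Exit status = number of failing services (0 means all healthy).
  return "$failed"
}
```
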
---

## Environment Variables: `.env`

Copy `.env.example` to `.env` and edit the values before starting the stack.

```bash
cp .env.example .env
```

| Variable | Default | Description |
|---|---|---|
| `POSTGRES_USER` | `airflow` | PostgreSQL username |
| `POSTGRES_PASSWORD` | `change-me-strong-password` | PostgreSQL password ⚠️ change in production |
| `POSTGRES_DB` | `airflow` | PostgreSQL database name |
| `GF_SECURITY_ADMIN_PASSWORD` | `change-me-strong-password` | Grafana admin password ⚠️ change in production |
| `ETL_DATA_ROOT` | `/app/data` | Data root directory inside the containers |
| `ALLOW_PRIVATE_API_URLS` | `false` | Allows private URLs in `extract-api-service` |
| `HF_MODELS_PATH` | `./hf_models` | Local cache path for HuggingFace models |
| `LLM_PROVIDER` | `openai` | AI provider: `openai` or `local` |
| `OPENAI_API_KEY` | *(empty)* | OpenAI API key; required if `LLM_PROVIDER=openai` |
| `OPENAI_MODEL` | `gpt-4o-mini` | OpenAI model to use |

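It is easy to leave one of the placeholders in place before going to production. The sketch below is a minimal check for that. It assumes a plain `KEY=value` `.env` file, optionally with double-quoted values, and looks for the `change-me-strong-password` placeholder from the table above. `find_default_secrets` is a hypothetical helper name. It prints nothing when no placeholder is left.

```bash
# Hypothetical helper: print the name of every variable in an env file whose
# value is still the development placeholder from .env.example.
find_default_secrets() {
  local env_file="${1:-.env}"
  awk -v placeholder="change-me-strong-password" '
    /^[ \t]*#/ || index($0, "=") == 0 { next }   # skip comments and non-assignments
    {
      eq  = index($0, "=")
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/^"|"$/, "", val)                    # tolerate KEY="value"
      if (val == placeholder) print key
    }' "$env_file"
}

# Example: find_default_secrets .env
```
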
---

## Airflow Credentials

The admin user is created automatically on first boot by the `CMD` in the Airflow Dockerfile
(idempotent: if the user already exists, it is not recreated).

| Field | Default value |
|---|---|
| Username | `admin` |
| Password | `admin` |
| URL | http://localhost:8080 |

> If you are using an instance that was started **before** this change, create the user manually:
> ```bash
> docker exec airflow airflow users create \
>   --username admin --password admin \
>   --firstname Admin --lastname ETL \
>   --role Admin --email admin@etl.local
> ```
> Change the password after the first login: `Admin → Users → admin → Edit`.

---

## Grafana: Login

| Field | Default value |
|---|---|
| URL | http://localhost:3000 |
| Username | `admin` |
| Password | value of `GF_SECURITY_ADMIN_PASSWORD` in `.env` (default: `change-me-strong-password`) |

The **Prometheus** datasource and the **ETL Microservices — Monitoring Overview** dashboard are loaded automatically through provisioning, so no manual configuration is needed.

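You can confirm that provisioning actually loaded both objects by querying Grafana's HTTP API by UID. The UIDs `prometheus-etl` and `etl-monitoring-v1` come from the commit message. `verify_grafana_provisioning` and the `GRAFANA_URL` override variable are hypothetical names. The sketch uses basic auth with the admin credentials from the table above.

```bash
# Hypothetical helper: check that the provisioned datasource and dashboard exist.
verify_grafana_provisioning() {
  local base="${GRAFANA_URL:-http://localhost:3000}"
  local auth="admin:${GF_SECURITY_ADMIN_PASSWORD:-change-me-strong-password}"
  local path rc=0
  for path in api/datasources/uid/prometheus-etl api/dashboards/uid/etl-monitoring-v1; do
    # A 404 (not provisioned) or 401 (wrong password) makes curl -f fail.
    if curl -fsS -o /dev/null -u "$auth" "${base}/${path}"; then
      echo "found   /${path}"
    else
      echo "missing /${path}"
      rc=1
    fi
  done
  return "$rc"
}
```
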
---

## PostgreSQL: Direct Connection

SQLAlchemy URI (used internally by Airflow; also handy for direct inspection):

```
postgresql+psycopg2://airflow:change-me-strong-password@localhost:5432/airflow
```

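The `+psycopg2` part is SQLAlchemy's driver selector. `psql` and other libpq clients expect a plain `postgresql://` URI instead. The sketch below converts the string so it can be reused from the host. It assumes the URI has the `postgresql+<driver>://` form shown above and that port 5432 is published as listed in the quick-reference table. `to_libpq_uri` is a hypothetical helper name.

```bash
# Hypothetical helper: turn a SQLAlchemy URI (postgresql+<driver>://...) into a
# libpq connection URI that psql understands (postgresql://...).
to_libpq_uri() {
  printf '%s\n' "$1" | sed -E 's#^postgresql\+[A-Za-z0-9_]+://#postgresql://#'
}

# Example (requires the psql client on the host):
# psql "$(to_libpq_uri 'postgresql+psycopg2://airflow:change-me-strong-password@localhost:5432/airflow')" -c '\dt'
```
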
---

## Security Checklist (Production)

- [ ] Change `POSTGRES_PASSWORD` to a strong password
- [ ] Change `GF_SECURITY_ADMIN_PASSWORD` to a strong password
- [ ] Set `OPENAI_API_KEY` to your real key
- [ ] Add authentication to the microservices (currently open on the internal network)
- [ ] Do not expose the ETL ports (5001–5012) on public IPs
- [ ] Enable HTTPS for Grafana and Airflow in exposed environments
- [ ] Rotate the Airflow admin credentials after the first login
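This check supports the "do not expose the ETL ports" item. A short-syntax mapping such as `"5001:5001"` publishes the port on all host interfaces. A mapping such as `"127.0.0.1:5001:5001"` binds it to loopback only. The sketch below is a rough audit that flags unbound mappings. It assumes the short port syntax used in this compose file; long-syntax entries and mappings with a `/tcp` suffix are not detected. `find_public_ports` is a hypothetical helper name. An exit status of 1 means no unbound mapping was found.

```bash
# Hypothetical helper: list short-syntax port mappings without a host IP
# (e.g. - "3000:3000"), which Docker publishes on all host interfaces.
# Mappings such as - "127.0.0.1:5001:5001" are not reported.
find_public_ports() {
  local compose_file="${1:-docker-compose.yml}"
  # -n prefixes each hit with its line number in the compose file.
  grep -nE '^[[:space:]]*-[[:space:]]*"?[0-9]+:[0-9]+"?[[:space:]]*$' "$compose_file"
}
```
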
