A scheduled ELT pipeline that pulls daily weather data for 10 UK cities from the Open-Meteo API, lands it raw in PostgreSQL, and transforms it into dimensional models using dbt — all orchestrated by Airflow.
Stack: Python, PostgreSQL, dbt, Airflow, Docker Compose
```mermaid
flowchart LR
    API[Open-Meteo API] --> Extract[Python Extraction]
    Extract --> Raw[raw.weather_daily]
    Raw --> dbt_run[dbt run]
    dbt_run --> Staging[stg_weather_daily]
    dbt_run --> Marts[fct_daily_weather]
    Seeds[dim_cities] --> Marts
    Marts --> dbt_test[dbt test]
    subgraph PostgreSQL
        Raw
        Staging
        Marts
        Seeds
    end
    subgraph DAG["Airflow DAG"]
        Extract --> dbt_run --> dbt_test
    end
```
An Airflow DAG runs daily and does three things in order:
1. A Python extraction script hits the Open-Meteo API and appends the raw JSON response into a `raw.weather_daily` table in PostgreSQL. Each row gets a `loaded_at` timestamp so I can track when data was ingested.
2. dbt picks up the raw data and transforms it through a staging layer (deduplication, type casting) into a mart layer. The final `fct_daily_weather` table joins weather readings with a `dim_cities` seed file and flags extreme weather days (temperature above 30°C or precipitation above 50mm).
3. dbt tests run automatically — `not_null` checks on key columns across both layers. If anything fails, the DAG fails too, so I know about it.
The whole thing is idempotent. I can rerun the DAG for any historical date and it won't create duplicates — it deletes and reinserts for that date window. This also means backfilling is straightforward: trigger the DAG 30 times for 30 dates and it just works.
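The delete-then-reinsert pattern is simple enough to sketch. The demo below runs against an in-memory SQLite database so it is self-contained; the real pipeline targets PostgreSQL, and the table and column names here are simplified stand-ins.

```python
import sqlite3


def load_weather(conn, run_date, rows):
    """Replace all rows for run_date, so reruns never duplicate data."""
    with conn:  # one transaction: delete + insert commit (or roll back) together
        conn.execute("DELETE FROM weather_daily WHERE date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO weather_daily (city, date, temp_max) VALUES (?, ?, ?)",
            [(r["city"], run_date, r["temp_max"]) for r in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather_daily (city TEXT, date TEXT, temp_max REAL)")

rows = [{"city": "London", "temp_max": 21.3}, {"city": "Leeds", "temp_max": 18.9}]
load_weather(conn, "2025-06-01", rows)
load_weather(conn, "2025-06-01", rows)  # rerun the same date: still 2 rows, not 4

count = conn.execute("SELECT COUNT(*) FROM weather_daily").fetchone()[0]
print(count)  # 2
```

Running the load inside a single transaction matters: if the insert fails halfway, the delete rolls back too, so a crashed run never leaves a date half-loaded.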
Docker and Docker Compose. That's it — everything else runs inside containers.
```bash
docker compose up --build -d
```

This builds the custom Airflow image (with dbt installed), starts PostgreSQL, runs database migrations, creates the admin user, then brings up the webserver and scheduler. The first run takes a couple of minutes for the build.
Check everything is healthy:
```bash
docker compose ps
```

You should see `postgres`, `airflow-webserver`, and `airflow-scheduler` running. `airflow-init` will show as `exited (0)` — that's expected; it's a one-shot setup task.
Go to `localhost:8080` and log in with `admin` / `admin`.
You'll see the weather_pipeline DAG. It may take 30 seconds for the scheduler to pick it up after first boot.
Unpause the DAG using the toggle on the left, then click the play button on the right. The DAG runs three tasks in sequence:
- `extract_weather` — pulls today's weather data from Open-Meteo for 10 UK cities into `raw.weather_daily`
- `dbt_run` — seeds the `dim_cities` table and builds the staging + mart models
- `dbt_test` — runs `not_null` checks across key columns
All three should go green within about 2 minutes.
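For a sense of what `extract_weather` does per city, here is a rough sketch. The request parameters are real Open-Meteo forecast-API parameters, but the function names and the two-city dict are illustrative — the project drives its 10 cities from the `dim_cities` seed and writes rows to Postgres rather than returning them.

```python
import requests

# Illustrative subset; the project covers 10 UK cities.
CITIES = {"London": (51.51, -0.13), "Manchester": (53.48, -2.24)}


def parse_daily(city: str, payload: dict) -> dict:
    """Flatten Open-Meteo's column-oriented daily payload into one row."""
    daily = payload["daily"]
    return {
        "city": city,
        "date": daily["time"][0],
        "temp_max_c": daily["temperature_2m_max"][0],
        "precipitation_mm": daily["precipitation_sum"][0],
    }


def fetch_daily(city: str, date: str) -> dict:
    """Fetch one day's max temperature and precipitation for one city."""
    lat, lon = CITIES[city]
    resp = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": lat,
            "longitude": lon,
            "daily": "temperature_2m_max,precipitation_sum",
            "start_date": date,
            "end_date": date,
            "timezone": "Europe/London",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return parse_daily(city, resp.json())
```

Keeping the parsing separate from the HTTP call makes the flattening logic testable without hitting the network.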
Connect to the warehouse database:
```bash
docker compose exec postgres psql -U airflow -d warehouse
```

Then explore the three layers:
```sql
-- Raw data as extracted from the API
SELECT * FROM raw.weather_daily ORDER BY city;

-- Deduplicated staging layer
SELECT * FROM public.stg_weather_daily ORDER BY city;

-- Final mart with region and extreme weather flag
SELECT * FROM public.fct_daily_weather ORDER BY city;
```

To backfill a date range:

```bash
docker compose exec airflow-scheduler airflow dags backfill weather_pipeline -s 2025-06-01 -e 2025-06-07
```

This runs the full pipeline once per date in the range. Each run is idempotent — safe to re-run without creating duplicates.
```bash
docker compose down
```

Add `-v` to wipe the database volume and start fresh:

```bash
docker compose down -v
```

The dbt models:

- `stg_weather_daily` — deduplicates the raw data by taking the latest `loaded_at` per city and date, and casts columns to the right types
- `dim_cities` — a seed file with city name, latitude, longitude, and region
- `fct_daily_weather` — the final fact table joining weather data to city dimensions, with an `is_extreme_weather` flag
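The dedup rule in `stg_weather_daily` ("latest `loaded_at` per city and date") is a classic window-function pattern. The sketch below runs the same logic against SQLite for a self-contained demo; the real model is dbt SQL over PostgreSQL, and the table name here is a simplified stand-in.

```python
import sqlite3

# Keep only the most recently loaded row per (city, date).
DEDUP_SQL = """
SELECT city, date, temp_max
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY city, date
               ORDER BY loaded_at DESC
           ) AS rn
    FROM raw_weather_daily
)
WHERE rn = 1
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_weather_daily (city TEXT, date TEXT, temp_max REAL, loaded_at TEXT)"
)
conn.executemany(
    "INSERT INTO raw_weather_daily VALUES (?, ?, ?, ?)",
    [
        ("London", "2025-06-01", 20.0, "2025-06-01T06:00:00"),
        ("London", "2025-06-01", 21.5, "2025-06-02T06:00:00"),  # later reload wins
        ("Leeds", "2025-06-01", 18.0, "2025-06-01T06:00:00"),
    ],
)
rows = conn.execute(DEDUP_SQL).fetchall()
print(sorted(rows))  # [('Leeds', '2025-06-01', 18.0), ('London', '2025-06-01', 21.5)]
```

Note that the London row from the rerun (the later `loaded_at`) survives, which is what makes the staging layer safe to sit on top of an append-only raw table.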