A production-style dbt modeling workflow that transforms raw listing, calendar, and amenities-change data into a documented, tested, analyst-ready daily listing mart.
- Layered dbt modeling: staging → intermediate → marts
- SCD-style amenity history modeling with
valid_from/valid_tointervals - Point-in-time joins to avoid leaking future amenity state into historical rows
- Incremental mart design with a
listing_id + dategrain - Data quality tests for uniqueness and not-null constraints
- Reusable Jinja macro for amenity flag generation
- Analyst-facing SQL outputs for revenue, pricing, and availability questions
models/
staging/
stg_listings.sql -- rename/cast raw listing fields
stg_calendar.sql -- dedup and cast calendar events
stg_amenities_changelog.sql -- parse JSON amenity arrays
intermediate/
int_amenities_scd.sql -- SCD Type 2 intervals per (listing, amenity)
int_listing_amenities_asof.sql -- point-in-time join: calendar × SCD amenities × listings
marts/
mart_daily_listing.sql -- incremental fact mart, one row per listing per date
mart_daily_listing.md -- column documentation and flag reference
macros/
build_amenity_flags.sql -- Jinja macro: generates has_<amenity> flag columns
seeds/
listings.csv
calendar.csv
amenities_changelog.csv
example_queries/
amenity_revenue_analysis.sql -- monthly revenue split by air conditioning
neighborhood_price_change.sql -- year-over-year nightly price change by neighborhood
long_stay_availability.sql -- longest bookable consecutive stay per qualifying listing
Mart grain: one row per (listing_id, date)
Incremental strategy: delete+insert over a rolling 7-day lookback window. Full refresh does
not scale for a growing listing-day fact table. A 7-day window handles late-arriving reservations
and corrections while keeping run cost predictable.
Historical amenity logic: int_amenities_scd computes valid_from / valid_to intervals
using LEAD over the changelog. The point-in-time join in int_listing_amenities_asof uses
date >= valid_from AND date < valid_to to select only the amenity state that was active on a
given calendar day.
Amenity flags: the build_amenity_flags macro generates has_<amenity> boolean integer
columns in the mart, which are cleaner and more query-friendly than filtering on a raw array column.
Data quality: dbt schema tests enforce not_null and unique on all key columns, and
dbt_utils.unique_combination_of_columns validates composite grain uniqueness at each layer.
In production I would:
- Tune the incremental lookback window using observed upstream data latency rather than a fixed 7 days
- Add source freshness checks so stale upstream tables surface as failures before they corrupt the mart
- Drive incremental logic off an ingestion timestamp or watermark if one is available, rather than
max(date) - Add a backfill mechanism for late bulk corrections that fall outside the lookback window
- Use warehouse-specific optimization strategies, such as partitioning in BigQuery or clustering keys in Snowflake.
- Document table ownership, SLA, and downstream dependencies in the dbt model
descriptionfields
The example_queries/ folder contains sample analyst-facing SQL that uses the final mart to answer
business questions around revenue, pricing, availability, and amenities.
This is a small local dbt case study using seed files and DuckDB. In a production warehouse, I would add source freshness tests, warehouse-specific clustering and partitioning, CI checks, environment-specific profiles, and alerting around failed or stale builds.
The seeds/ folder contains small synthetic CSV files created for demonstration purposes. The data
is not from a real company or customer system.
| File | Rows | Purpose |
|---|---|---|
listings.csv |
5 | One row per listing with neighborhood, amenities, and price |
calendar.csv |
44 | Daily availability and reservation events across two years |
amenities_changelog.csv |
9 | Amenity change events used to build SCD2 intervals |
The calendar spans 2021-07-12 to 2022-07-11 and is designed to exercise all three example queries: revenue segmentation by amenity, year-over-year price change by neighborhood, and consecutive availability windows for long-stay bookings.
# install dbt and dbt-duckdb
pip install dbt-duckdb
# install dbt packages
dbt deps
# seed raw data, build all models, and run tests
dbt seed
dbt buildA profiles.yml pointing to a local DuckDB database is required and is not committed to this repo.
See the dbt DuckDB adapter docs
for setup instructions.