dbt Analytics Engineering Case Study

A production-style dbt modeling workflow that transforms raw listing, calendar, and amenities-change data into a documented, tested, analyst-ready daily listing mart.

What this shows

Layered dbt modeling: staging → intermediate → marts
SCD-style amenity history modeling with valid_from / valid_to intervals
Point-in-time joins to avoid leaking future amenity state into historical rows
Incremental mart design with a listing_id + date grain
Data quality tests for uniqueness and not-null constraints
Reusable Jinja macro for amenity flag generation
Analyst-facing SQL outputs for revenue, pricing, and availability questions

Project structure

models/
  staging/
    stg_listings.sql              -- rename/cast raw listing fields
    stg_calendar.sql              -- dedup and cast calendar events
    stg_amenities_changelog.sql   -- parse JSON amenity arrays
  intermediate/
    int_amenities_scd.sql              -- SCD Type 2 intervals per (listing, amenity)
    int_listing_amenities_asof.sql     -- point-in-time join: calendar × SCD amenities × listings
  marts/
    mart_daily_listing.sql        -- incremental fact mart, one row per listing per date
    mart_daily_listing.md         -- column documentation and flag reference

macros/
  build_amenity_flags.sql              -- Jinja macro: generates has_<amenity> flag columns

seeds/
  listings.csv
  calendar.csv
  amenities_changelog.csv

example_queries/
  amenity_revenue_analysis.sql         -- monthly revenue split by air conditioning
  neighborhood_price_change.sql        -- year-over-year nightly price change by neighborhood
  long_stay_availability.sql           -- longest bookable consecutive stay per qualifying listing

Modeling decisions

Mart grain: one row per (listing_id, date)

Incremental strategy: delete+insert over a rolling 7-day lookback window. Full refresh does not scale for a growing listing-day fact table. A 7-day window handles late-arriving reservations and corrections while keeping run cost predictable.

Historical amenity logic: int_amenities_scd computes valid_from / valid_to intervals using LEAD over the changelog. The point-in-time join in int_listing_amenities_asof uses date >= valid_from AND date < valid_to to select only the amenity state that was active on a given calendar day.

Amenity flags: the build_amenity_flags macro generates has_<amenity> boolean integer columns in the mart, which are cleaner and more query-friendly than filtering on a raw array column.

Data quality: dbt schema tests enforce not_null and unique on all key columns, and dbt_utils.unique_combination_of_columns validates composite grain uniqueness at each layer.

Production considerations

In production I would:

Tune the incremental lookback window using observed upstream data latency rather than a fixed 7 days
Add source freshness checks so stale upstream tables surface as failures before they corrupt the mart
Drive incremental logic off an ingestion timestamp or watermark if one is available, rather than max(date)
Add a backfill mechanism for late bulk corrections that fall outside the lookback window
- Use warehouse-specific optimization strategies, such as partitioning in BigQuery or clustering keys in Snowflake.
Document table ownership, SLA, and downstream dependencies in the dbt model description fields

Example queries

The example_queries/ folder contains sample analyst-facing SQL that uses the final mart to answer business questions around revenue, pricing, availability, and amenities.

Limitations

This is a small local dbt case study using seed files and DuckDB. In a production warehouse, I would add source freshness tests, warehouse-specific clustering and partitioning, CI checks, environment-specific profiles, and alerting around failed or stale builds.

Sample data

The seeds/ folder contains small synthetic CSV files created for demonstration purposes. The data is not from a real company or customer system.

File	Rows	Purpose
`listings.csv`	5	One row per listing with neighborhood, amenities, and price
`calendar.csv`	44	Daily availability and reservation events across two years
`amenities_changelog.csv`	9	Amenity change events used to build SCD2 intervals

The calendar spans 2021-07-12 to 2022-07-11 and is designed to exercise all three example queries: revenue segmentation by amenity, year-over-year price change by neighborhood, and consecutive availability windows for long-stay bookings.

Running locally

# install dbt and dbt-duckdb
pip install dbt-duckdb

# install dbt packages
dbt deps

# seed raw data, build all models, and run tests
dbt seed
dbt build

A profiles.yml pointing to a local DuckDB database is required and is not committed to this repo. See the dbt DuckDB adapter docs for setup instructions.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
example_queries		example_queries
macros		macros
models		models
seeds		seeds
.gitignore		.gitignore
README.md		README.md
dbt_project.yml		dbt_project.yml
packages.yml		packages.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

dbt Analytics Engineering Case Study

What this shows

Project structure

Modeling decisions

Production considerations

Example queries

Limitations

Sample data

Running locally

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

dbt Analytics Engineering Case Study

What this shows

Project structure

Modeling decisions

Production considerations

Example queries

Limitations

Sample data

Running locally

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages