Skip to content

dhruvi-a/dbt-analytics-engineering-case-study

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

dbt Analytics Engineering Case Study

A production-style dbt modeling workflow that transforms raw listing, calendar, and amenities-change data into a documented, tested, analyst-ready daily listing mart.

What this shows

  • Layered dbt modeling: staging → intermediate → marts
  • SCD-style amenity history modeling with valid_from / valid_to intervals
  • Point-in-time joins to avoid leaking future amenity state into historical rows
  • Incremental mart design with a listing_id + date grain
  • Data quality tests for uniqueness and not-null constraints
  • Reusable Jinja macro for amenity flag generation
  • Analyst-facing SQL outputs for revenue, pricing, and availability questions

Project structure

models/
  staging/
    stg_listings.sql              -- rename/cast raw listing fields
    stg_calendar.sql              -- dedup and cast calendar events
    stg_amenities_changelog.sql   -- parse JSON amenity arrays
  intermediate/
    int_amenities_scd.sql              -- SCD Type 2 intervals per (listing, amenity)
    int_listing_amenities_asof.sql     -- point-in-time join: calendar × SCD amenities × listings
  marts/
    mart_daily_listing.sql        -- incremental fact mart, one row per listing per date
    mart_daily_listing.md         -- column documentation and flag reference

macros/
  build_amenity_flags.sql              -- Jinja macro: generates has_<amenity> flag columns

seeds/
  listings.csv
  calendar.csv
  amenities_changelog.csv

example_queries/
  amenity_revenue_analysis.sql         -- monthly revenue split by air conditioning
  neighborhood_price_change.sql        -- year-over-year nightly price change by neighborhood
  long_stay_availability.sql           -- longest bookable consecutive stay per qualifying listing

Modeling decisions

Mart grain: one row per (listing_id, date)

Incremental strategy: delete+insert over a rolling 7-day lookback window. Full refresh does not scale for a growing listing-day fact table. A 7-day window handles late-arriving reservations and corrections while keeping run cost predictable.

Historical amenity logic: int_amenities_scd computes valid_from / valid_to intervals using LEAD over the changelog. The point-in-time join in int_listing_amenities_asof uses date >= valid_from AND date < valid_to to select only the amenity state that was active on a given calendar day.

Amenity flags: the build_amenity_flags macro generates has_<amenity> boolean integer columns in the mart, which are cleaner and more query-friendly than filtering on a raw array column.

Data quality: dbt schema tests enforce not_null and unique on all key columns, and dbt_utils.unique_combination_of_columns validates composite grain uniqueness at each layer.

Production considerations

In production I would:

  • Tune the incremental lookback window using observed upstream data latency rather than a fixed 7 days
  • Add source freshness checks so stale upstream tables surface as failures before they corrupt the mart
  • Drive incremental logic off an ingestion timestamp or watermark if one is available, rather than max(date)
  • Add a backfill mechanism for late bulk corrections that fall outside the lookback window
    • Use warehouse-specific optimization strategies, such as partitioning in BigQuery or clustering keys in Snowflake.
  • Document table ownership, SLA, and downstream dependencies in the dbt model description fields

Example queries

The example_queries/ folder contains sample analyst-facing SQL that uses the final mart to answer business questions around revenue, pricing, availability, and amenities.

Limitations

This is a small local dbt case study using seed files and DuckDB. In a production warehouse, I would add source freshness tests, warehouse-specific clustering and partitioning, CI checks, environment-specific profiles, and alerting around failed or stale builds.

Sample data

The seeds/ folder contains small synthetic CSV files created for demonstration purposes. The data is not from a real company or customer system.

File Rows Purpose
listings.csv 5 One row per listing with neighborhood, amenities, and price
calendar.csv 44 Daily availability and reservation events across two years
amenities_changelog.csv 9 Amenity change events used to build SCD2 intervals

The calendar spans 2021-07-12 to 2022-07-11 and is designed to exercise all three example queries: revenue segmentation by amenity, year-over-year price change by neighborhood, and consecutive availability windows for long-stay bookings.

Running locally

# install dbt and dbt-duckdb
pip install dbt-duckdb

# install dbt packages
dbt deps

# seed raw data, build all models, and run tests
dbt seed
dbt build

A profiles.yml pointing to a local DuckDB database is required and is not committed to this repo. See the dbt DuckDB adapter docs for setup instructions.

About

Production-style dbt case study with SCD-style modeling, point-in-time joins, incremental marts, tests, and analyst-facing SQL.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors