Skip to content

akshay-gera/EventAnalytics_dbt_bigquery

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data Models Documentation

This repository contains a collection of data models used for event tracking and analysis across mobile and web platforms. Below is the documentation for each model, its description, and details about the columns within each table.

Problem Context

This project tries to unite two disparate data sources of event data about customer interactions (through mobile app and web): We are transforming this raw user interaction data from both mobile and web platforms to help our team answer some preliminary questions about the core engagement metrics:

  • Daily Active Users (DAU): Any activity on the platform
  • Daily Active Learners: People engaging with the content of the platform
  • Content Completion: Customers who finished consuming the content
  • Regional Activity: Activity across different countries

The data modeling exercise also builds the foundation of the transformation from raw disparate data to more structured data models. This would create an opportunity for a more detailed analysis of our customer events on the web and mobile app

Data Model Diagram

image

Data Modeling Logic

To bring together the mobile and web event data, we apply the following transformations:

  • Data Normalization: Breaking the raw data in Fact and dimension tables was necessary for efficient and optimized data modeling. This is done to ensure star schema where a single fact table can be connected to dimensions to get more granularity
  • Union to Get Fct_Events: The mobile and web data are first unionized, based on the common columns between both data. A data source identifier is given to keep track of source data after unionization.
  • Mobile Data Attributes: The columns specific to mobile data attributes coming from mobile event data are stored as a dimension table with user_id, and session_id as primary keys. We can join this with Fct_events to get more granularity of mobile event data
  • Web Data Attributes: On similar lines, create a dimension table on web event data attributes also with user_id, and session_id as primary keys. We can join this with Fct_events to get more granularity of mobile event data

Models Overview

1. fct_events

Example of Data in fct_events

Description: This model combines both sources - mobile and web event data, unionized based on the common columns, to provide a unified events analytics model.

event_source event_timestamp event_name user_id user_access_type session_id country_code item_id
mobile 2025-02-12 09:15 view_item 12345 premium 67890 US 98765
web 2025-02-12 09:20 click_item 12346 free 67891 CA 98766

This table shows how each row represents a distinct user interaction with the platform, including the source of the event (mobile or web), timestamp, type of event, and details of the session.

2. dim_app_device_features

  • Description: This model provides additional device information for each session ID and user ID from mobile event data. It includes details about the device used to log in to the app.
session_id user_id client_id device_locale_code device_locale_country device_locale_language device_platform_name event_version device_make application_version_code application_version_name
67890 12345 54321 en-US US English Android 1.0 Samsung 15 1.2.3
67891 12346 54322 fr-CA CA French iOS 1.2 Apple 17 1.3.0

3. dim_web_browser_attributes

  • Description: This model provides additional information about the browser and device used for web sessions, including details for each session ID and user ID.
session_id user_id fingerprint browser_name browser_version language screen_resolution device_type cookies_enabled referrer
67890 12345 abc123 Chrome 92.0 en-US 1920x1080 Desktop 1 google.com
67891 12346 def456 Firefox 85.0 fr-CA 1366x768 Mobile 0 facebook.com

4. dim_items

  • Description: This model contains information about the items interacted with during sessions. This is derived by unionizing web and mobile data
item_id item_title item_type
98765 "The Great Gatsby" Book
98766 "Introduction to SQL" Book
98767 "How to Cook Pasta" Article
98768 "Web Development 101" Article

5. dim_events_arrival_times

  • Description: This model contains timestamps of events happening and when they reach our backend system.
session_id user_id event_timestamp arrival_timestamp
67890 12345 2025-02-12 09:15:00 2025-02-12 09:16:30
67891 12346 2025-02-12 09:20:00 2025-02-12 09:21:05
67892 12347 2025-02-12 09:30:00 2025-02-12 09:31:15
67893 12348 2025-02-12 09:45:00 2025-02-12 09:46:45

Glimpse of Front-End Dashboard Coming from Fct_Events Data Model

  • The model Fct_Events can be used to plot charts like these to answer the preliminary questions about our content engagement image

Challenges Faced

  • Unifying Disparate Data: Since the data comes from two different platforms and has a lot of uncommon fields, it was a challenge to conceptualize the unification of data. Unionizing led to a single source of truth for content engagement across different platform
  • Using Data Normalization Techniques to separate transactional data from descriptive data and deciding data models with a long-term vision and not focusing on short-term solutions to give a single data model that could answer everything
  • Getting a holistic understanding of customer engagement across sessions due to a lack of data completeness
  • Creating a data model on customer subscription status was a challenge based on the event data due to the nature of changing membership status. It should be recorded and streamed separately and can be modeled as a Slowly Changing Dimension in dbt to keep a history of changes to membership

Project Assumptions

  • Data Ingestion: We are assuming that the data is ingested from streaming platforms and resides in the Raw schema of our Google Big Query. To ensure simplicity we uploaded it as CSV in big query raw schema and treated that as ingested data
  • Data Orchestration: The assumption is made that the dbt runs are triggered through an orchestration tool which runs on the set time to perform these transformations on schedule
  • Data Quality and Consistency: For the sake of simplicity, we are assuming that the data is clean and consistent (in terms of format) and adheres to the schema every time it flows into our backend system.
  • Data Validation: We are assuming that the data quality is reliable to perform further transformations. We are assuming we have all data testing in place (data validation checks, accepted values, null value handling, unit testing, etc.)
  • Data Loading Process: We are assuming that incremental loads logic already exists to insert only new records into our transformed tables to ensure query optimization

About

The project aims to give a production grade data modelling to raw web and mobile app data of user interactions with an anonymized platform

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors