Data Models Documentation

This repository contains a collection of data models used for event tracking and analysis across mobile and web platforms. Below is the documentation for each model, its description, and details about the columns within each table.

Problem Context

This project tries to unite two disparate data sources of event data about customer interactions (through mobile app and web): We are transforming this raw user interaction data from both mobile and web platforms to help our team answer some preliminary questions about the core engagement metrics:

Daily Active Users (DAU): Any activity on the platform
Daily Active Learners: People engaging with the content of the platform
Content Completion: Customers who finished consuming the content
Regional Activity: Activity across different countries

The data modeling exercise also builds the foundation of the transformation from raw disparate data to more structured data models. This would create an opportunity for a more detailed analysis of our customer events on the web and mobile app

Data Model Diagram

Data Modeling Logic

To bring together the mobile and web event data, we apply the following transformations:

Data Normalization: Breaking the raw data in Fact and dimension tables was necessary for efficient and optimized data modeling. This is done to ensure star schema where a single fact table can be connected to dimensions to get more granularity
Union to Get Fct_Events: The mobile and web data are first unionized, based on the common columns between both data. A data source identifier is given to keep track of source data after unionization.
Mobile Data Attributes: The columns specific to mobile data attributes coming from mobile event data are stored as a dimension table with user_id, and session_id as primary keys. We can join this with Fct_events to get more granularity of mobile event data
Web Data Attributes: On similar lines, create a dimension table on web event data attributes also with user_id, and session_id as primary keys. We can join this with Fct_events to get more granularity of mobile event data

Models Overview

1. `fct_events`

Example of Data in `fct_events`

Description: This model combines both sources - mobile and web event data, unionized based on the common columns, to provide a unified events analytics model.

event_source	event_timestamp	event_name	user_id	user_access_type	session_id	country_code	item_id
mobile	2025-02-12 09:15	view_item	12345	premium	67890	US	98765
web	2025-02-12 09:20	click_item	12346	free	67891	CA	98766

This table shows how each row represents a distinct user interaction with the platform, including the source of the event (mobile or web), timestamp, type of event, and details of the session.

2. `dim_app_device_features`

Description: This model provides additional device information for each session ID and user ID from mobile event data. It includes details about the device used to log in to the app.

session_id	user_id	client_id	device_locale_code	device_locale_country	device_locale_language	device_platform_name	event_version	device_make	application_version_code	application_version_name
67890	12345	54321	en-US	US	English	Android	1.0	Samsung	15	1.2.3
67891	12346	54322	fr-CA	CA	French	iOS	1.2	Apple	17	1.3.0

3. `dim_web_browser_attributes`

Description: This model provides additional information about the browser and device used for web sessions, including details for each session ID and user ID.

session_id	user_id	fingerprint	browser_name	browser_version	language	screen_resolution	device_type	cookies_enabled	referrer
67890	12345	abc123	Chrome	92.0	en-US	1920x1080	Desktop	1	google.com
67891	12346	def456	Firefox	85.0	fr-CA	1366x768	Mobile	0	facebook.com

4. `dim_items`

Description: This model contains information about the items interacted with during sessions. This is derived by unionizing web and mobile data

item_id	item_title	item_type
98765	"The Great Gatsby"	Book
98766	"Introduction to SQL"	Book
98767	"How to Cook Pasta"	Article
98768	"Web Development 101"	Article

5. `dim_events_arrival_times`

Description: This model contains timestamps of events happening and when they reach our backend system.

session_id	user_id	event_timestamp	arrival_timestamp
67890	12345	2025-02-12 09:15:00	2025-02-12 09:16:30
67891	12346	2025-02-12 09:20:00	2025-02-12 09:21:05
67892	12347	2025-02-12 09:30:00	2025-02-12 09:31:15
67893	12348	2025-02-12 09:45:00	2025-02-12 09:46:45

Glimpse of Front-End Dashboard Coming from Fct_Events Data Model

The model Fct_Events can be used to plot charts like these to answer the preliminary questions about our content engagement

Challenges Faced

Unifying Disparate Data: Since the data comes from two different platforms and has a lot of uncommon fields, it was a challenge to conceptualize the unification of data. Unionizing led to a single source of truth for content engagement across different platform
Using Data Normalization Techniques to separate transactional data from descriptive data and deciding data models with a long-term vision and not focusing on short-term solutions to give a single data model that could answer everything
Getting a holistic understanding of customer engagement across sessions due to a lack of data completeness
Creating a data model on customer subscription status was a challenge based on the event data due to the nature of changing membership status. It should be recorded and streamed separately and can be modeled as a Slowly Changing Dimension in dbt to keep a history of changes to membership

Project Assumptions

Data Ingestion: We are assuming that the data is ingested from streaming platforms and resides in the Raw schema of our Google Big Query. To ensure simplicity we uploaded it as CSV in big query raw schema and treated that as ingested data
Data Orchestration: The assumption is made that the dbt runs are triggered through an orchestration tool which runs on the set time to perform these transformations on schedule
Data Quality and Consistency: For the sake of simplicity, we are assuming that the data is clean and consistent (in terms of format) and adheres to the schema every time it flows into our backend system.
Data Validation: We are assuming that the data quality is reliable to perform further transformations. We are assuming we have all data testing in place (data validation checks, accepted values, null value handling, unit testing, etc.)
Data Loading Process: We are assuming that incremental loads logic already exists to insert only new records into our transformed tables to ensure query optimization

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
analyses		analyses
macros		macros
models		models
seeds		seeds
snapshots		snapshots
tests		tests
.gitignore		.gitignore
README.md		README.md
dbt_project.yml		dbt_project.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Models Documentation

Problem Context

Data Model Diagram

Data Modeling Logic

Models Overview

1. `fct_events`

Example of Data in `fct_events`

2. `dim_app_device_features`

3. `dim_web_browser_attributes`

4. `dim_items`

5. `dim_events_arrival_times`

Glimpse of Front-End Dashboard Coming from Fct_Events Data Model

Challenges Faced

Project Assumptions

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Data Models Documentation

Problem Context

Data Model Diagram

Data Modeling Logic

Models Overview

1. fct_events

Example of Data in fct_events

2. dim_app_device_features

3. dim_web_browser_attributes

4. dim_items

5. dim_events_arrival_times

Glimpse of Front-End Dashboard Coming from Fct_Events Data Model

Challenges Faced

Project Assumptions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

1. `fct_events`

Example of Data in `fct_events`

2. `dim_app_device_features`

3. `dim_web_browser_attributes`

4. `dim_items`

5. `dim_events_arrival_times`

Packages