Merged
Binary file added docs/docs/assets/images/global_co_explorer.png
Binary file added docs/docs/assets/images/nyc_taxi.png
Binary file added docs/docs/assets/images/spotify_analytics.png
Binary file added docs/docs/assets/images/us_home_sales.png
66 changes: 46 additions & 20 deletions docs/docs/examples/index.md
@@ -2,48 +2,74 @@

Real-world examples showing what altimate can do across data engineering workflows. Each example demonstrates end-to-end automation — from discovery to implementation.

<div class="grid cards" markdown>
---

- :material-pipe:{ .lg .middle } **Build, Test & Document dbt Models**
## NYC Taxi Coverage Dashboard

---
`DuckDB` `dbt` `Airflow` `Python`

Pull context from your Knowledge Hub, grab requirements from a Jira ticket, and build fully tested dbt models — all from your IDE.
**Prompt:**

> Take the New York City taxi cab public dataset, bring up a DuckDB instance, and build a dashboard showing areas of maximum coverage and lowest coverage. Set up a complete dbt project with staging, intermediate, and mart layers, and create an Airflow DAG to orchestrate the pipeline.

- :material-snowflake:{ .lg .middle } **Find Broken Views in Snowflake**
![NYC Taxi Coverage Dashboard](../assets/images/nyc_taxi.png)

---
---

Create a "Sprint Work Agent" that queries Snowflake, finds empty views, traces root causes through dbt models, and files Jira tickets.
## Olist E-Commerce Analytics Pipeline

`Snowflake` `Azure Data Factory` `Azure Blob Storage` `dbt`

- :material-cash-multiple:{ .lg .middle } **Optimize Cost & Performance**
**Prompt:**

---
> Build an end-to-end e-commerce analytics pipeline using the Olist Brazilian E-Commerce dataset. Use Azure Data Factory to ingest CSV files from Blob Storage into Snowflake raw tables, then orchestrate Snowflake stored procedures to transform data through raw → staging → mart layers (star schema with customer, product, seller dimensions and orders fact table). Create mart views for customer lifetime value, seller performance scores, and delivery SLA compliance.

Automate discovery and implementation of optimization opportunities across Snowflake, Databricks, and BigQuery.
![ADF Snowflake Pipeline](../assets/images/ADF_Snowflake_Pipeline.png)

---

- :material-swap-horizontal:{ .lg .middle } **Migrate PySpark to dbt**
## Global CO2 & Climate Explorer

---
`DuckDB-WASM` `SQL` `Browser`

Convert a PySpark-based reporting project in Databricks to dbt with automated code conversion, testing, and validation.
**Prompt:**

> Build me an interactive Global CO2 & Climate Explorer dashboard using DuckDB-WASM running entirely in the browser, sourcing data from Our World in Data's CO2 dataset. Give me surprising insights about who emits the most, how that's changing, the equity angle of per-capita emissions, and which countries bear the most historical responsibility. Include an interactive SQL console with example queries showing off CTEs, window functions (LAG, RANK, SUM OVER), and make it a single index.html with a dark theme.

- :material-bug:{ .lg .middle } **Debug an Airflow DAG**
![Global CO2 Explorer](../assets/images/global_co_explorer.png)

---
---

Use AI to debug Airflow DAGs by combining platform integrations, best-practice templates, and automated fix suggestions.
## Spotify Analytics Pipeline Migration

`PySpark` `dbt` `Databricks` `Airflow`

- :material-function:{ .lg .middle } **Write Snowflake UDFs**
**Prompt:**

---
> Modernize my Spotify analytics pipeline: use the Kaggle Spotify Tracks public dataset, migrate all PySpark transformations in /spotify-analytics/ to dbt on Databricks/Spark, preserve the ML feature engineering logic (popularity tiers, mood classification, audio profile scores), add schema tests and unit tests, generate an Airflow DAG with SLAs and alerting, and validate semantic equivalence of the outputs.

Use the Knowledge Hub to guide LLMs in building Snowflake UDFs with best practices, examples, and auto-generated documentation.
![Spotify Analytics Pipeline](../assets/images/spotify_analytics.png)

---

</div>
## US Home Sales Data Science Dashboard

`Data Science` `K-Means` `OLS Regression` `R/ggplot2 Aesthetic`

**Prompt:**

> Download all available public US home sales data sets. Process and merge them into a unified format. Perform advanced data science on it to bring to the surface interesting insights. K-means, OLS regressions, and more. Build a single interactive dashboard with data science style charts, think violin plots, Q-Q plots and lollipop charts. Use a R/ggplot2 aesthetic. No BI style charts.

⚠️ Potential issue | 🟡 Minor

Fix compound modifier hyphenation.

The phrases "data science style charts" and "BI style charts" should use hyphens to join the compound modifiers before the noun. As per static analysis hints, compound modifiers require hyphenation.

📝 Proposed fix
-> Download all available public US home sales data sets. Process and merge them into a unified format. Perform advanced data science on it to bring to the surface interesting insights. K-means, OLS regressions, and more. Build a single interactive dashboard with data science style charts, think violin plots, Q-Q plots and lollipop charts. Use a R/ggplot2 aesthetic. No BI style charts.
+> Download all available public US home sales data sets. Process and merge them into a unified format. Perform advanced data science on it to bring to the surface interesting insights. K-means, OLS regressions, and more. Build a single interactive dashboard with data-science-style charts, think violin plots, Q-Q plots and lollipop charts. Use a R/ggplot2 aesthetic. No BI-style charts.
🧰 Tools
🪛 LanguageTool

[grammar] ~61-~61: Use a hyphen to join words.
Context: ... interactive dashboard with data science style charts, think violin plots, Q-Q pl...

(QB_NEW_EN_HYPHEN)


[grammar] ~61-~61: Use a hyphen to join words.
Context: ...charts. Use a R/ggplot2 aesthetic. No BI style charts. ![US Home Sales Dashboard...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/docs/examples/index.md` at line 61, Update the phrasing to hyphenate the
compound modifiers: change "data science style charts" to "data-science-style
charts" and "BI style charts" to "BI-style charts" in the index.md content so
the compound adjectives correctly modify "charts".
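
The hyphenation fix described above could also be applied mechanically. The following is a minimal sketch, assuming plain string replacement is enough for the two flagged phrases; the helper name `hyphenate_compound_modifiers` and the `REPLACEMENTS` mapping are hypothetical and not part of this PR.

```python
# Hypothetical helper for illustration only; not part of the PR.
# Maps each flagged phrase to its hyphenated form.
REPLACEMENTS = {
    "data science style charts": "data-science-style charts",
    "BI style charts": "BI-style charts",
}


def hyphenate_compound_modifiers(text: str) -> str:
    """Return text with the two flagged compound modifiers hyphenated."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return text


sample = "Build a dashboard with data science style charts. No BI style charts."
print(hyphenate_compound_modifiers(sample))
```

Running the helper a second time leaves the text unchanged, because the hyphenated forms no longer contain the original phrases.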


![US Home Sales Dashboard](../assets/images/us_home_sales.png)

---

## Snowflake vs Databricks Deployment Benchmark

`Snowflake` `Databricks` `Benchmarking` `Cost Analysis`

**Prompt:**

> The NovaMart e-commerce analytics platform in the current directory is ready for deployment. Deploy to both Snowflake and Databricks, testing multiple warehouse sizes on each platform (Snowflake: X-Small, Small, Medium; Databricks: 2X-Small, Small, Medium SQL Warehouses) to find the optimal price-performance configuration. Run the full data pipeline and benchmark queries (CLV calculation, daily incremental, executive dashboard) on each warehouse size, capturing execution time, credits/DBUs consumed, and bytes scanned. Generate a cost analysis document with a recommendation matrix showing cost-per-run for each platform/size combination, and recommend the single best platform + warehouse size for production based on cost efficiency and performance.

![Snowflake vs Databricks Benchmark](../assets/images/dbrx_snowflake_benchmark.png)