121 changes: 121 additions & 0 deletions .opencode/skills/dbt-analyze/SKILL.md
@@ -0,0 +1,121 @@
---
name: dbt-analyze
description: Analyze downstream impact of dbt model changes using column-level lineage and the dependency graph. Use when evaluating the blast radius of a change before shipping. Powered by altimate-dbt.
---

# dbt Impact Analysis

## Requirements
**Agent:** any (read-only analysis)
**Tools used:** bash (runs `altimate-dbt` commands), read, glob, dbt_manifest, lineage_check, sql_analyze

## When to Use This Skill

**Use when the user wants to:**
- Understand what breaks if they change a model
- Evaluate downstream impact before shipping
- Find all consumers of a model or column
- Assess risk of a refactoring

**Do NOT use for:**
- Creating or fixing models → use `dbt-develop` or `dbt-troubleshoot`
- Adding tests → use `dbt-test`

## Workflow

### 1. Identify the Changed Model

Accept a model name from the user, or auto-detect one:
```bash
# From git diff
git diff --name-only | grep '\.sql$'

# Or user provides a model name
altimate-dbt compile --model <name> # verify it exists
```

### 2. Map the Dependency Graph

```bash
altimate-dbt children --model <name> # direct downstream
altimate-dbt parents --model <name> # what feeds it
```

For the full downstream tree, recursively call `children` on each downstream model.
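
That recursion can be sketched as a small helper. The `get_children` callback is hypothetical: in practice it would shell out to `altimate-dbt children --model <name>` and parse the JSON, but a pure function keeps the traversal logic visible.

```python
def collect_downstream(model, get_children, seen=None):
    """Transitively collect every downstream model.

    `get_children` maps a model name to its direct children; `seen`
    guards against revisiting nodes in diamond-shaped DAGs.
    """
    if seen is None:
        seen = set()
    for child in get_children(model):
        if child not in seen:
            seen.add(child)
            collect_downstream(child, get_children, seen)
    return seen


# Example with an in-memory DAG standing in for CLI calls:
dag = {
    "stg_orders": ["int_order_metrics", "int_order_summary"],
    "int_order_metrics": ["mart_revenue"],
}
collect_downstream("stg_orders", lambda m: dag.get(m, []))
# yields all three downstream models, including depth-2 mart_revenue
```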

### 3. Run Column-Level Lineage

Use the `lineage_check` tool on the changed model's SQL to understand:
- Which source columns flow to which output columns
- Which columns were added, removed, or renamed

### 4. Cross-Reference with Downstream

For each downstream model:
1. Read its SQL
2. Check if it references any changed/removed columns
3. Classify impact:

| Classification | Meaning | Action |
|---------------|---------|--------|
| **BREAKING** | Removed/renamed column used downstream | Must fix before shipping |
| **SAFE** | Added column, no downstream reference | Ship freely |
| **UNKNOWN** | Can't determine (dynamic SQL, macros) | Manual review needed |
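
The classification rule can be sketched as a crude textual check. Real lineage tools resolve column references properly; this illustration only shows the decision order, and treats any Jinja statement block as unresolvable dynamic SQL.

```python
def classify_impact(removed_columns, downstream_sql):
    """Classify one downstream model against removed/renamed columns."""
    if "{%" in downstream_sql:
        return "UNKNOWN"   # macros / dynamic SQL: manual review needed
    sql = downstream_sql.lower()
    if any(col.lower() in sql for col in removed_columns):
        return "BREAKING"  # references a removed or renamed column
    return "SAFE"          # no reference to any changed column
```

Note the UNKNOWN check comes first: if the SQL is dynamic, even an apparent match cannot be trusted either way.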

### 5. Generate Impact Report

```
Impact Analysis: stg_orders
════════════════════════════

Changed Model: stg_orders (materialized: view)
Columns: 5 → 6 (+1 added)
Removed: total_amount (renamed to order_total)

Downstream Impact (3 models):

Depth 1:
[BREAKING] int_order_metrics
Uses: total_amount → COLUMN RENAMED
Fix: Update column reference to order_total

[SAFE] int_order_summary
No references to changed columns

Depth 2:
[BREAKING] mart_revenue
Uses: total_amount via int_order_metrics → CASCADING
Fix: Verify after fixing int_order_metrics

Tests at Risk: 4
- not_null_stg_orders_order_total
- unique_int_order_metrics_order_id

Summary: 2 BREAKING, 1 SAFE
Recommended: Fix int_order_metrics first, then:
altimate-dbt build --model stg_orders --downstream
```

## Without Manifest (SQL-Only Mode)

If no manifest is available:
1. Run `lineage_check` on the changed SQL
2. Show column-level data flow
3. Note: downstream impact requires a manifest
4. Suggest: `altimate-dbt build-project` to generate one

## Common Mistakes

| Mistake | Fix |
|---------|-----|
| Only checking direct children | Always trace the FULL downstream tree recursively |
| Ignoring test impacts | Check which tests reference changed columns |
| Shipping without building downstream | Always `altimate-dbt build --model <name> --downstream` |
| Not considering renamed columns | A rename is a removal plus an addition — downstream still references the old name |

## Reference Guides

| Guide | Use When |
|-------|----------|
| [references/altimate-dbt-commands.md](references/altimate-dbt-commands.md) | Need the full CLI reference |
| [references/lineage-interpretation.md](references/lineage-interpretation.md) | Understanding lineage output |
66 changes: 66 additions & 0 deletions .opencode/skills/dbt-analyze/references/altimate-dbt-commands.md
@@ -0,0 +1,66 @@
# altimate-dbt Command Reference

All dbt operations use the `altimate-dbt` CLI. Output is JSON to stdout; logs go to stderr.

```bash
altimate-dbt <command> [args...]
altimate-dbt <command> [args...] --format text # Human-readable output
```

## First-Time Setup

```bash
altimate-dbt init # Auto-detect project root
altimate-dbt init --project-root /path # Explicit root
altimate-dbt init --python-path /path # Override Python
altimate-dbt doctor # Verify setup
altimate-dbt info # Project name, adapter, root
```

## Build & Run

```bash
altimate-dbt build --model <name> [--downstream] # compile + run + test
altimate-dbt run --model <name> [--downstream] # materialize only
altimate-dbt test --model <name> # run tests only
altimate-dbt build-project # full project build
```

## Compile

```bash
altimate-dbt compile --model <name>
altimate-dbt compile-query --query "SELECT * FROM {{ ref('stg_orders') }}" [--model <context>]
```

## Execute SQL

```bash
altimate-dbt execute --query "SELECT count(*) FROM {{ ref('orders') }}" --limit 100
```

## Schema & DAG

```bash
altimate-dbt columns --model <name> # column names and types
altimate-dbt columns-source --source <src> --table <tbl> # source table columns
altimate-dbt column-values --model <name> --column <col> # sample values
altimate-dbt children --model <name> # downstream models
altimate-dbt parents --model <name> # upstream models
```

## Packages

```bash
altimate-dbt deps # install packages.yml
altimate-dbt add-packages --packages dbt-utils,dbt-expectations
```

## Error Handling

All errors return JSON with `error` and `fix` fields:
```json
{ "error": "dbt-core is not installed", "fix": "Install it: python3 -m pip install dbt-core" }
```

Run `altimate-dbt doctor` as the first diagnostic step for any failure.
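
Because errors always carry `error` and `fix` fields, a caller can surface the remediation hint alongside the failure. A minimal sketch (the helper name is illustrative, not part of the CLI):

```python
import json

def parse_altimate_output(stdout):
    """Parse altimate-dbt JSON stdout, raising a readable error.

    Raising with both fields means the suggested fix travels with
    the exception instead of being lost in stderr.
    """
    payload = json.loads(stdout)
    if "error" in payload:
        raise RuntimeError(f"{payload['error']} (fix: {payload['fix']})")
    return payload
```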
58 changes: 58 additions & 0 deletions .opencode/skills/dbt-analyze/references/lineage-interpretation.md
@@ -0,0 +1,58 @@
# Lineage Interpretation Guide

## Understanding Column-Level Lineage

Column-level lineage traces how data flows from source columns through transformations to output columns.

### Direct Lineage
```
source.customers.name → stg_customers.customer_name → dim_customers.full_name
```
Column was renamed at each step. A change to the source column affects all downstream.

### Aggregation Lineage
```
source.orders.amount → (SUM) → fct_daily_revenue.total_revenue
```
Multiple source rows feed into one output value; the grain changes from row-level to aggregate.

### Conditional Lineage
```
source.orders.status → (CASE WHEN) → fct_orders.is_completed
```
The source column feeds a derived boolean. The relationship is logical, not direct.

## Impact Classification

### BREAKING Changes
- **Column removed**: Downstream models referencing it will fail
- **Column renamed**: Same as removed — downstream still uses the old name
- **Type changed**: May cause cast errors or silent data loss downstream
- **Logic changed**: Downstream aggregations/filters may produce wrong results

### SAFE Changes
- **Column added**: No downstream model can reference what didn't exist
- **Description changed**: No runtime impact
- **Test added/modified**: No impact on model data

### REQUIRES REVIEW
- **Filter changed**: May change which rows appear → downstream counts change
- **JOIN type changed**: LEFT→INNER drops rows, INNER→LEFT adds NULLs
- **Materialization changed**: view→table has no logical impact but affects freshness
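
The JOIN-type case is easy to see with toy data (illustrative rows, not a real schema): the logic is "unchanged", yet row counts shift, which is exactly why it needs review rather than being classified SAFE or BREAKING outright.

```python
orders = [("o1", "c1"), ("o2", "c2"), ("o3", None)]  # (order_id, customer_id)
customers = {"c1", "c2"}

# LEFT JOIN keeps every order, padding non-matches with NULL
left = [(o, c if c in customers else None) for o, c in orders]

# INNER JOIN silently drops orders with no matching customer
inner = [(o, c) for o, c in orders if c in customers]

assert len(left) == 3 and len(inner) == 2  # same logic, different row count
```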

## Reading the DAG

```bash
altimate-dbt parents --model <name> # what this model depends on
altimate-dbt children --model <name> # what depends on this model
```

A model with many children has high blast radius. A model with many parents has high complexity.

## Depth Matters

- **Depth 1**: Direct consumers — highest risk, most likely to break
- **Depth 2**: Cascading impact — breaks only if depth 1 breaks
- **Depth 3+**: Usually affected only by breaking column removals/renames

Focus investigation on depth 1 first.
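
Grouping downstream models by depth makes that triage order explicit. A breadth-first sketch, where `get_children` is a hypothetical callback that would wrap `altimate-dbt children` in practice:

```python
from collections import deque

def downstream_by_depth(model, get_children):
    """Map each transitive downstream model to its BFS depth.

    First visit wins, so each model gets its shortest distance
    from the changed model; depth 1 entries are the ones to
    investigate first.
    """
    depths, queue = {}, deque([(model, 0)])
    while queue:
        node, d = queue.popleft()
        for child in get_children(node):
            if child not in depths:
                depths[child] = d + 1
                queue.append((child, d + 1))
    return depths
```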
147 changes: 147 additions & 0 deletions .opencode/skills/dbt-develop/SKILL.md
@@ -0,0 +1,147 @@
---
name: dbt-develop
description: Create and modify dbt models — staging, intermediate, marts, incremental, medallion architecture. Use when building new SQL models, extending existing ones, scaffolding YAML configs, or reorganizing project structure. Powered by altimate-dbt.
---

# dbt Model Development

## Requirements
**Agent:** builder or migrator (requires file write access)
**Tools used:** bash (runs `altimate-dbt` commands), read, glob, write, edit

## When to Use This Skill

**Use when the user wants to:**
- Create a new dbt model (staging, intermediate, mart, OBT)
- Add or modify SQL logic in an existing model
- Generate sources.yml or schema.yml from warehouse metadata
- Reorganize models into layers (staging/intermediate/mart or bronze/silver/gold)
- Convert a model to incremental materialization
- Scaffold a new dbt project structure

**Do NOT use for:**
- Adding tests to models → use `dbt-test`
- Writing model/column descriptions → use `dbt-docs`
- Debugging build failures → use `dbt-troubleshoot`
- Analyzing change impact → use `dbt-analyze`

## Core Workflow: Plan → Discover → Write → Validate

### 1. Plan — Understand Before Writing

Before writing any SQL:
- Read the task requirements carefully
- Identify which layer this model belongs to (staging, intermediate, mart)
- Check existing models for naming conventions and patterns
- **Check dependencies:** If `packages.yml` exists, check for `dbt_packages/` or `package-lock.yml`. Only run `dbt deps` if packages are declared but not yet installed.

```bash
altimate-dbt info # project name, adapter type
altimate-dbt parents --model <name>   # what feeds an existing model you're modifying
altimate-dbt children --model <name>  # what consumes it
```

### 2. Discover — Understand the Data Before Writing

**Never write SQL without deeply understanding your data first.** The #1 cause of wrong results is writing SQL blind — assuming grain, relationships, column names, or values without checking.

**Step 2a: Read all documentation and schema definitions**
- Read `sources.yml`, `schema.yml`, and any YAML files that describe the source/parent models
- These contain column descriptions, data types, tests, and business context
- Pay special attention to: primary keys, unique constraints, relationships between tables, and what each column represents

**Step 2b: Understand the grain of each parent model/source**
- What does one row represent? (one customer? one event? one day per customer?)
- What are the primary/unique keys?
- This is critical for JOINs — joining on the wrong grain causes fan-out (too many rows) or missing rows

```bash
altimate-dbt columns --model <name> # existing model columns
altimate-dbt columns-source --source <src> --table <tbl> # source table columns
altimate-dbt execute --query "SELECT count(*) FROM {{ ref('model') }}" --limit 1
altimate-dbt execute --query "SELECT * FROM {{ ref('model') }}" --limit 5
altimate-dbt column-values --model <name> --column <col> # sample values for key columns
```

**Step 2c: Query the actual data to verify your understanding**
- Check row counts, NULLs, date ranges, cardinality of key columns
- Verify foreign key relationships actually hold (do all IDs in child exist in parent?)
- Check for duplicates in what you think are unique keys
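
The duplicate-key check in 2c can be sketched as a small helper over rows fetched with `altimate-dbt execute` (the function name and dict-shaped rows are illustrative): any duplicate in a supposedly unique key is exactly what causes join fan-out later.

```python
from collections import Counter

def duplicate_keys(rows, key_cols):
    """Return key tuples that appear more than once, with their counts.

    `rows` is a list of dicts, e.g. parsed JSON query results; an
    empty result means the key really is unique at this grain.
    """
    counts = Counter(tuple(r[c] for c in key_cols) for r in rows)
    return {k: n for k, n in counts.items() if n > 1}
```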

**Step 2d: Read existing models that your new model will reference**
- Read the actual SQL of parent models — understand their logic, filters, and transformations
- Read 2-3 existing models in the same directory to match patterns and conventions

```bash
glob models/**/*.sql # find all model files
read <model_file> # understand existing patterns and logic
```

### 3. Write — Follow Layer Patterns

See [references/layer-patterns.md](references/layer-patterns.md) for staging/intermediate/mart templates.
See [references/medallion-architecture.md](references/medallion-architecture.md) for bronze/silver/gold patterns.
See [references/incremental-strategies.md](references/incremental-strategies.md) for incremental materialization.
See [references/yaml-generation.md](references/yaml-generation.md) for sources.yml and schema.yml.

### 4. Validate — Build, Verify, Check Impact

Never stop at writing the SQL. Always validate:

**Build it:**
```bash
altimate-dbt compile --model <name> # catch Jinja errors
altimate-dbt build --model <name> # materialize + run tests
```

**Verify the output:**
```bash
altimate-dbt columns --model <name> # confirm expected columns exist
altimate-dbt execute --query "SELECT count(*) FROM {{ ref('<name>') }}" --limit 1
altimate-dbt execute --query "SELECT * FROM {{ ref('<name>') }}" --limit 10 # spot-check values
```
- Do the columns match what schema.yml or the task expects?
- Does the row count make sense? (no fan-out from bad joins, no missing rows from wrong filters)
- Are values correct? (spot-check NULLs, aggregations, date ranges)

**Check SQL quality** (on the compiled SQL from `altimate-dbt compile`):
- `sql_analyze` — catches anti-patterns (SELECT *, cartesian products, missing filters)
- `altimate_core_validate` — validates syntax and schema references
- `altimate_core_column_lineage` — traces how source columns flow to output columns. Use this to verify your SELECT is pulling the right columns from the right sources, especially for complex JOINs or multi-CTE models.

**Check downstream impact** (when modifying an existing model):
```bash
altimate-dbt children --model <name> # who depends on this?
altimate-dbt build --model <name> --downstream # rebuild downstream to catch breakage
```
Use `altimate-dbt children` and `altimate-dbt parents` to verify the DAG is intact when changes could affect downstream models.

## Iron Rules

1. **Never write SQL without reading the source columns first.** Use `altimate-dbt columns` or `altimate-dbt columns-source`.
2. **Never stop at compile.** Always `altimate-dbt build` to catch runtime errors.
3. **Match existing patterns.** Read 2-3 existing models in the same directory before writing.
4. **One model, one purpose.** A staging model should not contain business logic. An intermediate model should not be materialized as a table unless it has consumers.
5. **Fix ALL errors, not just yours.** After creating/modifying models, run a full `altimate-dbt build-project`. If ANY model fails — even pre-existing ones you didn't touch — fix them. Your job is to leave the project in a fully working state.

## Common Mistakes

| Mistake | Fix |
|---------|-----|
| Writing SQL without checking column names | Run `altimate-dbt columns` or `altimate-dbt columns-source` first |
| Stopping at `compile` — "it compiled, ship it" | Always `altimate-dbt build` to materialize and run tests |
| Hardcoding table references instead of `{{ ref() }}` | Always use `{{ ref('model') }}` or `{{ source('src', 'table') }}` |
| Creating a staging model with JOINs | Staging = 1:1 with source. JOINs belong in intermediate or mart |
| Not checking existing naming conventions | Read existing models in the same directory first |
| Using `SELECT *` in final models | Explicitly list columns for clarity and contract stability |

## Reference Guides

| Guide | Use When |
|-------|----------|
| [references/altimate-dbt-commands.md](references/altimate-dbt-commands.md) | Need the full CLI reference |
| [references/layer-patterns.md](references/layer-patterns.md) | Creating staging, intermediate, or mart models |
| [references/medallion-architecture.md](references/medallion-architecture.md) | Organizing into bronze/silver/gold layers |
| [references/incremental-strategies.md](references/incremental-strategies.md) | Converting to incremental materialization |
| [references/yaml-generation.md](references/yaml-generation.md) | Generating sources.yml or schema.yml |
| [references/common-mistakes.md](references/common-mistakes.md) | Extended anti-patterns catalog |