feat(dbt): extract dbt Jinja lineage and macros from raw .sql models#584

Open
alexisperinger-ux wants to merge 7 commits into DeusData:main from alexisperinger-ux:feat/dbt-jinja-extraction

@alexisperinger-ux alexisperinger-ux commented Jun 23, 2026

Contributor

Summary

dbt .sql models are Jinja-templated, so without a compiled manifest a model and its dependencies are invisible to the graph: a referenced model like stg_users is not even a node. This PR indexes raw (uncompiled) dbt models directly:

  • {{ ref('m') }} / {{ source('s','t') }} become USAGE lineage edges, via the vendored tree-sitter-jinja2 grammar.
  • A dbt model (a .sql file with no macro defs) becomes a Model node keyed by file stem, so cross-file {{ ref('that_model') }} resolves into model-to-model lineage.
  • {% macro name(...) %} becomes a Macro node.

Model is emitted only on the .sql path, so a plain .jinja / .j2 template is not treated as a model, and a macro-defining file is treated as a library, not a model.
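
The rules above can be approximated in a few lines of Python. This is only a rough sketch: the PR itself parses with the vendored tree-sitter-jinja2 grammar in C, whereas this sketch swaps in regular expressions. The function name `extract_dbt`, the returned dict shape, the `source.table` target naming, and the use of the file path as edge owner for non-model files are all assumptions made for illustration, not names from the PR.

```python
import re
from pathlib import PurePosixPath

# Regexes stand in for the tree-sitter-jinja2 grammar used by the PR.
# Assumption: calls are simple, with quoted string arguments only.
REF_RE = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")
SOURCE_RE = re.compile(
    r"\{\{\s*source\(\s*['\"]([^'\"]+)['\"]\s*,\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}"
)
MACRO_RE = re.compile(r"\{%-?\s*macro\s+(\w+)\s*\(")


def extract_dbt(path: str, text: str) -> dict:
    """Return hypothetical nodes and USAGE edges for one raw template file."""
    p = PurePosixPath(path)
    macros = MACRO_RE.findall(text)
    nodes = [("Macro", name) for name in macros]

    # A Model is emitted only for a .sql file that defines no macros;
    # it is keyed by the file stem so ref('that_model') can resolve to it.
    model = p.stem if p.suffix == ".sql" and not macros else None
    if model:
        nodes.append(("Model", model))

    # ref() targets first, then source() targets joined as "source.table"
    # (the joined form is an assumption of this sketch).
    targets = REF_RE.findall(text)
    targets += [f"{s}.{t}" for s, t in SOURCE_RE.findall(text)]

    # Assumption: non-model files own their edges under their path.
    owner = model or path
    edges = [(owner, "USAGE", target) for target in targets]
    return {"nodes": nodes, "edges": edges}


result = extract_dbt(
    "models/orders.sql",
    "select * from {{ ref('stg_users') }} u "
    "join {{ source('raw', 'events') }} e on u.id = e.user_id",
)
```

Under these assumptions, `models/orders.sql` yields one `Model` node named `orders` with two USAGE edges, a file containing `{% macro ... %}` yields only `Macro` nodes, and a `.j2` file yields edges but no `Model` node.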

No schema change: the new nodes use freeform Model / Macro labels and the existing USAGE edge type.

Index mode: macro extraction is gated to full mode, so the new nodes need a full-mode index.

Tests: dbt_jinja_macro_defs, dbt_jinja_ref_lineage, dbt_sql_ref_lineage in tests/test_extraction.c.

Complements the authoritative manifest path (#583).

Fixes #575.

Related PRs

This is one of three PRs that split the SQL + dbt graph-indexing work to keep each under the one-issue-per-PR contributing rule. They share the same extraction and registry plumbing, so they are one logical change reviewed as a set:

  • #582: SQL DDL, first-class Table / View nodes + FROM/JOIN lineage (#574).
  • This PR: dbt Jinja from raw .sql, Model / Macro nodes + ref() / source() lineage (#575).
  • #583: dbt manifest ingestion, Model / Source nodes + DEPENDS_ON lineage (#576).

CREATE TABLE/VIEW/MATERIALIZED VIEW now extract as Table/View defs (were
generic Variable nodes) and CREATE PROCEDURE as Function. FROM/JOIN
relations are emitted as usages scoped to the enclosing CREATE def, so
pass_usages resolves them into view->table USAGE lineage edges. The
definition-registry allowlist gains Table/View so those defs resolve as
edge targets (kept in sync across pass_definitions and pass_parallel).
Adds extraction tests for the new labels and the lineage usages.
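
As a rough illustration of the behavior described in this commit message, the sketch below labels a CREATE statement and collects the FROM/JOIN relations that follow it. It uses regular expressions rather than the tree-sitter SQL parse the extractor relies on. `sql_lineage` is a hypothetical name, and mapping MATERIALIZED VIEW to the `View` label is an assumption.

```python
import re

CREATE_RE = re.compile(
    r"create\s+(?:or\s+replace\s+)?(materialized\s+view|view|table)\s+([\w.]+)",
    re.IGNORECASE,
)
RELATION_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def sql_lineage(statement: str):
    """Return (label, name, usages) for one CREATE statement, or None."""
    m = CREATE_RE.search(statement)
    if m is None:
        return None
    kind = m.group(1).lower()
    # Assumption: materialized views share the View label.
    label = "Table" if kind == "table" else "View"
    name = m.group(2)
    # Only relations after the CREATE header are scoped to this definition.
    usages = RELATION_RE.findall(statement[m.end():])
    return label, name, usages


view = sql_lineage(
    "CREATE VIEW active_users AS "
    "SELECT u.id FROM users u JOIN logins l ON l.user_id = u.id"
)
```

Here each name in `usages` corresponds to one view->table USAGE edge once the target resolves against the definition registry.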

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 63054f04465ec33ffde0b9a23d6f29ce817d96df)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
CREATE TABLE/VIEW now produce Table/View nodes (63054f0) instead of the
old Variable mislabel; update the label golden and the probe accordingly.

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 7cca3f4faec7d2a12ac1cc6f4e432c8bc286cd94)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
resolve_sql_func_name took the first identifier of object_reference (the
schema) for schema.table names; take the last (the table) so CREATE
TABLE/VIEW nodes and FROM/JOIN lineage use the real relation name. Adds
a schema-qualified regression test.
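
The fix amounts to taking the last identifier of a qualified name instead of the first. A minimal sketch, assuming plain dot-separated identifiers: `relation_name` is a hypothetical helper, while the real `resolve_sql_func_name` works on parsed `object_reference` nodes in C.

```python
def relation_name(object_reference: str) -> str:
    """Return the last dot-separated identifier of a possibly qualified name."""
    # Assumption: no quoted identifiers that themselves contain dots.
    return object_reference.split(".")[-1].strip()


table = relation_name("analytics.orders")
```

With the old first-identifier behavior, `analytics.orders` would have resolved to the schema `analytics`; taking the last part yields the relation `orders`.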

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 877ad51e8c14daf901656d918ced2ef636f7a5b1)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 8ffee3834223ce58aa6c26b83c88f2902e07180e)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 23f81ed76a05c813a68a67bc5b1dc47344e160f0)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
…ates

dbt models are .sql (the SQL host path); a plain .jinja/.j2 template is
not a dbt model. Emit Model only on the SQL path; the JINJA2 path keeps
macro and ref/source extraction.

Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
(cherry picked from commit 582ad0c0293be148a97e1e52d0043ad6c4fe0e7d)
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
Signed-off-by: alexisperinger-ux <alexis.peringer@iss-stoxx.com>
@alexisperinger-ux alexisperinger-ux force-pushed the feat/dbt-jinja-extraction branch from 4ddb62a to 56a5a32 Compare June 24, 2026 07:08

Development

Successfully merging this pull request may close these issues.

Extract dbt lineage and macros from raw .sql models (no compiled manifest)
