This project is an example implementation of a Databricks Asset Bundle using a Databricks Free Edition workspace.
Included are a Python (PySpark/Delta) project, a dbt project, and Databricks Workflows that use these resources. Additionally, CI/CD workflows (GitHub Actions) are included to test and deploy the Asset Bundle to different targets.
The project is configured using pyproject.toml (Python specifics) and databricks.yaml (Databricks Bundle specifics) and uses uv to manage the Python project and dependencies.
| Directory | Description |
|---|---|
| `.github/workflows` | CI/CD jobs to test and deploy the bundle |
| `src/dab_project` | Python project (used in a Databricks Workflow as a Python wheel task) |
| `dbt` | dbt project, used in a Databricks Workflow as a dbt task. dbt models taken from https://github.com/dbt-labs/jaffle_shop_duckdb |
| `resources` | Resources such as Databricks Workflows or Databricks Volumes/Schemas. Python-based workflow: https://docs.databricks.com/aws/en/dev-tools/bundles/python; YAML-based workflow: https://docs.databricks.com/aws/en/dev-tools/bundles/resources#job |
| `scripts` | Python script to set up groups, service principals and catalogs used in a Databricks (Free Edition) workspace |
| `tests` | Unit tests running on Databricks (via Connect) or locally; used in the `ci.yml` jobs |
For this example we use a Databricks Free Edition workspace (https://www.databricks.com/learn/free-edition), with all resources and identities managed in the workspace (no external connections or cloud identity management).
This Databricks Asset Bundle expects pre-existing Catalogs, Groups and Service Principals to showcase providing permissions on resources such as catalogs or workflows.
A script exists to set up the Workspace (Free Edition) as described in the Setup Databricks Workspace section.
- Serverless environment: version 4, which roughly corresponds to Databricks Runtime ~17.*
- Catalogs: `lake_dev`, `lake_test` and `lake_prod`
- Service principals (for CI/CD and Workflow runners): `sp_etl_dev` (for dev and test) and `sp_etl_prod` (for prod)
  - Make sure the user used to deploy Workflows has `Service principal: User` on the used service principals
  - For the CI/CD workflows we generated the Databricks secrets `DATABRICKS_CLIENT_ID` and `DATABRICKS_CLIENT_SECRET`
- Groups: `group_etl` with `ALL PRIVILEGES` and `group_reader` with limited permissions on catalogs
  - These exist mostly to test applying grants using Asset Bundle resources (a direct SDK equivalent is sketched below)
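In this project the grants are applied through Asset Bundle resources, but for orientation, the equivalent grants could also be set directly with the Databricks SDK. This is a minimal sketch assuming the catalogs and groups above already exist; the exact privilege set for `group_reader` is an assumption:

```python
# Sketch: granting catalog privileges to the groups above via the
# Databricks SDK. Assumes you are authenticated (e.g. `databricks configure`).
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import PermissionsChange, Privilege, SecurableType

w = WorkspaceClient()

for catalog in ("lake_dev", "lake_test", "lake_prod"):
    w.grants.update(
        # recent SDK versions also accept the plain string "catalog" here
        securable_type=SecurableType.CATALOG,
        full_name=catalog,
        changes=[
            PermissionsChange(principal="group_etl", add=[Privilege.ALL_PRIVILEGES]),
            # group_reader only gets read-style access (illustrative choice)
            PermissionsChange(
                principal="group_reader",
                add=[Privilege.USE_CATALOG, Privilege.USE_SCHEMA, Privilege.SELECT],
            ),
        ],
    )
```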
- uv: https://docs.astral.sh/uv/getting-started/installation/
  - `uv` will default to the Python version specified in `.python-version`
- Databricks CLI: https://docs.databricks.com/aws/en/dev-tools/cli/install
  - Version `>=0.270.0` is required for the `databricks bundle plan` command
Sync the uv environment with the dev dependencies (includes databricks-connect):

```
uv sync --locked --group dev
```

Note: For local Spark use `uv sync --locked --group dev-spark` instead.
Activate the environment.

Bash:

```
source .venv/bin/activate
```

Windows:

```
.venv\Scripts\activate
```

The dev dependency group includes databricks-connect for remote Spark execution. This requires authentication to be set up via the Databricks CLI.
See https://docs.databricks.com/aws/en/dev-tools/vscode-ext/ for using Databricks Connect extension in VS Code.
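Once the CLI is authenticated, obtaining a remote Spark session is a one-liner. A minimal sketch, assuming a configured default CLI profile; the `serverless(True)` option targets serverless compute, matching this project's setup:

```python
# Minimal Databricks Connect sketch; assumes the default CLI profile
# is configured via `databricks configure`.
from databricks.connect import DatabricksSession

# Target serverless compute instead of a named cluster
spark = DatabricksSession.builder.serverless(True).getOrCreate()

spark.range(5).show()  # quick smoke test against the remote workspace
```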
Run the unit tests:

```
uv run pytest -v
```

Depending on whether Databricks Connect or local Spark is installed, the unit tests either use a Databricks cluster or start a local Spark session with Delta support (one way to implement this is sketched below).

- On Databricks, the unit tests currently assume the catalog `lake_dev` exists.
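A common way to implement this dual mode is a session-scoped fixture that prefers Databricks Connect and falls back to a local Delta-enabled session. The following is a sketch, not this project's actual `conftest.py`:

```python
# conftest.py (sketch): use Databricks Connect when installed,
# otherwise start a local Spark session with Delta support.
import pytest


@pytest.fixture(scope="session")
def spark():
    try:
        # Installed via the `dev` dependency group
        from databricks.connect import DatabricksSession

        return DatabricksSession.builder.getOrCreate()
    except ImportError:
        # Installed via the `dev-spark` dependency group
        from delta import configure_spark_with_delta_pip
        from pyspark.sql import SparkSession

        builder = (
            SparkSession.builder.master("local[*]")
            .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
            .config(
                "spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog",
            )
        )
        return configure_spark_with_delta_pip(builder).getOrCreate()
```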
```
# Linting
uv run ruff check --fix

# Formatting
uv run ruff format
```

The following script sets up a Databricks (Free Edition) workspace for this project with additional catalogs, groups and service principals. It uses both the Databricks SDK and Databricks Connect (Serverless).
```
# Authenticate to your Databricks workspace, if you have not done so already:
# databricks configure

uv run ./scripts/setup_workspace.py
```
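For orientation, the kind of SDK calls such a setup script makes looks roughly like this. A minimal sketch using the names from the prerequisites above; idempotency checks and grants are omitted:

```python
# Sketch of the workspace setup via the Databricks SDK.
# Names mirror the prerequisites listed above.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # uses the CLI profile / environment for auth

for catalog in ("lake_dev", "lake_test", "lake_prod"):
    w.catalogs.create(name=catalog)

for group in ("group_etl", "group_reader"):
    w.groups.create(display_name=group)

for sp in ("sp_etl_dev", "sp_etl_prod"):
    w.service_principals.create(display_name=sp)
```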
To deploy the bundle:

- Authenticate to your Databricks workspace, if you have not done so already:

  ```
  $ databricks configure
  ```

- To deploy a development copy of this project, type:

  ```
  $ databricks bundle deploy --target dev
  ```

- Similarly, to deploy a production copy, type:

  ```
  $ databricks bundle deploy --target prod
  ```

- To deploy with custom variables:

  ```
  $ databricks bundle deploy --target dev --var "catalog_name=workspace"
  ```
- Service principals

  For this example, the targets `test` and `prod` use a group and service principals.

  The group `group_etl` can manage the workflow; ideally your user and the service principal are part of it. This group should also have sufficient permissions on the used catalogs.

  Make sure the user used to deploy has `Service principal: User` permissions; `Service principal: Manager` is not enough.
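  If that role is missing, it can be granted through the workspace permissions API. A minimal sketch with the Databricks SDK; the object type/ID handling and the user name are assumptions for illustration:

  ```python
  # Sketch: granting `Service principal: User` (CAN_USE) on a service
  # principal via the workspace permissions API.
  from databricks.sdk import WorkspaceClient
  from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel

  w = WorkspaceClient()

  # Look up the service principal used by the dev/test targets
  sp = next(iter(w.service_principals.list(filter='displayName eq "sp_etl_dev"')))

  w.permissions.update(
      request_object_type="servicePrincipals",
      request_object_id=sp.application_id,  # assumed: application ID as object ID
      access_control_list=[
          AccessControlRequest(
              user_name="deployer@example.com",  # the deploying user (illustrative)
              permission_level=PermissionLevel.CAN_USE,
          )
      ],
  )
  ```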
- dbt project

  The `dbt` project is based on https://github.com/dbt-labs/jaffle_shop_duckdb with the following changes:

  - Schemas bronze, silver and gold
  - Documented materialization `use_materialization_v2`
  - Primary and foreign key constraints
  - Streaming example
  - Logging
  - Logging to a volume