DuckDB extension for querying Power BI Semantic Models with DAX.
Experimental: This extension is under active development. APIs, behavior, and performance characteristics may change between releases.
- Query semantic models with
dax_query(connection_string, dax_text, ...) - Discover model metadata with
pbi_tables,pbi_columns,pbi_measures, andpbi_relationships - Multiple auth paths:
azure_cli,access_token, andservice_principal - DuckDB secret integration via
TYPE azuresecrets (credential_chain/service_principal) - Power BI locator support (
powerbi://...) plus direct XMLA fast path (https://.../xmla?...) - Local metadata cache for resolved targets/schemas (no token/secret persistence)
- DuckDB-Wasm build support for
wasm_mvpandwasm_eh(see docs/wasm.md)
- DuckDB extension toolchain requirements (CMake, C++ build toolchain, OpenSSL)
- Git with submodule support
- Azure CLI (
az) installed
For browser DuckDB-Wasm usage, use auth_mode := 'access_token' and pass a
short-lived token from the host application. Native auth paths such as
azure_cli, browser-side service_principal, and Azure credential-chain
secrets are intentionally unsupported in browser WASM. See
docs/wasm.md for build, testing, CORS/proxy, and browser
validation details.
git clone --recurse-submodules https://github.com/<your-org>/pbi_scanner.git
cd pbi_scanner
make
make testIf you already cloned without submodules:
git submodule update --init --recursive./build/release/duckdbThe bundled shell already has pbi_scanner linked in.
In the examples below, we use Azure CLI login to authenticate.
az loginThen use one auth option below.
INSTALL pbi_scanner FROM community;
LOAD pbi_scanner;
SET pbi_scanner_auth_mode = 'azure_cli';
SELECT *
FROM dax_query(
'Data Source=powerbi://api.powerbi.com/v1.0/myorg/Example%20Workspace;Initial Catalog=example_semantic_model;',
'EVALUATE ROW("probe_ok", 1)'
);If you already use another DuckDB Azure secret provider, you can use that secret instead of the credential_chain example shown here.
INSTALL azure;
LOAD azure;
CREATE SECRET pbi_cli (
TYPE azure,
PROVIDER credential_chain,
CHAIN 'cli'
);
INSTALL pbi_scanner FROM community;
LOAD pbi_scanner;
SELECT *
FROM dax_query(
'Data Source=powerbi://api.powerbi.com/v1.0/myorg/Example%20Workspace;Initial Catalog=example_semantic_model;',
'EVALUATE ROW("probe_ok", 1)',
secret_name := 'pbi_cli'
);Replace the workspace and semantic model placeholders with values you can access.
Data Source=powerbi://api.powerbi.com/v1.0/myorg/Example%20Workspace;Initial Catalog=example_semantic_model;Data Source=https://example.analysis.windows.net/xmla?vs=sobe_wowvirtualserver&db=<resolved-database-id>;Initial Catalog=sobe_wowvirtualserver-<resolved-database-id>;Use the direct XMLA form when you already have a resolved target and want to bypass public Power BI locator resolution.
dax_query(...) supports:
auth_mode := 'azure_cli'auth_mode := 'access_token'auth_mode := 'service_principal'
Set a session default when using direct auth modes:
SET pbi_scanner_auth_mode = 'azure_cli';When both are provided, per-call named auth_mode := ... overrides the session
SET value for that call.
By default, dax_query and the pbi_* metadata table functions return XMLA
column names unchanged (including DAX table qualifiers and square brackets).
Opt in to normalization when you want DuckDB-friendly names: DAX square
brackets are removed and table qualifiers are dropped (Fact[Amount] becomes
Amount, [Total Sales] becomes Total Sales). When the same column name
appears more than once, qualified names are disambiguated with a table prefix
(TableA[Amount] and TableB[Amount] become TableA_Amount and
TableB_Amount); any remaining collisions get numeric suffixes (Amount_2,
TableA_Amount_2, etc.). DuckDB quotes identifiers with spaces when you
reference them in SQL.
Enable normalization for the session:
SET normalize_dax_column_names = true;Per-call named parameter overrides the session default for that query:
SELECT *
FROM dax_query(
'Data Source=powerbi://api.powerbi.com/v1.0/myorg/Example%20Workspace;Initial Catalog=example_semantic_model;',
'EVALUATE ''Fact Allocation''',
normalize_dax_column_names := true
);dax_query uses a limited schema probe (TOPN) during bind to discover column
types without scanning the full result. Control the row limit with a named
parameter or a process-wide environment variable:
SELECT *
FROM dax_query(
'Data Source=powerbi://api.powerbi.com/v1.0/myorg/Example%20Workspace;Initial Catalog=example_semantic_model;',
'EVALUATE ''Fact Allocation''',
schema_probe_rows := 50
);export PBI_SCANNER_SCHEMA_PROBE_ROWS=50When unset, the default is 100 rows. Set schema_probe_rows := 0 per query to
probe with the full DAX statement when a limited probe is not applicable.
Live check (requires workspace auth in .env or PBI_BENCH_*):
uv run --group bench verify_dax_column_names.pyQuick Start already shows the Azure CLI mode and the Azure secret mode end to end. Use this section as reference for secret variants and other auth modes.
Create a reusable DuckDB Azure secret:
INSTALL azure;
LOAD azure;
CREATE SECRET pbi_cli (
TYPE azure,
PROVIDER credential_chain,
CHAIN 'cli'
);Use either pattern with dax_query:
SELECT *
FROM dax_query(
'Data Source=powerbi://api.powerbi.com/v1.0/myorg/Example%20Workspace;Initial Catalog=example_semantic_model;Secret=pbi_cli;',
'EVALUATE ROW("probe_ok", 1)'
);Or by named argument:
SELECT *
FROM dax_query(
'Data Source=powerbi://api.powerbi.com/v1.0/myorg/Example%20Workspace;Initial Catalog=example_semantic_model;',
'EVALUATE ROW("probe_ok", 1)',
secret_name := 'pbi_cli'
);For automation or when you use a service principal registered in Microsoft Entra ID,
create a TYPE azure secret with PROVIDER service_principal. The extension reads
tenant_id, client_id, and client_secret from that DuckDB secret (the same fields
you set in SQL as TENANT_ID, CLIENT_ID, and CLIENT_SECRET) and acquires a Power BI
API token the same way as auth_mode := 'service_principal' with named parameters.
When you pass secret_name (or Secret= in the connection string), the token comes
from the DuckDB secret; session SET pbi_scanner_auth_mode does not override that
for that call.
INSTALL azure;
LOAD azure;
CREATE SECRET pbi_sp (
TYPE azure,
PROVIDER service_principal,
TENANT_ID '<tenant-guid>',
CLIENT_ID '<app-client-id>',
CLIENT_SECRET '<client-secret>'
);
INSTALL pbi_scanner FROM community;
LOAD pbi_scanner;
SELECT *
FROM dax_query(
'Data Source=powerbi://api.powerbi.com/v1.0/myorg/Example%20Workspace;Initial Catalog=example_semantic_model;',
'EVALUATE ROW("probe_ok", 1)',
secret_name := 'pbi_sp'
);For a local CLI smoke test that builds this CREATE SECRET from environment variables,
see query_semantic_model_sql_minimal.py (PBI_SQL_USE_AZURE_SECRET=1,
PBI_SQL_AZURE_PROVIDER=service_principal, and SP_TENANT_ID / SP_CLIENT_ID /
SP_CLIENT_SECRET or the AZURE_* / PBI_XMLA_* fallbacks).
Use these helper functions to inspect semantic model structure.
SELECT * FROM pbi_tables('Data Source=powerbi://api.powerbi.com/v1.0/myorg/Example%20Workspace;Initial Catalog=example_semantic_model;');
SELECT * FROM pbi_columns('Data Source=powerbi://api.powerbi.com/v1.0/myorg/Example%20Workspace;Initial Catalog=example_semantic_model;');
SELECT * FROM pbi_measures('Data Source=powerbi://api.powerbi.com/v1.0/myorg/Example%20Workspace;Initial Catalog=example_semantic_model;');
SELECT * FROM pbi_relationships('Data Source=powerbi://api.powerbi.com/v1.0/myorg/Example%20Workspace;Initial Catalog=example_semantic_model;');If using another DuckDB shell, load the built extension explicitly:
duckdb -unsigned -c "LOAD './build/release/extension/pbi_scanner/pbi_scanner.duckdb_extension';"Windows:
duckdb.exe -unsigned -c "LOAD './build/release/extension/pbi_scanner/pbi_scanner.duckdb_extension';"make # release build
make test # run testsOther build variants:
make debug
make reldebug
make relassertFocused offline tests (sqllogictest + Catch unit tests):
make test-pbi-offline
# or individually:
./build/release/test/unittest "test/sql/pbi_scanner.test"
./build/release/pbi_scanner_unit_testsDuckDB-Wasm validation (requires Emscripten 3.1.64; see docs/wasm.md):
make wasm_eh
uv run test/wasm/run_pbi_wasm_smoke.py --build
uv run test/wasm/run_pbi_wasm_sqllogictest.py --build
# or, after building wasm_eh:
make test-pbi-wasmBrowser smoke proves load, mock XMLA HTTP, and auth-header propagation. Browser
sqllogictest runs wasm_sql/pbi_scanner_wasm.test for offline WASM-specific SQL
assertions (kept outside test/ so native unittest does not pick it up).
Architecture notes: docs/wasm_testing_plan.md.
make format-check
make tidy-checkApply formatting fixes:
make format-fixRun helper scripts with uv:
uv run bench_native_http.py --smoke
uv run --group bench query_semantic_model_minimal.pyIf you use Cursor worktrees (/worktree or --worktree), this repo includes
.cursor/worktrees.json to bootstrap each new worktree automatically:
- Copy
.envfrom the root worktree when present (only if the new worktree does not already have one) - Run
git submodule sync --recursiveandgit submodule update --init --recursive - Fail fast if
extension-ci-toolsdid not populate (so submodule problems are not silent)
Bootstrapping submodules does not run make; build artifacts such as
./build/release/test/unittest appear only after you build in that worktree
(for example make / make test).
If setup did not run (or you need to repair a worktree manually), from the worktree root you can run:
bash .cursor/setup-worktree-unix.shxcode-select --install
brew install cmake openssl
OPENSSL_ROOT_DIR="$(brew --prefix openssl@3)" make
make testsudo apt update
sudo apt install -y git build-essential cmake pkg-config libssl-dev
make
make testFedora/RHEL equivalent:
sudo dnf install -y git gcc-c++ make cmake pkg-config openssl-develRecommended: WSL2 + Linux steps.
For native Windows builds, bootstrap once, then use the repo wrapper:
.\scripts\dev-win.ps1 bootstrap
.\scripts\dev-win.ps1 build
.\scripts\dev-win.ps1 test -R test/sql/pbi_scanner.testCommand wrapper behavior:
bootstrapinstalls missing prerequisites withwinget(VS Build Tools + C++ workload, Git, CMake), then clones/bootstraps vcpkg intolocal/vcpkg(or%VCPKG_ROOT%when set).- Uses
VsDevCmd.batfrom VS 2022 Build Tools first, then VS 2019 Build Tools. - Uses
cmake.exeonPATH, else Visual Studio CMake fallback paths. - Configures with repo-safe defaults and vcpkg toolchain integration:
-DCMAKE_IGNORE_PATH=C:/msys64-DCMAKE_TOOLCHAIN_FILE=%VCPKG_ROOT%/scripts/buildsystems/vcpkg.cmake-DVCPKG_TARGET_TRIPLET=x64-windows-static-releaseby default (override with%VCPKG_TARGET_TRIPLET%)
- Builds with serialized MSBuild (
-- /m:1) to reduce Windows file-lock failures.
Optional launcher:
scripts\dev-win.cmd build
scripts\dev-win.cmd test -R test/sql/pbi_scanner.testFor live local benchmarking on native Windows, prefer the repo-local Python environment:
uv run --group bench python -u query_semantic_model_minimal.pyOn Windows, this path uses Azure CLI access-token auth directly to avoid a known crash observed with DuckDB Azure secret credential-chain resolution inside the Python process. The bundled CLI fallback remains available with:
$env:PBI_BENCH_USE_BUNDLED_CLI='1'
uv run --group bench python -u query_semantic_model_minimal.pyIf .\scripts\dev-win.ps1 configure reports a missing vcpkg toolchain file,
run .\scripts\dev-win.ps1 bootstrap or set %VCPKG_ROOT% to an existing
vcpkg checkout.
- macOS:
OPENSSL_ROOT_DIR="$(brew --prefix openssl@3)" - Linux: install
libssl-devoropenssl-devel - Windows: run
.\scripts\dev-win.ps1 bootstrapto install/bootstrap vcpkg
git submodule update --init --recursiveaz logout
az login --scope "https://analysis.windows.net/powerbi/api/.default"Use the bundled shell (./build/release/duckdb) or load local extension with DuckDB -unsigned.
- Live authenticated workloads are intentionally excluded from CI tests
- Power BI/Azure availability and network quality directly impact query latency
- Extension binaries should be rebuilt for each DuckDB version you target
- The project is experimental and APIs may evolve
This project is licensed under the MIT License. See LICENSE.