Add BigQuery as a reconcile source by take60 · Pull Request #2527 · databrickslabs/lakebridge

take60 · 2026-06-24T06:22:25Z

What

Adds BigQuery as a reconcile source, at parity with the other sources (schema / row / data / all / aggregate). BigQuery was already supported by the transpiler and profiler; this closes the gap for reconcile.

How it works

BigQuery uses the same Lakehouse Federation remote_query path as Snowflake/Oracle/etc. — no new dependencies. BigQueryDataSource uses backtick-quoted 3-part project.dataset.table names and reads metadata from INFORMATION_SCHEMA.COLUMNS. sqlglot already ships a BigQuery dialect, so query generation and schema comparison are reused unchanged.

Changes

Connector: reconcile/connectors/bigquery.py (BigQueryDataSource); registered in ReconSourceType and source_adapter.create_adapter; install prompt + recon_capture display name.
Schema type handling: the connector's INFORMATION_SCHEMA query canonicalizes the few BigQuery types sqlglot can't bridge to Databricks (BIGNUMERIC→string, bare NUMERIC→decimal(38,9), TIME→string, JSON→variant, RANGE<T>→struct<…>); everything else is left to sqlglot.
Row hashing: adds a BigQuery hash algorithm (TO_HEX(SHA256()), matching Databricks sha2(...,256)) and a scale-aware decimal transform (FORMAT('%.<scale>f', col)) so BigQuery's trailing-zero-stripped numeric strings match Spark's scale-padded DECIMAL strings.
Docs + tests: supported-sources row, a BigQuery config tab (with a compute note), connector tests, a type-coverage guardrail test, and an adapter test.
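
To illustrate the row-hashing point, here is a minimal Python sketch of why the scale-aware format matters. It is not the connector's code: `format_decimal` and `row_hash` are hypothetical helpers, and joining column strings by plain concatenation is an assumption made only for the example. `hashlib.sha256(...).hexdigest()` stands in for the lowercase hex digest that both engines are expected to produce.

```python
import hashlib
from decimal import Decimal


def format_decimal(value: Decimal, scale: int) -> str:
    """Render a decimal with exactly `scale` fractional digits,
    analogous to FORMAT('%.<scale>f', col) on the BigQuery side."""
    return f"{value:.{scale}f}"


def row_hash(parts: list[str]) -> str:
    """Lowercase hex SHA-256 over the concatenated column strings (assumed layout)."""
    return hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()


# A trailing-zero-stripped string versus the scale-padded string for the same value.
stripped = "1.5"
padded = format_decimal(Decimal("1.5"), 9)

# Same numeric value, different strings, therefore different hashes.
hashes_differ = row_hash([stripped]) != row_hash([padded])

# Once both sides are formatted to the same scale, the hashes agree.
hashes_match = row_hash([format_decimal(Decimal("1.50"), 9)]) == row_hash([padded])
```

Without the padding step every DECIMAL column with trailing zeros would show up as a row mismatch, even though the values are equal.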

Testing

make fmt / make lint (pylint 10.0/10.0) and make test — green (1319 passed).
End-to-end on a real workspace: a BigQuery source (UC Federation connection) reconciled against an identical Databricks copy via the deployed reconcile job — schema, row, and data all matched (0 mismatches / 0 missing).

Notes / limitations

Compute: BigQuery reads use remote_query, which requires Databricks Runtime 17.3+ or serverless compute (the reconcile job's default cluster may run an older runtime — point it at a DBR 17.3+ cluster via job_overrides.existing_cluster_id, or run serverless). This applies to all Lakehouse Federation reconcile sources. Documented in the config tab.
INTERVAL maps to two Databricks columns, which the 1:1 schema comparison can't represent — surfaces as a visible mismatch (documented).

codecov · 2026-06-24T06:26:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.37%. Comparing base (7ebd945) to head (ac18df1).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2527      +/-   ##
==========================================
+ Coverage   69.10%   69.37%   +0.26%     
==========================================
  Files         105      106       +1     
  Lines        9482     9565      +83     
  Branches     1050     1056       +6     
==========================================
+ Hits         6553     6636      +83     
  Misses       2735     2735              
  Partials      194      194


github-actions · 2026-06-24T06:33:24Z

✅ 173/173 passed, 4 flaky, 2 skipped, 1h23m4s total

Flaky tests:

🤪 test_installs_and_runs_local_bladebridge (12.588s)
🤪 test_transpiles_informatica_to_sparksql (25.649s)
🤪 test_transpile_teradata_sql (27.105s)
🤪 test_transpile_teradata_sql_non_interactive[False] (6.12s)

Running from acceptance #4927

Adds BigQuery as a Lakehouse Federation reconcile source (schema/row/data/all/aggregate), reusing the existing remote_query path like the other federation connectors.

- New BigQueryDataSource: remote_query reads, backtick 3-part `project.dataset.table` names, INFORMATION_SCHEMA schema query with scale/precision canonicalization
- Register BIGQUERY in ReconSourceType and source_adapter; install prompts + result display name
- Row hashing for BigQuery: TO_HEX(SHA256()) (matches Databricks sha2) and scale-aware decimal FORMAT so cross-engine hashes match Databricks DECIMAL string output
- Docs (supported sources + config tab incl. DBR 17.3+/serverless compute note) and unit tests incl. a type-coverage guardrail
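
The type canonicalization can be pictured as a small lookup, sketched below in Python. The mapping values come from the PR description; everything else is an assumption for illustration: `canonicalize` is a hypothetical helper (the PR does this inside the INFORMATION_SCHEMA query, not in Python), parameterized NUMERIC(p, s) is assumed to keep its declared precision and scale, and RANGE<T> is left out because the description elides the struct fields.

```python
import re

# Special cases listed in the PR description; all other types pass through unchanged.
CANONICAL_TYPES = {
    "BIGNUMERIC": "string",
    "NUMERIC": "decimal(38,9)",  # bare NUMERIC only
    "TIME": "string",
    "JSON": "variant",
}

_PARAM_NUMERIC = re.compile(r"NUMERIC\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)")


def canonicalize(bq_type: str) -> str:
    """Map a BigQuery type name to its Databricks-side type for the special cases."""
    normalized = bq_type.strip().upper()
    numeric = _PARAM_NUMERIC.fullmatch(normalized)
    if numeric:
        # Assumption: an explicit NUMERIC(p[, s]) keeps its own precision and scale.
        return f"decimal({numeric.group(1)},{numeric.group(2) or 0})"
    base = normalized.split("(", 1)[0]
    return CANONICAL_TYPES.get(base, bq_type)
```

For example, `canonicalize("NUMERIC")` yields `decimal(38,9)`, while a type such as `INT64` is returned untouched so that sqlglot can handle it.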

m-abulazm

added UC connection with name bigquery_sandbox for the e2e test

…ion tests

- bigquery.py: reference tables and INFORMATION_SCHEMA two-part (dataset.table); the project is abstracted by the UC connection, matching the other federated connectors. list_schemas uses bare SCHEMATA via the connection's default project.
- install: drop the BigQuery project prompt; catalog is empty for the bigquery dialect.
- unit tests: update assertions from three-part to two-part naming.
- integration: add bigquery e2e (report_type=schema) plus read_schema/list tests against the bigquery_sandbox UC connection.
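
A rough Python sketch of what two-part naming looks like when building query text. The helper names are hypothetical and the identifier check is deliberately conservative (BigQuery accepts more characters in table names than this pattern allows); the real connector may build its queries differently.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def _checked(part: str) -> str:
    """Reject anything that is not a plain identifier before interpolating it into SQL."""
    if not _IDENTIFIER.fullmatch(part):
        raise ValueError(f"invalid identifier: {part!r}")
    return part


def qualified_name(dataset: str, table: str) -> str:
    """Backtick-quoted two-part name; the project is supplied by the UC connection."""
    return f"`{_checked(dataset)}.{_checked(table)}`"


def columns_query(dataset: str, table: str) -> str:
    """Metadata query against the dataset's INFORMATION_SCHEMA.COLUMNS view."""
    return (
        "SELECT column_name, data_type "
        f"FROM `{_checked(dataset)}`.INFORMATION_SCHEMA.COLUMNS "
        f"WHERE table_name = '{_checked(table)}'"
    )
```

Because no project appears in either string, the same query text works against whichever project the connection is bound to.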

…igquery

main removed profiler_dashboard from LakebridgeConfiguration (#2512); update the BigQuery reconcile install test to match.

catalog="" is dropped by blueprint serde and reloads as None, breaking the required str field (e2e SerdeError). BigQuery has no separate catalog, so mirror the dataset into catalog (non-empty, round-trips); the connector ignores it (two-part naming).
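
The failure mode can be reproduced with a toy round trip. The sketch below does not use blueprint's serde; `dump` and `load` are hypothetical stand-ins that only mimic the described behavior (empty strings dropped on save, missing keys reloaded as None, required str fields rejecting None), and `SourceNames` is an invented class.

```python
from dataclasses import asdict, dataclass


@dataclass
class SourceNames:  # hypothetical stand-in for the real config class
    catalog: str
    schema: str


def dump(cfg: SourceNames) -> dict:
    """Toy serializer that omits empty-string values."""
    return {key: value for key, value in asdict(cfg).items() if value != ""}


def load(raw: dict) -> SourceNames:
    """Toy loader: missing keys become None, and required str fields reject None."""
    values = {name: raw.get(name) for name in ("catalog", "schema")}
    for name, value in values.items():
        if not isinstance(value, str):
            raise TypeError(f"{name}: expected str, got {value!r}")
    return SourceNames(**values)


def round_trips(cfg: SourceNames) -> bool:
    """True when saving and reloading gives back an equal config."""
    try:
        return load(dump(cfg)) == cfg
    except TypeError:
        return False
```

With these stand-ins, `SourceNames(catalog="", schema="sales")` does not round-trip, while mirroring the dataset into catalog (`SourceNames("sales", "sales")`) does.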

take60 · 2026-06-24T22:47:58Z

@m-abulazm thanks for setting up the connection.

All 4 BigQuery acceptance tests now fail on the same single cause, a connection grant rather than a code issue:

PERMISSION_DENIED: User does not have USE CONNECTION on Connection 'bigquery_sandbox'

(The deployed-job path shows it masked as bigquery_TEST_CATALOG, but it's the same connection.)

Two things from your side:

1. Grant USE CONNECTION on the BigQuery connection to the acceptance test principal.
2. Heads-up for the next step: once the grant is in, remote_query will try to materialize results into the read dataset (public), so please also make sure that dataset is writable by the connection's service account; otherwise we'll hit that right after. (Not failing on this yet: execution stops at the grant first.)

Thanks

bishwajit-db · 2026-06-25T08:18:12Z

Materialization dataset: undocumented write requirement + no config knob

In production the BigQuery materialization target always defaults to the read dataset (_mat_dataset falls back to schema; create_adapter never sets materialization_dataset). Two gaps follow from that:

Docs: the source dataset must be writable by the connection's service account (remote_query materializes results there), but only the DBR 17.3+ requirement is documented. Suggest a line in the BigQuery config notes, e.g.: "ensure the source dataset, or a dedicated materialization dataset, is writable by the connection's service account."
Config: materialization_dataset is a constructor-only arg with no path from config (SourceConnectionConfig has no field, create_adapter never passes it), so it's only settable in test code. Adding it to SourceConnectionConfig and plumbing it through create_adapter would let users keep the source dataset read-only. It also fixes list_schemas, which passes _mat_dataset("") and resolves to an empty materializationDataset when none is configured.

The doc line unblocks the PR; the config change can be a follow-up.
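
For readers following along, the fallback described above behaves roughly like this Python sketch. `MaterializationTarget` and `resolve` are hypothetical names, not the connector's API, and the dataset name `recon_scratch` is made up; the point is only to show why an empty schema argument yields an empty target when nothing is configured.

```python
from typing import Optional


class MaterializationTarget:  # hypothetical; mirrors the fallback described in the review
    def __init__(self, materialization_dataset: Optional[str] = None) -> None:
        self._materialization_dataset = materialization_dataset

    def resolve(self, schema: str) -> str:
        """Prefer the configured dataset; otherwise fall back to the schema being read."""
        return self._materialization_dataset or schema


unconfigured = MaterializationTarget()
configured = MaterializationTarget("recon_scratch")

# Table reads: the source dataset itself becomes the materialization target.
read_target = unconfigured.resolve("public")

# A list_schemas-style call passes "" and ends up with an empty target.
listing_target = unconfigured.resolve("")

# With a configured dataset, both cases resolve to it and the source can stay read-only.
configured_target = configured.resolve("")
```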

…ataset

Address review feedback: document that the source dataset (or a dedicated materialization dataset) must be writable by the connection's service account, since remote_query materializes results there. Also correct the catalog description for two-part naming: the project is taken from the UC connection; catalog mirrors the dataset.

…igquery

take60 · 2026-06-25T13:15:31Z

Thanks @bishwajit-db for the review!

(1) Docs — done here: added a "Writable dataset required" note (source or a dedicated materialization dataset must be writable by the connection's SA). Also fixed the naming text for two-part — project comes from the UC connection, schema is the dataset, catalog mirrors it.

(2) Config — agreed, taking as a follow-up: add materialization_dataset to SourceConnectionConfig and thread it through create_adapter (also fixes list_schemas' empty materializationDataset). Separate PR so this can land on the doc fix. #2529

@m-abulazm thanks for setting up the UC connection and granting the permission. Added tests and verified.

take60 force-pushed the feature/reconcile-bigquery branch from d73cf77 to 13141ee June 24, 2026 06:50

take60 marked this pull request as ready for review June 24, 2026 07:26

take60 requested a review from a team as a code owner June 24, 2026 07:26

m-abulazm requested changes Jun 24, 2026

take60 added 4 commits June 24, 2026 22:02

Merge remote-tracking branch 'upstream/main' into feature/reconcile-b…

bbb15a7

…igquery

Drop profiler_dashboard from bigquery install test after merging main

3a40913

main removed profiler_dashboard from LakebridgeConfiguration (#2512); update the BigQuery reconcile install test to match.

take60 added 2 commits June 25, 2026 21:23

Merge remote-tracking branch 'upstream/main' into feature/reconcile-b…

ac18df1

…igquery

take60 mentioned this pull request Jun 25, 2026

BigQuery reconcile: make materialization_dataset configurable via SourceConnectionConfig #2529

Open

take60 enabled auto-merge June 25, 2026 13:16

take60 requested a review from m-abulazm June 25, 2026 13:16

take60 wants to merge 7 commits into main from feature/reconcile-bigquery
