[WIP] Add databricks dbconnect init / sync commands#5690

Draft
rugpanov wants to merge 33 commits into main from dbconnect-init-sync

Conversation

@rugpanov

Draft / do-not-merge: opened for early review; not ready to merge.

Changes

Adds a new databricks dbconnect command namespace with two subcommands:

  • databricks dbconnect init — create a fresh pyproject.toml and provision a matched .venv.
  • databricks dbconnect sync — merge managed dependencies into an existing pyproject.toml and re-provision.

From the selected Databricks compute target (serverless / cluster / job), the command derives and provisions a local Python environment matched to the runtime: the right Python version, the right databricks-connect pin, and dependency constraints so local resolution matches the Databricks runtime. It runs a phase pipeline: discover uv → resolve target → fetch the per-environment constraints (configurable base URL, with an offline cache) → plan → apply → ensure Python → uv sync → seed pip → validate.
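The phase pipeline described above could be sketched roughly as follows. This is a minimal illustration, not the code in libs/dbconnect/: the `phase` type, `runPipeline`, and `defaultPhases` are hypothetical names, and the split between read-only and mutating phases is an assumption made so that check mode can stop before anything on disk changes.

```go
package main

import "fmt"

// phase is a hypothetical unit of pipeline work; the PR's real types may differ.
type phase struct {
	name    string
	mutates bool // assumption: true for phases that write files or install packages
	run     func() error
}

// runPipeline executes phases in order and stops at the first error.
// In check mode it returns before the first mutating phase, so no later phase runs.
func runPipeline(phases []phase, check bool) ([]string, error) {
	var done []string
	for i, p := range phases {
		if check && p.mutates {
			return done, nil
		}
		if err := p.run(); err != nil {
			// 1-based phase numbering in the error text.
			return done, fmt.Errorf("phase %d (%s): %w", i+1, p.name, err)
		}
		done = append(done, p.name)
	}
	return done, nil
}

func noop() error { return nil }

// defaultPhases uses the phase names from the PR description.
// Which phases count as mutating is an assumption of this sketch.
func defaultPhases() []phase {
	return []phase{
		{"discover uv", false, noop},
		{"resolve target", false, noop},
		{"fetch constraints", false, noop},
		{"plan", false, noop},
		{"apply", true, noop},
		{"ensure Python", true, noop},
		{"uv sync", true, noop},
		{"seed pip", true, noop},
		{"validate", false, noop},
	}
}

func main() {
	done, err := runPipeline(defaultPhases(), true)
	fmt.Println(done, err)
}
```

Modelling each phase as a value with a `mutates` flag keeps the dry-run rule in one place instead of scattering `if check` branches through every phase.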

Implementation notes:

  • Thin Cobra layer (cmd/dbconnect/) over a unit-testable pipeline (libs/dbconnect/), with a PackageManager interface seam (uv implemented; pip/conda can follow).
  • Surgical, formatting-preserving pyproject.toml merge that touches only three managed regions and preserves the user's comments, ordering, and their own [tool.uv] keys; idempotent.
  • Target resolution via the SDK (cluster GetByClusterId → DBR → envKey, serverless, and job compute) with three-state messaging.
  • Honors the corporate PyPI proxy by bridging ~/.config/pip/pip.conf index-url → UV_INDEX_URL (uv ignores pip.conf).
  • --check dry-run prints the plan + diff and changes nothing; --output json emits a stable structured schema, and --debug adds diagnostic logging for troubleshooting on machines we can't access.
  • No new third-party dependencies.
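As an illustration of the pip.conf bridge in the notes above, one way to do it with the standard library only is sketched here. `indexURLFromPipConf` and `uvEnv` are hypothetical helpers, the INI handling is deliberately simplified (only `key = value` lines in `[global]`), and letting an already-set UV_INDEX_URL win is an assumption rather than documented behavior of this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// indexURLFromPipConf returns the index-url value from the [global] section
// of pip.conf content. Hypothetical helper; a simplified INI reader.
func indexURLFromPipConf(content string) (string, bool) {
	section := ""
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue // skip blanks and comments
		}
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			section = strings.ToLower(strings.TrimSpace(line[1 : len(line)-1]))
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok || section != "global" {
			continue
		}
		if strings.TrimSpace(key) == "index-url" {
			return strings.TrimSpace(value), true
		}
	}
	return "", false
}

// uvEnv appends UV_INDEX_URL derived from pip.conf to env.
// Assumption: an explicitly set UV_INDEX_URL takes precedence and is left alone.
func uvEnv(env []string, pipConf string) []string {
	for _, kv := range env {
		if strings.HasPrefix(kv, "UV_INDEX_URL=") {
			return env
		}
	}
	if url, ok := indexURLFromPipConf(pipConf); ok {
		return append(env, "UV_INDEX_URL="+url)
	}
	return env
}

func main() {
	conf := "[global]\nindex-url = https://pypi.example.com/simple\n"
	fmt.Println(uvEnv([]string{"PATH=/usr/bin"}, conf))
}
```

The resulting slice is the kind of value one would assign to the `Env` field of an `exec.Cmd` before invoking uv.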

Why

Promotes a proven proof-of-concept shell script into a real CLI command so the VS Code extension (and users directly) can set up a local environment matched to their compute, instead of guessing Python and databricks-connect versions. Doing the version/constraint resolution from the compute target avoids local/remote drift.

Tests

  • Table-driven unit tests across libs/dbconnect/: merge edge cases (single/multi-line arrays, quote styles, CRLF, idempotency, preserving user [tool.uv] keys), envKey mapping + Python-version parsing, target resolution (precedence + three-state), constraint fetch with offline-cache fallback, and pipeline orchestration incl. --check gating and validation.
  • Acceptance cases under acceptance/dbconnect/: serverless --check, --output json shape, no-target error, cluster-unsupported, flag conflict, and JSON-mode error exit code.
  • Verified end-to-end against a real serverless-v4 target: provisions a Python 3.12 .venv with databricks-connect 17.x and the injected constraints.

Out of scope for this first cut: pip/conda package managers (interface only) and the nearest-supported envKey fallback.

This pull request and its description were written by Isaac.

rugpanov added 30 commits June 19, 2026 17:03
Brainstormed design for porting the dbconnect-init.sh demo into a real
CLI subcommand namespace with init + sync commands, a shared phase
pipeline, full target resolution, a surgical TOML merge, and a stable
--json schema.

Co-authored-by: Isaac
Bite-sized, TDD task breakdown (11 tasks) covering the command scaffold,
result types, envKey mapping, constraint fetch+cache, surgical TOML merge,
target resolution, uv package manager, the phase pipeline, Cobra wiring,
acceptance tests, and changelog.

Co-authored-by: Isaac
Regenerate the golden from the built binary; the prior hand-written
version showed the command Short text instead of the rendered Long help.

Co-authored-by: Isaac
- Remove noise doc comments from Error() and Unwrap() (idiomatic for standard interface methods)
- Replace thin NewError doc comment with meaningful info about fmt.Sprintf and nil handling
- Remove YAGNI default case from Mode.String(), use if/return instead

Co-authored-by: Isaac
- Replace double TrimPrefix calls with simpler strings.TrimPrefix(strings.ToLower(version), "v")
- Hoist pythonVersionRe to package-level var to avoid repeated compilation
- Remove noise comment that restated the code

Co-authored-by: Isaac
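For readers following the commit above, the two changes look roughly like this. The `TrimPrefix`/`ToLower` expression is taken from the commit message; the regular expression pattern and the `normalizeVersion` and `pythonMinor` helper names are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pythonVersionRe is compiled once at package level rather than on every call.
// The pattern is an assumption: it accepts "3.12" or "3.12.3".
var pythonVersionRe = regexp.MustCompile(`^(\d+)\.(\d+)(?:\.\d+)?$`)

// normalizeVersion lowercases the input and strips one leading "v",
// so "V4", "v4", and "4" all become "4".
func normalizeVersion(version string) string {
	return strings.TrimPrefix(strings.ToLower(version), "v")
}

// pythonMinor returns the "major.minor" part of a Python version string.
func pythonMinor(version string) (string, error) {
	m := pythonVersionRe.FindStringSubmatch(strings.TrimSpace(version))
	if m == nil {
		return "", fmt.Errorf("unrecognized Python version %q", version)
	}
	return m[1] + "." + m[2], nil
}

func main() {
	fmt.Println(normalizeVersion("V4"))
	fmt.Println(pythonMinor("3.12.3"))
}
```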
The PythonMinorFromRequires call happens after a successful network fetch,
so wrapping its error with ErrConstraintFetchFailed was a misattribution.
Use ErrValidationFailed instead, which correctly signals that the constraint
file content failed to parse rather than that the fetch itself failed.

Co-authored-by: Isaac
Co-authored-by: Isaac
- Add json tags to PipelineError (code/message/-) so --output json emits
  the documented contract instead of Go field names
- Change uv version probe from "version" subcommand to --version flag to
  avoid project-scoped failure when no pyproject.toml exists in cwd
- Guard renderResult against nil res: synthesize a minimal Result with
  error populated so JSON mode always emits a structured object
- Use i+1 for 1-based phase numbering in text output
- Add comment explaining why ValidateTargetFlags is kept alongside
  MarkFlagsMutuallyExclusive

Co-authored-by: Isaac
Add acceptance tests for the dbconnect init/sync feature:
- flag-conflict: verifies Cobra mutual exclusion of --cluster/--serverless/--job
- no-target: verifies error when no compute target is selected
- serverless-check: verifies --serverless v4 --check with stubbed constraint server
- serverless-json: verifies --output json with full Result struct
- cluster-unsupported: verifies constraint fetch failure for unsupported DBR version
- help/test.toml: opts out of bundle-engine matrix for the help case

Each case stubs the test server via [[Server]] in test.toml and uses
DATABRICKS_DBCONNECT_CONSTRAINT_SOURCE=$DATABRICKS_HOST to point the
constraint fetch at the local test server.

Co-authored-by: Grigory Panov
no-target and cluster-unsupported tests use commands that must fail;
musterr asserts this and fails the test if the command unexpectedly
succeeds. errcode is for tolerated failures only.

Co-authored-by: Isaac
Also standardize the serverless-json acceptance uv-version replacement
regex to the unwrapped form used by the sibling cases.

Co-authored-by: Isaac
…d cluster-unsupported scaffolding

Co-authored-by: Isaac
These are internal process artifacts and don't belong in the databricks/cli tree.

Co-authored-by: Isaac
@rugpanov temporarily deployed to test-trigger-is on June 23, 2026 11:29 with GitHub Actions (inactive)
@eng-dev-ecosystem-bot

Collaborator

Integration test report

Commit: 8f656d9

Run: 28022942392

| Env | 🟨 KNOWN | ✅ pass | 🙈 skip | Time |
| --- | --- | --- | --- | --- |
| 🟨 aws linux | 1 | 216 | 99 | 3:18 |
| 🟨 aws windows | 1 | 218 | 97 | 2:32 |
| 🟨 aws-ucws linux | 1 | 297 | 18 | 3:36 |
| 🟨 aws-ucws windows | 1 | 299 | 16 | 3:35 |
| 🟨 azure linux | 1 | 216 | 98 | 3:41 |
| 🟨 azure windows | 1 | 218 | 96 | 3:07 |
| 🟨 azure-ucws linux | 1 | 299 | 15 | 4:11 |
| 🟨 azure-ucws windows | 1 | 301 | 13 | 3:33 |
| 🟨 gcp linux | 1 | 215 | 100 | 3:01 |
| 🟨 gcp windows | 1 | 217 | 98 | 2:27 |

| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 🟨 TestAccept | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K | 🟨 K |
