galderic/mcp-databricks


Databricks SQL Warehouse MCP

Minimal Python MCP server that exposes two tools — run_sql and get_statement — for executing and monitoring SQL against a remote Databricks SQL warehouse.

This server is intentionally narrow:

  • SQL warehouse only
  • PAT authentication only
  • configuration from environment variables and env files
  • two MCP tools: run_sql and get_statement
  • hard-coded production guardrail (see below)
  • inline results for small read queries
  • best-effort lookup of inserted row count from query history using the returned statement ID

What it returns

run_sql returns structured JSON with:

  • statement_id: the Databricks statement execution ID
  • status: final Databricks execution state
  • is_terminal: whether Databricks reports the statement in a terminal state
  • timed_out: whether this MCP stopped polling before the statement reached a terminal state
  • error_message: Databricks statement error for failed/canceled statements when available
  • inserted_row_count: best-effort value from query history when available
  • query_result: inline rows for small read queries when Databricks returns result data
    • columns: ordered column metadata
    • rows: result rows in JSON_ARRAY format
    • row_count: number of rows returned inline
    • total_row_count: total row count when Databricks provides it
    • truncated: whether more result rows exist than were returned inline
    • next_chunk_internal_link: Databricks next-chunk hint when present
    • scalar_value: convenience value for 1x1 queries such as SELECT COUNT(*)
  • warehouse_id
  • catalog and schema actually used
  • history_lookup_status: whether the inserted row count was found

The statement_id can be searched in the Databricks Query History UI.
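For illustration, a successful small SELECT COUNT(*) might come back shaped roughly like the following. Every value here is made up, and exact field spellings and value encodings may differ; the list above is the authoritative description.

```json
{
  "statement_id": "01f0-example-statement-id",
  "status": "SUCCEEDED",
  "is_terminal": true,
  "timed_out": false,
  "error_message": null,
  "inserted_row_count": null,
  "query_result": {
    "columns": [{"name": "cnt", "type_name": "LONG"}],
    "rows": [["42"]],
    "row_count": 1,
    "total_row_count": 1,
    "truncated": false,
    "next_chunk_internal_link": null,
    "scalar_value": "42"
  },
  "warehouse_id": "abc123",
  "catalog": "main",
  "schema": "default",
  "history_lookup_status": "skipped"
}
```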

Environment

Copy .env.example to .env and fill in:

  • DATABRICKS_HOST
  • DATABRICKS_TOKEN
  • DATABRICKS_SQL_WAREHOUSE_ID

Optional:

  • DATABRICKS_CATALOG
  • DATABRICKS_SCHEMA
  • DATABRICKS_STATEMENT_TIMEOUT_SECONDS
  • DATABRICKS_POLL_INTERVAL_SECONDS
  • DATABRICKS_HISTORY_LOOKUP_TIMEOUT_SECONDS
  • DATABRICKS_HISTORY_LOOKUP_POLL_SECONDS

Configuration loading precedence is:

  1. real process environment variables
  2. a local .env discovered from the current working directory
  3. ~/.config/mcp-databricks.env
  4. ~/.config/mcp-databricks/.env
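The precedence above can be sketched as follows. This is a simplified approximation for illustration only, not the package's actual loader; the function names candidate_env_files and resolve are invented here, and the real code may parse env files differently.

```python
import os
from pathlib import Path


def candidate_env_files() -> list[Path]:
    """Env-file candidates in precedence order, checked after real env vars."""
    home = Path.home()
    return [
        Path.cwd() / ".env",                           # 2. local .env from the cwd
        home / ".config" / "mcp-databricks.env",       # 3. machine-level file
        home / ".config" / "mcp-databricks" / ".env",  # 4. directory variant
    ]


def resolve(name: str, env=os.environ):
    """First match wins: process environment, then the env files in order."""
    if name in env:                        # 1. real environment variables
        return env[name]
    for path in candidate_env_files():
        if path.is_file():
            for line in path.read_text().splitlines():
                key, _, value = line.partition("=")
                if key.strip() == name:
                    return value.strip()
    return None
```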

For a stable machine-level setup, prefer ~/.config/mcp-databricks.env:

mkdir -p ~/.config
cat > ~/.config/mcp-databricks.env <<'EOF'
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-token
DATABRICKS_SQL_WAREHOUSE_ID=your-warehouse-id
EOF

That lets you rotate the token or switch warehouses without re-registering the MCP server in Codex or Claude Code.

After changing a machine-level config file, restart the MCP client or start a new session so the server process picks up the new values.

Authentication Notes

This server uses a Databricks access token in DATABRICKS_TOKEN.

If your Databricks workspace uses the newer scoped PAT flow, create the token with:

  • token type: BI Tools
  • API scope: sql

That matches this server's API usage, which is limited to Databricks SQL endpoints under /api/2.0/sql/*.

The identity behind the token still needs normal Databricks permissions, including:

  • CAN USE on the target SQL warehouse
  • warehouse/data permissions required by the SQL you execute
  • preferably CAN MONITOR on the warehouse if you want the best-effort query-history lookup to populate inserted_row_count

Install

With uv:

uv sync

With pip:

python -m venv .venv
source .venv/bin/activate
pip install -e .

Run locally

python -m mcp_databricks

This starts the server over stdio, which is the recommended transport for a local Codex MCP.

Register In MCP Clients

After installing this package into an environment, point your MCP client at that environment's Python:

/path/to/python -m mcp_databricks

Codex

Recommended registration when using ~/.config/mcp-databricks.env:

codex mcp add databricks-sql \
  -- /path/to/python -m mcp_databricks

If you prefer to keep configuration inside Codex instead, you can still register with explicit environment variables:

codex mcp add databricks-sql \
  --env DATABRICKS_HOST=https://<your-workspace>.cloud.databricks.com \
  --env DATABRICKS_TOKEN=<your-token> \
  --env DATABRICKS_SQL_WAREHOUSE_ID=<your-warehouse-id> \
  -- /path/to/python -m mcp_databricks

Verify:

codex mcp list
codex mcp get databricks-sql

Claude Code

Recommended registration when using ~/.config/mcp-databricks.env:

claude mcp add-json databricks-sql '{
  "type": "stdio",
  "command": "/path/to/python",
  "args": ["-m", "mcp_databricks"]
}'

If you prefer to keep configuration inside Claude Code instead, register with env in the JSON:

claude mcp add-json databricks-sql '{
  "type": "stdio",
  "command": "/path/to/python",
  "args": ["-m", "mcp_databricks"],
  "env": {
    "DATABRICKS_HOST": "https://<your-workspace>.cloud.databricks.com",
    "DATABRICKS_TOKEN": "<your-token>",
    "DATABRICKS_SQL_WAREHOUSE_ID": "<your-warehouse-id>"
  }
}'

Project-scoped example:

claude mcp add-json -s project databricks-sql '{
  "type": "stdio",
  "command": "/path/to/python",
  "args": ["-m", "mcp_databricks"],
  "env": {
    "DATABRICKS_HOST": "https://<your-workspace>.cloud.databricks.com",
    "DATABRICKS_TOKEN": "<your-token>",
    "DATABRICKS_SQL_WAREHOUSE_ID": "<your-warehouse-id>"
  }
}'

Verify:

claude mcp list
claude mcp get databricks-sql

Production Guardrail

This server enforces a hard-coded safety rule: any mutating SQL statement that targets an object whose name contains a component starting with pro_ is refused before it reaches Databricks.

Mutating statements checked: INSERT, UPDATE, DELETE, MERGE, TRUNCATE, COPY INTO, CREATE, DROP, ALTER.

The guardrail inspects the target of the statement, not every identifier in the query. Read-only queries (SELECT, SHOW, DESCRIBE, EXPLAIN, ...) are always allowed, even against pro_ objects.

The check applies to every dot-separated component of the target's qualified name:

SQL                                         Blocked?  Reason
SELECT * FROM pro_catalog.schema.events     No        Read-only
INSERT INTO pro_catalog.schema.events ...   Yes       Catalog starts with pro_
DROP TABLE catalog.pro_schema.table         Yes       Schema starts with pro_
UPDATE catalog.schema.pro_users SET ...     Yes       Table starts with pro_
INSERT INTO dev.schema.events ...           No        No pro_ component

Default catalog/schema resolution is also considered: if the configured DATABRICKS_CATALOG or DATABRICKS_SCHEMA starts with pro_, unqualified mutating targets are blocked too.
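The check can be sketched roughly like this. This is a deliberately simplified approximation for illustration, not the actual src/mcp_databricks/guardrails.py implementation; real SQL target extraction is more involved than two regexes.

```python
import re

PROTECTED_PREFIX = "pro_"

# Statement kinds treated as mutating (everything else is read-only).
MUTATING = re.compile(
    r"^\s*(INSERT|UPDATE|DELETE|MERGE|TRUNCATE|COPY\s+INTO|CREATE|DROP|ALTER)\b",
    re.IGNORECASE,
)

# Rough extraction of the statement's target object (not every identifier).
TARGET = re.compile(
    r"\b(?:INSERT\s+INTO|MERGE\s+INTO|COPY\s+INTO|UPDATE|DELETE\s+FROM|"
    r"TRUNCATE\s+TABLE|(?:CREATE|DROP|ALTER)\s+\w+)\s+"
    r"(`?\w+`?(?:\.`?\w+`?){0,2})",
    re.IGNORECASE,
)


def is_blocked(sql: str, defaults: tuple = ()) -> bool:
    """True if a mutating statement targets a pro_-prefixed name component."""
    if not MUTATING.match(sql):
        return False                      # read-only: always allowed
    m = TARGET.search(sql)
    if not m:
        return False
    parts = m.group(1).replace("`", "").split(".")
    if len(parts) < 3:                    # unqualified: prepend configured defaults
        parts = list(defaults) + parts
    return any(p.lower().startswith(PROTECTED_PREFIX) for p in parts)
```

The defaults tuple stands in for the configured DATABRICKS_CATALOG and DATABRICKS_SCHEMA, so an unqualified INSERT under a pro_ default catalog is blocked just like a fully qualified one.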

When blocked, run_sql returns:

{"error": "Blocked: refusing to execute mutating SQL against '...'. Objects with identifiers starting with 'pro_' are protected.", "blocked": true}

This guardrail cannot be disabled via configuration. To change the protected prefix, edit PROTECTED_PREFIX in src/mcp_databricks/guardrails.py.

MCP tools

run_sql

Arguments:

  • sql: SQL statement to execute
  • catalog: optional catalog override
  • schema: optional schema override
  • timeout_seconds: optional per-call timeout override
  • poll_interval_seconds: optional per-call polling override

Notes:

  • Databricks Statement Execution API accepts one statement per request.
  • The submit request is detached from execution because the server sends wait_timeout=0s.
  • Inserted row counts are not reliably present in the immediate execution response, so the server performs a best-effort query-history lookup after successful execution.
  • If query history has not indexed the statement yet, inserted_row_count will be null.
  • For successful read queries, query_result returns the first inline result chunk in JSON_ARRAY format.
  • query_result.rows is capped at 1000 rows. Larger or multi-chunk results are flagged with query_result.truncated = true.
  • For 1x1 results such as SELECT COUNT(*), the value is also exposed as query_result.scalar_value.
  • If polling times out, the tool still returns statement_id plus the last known status so you can resume later with get_statement.
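The detached submit-then-poll flow described above can be sketched like this. This is an illustrative approximation rather than the server's actual code: the helper names are invented, and while the endpoint path and fields such as wait_timeout, disposition, and format follow the public Statement Execution API, verify them against your workspace's API version.

```python
import json
import time
import urllib.request


def build_submit_payload(sql, warehouse_id, catalog=None, schema=None):
    """Body for POST /api/2.0/sql/statements.

    wait_timeout=0s detaches the request from execution, so the caller
    polls for completion by statement_id afterwards.
    """
    body = {
        "statement": sql,
        "warehouse_id": warehouse_id,
        "wait_timeout": "0s",
        "format": "JSON_ARRAY",
        "disposition": "INLINE",
    }
    if catalog:
        body["catalog"] = catalog
    if schema:
        body["schema"] = schema
    return body


def poll_statement(host, token, statement_id, timeout=60, interval=2):
    """Poll GET /api/2.0/sql/statements/{id} until terminal or timed out.

    Returns (last_response, timed_out); on timeout the caller still holds
    statement_id and can resume polling later.
    """
    deadline = time.monotonic() + timeout
    while True:
        req = urllib.request.Request(
            f"{host}/api/2.0/sql/statements/{statement_id}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        state = data["status"]["state"]
        if state in ("SUCCEEDED", "FAILED", "CANCELED", "CLOSED"):
            return data, False            # terminal state reached
        if time.monotonic() >= deadline:
            return data, True             # still running; report timed_out
        time.sleep(interval)
```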

get_statement

Arguments:

  • statement_id: Databricks statement execution ID returned by run_sql
  • timeout_seconds: optional extra time to keep polling for completion
  • poll_interval_seconds: optional polling override when timeout_seconds is set

Notes:

  • This tool checks statement status later by statement_id; the original submit connection does not need to stay open.
  • If the statement has succeeded, the tool returns the same small inline query_result shape as run_sql when Databricks includes result data.
  • If the statement is still running and the polling window expires, it returns timed_out = true with the last known status.
  • If the statement failed or was canceled, the tool returns the status and error_message instead of hiding the outcome behind a timeout.

About

My own local, custom Databricks MCP server.
