Minimal Python MCP server that exposes two tools — run_sql and get_statement — for executing and monitoring SQL against a remote Databricks SQL warehouse.
This server is intentionally narrow:
- SQL warehouse only
- PAT authentication only
- configuration from environment variables and env files
- two MCP tools: `run_sql` and `get_statement`
- hard-coded production guardrail (see below)
- inline results for small read queries
- best-effort lookup of inserted row count from query history using the returned statement ID
`run_sql` returns structured JSON with:

- `statement_id`: the Databricks statement execution ID
- `status`: final Databricks execution state
- `is_terminal`: whether Databricks reports the statement in a terminal state
- `timed_out`: whether this MCP stopped polling before the statement reached a terminal state
- `error_message`: Databricks statement error for failed/canceled statements when available
- `inserted_row_count`: best-effort value from query history when available
- `query_result`: inline rows for small read queries when Databricks returns result data
  - `columns`: ordered column metadata
  - `rows`: result rows in `JSON_ARRAY` format
  - `row_count`: number of rows returned inline
  - `total_row_count`: total row count when Databricks provides it
  - `truncated`: whether more result rows exist than were returned inline
  - `next_chunk_internal_link`: Databricks next-chunk hint when present
  - `scalar_value`: convenience value for 1x1 queries such as `SELECT COUNT(*)`
- the `warehouse_id`, `catalog`, and `schema` actually used
- `history_lookup_status`: whether the inserted row count was found
The `statement_id` can be searched in the Databricks Query History UI.
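As an illustration, a client could unpack a successful `run_sql` payload like this. The JSON below is an invented example following the field names documented above, not real Databricks output:

```python
import json

# Hypothetical run_sql response for a 1x1 query such as SELECT COUNT(*).
# Field names follow the schema documented above; all values are made up.
raw = """
{
  "statement_id": "01ef-0000-aaaa",
  "status": "SUCCEEDED",
  "is_terminal": true,
  "timed_out": false,
  "query_result": {
    "columns": [{"name": "count(1)"}],
    "rows": [["42"]],
    "row_count": 1,
    "truncated": false,
    "scalar_value": "42"
  }
}
"""

payload = json.loads(raw)
scalar = None
if payload["is_terminal"] and payload["status"] == "SUCCEEDED":
    # scalar_value is only present for 1x1 results; fall back to rows otherwise.
    result = payload.get("query_result") or {}
    scalar = result.get("scalar_value")

print(scalar)
```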
Copy `.env.example` to `.env` and fill in:

- `DATABRICKS_HOST`
- `DATABRICKS_TOKEN`
- `DATABRICKS_SQL_WAREHOUSE_ID`
Optional:
- `DATABRICKS_CATALOG`
- `DATABRICKS_SCHEMA`
- `DATABRICKS_STATEMENT_TIMEOUT_SECONDS`
- `DATABRICKS_POLL_INTERVAL_SECONDS`
- `DATABRICKS_HISTORY_LOOKUP_TIMEOUT_SECONDS`
- `DATABRICKS_HISTORY_LOOKUP_POLL_SECONDS`
Configuration loading precedence is:
- real process environment variables
- a local `.env` discovered from the current working directory
- `~/.config/mcp-databricks.env`
- `~/.config/mcp-databricks/.env`
For a stable machine-level setup, prefer `~/.config/mcp-databricks.env`:

```sh
mkdir -p ~/.config
cat > ~/.config/mcp-databricks.env <<'EOF'
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-token
DATABRICKS_SQL_WAREHOUSE_ID=your-warehouse-id
EOF
```

That lets you rotate the token or switch warehouses without re-registering the MCP server in Codex or Claude Code.
After changing a machine-level config file, restart the MCP client or start a new session so the server process picks up the new values.
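The precedence above can be sketched as a first-match merge. This is an illustration of the rule only, not the server's actual loader; `load_config` and its parsing details are invented:

```python
from pathlib import Path

# Candidate env files, highest priority first (checked after the real process env).
ENV_FILE_CANDIDATES = [
    Path.cwd() / ".env",
    Path.home() / ".config" / "mcp-databricks.env",
    Path.home() / ".config" / "mcp-databricks" / ".env",
]

def load_config(environ: dict, env_files: list) -> dict:
    """Merge env files lowest-priority-first, then let the process env win."""
    merged = {}
    for path in reversed(env_files):  # later candidates have the lowest priority
        if path.is_file():
            for line in path.read_text().splitlines():
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    merged[key.strip()] = value.strip()
    merged.update(environ)  # real environment variables override every file
    return merged

# With no env files present, only the process environment contributes.
cfg = load_config({"DATABRICKS_HOST": "https://example.cloud.databricks.com"}, [])
```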
This server authenticates with a Databricks personal access token (PAT) supplied in `DATABRICKS_TOKEN`.
If your Databricks workspace uses the newer scoped PAT flow, create the token with:
- token type: `BI Tools`
- API scope: `sql`
That matches this server's API usage, which is limited to Databricks SQL endpoints under /api/2.0/sql/*.
The identity behind the token still needs normal Databricks permissions, including:
- `CAN USE` on the target SQL warehouse
- the warehouse/data permissions required by the SQL you execute
- preferably `CAN MONITOR` on the warehouse if you want the best-effort query-history lookup to populate `inserted_row_count`
With uv:
```sh
uv sync
```

With pip:

```sh
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

Run the server:

```sh
python -m mcp_databricks
```

This starts the server over stdio, which is the recommended transport for a local Codex MCP.
After installing this package into an environment, point your MCP client at that environment's Python:
```sh
/path/to/python -m mcp_databricks
```

Recommended registration when using `~/.config/mcp-databricks.env`:
```sh
codex mcp add databricks-sql \
  -- /path/to/python -m mcp_databricks
```

If you prefer to keep configuration inside Codex instead, you can still register with explicit environment variables:
```sh
codex mcp add databricks-sql \
  --env DATABRICKS_HOST=https://<your-workspace>.cloud.databricks.com \
  --env DATABRICKS_TOKEN=<your-token> \
  --env DATABRICKS_SQL_WAREHOUSE_ID=<your-warehouse-id> \
  -- /path/to/python -m mcp_databricks
```

Verify:

```sh
codex mcp list
codex mcp get databricks-sql
```

Recommended registration when using `~/.config/mcp-databricks.env`:
```sh
claude mcp add-json databricks-sql '{
  "type": "stdio",
  "command": "/path/to/python",
  "args": ["-m", "mcp_databricks"]
}'
```

If you prefer to keep configuration inside Claude Code instead, register with env in the JSON:
```sh
claude mcp add-json databricks-sql '{
  "type": "stdio",
  "command": "/path/to/python",
  "args": ["-m", "mcp_databricks"],
  "env": {
    "DATABRICKS_HOST": "https://<your-workspace>.cloud.databricks.com",
    "DATABRICKS_TOKEN": "<your-token>",
    "DATABRICKS_SQL_WAREHOUSE_ID": "<your-warehouse-id>"
  }
}'
```

Project-scoped example:
```sh
claude mcp add-json -s project databricks-sql '{
  "type": "stdio",
  "command": "/path/to/python",
  "args": ["-m", "mcp_databricks"],
  "env": {
    "DATABRICKS_HOST": "https://<your-workspace>.cloud.databricks.com",
    "DATABRICKS_TOKEN": "<your-token>",
    "DATABRICKS_SQL_WAREHOUSE_ID": "<your-warehouse-id>"
  }
}'
```

Verify:

```sh
claude mcp list
claude mcp get databricks-sql
```

This server enforces a hard-coded safety rule: any mutating SQL statement that targets an object whose name contains a component starting with `pro_` is refused before it reaches Databricks.
Mutating statements checked: `INSERT`, `UPDATE`, `DELETE`, `MERGE`, `TRUNCATE`, `COPY INTO`, `CREATE`, `DROP`, `ALTER`.

The guardrail inspects the target of the statement, not every identifier in the query. Read-only queries (`SELECT`, `SHOW`, `DESCRIBE`, `EXPLAIN`, ...) are always allowed, even against `pro_` objects.
The check applies to every dot-separated component of the target's qualified name:
| SQL | Blocked? | Reason |
|---|---|---|
| `SELECT * FROM pro_catalog.schema.events` | No | Read-only |
| `INSERT INTO pro_catalog.schema.events ...` | Yes | Catalog starts with `pro_` |
| `DROP TABLE catalog.pro_schema.table` | Yes | Schema starts with `pro_` |
| `UPDATE catalog.schema.pro_users SET ...` | Yes | Table starts with `pro_` |
| `INSERT INTO dev.schema.events ...` | No | No `pro_` component |
Default catalog/schema resolution is also considered: if the configured `DATABRICKS_CATALOG` or `DATABRICKS_SCHEMA` starts with `pro_`, unqualified mutating targets are blocked too.
When blocked, run_sql returns:
```json
{"error": "Blocked: refusing to execute mutating SQL against '...'. Objects with identifiers starting with 'pro_' are protected.", "blocked": true}
```

This guardrail cannot be disabled via configuration. To change the protected prefix, edit `PROTECTED_PREFIX` in `src/mcp_databricks/guardrails.py`.
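A minimal sketch of the per-component check. The real logic lives in `src/mcp_databricks/guardrails.py`; this simplified version skips SQL parsing entirely and only shows the prefix test over a dot-separated qualified name:

```python
PROTECTED_PREFIX = "pro_"

def has_protected_component(qualified_name: str) -> bool:
    # Every dot-separated component of the qualified name is checked,
    # so a protected catalog, schema, or table name all trigger the block.
    parts = qualified_name.split(".")
    return any(p.strip("`").lower().startswith(PROTECTED_PREFIX) for p in parts)
```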
`run_sql` arguments:
- `sql`: SQL statement to execute
- `catalog`: optional catalog override
- `schema`: optional schema override
- `timeout_seconds`: optional per-call timeout override
- `poll_interval_seconds`: optional per-call polling override
Notes:
- Databricks Statement Execution API accepts one statement per request.
- The submit request is detached from execution because the server sends `wait_timeout=0s`.
- Inserted row counts are not reliably present in the immediate execution response, so the server performs a best-effort query-history lookup after successful execution.
- If query history has not indexed the statement yet, `inserted_row_count` will be `null`.
- For successful read queries, `query_result` returns the first inline result chunk in `JSON_ARRAY` format. `query_result.rows` is capped at 1000 rows; larger or multi-chunk results are flagged with `query_result.truncated = true`.
- For 1x1 results such as `SELECT COUNT(*)`, the value is also exposed as `query_result.scalar_value`.
- If polling times out, the tool still returns `statement_id` plus the last known status so you can resume later with `get_statement`.
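For example, a caller that hits the polling timeout might stash the `statement_id` and check back later. The payload below is invented, following the fields documented above:

```python
# Hypothetical run_sql response after the polling window expired.
response = {
    "statement_id": "01ef-1111-bbbb",
    "status": "RUNNING",
    "is_terminal": False,
    "timed_out": True,
}

# Keep the statement_id so a later get_statement call can pick up
# where run_sql left off.
pending_statement_id = None
if response["timed_out"] and not response["is_terminal"]:
    pending_statement_id = response["statement_id"]
```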
`get_statement` arguments:
- `statement_id`: Databricks statement execution ID returned by `run_sql`
- `timeout_seconds`: optional extra time to keep polling for completion
- `poll_interval_seconds`: optional polling override when `timeout_seconds` is set
Notes:
- This tool checks statement status later by `statement_id`; the original submit connection does not need to stay open.
- If the statement has succeeded, the tool returns the same small inline `query_result` shape as `run_sql` when Databricks includes result data.
- If the statement is still running and the polling window expires, it returns `timed_out = true` with the last known status.
- If the statement failed or was canceled, the tool returns the status and `error_message` instead of hiding the outcome behind a timeout.
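The branching above can be summarized in a small client-side helper. This is a sketch only, assuming the field names documented for this server:

```python
def summarize(payload: dict) -> str:
    """Collapse a get_statement payload into a short outcome string."""
    if payload.get("timed_out"):
        return "pending"  # still running; call get_statement again later
    if payload.get("status") == "SUCCEEDED":
        return "succeeded"
    # Failed or canceled: surface the error rather than hiding it.
    return payload.get("error_message") or "failed"
```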