WAL-E
Well-Architected Lakehouse Evaluator
"It only takes a moment..." to assess your entire Databricks Lakehouse.
WAL-E is an open-source assessment tool that automatically evaluates a Databricks workspace against the Well-Architected Lakehouse Framework. It turns a week-long manual assessment into a 15-minute automated scan.
WAL-E is designed to be run by the customer on their own system, with a Databricks Solutions Architect guiding them through every step. No tokens, credentials, or data ever leave the customer's environment.
WAL-E is grounded in Databricks’ standard best practices, which are publicly documented and maintained alongside the product. Use the best-practices hub for your cloud (same catalog of cheat sheets and articles; Azure is hosted on Microsoft Learn):
- AWS: Best practice articles
- Azure: Best practice articles (Databricks on Azure documentation on Microsoft Learn)
- Google Cloud: Best practice articles
The assessment’s scoring model is the Well-Architected Lakehouse Framework (seven pillars, evidence-backed checks). Cloud-specific entry points: AWS · Azure · Google Cloud. Together, the best-practices catalog and the Well-Architected framework define what WAL-E evaluates and why.
               +--------------------+
               |  Customer (You)    |
               |  SA guides via     |
               |  screen share      |
               +--------+-----------+
                        |
               +--------v-----------+
               |  WAL-E Agent       |
               |  (runs on YOUR     |
               |  machine only)     |
               +--------+-----------+
                        |
         +--------------+--------------+
         |              |              |
+--------v---+   +------v------+  +----v-------+
| Collectors |   |   Scoring   |  | Reporters  |
| (21 APIs)  |   |   Engine    |  | (5 formats)|
| read-only  |   | 129 checks  |  | stays local|
+--------+---+   +------+------+  +----+-------+
         |              |              |
+--------v--------------v--------------v-------+
|          Your Databricks Workspace           |
|  (Unity Catalog, Clusters, Jobs, Security)   |
+----------------------------------------------+
- Customer-Run - Runs entirely on the customer's machine; no token sharing, no credential handoff
- SA-Guided - Your Databricks SA walks you through every step via screen share or call
- Automated Data Collection - 23+ read-only API/CLI queries across governance, security, compute, operations, and cost
- Cloud-Aware - Auto-detects AWS, Azure, or GCP and tailors all scoring and recommendations
- 140 Best Practices - 129 standard + 11 deep scan, scored 0-2 across 7 pillars
- Multiple Output Formats - Markdown, executive deck (PPTX), scored CSV, HTML presentation, and full audit trail
- Zero Workspace Modification - 100% read-only; no writes, no side effects
- AI-Native - Works as a Cursor skill, Claude Code skill, or MCP tool
| Concern | How WAL-E Handles It |
|---|---|
| Who runs it? | The customer runs WAL-E on their own machine |
| Token sharing? | None — the customer authenticates locally (PAT token or OAuth browser login) |
| Where do results go? | Stays on the customer's machine — nothing is transmitted externally |
| What does the SA see? | Only what the customer chooses to share (e.g., via screen share or sending the report) |
| What does WAL-E access? | Metadata only — never reads table data, file contents, or secret values |
| How to clean up? | Customer revokes their PAT token (or OAuth session expires automatically) and deletes local results |
Your Databricks SA will guide you through these steps on a call or screen share.
Prerequisites:
- Python 3.10+
- Databricks CLI v0.200+ configured with workspace access
- Workspace admin access (recommended for full assessment)
# Clone the repo (public — no authentication required)
git clone https://github.com/priyal-c/wal-e.git
cd wal-e
# Install
pip install -e .
# Or use the quick installer
./install.sh --cli
Option A: OAuth (recommended). No token to manage; the session expires automatically:
databricks auth login --host https://YOUR-WORKSPACE.cloud.databricks.com \
--profile wal-assessment
# Opens a browser window; log in with your workspace credentials
Option B: PAT token, if OAuth isn't available in your environment:
databricks configure --profile wal-assessment \
--host https://YOUR-WORKSPACE.cloud.databricks.com \
--token
# When prompted, paste a PAT token you created as workspace admin
# (Settings > Developer > Access tokens > Generate; set lifetime to 1 day)
# Verify connectivity before running the full assessment
wal-e validate --profile wal-assessment
# Interactive mode (recommended; your SA will walk you through each step)
wal-e assess --profile wal-assessment --interactive
# Or quick scan (generates all reports automatically)
wal-e assess --profile wal-assessment --output ./my-assessment --format all
Your SA will help you interpret the results. Share the output folder, or walk through it together on a screen share.
The assessment generates these files in the output directory:
| File | Description |
|---|---|
| WAL_Assessment_Readout.md | Detailed assessment report (all 7 pillars) |
| WAL_Assessment_Scores.csv | 140 best practices with scores and notes |
| WAL_Assessment_Presentation.pptx | Executive readout deck (importable to Google Slides) |
| WAL_Assessment_Remediation_Guide.docx | Detailed remediation instructions with cloud-specific doc links |
| WAL_Assessment_Audit_Report.md | Complete evidence trail of all API calls |
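Before sharing the output folder, it can help to confirm that the expected reports were actually written. The following is a minimal sketch, not part of WAL-E: the file names are taken from the table above, while the `missing_reports` helper is a hypothetical name introduced for illustration.

```python
from pathlib import Path

# File names copied from the output table; the helper below is hypothetical.
EXPECTED_REPORTS = (
    "WAL_Assessment_Readout.md",
    "WAL_Assessment_Scores.csv",
    "WAL_Assessment_Presentation.pptx",
    "WAL_Assessment_Remediation_Guide.docx",
    "WAL_Assessment_Audit_Report.md",
)


def missing_reports(output_dir: str) -> list[str]:
    """Return the expected report files that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_REPORTS if not (root / name).is_file()]
```

Calling `missing_reports("./my-assessment")` returns an empty list when every report is present. If only some formats were requested with `--format`, a non-empty result is expected rather than an error.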
WAL-E reports two key metrics for each pillar and overall:
| Metric | What It Means |
|---|---|
| Verified Score | The assessment score calculated only from best practices where WAL-E had enough data to make a real determination. A score of 0 (not implemented) or 2 (fully implemented) is always verified. A score of 1 is verified only when WAL-E found real evidence (e.g., "no cluster policies found"), not when it defaulted to "cannot verify from API." |
| Coverage | The percentage of best practices where WAL-E had real evidence to score. Higher coverage means more confidence in the verified score. Use --deep mode to increase coverage by querying system tables. |
Example reading:
Performance Efficiency  █████████████░░  89%              64%
                                         ^verified score  ^coverage
This means: of the performance BPs that WAL-E could verify (64%), the workspace scores 89%. The remaining 36% need --deep scan or manual verification.
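The relationship between the two metrics can be sketched in a few lines. This is an illustration under stated assumptions, not WAL-E's actual scoring code: the `Check` class, the `has_evidence` flag, and the formula (sum of verified scores divided by the maximum of 2 per verified check) are assumptions inferred from the definitions above.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One scored best practice (hypothetical structure for illustration)."""
    score: int          # 0 = not implemented, 1 = partial, 2 = fully implemented
    has_evidence: bool  # False when a 1 was a "cannot verify from API" default


def is_verified(check: Check) -> bool:
    # Scores of 0 and 2 always count as verified; a 1 counts only with evidence.
    return check.score in (0, 2) or check.has_evidence


def summarize(checks: list[Check]) -> tuple[float, float]:
    """Return (verified_score_pct, coverage_pct) for one pillar."""
    if not checks:
        return 0.0, 0.0
    verified = [c for c in checks if is_verified(c)]
    coverage = 100.0 * len(verified) / len(checks)
    if not verified:
        return 0.0, coverage
    # Assumed formula: points earned over the maximum of 2 per verified check.
    verified_score = 100.0 * sum(c.score for c in verified) / (2 * len(verified))
    return verified_score, coverage
```

For example, five checks scored 2, 2, 1 (with evidence), 1 (defaulted), and 0 give four verified checks: a coverage of 80% and a verified score of 5 out of 8 points, or 62.5%.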
Maturity level is derived from the verified score:
| Verified Score | Maturity Level |
|---|---|
| 88-100% | Optimized |
| 63-87% | Established |
| 25-62% | Developing |
| 0-24% | Beginning |
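The table above translates directly into a threshold lookup. A minimal sketch follows; the function name `maturity_level` is hypothetical, and treating fractional scores such as 87.5% as belonging to the lower band is an assumption, since the table only lists whole percentages.

```python
def maturity_level(verified_score_pct: float) -> str:
    """Map a verified score (0-100) to the maturity level from the table."""
    if not 0 <= verified_score_pct <= 100:
        raise ValueError("verified score must be between 0 and 100")
    if verified_score_pct >= 88:
        return "Optimized"
    if verified_score_pct >= 63:
        return "Established"
    if verified_score_pct >= 25:
        return "Developing"
    return "Beginning"
```

With the earlier example reading, a verified score of 89% for Performance Efficiency maps to "Optimized".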
# If you used a PAT token, revoke it immediately after the assessment:
# Workspace > Settings > Developer > Access tokens > Revoke
# (If you used OAuth, your session expires automatically — no action needed)
# Delete the CLI profile:
# Edit ~/.databrickscfg and remove the [wal-assessment] section
# Optionally delete local assessment files after sharing with your SA
As the SA, you don't need access to the customer's workspace. Your role is to guide them through the process.
1. Pre-Call Setup
- Send the customer the Quick Reference Card (see ACCESS_GUIDE.md)
- Ask them to install Python 3.10+ and Databricks CLI before the call
- Schedule a 30-minute screen share session
2. On the Call (Customer shares their screen)
- Guide them through 'git clone https://github.com/priyal-c/wal-e.git' and pip install
- Walk them through 'databricks configure' with their own workspace URL
- Have them authenticate via OAuth ('databricks auth login') or create a short-lived PAT token (1 day lifetime)
- Run 'wal-e validate' to confirm access
- Run 'wal-e assess --interactive' together
3. Post-Assessment
- Ask the customer to share the output folder (or screen share the results)
- Walk through the readout deck together
- Discuss findings and remediation priorities
- Customer revokes their PAT token (or confirms OAuth session will expire)
# Print the customer-facing setup guide (share your screen or send the output)
wal-e setup --guide
# Specify output formats
wal-e assess --format pptx --format html --format csv
# Set a custom timeout (seconds, default: 600, use 0 for no limit)
wal-e assess --timeout 0
# Run in background (useful inside AI coding tools)
wal-e assess --run-in-background --output ./assessment-results
# Re-generate reports from cached assessment data
wal-e report --input ./my-assessment --format all
The standard assessment uses 21 read-only API calls. For a deeper analysis, WAL-E can also query Databricks system tables to assess operational reality: actual cost trends, cluster idle time, query failure rates, job success rates, and security audit events.
# Deep scan requires a running SQL warehouse and SELECT grants on system.* schemas
wal-e assess --profile wal-assessment --deep --warehouse-id <YOUR_WAREHOUSE_ID>
Deep scan adds 11 best practices (140 total) covering:
| Area | What it reveals | System Table |
|---|---|---|
| Cost | Idle cluster waste, DBU spend trends, concentration risk | system.billing.usage, system.compute.clusters |
| Performance | Query failure rate, slow query prevalence, warehouse utilization | system.query.history |
| Reliability | Job success rate, recurring job failures | system.lakeflow.job_run_timeline |
| Security | Failed login monitoring, permission change audit | system.access.audit |
| Operations | Cluster utilization efficiency (24/7 clusters) | system.compute.clusters |
Prerequisites for deep scan:
-- Customer's account admin runs these in a SQL warehouse:
GRANT SELECT ON SCHEMA system.billing TO `your-admin-user@company.com`;
GRANT SELECT ON SCHEMA system.compute TO `your-admin-user@company.com`;
GRANT SELECT ON SCHEMA system.query TO `your-admin-user@company.com`;
GRANT SELECT ON SCHEMA system.access TO `your-admin-user@company.com`;
-- Needed for system.lakeflow.job_run_timeline (Reliability row in the table above):
GRANT SELECT ON SCHEMA system.lakeflow TO `your-admin-user@company.com`;
Without --deep, the 11 system-table best practices score as "partial", with a note explaining that a deep scan is needed. The standard assessment therefore still works with API access alone.
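When several users need deep-scan access, the GRANT statements from the prerequisites can be generated rather than typed by hand. The sketch below is illustrative only: `grant_statements` is a hypothetical helper, and the schema list is an assumption derived from the System Table column of the deep-scan table.

```python
# Schemas inferred from the "System Table" column of the deep-scan table.
SYSTEM_SCHEMAS = ("billing", "compute", "query", "lakeflow", "access")


def grant_statements(principal: str, schemas=SYSTEM_SCHEMAS) -> list[str]:
    """Build one GRANT SELECT statement per system schema for a principal."""
    if "`" in principal:
        # Backticks delimit the principal, so they cannot appear inside it.
        raise ValueError("principal must not contain backticks")
    return [
        f"GRANT SELECT ON SCHEMA system.{schema} TO `{principal}`;"
        for schema in schemas
    ]


if __name__ == "__main__":
    print("\n".join(grant_statements("your-admin-user@company.com")))
```

The output is plain SQL text for the account admin to review and run in a SQL warehouse; the script itself does not connect to the workspace.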
WAL-E integrates natively with the Databricks AI Dev Kit.
./install.sh --cursor
Then in Cursor Agent, ask:
"Run a Well-Architected Lakehouse assessment on my workspace"
./install.sh --claude
Then ask naturally in Claude Code (no slash command):
"Run a WAL-E assessment on my Databricks workspace and generate a readout deck"
Claude Code Timeout: Claude Code's Bash tool has a max timeout of 10 minutes. Use wal-e assess --timeout 0 for no limit, or --run-in-background for async execution.
# Use the installer
./install.sh --mcp
# Or register manually
claude mcp add-json wal-e '{"command": "python3", "args": ["'"$(pwd)"'/mcp/server.py"]}'
Available MCP tools: wal_e_assess, wal_e_collect, wal_e_score, wal_e_report, wal_e_validate
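The JSON argument for the manual registration command is easy to get wrong by hand because of the nested shell quoting. As a hedged alternative, the same structure can be produced with Python's `json` module. `mcp_server_config` is a hypothetical helper; the `command` and `args` keys mirror the manual command above.

```python
import json
from pathlib import Path


def mcp_server_config(repo_root: str) -> str:
    """Return the JSON config string pointing at mcp/server.py in the repo."""
    server = Path(repo_root).resolve() / "mcp" / "server.py"
    # json.dumps handles escaping of spaces, quotes, and backslashes in the path.
    return json.dumps({"command": "python3", "args": [str(server)]})


if __name__ == "__main__":
    # Run from the wal-e checkout; prints the JSON to pass to `claude mcp add-json`.
    print(mcp_server_config("."))
```

The printed string is what goes in the JSON position of `claude mcp add-json wal-e '<json>'`.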
Full guide: See ACCESS_GUIDE.md for the complete self-service setup guide, permissions reference, and customer-facing instructions.
WAL-E needs read-only access to the workspace. It makes 21 HTTP GET API calls and zero write calls.
| Access Level | What You Get | Coverage |
|---|---|---|
| Regular user | Own clusters, permitted catalogs, own jobs | ~40% |
| Workspace admin | All clusters, warehouses, security config, all jobs | ~80% |
| Workspace admin + Metastore admin | Above + all catalogs, credentials, locations | ~95% |
| Above + System tables | Above + billing, audit, query history | 100% |
Recommended: Workspace admin + Metastore admin for a meaningful assessment.
WAL-E does not:
- Read table data, file contents, or query results
- Execute notebooks, jobs, or pipelines
- Create, modify, or delete any resource
- Start or stop any cluster or warehouse
- Access secret values (only scope names)
- Transmit data to any external service
               +------------------+
               |     Customer     |
               |  (runs on their  |
               |   own machine)   |
               +--------+---------+
                        |
               +--------v---------+
               |   WAL-E Agent    |
               |  (CLI / Skill /  |
               |   MCP Server)    |
               +--------+---------+
                        |
         +--------------+--------------+
         |              |              |
+--------v---+   +------v------+  +----v-------+
| Collectors |   |   Scoring   |  | Reporters  |
| (23+ APIs) |   |   Engine    |  | (5 formats)|
+--------+---+   +------+------+  +----+-------+
         |              |              |
+--------v--------------v--------------v-------+
|       Customer's Databricks Workspace        |
|  (Unity Catalog, Clusters, Jobs, Security)   |
+----------------------------------------------+
| Component (path) | Description |
|---|---|
| src/wal_e/collectors/ | Data collection modules for each assessment area |
| src/wal_e/framework/ | WAL pillar definitions, best practices, scoring logic |
| src/wal_e/reporters/ | Report generators (MD, CSV, HTML, PPTX, Audit) |
| src/wal_e/core/ | Orchestration engine, config, cloud detection |
| mcp/ | MCP server for AI Dev Kit integration |
| # | Pillar | Best Practices | Focus Areas |
|---|---|---|---|
| 1 | Data & AI Governance | 15 | Unity Catalog, metadata, lineage, data quality |
| 2 | Interoperability & Usability | 14 | Open formats, IaC, serverless, self-service |
| 3 | Operational Excellence | 24 | CI/CD, MLOps, monitoring, cluster utilization* |
| 4 | Security, Compliance & Privacy | 14 | IAM, SSO/SCIM, encryption, login audit*, permissions* |
| 5 | Reliability | 21 | ACID, auto-scaling, DR, job success rate*, recurring failures* |
| 6 | Performance Efficiency | 28 | Serverless, data layout, query failure rate*, slow queries* |
| 7 | Cost Optimization | 23 | Spot/preemptible, idle waste*, cost trends*, concentration* |
| | Total | 140 | * = deep scan (system tables) |
We welcome contributions! See CONTRIBUTING.md for guidelines.
Contribution ideas:
- Additional collectors (e.g., Databricks Apps, Clean Rooms, Marketplace)
- Custom scoring profiles per industry vertical
- Additional output formats (PDF, Notion, Confluence)
- System table query templates
(c) 2026 Databricks, Inc. All rights reserved. See LICENSE.md for details.