WAL-E: Well-Architected Lakehouse Evaluator

 __        __           ||       |_____
 \ \      / /    / \    ||        | ____|
  \ \ /\ / /    / _ \   ||        | |__
   \ V  V /    / ___ \  ||        |  __|
    \_/\_/    /_/   \_\ ||____    |_|____
    Well-Architected Lakehouse Evaluator

"It only takes a moment..." to assess your entire Databricks Lakehouse.

What is WAL-E?

WAL-E is an open-source assessment tool that automatically evaluates a Databricks workspace against the Well-Architected Lakehouse Framework. It turns a week-long manual assessment into a 15-minute automated scan.

WAL-E is designed to be run by the customer on their own system, with a Databricks Solutions Architect guiding them through every step. No tokens, credentials, or data ever leave the customer's environment.

Guiding principles

WAL-E is grounded in Databricks’ standard best practices, which are publicly documented and maintained alongside the product. Use the best-practices hub for your cloud (same catalog of cheat sheets and articles; Azure is hosted on Microsoft Learn):

AWS: Best practice articles
Azure: Best practice articles (Databricks on Azure documentation on Microsoft Learn)
Google Cloud: Best practice articles

The assessment’s scoring model is the Well-Architected Lakehouse Framework (seven pillars, evidence-backed checks). Cloud-specific entry points: AWS · Azure · Google Cloud. Together, the best-practices catalog and the Well-Architected framework define what WAL-E evaluates and why.

How It Works

                    +--------------------+
                    |   Customer (You)   |
                    |  SA guides via     |
                    |  screen share      |
                    +--------+-----------+
                             |
                    +--------v-----------+
                    |    WAL-E Agent     |
                    |  (runs on YOUR     |
                    |   machine only)    |
                    +--------+-----------+
                             |
              +--------------+--------------+
              |              |              |
     +--------v----+  +------v------+  +----v-------+
     | Collectors  |  |  Scoring    |  | Reporters  |
     | (21 APIs)   |  |  Engine     |  | (5 formats)|
     | read-only   |  | 129 checks  |  | stays local|
     +--------+----+  +------+------+  +----+-------+
              |              |              |
     +--------v--------------v--------------v-------+
     |         Your Databricks Workspace            |
     |  (Unity Catalog, Clusters, Jobs, Security)   |
     +----------------------------------------------+

Key Features

Customer-Run - Runs entirely on the customer's machine; no token sharing, no credential handoff
SA-Guided - Your Databricks SA walks you through every step via screen share or call
Automated Data Collection - 23+ read-only API/CLI queries across governance, security, compute, operations, and cost
Cloud-Aware - Auto-detects AWS, Azure, or GCP and tailors all scoring and recommendations
140 Best Practices - 129 standard + 11 deep scan, scored 0-2 across 7 pillars
Multiple Output Formats - Markdown, executive deck (PPTX), scored CSV, HTML presentation, and full audit trail
Zero Workspace Modification - 100% read-only; no writes, no side effects
AI-Native - Works as a Cursor skill, Claude Code skill, or MCP tool
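
The README does not say how cloud auto-detection works. As a minimal sketch, one plausible heuristic is to look at the workspace hostname suffix. The function name `detect_cloud` and the suffix table below are assumptions for illustration, not WAL-E's actual implementation.

```python
from urllib.parse import urlparse

# Hostname suffixes commonly seen on Databricks workspace URLs.
# Assumption for illustration only; WAL-E may detect the cloud differently.
_CLOUD_SUFFIXES = {
    ".azuredatabricks.net": "azure",
    ".gcp.databricks.com": "gcp",
    ".cloud.databricks.com": "aws",
}


def detect_cloud(workspace_url: str) -> str:
    """Guess the cloud from a workspace URL; return 'unknown' if nothing matches."""
    host = urlparse(workspace_url).hostname or ""
    for suffix, cloud in _CLOUD_SUFFIXES.items():
        if host.endswith(suffix):
            return cloud
    return "unknown"


print(detect_cloud("https://YOUR-WORKSPACE.cloud.databricks.com"))  # aws
```

Returning "unknown" instead of raising lets a caller fall back to cloud-neutral recommendations when the URL uses a custom domain.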

Security Model

Concern	How WAL-E Handles It
Who runs it?	The customer runs WAL-E on their own machine
Token sharing?	None — the customer authenticates locally (PAT token or OAuth browser login)
Where do results go?	They stay on the customer's machine; nothing is transmitted externally
What does the SA see?	Only what the customer chooses to share (e.g., via screen share or sending the report)
What does WAL-E access?	Metadata only — never reads table data, file contents, or secret values
How to clean up?	Customer revokes their PAT token (or OAuth session expires automatically) and deletes local results

Quick Start (For Customers)

Your Databricks SA will guide you through these steps on a call or screen share.

Prerequisites

Python 3.10+
Databricks CLI v0.200+ configured with workspace access
Workspace admin access (recommended for full assessment)

Step 1: Install WAL-E

# Clone the repo (public — no authentication required)
git clone https://github.com/priyal-c/wal-e.git
cd wal-e

# Install
pip install -e .

# Or use the quick installer
./install.sh --cli

Step 2: Configure Workspace Access

Option A: OAuth (recommended) — no token to manage, session expires automatically:

databricks auth login --host https://YOUR-WORKSPACE.cloud.databricks.com \
  --profile wal-assessment
# Opens a browser window — log in with your workspace credentials

Option B: PAT token — if OAuth isn't available in your environment:

databricks configure --profile wal-assessment \
  --host https://YOUR-WORKSPACE.cloud.databricks.com \
  --token
# When prompted, paste a PAT token you created as workspace admin
# (Settings > Developer > Access tokens > Generate — set lifetime to 1 day)

Step 3: Validate Access

# Verify connectivity before running the full assessment
wal-e validate --profile wal-assessment
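
If `wal-e validate` cannot find the profile, it can help to confirm that Step 2 actually wrote it. The sketch below reads a Databricks CLI config file with Python's standard `configparser` and returns the host stored for a profile. The helper name `profile_host` is hypothetical, and the demo uses a temporary file rather than your real `~/.databrickscfg`.

```python
import configparser
import tempfile
from pathlib import Path
from typing import Optional


def profile_host(cfg_path: Path, profile: str) -> Optional[str]:
    """Return the host configured for `profile`, or None if it is missing."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # a missing file simply yields an empty config
    if not parser.has_section(profile):
        return None
    return parser.get(profile, "host", fallback=None)


# Demo against a throwaway config file (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    demo_cfg = Path(tmp) / ".databrickscfg"
    demo_cfg.write_text(
        "[wal-assessment]\nhost = https://YOUR-WORKSPACE.cloud.databricks.com\n"
    )
    found_host = profile_host(demo_cfg, "wal-assessment")
    missing_host = profile_host(demo_cfg, "no-such-profile")

print(found_host, missing_host)
```

To check the real file, pass `Path.home() / ".databrickscfg"` as `cfg_path`.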

Step 4: Run the Assessment

# Interactive mode (recommended — your SA will walk you through each step)
wal-e assess --profile wal-assessment --interactive

# Or quick scan (generates all reports automatically)
wal-e assess --profile wal-assessment --output ./my-assessment --format all

Step 5: Review Results with Your SA

Your SA will help you interpret the results. Share the output folder, or walk through it together on a screen share.

The assessment generates these files in the output directory:

File	Description
`WAL_Assessment_Readout.md`	Detailed assessment report (all 7 pillars)
`WAL_Assessment_Scores.csv`	140 best practices with scores and notes
`WAL_Assessment_Presentation.pptx`	Executive readout deck (importable to Google Slides)
`WAL_Assessment_Remediation_Guide.docx`	Detailed remediation instructions with cloud-specific doc links
`WAL_Assessment_Audit_Report.md`	Complete evidence trail of all API calls
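
Before sharing the folder, a quick check for missing files can save a round trip. This is a small sketch that uses the file names from the table above. Which files are actually produced presumably depends on the `--format` options chosen, so treat a missing file as a prompt to check, not necessarily as an error. The helper name `missing_reports` is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List

# File names taken from the table above.
EXPECTED_FILES = [
    "WAL_Assessment_Readout.md",
    "WAL_Assessment_Scores.csv",
    "WAL_Assessment_Presentation.pptx",
    "WAL_Assessment_Remediation_Guide.docx",
    "WAL_Assessment_Audit_Report.md",
]


def missing_reports(output_dir: Path) -> List[str]:
    """List the expected report files that are not present in output_dir."""
    return [name for name in EXPECTED_FILES if not (output_dir / name).is_file()]


# Demo: a temporary folder that contains only two of the five files.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "WAL_Assessment_Readout.md").write_text("# demo\n")
    (out / "WAL_Assessment_Scores.csv").write_text("id,score\n")
    still_missing = missing_reports(out)

print(still_missing)
```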

Understanding the Scores

WAL-E reports two key metrics for each pillar and overall:

Metric	What It Means
Verified Score	The assessment score calculated only from best practices where WAL-E had enough data to make a real determination. A score of 0 (not implemented) or 2 (fully implemented) is always verified. A score of 1 is verified only when WAL-E found real evidence (e.g., "no cluster policies found"), not when it defaulted to "cannot verify from API."
Coverage	The percentage of best practices where WAL-E had real evidence to score. Higher coverage means more confidence in the verified score. Use `--deep` mode to increase coverage by querying system tables.

Example reading:

Performance Efficiency     █████████████░░  89%    64%
                           ^verified score         ^coverage

This means: of the Performance Efficiency best practices (BPs) that WAL-E could verify (64%), the workspace scores 89%. The remaining 36% need a --deep scan or manual verification.
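
The two metrics can be sketched in a few lines of Python. The verification rule follows the definitions above (0 and 2 are always verified; 1 only with real evidence). The exact arithmetic is an assumption: this sketch takes the verified score as points earned over the maximum possible (2 per verified BP), which may differ from WAL-E's own formula. The names `Check` and `pillar_metrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Check:
    name: str
    score: int          # 0 = not implemented, 1 = partial, 2 = fully implemented
    has_evidence: bool  # only matters when score == 1


def is_verified(check: Check) -> bool:
    """0 and 2 are always verified; 1 counts only when backed by evidence."""
    return check.score in (0, 2) or check.has_evidence


def pillar_metrics(checks: List[Check]) -> Tuple[float, float]:
    """Return (verified_score_pct, coverage_pct) for one pillar."""
    if not checks:
        return 0.0, 0.0
    verified = [c for c in checks if is_verified(c)]
    coverage = 100 * len(verified) / len(checks)
    if not verified:
        return 0.0, coverage
    # Assumed formula: points earned over the maximum of 2 per verified BP.
    verified_score = 100 * sum(c.score for c in verified) / (2 * len(verified))
    return verified_score, coverage


demo = [
    Check("cluster policies", 2, True),
    Check("auto-termination", 0, False),
    Check("photon enabled", 1, True),    # partial, with evidence
    Check("query tuning", 1, False),     # could not verify from API
]
print(pillar_metrics(demo))  # (50.0, 75.0)
```

In the demo, three of four BPs are verified (75% coverage) and they earn 3 of 6 possible points (50% verified score); the unverified BP affects coverage only.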

Maturity level is derived from the verified score:

Verified Score	Maturity Level
88-100%	Optimized
63-87%	Established
25-62%	Developing
0-24%	Beginning
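
The mapping above translates directly into a lookup. This sketch treats each band's lower bound as the threshold, so a fractional score such as 87.5% falls into "Established". That rounding behavior is an assumption, since the table lists whole percentages only.

```python
def maturity_level(verified_score: float) -> str:
    """Map a verified score (0-100) to the maturity level from the table."""
    if verified_score >= 88:
        return "Optimized"
    if verified_score >= 63:
        return "Established"
    if verified_score >= 25:
        return "Developing"
    return "Beginning"


print(maturity_level(89))  # Optimized
```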

Step 6: Clean Up

# If you used a PAT token, revoke it immediately after the assessment:
# Workspace > Settings > Developer > Access tokens > Revoke
# (If you used OAuth, your session expires automatically — no action needed)

# Delete the CLI profile:
# Edit ~/.databrickscfg and remove the [wal-assessment] section

# Optionally delete local assessment files after sharing with your SA
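
If you would rather not edit the config file by hand, the profile removal can be scripted with Python's standard `configparser`. This is a sketch, not part of WAL-E; the helper name `remove_profile` is hypothetical. Note that `configparser` rewrites the whole file and drops any comments in it, so back up the file first if that matters. The demo runs against a temporary file.

```python
import configparser
import tempfile
from pathlib import Path


def remove_profile(cfg_path: Path, profile: str) -> bool:
    """Delete one [profile] section from a CLI config file.

    Returns True if the section existed and was removed.
    Caution: rewriting the file discards comments.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read(cfg_path)
    if not parser.remove_section(profile):
        return False
    with open(cfg_path, "w") as fh:
        parser.write(fh)
    return True


# Demo against a throwaway file with two hypothetical profiles.
with tempfile.TemporaryDirectory() as tmp:
    demo_cfg = Path(tmp) / ".databrickscfg"
    demo_cfg.write_text(
        "[prod]\nhost = https://prod.cloud.databricks.com\n\n"
        "[wal-assessment]\nhost = https://YOUR-WORKSPACE.cloud.databricks.com\n"
    )
    removed = remove_profile(demo_cfg, "wal-assessment")
    remaining = demo_cfg.read_text()

print(removed)
```

For the real cleanup, pass `Path.home() / ".databrickscfg"` as `cfg_path`.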

For SAs: Guiding the Customer

As the SA, you don't need access to the customer's workspace. Your role is to guide them through the process.

SA Workflow

1. Pre-Call Setup
   - Send the customer the Quick Reference Card (see ACCESS_GUIDE.md)
   - Ask them to install Python 3.10+ and Databricks CLI before the call
   - Schedule a 30-minute screen share session

2. On the Call (Customer shares their screen)
   - Guide them through 'git clone https://github.com/priyal-c/wal-e.git' and pip install
   - Walk them through 'databricks configure' with their own workspace URL
   - Have them authenticate via OAuth ('databricks auth login') or create a short-lived PAT token (1 day lifetime)
   - Run 'wal-e validate' to confirm access
   - Run 'wal-e assess --interactive' together

3. Post-Assessment
   - Ask the customer to share the output folder (or screen share the results)
   - Walk through the readout deck together
   - Discuss findings and remediation priorities
   - Customer revokes their PAT token (or confirms OAuth session will expire)

Showing the Setup Guide to Customers

# Print the customer-facing setup guide (share your screen or send the output)
wal-e setup --guide

Advanced CLI Options

# Specify output formats
wal-e assess --format pptx --format html --format csv

# Set a custom timeout (seconds, default: 600, use 0 for no limit)
wal-e assess --timeout 0

# Run in background (useful inside AI coding tools)
wal-e assess --run-in-background --output ./assessment-results

# Re-generate reports from cached assessment data
wal-e report --input ./my-assessment --format all
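
When scripting repeated runs, it can be convenient to assemble the command line programmatically. The sketch below builds an argument list from the flags shown in this README (`--profile`, `--output`, `--format`, `--timeout`, `--run-in-background`). It assumes these flags can be combined freely, which the README does not state explicitly. The function `build_assess_command` is hypothetical.

```python
import shlex
from typing import List, Optional, Sequence


def build_assess_command(
    profile: str,
    output: Optional[str] = None,
    formats: Sequence[str] = (),
    timeout: Optional[int] = None,
    background: bool = False,
) -> List[str]:
    """Build the argv list for `wal-e assess` from flags documented above."""
    cmd = ["wal-e", "assess", "--profile", profile]
    if output:
        cmd += ["--output", output]
    for fmt in formats:
        cmd += ["--format", fmt]  # the flag is repeated once per format
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]  # 0 means no limit
    if background:
        cmd.append("--run-in-background")
    return cmd


argv = build_assess_command(
    "wal-assessment", output="./my-assessment", formats=["pptx", "csv"], timeout=0
)
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`; using a list rather than a shell string avoids quoting problems in paths.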

Deep Scan (System Tables)

The standard assessment uses 21 read-only API calls. For a deeper analysis, WAL-E can also query Databricks system tables to assess operational reality — actual cost trends, cluster idle time, query failure rates, job success rates, and security audit events.

# Deep scan requires a running SQL warehouse and SELECT grants on system.* schemas
wal-e assess --profile wal-assessment --deep --warehouse-id <YOUR_WAREHOUSE_ID>

Deep scan adds 11 additional best practices (140 total) covering:

Area	What it reveals	System Table
Cost	Idle cluster waste, DBU spend trends, concentration risk	`system.billing.usage`, `system.compute.clusters`
Performance	Query failure rate, slow query prevalence, warehouse utilization	`system.query.history`
Reliability	Job success rate, recurring job failures	`system.lakeflow.job_run_timeline`
Security	Failed login monitoring, permission change audit	`system.access.audit`
Operations	Cluster utilization efficiency (24/7 clusters)	`system.compute.clusters`

Prerequisites for deep scan:

-- Customer's account admin runs these in a SQL warehouse:
GRANT SELECT ON SCHEMA system.billing TO `your-admin-user@company.com`;
GRANT SELECT ON SCHEMA system.compute TO `your-admin-user@company.com`;
GRANT SELECT ON SCHEMA system.query TO `your-admin-user@company.com`;
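GRANT SELECT ON SCHEMA system.access TO `your-admin-user@company.com`;
-- Needed for system.lakeflow.job_run_timeline (Reliability checks in the table above):
GRANT SELECT ON SCHEMA system.lakeflow TO `your-admin-user@company.com`;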

Without --deep, the 11 system-table BPs score as "partial" with a note explaining that deep scan is needed. This way the standard assessment still completes using only the API.
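
The fallback described above can be sketched as follows. This is an illustration of the idea rather than WAL-E's code: the function name, the note text, and the 95% threshold in the demo are all hypothetical. Consistent with the scoring rules in "Understanding the Scores", the fallback score of 1 is marked as unverified.

```python
from typing import Callable, Optional, Tuple

DEEP_SCAN_NOTE = "Requires --deep (system tables) to verify"


def score_system_table_check(
    deep_results: Optional[dict],
    evaluate: Callable[[dict], int],
) -> Tuple[int, bool, str]:
    """Return (score, verified, note) for one system-table best practice.

    Without deep-scan data the check defaults to an unverified 'partial' (1).
    """
    if deep_results is None:
        return 1, False, DEEP_SCAN_NOTE
    return evaluate(deep_results), True, ""


def job_success_rule(results: dict) -> int:
    # Hypothetical rule: full marks at or above a 95% job success rate.
    return 2 if results["job_success_rate"] >= 0.95 else 1


standard_run = score_system_table_check(None, job_success_rule)
deep_run = score_system_table_check({"job_success_rate": 0.99}, job_success_rule)
print(standard_run, deep_run)
```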

Integration with AI Dev Kit

WAL-E integrates natively with the Databricks AI Dev Kit.

As a Cursor Skill

./install.sh --cursor

Then in Cursor Agent, ask:

"Run a Well-Architected Lakehouse assessment on my workspace"

As a Claude Code Skill

./install.sh --claude

Then ask naturally in Claude Code (no slash command):

"Run a WAL-E assessment on my Databricks workspace and generate a readout deck"

Claude Code Timeout: Claude Code's Bash tool has a max timeout of 10 minutes. Use wal-e assess --timeout 0 for no limit, or --run-in-background for async execution.

As an MCP Server

# Use the installer
./install.sh --mcp

# Or register manually
claude mcp add-json wal-e '{"command": "python3", "args": ["'$(pwd)'/mcp/server.py"]}'

Available MCP tools: wal_e_assess, wal_e_collect, wal_e_score, wal_e_report, wal_e_validate
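
The manual registration above splices `$(pwd)` into a JSON string, which can break if the path contains spaces or quotes. As a hedged alternative, the JSON payload can be generated with Python's `json` module so the path is escaped correctly. The helper name `mcp_server_config` is hypothetical; the `command` and `args` keys mirror the command shown above.

```python
import json
from pathlib import Path


def mcp_server_config(repo_root: str) -> str:
    """Return the JSON payload for registering the WAL-E MCP server."""
    server = Path(repo_root).resolve() / "mcp" / "server.py"
    return json.dumps({"command": "python3", "args": [str(server)]})


payload = mcp_server_config(".")
print(payload)
```

Run it from the repository root and pass the printed JSON, wrapped in single quotes, as the last argument to `claude mcp add-json wal-e`.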

Access Requirements

Full guide: See ACCESS_GUIDE.md for the complete self-service setup guide, permissions reference, and customer-facing instructions.

WAL-E needs read-only access to the workspace. It makes 21 HTTP GET API calls and zero write calls.
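
One way to enforce a GET-only guarantee in code is to route every request through a wrapper that refuses any other HTTP method and records each call for the audit trail. The sketch below shows the pattern with an injected transport function so it runs without network access. The class names are hypothetical and the endpoint paths are examples only; this is not taken from WAL-E's source.

```python
from typing import Callable, List


class ReadOnlyViolation(RuntimeError):
    """Raised when a caller attempts a non-GET request."""


class ReadOnlySession:
    """Wraps a transport function and allows only GET requests."""

    def __init__(self, send: Callable[[str, str], dict]) -> None:
        self._send = send
        self.audit_log: List[str] = []  # one entry per request actually sent

    def request(self, method: str, path: str) -> dict:
        verb = method.upper()
        if verb != "GET":
            raise ReadOnlyViolation(f"{verb} {path} blocked: session is read-only")
        self.audit_log.append(f"GET {path}")
        return self._send(verb, path)


def fake_send(method: str, path: str) -> dict:
    # Stand-in transport for the demo; a real one would call the REST API.
    return {"method": method, "path": path}


session = ReadOnlySession(fake_send)
listing = session.request("get", "/api/2.0/clusters/list")
try:
    session.request("POST", "/api/2.0/clusters/create")
    blocked = False
except ReadOnlyViolation:
    blocked = True

print(blocked, session.audit_log)
```

Because the check sits in a single choke point, a blocked request never reaches the transport and never appears in the audit log.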

Permissions by Assessment Depth

Access Level	What You Get	Coverage
Regular user	Own clusters, permitted catalogs, own jobs	~40%
Workspace admin	All clusters, warehouses, security config, all jobs	~80%
Workspace admin + Metastore admin	Above + all catalogs, credentials, locations	~95%
Above + System tables	Above + billing, audit, query history	100%

Recommended: Workspace admin + Metastore admin for a meaningful assessment.

What WAL-E Will NEVER Do

Read table data, file contents, or query results
Execute notebooks, jobs, or pipelines
Create, modify, or delete any resource
Start or stop any cluster or warehouse
Access secret values (only scope names)
Transmit data to any external service

Architecture

                    +------------------+
                    |     Customer     |
                    | (runs on their   |
                    |  own machine)    |
                    +--------+---------+
                             |
                    +--------v---------+
                    |    WAL-E Agent   |
                    |  (CLI / Skill /  |
                    |   MCP Server)    |
                    +--------+---------+
                             |
              +--------------+--------------+
              |              |              |
     +--------v----+  +------v------+  +----v-------+
     | Collectors  |  |  Scoring    |  | Reporters  |
     | (23+ APIs)  |  |  Engine     |  | (5 formats)|
     +--------+----+  +------+------+  +----+-------+
              |              |              |
     +--------v--------------v--------------v-------+
     |       Customer's Databricks Workspace        |
     |  (Unity Catalog, Clusters, Jobs, Security)   |
     +----------------------------------------------+

Path	Description
`src/wal_e/collectors/`	Data collection modules for each assessment area
`src/wal_e/framework/`	WAL pillar definitions, best practices, scoring logic
`src/wal_e/reporters/`	Report generators (MD, CSV, HTML, PPTX, Audit)
`src/wal_e/core/`	Orchestration engine, config, cloud detection
`mcp/`	MCP server for AI Dev Kit integration

WAL Framework Pillars

#	Pillar	Best Practices	Focus Areas
1	Data & AI Governance	15	Unity Catalog, metadata, lineage, data quality
2	Interoperability & Usability	14	Open formats, IaC, serverless, self-service
3	Operational Excellence	24	CI/CD, MLOps, monitoring, cluster utilization*
4	Security, Compliance & Privacy	14	IAM, SSO/SCIM, encryption, login audit, permissions
5	Reliability	21	ACID, auto-scaling, DR, job success rate, recurring failures
6	Performance Efficiency	28	Serverless, data layout, query failure rate, slow queries
7	Cost Optimization	23	Spot/preemptible, idle waste, cost trends, concentration*
	Total	140	* = deep scan (system tables)
