add databricks sample notebook #495

Merged
nateshim merged 8 commits into main from nateshim/databricks_integration
May 8, 2026

Conversation

@nateshim
Contributor

@nateshim nateshim commented Apr 20, 2026

Pull Request Description

What and why?

Demo - Linking Unity Catalog to Inventory Model and Running notebooks against Databricks data:

Screen.Recording.2026-04-20.at.11.18.31.AM.mov

Demo - Using linked Databricks Unity Catalog fields in custom calculated model fields:

Screen.Recording.2026-04-21.at.9.32.58.AM.mov

How to test

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

@CLAassistant

CLAassistant commented Apr 20, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ Nate Shim
❌ nrichers


Nate Shim does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@nrichers
Collaborator

nrichers commented May 5, 2026

@nateshim FYI, I mention your quickstart notebook in our Databricks docs. Subscribing so that I know when your PR gets merged ...

@nateshim nateshim marked this pull request as ready for review May 7, 2026 14:10
@nateshim nateshim requested review from nrichers and validbeck May 7, 2026 14:10
@github-actions
Contributor

github-actions Bot commented May 7, 2026

Pull requests must include at least one of the required labels: internal (no release notes required), highlight, enhancement, bug, deprecation, documentation. Except for internal, pull requests must also include a description in the release notes section.

@nateshim nateshim added the enhancement New feature or request label May 7, 2026
@nateshim nateshim changed the title from "[DRAFT] add databricks sample notebook" to "add databricks sample notebook" May 7, 2026
@github-actions
Contributor

github-actions Bot commented May 7, 2026

PR Summary

This PR introduces a comprehensive Databricks notebook that serves as a quickstart guide for integrating the ValidMind library into a Databricks environment. The notebook guides users through the entire workflow, including:

  • Installing the ValidMind library and restarting the Python kernel as needed.
  • Verifying the installation by checking the installed version.
  • Initializing the library with API credentials, either retrieved from Databricks widgets or manually provided by editing the cell.
  • Loading data from a Unity Catalog table linked to a ValidMind model, with a fallback option using synthetic data if no table binding is present.
  • Splitting the dataset into training and testing subsets and training a simple gradient boosting classifier.
  • Registering datasets and model objects with ValidMind, and assigning model predictions to the datasets.
  • Executing individual tests (such as dataset description, class imbalance, confusion matrix, ROC curve, and feature importance) and running a complete test suite that sends results to the ValidMind Platform.

The notebook is structured in clear, sequential steps with detailed inline comments and troubleshooting tips, making it easier for users to set up and validate their integration with Databricks and the ValidMind Platform.
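
To illustrate the data-loading and splitting steps summarized above, here is a minimal sketch using only the Python standard library. It is not the notebook's actual code: `load_rows`, `split_rows`, the `use_synthetic` flag name, and the `read_table` callable are all hypothetical names chosen for illustration. In Databricks, `read_table` might wrap something like `spark.table(name).toPandas().to_dict("records")`.

```python
import random


def load_rows(table_name, read_table, use_synthetic=False, n_rows=200, seed=0):
    """Return (rows, source); rows is a list of dicts with a binary "target".

    table_name: name of the bound Unity Catalog table, or None if the model
                has no table binding (hypothetical convention).
    read_table: callable that reads a table by name (assumption; injected so
                the sketch runs outside Databricks).
    """
    if table_name and not use_synthetic:
        return read_table(table_name), "unity_catalog"

    # Fallback: generate a small synthetic binary-classification dataset.
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append({"x1": x1, "x2": x2, "target": int(x1 + x2 > 0)})
    return rows, "synthetic"


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Returning the data source alongside the rows makes it easy to tell, when reviewing a run, whether results came from the linked table or from the synthetic fallback.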

Test Suggestions

  • Run the notebook in a Databricks environment to validate the installation and initialization steps.
  • Test the data loading functionality with a real Unity Catalog table binding, and verify the synthetic data fallback when the fallback option is set to True.
  • Execute the individual tests (dataset description, class imbalance, confusion matrix, ROC curve, and feature importance) to ensure that each component functions correctly.
  • Perform an end-to-end run of the full test suite and confirm that the results are correctly sent to and displayed on the ValidMind Platform.
  • Simulate invalid credential scenarios to ensure proper error handling and messaging.

Collaborator

@nrichers nrichers left a comment

LGTM! :shipit:

Did a light edit to get the notebook ready for quick publication:

  • Cell 0 (intro): reframed from QA language to a user goal, fixed the inaccurate "via Spark" bullet, and added a link to the Synchronize with Databricks docs page.
  • Cells 4, 11, 13, 15, 17, 19: added user-facing lead-in paragraphs to previously heading-only step cells.
  • Cell 6: removed dev/prod URL guidance and the "launched from the Platform via Run Tests" framing; replaced with a two-option description (Databricks widgets or edit the next cell) and pointer to the Getting Started snippet.
  • Cell 7: replaced hardcoded https://api.dev.vm.validmind.ai/... defaults with "YOUR_API_HOST" placeholders, and rewrote the inline comment to drop the "injected by Platform when run via Run Tests" language.
  • Cell 8: rewrote the data-loading prerequisites as user instructions rather than implementation notes.
  • Cell 25: removed the QA-flavored "primary validation that results can be sent" blockquote.
  • Cell 27: tightened the verification and troubleshooting prose for end users.
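
For readers following along, the two-option credential flow described for Cells 6 and 7 (Databricks widgets, or editing the next cell) could be sketched roughly as below. This is an illustrative assumption, not the notebook's code: `resolve_credentials`, the `get_widget` callable, and every key other than the "YOUR_API_HOST" placeholder are hypothetical. In Databricks, `get_widget` would likely be `dbutils.widgets.get`, which raises when the widget is not defined.

```python
def resolve_credentials(get_widget, manual):
    """Prefer widget values; fall back to the manually edited values.

    get_widget: callable taking a widget name and returning its value
                (assumption; injected so the sketch runs anywhere).
    manual:     dict of values edited in the cell, possibly still holding
                "YOUR_..." placeholders.
    """
    resolved = {}
    for name, fallback in manual.items():
        try:
            value = get_widget(name)
        except Exception:
            # Widget not defined: treat it as unset and use the manual value.
            value = ""
        resolved[name] = value or fallback

    # Fail early if any placeholder was left unedited.
    unset = sorted(k for k, v in resolved.items() if v.startswith("YOUR_"))
    if unset:
        raise ValueError(f"Replace placeholder values for: {', '.join(unset)}")
    return resolved
```

Checking for leftover placeholders before initializing the library gives a clearer error than a failed API call against a literal "YOUR_API_HOST" string.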

For a more thorough edit, I created sc-16095 in Sprint 101.

@nateshim nateshim merged commit b734072 into main May 8, 2026
5 of 6 checks passed
@nateshim nateshim deleted the nateshim/databricks_integration branch May 8, 2026 00:15

Labels

enhancement New feature or request


3 participants