From 622f7671156263fa801d29d817ee7afc8a14d118 Mon Sep 17 00:00:00 2001
From: Nate Shim
Date: Mon, 20 Apr 2026 11:09:42 -0700
Subject: [PATCH 1/7] add databricks sample notebook

---
 .../validmind_databricks_quickstart.ipynb | 6042 +++++++++++++++++
 1 file changed, 6042 insertions(+)
 create mode 100644 notebooks/databricks/validmind_databricks_quickstart.ipynb

diff --git a/notebooks/databricks/validmind_databricks_quickstart.ipynb b/notebooks/databricks/validmind_databricks_quickstart.ipynb
new file mode 100644
index 000000000..b78f120ed
--- /dev/null
+++ b/notebooks/databricks/validmind_databricks_quickstart.ipynb
@@ -0,0 +1,6042 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# ValidMind + Databricks Quickstart\n",
+    "\n",
+    "This notebook validates that the ValidMind Library works correctly within a Databricks collaborative notebook environment. It demonstrates:\n",
+    "\n",
+    "- Installing and initializing the ValidMind Library\n",
+    "- Loading data from a linked Unity Catalog table through the ValidMind Databricks integration\n",
+    "- Training a simple classification model\n",
+    "- Running ValidMind tests and sending results to the ValidMind Platform\n",
+    "\n",
+    "## Before you begin\n",
+    "\n",
+    "You will need:\n",
+    "1. A running Databricks workspace with Unity Catalog enabled\n",
+    "2. A ValidMind account with a registered model\n",
+    "3. Your ValidMind API credentials (API key, API secret, model identifier)\n",
+    "\n",
+    "To get your credentials: log in to ValidMind → **Model Inventory** → select your model → **Getting Started** → **Copy snippet to clipboard**.\n",
+    "\n",
+    "> **Note:** If you don't have a UC table ready, this notebook includes a fallback that generates synthetic data so you can still validate the full workflow."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 1 — Install the ValidMind Library\n",
+    "\n",
+    "Run this cell first. Databricks requires a restart of the Python process after `%pip install` before newly installed packages can be imported."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m26.0.1\u001b[0m\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
+      "Note: you may need to restart the kernel to use updated packages.\n"
+     ]
+    }
+   ],
+   "source": [
+    "%pip install -q validmind"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Restart the Python process so newly installed packages are importable.\n",
+    "# `dbutils` is injected by the Databricks runtime; guard the call so this\n",
+    "# cell is a harmless no-op when the notebook runs outside Databricks.\n",
+    "if \"dbutils\" in globals():\n",
+    "    dbutils.library.restartPython()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 2 — Verify installation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "ValidMind Library version: 2.12.5\n",
+      "Installation successful!\n"
+     ]
+    }
+   ],
+   "source": [
+    "import importlib.metadata\n",
+    "version = importlib.metadata.version('validmind')\n",
+    "print(f'ValidMind Library version: {version}')\n",
+    "print('Installation successful!')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 3 — Initialize the ValidMind Library\n",
+    "\n",
+    "Replace the placeholder values below with your actual credentials from the ValidMind Platform.\n",
+    "\n",
+    "For local development, use `http://localhost:5000/api/v1/tracking` as the `api_host`.\n",
+    "For production, use `https://app.prod.validmind.ai/api/v1/tracking`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "2026-04-20 08:59:08,869 - ERROR(validmind.api_client): Future releases will require `document` as one of the options you must provide to `vm.init()`. To learn more, refer to https://docs.validmind.ai/developer/validmind-library.html\n",
+      "2026-04-20 08:59:08,913 - INFO(validmind.api_client): 🎉 Connected to ValidMind!\n",
+      "📊 Model: test new model (ID: cmlgyjftq00052fp24yztwo2i)\n",
+      "📁 Document Type: validation_report\n"
+     ]
+    }
+   ],
+   "source": [
+    "import validmind as vm\n",
+    "\n",
+    "vm.init(\n",
+    "    api_host=\"http://localhost:5000/api/v1/tracking\",\n",
+    "    api_key=\"...\",     # placeholder: paste the API key from your model's code snippet\n",
+    "    api_secret=\"...\",  # placeholder: paste the API secret from your model's code snippet\n",
+    "    model=\"cmlgyjftq00052fp24yztwo2i\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 4 — Load data from your linked Databricks table\n",
+    "\n",
+    "Instead of querying Databricks directly, this notebook loads data through ValidMind.\n",
+    "ValidMind fetches and syncs the Unity Catalog table data when you create a binding in\n",
+    "**Settings → Integrations → Databricks**, so the same dataset is available here via the\n",
+    "tracking API — no Spark session or direct UC credentials needed.\n",
+    "\n",
+    "**Prerequisites:**\n",
+    "1. A Databricks integration configured in ValidMind Settings\n",
+    "2. A `table` binding created for this model (link a Unity Catalog table to this model)\n",
+    "3. At least one successful sync (the initial sync triggers automatically on binding creation)\n",
+    "\n",
+    "If no binding exists yet, set `USE_SYNTHETIC_FALLBACK = True` to run with generated data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Loaded 2,000 rows, 11 columns from workspace.default.validmind_sample\n",
+      "Last synced: 2026-04-20T17:32:04.754093+00:00\n",
+      "Target distribution: {'1': 1000, '0': 1000}\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<div>
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
agetargettenurebalancegeo_encodedcredit_scorenum_productsgender_encodedhas_credit_cardestimated_salaryis_active_member
01.044975253436927410.69529272666438564.547093922936896-1.8702084671204506-3.23048532486401370.70994760370024237.149388996534131-2.8732703802156821.19742319980490741.622875273590974
1-0.0278426990784642600.4918144010879626-0.9046053315584104-0.8483194056261063-0.96945677724807270.095751156694858410.039613069143711431.0473431163438045-0.8571723738917058-0.8382832367759102
2-1.99178723505111190-0.4578169695901857-0.40706986175941995-1.2511538337688415-0.23342484254422935-2.22136239122027620.16537623719369912-2.598014189473973-0.57912883700687610.8579892971621974
31.0059944577406620-1.276618200244661-2.5291621615136360.509018910577683-1.7468422838745599-1.53757534954805352.2544633490237804-2.303886235302519-0.7483831806910334-0.13331127234599705
4-1.531812949404434111.7917822736133590.72954232776335091.2367380983245346-1.00385081132829650.1895831515529864-0.5297722396450917-0.5491428842044095-1.316862269378586E-41.4221579721661985
\n", + "
" + ], + "text/plain": [ + " age target tenure balance \\\n", + "0 1.0449752534369274 1 0.6952927266643856 4.547093922936896 \n", + "1 -0.02784269907846426 0 0.4918144010879626 -0.9046053315584104 \n", + "2 -1.9917872350511119 0 -0.4578169695901857 -0.40706986175941995 \n", + "3 1.005994457740662 0 -1.276618200244661 -2.529162161513636 \n", + "4 -1.5318129494044341 1 1.791782273613359 0.7295423277633509 \n", + "\n", + " geo_encoded credit_score num_products \\\n", + "0 -1.8702084671204506 -3.2304853248640137 0.7099476037002423 \n", + "1 -0.8483194056261063 -0.9694567772480727 0.09575115669485841 \n", + "2 -1.2511538337688415 -0.23342484254422935 -2.2213623912202762 \n", + "3 0.509018910577683 -1.7468422838745599 -1.5375753495480535 \n", + "4 1.2367380983245346 -1.0038508113282965 0.1895831515529864 \n", + "\n", + " gender_encoded has_credit_card estimated_salary \\\n", + "0 7.149388996534131 -2.873270380215682 1.1974231998049074 \n", + "1 0.03961306914371143 1.0473431163438045 -0.8571723738917058 \n", + "2 0.16537623719369912 -2.598014189473973 -0.5791288370068761 \n", + "3 2.2544633490237804 -2.303886235302519 -0.7483831806910334 \n", + "4 -0.5297722396450917 -0.5491428842044095 -1.316862269378586E-4 \n", + "\n", + " is_active_member \n", + "0 1.622875273590974 \n", + "1 -0.8382832367759102 \n", + "2 0.8579892971621974 \n", + "3 -0.13331127234599705 \n", + "4 1.4221579721661985 " + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import requests\n", + "import pandas as pd\n", + "from validmind import api_client as _vm_client\n", + "\n", + "# Set to True only if you don't have a Databricks table binding set up yet\n", + "USE_SYNTHETIC_FALLBACK = False\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# Load from ValidMind — uses the linked Databricks table binding for this model\n", + "# ---------------------------------------------------------------------------\n", + "if not USE_SYNTHETIC_FALLBACK:\n", + " _api_host = _vm_client.get_api_host() # same host as vm.init()\n", + " _headers = _vm_client._get_api_headers()\n", + "\n", + " _response = requests.get(\n", + " f\"{_api_host}/integrations/dataset\",\n", + " headers=_headers,\n", + " timeout=30,\n", + " )\n", + "\n", + " if _response.status_code == 200:\n", + " _data = _response.json()\n", + " TABLE_NAME = _data.get(\"table_name\", \"unknown\")\n", + " TARGET_COLUMN = \"target\" # <-- update if your table uses a different column name\n", + " row_data = _data.get(\"row_data\", [])\n", + "\n", + " if not row_data:\n", + " raise RuntimeError(\n", + " f\"Binding found for table '{TABLE_NAME}' but row_data is empty. \"\n", + " \"The sync may still be in progress — wait a moment and re-run this cell.\"\n", + " )\n", + "\n", + " df = pd.DataFrame(row_data)\n", + "\n", + " if TARGET_COLUMN not in df.columns:\n", + " raise ValueError(\n", + " f\"Column '{TARGET_COLUMN}' not found in synced data. \"\n", + " f\"Available columns: {list(df.columns)}. 
\"\n", + " \"Update TARGET_COLUMN above to match your table's target column.\"\n", + " )\n", + "\n", + " print(f\"Loaded {len(df):,} rows, {len(df.columns)} columns from {TABLE_NAME}\")\n", + " print(f\"Last synced: {_data.get('last_synced_at', 'unknown')}\")\n", + " print(f\"Target distribution: {df[TARGET_COLUMN].value_counts().to_dict()}\")\n", + " display(df.head())\n", + "\n", + " elif _response.status_code == 404:\n", + " raise RuntimeError(\n", + " \"No active Databricks table binding found for this model.\\n\\n\"\n", + " \"To fix:\\n\"\n", + " \" 1. Go to ValidMind → Settings → Integrations → Databricks\\n\"\n", + " \" 2. Open the model binding browser and select a Unity Catalog table\\n\"\n", + " \" 3. Wait ~30 seconds for the initial sync to complete\\n\"\n", + " \" 4. Re-run this cell\\n\\n\"\n", + " \"Or set USE_SYNTHETIC_FALLBACK = True above to continue with generated data.\"\n", + " )\n", + " else:\n", + " raise RuntimeError(\n", + " f\"Unexpected error loading dataset from ValidMind: \"\n", + " f\"{_response.status_code} — {_response.text}\"\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# ---------------------------------------------------------------------------\n", + "# Synthetic data fallback — runs when USE_SYNTHETIC_FALLBACK = True\n", + "# Uses the Bank Customer Churn dataset pattern from ValidMind examples\n", + "# ---------------------------------------------------------------------------\n", + "if USE_SYNTHETIC_FALLBACK:\n", + " import numpy as np\n", + " from sklearn.datasets import make_classification\n", + "\n", + " np.random.seed(42)\n", + " X, y = make_classification(\n", + " n_samples=1000,\n", + " n_features=10,\n", + " n_informative=6,\n", + " n_redundant=2,\n", + " random_state=42,\n", + " )\n", + " feature_names = [\n", + " \"credit_score\", \"age\", \"tenure\", \"balance\",\n", + " \"num_products\", \"has_credit_card\", \"is_active_member\",\n", + " \"estimated_salary\", \"geography_encoded\", \"gender_encoded\",\n", + " ]\n", + " df = pd.DataFrame(X, columns=feature_names)\n", + " df[\"target\"] = y\n", + " TARGET_COLUMN = \"target\"\n", + " TABLE_NAME = \"synthetic\"\n", + "\n", + " print(f\"Using synthetic dataset: {len(df):,} rows, {len(df.columns)} columns\")\n", + " print(f\"Target distribution: {df[TARGET_COLUMN].value_counts().to_dict()}\")\n", + " display(df.head())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5 — Prepare train/test split" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Train set: 1,600 rows\n", + "Test set: 400 rows\n", + "Features: ['age', 'tenure', 'balance', 'geo_encoded', 'credit_score', 'num_products', 'gender_encoded', 'has_credit_card', 'estimated_salary', 'is_active_member']\n" + ] + } + ], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "feature_columns = [c for c in df.columns if c != TARGET_COLUMN]\n", + "\n", + "train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)\n", + "\n", + "print(f'Train set: {len(train_df):,} rows')\n", + "print(f'Test set: {len(test_df):,} rows')\n", + "print(f'Features: {feature_columns}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6 — Train a simple model" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "name": 
"stdout", + "output_type": "stream", + "text": [ + "Train accuracy: 0.9700\n", + "Test accuracy: 0.9125\n" + ] + } + ], + "source": [ + "from sklearn.ensemble import GradientBoostingClassifier\n", + "\n", + "model = GradientBoostingClassifier(n_estimators=100, random_state=42)\n", + "model.fit(train_df[feature_columns], train_df[TARGET_COLUMN])\n", + "\n", + "train_accuracy = model.score(train_df[feature_columns], train_df[TARGET_COLUMN])\n", + "test_accuracy = model.score(test_df[feature_columns], test_df[TARGET_COLUMN])\n", + "\n", + "print(f'Train accuracy: {train_accuracy:.4f}')\n", + "print(f'Test accuracy: {test_accuracy:.4f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 7 — Register datasets and model with ValidMind" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Datasets and model registered with ValidMind.\n" + ] + } + ], + "source": [ + "vm_train_ds = vm.init_dataset(\n", + " dataset=train_df,\n", + " input_id=\"train_dataset\",\n", + " target_column=TARGET_COLUMN,\n", + ")\n", + "\n", + "vm_test_ds = vm.init_dataset(\n", + " dataset=test_df,\n", + " input_id=\"test_dataset\",\n", + " target_column=TARGET_COLUMN,\n", + ")\n", + "\n", + "vm_model = vm.init_model(\n", + " model=model,\n", + " input_id=\"gradient_boosting_model\",\n", + ")\n", + "\n", + "print('Datasets and model registered with ValidMind.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 8 — Assign predictions to datasets" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2026-04-20 10:09:06,584 - INFO(validmind.vm_models.dataset.utils): Running predict_proba()... This may take a while\n", + "2026-04-20 10:09:06,599 - INFO(validmind.vm_models.dataset.utils): Done running predict_proba()\n", + "2026-04-20 10:09:06,600 - INFO(validmind.vm_models.dataset.utils): Running predict()... This may take a while\n", + "2026-04-20 10:09:06,609 - INFO(validmind.vm_models.dataset.utils): Done running predict()\n", + "2026-04-20 10:09:06,611 - INFO(validmind.vm_models.dataset.utils): Running predict_proba()... This may take a while\n", + "2026-04-20 10:09:06,613 - INFO(validmind.vm_models.dataset.utils): Done running predict_proba()\n", + "2026-04-20 10:09:06,613 - INFO(validmind.vm_models.dataset.utils): Running predict()... This may take a while\n", + "2026-04-20 10:09:06,616 - INFO(validmind.vm_models.dataset.utils): Done running predict()\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Predictions assigned.\n" + ] + } + ], + "source": [ + "vm_train_ds.assign_predictions(model=vm_model)\n", + "vm_test_ds.assign_predictions(model=vm_model)\n", + "\n", + "print('Predictions assigned.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 9 — Run individual tests\n", + "\n", + "These tests validate that results render correctly in the notebook and are sent to the ValidMind Platform." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + "
\n", + "

Dataset Description

\n", + "
\n", + " \n", + "
\n", + "

The DatasetDescription test provides a comprehensive summary of the dataset columns, including data types, counts, missing values, and distinct value statistics. The results table lists each column with its inferred type, total count, missing value statistics, and the number and proportion of unique values. All columns are fully populated, and the majority are classified as text type, with the exception of the target variable, which is categorical.

\n", + "

Key insights:

\n", + "
    \n", + "
  • All columns are fully populated: Every column has 1,600 non-missing entries, with 0 missing values and 0% missingness across the dataset.
  • \n", + "
  • High cardinality in text columns: All text-type columns (age, tenure, balance, geo_encoded, credit_score, num_products, gender_encoded, has_credit_card, estimated_salary, is_active_member) have 1,600 distinct values, representing 100% uniqueness.
  • \n", + "
  • Target variable is categorical with low cardinality: The target column is the only categorical variable, containing 2 distinct values and a distinct percentage of 0.12%.
  • \n", + "
\n", + "

The dataset is complete with no missing values in any column. The prevalence of text-type columns with maximum cardinality indicates that each entry is unique for these features, which may affect downstream modeling depending on how these variables are processed. The target variable is well-formed as a categorical feature with low cardinality, supporting binary classification tasks.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + "

Dataset Description

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameTypeCountMissingMissing %DistinctDistinct %
ageText160000.016001.0000
tenureText160000.016001.0000
balanceText160000.016001.0000
geo_encodedText160000.016001.0000
credit_scoreText160000.016001.0000
num_productsText160000.016001.0000
gender_encodedText160000.016001.0000
has_credit_cardText160000.016001.0000
estimated_salaryText160000.016001.0000
is_active_memberText160000.016001.0000
targetCategorical160000.020.0012
\n", + "
\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2026-04-20 10:09:14,279 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.data_validation.DatasetDescription does not exist in model's document\n" + ] + } + ], + "source": [ + "# Dataset statistics — validates data documentation capability\n", + "result = vm.tests.run_test(\n", + " \"validmind.data_validation.DatasetDescription\",\n", + " inputs={\"dataset\": vm_train_ds},\n", + ")\n", + "result.log()" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + "
\n", + "

✅ Class Imbalance

\n", + "
\n", + " \n", + "
\n", + "

The Class Imbalance test evaluates the distribution of target classes within the dataset to identify potential imbalances that could impact model performance. The results table presents the percentage of records for each class and indicates whether each class meets the minimum threshold for representation. The accompanying bar plot visually displays the proportion of each class, facilitating interpretation of class distribution.

\n", + "

Key insights:

\n", + "
    \n", + "
  • Balanced class distribution observed: Both classes are represented nearly equally, with class '1' comprising 50.94% and class '0' comprising 49.06% of the dataset.
  • \n", + "
  • All classes pass minimum threshold: Each class exceeds the default minimum threshold of 10% representation, resulting in a "Pass" outcome for both classes.
  • \n", + "
  • No evidence of class imbalance: The visual plot confirms the near-equal distribution, with both bars of similar height.
  • \n", + "
\n", + "

The results indicate that the dataset exhibits a balanced distribution between the two target classes, with both classes well above the minimum representation threshold. No class imbalance is detected, supporting the suitability of the dataset for unbiased model training and evaluation.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + "

target Class Imbalance

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
targetPercentage of Rows (%)Pass/Fail
150.94%Pass
049.06%Pass
\n", + "
\n", + "

Figures

\n", + "
\n", + "
\n", + "
\"ValidMind
\n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2026-04-20 10:09:20,550 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.data_validation.ClassImbalance does not exist in model's document\n" + ] + } + ], + "source": [ + "# Class imbalance check\n", + "result = vm.tests.run_test(\n", + " \"validmind.data_validation.ClassImbalance\",\n", + " inputs={\"dataset\": vm_train_ds},\n", + ")\n", + "result.log()" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + "
\n", + "

Confusion Matrix

\n", + "
\n", + " \n", + "
\n", + "

The Confusion Matrix test evaluates the classification performance of the model by comparing predicted and actual class labels, providing a breakdown of true positives, true negatives, false positives, and false negatives. The resulting matrix visually displays the distribution of correct and incorrect predictions for each class, enabling assessment of the model's ability to distinguish between classes. The matrix for the evaluated model shows the counts for each outcome type, with true positives and true negatives occupying the diagonal and false positives and false negatives on the off-diagonal.

\n", + "

Key insights:

\n", + "
    \n", + "
  • High true positive and true negative counts: The model achieved 167 true positives and 198 true negatives, indicating strong performance in correctly identifying both classes.
  • \n", + "
  • Low false positive and false negative rates: There were 17 false positives and 18 false negatives, reflecting a low rate of misclassification for both types of errors.
  • \n", + "
  • Balanced error distribution: The counts of false positives and false negatives are similar, suggesting no significant bias toward one type of misclassification.
  • \n", + "
\n", + "

The confusion matrix demonstrates that the model effectively distinguishes between the two classes, with high accuracy in both positive and negative predictions. The low and balanced rates of false positives and false negatives indicate consistent classification performance and minimal systematic error.

\n", + "\n", + "
\n", + "

Figures

\n", + "
\n", + "
\n", + "
\"ValidMind
\n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2026-04-20 10:09:27,460 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.model_validation.sklearn.ConfusionMatrix does not exist in model's document\n" + ] + } + ], + "source": [ + "# Confusion matrix — validates model performance visualization\n", + "result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.ConfusionMatrix\",\n", + " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", + ")\n", + "result.log()" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + "
\n", + "

ROC Curve

\n", + "
\n", + " \n", + "
\n", + "

The ROC Curve test evaluates the binary classification performance of the model by plotting the Receiver Operating Characteristic (ROC) curve and calculating the Area Under the Curve (AUC) score. The resulting plot displays the trade-off between the true positive rate and false positive rate across various thresholds, with a reference line indicating random performance. The ROC curve for the evaluated model is shown alongside its corresponding AUC value.

\n", + "

Key insights:

\n", + "
    \n", + "
  • High AUC score observed: The model achieves an AUC of 0.97, indicating strong discriminative ability between the positive and negative classes.
  • \n", + "
  • ROC curve consistently above random: The ROC curve remains well above the diagonal line representing random classification, demonstrating effective separation of classes across thresholds.
  • \n", + "
  • Steep initial TPR increase: The curve shows a rapid rise in true positive rate at low false positive rates, reflecting high sensitivity at early thresholds.
  • \n", + "
\n", + "

The test results indicate that the model demonstrates robust binary classification performance, with a high AUC value and a ROC curve that consistently outperforms random chance. The observed curve shape and magnitude of the AUC suggest effective class separation and strong model discrimination across the evaluated dataset.

\n", + "\n", + "
\n", + "

Figures

\n", + "
\n", + "
\n", + "
\"ValidMind
\n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2026-04-20 10:09:31,104 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.model_validation.sklearn.ROCCurve does not exist in model's document\n" + ] + } + ], + "source": [ + "# ROC curve\n", + "result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.ROCCurve\",\n", + " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", + ")\n", + "result.log()" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + "
\n", + "

Feature Importance

\n", + "
\n", + " \n", + "
\n", + "

The Feature Importance test evaluates the relative contribution of individual features to the model's predictive performance using permutation feature importance scores. The resulting summary table lists the top three features ranked by their importance values, providing a clear view of which variables most strongly influence model outputs. Each feature is presented alongside its corresponding importance score, facilitating direct comparison of their impact within the model.

\n", + "

Key insights:

\n", + "
    \n", + "
  • is_active_member is the most influential feature: is_active_member registers the highest importance score at 0.3064, indicating it has the strongest effect on model predictions among the evaluated features.
  • \n", + "
  • num_products shows moderate importance: num_products is the second most important feature with a score of 0.1396, contributing meaningfully but less than is_active_member.
  • \n", + "
  • has_credit_card has lower relative impact: has_credit_card ranks third with an importance score of 0.0691, reflecting a smaller but still measurable influence on the model.
  • \n", + "
\n", + "

The feature importance results indicate that is_active_member is the dominant driver of model predictions, with num_products and has_credit_card contributing to a lesser extent. The clear separation in importance scores highlights the primary role of is_active_member in the model's decision process, while the other features provide additional, though comparatively smaller, predictive value.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Feature 1Feature 2Feature 3
[is_active_member; 0.3064][num_products; 0.1396][has_credit_card; 0.0691]
\n", + "
\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2026-04-20 10:09:37,317 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.model_validation.sklearn.FeatureImportance does not exist in model's document\n" + ] + } + ], + "source": [ + "# Feature importance\n", + "result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.FeatureImportance\",\n", + " inputs={\"dataset\": vm_train_ds, \"model\": vm_model},\n", + ")\n", + "result.log()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 10 — Run the full test suite\n", + "\n", + "This runs the complete classifier documentation suite and sends all results to ValidMind in one call.\n", + "\n", + "> This is the primary validation that results can be sent from a Databricks notebook environment." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
\n", + "
Test suite complete!
\n", + "
\n", + "
\n", + "
\n", + "
\n", + "
56/56 (100.0%)
\n", + "
\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2026-04-20 10:09:37,428 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.data_validation.DescriptiveStatistics': (ValueError) Cannot describe a DataFrame without columns\n", + "2026-04-20 10:09:37,575 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.data_validation.DatasetSplit': (MissingRequiredTestInputError) Missing required input: datasets.\n", + "2026-04-20 10:09:38,778 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.PrecisionRecallCurve': (ValueError) y_true takes value in {'0', '1'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.\n", + "2026-04-20 10:09:38,805 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.PopulationStabilityIndex': (MissingRequiredTestInputError) Missing required input: datasets.\n", + "2026-04-20 10:09:38,810 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.SHAPGlobalImportance': (LoadTestError) Unable to load test 'validmind.model_validation.sklearn.SHAPGlobalImportance' from validmind test provider\n", + "2026-04-20 10:09:38,814 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.MinimumF1Score': (ValueError) pos_label=1 is not a valid label. It should be one of ['0', '1']\n", + "2026-04-20 10:09:38,819 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.TrainingTestDegradation': (MissingRequiredTestInputError) Missing required input: datasets.\n", + "2026-04-20 10:09:38,822 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.ModelsPerformanceComparison': (MissingRequiredTestInputError) Missing required input: models.\n", + "2026-04-20 10:09:38,823 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.OverfitDiagnosis': (MissingRequiredTestInputError) Missing required input: datasets.\n", + "2026-04-20 10:09:38,824 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.WeakspotsDiagnosis': (MissingRequiredTestInputError) Missing required input: datasets.\n", + "2026-04-20 10:09:38,825 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.RobustnessDiagnosis': (MissingRequiredTestInputError) Missing required input: datasets.\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + " \n", + " \n", + "

Test Suite Results: Classifier Full Suite


\n", + " \n", + "

\n", + " Check out the updated documentation on\n", + " ValidMind.\n", + "

\n", + "

Full test suite for binary classification models.

\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Tabular Dataset Description\n", + "
\n", + "
\n", + "
\n", + " \n", + "
Test suite to extract metadata and descriptive\n",
+       "statistics from a tabular dataset
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Dataset Description (validmind.data_validation.DatasetDescription)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

Dataset Description

\n", + "
\n", + " \n", + "
\n", + "

The DatasetDescription test provides a comprehensive summary of each column in the dataset, detailing data types, counts, missing values, and the number of distinct values. The results table lists all columns, their inferred types, and key statistics, enabling a clear understanding of dataset structure and completeness. All columns except the target are classified as text, with the target column identified as categorical and containing two distinct values. No missing values are present in any column.

\n", + "

Key insights:

\n", + "
    \n", + "
  • All columns exhibit complete data: Every column has 400 entries with zero missing values, resulting in 0% missingness across the dataset.
  • \n", + "
  • High cardinality in text columns: All text-type columns (age, tenure, balance, geo_encoded, credit_score, num_products, gender_encoded, has_credit_card, estimated_salary, is_active_member) have 400 distinct values, indicating each entry is unique.
  • \n", + "
  • Target variable is categorical with low cardinality: The target column is classified as categorical and contains only 2 distinct values, representing a binary outcome.
  • \n", + "
\n", + "

The dataset is fully populated with no missing data, and all non-target columns are characterized by high cardinality, with each value unique across the 400 records. The target variable is appropriately identified as categorical with two classes. The structure indicates a dataset with complete records and a binary classification target, while the high uniqueness in text columns may reflect either identifier-like features or non-aggregated data.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + "

Dataset Description

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameTypeCountMissingMissing %DistinctDistinct %
ageText40000.04001.000
tenureText40000.04001.000
balanceText40000.04001.000
geo_encodedText40000.04001.000
credit_scoreText40000.04001.000
num_productsText40000.04001.000
gender_encodedText40000.04001.000
has_credit_cardText40000.04001.000
estimated_salaryText40000.04001.000
is_active_memberText40000.04001.000
targetCategorical40000.020.005
\n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " ❌ Failed Test: Descriptive Statistics (validmind.data_validation.DescriptiveStatistics)\n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

Failed to run 'Descriptive Statistics'

\n", + "

Cannot describe a DataFrame without columns

\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Pearson Correlation Matrix (validmind.data_validation.PearsonCorrelationMatrix)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

Pearson Correlation Matrix

\n", + "
\n", + " \n", + "
\n", + "

The Pearson Correlation Matrix test evaluates the linear relationships between all pairs of numerical variables in the dataset, visualizing the strength and direction of these relationships using a heat map. The resulting plot displays the correlation coefficients, with color intensity indicating the magnitude and direction of each pairwise correlation. The heat map highlights coefficients above an absolute value of 0.7 in white, signaling high correlation. The matrix provides a comprehensive overview of potential redundancy among variables.

\n", + "

Key insights:

\n", + "
    \n", + "
  • No high correlations detected: The heat map does not display any white cells, indicating that no pair of variables exhibits a correlation coefficient above the 0.7 threshold in absolute value.
  • \n", + "
  • Low to moderate linear relationships: All observed correlations fall within the low to moderate range, as reflected by the absence of strong color intensity or white highlights in the matrix.
  • \n", + "
  • No evidence of multicollinearity: The lack of high-magnitude correlations suggests that the dataset does not contain redundant numerical variables with strong linear dependencies.
  • \n", + "
\n", + "

The correlation structure demonstrates that numerical variables in the dataset are largely independent, with no evidence of strong linear relationships or multicollinearity. This supports the suitability of the dataset for modeling without concerns regarding redundancy or overfitting due to highly correlated predictors.

\n", + "\n", + "
\n", + "

Figures

\n", + "
\n", + "
\n", + "
\"ValidMind
\n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Tabular Data Quality\n", + "
\n", + "
\n", + "
\n", + " \n", + "
Test suite for data quality on tabular datasets
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Class Imbalance (validmind.data_validation.ClassImbalance)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

✅ Class Imbalance

\n", + "
\n", + " \n", + "
\n", + "

The Class Imbalance test evaluates the distribution of target classes within the dataset to identify potential imbalances that could affect model performance. The results table presents the percentage of records for each class and indicates whether each class meets the minimum threshold for representation. The accompanying bar plot visually displays the proportion of each class, facilitating interpretation of class distribution.

\n", + "

Key insights:

\n", + "
    \n", + "
  • Both classes exceed minimum threshold: Class 0 constitutes 53.75% and class 1 constitutes 46.25% of the dataset, with both surpassing the default 10% minimum threshold.
  • \n", + "
  • No class imbalance detected: Both classes are marked as "Pass," indicating that neither class is under-represented according to the test criteria.
  • \n", + "
  • Class proportions are relatively balanced: The difference between the two classes is 7.5 percentage points, as visualized in the bar plot, reflecting a near-even split.
  • \n", + "
\n", + "

The results indicate that the dataset exhibits a balanced class distribution, with both classes well above the minimum representation threshold. No evidence of class imbalance is present, supporting the suitability of the dataset for unbiased model training with respect to class representation.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + "

target Class Imbalance

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
targetPercentage of Rows (%)Pass/Fail
053.75%Pass
146.25%Pass
\n", + "
\n", + "

Figures

\n", + "
\n", + "
\n", + "
\"ValidMind
\n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Duplicates (validmind.data_validation.Duplicates)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

✅ Duplicates

\n", + "
\n", + " \n", + "
\n", + "

The Duplicates test evaluates the presence of duplicate rows within the dataset to ensure data quality and reduce the risk of model overfitting due to redundant information. The results table presents the absolute number and percentage of duplicate rows detected in the dataset. Both metrics are reported to provide a comprehensive view of data redundancy prior to model training.

\n", + "

Key insights:

\n", + "
    \n", + "
  • No duplicate rows detected: The dataset contains zero duplicate rows, as indicated by a "Number of Duplicates" value of 0.
  • \n", + "
  • Zero percent duplication rate: The "Percentage of Rows (%)" is 0.0%, confirming the absence of redundant entries in the dataset.
  • \n", + "
\n", + "

The results demonstrate that the dataset is free from duplicate rows, indicating a high level of data quality with respect to redundancy. This supports the reliability of subsequent model training and reduces the risk of overfitting due to repeated data points.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + "

Duplicate Rows Results for Dataset

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Number of DuplicatesPercentage of Rows (%)
00.0
\n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: High Cardinality (validmind.data_validation.HighCardinality)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

❌ High Cardinality

\n", + "
\n", + " \n", + "
\n", + "

The High Cardinality test evaluates the number of unique values in categorical columns to identify potential overfitting risks and data quality concerns. The results table presents, for each categorical column, the count and percentage of distinct values, along with a pass/fail status based on a predefined threshold. All columns listed in the results exhibit 400 unique values, corresponding to 100% distinctness, and each column is marked as failing the test.

\n", + "

Key insights:

\n", + "
    \n", + "
  • Universal high cardinality across all columns: Every categorical column assessed contains 400 unique values, representing 100% distinctness within each column.
  • \n", + "
  • Consistent test failure for all features: All columns fail the high cardinality threshold, indicating that none meet the criteria for acceptable cardinality as defined by the test parameters.
  • \n", + "
  • No variation in cardinality distribution: The uniformity in the number and percentage of distinct values across all columns suggests a consistent pattern of high cardinality throughout the dataset.
  • \n", + "
\n", + "

The results indicate that all evaluated categorical columns exhibit maximum cardinality, with each value being unique across all records. This uniform pattern of high cardinality results in universal test failure, highlighting a dataset structure where categorical features do not contain repeated values. Such a configuration may have implications for model generalizability and warrants further examination of feature encoding and data preprocessing practices.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ColumnNumber of Distinct ValuesPercentage of Distinct Values (%)Pass/Fail
age400100.0Fail
tenure400100.0Fail
balance400100.0Fail
geo_encoded400100.0Fail
credit_score400100.0Fail
num_products400100.0Fail
gender_encoded400100.0Fail
has_credit_card400100.0Fail
estimated_salary400100.0Fail
is_active_member400100.0Fail
\n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: High Pearson Correlation (validmind.data_validation.HighPearsonCorrelation)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

✅ High Pearson Correlation

\n", + "
\n", + " \n", + "
\n", + "

The High Pearson Correlation test evaluates the linear relationships between feature pairs in the dataset to identify potential feature redundancy or multicollinearity. The test result is presented as a table listing feature pairs, their Pearson correlation coefficients, and Pass/Fail status based on a threshold. In this instance, the result table is empty, indicating that no feature pairs were identified or evaluated for high correlation.

\n", + "

Key insights:

\n", + "
    \n", + "
  • No feature pairs evaluated: The result table contains no entries, indicating that no pairwise correlations were computed or reported for the dataset.
  • \n", + "
  • No evidence of high correlation: Absence of data in the result table means there are no observed instances of high correlation among features.
  • \n", + "
\n", + "

The absence of reported feature pairs in the test output indicates that no linear relationships were identified or assessed for potential redundancy or multicollinearity in the dataset. No evidence of high correlation is present based on the current test results.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Missing Values (validmind.data_validation.MissingValues)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

✅ Missing Values

\n", + "
\n", + " \n", + "
\n", + "

The Missing Values test evaluates dataset quality by measuring the proportion of missing values in each feature and comparing it to a predefined threshold. The results table lists each feature alongside the number and percentage of missing values, as well as a Pass/Fail status based on whether the missingness exceeds the threshold. All features in the dataset are shown with zero missing values, and each passes the threshold criterion.

\n", + "

Key insights:

\n", + "
    \n", + "
  • No missing values detected: All features report zero missing values, with a missingness percentage of 0.0% for each column.
  • \n", + "
  • Universal pass across features: Every feature meets the missing value threshold, resulting in a Pass status for all columns.
  • \n", + "
\n", + "

The dataset demonstrates complete data integrity with respect to missing values, as all features contain full data coverage and satisfy the established quality threshold. This indicates a high level of dataset completeness, supporting reliable downstream modeling and analysis.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ColumnNumber of Missing ValuesPercentage of Missing Values (%)Pass/Fail
age00.0Pass
tenure00.0Pass
balance00.0Pass
geo_encoded00.0Pass
credit_score00.0Pass
num_products00.0Pass
gender_encoded00.0Pass
has_credit_card00.0Pass
estimated_salary00.0Pass
is_active_member00.0Pass
target00.0Pass
\n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Skewness (validmind.data_validation.Skewness)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

✅ Skewness

\n", + "
\n", + " \n", + "
\n", + "

The Skewness test evaluates the asymmetry of numerical data distributions within the dataset to identify potential data quality issues that may impact model performance. The results are presented in a table format, which would typically display the skewness values for each numerical column and indicate whether these values exceed the defined threshold. In this instance, the results table is empty, indicating that no numerical columns were available for skewness evaluation or that no skewness values were computed for the dataset.

\n", + "

Key insights:

\n", + "
    \n", + "
  • No numerical columns evaluated: The results table contains no entries, indicating the absence of numerical columns or computed skewness values in the dataset.
  • \n", + "
  • No skewness values reported: There are no skewness statistics available for review, and no columns are flagged for exceeding the skewness threshold.
  • \n", + "
\n", + "

The absence of results in the skewness evaluation indicates that the dataset did not contain numerical columns suitable for this test or that no skewness calculations were performed. As a result, no assessment of distributional asymmetry or related data quality risks can be made based on this test.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + "

Skewness Results for Dataset

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Unique Rows (validmind.data_validation.UniqueRows)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

❌ Unique Rows

\n", + "
\n", + " \n", + "
\n", + "

The UniqueRows test evaluates the diversity of the dataset by measuring the proportion of unique values in each column relative to the total row count. The results table presents, for each column, the number and percentage of unique values, along with a pass/fail outcome based on a predefined uniqueness threshold. All feature columns except the target variable exhibit 100% unique values, while the target column shows a markedly lower percentage of unique values and does not meet the threshold.

\n", + "

Key insights:

\n", + "
    \n", + "
  • All feature columns exhibit maximum uniqueness: Each feature column (e.g., age, tenure, balance) contains 400 unique values, corresponding to 100% uniqueness, and passes the test threshold.
  • \n", + "
  • Target variable fails uniqueness threshold: The target column contains only 2 unique values, representing 0.5% uniqueness, and fails the test.
  • \n", + "
\n", + "

The dataset demonstrates complete row-level uniqueness across all feature columns, indicating high data diversity and minimal duplication in the input features. The target variable, as expected for a binary outcome, contains only two unique values and does not meet the uniqueness threshold, resulting in a fail outcome for this column. Overall, the feature set passes the UniqueRows test, confirming strong data variety in the predictors.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ColumnNumber of Unique ValuesPercentage of Unique Values (%)Pass/Fail
age400100.0Pass
tenure400100.0Pass
balance400100.0Pass
geo_encoded400100.0Pass
credit_score400100.0Pass
num_products400100.0Pass
gender_encoded400100.0Pass
has_credit_card400100.0Pass
estimated_salary400100.0Pass
is_active_member400100.0Pass
target20.5Fail
\n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Too Many Zero Values (validmind.data_validation.TooManyZeroValues)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

✅ Too Many Zero Values

\n", + "
\n", + " \n", + "
\n", + "

The TooManyZeroValues test identifies numerical columns in the dataset that contain a proportion of zero values exceeding a predefined threshold, with the intent to highlight potential data sparsity or lack of variation. The results are presented in a tabular format, summarizing the count and percentage of zero values for each numerical column, and indicating whether each column passes or fails the threshold criterion. In this test execution, the results table is empty, indicating that no numerical columns were identified or assessed for excessive zero values.

\n", + "

Key insights:

\n", + "
    \n", + "
  • No numerical columns evaluated: The results table contains no entries, indicating that the dataset did not include any numerical columns for assessment.
  • \n", + "
  • No excessive zero values detected: With no numerical columns present, there are no instances of columns exceeding the zero value threshold.
  • \n", + "
\n", + "

The absence of numerical columns in the dataset precluded the evaluation of zero value prevalence. As a result, the test did not identify any columns with excessive zero values, and no data sparsity concerns were observed in this context.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Classifier Metrics\n", + "
\n", + "
\n", + "
\n", + " \n", + "
Test suite for sklearn classifier metrics
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Model Metadata (validmind.model_validation.ModelMetadata)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

Model Metadata

\n", + "
\n", + " \n", + "
\n", + "

The ModelMetadata test compares key metadata attributes across models to assess consistency and completeness in model documentation. The resulting summary table presents the modeling technique, framework, framework version, and programming language for the evaluated model. This information enables a structured review of model architecture and implementation characteristics.

\n", + "

Key insights:

\n", + "
    \n", + "
  • Consistent use of SKlearnModel technique: The model utilizes the SKlearnModel technique, indicating a standardized modeling approach.
  • \n", + "
  • Uniform framework and version: The modeling framework is sklearn, with version 1.8.0 specified, providing clarity on the software environment.
  • \n", + "
  • Single programming language identified: Python is the sole programming language reported, supporting consistency in codebase and deployment.
  • \n", + "
\n", + "

The metadata comparison reveals a single, well-documented model instance with clear specification of modeling technique, framework, version, and programming language. No inconsistencies or missing fields are observed in the reported metadata.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Modeling TechniqueModeling FrameworkFramework VersionProgramming Language
SKlearnModelsklearn1.8.0Python
\n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " ❌ Failed Test: Dataset Split (validmind.data_validation.DatasetSplit)\n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

Failed to run 'Dataset Split'

\n", + "

Missing required input: datasets.

\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Confusion Matrix (validmind.model_validation.sklearn.ConfusionMatrix)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

Confusion Matrix

\n", + "
\n", + " \n", + "
\n", + "

The Confusion Matrix test evaluates the classification performance of the model by comparing predicted and actual class labels, providing a breakdown of true positives, true negatives, false positives, and false negatives. The resulting matrix visually displays the distribution of correct and incorrect predictions for each class. The matrix for this model shows the counts for each outcome, enabling assessment of both overall accuracy and the types of errors made.

\n", + "

Key insights:

\n", + "
    \n", + "
  • High true positive and true negative counts: The model correctly identified 167 true positives and 198 true negatives, indicating strong performance in both classes.
  • \n", + "
  • Low false positive and false negative rates: There are 17 false positives and 18 false negatives, reflecting a low rate of misclassification for both error types.
  • \n", + "
  • Balanced error distribution: The numbers of false positives and false negatives are similar, suggesting no significant bias toward one type of misclassification.
  • \n", + "
\n", + "

The confusion matrix indicates that the model demonstrates strong classification accuracy, with high counts of correct predictions and low, balanced rates of both false positives and false negatives. This distribution suggests effective discrimination between classes and a low incidence of misclassification.

\n", + "\n", + "
\n", + "

Figures

\n", + "
\n", + "
\n", + "
\"ValidMind
\n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Classifier Performance (validmind.model_validation.sklearn.ClassifierPerformance)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

Classifier Performance

\n", + "
\n", + " \n", + "
\n", + "

The Classifier Performance test evaluates the predictive effectiveness of classification models by reporting precision, recall, F1-score, accuracy, and ROC AUC metrics. The results are presented in tabular format, with class-specific and aggregate (macro and weighted) averages for precision, recall, and F1, as well as overall accuracy and ROC AUC values. The tables provide a comprehensive view of the model's ability to correctly classify both classes and summarize overall discriminative power.

\n", + "

Key insights:

\n", + "
    \n", + "
  • Consistently high precision and recall across classes: Precision and recall values for both Class 0 (precision: 0.9167, recall: 0.9209) and Class 1 (precision: 0.9076, recall: 0.9027) are closely aligned, indicating balanced performance.
  • \n", + "
  • Strong aggregate performance metrics: Weighted and macro averages for precision, recall, and F1-score are all approximately 0.912, reflecting uniform effectiveness across classes.
  • \n", + "
  • High overall accuracy and ROC AUC: The model achieves an accuracy of 0.9125 and a ROC AUC of 0.973, demonstrating strong overall classification accuracy and excellent discriminative capability.
  • \n", + "
\n", + "

The results indicate that the model delivers robust and balanced classification performance, with high precision, recall, and F1-scores for both classes. The elevated accuracy and ROC AUC values further confirm the model's strong ability to distinguish between classes, with no evidence of class imbalance or performance degradation.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + "

Precision, Recall, and F1

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ClassPrecisionRecallF1
00.91670.92090.9188
10.90760.90270.9051
Weighted Average0.91250.91250.9125
Macro Average0.91210.91180.9120
\n", + "
\n", + " \n", + "
\n", + "

Accuracy and ROC AUC

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
MetricValue
Accuracy0.9125
ROC AUC0.9730
\n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Permutation Feature Importance (validmind.model_validation.sklearn.PermutationFeatureImportance)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

Permutation Feature Importance

\n", + "
\n", + " \n", + "
\n", + "

The Permutation Feature Importance test evaluates the relative importance of each input feature by measuring the decrease in model performance when the feature's values are randomly permuted. The resulting plot displays the permutation importance scores for all features, with higher values indicating greater influence on the model's predictive accuracy. The horizontal bars represent the magnitude of importance for each feature, allowing for direct comparison of their contributions.

\n", + "

Key insights:

\n", + "
    \n", + "
  • Dominant influence of is_active_member: The is_active_member feature exhibits the highest permutation importance, with a score exceeding 0.25, indicating it is the most influential variable in the model.
  • \n", + "
  • Substantial contribution from num_products: num_products is the second most important feature, with an importance score above 0.10, reflecting a significant impact on model predictions.
  • \n", + "
  • Moderate importance for has_credit_card and geo_encoded: has_credit_card and geo_encoded display intermediate importance scores, each contributing meaningfully but less than the top two features.
  • \n", + "
  • Minimal impact from remaining features: credit_score, gender_encoded, age, balance, estimated_salary, and tenure all show low permutation importance scores, indicating limited influence on model performance.
  • \n", + "
\n", + "

The permutation importance results indicate that model predictions are primarily driven by is_active_member and num_products, with moderate contributions from has_credit_card and geo_encoded. The remaining features have minimal impact on predictive accuracy, suggesting a concentrated reliance on a small subset of variables. This distribution of importance highlights the model's dependence on a few key features, with most others playing a limited role in prediction.

\n", + "\n", + "
\n", + "

Figures

\n", + "
\n", + "
\n", + "
\"ValidMind
\n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " ❌ Failed Test: Precision Recall Curve (validmind.model_validation.sklearn.PrecisionRecallCurve)\n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

Failed to run 'Precision Recall Curve'

\n", + "

y_true takes value in {'0', '1'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.

\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: ROC Curve (validmind.model_validation.sklearn.ROCCurve)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

ROC Curve

\n", + "
\n", + " \n", + "
\n", + "

The ROC Curve test evaluates the binary classification performance of the model by plotting the Receiver Operating Characteristic (ROC) curve and calculating the Area Under the Curve (AUC) score. The resulting plot displays the trade-off between the true positive rate and false positive rate across various thresholds, with the model's ROC curve compared against a baseline representing random classification. The AUC value is provided as a quantitative summary of the model's discriminative ability.

\n", + "

Key insights:

\n", + "
    \n", + "
  • High AUC score observed: The ROC curve yields an AUC of 0.97, indicating strong separation between the positive and negative classes.
  • \n", + "
  • ROC curve consistently above random baseline: The model's ROC curve remains well above the diagonal line representing random performance (AUC = 0.5) across all thresholds.
  • \n", + "
  • Steep initial rise in true positive rate: The curve demonstrates a rapid increase in true positive rate at low false positive rates, reflecting effective early discrimination.
  • \n", + "
\n", + "

The test results demonstrate that the model exhibits robust discriminative performance on the test dataset, as evidenced by the high AUC value and the ROC curve's consistent dominance over the random baseline. The observed curve shape and magnitude of the AUC indicate effective binary classification capability across a range of thresholds.

\n", + "\n", + "
\n", + "

Figures

\n", + "
\n", + "
\n", + "
\"ValidMind
\n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " ❌ Failed Test: Population Stability Index (validmind.model_validation.sklearn.PopulationStabilityIndex)\n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

Failed to run 'Population Stability Index'

\n", + "

Missing required input: datasets.

\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " ❌ Failed Test: SHAP Global Importance (validmind.model_validation.sklearn.SHAPGlobalImportance)\n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

Failed to run 'SHAP Global Importance'

\n", + "

Unable to load test 'validmind.model_validation.sklearn.SHAPGlobalImportance' from validmind test provider

\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Classifier Validation\n", + "
\n", + "
\n", + "
\n", + " \n", + "
Test suite for sklearn classifier models
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Minimum Accuracy (validmind.model_validation.sklearn.MinimumAccuracy)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

✅ Minimum Accuracy

\n", + "
\n", + " \n", + "
\n", + "

The Minimum Accuracy test evaluates whether the model's prediction accuracy meets or exceeds a specified threshold, providing a direct measure of overall model correctness. The results table presents the model's achieved accuracy score, the threshold applied, and the corresponding pass/fail outcome. The accuracy score is compared against the threshold to determine if the model's performance is sufficient according to the test criteria.

\n", + "

Key insights:

\n", + "
    \n", + "
  • Accuracy score exceeds threshold: The model achieved an accuracy score of 0.9125, which is above the minimum threshold of 0.7.
  • \n", + "
  • Test outcome is Pass: The test result is marked as "Pass," indicating that the model's accuracy meets the required standard for this evaluation.
  • \n", + "
\n", + "

The results demonstrate that the model's prediction accuracy substantially surpasses the minimum threshold established for this test. The observed accuracy score indicates strong overall model performance in terms of correct predictions relative to the total, as measured by the test methodology.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ScoreThresholdPass/Fail
0.91250.7Pass
\n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " ❌ Failed Test: Minimum F1 Score (validmind.model_validation.sklearn.MinimumF1Score)\n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

Failed to run 'Minimum F1 Score'

\n", + "

pos_label=1 is not a valid label. It should be one of ['0', '1']

\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Test Result: Minimum ROCAUC Score (validmind.model_validation.sklearn.MinimumROCAUCScore)\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + "

✅ Minimum ROCAUC Score

\n", + "
\n", + " \n", + "
\n", + "

The Minimum ROC AUC Score test evaluates whether the model's multiclass ROC AUC score on the validation dataset meets or exceeds a predefined threshold, serving as an indicator of the model's ability to distinguish between classes. The results table presents the calculated ROC AUC score, the threshold applied, and the corresponding pass/fail outcome. The observed ROC AUC score is 0.973, with a threshold set at 0.5, and the test outcome is recorded as "Pass."

\n", + "

Key insights:

\n", + "
    \n", + "
  • ROC AUC score substantially exceeds threshold: The model achieved a ROC AUC score of 0.973, which is significantly higher than the minimum threshold of 0.5.
  • \n", + "
  • Test outcome is a clear pass: The pass/fail indicator confirms that the model's performance on this metric meets the required standard.
  • \n", + "
\n", + "

The results demonstrate that the model exhibits strong discriminatory power between classes, as evidenced by the high ROC AUC score relative to the threshold. The test outcome indicates that the model satisfies the minimum performance criterion established for this evaluation.

\n", + "\n", + "
\n", + "

Tables

\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ScoreThresholdPass/Fail
0.9730.5Pass
\n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " ❌ Failed Test: Training Test Degradation (validmind.model_validation.sklearn.TrainingTestDegradation)\n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

Failed to run 'Training Test Degradation'

\n", + "

Missing required input: datasets.

\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " ❌ Failed Test: Models Performance Comparison (validmind.model_validation.sklearn.ModelsPerformanceComparison)\n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

Failed to run 'Models Performance Comparison'

\n", + "

Missing required input: models.

\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " Classifier Model Diagnosis\n", + "
\n", + "
\n", + "
\n", + " \n", + "
Test suite for sklearn classifier model diagnosis tests
\n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " ❌ Failed Test: Overfit Diagnosis (validmind.model_validation.sklearn.OverfitDiagnosis)\n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

Failed to run 'Overfit Diagnosis'

\n", + "

Missing required input: datasets.

\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " ❌ Failed Test: Weakspots Diagnosis (validmind.model_validation.sklearn.WeakspotsDiagnosis)\n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

Failed to run 'Weakspots Diagnosis'

\n", + "

Missing required input: datasets.

\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " ❌ Failed Test: Robustness Diagnosis (validmind.model_validation.sklearn.RobustnessDiagnosis)\n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

Failed to run 'Robustness Diagnosis'

\n", + "

Missing required input: datasets.

\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + "\n", + " \n", + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Full test suite completed and results sent to ValidMind Platform.\n" + ] + } + ], + "source": [ + "test_suite_result = vm.run_test_suite(\n", + " \"classifier_full_suite\",\n", + " inputs={\n", + " \"dataset\": vm_test_ds,\n", + " \"model\": vm_model,\n", + " \"train_dataset\": vm_train_ds,\n", + " \"test_dataset\": vm_test_ds,\n", + " },\n", + ")\n", + "print('Full test suite completed and results sent to ValidMind Platform.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 11 — Verify results on the platform\n", + "\n", + "1. Go to [ValidMind Platform](https://app.prod.validmind.ai) (or your local instance)\n", + "2. Navigate to **Model Inventory** → your model\n", + "3. Open the **Documentation** tab\n", + "4. Confirm that test results from this notebook appear\n", + "\n", + "**Expected results visible on platform:**\n", + "- Dataset Description table\n", + "- Class Imbalance chart\n", + "- Confusion Matrix\n", + "- ROC Curve\n", + "- Feature Importance chart\n", + "- Full classifier suite results\n", + "\n", + "---\n", + "\n", + "## Troubleshooting\n", + "\n", + "| Issue | Fix |\n", + "|-------|-----|\n", + "| `ModuleNotFoundError` after install | Re-run the `dbutils.library.restartPython()` cell |\n", + "| `ConnectionError` on `vm.init()` | Workspace may block outbound traffic — check network policy or use a cluster with internet access |\n", + "| `401 Unauthorized` on `vm.init()` | API key/secret are incorrect — copy credentials fresh from the platform |\n", + "| `numpy` version conflict | Pin with `%pip install -q validmind \"numpy>=1.23,<2.0.0\"` |\n", + "| `404` on dataset load | No Databricks table binding found — create one in Settings → Integrations → Databricks, then wait for sync |\n", + "| `row_data is empty` after binding created | Initial sync is still running — wait ~30 seconds and re-run Step 4 |\n", + "| Wrong columns / target not found | Update `TARGET_COLUMN` in Step 4 to match the actual target column in your UC table |\n", + "| Want to test without a binding | Set `USE_SYNTHETIC_FALLBACK = True` in Step 4 |" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From a8394ebe7a704b384c7539634196a5f0eb2ebb10 Mon Sep 17 00:00:00 2001 From: Nate Shim Date: Thu, 7 May 2026 12:22:51 -0700 Subject: [PATCH 2/7] notebook edits --- .../validmind_databricks_quickstart.ipynb | 30 ++++++++++++++----- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/notebooks/databricks/validmind_databricks_quickstart.ipynb b/notebooks/databricks/validmind_databricks_quickstart.ipynb index b78f120ed..22bc64059 100644 --- a/notebooks/databricks/validmind_databricks_quickstart.ipynb +++ b/notebooks/databricks/validmind_databricks_quickstart.ipynb @@ -112,13 +112,13 @@ "\n", "Replace the placeholder values below with your actual credentials from the ValidMind Platform.\n", "\n", - "For local development, use `http://localhost:5000/api/v1/tracking` as the `api_host`.\n", + "For development, use 
`https://api.dev.vm.validmind.ai/api/v1/tracking` as the `api_host`.\n", "For production, use `https://app.prod.validmind.ai/api/v1/tracking/tracking`." ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -135,12 +135,28 @@ "source": [ "import validmind as vm\n", "\n", + "# ---------------------------------------------------------------------------\n", + "# Credentials — injected by ValidMind Platform when run via \"Run Tests\",\n", + "# or replace the fallback values to run this notebook manually.\n", + "# ---------------------------------------------------------------------------\n", + "try:\n", + " api_host = dbutils.widgets.getAll().get(\"vm_api_host\", \"https://api.dev.vm.validmind.ai/api/v1/tracking\")\n", + " api_key = dbutils.widgets.getAll().get(\"vm_api_key\", \"YOUR_API_KEY\")\n", + " api_secret = dbutils.widgets.getAll().get(\"vm_api_secret\", \"YOUR_API_SECRET\")\n", + " model_cuid = dbutils.widgets.getAll().get(\"vm_model_cuid\", \"YOUR_MODEL_CUID\")\n", + "except NameError:\n", + " # dbutils is not available — running outside Databricks\n", + " api_host = \"https://api.dev.vm.validmind.ai/api/v1/tracking\"\n", + " api_key = \"YOUR_API_KEY\"\n", + " api_secret = \"YOUR_API_SECRET\"\n", + " model_cuid = \"YOUR_MODEL_CUID\"\n", + "\n", "vm.init(\n", - " api_host=\"http://localhost:5000/api/v1/tracking\",\n", - " api_key=\"935442be7a5b9ac5e1aaff5f91174696\",\n", - " api_secret=\"ce822de88d8b69fdd4f96906b450e8d37cc0d85682117feb1693c099e9376fac\",\n", - " model=\"cmlgyjftq00052fp24yztwo2i\",\n", - ")" + " api_host=api_host,\n", + " api_key=api_key,\n", + " api_secret=api_secret,\n", + " model=model_cuid,\n", + ")\n" ] }, { From 5ee64316356efd04f00f9623f1069f8db4e82d98 Mon Sep 17 00:00:00 2001 From: Nate Shim Date: Thu, 7 May 2026 12:34:22 -0700 Subject: [PATCH 3/7] edits --- .../databricks/validmind_databricks_quickstart.ipynb | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/notebooks/databricks/validmind_databricks_quickstart.ipynb b/notebooks/databricks/validmind_databricks_quickstart.ipynb index 22bc64059..c23401c49 100644 --- a/notebooks/databricks/validmind_databricks_quickstart.ipynb +++ b/notebooks/databricks/validmind_databricks_quickstart.ipynb @@ -140,16 +140,16 @@ "# or replace the fallback values to run this notebook manually.\n", "# ---------------------------------------------------------------------------\n", "try:\n", - " api_host = dbutils.widgets.getAll().get(\"vm_api_host\", \"https://api.dev.vm.validmind.ai/api/v1/tracking\")\n", - " api_key = dbutils.widgets.getAll().get(\"vm_api_key\", \"YOUR_API_KEY\")\n", + " api_host = dbutils.widgets.getAll().get(\"vm_api_host\", \"https://api.dev.vm.validmind.ai/api/v1/tracking\")\n", + " api_key = dbutils.widgets.getAll().get(\"vm_api_key\", \"YOUR_API_KEY\")\n", " api_secret = dbutils.widgets.getAll().get(\"vm_api_secret\", \"YOUR_API_SECRET\")\n", " model_cuid = dbutils.widgets.getAll().get(\"vm_model_cuid\", \"YOUR_MODEL_CUID\")\n", "except NameError:\n", " # dbutils is not available — running outside Databricks\n", " api_host = \"https://api.dev.vm.validmind.ai/api/v1/tracking\"\n", - " api_key = \"YOUR_API_KEY\"\n", - " api_secret = \"YOUR_API_SECRET\"\n", - " model_cuid = \"YOUR_MODEL_CUID\"\n", + " api_key = \"YOUR_API_KEY\" # replace with your API key\n", + " api_secret = \"YOUR_API_SECRET\" # replace with your API secret\n", + " model_cuid = \"YOUR_MODEL_CUID\" # replace with your model CUID\n", "\n", "vm.init(\n", " 
api_host=api_host,\n", From 6cb79a73faa00acb04f63fe40d0fb452553b7d90 Mon Sep 17 00:00:00 2001 From: Nate Shim Date: Thu, 7 May 2026 12:44:20 -0700 Subject: [PATCH 4/7] clear cell outputs --- .../validmind_databricks_quickstart.ipynb | 5591 +---------------- 1 file changed, 27 insertions(+), 5564 deletions(-) diff --git a/notebooks/databricks/validmind_databricks_quickstart.ipynb b/notebooks/databricks/validmind_databricks_quickstart.ipynb index c23401c49..5887f9d04 100644 --- a/notebooks/databricks/validmind_databricks_quickstart.ipynb +++ b/notebooks/databricks/validmind_databricks_quickstart.ipynb @@ -36,41 +36,18 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m26.0.1\u001b[0m\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", - "Note: you may need to restart the kernel to use updated packages.\n" - ] - } - ], + "outputs": [], "source": [ "%pip install -q validmind" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "ename": "NameError", - "evalue": "name 'dbutils' is not defined", - "output_type": "error", - "traceback": [ - "\u001b[31m---------------------------------------------------------------------------\u001b[39m", - "\u001b[31mNameError\u001b[39m Traceback (most recent call last)", - "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[2]\u001b[39m\u001b[32m, line 2\u001b[39m\n\u001b[32m 1\u001b[39m \u001b[38;5;66;03m# Restart Python kernel to pick up newly installed packages\u001b[39;00m\n\u001b[32m----> \u001b[39m\u001b[32m2\u001b[39m \u001b[43mdbutils\u001b[49m.library.restartPython()\n", - "\u001b[31mNameError\u001b[39m: name 'dbutils' is not defined" - ] - } - ], + "outputs": [], "source": [ "# Restart Python kernel to pick up newly installed packages\n", "dbutils.library.restartPython()" @@ -120,18 +97,7 @@ "cell_type": "code", "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2026-04-20 08:59:08,869 - ERROR(validmind.api_client): Future releases will require `document` as one of the options you must provide to `vm.init()`. To learn more, refer to https://docs.validmind.ai/developer/validmind-library.html\n", - "2026-04-20 08:59:08,913 - INFO(validmind.api_client): 🎉 Connected to ValidMind!\n", - "📊 Model: test new model (ID: cmlgyjftq00052fp24yztwo2i)\n", - "📁 Document Type: validation_report\n" - ] - } - ], + "outputs": [], "source": [ "import validmind as vm\n", "\n", @@ -180,163 +146,11 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": null, "metadata": { "scrolled": true }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Loaded 2,000 rows, 11 columns from workspace.default.validmind_sample\n", - "Last synced: 2026-04-20T17:32:04.754093+00:00\n", - "Target distribution: {'1': 1000, '0': 1000}\n" - ] - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
agetargettenurebalancegeo_encodedcredit_scorenum_productsgender_encodedhas_credit_cardestimated_salaryis_active_member
01.044975253436927410.69529272666438564.547093922936896-1.8702084671204506-3.23048532486401370.70994760370024237.149388996534131-2.8732703802156821.19742319980490741.622875273590974
1-0.0278426990784642600.4918144010879626-0.9046053315584104-0.8483194056261063-0.96945677724807270.095751156694858410.039613069143711431.0473431163438045-0.8571723738917058-0.8382832367759102
2-1.99178723505111190-0.4578169695901857-0.40706986175941995-1.2511538337688415-0.23342484254422935-2.22136239122027620.16537623719369912-2.598014189473973-0.57912883700687610.8579892971621974
31.0059944577406620-1.276618200244661-2.5291621615136360.509018910577683-1.7468422838745599-1.53757534954805352.2544633490237804-2.303886235302519-0.7483831806910334-0.13331127234599705
4-1.531812949404434111.7917822736133590.72954232776335091.2367380983245346-1.00385081132829650.1895831515529864-0.5297722396450917-0.5491428842044095-1.316862269378586E-41.4221579721661985
\n", - "
" - ], - "text/plain": [ - " age target tenure balance \\\n", - "0 1.0449752534369274 1 0.6952927266643856 4.547093922936896 \n", - "1 -0.02784269907846426 0 0.4918144010879626 -0.9046053315584104 \n", - "2 -1.9917872350511119 0 -0.4578169695901857 -0.40706986175941995 \n", - "3 1.005994457740662 0 -1.276618200244661 -2.529162161513636 \n", - "4 -1.5318129494044341 1 1.791782273613359 0.7295423277633509 \n", - "\n", - " geo_encoded credit_score num_products \\\n", - "0 -1.8702084671204506 -3.2304853248640137 0.7099476037002423 \n", - "1 -0.8483194056261063 -0.9694567772480727 0.09575115669485841 \n", - "2 -1.2511538337688415 -0.23342484254422935 -2.2213623912202762 \n", - "3 0.509018910577683 -1.7468422838745599 -1.5375753495480535 \n", - "4 1.2367380983245346 -1.0038508113282965 0.1895831515529864 \n", - "\n", - " gender_encoded has_credit_card estimated_salary \\\n", - "0 7.149388996534131 -2.873270380215682 1.1974231998049074 \n", - "1 0.03961306914371143 1.0473431163438045 -0.8571723738917058 \n", - "2 0.16537623719369912 -2.598014189473973 -0.5791288370068761 \n", - "3 2.2544633490237804 -2.303886235302519 -0.7483831806910334 \n", - "4 -0.5297722396450917 -0.5491428842044095 -1.316862269378586E-4 \n", - "\n", - " is_active_member \n", - "0 1.622875273590974 \n", - "1 -0.8382832367759102 \n", - "2 0.8579892971621974 \n", - "3 -0.13331127234599705 \n", - "4 1.4221579721661985 " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "import requests\n", "import pandas as pd\n", @@ -447,19 +261,9 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Train set: 1,600 rows\n", - "Test set: 400 rows\n", - "Features: ['age', 'tenure', 'balance', 'geo_encoded', 'credit_score', 'num_products', 'gender_encoded', 'has_credit_card', 'estimated_salary', 'is_active_member']\n" - ] - } - ], + "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", @@ -481,18 +285,9 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Train accuracy: 0.9700\n", - "Test accuracy: 0.9125\n" - ] - } - ], + "outputs": [], "source": [ "from sklearn.ensemble import GradientBoostingClassifier\n", "\n", @@ -515,17 +310,9 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Datasets and model registered with ValidMind.\n" - ] - } - ], + "outputs": [], "source": [ "vm_train_ds = vm.init_dataset(\n", " dataset=train_df,\n", @@ -556,31 +343,9 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2026-04-20 10:09:06,584 - INFO(validmind.vm_models.dataset.utils): Running predict_proba()... This may take a while\n", - "2026-04-20 10:09:06,599 - INFO(validmind.vm_models.dataset.utils): Done running predict_proba()\n", - "2026-04-20 10:09:06,600 - INFO(validmind.vm_models.dataset.utils): Running predict()... This may take a while\n", - "2026-04-20 10:09:06,609 - INFO(validmind.vm_models.dataset.utils): Done running predict()\n", - "2026-04-20 10:09:06,611 - INFO(validmind.vm_models.dataset.utils): Running predict_proba()... 
This may take a while\n", - "2026-04-20 10:09:06,613 - INFO(validmind.vm_models.dataset.utils): Done running predict_proba()\n", - "2026-04-20 10:09:06,613 - INFO(validmind.vm_models.dataset.utils): Running predict()... This may take a while\n", - "2026-04-20 10:09:06,616 - INFO(validmind.vm_models.dataset.utils): Done running predict()\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Predictions assigned.\n" - ] - } - ], + "outputs": [], "source": [ "vm_train_ds.assign_predictions(model=vm_model)\n", "vm_test_ds.assign_predictions(model=vm_model)\n", @@ -599,242 +364,9 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - " \n", - " \n", - "
\n", - "

Dataset Description

\n", - "
\n", - " \n", - "
\n", - "

The DatasetDescription test provides a comprehensive summary of the dataset columns, including data types, counts, missing values, and distinct value statistics. The results table lists each column with its inferred type, total count, missing value statistics, and the number and proportion of unique values. All columns are fully populated, and the majority are classified as text type, with the exception of the target variable, which is categorical.

\n", - "

Key insights:

\n", - "
    \n", - "
  • All columns are fully populated: Every column has 1,600 non-missing entries, with 0 missing values and 0% missingness across the dataset.
  • \n", - "
  • High cardinality in text columns: All text-type columns (age, tenure, balance, geo_encoded, credit_score, num_products, gender_encoded, has_credit_card, estimated_salary, is_active_member) have 1,600 distinct values, representing 100% uniqueness.
  • \n", - "
  • Target variable is categorical with low cardinality: The target column is the only categorical variable, containing 2 distinct values and a distinct percentage of 0.12%.
  • \n", - "
\n", - "

The dataset is complete with no missing values in any column. The prevalence of text-type columns with maximum cardinality indicates that each entry is unique for these features, which may affect downstream modeling depending on how these variables are processed. The target variable is well-formed as a categorical feature with low cardinality, supporting binary classification tasks.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - "

Dataset Description

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
NameTypeCountMissingMissing %DistinctDistinct %
ageText160000.016001.0000
tenureText160000.016001.0000
balanceText160000.016001.0000
geo_encodedText160000.016001.0000
credit_scoreText160000.016001.0000
num_productsText160000.016001.0000
gender_encodedText160000.016001.0000
has_credit_cardText160000.016001.0000
estimated_salaryText160000.016001.0000
is_active_memberText160000.016001.0000
targetCategorical160000.020.0012
\n", - "
\n", - "
" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2026-04-20 10:09:14,279 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.data_validation.DatasetDescription does not exist in model's document\n" - ] - } - ], + "outputs": [], "source": [ "# Dataset statistics — validates data documentation capability\n", "result = vm.tests.run_test(\n", @@ -846,191 +378,9 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - " \n", - " \n", - "
\n", - "

✅ Class Imbalance

\n", - "
\n", - " \n", - "
\n", - "

The Class Imbalance test evaluates the distribution of target classes within the dataset to identify potential imbalances that could impact model performance. The results table presents the percentage of records for each class and indicates whether each class meets the minimum threshold for representation. The accompanying bar plot visually displays the proportion of each class, facilitating interpretation of class distribution.

\n", - "

Key insights:

\n", - "
    \n", - "
  • Balanced class distribution observed: Both classes are represented nearly equally, with class '1' comprising 50.94% and class '0' comprising 49.06% of the dataset.
  • \n", - "
  • All classes pass minimum threshold: Each class exceeds the default minimum threshold of 10% representation, resulting in a "Pass" outcome for both classes.
  • \n", - "
  • No evidence of class imbalance: The visual plot confirms the near-equal distribution, with both bars of similar height.
  • \n", - "
\n", - "

The results indicate that the dataset exhibits a balanced distribution between the two target classes, with both classes well above the minimum representation threshold. No class imbalance is detected, supporting the suitability of the dataset for unbiased model training and evaluation.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - "

target Class Imbalance

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
targetPercentage of Rows (%)Pass/Fail
150.94%Pass
049.06%Pass
\n", - "
\n", - "

Figures

\n", - "
\n", - "
\n", - "
\"ValidMind
\n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2026-04-20 10:09:20,550 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.data_validation.ClassImbalance does not exist in model's document\n" - ] - } - ], + "outputs": [], "source": [ "# Class imbalance check\n", "result = vm.tests.run_test(\n", @@ -1042,166 +392,9 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - " \n", - " \n", - "
\n", - "

Confusion Matrix

\n", - "
\n", - " \n", - "
\n", - "

The Confusion Matrix test evaluates the classification performance of the model by comparing predicted and actual class labels, providing a breakdown of true positives, true negatives, false positives, and false negatives. The resulting matrix visually displays the distribution of correct and incorrect predictions for each class, enabling assessment of the model's ability to distinguish between classes. The matrix for the evaluated model shows the counts for each outcome type, with true positives and true negatives occupying the diagonal and false positives and false negatives on the off-diagonal.

\n", - "

Key insights:

\n", - "
    \n", - "
  • High true positive and true negative counts: The model achieved 167 true positives and 198 true negatives, indicating strong performance in correctly identifying both classes.
  • \n", - "
  • Low false positive and false negative rates: There were 17 false positives and 18 false negatives, reflecting a low rate of misclassification for both types of errors.
  • \n", - "
  • Balanced error distribution: The counts of false positives and false negatives are similar, suggesting no significant bias toward one type of misclassification.
  • \n", - "
\n", - "

The confusion matrix demonstrates that the model effectively distinguishes between the two classes, with high accuracy in both positive and negative predictions. The low and balanced rates of false positives and false negatives indicate consistent classification performance and minimal systematic error.

\n", - "\n", - "
\n", - "

Figures

\n", - "
\n", - "
\n", - "
\"ValidMind
\n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2026-04-20 10:09:27,460 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.model_validation.sklearn.ConfusionMatrix does not exist in model's document\n" - ] - } - ], + "outputs": [], "source": [ "# Confusion matrix — validates model performance visualization\n", "result = vm.tests.run_test(\n", @@ -1213,166 +406,9 @@ }, { "cell_type": "code", - "execution_count": 36, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - " \n", - " \n", - "
\n", - "

ROC Curve

\n", - "
\n", - " \n", - "
\n", - "

The ROC Curve test evaluates the binary classification performance of the model by plotting the Receiver Operating Characteristic (ROC) curve and calculating the Area Under the Curve (AUC) score. The resulting plot displays the trade-off between the true positive rate and false positive rate across various thresholds, with a reference line indicating random performance. The ROC curve for the evaluated model is shown alongside its corresponding AUC value.

\n", - "

Key insights:

\n", - "
    \n", - "
  • High AUC score observed: The model achieves an AUC of 0.97, indicating strong discriminative ability between the positive and negative classes.
  • \n", - "
  • ROC curve consistently above random: The ROC curve remains well above the diagonal line representing random classification, demonstrating effective separation of classes across thresholds.
  • \n", - "
  • Steep initial TPR increase: The curve shows a rapid rise in true positive rate at low false positive rates, reflecting high sensitivity at early thresholds.
  • \n", - "
\n", - "

The test results indicate that the model demonstrates robust binary classification performance, with a high AUC value and a ROC curve that consistently outperforms random chance. The observed curve shape and magnitude of the AUC suggest effective class separation and strong model discrimination across the evaluated dataset.

\n", - "\n", - "
\n", - "

Figures

\n", - "
\n", - "
\n", - "
\"ValidMind
\n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2026-04-20 10:09:31,104 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.model_validation.sklearn.ROCCurve does not exist in model's document\n" - ] - } - ], + "outputs": [], "source": [ "# ROC curve\n", "result = vm.tests.run_test(\n", @@ -1384,144 +420,9 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - " \n", - " \n", - "
\n", - "

Feature Importance

\n", - "
\n", - " \n", - "
\n", - "

The Feature Importance test evaluates the relative contribution of individual features to the model's predictive performance using permutation feature importance scores. The resulting summary table lists the top three features ranked by their importance values, providing a clear view of which variables most strongly influence model outputs. Each feature is presented alongside its corresponding importance score, facilitating direct comparison of their impact within the model.

\n", - "

Key insights:

\n", - "
    \n", - "
  • is_active_member is the most influential feature: is_active_member registers the highest importance score at 0.3064, indicating it has the strongest effect on model predictions among the evaluated features.
  • \n", - "
  • num_products shows moderate importance: num_products is the second most important feature with a score of 0.1396, contributing meaningfully but less than is_active_member.
  • \n", - "
  • has_credit_card has lower relative impact: has_credit_card ranks third with an importance score of 0.0691, reflecting a smaller but still measurable influence on the model.
  • \n", - "
\n", - "

The feature importance results indicate that is_active_member is the dominant driver of model predictions, with num_products and has_credit_card contributing to a lesser extent. The clear separation in importance scores highlights the primary role of is_active_member in the model's decision process, while the other features provide additional, though comparatively smaller, predictive value.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Feature 1Feature 2Feature 3
[is_active_member; 0.3064][num_products; 0.1396][has_credit_card; 0.0691]
\n", - "
\n", - "
" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2026-04-20 10:09:37,317 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.model_validation.sklearn.FeatureImportance does not exist in model's document\n" - ] - } - ], + "outputs": [], "source": [ "# Feature importance\n", "result = vm.tests.run_test(\n", @@ -1544,4447 +445,9 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "
\n", - "
Test suite complete!
\n", - "
\n", - "
\n", - "
\n", - "
\n", - "
56/56 (100.0%)
\n", - "
\n", - " " - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2026-04-20 10:09:37,428 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.data_validation.DescriptiveStatistics': (ValueError) Cannot describe a DataFrame without columns\n", - "2026-04-20 10:09:37,575 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.data_validation.DatasetSplit': (MissingRequiredTestInputError) Missing required input: datasets.\n", - "2026-04-20 10:09:38,778 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.PrecisionRecallCurve': (ValueError) y_true takes value in {'0', '1'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.\n", - "2026-04-20 10:09:38,805 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.PopulationStabilityIndex': (MissingRequiredTestInputError) Missing required input: datasets.\n", - "2026-04-20 10:09:38,810 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.SHAPGlobalImportance': (LoadTestError) Unable to load test 'validmind.model_validation.sklearn.SHAPGlobalImportance' from validmind test provider\n", - "2026-04-20 10:09:38,814 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.MinimumF1Score': (ValueError) pos_label=1 is not a valid label. It should be one of ['0', '1']\n", - "2026-04-20 10:09:38,819 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.TrainingTestDegradation': (MissingRequiredTestInputError) Missing required input: datasets.\n", - "2026-04-20 10:09:38,822 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.ModelsPerformanceComparison': (MissingRequiredTestInputError) Missing required input: models.\n", - "2026-04-20 10:09:38,823 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.OverfitDiagnosis': (MissingRequiredTestInputError) Missing required input: datasets.\n", - "2026-04-20 10:09:38,824 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.WeakspotsDiagnosis': (MissingRequiredTestInputError) Missing required input: datasets.\n", - "2026-04-20 10:09:38,825 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'validmind.model_validation.sklearn.RobustnessDiagnosis': (MissingRequiredTestInputError) Missing required input: datasets.\n" - ] - }, - { - "data": { - "text/html": [ - "
\n", - " \n", - " \n", - "

Test Suite Results: Classifier Full Suite


\n", - " \n", - "

\n", - " Check out the updated documentation on\n", - " ValidMind.\n", - "

\n", - "

Full test suite for binary classification models.

\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Tabular Dataset Description\n", - "
\n", - "
\n", - "
\n", - " \n", - "
Test suite to extract metadata and descriptive\n",
-       "statistics from a tabular dataset
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Dataset Description (validmind.data_validation.DatasetDescription)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

Dataset Description

\n", - "
\n", - " \n", - "
\n", - "

The DatasetDescription test provides a comprehensive summary of each column in the dataset, detailing data types, counts, missing values, and the number of distinct values. The results table lists all columns, their inferred types, and key statistics, enabling a clear understanding of dataset structure and completeness. All columns except the target are classified as text, with the target column identified as categorical and containing two distinct values. No missing values are present in any column.

\n", - "

Key insights:

\n", - "
    \n", - "
  • All columns exhibit complete data: Every column has 400 entries with zero missing values, resulting in 0% missingness across the dataset.
  • \n", - "
  • High cardinality in text columns: All text-type columns (age, tenure, balance, geo_encoded, credit_score, num_products, gender_encoded, has_credit_card, estimated_salary, is_active_member) have 400 distinct values, indicating each entry is unique.
  • \n", - "
  • Target variable is categorical with low cardinality: The target column is classified as categorical and contains only 2 distinct values, representing a binary outcome.
  • \n", - "
\n", - "

The dataset is fully populated with no missing data, and all non-target columns are characterized by high cardinality, with each value unique across the 400 records. The target variable is appropriately identified as categorical with two classes. The structure indicates a dataset with complete records and a binary classification target, while the high uniqueness in text columns may reflect either identifier-like features or non-aggregated data.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - "

Dataset Description

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
NameTypeCountMissingMissing %DistinctDistinct %
ageText40000.04001.000
tenureText40000.04001.000
balanceText40000.04001.000
geo_encodedText40000.04001.000
credit_scoreText40000.04001.000
num_productsText40000.04001.000
gender_encodedText40000.04001.000
has_credit_cardText40000.04001.000
estimated_salaryText40000.04001.000
is_active_memberText40000.04001.000
targetCategorical40000.020.005
\n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " ❌ Failed Test: Descriptive Statistics (validmind.data_validation.DescriptiveStatistics)\n", - "
\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "

Failed to run 'Descriptive Statistics'

\n", - "

Cannot describe a DataFrame without columns

\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Pearson Correlation Matrix (validmind.data_validation.PearsonCorrelationMatrix)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

Pearson Correlation Matrix

\n", - "
\n", - " \n", - "
\n", - "

The Pearson Correlation Matrix test evaluates the linear relationships between all pairs of numerical variables in the dataset, visualizing the strength and direction of these relationships using a heat map. The resulting plot displays the correlation coefficients, with color intensity indicating the magnitude and direction of each pairwise correlation. The heat map highlights coefficients above an absolute value of 0.7 in white, signaling high correlation. The matrix provides a comprehensive overview of potential redundancy among variables.

\n", - "

Key insights:

\n", - "
    \n", - "
  • No high correlations detected: The heat map does not display any white cells, indicating that no pair of variables exhibits a correlation coefficient above the 0.7 threshold in absolute value.
  • \n", - "
  • Low to moderate linear relationships: All observed correlations fall within the low to moderate range, as reflected by the absence of strong color intensity or white highlights in the matrix.
  • \n", - "
  • No evidence of multicollinearity: The lack of high-magnitude correlations suggests that the dataset does not contain redundant numerical variables with strong linear dependencies.
  • \n", - "
\n", - "

The correlation structure demonstrates that numerical variables in the dataset are largely independent, with no evidence of strong linear relationships or multicollinearity. This supports the suitability of the dataset for modeling without concerns regarding redundancy or overfitting due to highly correlated predictors.

\n", - "\n", - "
\n", - "

Figures

\n", - "
\n", - "
\n", - "
\"ValidMind
\n", - " \n", - " \n", - " \n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "\n", - " \n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Tabular Data Quality\n", - "
\n", - "
\n", - "
\n", - " \n", - "
Test suite for data quality on tabular datasets
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Class Imbalance (validmind.data_validation.ClassImbalance)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

✅ Class Imbalance

\n", - "
\n", - " \n", - "
\n", - "

The Class Imbalance test evaluates the distribution of target classes within the dataset to identify potential imbalances that could affect model performance. The results table presents the percentage of records for each class and indicates whether each class meets the minimum threshold for representation. The accompanying bar plot visually displays the proportion of each class, facilitating interpretation of class distribution.

\n", - "

Key insights:

\n", - "
    \n", - "
  • Both classes exceed minimum threshold: Class 0 constitutes 53.75% and class 1 constitutes 46.25% of the dataset, with both surpassing the default 10% minimum threshold.
  • \n", - "
  • No class imbalance detected: Both classes are marked as "Pass," indicating that neither class is under-represented according to the test criteria.
  • \n", - "
  • Class proportions are relatively balanced: The difference between the two classes is 7.5 percentage points, as visualized in the bar plot, reflecting a near-even split.
  • \n", - "
\n", - "

The results indicate that the dataset exhibits a balanced class distribution, with both classes well above the minimum representation threshold. No evidence of class imbalance is present, supporting the suitability of the dataset for unbiased model training with respect to class representation.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - "

target Class Imbalance

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
targetPercentage of Rows (%)Pass/Fail
053.75%Pass
146.25%Pass
\n", - "
\n", - "

Figures

\n", - "
\n", - "
\n", - "
\"ValidMind
\n", - " \n", - " \n", - " \n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Duplicates (validmind.data_validation.Duplicates)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

✅ Duplicates

\n", - "
\n", - " \n", - "
\n", - "

The Duplicates test evaluates the presence of duplicate rows within the dataset to ensure data quality and reduce the risk of model overfitting due to redundant information. The results table presents the absolute number and percentage of duplicate rows detected in the dataset. Both metrics are reported to provide a comprehensive view of data redundancy prior to model training.

\n", - "

Key insights:

\n", - "
    \n", - "
  • No duplicate rows detected: The dataset contains zero duplicate rows, as indicated by a "Number of Duplicates" value of 0.
  • \n", - "
  • Zero percent duplication rate: The "Percentage of Rows (%)" is 0.0%, confirming the absence of redundant entries in the dataset.
  • \n", - "
\n", - "

The results demonstrate that the dataset is free from duplicate rows, indicating a high level of data quality with respect to redundancy. This supports the reliability of subsequent model training and reduces the risk of overfitting due to repeated data points.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - "

Duplicate Rows Results for Dataset

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Number of DuplicatesPercentage of Rows (%)
00.0
\n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: High Cardinality (validmind.data_validation.HighCardinality)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

❌ High Cardinality

\n", - "
\n", - " \n", - "
\n", - "

The High Cardinality test evaluates the number of unique values in categorical columns to identify potential overfitting risks and data quality concerns. The results table presents, for each categorical column, the count and percentage of distinct values, along with a pass/fail status based on a predefined threshold. All columns listed in the results exhibit 400 unique values, corresponding to 100% distinctness, and each column is marked as failing the test.

\n", - "

Key insights:

\n", - "
    \n", - "
  • Universal high cardinality across all columns: Every categorical column assessed contains 400 unique values, representing 100% distinctness within each column.
  • \n", - "
  • Consistent test failure for all features: All columns fail the high cardinality threshold, indicating that none meet the criteria for acceptable cardinality as defined by the test parameters.
  • \n", - "
  • No variation in cardinality distribution: The uniformity in the number and percentage of distinct values across all columns suggests a consistent pattern of high cardinality throughout the dataset.
  • \n", - "
\n", - "

The results indicate that all evaluated categorical columns exhibit maximum cardinality, with each value being unique across all records. This uniform pattern of high cardinality results in universal test failure, highlighting a dataset structure where categorical features do not contain repeated values. Such a configuration may have implications for model generalizability and warrants further examination of feature encoding and data preprocessing practices.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
ColumnNumber of Distinct ValuesPercentage of Distinct Values (%)Pass/Fail
age400100.0Fail
tenure400100.0Fail
balance400100.0Fail
geo_encoded400100.0Fail
credit_score400100.0Fail
num_products400100.0Fail
gender_encoded400100.0Fail
has_credit_card400100.0Fail
estimated_salary400100.0Fail
is_active_member400100.0Fail
\n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: High Pearson Correlation (validmind.data_validation.HighPearsonCorrelation)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

✅ High Pearson Correlation

\n", - "
\n", - " \n", - "
\n", - "

The High Pearson Correlation test evaluates the linear relationships between feature pairs in the dataset to identify potential feature redundancy or multicollinearity. The test result is presented as a table listing feature pairs, their Pearson correlation coefficients, and Pass/Fail status based on a threshold. In this instance, the result table is empty, indicating that no feature pairs were identified or evaluated for high correlation.

\n", - "

Key insights:

\n", - "
    \n", - "
  • No feature pairs evaluated: The result table contains no entries, indicating that no pairwise correlations were computed or reported for the dataset.
  • \n", - "
  • No evidence of high correlation: Absence of data in the result table means there are no observed instances of high correlation among features.
  • \n", - "
\n", - "

The absence of reported feature pairs in the test output indicates that no linear relationships were identified or assessed for potential redundancy or multicollinearity in the dataset. No evidence of high correlation is present based on the current test results.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Missing Values (validmind.data_validation.MissingValues)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

✅ Missing Values

\n", - "
\n", - " \n", - "
\n", - "

The Missing Values test evaluates dataset quality by measuring the proportion of missing values in each feature and comparing it to a predefined threshold. The results table lists each feature alongside the number and percentage of missing values, as well as a Pass/Fail status based on whether the missingness exceeds the threshold. All features in the dataset are shown with zero missing values, and each passes the threshold criterion.

\n", - "

Key insights:

\n", - "
    \n", - "
  • No missing values detected: All features report zero missing values, with a missingness percentage of 0.0% for each column.
  • \n", - "
  • Universal pass across features: Every feature meets the missing value threshold, resulting in a Pass status for all columns.
  • \n", - "
\n", - "

The dataset demonstrates complete data integrity with respect to missing values, as all features contain full data coverage and satisfy the established quality threshold. This indicates a high level of dataset completeness, supporting reliable downstream modeling and analysis.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
ColumnNumber of Missing ValuesPercentage of Missing Values (%)Pass/Fail
age00.0Pass
tenure00.0Pass
balance00.0Pass
geo_encoded00.0Pass
credit_score00.0Pass
num_products00.0Pass
gender_encoded00.0Pass
has_credit_card00.0Pass
estimated_salary00.0Pass
is_active_member00.0Pass
target00.0Pass
\n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Skewness (validmind.data_validation.Skewness)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

✅ Skewness

\n", - "
\n", - " \n", - "
\n", - "

The Skewness test evaluates the asymmetry of numerical data distributions within the dataset to identify potential data quality issues that may impact model performance. The results are presented in a table format, which would typically display the skewness values for each numerical column and indicate whether these values exceed the defined threshold. In this instance, the results table is empty, indicating that no numerical columns were available for skewness evaluation or that no skewness values were computed for the dataset.

\n", - "

Key insights:

\n", - "
    \n", - "
  • No numerical columns evaluated: The results table contains no entries, indicating the absence of numerical columns or computed skewness values in the dataset.
  • \n", - "
  • No skewness values reported: There are no skewness statistics available for review, and no columns are flagged for exceeding the skewness threshold.
  • \n", - "
\n", - "

The absence of results in the skewness evaluation indicates that the dataset did not contain numerical columns suitable for this test or that no skewness calculations were performed. As a result, no assessment of distributional asymmetry or related data quality risks can be made based on this test.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - "

Skewness Results for Dataset

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Unique Rows (validmind.data_validation.UniqueRows)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

❌ Unique Rows

\n", - "
\n", - " \n", - "
\n", - "

The UniqueRows test evaluates the diversity of the dataset by measuring the proportion of unique values in each column relative to the total row count. The results table presents, for each column, the number and percentage of unique values, along with a pass/fail outcome based on a predefined uniqueness threshold. All feature columns except the target variable exhibit 100% unique values, while the target column shows a markedly lower percentage of unique values and does not meet the threshold.

\n", - "

Key insights:

\n", - "
    \n", - "
  • All feature columns exhibit maximum uniqueness: Each feature column (e.g., age, tenure, balance) contains 400 unique values, corresponding to 100% uniqueness, and passes the test threshold.
  • \n", - "
  • Target variable fails uniqueness threshold: The target column contains only 2 unique values, representing 0.5% uniqueness, and fails the test.
  • \n", - "
\n", - "

The dataset demonstrates complete row-level uniqueness across all feature columns, indicating high data diversity and minimal duplication in the input features. The target variable, as expected for a binary outcome, contains only two unique values and does not meet the uniqueness threshold, resulting in a fail outcome for this column. Overall, the feature set passes the UniqueRows test, confirming strong data variety in the predictors.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
ColumnNumber of Unique ValuesPercentage of Unique Values (%)Pass/Fail
age400100.0Pass
tenure400100.0Pass
balance400100.0Pass
geo_encoded400100.0Pass
credit_score400100.0Pass
num_products400100.0Pass
gender_encoded400100.0Pass
has_credit_card400100.0Pass
estimated_salary400100.0Pass
is_active_member400100.0Pass
target20.5Fail
\n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Too Many Zero Values (validmind.data_validation.TooManyZeroValues)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

✅ Too Many Zero Values

\n", - "
\n", - " \n", - "
\n", - "

The TooManyZeroValues test identifies numerical columns in the dataset that contain a proportion of zero values exceeding a predefined threshold, with the intent to highlight potential data sparsity or lack of variation. The results are presented in a tabular format, summarizing the count and percentage of zero values for each numerical column, and indicating whether each column passes or fails the threshold criterion. In this test execution, the results table is empty, indicating that no numerical columns were identified or assessed for excessive zero values.

\n", - "

Key insights:

\n", - "
    \n", - "
  • No numerical columns evaluated: The results table contains no entries, indicating that the dataset did not include any numerical columns for assessment.
  • \n", - "
  • No excessive zero values detected: With no numerical columns present, there are no instances of columns exceeding the zero value threshold.
  • \n", - "
\n", - "

The absence of numerical columns in the dataset precluded the evaluation of zero value prevalence. As a result, the test did not identify any columns with excessive zero values, and no data sparsity concerns were observed in this context.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "\n", - " \n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Classifier Metrics\n", - "
\n", - "
\n", - "
\n", - " \n", - "
Test suite for sklearn classifier metrics
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Model Metadata (validmind.model_validation.ModelMetadata)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

Model Metadata

\n", - "
\n", - " \n", - "
\n", - "

The ModelMetadata test compares key metadata attributes across models to assess consistency and completeness in model documentation. The resulting summary table presents the modeling technique, framework, framework version, and programming language for the evaluated model. This information enables a structured review of model architecture and implementation characteristics.

\n", - "

Key insights:

\n", - "
    \n", - "
  • Consistent use of SKlearnModel technique: The model utilizes the SKlearnModel technique, indicating a standardized modeling approach.
  • \n", - "
  • Uniform framework and version: The modeling framework is sklearn, with version 1.8.0 specified, providing clarity on the software environment.
  • \n", - "
  • Single programming language identified: Python is the sole programming language reported, supporting consistency in codebase and deployment.
  • \n", - "
\n", - "

The metadata comparison reveals a single, well-documented model instance with clear specification of modeling technique, framework, version, and programming language. No inconsistencies or missing fields are observed in the reported metadata.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Modeling TechniqueModeling FrameworkFramework VersionProgramming Language
SKlearnModelsklearn1.8.0Python
\n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " ❌ Failed Test: Dataset Split (validmind.data_validation.DatasetSplit)\n", - "
\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "

Failed to run 'Dataset Split'

\n", - "

Missing required input: datasets.

\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Confusion Matrix (validmind.model_validation.sklearn.ConfusionMatrix)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

Confusion Matrix

\n", - "
\n", - " \n", - "
\n", - "

The Confusion Matrix test evaluates the classification performance of the model by comparing predicted and actual class labels, providing a breakdown of true positives, true negatives, false positives, and false negatives. The resulting matrix visually displays the distribution of correct and incorrect predictions for each class. The matrix for this model shows the counts for each outcome, enabling assessment of both overall accuracy and the types of errors made.

\n", - "

Key insights:

\n", - "
    \n", - "
  • High true positive and true negative counts: The model correctly identified 167 true positives and 198 true negatives, indicating strong performance in both classes.
  • \n", - "
  • Low false positive and false negative rates: There are 17 false positives and 18 false negatives, reflecting a low rate of misclassification for both error types.
  • \n", - "
  • Balanced error distribution: The numbers of false positives and false negatives are similar, suggesting no significant bias toward one type of misclassification.
  • \n", - "
\n", - "

The confusion matrix indicates that the model demonstrates strong classification accuracy, with high counts of correct predictions and low, balanced rates of both false positives and false negatives. This distribution suggests effective discrimination between classes and a low incidence of misclassification.

\n", - "\n", - "
\n", - "

Figures

\n", - "
\n", - "
\n", - "
\"ValidMind
\n", - " \n", - " \n", - " \n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Classifier Performance (validmind.model_validation.sklearn.ClassifierPerformance)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

Classifier Performance

\n", - "
\n", - " \n", - "
\n", - "

The Classifier Performance test evaluates the predictive effectiveness of classification models by reporting precision, recall, F1-score, accuracy, and ROC AUC metrics. The results are presented in tabular format, with class-specific and aggregate (macro and weighted) averages for precision, recall, and F1, as well as overall accuracy and ROC AUC values. The tables provide a comprehensive view of the model's ability to correctly classify both classes and summarize overall discriminative power.

\n", - "

Key insights:

\n", - "
    \n", - "
  • Consistently high precision and recall across classes: Precision and recall values for both Class 0 (precision: 0.9167, recall: 0.9209) and Class 1 (precision: 0.9076, recall: 0.9027) are closely aligned, indicating balanced performance.
  • \n", - "
  • Strong aggregate performance metrics: Weighted and macro averages for precision, recall, and F1-score are all approximately 0.912, reflecting uniform effectiveness across classes.
  • \n", - "
  • High overall accuracy and ROC AUC: The model achieves an accuracy of 0.9125 and a ROC AUC of 0.973, demonstrating strong overall classification accuracy and excellent discriminative capability.
  • \n", - "
\n", - "

The results indicate that the model delivers robust and balanced classification performance, with high precision, recall, and F1-scores for both classes. The elevated accuracy and ROC AUC values further confirm the model's strong ability to distinguish between classes, with no evidence of class imbalance or performance degradation.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - "

Precision, Recall, and F1

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
ClassPrecisionRecallF1
00.91670.92090.9188
10.90760.90270.9051
Weighted Average0.91250.91250.9125
Macro Average0.91210.91180.9120
\n", - "
\n", - " \n", - "
\n", - "

Accuracy and ROC AUC

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
MetricValue
Accuracy0.9125
ROC AUC0.9730
\n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Permutation Feature Importance (validmind.model_validation.sklearn.PermutationFeatureImportance)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

Permutation Feature Importance

\n", - "
\n", - " \n", - "
\n", - "

The Permutation Feature Importance test evaluates the relative importance of each input feature by measuring the decrease in model performance when the feature's values are randomly permuted. The resulting plot displays the permutation importance scores for all features, with higher values indicating greater influence on the model's predictive accuracy. The horizontal bars represent the magnitude of importance for each feature, allowing for direct comparison of their contributions.

\n", - "

Key insights:

\n", - "
    \n", - "
  • Dominant influence of is_active_member: The is_active_member feature exhibits the highest permutation importance, with a score exceeding 0.25, indicating it is the most influential variable in the model.
  • \n", - "
  • Substantial contribution from num_products: num_products is the second most important feature, with an importance score above 0.10, reflecting a significant impact on model predictions.
  • \n", - "
  • Moderate importance for has_credit_card and geo_encoded: has_credit_card and geo_encoded display intermediate importance scores, each contributing meaningfully but less than the top two features.
  • \n", - "
  • Minimal impact from remaining features: credit_score, gender_encoded, age, balance, estimated_salary, and tenure all show low permutation importance scores, indicating limited influence on model performance.
  • \n", - "
\n", - "

The permutation importance results indicate that model predictions are primarily driven by is_active_member and num_products, with moderate contributions from has_credit_card and geo_encoded. The remaining features have minimal impact on predictive accuracy, suggesting a concentrated reliance on a small subset of variables. This distribution of importance highlights the model's dependence on a few key features, with most others playing a limited role in prediction.

\n", - "\n", - "
\n", - "

Figures

\n", - "
\n", - "
\n", - "
\"ValidMind
\n", - " \n", - " \n", - " \n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " ❌ Failed Test: Precision Recall Curve (validmind.model_validation.sklearn.PrecisionRecallCurve)\n", - "
\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "

Failed to run 'Precision Recall Curve'

\n", - "

y_true takes value in {'0', '1'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.

\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: ROC Curve (validmind.model_validation.sklearn.ROCCurve)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

ROC Curve

\n", - "
\n", - " \n", - "
\n", - "

The ROC Curve test evaluates the binary classification performance of the model by plotting the Receiver Operating Characteristic (ROC) curve and calculating the Area Under the Curve (AUC) score. The resulting plot displays the trade-off between the true positive rate and false positive rate across various thresholds, with the model's ROC curve compared against a baseline representing random classification. The AUC value is provided as a quantitative summary of the model's discriminative ability.

\n", - "

Key insights:

\n", - "
    \n", - "
  • High AUC score observed: The ROC curve yields an AUC of 0.97, indicating strong separation between the positive and negative classes.
  • \n", - "
  • ROC curve consistently above random baseline: The model's ROC curve remains well above the diagonal line representing random performance (AUC = 0.5) across all thresholds.
  • \n", - "
  • Steep initial rise in true positive rate: The curve demonstrates a rapid increase in true positive rate at low false positive rates, reflecting effective early discrimination.
  • \n", - "
\n", - "

The test results demonstrate that the model exhibits robust discriminative performance on the test dataset, as evidenced by the high AUC value and the ROC curve's consistent dominance over the random baseline. The observed curve shape and magnitude of the AUC indicate effective binary classification capability across a range of thresholds.

\n", - "\n", - "
\n", - "

Figures

\n", - "
\n", - "
\n", - "
\"ValidMind
\n", - " \n", - " \n", - " \n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " ❌ Failed Test: Population Stability Index (validmind.model_validation.sklearn.PopulationStabilityIndex)\n", - "
\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "

Failed to run 'Population Stability Index'

\n", - "

Missing required input: datasets.

\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " ❌ Failed Test: SHAP Global Importance (validmind.model_validation.sklearn.SHAPGlobalImportance)\n", - "
\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "

Failed to run 'SHAP Global Importance'

\n", - "

Unable to load test 'validmind.model_validation.sklearn.SHAPGlobalImportance' from validmind test provider

\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - "
\n", - "\n", - " \n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Classifier Validation\n", - "
\n", - "
\n", - "
\n", - " \n", - "
Test suite for sklearn classifier models
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Minimum Accuracy (validmind.model_validation.sklearn.MinimumAccuracy)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

✅ Minimum Accuracy

\n", - "
\n", - " \n", - "
\n", - "

The Minimum Accuracy test evaluates whether the model's prediction accuracy meets or exceeds a specified threshold, providing a direct measure of overall model correctness. The results table presents the model's achieved accuracy score, the threshold applied, and the corresponding pass/fail outcome. The accuracy score is compared against the threshold to determine if the model's performance is sufficient according to the test criteria.

\n", - "

Key insights:

\n", - "
    \n", - "
  • Accuracy score exceeds threshold: The model achieved an accuracy score of 0.9125, which is above the minimum threshold of 0.7.
  • \n", - "
  • Test outcome is Pass: The test result is marked as "Pass," indicating that the model's accuracy meets the required standard for this evaluation.
  • \n", - "
\n", - "

The results demonstrate that the model's prediction accuracy substantially surpasses the minimum threshold established for this test. The observed accuracy score indicates strong overall model performance in terms of correct predictions relative to the total, as measured by the test methodology.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
ScoreThresholdPass/Fail
0.91250.7Pass
\n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " ❌ Failed Test: Minimum F1 Score (validmind.model_validation.sklearn.MinimumF1Score)\n", - "
\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "

Failed to run 'Minimum F1 Score'

\n", - "

pos_label=1 is not a valid label. It should be one of ['0', '1']

\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Test Result: Minimum ROCAUC Score (validmind.model_validation.sklearn.MinimumROCAUCScore)\n", - "
\n", - "
\n", - "
\n", - " \n", - " \n", - "
\n", - "

✅ Minimum ROCAUC Score

\n", - "
\n", - " \n", - "
\n", - "

The Minimum ROC AUC Score test evaluates whether the model's multiclass ROC AUC score on the validation dataset meets or exceeds a predefined threshold, serving as an indicator of the model's ability to distinguish between classes. The results table presents the calculated ROC AUC score, the threshold applied, and the corresponding pass/fail outcome. The observed ROC AUC score is 0.973, with a threshold set at 0.5, and the test outcome is recorded as "Pass."

\n", - "

Key insights:

\n", - "
    \n", - "
  • ROC AUC score substantially exceeds threshold: The model achieved a ROC AUC score of 0.973, which is significantly higher than the minimum threshold of 0.5.
  • \n", - "
  • Test outcome is a clear pass: The pass/fail indicator confirms that the model's performance on this metric meets the required standard.
  • \n", - "
\n", - "

The results demonstrate that the model exhibits strong discriminatory power between classes, as evidenced by the high ROC AUC score relative to the threshold. The test outcome indicates that the model satisfies the minimum performance criterion established for this evaluation.

\n", - "\n", - "
\n", - "

Tables

\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
ScoreThresholdPass/Fail
0.9730.5Pass
\n", - "
\n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " ❌ Failed Test: Training Test Degradation (validmind.model_validation.sklearn.TrainingTestDegradation)\n", - "
\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "

Failed to run 'Training Test Degradation'

\n", - "

Missing required input: datasets.

\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " ❌ Failed Test: Models Performance Comparison (validmind.model_validation.sklearn.ModelsPerformanceComparison)\n", - "
\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "

Failed to run 'Models Performance Comparison'

\n", - "

Missing required input: models.

\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - "
\n", - "\n", - " \n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " Classifier Model Diagnosis\n", - "
\n", - "
\n", - "
\n", - " \n", - "
Test suite for sklearn classifier model diagnosis tests
\n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " ❌ Failed Test: Overfit Diagnosis (validmind.model_validation.sklearn.OverfitDiagnosis)\n", - "
\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "

Failed to run 'Overfit Diagnosis'

\n", - "

Missing required input: datasets.

\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " ❌ Failed Test: Weakspots Diagnosis (validmind.model_validation.sklearn.WeakspotsDiagnosis)\n", - "
\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "

Failed to run 'Weakspots Diagnosis'

\n", - "

Missing required input: datasets.

\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - " ❌ Failed Test: Robustness Diagnosis (validmind.model_validation.sklearn.RobustnessDiagnosis)\n", - "
\n", - "
\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "

Failed to run 'Robustness Diagnosis'

\n", - "

Missing required input: datasets.

\n", - "
\n", - " \n", - "
\n", - "
\n", - " \n", - "
\n", - "\n", - " \n", - "
\n", - "
\n", - "
\n", - " \n", - "
\n", - "\n", - " \n", - "
" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Full test suite completed and results sent to ValidMind Platform.\n" - ] - } - ], + "outputs": [], "source": [ "test_suite_result = vm.run_test_suite(\n", " \"classifier_full_suite\",\n", From d6d52c22972016f1f729cfe2341b39e49737d599 Mon Sep 17 00:00:00 2001 From: Nate Shim Date: Thu, 7 May 2026 12:44:40 -0700 Subject: [PATCH 5/7] edits --- .../validmind_databricks_quickstart.ipynb | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) diff --git a/notebooks/databricks/validmind_databricks_quickstart.ipynb b/notebooks/databricks/validmind_databricks_quickstart.ipynb index 5887f9d04..ed9b9ba4c 100644 --- a/notebooks/databricks/validmind_databricks_quickstart.ipynb +++ b/notebooks/databricks/validmind_databricks_quickstart.ipynb @@ -62,18 +62,9 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ValidMind Library version: 2.12.5\n", - "Installation successful!\n" - ] - } - ], + "outputs": [], "source": [ "import importlib.metadata\n", "version = importlib.metadata.version('validmind')\n", From 944e398817fbfc31d3221a02d9bcdcf7536b6bee Mon Sep 17 00:00:00 2001 From: Nik Richers Date: Thu, 7 May 2026 16:04:25 -0700 Subject: [PATCH 6/7] Light edit pass --- .../validmind_databricks_quickstart.ipynb | 1044 +++++++++-------- 1 file changed, 535 insertions(+), 509 deletions(-) diff --git a/notebooks/databricks/validmind_databricks_quickstart.ipynb b/notebooks/databricks/validmind_databricks_quickstart.ipynb index ed9b9ba4c..bfaa310b6 100644 --- a/notebooks/databricks/validmind_databricks_quickstart.ipynb +++ b/notebooks/databricks/validmind_databricks_quickstart.ipynb @@ -1,512 +1,538 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# ValidMind + Databricks Quickstart\n", - "\n", - "This notebook validates that the ValidMind Library works correctly within a Databricks Collaborative Notebook environment. It demonstrates:\n", - "\n", - "- Installing and initializing the ValidMind Library\n", - "- Loading data from a Unity Catalog table via Spark\n", - "- Training a simple classification model\n", - "- Running ValidMind tests and sending results to the ValidMind Platform\n", - "\n", - "## Before you begin\n", - "\n", - "You will need:\n", - "1. A running Databricks workspace with Unity Catalog enabled\n", - "2. A ValidMind account with a registered model\n", - "3. Your ValidMind API credentials (API key, API secret, model identifier)\n", - "\n", - "To get your credentials: log in to ValidMind → **Model Inventory** → select your model → **Getting Started** → **Copy snippet to clipboard**.\n", - "\n", - "> **Note:** If you don't have a UC table ready, this notebook includes a fallback that generates synthetic data so you can still validate the full workflow." 
- ] + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ValidMind + Databricks Quickstart\n", + "\n", + "Use this notebook to install and run the ValidMind Library inside a Databricks Collaborative Notebook, load data from a Unity Catalog table linked to your model in ValidMind, train a simple classification model, and send the results to the ValidMind Platform.\n", + "\n", + "In this notebook, you will:\n", + "\n", + "- Install and initialize the ValidMind Library\n", + "- Load data from a Unity Catalog table linked to your model in ValidMind\n", + "- Train a simple classification model\n", + "- Run ValidMind tests and send the results to the ValidMind Platform\n", + "\n", + "## Before you begin\n", + "\n", + "You will need:\n", + "1. A running Databricks workspace with Unity Catalog enabled\n", + "2. A ValidMind account with a registered model\n", + "3. Your ValidMind API credentials (API key, API secret, model identifier)\n", + "\n", + "To get your credentials: log in to ValidMind → **Model Inventory** → select your model → **Getting Started** → **Copy snippet to clipboard**.\n", + "\n", + "For step-by-step instructions on setting up the Databricks integration and linking a Unity Catalog table to your model, refer to [Synchronize with Databricks](https://docs.validmind.ai/guide/integrations/integrations-examples/synchronize-with-databricks.html).\n", + "\n", + "> **Note:** If you don't have a Unity Catalog table linked to your model yet, this notebook includes a synthetic-data fallback so you can still run through the full workflow." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1 — Install the ValidMind Library\n", + "\n", + "Run this cell first. Databricks requires a Python restart after `%pip install`." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "%pip install -q validmind" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Restart Python kernel to pick up newly installed packages\n", + "dbutils.library.restartPython()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2 — Verify installation\n", + "\n", + "Confirm that the ValidMind Library installed successfully and check the version available in your notebook environment:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "import importlib.metadata\n", + "version = importlib.metadata.version('validmind')\n", + "print(f'ValidMind Library version: {version}')\n", + "print('Installation successful!')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3 — Initialize the ValidMind Library\n", + "\n", + "Initialize the ValidMind Library with the *code snippet* unique to your model so that test results are uploaded to the correct model in the ValidMind Platform.\n", + "\n", + "You can supply your credentials in either of two ways:\n", + "\n", + "- **Databricks widgets**: set widgets named `vm_api_host`, `vm_api_key`, `vm_api_secret`, and `vm_model_cuid` on the notebook. This is convenient when you parameterize the notebook as part of a Databricks job.\n", + "- **Edit the next cell directly**: replace the placeholder values with your own credentials.\n", + "\n", + "To get your credentials:\n", + "\n", + "1. In ValidMind, go to **Model Inventory** and select your model.\n", + "2. 
Open **Getting Started** and click **Copy snippet to clipboard**.\n", + "3. Paste the values into the next cell, or use them to set the corresponding widgets:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "import validmind as vm\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# Credentials are read from Databricks widgets if set. Otherwise, replace the\n", + "# placeholder values below before running this cell.\n", + "# ---------------------------------------------------------------------------\n", + "try:\n", + " api_host = dbutils.widgets.getAll().get(\"vm_api_host\", \"\")\n", + " api_key = dbutils.widgets.getAll().get(\"vm_api_key\", \"\")\n", + " api_secret = dbutils.widgets.getAll().get(\"vm_api_secret\", \"\")\n", + " model_cuid = dbutils.widgets.getAll().get(\"vm_model_cuid\", \"\")\n", + "except NameError:\n", + " # dbutils is not available — running outside Databricks\n", + " api_host = \"\" # replace with your API host\n", + " api_key = \"\" # replace with your API key\n", + " api_secret = \"\" # replace with your API secret\n", + " model_cuid = \"\" # replace with your model CUID\n", + "\n", + "vm.init(\n", + " api_host=api_host,\n", + " api_key=api_key,\n", + " api_secret=api_secret,\n", + " model=model_cuid,\n", + ")\n", + "" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4 — Load data from your linked Databricks table\n", + "\n", + "Load the data for this notebook from a Unity Catalog table that you've linked to your model in ValidMind. Once a table binding is set up, ValidMind syncs the data and makes it available through the tracking API. You don't need a Spark session or direct Unity Catalog credentials in this notebook.\n", + "\n", + "Before running the next cell, make sure you have:\n", + "\n", + "1. A Databricks integration configured in **Settings → Integrations → Databricks**\n", + "2. A `table` binding created for your model that links a Unity Catalog table to it\n", + "3. At least one successful sync (the initial sync runs automatically when you create the binding)\n", + "\n", + "If you don't have a table binding yet, set `USE_SYNTHETIC_FALLBACK = True` in the next cell to run this notebook with generated data instead." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "scrolled": true + }, + "source": [ + "import requests\n", + "import pandas as pd\n", + "from validmind import api_client as _vm_client\n", + "\n", + "# Set to True only if you don't have a Databricks table binding set up yet\n", + "USE_SYNTHETIC_FALLBACK = False\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# Load from ValidMind — uses the linked Databricks table binding for this model\n", + "# ---------------------------------------------------------------------------\n", + "if not USE_SYNTHETIC_FALLBACK:\n", + " _api_host = _vm_client.get_api_host() # same host as vm.init()\n", + " _headers = _vm_client._get_api_headers()\n", + "\n", + " _response = requests.get(\n", + " f\"{_api_host}/integrations/dataset\",\n", + " headers=_headers,\n", + " timeout=30,\n", + " )\n", + "\n", + " if _response.status_code == 200:\n", + " _data = _response.json()\n", + " TABLE_NAME = _data.get(\"table_name\", \"unknown\")\n", + " TARGET_COLUMN = \"target\" # <-- update if your table uses a different column name\n", + " row_data = _data.get(\"row_data\", [])\n", + "\n", + " if not row_data:\n", + " raise RuntimeError(\n", + " f\"Binding found for table '{TABLE_NAME}' but row_data is empty. \"\n", + " \"The sync may still be in progress — wait a moment and re-run this cell.\"\n", + " )\n", + "\n", + " df = pd.DataFrame(row_data)\n", + "\n", + " if TARGET_COLUMN not in df.columns:\n", + " raise ValueError(\n", + " f\"Column '{TARGET_COLUMN}' not found in synced data. \"\n", + " f\"Available columns: {list(df.columns)}. \"\n", + " \"Update TARGET_COLUMN above to match your table's target column.\"\n", + " )\n", + "\n", + " print(f\"Loaded {len(df):,} rows, {len(df.columns)} columns from {TABLE_NAME}\")\n", + " print(f\"Last synced: {_data.get('last_synced_at', 'unknown')}\")\n", + " print(f\"Target distribution: {df[TARGET_COLUMN].value_counts().to_dict()}\")\n", + " display(df.head())\n", + "\n", + " elif _response.status_code == 404:\n", + " raise RuntimeError(\n", + " \"No active Databricks table binding found for this model.\\n\\n\"\n", + " \"To fix:\\n\"\n", + " \" 1. Go to ValidMind → Settings → Integrations → Databricks\\n\"\n", + " \" 2. Open the model binding browser and select a Unity Catalog table\\n\"\n", + " \" 3. Wait ~30 seconds for the initial sync to complete\\n\"\n", + " \" 4. 
Re-run this cell\\n\\n\"\n", + " \"Or set USE_SYNTHETIC_FALLBACK = True above to continue with generated data.\"\n", + " )\n", + " else:\n", + " raise RuntimeError(\n", + " f\"Unexpected error loading dataset from ValidMind: \"\n", + " f\"{_response.status_code} — {_response.text}\"\n", + " )" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# ---------------------------------------------------------------------------\n", + "# Synthetic data fallback — runs when USE_SYNTHETIC_FALLBACK = True\n", + "# Uses the Bank Customer Churn dataset pattern from ValidMind examples\n", + "# ---------------------------------------------------------------------------\n", + "if USE_SYNTHETIC_FALLBACK:\n", + " import numpy as np\n", + " from sklearn.datasets import make_classification\n", + "\n", + " np.random.seed(42)\n", + " X, y = make_classification(\n", + " n_samples=1000,\n", + " n_features=10,\n", + " n_informative=6,\n", + " n_redundant=2,\n", + " random_state=42,\n", + " )\n", + " feature_names = [\n", + " \"credit_score\", \"age\", \"tenure\", \"balance\",\n", + " \"num_products\", \"has_credit_card\", \"is_active_member\",\n", + " \"estimated_salary\", \"geography_encoded\", \"gender_encoded\",\n", + " ]\n", + " df = pd.DataFrame(X, columns=feature_names)\n", + " df[\"target\"] = y\n", + " TARGET_COLUMN = \"target\"\n", + " TABLE_NAME = \"synthetic\"\n", + "\n", + " print(f\"Using synthetic dataset: {len(df):,} rows, {len(df.columns)} columns\")\n", + " print(f\"Target distribution: {df[TARGET_COLUMN].value_counts().to_dict()}\")\n", + " display(df.head())" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5 — Prepare train/test split\n", + "\n", + "Split the dataset into a training set and a test set so you can train the model on one slice of the data and evaluate how it generalizes on data it hasn't seen:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "feature_columns = [c for c in df.columns if c != TARGET_COLUMN]\n", + "\n", + "train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)\n", + "\n", + "print(f'Train set: {len(train_df):,} rows')\n", + "print(f'Test set: {len(test_df):,} rows')\n", + "print(f'Features: {feature_columns}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6 — Train a simple model\n", + "\n", + "Train a gradient boosting classifier on the training set. This is a small, fast model that's well-suited to a quickstart. The goal here is to produce something documentable end-to-end, not to tune for accuracy." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "from sklearn.ensemble import GradientBoostingClassifier\n", + "\n", + "model = GradientBoostingClassifier(n_estimators=100, random_state=42)\n", + "model.fit(train_df[feature_columns], train_df[TARGET_COLUMN])\n", + "\n", + "train_accuracy = model.score(train_df[feature_columns], train_df[TARGET_COLUMN])\n", + "test_accuracy = model.score(test_df[feature_columns], test_df[TARGET_COLUMN])\n", + "\n", + "print(f'Train accuracy: {train_accuracy:.4f}')\n", + "print(f'Test accuracy: {test_accuracy:.4f}')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 7 — Register datasets and model with ValidMind\n", + "\n", + "Before you can run tests, ValidMind needs to know about your datasets and your model. Wrap the training and test DataFrames with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) and the trained classifier with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model). Each call returns a ValidMind object that the test functions accept as input.\n", + "\n", + "The `input_id` you pass identifies each input when results are sent to the ValidMind Platform." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "vm_train_ds = vm.init_dataset(\n", + " dataset=train_df,\n", + " input_id=\"train_dataset\",\n", + " target_column=TARGET_COLUMN,\n", + ")\n", + "\n", + "vm_test_ds = vm.init_dataset(\n", + " dataset=test_df,\n", + " input_id=\"test_dataset\",\n", + " target_column=TARGET_COLUMN,\n", + ")\n", + "\n", + "vm_model = vm.init_model(\n", + " model=model,\n", + " input_id=\"gradient_boosting_model\",\n", + ")\n", + "\n", + "print('Datasets and model registered with ValidMind.')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 8 — Assign predictions to datasets\n", + "\n", + "Many tests compare predicted values against actual values, so ValidMind needs the model's predictions attached to each dataset. The [`assign_predictions()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#assign_predictions) computes predictions from your model and links them to the dataset object, once for the training set and once for the test set:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "vm_train_ds.assign_predictions(model=vm_model)\n", + "vm_test_ds.assign_predictions(model=vm_model)\n", + "\n", + "print('Predictions assigned.')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 9 — Run individual tests\n", + "\n", + "Run a few individual tests against your registered datasets and model to get familiar with how ValidMind tests work before running the full suite. 
Each [`vm.tests.run_test()`](https://docs.validmind.ai/validmind/validmind/tests.html#run_test) call executes one test, renders the result inline in this notebook, and `result.log()` sends the result to the ValidMind Platform:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Dataset statistics — validates data documentation capability\n", + "result = vm.tests.run_test(\n", + " \"validmind.data_validation.DatasetDescription\",\n", + " inputs={\"dataset\": vm_train_ds},\n", + ")\n", + "result.log()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Class imbalance check\n", + "result = vm.tests.run_test(\n", + " \"validmind.data_validation.ClassImbalance\",\n", + " inputs={\"dataset\": vm_train_ds},\n", + ")\n", + "result.log()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Confusion matrix — validates model performance visualization\n", + "result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.ConfusionMatrix\",\n", + " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", + ")\n", + "result.log()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# ROC curve\n", + "result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.ROCCurve\",\n", + " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", + ")\n", + "result.log()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Feature importance\n", + "result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.FeatureImportance\",\n", + " inputs={\"dataset\": vm_train_ds, \"model\": vm_model},\n", + ")\n", + "result.log()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 10 — Run the full test suite\n", + "\n", + "Run the complete classifier documentation suite. This single call executes every test in the suite and sends all results to the ValidMind Platform, where they populate the corresponding sections of your model documentation:" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "test_suite_result = vm.run_test_suite(\n", + " \"classifier_full_suite\",\n", + " inputs={\n", + " \"dataset\": vm_test_ds,\n", + " \"model\": vm_model,\n", + " \"train_dataset\": vm_train_ds,\n", + " \"test_dataset\": vm_test_ds,\n", + " },\n", + ")\n", + "print('Full test suite completed and results sent to ValidMind Platform.')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 11 — Verify results on the platform\n", + "\n", + "To see the results of this notebook in the ValidMind Platform:\n", + "\n", + "1. Go to the [ValidMind Platform](https://app.prod.validmind.ai) (or your ValidMind instance).\n", + "2. Navigate to **Model Inventory** and select your model.\n", + "3. Open the **Documentation** tab.\n", + "4. 
Confirm that the test results from this notebook appear in the relevant sections.\n", + "\n", + "After a successful run, you should see the following results in your model's documentation:\n", + "\n", + "- Dataset Description table\n", + "- Class Imbalance chart\n", + "- Confusion Matrix\n", + "- ROC Curve\n", + "- Feature Importance chart\n", + "- Full classifier suite results\n", + "\n", + "---\n", + "\n", + "## Troubleshooting\n", + "\n", + "If you run into any of the issues below, the table lists the likely fix:\n", + "\n", + "| Issue | Fix |\n", + "|-------|-----|\n", + "| `ModuleNotFoundError` after install | Re-run the `dbutils.library.restartPython()` cell. |\n", + "| `ConnectionError` on `vm.init()` | Your workspace may block outbound traffic. Check your network policy, or use a cluster with internet access. |\n", + "| `401 Unauthorized` on `vm.init()` | The API key or secret is incorrect. Copy your credentials again from the ValidMind Platform. |\n", + "| `numpy` version conflict | Pin a compatible version with `%pip install -q validmind \"numpy>=1.23,<2.0.0\"`. |\n", + "| `404` on dataset load | No Databricks table binding was found. Create one in **Settings → Integrations → Databricks**, then wait for the initial sync to complete. |\n", + "| `row_data is empty` after binding created | The initial sync is still running. Wait about 30 seconds and re-run Step 4. |\n", + "| Wrong columns or target not found | Update `TARGET_COLUMN` in Step 4 to match the target column in your Unity Catalog table. |\n", + "| Want to try the notebook without a binding | Set `USE_SYNTHETIC_FALLBACK = True` in Step 4 to use generated data. |" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 1 — Install the ValidMind Library\n", - "\n", - "Run this cell first. Databricks requires a Python restart after `%pip install`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%pip install -q validmind" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Restart Python kernel to pick up newly installed packages\n", - "dbutils.library.restartPython()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 2 — Verify installation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import importlib.metadata\n", - "version = importlib.metadata.version('validmind')\n", - "print(f'ValidMind Library version: {version}')\n", - "print('Installation successful!')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 3 — Initialize the ValidMind Library\n", - "\n", - "Replace the placeholder values below with your actual credentials from the ValidMind Platform.\n", - "\n", - "For development, use `https://api.dev.vm.validmind.ai/api/v1/tracking` as the `api_host`.\n", - "For production, use `https://app.prod.validmind.ai/api/v1/tracking/tracking`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import validmind as vm\n", - "\n", - "# ---------------------------------------------------------------------------\n", - "# Credentials — injected by ValidMind Platform when run via \"Run Tests\",\n", - "# or replace the fallback values to run this notebook manually.\n", - "# ---------------------------------------------------------------------------\n", - "try:\n", - " api_host = dbutils.widgets.getAll().get(\"vm_api_host\", \"https://api.dev.vm.validmind.ai/api/v1/tracking\")\n", - " api_key = dbutils.widgets.getAll().get(\"vm_api_key\", \"YOUR_API_KEY\")\n", - " api_secret = dbutils.widgets.getAll().get(\"vm_api_secret\", \"YOUR_API_SECRET\")\n", - " model_cuid = dbutils.widgets.getAll().get(\"vm_model_cuid\", \"YOUR_MODEL_CUID\")\n", - "except NameError:\n", - " # dbutils is not available — running outside Databricks\n", - " api_host = \"https://api.dev.vm.validmind.ai/api/v1/tracking\"\n", - " api_key = \"YOUR_API_KEY\" # replace with your API key\n", - " api_secret = \"YOUR_API_SECRET\" # replace with your API secret\n", - " model_cuid = \"YOUR_MODEL_CUID\" # replace with your model CUID\n", - "\n", - "vm.init(\n", - " api_host=api_host,\n", - " api_key=api_key,\n", - " api_secret=api_secret,\n", - " model=model_cuid,\n", - ")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 4 — Load data from your linked Databricks table\n", - "\n", - "Instead of querying Databricks directly, this notebook loads data through ValidMind.\n", - "ValidMind fetches and syncs the Unity Catalog table data when you create a binding in\n", - "**Settings → Integrations → Databricks**, so the same dataset is available here via the\n", - "tracking API — no Spark session or direct UC credentials needed.\n", - "\n", - "**Prerequisites:**\n", - "1. A Databricks integration configured in ValidMind Settings\n", - "2. A `table` binding created for this model (link a Unity Catalog table to this model)\n", - "3. At least one successful sync (the initial sync triggers automatically on binding creation)\n", - "\n", - "If no binding exists yet, set `USE_SYNTHETIC_FALLBACK = True` to run with generated data." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "import requests\n", - "import pandas as pd\n", - "from validmind import api_client as _vm_client\n", - "\n", - "# Set to True only if you don't have a Databricks table binding set up yet\n", - "USE_SYNTHETIC_FALLBACK = False\n", - "\n", - "# ---------------------------------------------------------------------------\n", - "# Load from ValidMind — uses the linked Databricks table binding for this model\n", - "# ---------------------------------------------------------------------------\n", - "if not USE_SYNTHETIC_FALLBACK:\n", - " _api_host = _vm_client.get_api_host() # same host as vm.init()\n", - " _headers = _vm_client._get_api_headers()\n", - "\n", - " _response = requests.get(\n", - " f\"{_api_host}/integrations/dataset\",\n", - " headers=_headers,\n", - " timeout=30,\n", - " )\n", - "\n", - " if _response.status_code == 200:\n", - " _data = _response.json()\n", - " TABLE_NAME = _data.get(\"table_name\", \"unknown\")\n", - " TARGET_COLUMN = \"target\" # <-- update if your table uses a different column name\n", - " row_data = _data.get(\"row_data\", [])\n", - "\n", - " if not row_data:\n", - " raise RuntimeError(\n", - " f\"Binding found for table '{TABLE_NAME}' but row_data is empty. \"\n", - " \"The sync may still be in progress — wait a moment and re-run this cell.\"\n", - " )\n", - "\n", - " df = pd.DataFrame(row_data)\n", - "\n", - " if TARGET_COLUMN not in df.columns:\n", - " raise ValueError(\n", - " f\"Column '{TARGET_COLUMN}' not found in synced data. \"\n", - " f\"Available columns: {list(df.columns)}. \"\n", - " \"Update TARGET_COLUMN above to match your table's target column.\"\n", - " )\n", - "\n", - " print(f\"Loaded {len(df):,} rows, {len(df.columns)} columns from {TABLE_NAME}\")\n", - " print(f\"Last synced: {_data.get('last_synced_at', 'unknown')}\")\n", - " print(f\"Target distribution: {df[TARGET_COLUMN].value_counts().to_dict()}\")\n", - " display(df.head())\n", - "\n", - " elif _response.status_code == 404:\n", - " raise RuntimeError(\n", - " \"No active Databricks table binding found for this model.\\n\\n\"\n", - " \"To fix:\\n\"\n", - " \" 1. Go to ValidMind → Settings → Integrations → Databricks\\n\"\n", - " \" 2. Open the model binding browser and select a Unity Catalog table\\n\"\n", - " \" 3. Wait ~30 seconds for the initial sync to complete\\n\"\n", - " \" 4. 
Re-run this cell\\n\\n\"\n", - " \"Or set USE_SYNTHETIC_FALLBACK = True above to continue with generated data.\"\n", - " )\n", - " else:\n", - " raise RuntimeError(\n", - " f\"Unexpected error loading dataset from ValidMind: \"\n", - " f\"{_response.status_code} — {_response.text}\"\n", - " )" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# ---------------------------------------------------------------------------\n", - "# Synthetic data fallback — runs when USE_SYNTHETIC_FALLBACK = True\n", - "# Uses the Bank Customer Churn dataset pattern from ValidMind examples\n", - "# ---------------------------------------------------------------------------\n", - "if USE_SYNTHETIC_FALLBACK:\n", - " import numpy as np\n", - " from sklearn.datasets import make_classification\n", - "\n", - " np.random.seed(42)\n", - " X, y = make_classification(\n", - " n_samples=1000,\n", - " n_features=10,\n", - " n_informative=6,\n", - " n_redundant=2,\n", - " random_state=42,\n", - " )\n", - " feature_names = [\n", - " \"credit_score\", \"age\", \"tenure\", \"balance\",\n", - " \"num_products\", \"has_credit_card\", \"is_active_member\",\n", - " \"estimated_salary\", \"geography_encoded\", \"gender_encoded\",\n", - " ]\n", - " df = pd.DataFrame(X, columns=feature_names)\n", - " df[\"target\"] = y\n", - " TARGET_COLUMN = \"target\"\n", - " TABLE_NAME = \"synthetic\"\n", - "\n", - " print(f\"Using synthetic dataset: {len(df):,} rows, {len(df.columns)} columns\")\n", - " print(f\"Target distribution: {df[TARGET_COLUMN].value_counts().to_dict()}\")\n", - " display(df.head())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 5 — Prepare train/test split" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.model_selection import train_test_split\n", - "\n", - "feature_columns = [c for c in df.columns if c != TARGET_COLUMN]\n", - "\n", - "train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)\n", - "\n", - "print(f'Train set: {len(train_df):,} rows')\n", - "print(f'Test set: {len(test_df):,} rows')\n", - "print(f'Features: {feature_columns}')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 6 — Train a simple model" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.ensemble import GradientBoostingClassifier\n", - "\n", - "model = GradientBoostingClassifier(n_estimators=100, random_state=42)\n", - "model.fit(train_df[feature_columns], train_df[TARGET_COLUMN])\n", - "\n", - "train_accuracy = model.score(train_df[feature_columns], train_df[TARGET_COLUMN])\n", - "test_accuracy = model.score(test_df[feature_columns], test_df[TARGET_COLUMN])\n", - "\n", - "print(f'Train accuracy: {train_accuracy:.4f}')\n", - "print(f'Test accuracy: {test_accuracy:.4f}')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 7 — Register datasets and model with ValidMind" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "vm_train_ds = vm.init_dataset(\n", - " dataset=train_df,\n", - " input_id=\"train_dataset\",\n", - " target_column=TARGET_COLUMN,\n", - ")\n", - "\n", - "vm_test_ds = vm.init_dataset(\n", - " dataset=test_df,\n", - " input_id=\"test_dataset\",\n", - " target_column=TARGET_COLUMN,\n", - ")\n", - "\n", - "vm_model = 
vm.init_model(\n", - " model=model,\n", - " input_id=\"gradient_boosting_model\",\n", - ")\n", - "\n", - "print('Datasets and model registered with ValidMind.')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 8 — Assign predictions to datasets" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "vm_train_ds.assign_predictions(model=vm_model)\n", - "vm_test_ds.assign_predictions(model=vm_model)\n", - "\n", - "print('Predictions assigned.')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 9 — Run individual tests\n", - "\n", - "These tests validate that results render correctly in the notebook and are sent to the ValidMind Platform." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Dataset statistics — validates data documentation capability\n", - "result = vm.tests.run_test(\n", - " \"validmind.data_validation.DatasetDescription\",\n", - " inputs={\"dataset\": vm_train_ds},\n", - ")\n", - "result.log()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Class imbalance check\n", - "result = vm.tests.run_test(\n", - " \"validmind.data_validation.ClassImbalance\",\n", - " inputs={\"dataset\": vm_train_ds},\n", - ")\n", - "result.log()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Confusion matrix — validates model performance visualization\n", - "result = vm.tests.run_test(\n", - " \"validmind.model_validation.sklearn.ConfusionMatrix\",\n", - " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", - ")\n", - "result.log()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# ROC curve\n", - "result = vm.tests.run_test(\n", - " \"validmind.model_validation.sklearn.ROCCurve\",\n", - " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", - ")\n", - "result.log()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Feature importance\n", - "result = vm.tests.run_test(\n", - " \"validmind.model_validation.sklearn.FeatureImportance\",\n", - " inputs={\"dataset\": vm_train_ds, \"model\": vm_model},\n", - ")\n", - "result.log()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 10 — Run the full test suite\n", - "\n", - "This runs the complete classifier documentation suite and sends all results to ValidMind in one call.\n", - "\n", - "> This is the primary validation that results can be sent from a Databricks notebook environment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "test_suite_result = vm.run_test_suite(\n", - " \"classifier_full_suite\",\n", - " inputs={\n", - " \"dataset\": vm_test_ds,\n", - " \"model\": vm_model,\n", - " \"train_dataset\": vm_train_ds,\n", - " \"test_dataset\": vm_test_ds,\n", - " },\n", - ")\n", - "print('Full test suite completed and results sent to ValidMind Platform.')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 11 — Verify results on the platform\n", - "\n", - "1. Go to [ValidMind Platform](https://app.prod.validmind.ai) (or your local instance)\n", - "2. Navigate to **Model Inventory** → your model\n", - "3. Open the **Documentation** tab\n", - "4. 
Confirm that test results from this notebook appear\n", - "\n", - "**Expected results visible on platform:**\n", - "- Dataset Description table\n", - "- Class Imbalance chart\n", - "- Confusion Matrix\n", - "- ROC Curve\n", - "- Feature Importance chart\n", - "- Full classifier suite results\n", - "\n", - "---\n", - "\n", - "## Troubleshooting\n", - "\n", - "| Issue | Fix |\n", - "|-------|-----|\n", - "| `ModuleNotFoundError` after install | Re-run the `dbutils.library.restartPython()` cell |\n", - "| `ConnectionError` on `vm.init()` | Workspace may block outbound traffic — check network policy or use a cluster with internet access |\n", - "| `401 Unauthorized` on `vm.init()` | API key/secret are incorrect — copy credentials fresh from the platform |\n", - "| `numpy` version conflict | Pin with `%pip install -q validmind \"numpy>=1.23,<2.0.0\"` |\n", - "| `404` on dataset load | No Databricks table binding found — create one in Settings → Integrations → Databricks, then wait for sync |\n", - "| `row_data is empty` after binding created | Initial sync is still running — wait ~30 seconds and re-run Step 4 |\n", - "| Wrong columns / target not found | Update `TARGET_COLUMN` in Step 4 to match the actual target column in your UC table |\n", - "| Want to test without a binding | Set `USE_SYNTHETIC_FALLBACK = True` in Step 4 |" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.7" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "nbformat": 4, + "nbformat_minor": 4 } From bc59a2b5502bff8e08be8030f7cd963546750934 Mon Sep 17 00:00:00 2001 From: Nik Richers Date: Thu, 7 May 2026 16:10:21 -0700 Subject: [PATCH 7/7] Run make copyright --- .../validmind_databricks_quickstart.ipynb | 1085 ++++---- .../run_tests/2_run_comparison_tests.ipynb | 2181 +++++++++-------- 2 files changed, 1641 insertions(+), 1625 deletions(-) diff --git a/notebooks/databricks/validmind_databricks_quickstart.ipynb b/notebooks/databricks/validmind_databricks_quickstart.ipynb index bfaa310b6..c54c52ae8 100644 --- a/notebooks/databricks/validmind_databricks_quickstart.ipynb +++ b/notebooks/databricks/validmind_databricks_quickstart.ipynb @@ -1,538 +1,553 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# ValidMind + Databricks Quickstart\n", - "\n", - "Use this notebook to install and run the ValidMind Library inside a Databricks Collaborative Notebook, load data from a Unity Catalog table linked to your model in ValidMind, train a simple classification model, and send the results to the ValidMind Platform.\n", - "\n", - "In this notebook, you will:\n", - "\n", - "- Install and initialize the ValidMind Library\n", - "- Load data from a Unity Catalog table linked to your model in ValidMind\n", - "- Train a simple classification model\n", - "- Run ValidMind tests and send the results to the ValidMind Platform\n", - "\n", - "## Before you begin\n", - "\n", - "You will need:\n", - "1. A running Databricks workspace with Unity Catalog enabled\n", - "2. A ValidMind account with a registered model\n", - "3. 
Your ValidMind API credentials (API key, API secret, model identifier)\n", - "\n", - "To get your credentials: log in to ValidMind → **Model Inventory** → select your model → **Getting Started** → **Copy snippet to clipboard**.\n", - "\n", - "For step-by-step instructions on setting up the Databricks integration and linking a Unity Catalog table to your model, refer to [Synchronize with Databricks](https://docs.validmind.ai/guide/integrations/integrations-examples/synchronize-with-databricks.html).\n", - "\n", - "> **Note:** If you don't have a Unity Catalog table linked to your model yet, this notebook includes a synthetic-data fallback so you can still run through the full workflow." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 1 — Install the ValidMind Library\n", - "\n", - "Run this cell first. Databricks requires a Python restart after `%pip install`." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "%pip install -q validmind" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Restart Python kernel to pick up newly installed packages\n", - "dbutils.library.restartPython()" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 2 — Verify installation\n", - "\n", - "Confirm that the ValidMind Library installed successfully and check the version available in your notebook environment:" - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "import importlib.metadata\n", - "version = importlib.metadata.version('validmind')\n", - "print(f'ValidMind Library version: {version}')\n", - "print('Installation successful!')" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 3 — Initialize the ValidMind Library\n", - "\n", - "Initialize the ValidMind Library with the *code snippet* unique to your model so that test results are uploaded to the correct model in the ValidMind Platform.\n", - "\n", - "You can supply your credentials in either of two ways:\n", - "\n", - "- **Databricks widgets**: set widgets named `vm_api_host`, `vm_api_key`, `vm_api_secret`, and `vm_model_cuid` on the notebook. This is convenient when you parameterize the notebook as part of a Databricks job.\n", - "- **Edit the next cell directly**: replace the placeholder values with your own credentials.\n", - "\n", - "To get your credentials:\n", - "\n", - "1. In ValidMind, go to **Model Inventory** and select your model.\n", - "2. Open **Getting Started** and click **Copy snippet to clipboard**.\n", - "3. Paste the values into the next cell, or use them to set the corresponding widgets:" - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "import validmind as vm\n", - "\n", - "# ---------------------------------------------------------------------------\n", - "# Credentials are read from Databricks widgets if set. 
Otherwise, replace the\n", - "# placeholder values below before running this cell.\n", - "# ---------------------------------------------------------------------------\n", - "try:\n", - " api_host = dbutils.widgets.getAll().get(\"vm_api_host\", \"\")\n", - " api_key = dbutils.widgets.getAll().get(\"vm_api_key\", \"\")\n", - " api_secret = dbutils.widgets.getAll().get(\"vm_api_secret\", \"\")\n", - " model_cuid = dbutils.widgets.getAll().get(\"vm_model_cuid\", \"\")\n", - "except NameError:\n", - " # dbutils is not available — running outside Databricks\n", - " api_host = \"\" # replace with your API host\n", - " api_key = \"\" # replace with your API key\n", - " api_secret = \"\" # replace with your API secret\n", - " model_cuid = \"\" # replace with your model CUID\n", - "\n", - "vm.init(\n", - " api_host=api_host,\n", - " api_key=api_key,\n", - " api_secret=api_secret,\n", - " model=model_cuid,\n", - ")\n", - "" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 4 — Load data from your linked Databricks table\n", - "\n", - "Load the data for this notebook from a Unity Catalog table that you've linked to your model in ValidMind. Once a table binding is set up, ValidMind syncs the data and makes it available through the tracking API. You don't need a Spark session or direct Unity Catalog credentials in this notebook.\n", - "\n", - "Before running the next cell, make sure you have:\n", - "\n", - "1. A Databricks integration configured in **Settings → Integrations → Databricks**\n", - "2. A `table` binding created for your model that links a Unity Catalog table to it\n", - "3. At least one successful sync (the initial sync runs automatically when you create the binding)\n", - "\n", - "If you don't have a table binding yet, set `USE_SYNTHETIC_FALLBACK = True` in the next cell to run this notebook with generated data instead." - ] - }, - { - "cell_type": "code", - "metadata": { - "scrolled": true - }, - "source": [ - "import requests\n", - "import pandas as pd\n", - "from validmind import api_client as _vm_client\n", - "\n", - "# Set to True only if you don't have a Databricks table binding set up yet\n", - "USE_SYNTHETIC_FALLBACK = False\n", - "\n", - "# ---------------------------------------------------------------------------\n", - "# Load from ValidMind — uses the linked Databricks table binding for this model\n", - "# ---------------------------------------------------------------------------\n", - "if not USE_SYNTHETIC_FALLBACK:\n", - " _api_host = _vm_client.get_api_host() # same host as vm.init()\n", - " _headers = _vm_client._get_api_headers()\n", - "\n", - " _response = requests.get(\n", - " f\"{_api_host}/integrations/dataset\",\n", - " headers=_headers,\n", - " timeout=30,\n", - " )\n", - "\n", - " if _response.status_code == 200:\n", - " _data = _response.json()\n", - " TABLE_NAME = _data.get(\"table_name\", \"unknown\")\n", - " TARGET_COLUMN = \"target\" # <-- update if your table uses a different column name\n", - " row_data = _data.get(\"row_data\", [])\n", - "\n", - " if not row_data:\n", - " raise RuntimeError(\n", - " f\"Binding found for table '{TABLE_NAME}' but row_data is empty. \"\n", - " \"The sync may still be in progress — wait a moment and re-run this cell.\"\n", - " )\n", - "\n", - " df = pd.DataFrame(row_data)\n", - "\n", - " if TARGET_COLUMN not in df.columns:\n", - " raise ValueError(\n", - " f\"Column '{TARGET_COLUMN}' not found in synced data. 
\"\n", - " f\"Available columns: {list(df.columns)}. \"\n", - " \"Update TARGET_COLUMN above to match your table's target column.\"\n", - " )\n", - "\n", - " print(f\"Loaded {len(df):,} rows, {len(df.columns)} columns from {TABLE_NAME}\")\n", - " print(f\"Last synced: {_data.get('last_synced_at', 'unknown')}\")\n", - " print(f\"Target distribution: {df[TARGET_COLUMN].value_counts().to_dict()}\")\n", - " display(df.head())\n", - "\n", - " elif _response.status_code == 404:\n", - " raise RuntimeError(\n", - " \"No active Databricks table binding found for this model.\\n\\n\"\n", - " \"To fix:\\n\"\n", - " \" 1. Go to ValidMind → Settings → Integrations → Databricks\\n\"\n", - " \" 2. Open the model binding browser and select a Unity Catalog table\\n\"\n", - " \" 3. Wait ~30 seconds for the initial sync to complete\\n\"\n", - " \" 4. Re-run this cell\\n\\n\"\n", - " \"Or set USE_SYNTHETIC_FALLBACK = True above to continue with generated data.\"\n", - " )\n", - " else:\n", - " raise RuntimeError(\n", - " f\"Unexpected error loading dataset from ValidMind: \"\n", - " f\"{_response.status_code} — {_response.text}\"\n", - " )" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# ---------------------------------------------------------------------------\n", - "# Synthetic data fallback — runs when USE_SYNTHETIC_FALLBACK = True\n", - "# Uses the Bank Customer Churn dataset pattern from ValidMind examples\n", - "# ---------------------------------------------------------------------------\n", - "if USE_SYNTHETIC_FALLBACK:\n", - " import numpy as np\n", - " from sklearn.datasets import make_classification\n", - "\n", - " np.random.seed(42)\n", - " X, y = make_classification(\n", - " n_samples=1000,\n", - " n_features=10,\n", - " n_informative=6,\n", - " n_redundant=2,\n", - " random_state=42,\n", - " )\n", - " feature_names = [\n", - " \"credit_score\", \"age\", \"tenure\", \"balance\",\n", - " \"num_products\", \"has_credit_card\", \"is_active_member\",\n", - " \"estimated_salary\", \"geography_encoded\", \"gender_encoded\",\n", - " ]\n", - " df = pd.DataFrame(X, columns=feature_names)\n", - " df[\"target\"] = y\n", - " TARGET_COLUMN = \"target\"\n", - " TABLE_NAME = \"synthetic\"\n", - "\n", - " print(f\"Using synthetic dataset: {len(df):,} rows, {len(df.columns)} columns\")\n", - " print(f\"Target distribution: {df[TARGET_COLUMN].value_counts().to_dict()}\")\n", - " display(df.head())" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 5 — Prepare train/test split\n", - "\n", - "Split the dataset into a training set and a test set so you can train the model on one slice of the data and evaluate how it generalizes on data it hasn't seen:" - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "from sklearn.model_selection import train_test_split\n", - "\n", - "feature_columns = [c for c in df.columns if c != TARGET_COLUMN]\n", - "\n", - "train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)\n", - "\n", - "print(f'Train set: {len(train_df):,} rows')\n", - "print(f'Test set: {len(test_df):,} rows')\n", - "print(f'Features: {feature_columns}')" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 6 — Train a simple model\n", - "\n", - "Train a gradient boosting classifier on the training set. This is a small, fast model that's well-suited to a quickstart. 
The goal here is to produce something documentable end-to-end, not to tune for accuracy." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "from sklearn.ensemble import GradientBoostingClassifier\n", - "\n", - "model = GradientBoostingClassifier(n_estimators=100, random_state=42)\n", - "model.fit(train_df[feature_columns], train_df[TARGET_COLUMN])\n", - "\n", - "train_accuracy = model.score(train_df[feature_columns], train_df[TARGET_COLUMN])\n", - "test_accuracy = model.score(test_df[feature_columns], test_df[TARGET_COLUMN])\n", - "\n", - "print(f'Train accuracy: {train_accuracy:.4f}')\n", - "print(f'Test accuracy: {test_accuracy:.4f}')" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 7 — Register datasets and model with ValidMind\n", - "\n", - "Before you can run tests, ValidMind needs to know about your datasets and your model. Wrap the training and test DataFrames with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) and the trained classifier with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model). Each call returns a ValidMind object that the test functions accept as input.\n", - "\n", - "The `input_id` you pass identifies each input when results are sent to the ValidMind Platform." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "vm_train_ds = vm.init_dataset(\n", - " dataset=train_df,\n", - " input_id=\"train_dataset\",\n", - " target_column=TARGET_COLUMN,\n", - ")\n", - "\n", - "vm_test_ds = vm.init_dataset(\n", - " dataset=test_df,\n", - " input_id=\"test_dataset\",\n", - " target_column=TARGET_COLUMN,\n", - ")\n", - "\n", - "vm_model = vm.init_model(\n", - " model=model,\n", - " input_id=\"gradient_boosting_model\",\n", - ")\n", - "\n", - "print('Datasets and model registered with ValidMind.')" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 8 — Assign predictions to datasets\n", - "\n", - "Many tests compare predicted values against actual values, so ValidMind needs the model's predictions attached to each dataset. The [`assign_predictions()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#assign_predictions) computes predictions from your model and links them to the dataset object, once for the training set and once for the test set:" - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "vm_train_ds.assign_predictions(model=vm_model)\n", - "vm_test_ds.assign_predictions(model=vm_model)\n", - "\n", - "print('Predictions assigned.')" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 9 — Run individual tests\n", - "\n", - "Run a few individual tests against your registered datasets and model to get familiar with how ValidMind tests work before running the full suite. 
Each [`vm.tests.run_test()`](https://docs.validmind.ai/validmind/validmind/tests.html#run_test) call executes one test, renders the result inline in this notebook, and `result.log()` sends the result to the ValidMind Platform:" - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Dataset statistics — validates data documentation capability\n", - "result = vm.tests.run_test(\n", - " \"validmind.data_validation.DatasetDescription\",\n", - " inputs={\"dataset\": vm_train_ds},\n", - ")\n", - "result.log()" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Class imbalance check\n", - "result = vm.tests.run_test(\n", - " \"validmind.data_validation.ClassImbalance\",\n", - " inputs={\"dataset\": vm_train_ds},\n", - ")\n", - "result.log()" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Confusion matrix — validates model performance visualization\n", - "result = vm.tests.run_test(\n", - " \"validmind.model_validation.sklearn.ConfusionMatrix\",\n", - " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", - ")\n", - "result.log()" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# ROC curve\n", - "result = vm.tests.run_test(\n", - " \"validmind.model_validation.sklearn.ROCCurve\",\n", - " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", - ")\n", - "result.log()" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Feature importance\n", - "result = vm.tests.run_test(\n", - " \"validmind.model_validation.sklearn.FeatureImportance\",\n", - " inputs={\"dataset\": vm_train_ds, \"model\": vm_model},\n", - ")\n", - "result.log()" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 10 — Run the full test suite\n", - "\n", - "Run the complete classifier documentation suite. This single call executes every test in the suite and sends all results to the ValidMind Platform, where they populate the corresponding sections of your model documentation:" - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "test_suite_result = vm.run_test_suite(\n", - " \"classifier_full_suite\",\n", - " inputs={\n", - " \"dataset\": vm_test_ds,\n", - " \"model\": vm_model,\n", - " \"train_dataset\": vm_train_ds,\n", - " \"test_dataset\": vm_test_ds,\n", - " },\n", - ")\n", - "print('Full test suite completed and results sent to ValidMind Platform.')" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Step 11 — Verify results on the platform\n", - "\n", - "To see the results of this notebook in the ValidMind Platform:\n", - "\n", - "1. Go to the [ValidMind Platform](https://app.prod.validmind.ai) (or your ValidMind instance).\n", - "2. Navigate to **Model Inventory** and select your model.\n", - "3. Open the **Documentation** tab.\n", - "4. 
Confirm that the test results from this notebook appear in the relevant sections.\n", - "\n", - "After a successful run, you should see the following results in your model's documentation:\n", - "\n", - "- Dataset Description table\n", - "- Class Imbalance chart\n", - "- Confusion Matrix\n", - "- ROC Curve\n", - "- Feature Importance chart\n", - "- Full classifier suite results\n", - "\n", - "---\n", - "\n", - "## Troubleshooting\n", - "\n", - "If you run into any of the issues below, the table lists the likely fix:\n", - "\n", - "| Issue | Fix |\n", - "|-------|-----|\n", - "| `ModuleNotFoundError` after install | Re-run the `dbutils.library.restartPython()` cell. |\n", - "| `ConnectionError` on `vm.init()` | Your workspace may block outbound traffic. Check your network policy, or use a cluster with internet access. |\n", - "| `401 Unauthorized` on `vm.init()` | The API key or secret is incorrect. Copy your credentials again from the ValidMind Platform. |\n", - "| `numpy` version conflict | Pin a compatible version with `%pip install -q validmind \"numpy>=1.23,<2.0.0\"`. |\n", - "| `404` on dataset load | No Databricks table binding was found. Create one in **Settings → Integrations → Databricks**, then wait for the initial sync to complete. |\n", - "| `row_data is empty` after binding created | The initial sync is still running. Wait about 30 seconds and re-run Step 4. |\n", - "| Wrong columns or target not found | Update `TARGET_COLUMN` in Step 4 to match the target column in your Unity Catalog table. |\n", - "| Want to try the notebook without a binding | Set `USE_SYNTHETIC_FALLBACK = True` in Step 4 to use generated data. |" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.7" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ValidMind + Databricks Quickstart\n", + "\n", + "Use this notebook to install and run the ValidMind Library inside a Databricks Collaborative Notebook, load data from a Unity Catalog table linked to your model in ValidMind, train a simple classification model, and send the results to the ValidMind Platform.\n", + "\n", + "In this notebook, you will:\n", + "\n", + "- Install and initialize the ValidMind Library\n", + "- Load data from a Unity Catalog table linked to your model in ValidMind\n", + "- Train a simple classification model\n", + "- Run ValidMind tests and send the results to the ValidMind Platform\n", + "\n", + "## Before you begin\n", + "\n", + "You will need:\n", + "1. A running Databricks workspace with Unity Catalog enabled\n", + "2. A ValidMind account with a registered model\n", + "3. 
Your ValidMind API credentials (API key, API secret, model identifier)\n", + "\n", + "To get your credentials: log in to ValidMind → **Model Inventory** → select your model → **Getting Started** → **Copy snippet to clipboard**.\n", + "\n", + "For step-by-step instructions on setting up the Databricks integration and linking a Unity Catalog table to your model, refer to [Synchronize with Databricks](https://docs.validmind.ai/guide/integrations/integrations-examples/synchronize-with-databricks.html).\n", + "\n", + "> **Note:** If you don't have a Unity Catalog table linked to your model yet, this notebook includes a synthetic-data fallback so you can still run through the full workflow." + ] }, - "nbformat": 4, - "nbformat_minor": 4 + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1 — Install the ValidMind Library\n", + "\n", + "Run this cell first. Databricks requires a Python restart after `%pip install`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -q validmind" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Restart Python kernel to pick up newly installed packages\n", + "dbutils.library.restartPython()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2 — Verify installation\n", + "\n", + "Confirm that the ValidMind Library installed successfully and check the version available in your notebook environment:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import importlib.metadata\n", + "version = importlib.metadata.version('validmind')\n", + "print(f'ValidMind Library version: {version}')\n", + "print('Installation successful!')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3 — Initialize the ValidMind Library\n", + "\n", + "Initialize the ValidMind Library with the *code snippet* unique to your model so that test results are uploaded to the correct model in the ValidMind Platform.\n", + "\n", + "You can supply your credentials in either of two ways:\n", + "\n", + "- **Databricks widgets**: set widgets named `vm_api_host`, `vm_api_key`, `vm_api_secret`, and `vm_model_cuid` on the notebook. This is convenient when you parameterize the notebook as part of a Databricks job.\n", + "- **Edit the next cell directly**: replace the placeholder values with your own credentials.\n", + "\n", + "To get your credentials:\n", + "\n", + "1. In ValidMind, go to **Model Inventory** and select your model.\n", + "2. Open **Getting Started** and click **Copy snippet to clipboard**.\n", + "3. Paste the values into the next cell, or use them to set the corresponding widgets:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import validmind as vm\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# Credentials are read from Databricks widgets if set. 
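Widgets can be created once per\n",
+ "# notebook, e.g. dbutils.widgets.text(\"vm_api_key\", \"\"), so that a Databricks job\n",
+ "# can pass credentials in as parameters (the widget names match the keys read below).\n",
+ "# 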
Otherwise, replace the\n", + "# placeholder values below before running this cell.\n", + "# ---------------------------------------------------------------------------\n", + "try:\n", + " api_host = dbutils.widgets.getAll().get(\"vm_api_host\", \"\")\n", + " api_key = dbutils.widgets.getAll().get(\"vm_api_key\", \"\")\n", + " api_secret = dbutils.widgets.getAll().get(\"vm_api_secret\", \"\")\n", + " model_cuid = dbutils.widgets.getAll().get(\"vm_model_cuid\", \"\")\n", + "except NameError:\n", + " # dbutils is not available — running outside Databricks\n", + " api_host = \"\" # replace with your API host\n", + " api_key = \"\" # replace with your API key\n", + " api_secret = \"\" # replace with your API secret\n", + " model_cuid = \"\" # replace with your model CUID\n", + "\n", + "vm.init(\n", + " api_host=api_host,\n", + " api_key=api_key,\n", + " api_secret=api_secret,\n", + " model=model_cuid,\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4 — Load data from your linked Databricks table\n", + "\n", + "Load the data for this notebook from a Unity Catalog table that you've linked to your model in ValidMind. Once a table binding is set up, ValidMind syncs the data and makes it available through the tracking API. You don't need a Spark session or direct Unity Catalog credentials in this notebook.\n", + "\n", + "Before running the next cell, make sure you have:\n", + "\n", + "1. A Databricks integration configured in **Settings → Integrations → Databricks**\n", + "2. A `table` binding created for your model that links a Unity Catalog table to it\n", + "3. At least one successful sync (the initial sync runs automatically when you create the binding)\n", + "\n", + "If you don't have a table binding yet, set `USE_SYNTHETIC_FALLBACK = True` in the next cell to run this notebook with generated data instead." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "import requests\n", + "import pandas as pd\n", + "from validmind import api_client as _vm_client\n", + "\n", + "# Set to True only if you don't have a Databricks table binding set up yet\n", + "USE_SYNTHETIC_FALLBACK = False\n", + "\n", + "# ---------------------------------------------------------------------------\n", + "# Load from ValidMind — uses the linked Databricks table binding for this model\n", + "# ---------------------------------------------------------------------------\n", + "if not USE_SYNTHETIC_FALLBACK:\n", + " _api_host = _vm_client.get_api_host() # same host as vm.init()\n", + " _headers = _vm_client._get_api_headers()\n", + "\n", + " _response = requests.get(\n", + " f\"{_api_host}/integrations/dataset\",\n", + " headers=_headers,\n", + " timeout=30,\n", + " )\n", + "\n", + " if _response.status_code == 200:\n", + " _data = _response.json()\n", + " TABLE_NAME = _data.get(\"table_name\", \"unknown\")\n", + " TARGET_COLUMN = \"target\" # <-- update if your table uses a different column name\n", + " row_data = _data.get(\"row_data\", [])\n", + "\n", + " if not row_data:\n", + " raise RuntimeError(\n", + " f\"Binding found for table '{TABLE_NAME}' but row_data is empty. \"\n", + " \"The sync may still be in progress — wait a moment and re-run this cell.\"\n", + " )\n", + "\n", + " df = pd.DataFrame(row_data)\n", + "\n", + " if TARGET_COLUMN not in df.columns:\n", + " raise ValueError(\n", + " f\"Column '{TARGET_COLUMN}' not found in synced data. 
\"\n", + " f\"Available columns: {list(df.columns)}. \"\n", + " \"Update TARGET_COLUMN above to match your table's target column.\"\n", + " )\n", + "\n", + " print(f\"Loaded {len(df):,} rows, {len(df.columns)} columns from {TABLE_NAME}\")\n", + " print(f\"Last synced: {_data.get('last_synced_at', 'unknown')}\")\n", + " print(f\"Target distribution: {df[TARGET_COLUMN].value_counts().to_dict()}\")\n", + " display(df.head())\n", + "\n", + " elif _response.status_code == 404:\n", + " raise RuntimeError(\n", + " \"No active Databricks table binding found for this model.\\n\\n\"\n", + " \"To fix:\\n\"\n", + " \" 1. Go to ValidMind → Settings → Integrations → Databricks\\n\"\n", + " \" 2. Open the model binding browser and select a Unity Catalog table\\n\"\n", + " \" 3. Wait ~30 seconds for the initial sync to complete\\n\"\n", + " \" 4. Re-run this cell\\n\\n\"\n", + " \"Or set USE_SYNTHETIC_FALLBACK = True above to continue with generated data.\"\n", + " )\n", + " else:\n", + " raise RuntimeError(\n", + " f\"Unexpected error loading dataset from ValidMind: \"\n", + " f\"{_response.status_code} — {_response.text}\"\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# ---------------------------------------------------------------------------\n", + "# Synthetic data fallback — runs when USE_SYNTHETIC_FALLBACK = True\n", + "# Uses the Bank Customer Churn dataset pattern from ValidMind examples\n", + "# ---------------------------------------------------------------------------\n", + "if USE_SYNTHETIC_FALLBACK:\n", + " import numpy as np\n", + " from sklearn.datasets import make_classification\n", + "\n", + " np.random.seed(42)\n", + " X, y = make_classification(\n", + " n_samples=1000,\n", + " n_features=10,\n", + " n_informative=6,\n", + " n_redundant=2,\n", + " random_state=42,\n", + " )\n", + " feature_names = [\n", + " \"credit_score\", \"age\", \"tenure\", \"balance\",\n", + " \"num_products\", \"has_credit_card\", \"is_active_member\",\n", + " \"estimated_salary\", \"geography_encoded\", \"gender_encoded\",\n", + " ]\n", + " df = pd.DataFrame(X, columns=feature_names)\n", + " df[\"target\"] = y\n", + " TARGET_COLUMN = \"target\"\n", + " TABLE_NAME = \"synthetic\"\n", + "\n", + " print(f\"Using synthetic dataset: {len(df):,} rows, {len(df.columns)} columns\")\n", + " print(f\"Target distribution: {df[TARGET_COLUMN].value_counts().to_dict()}\")\n", + " display(df.head())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5 — Prepare train/test split\n", + "\n", + "Split the dataset into a training set and a test set so you can train the model on one slice of the data and evaluate how it generalizes on data it hasn't seen:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "feature_columns = [c for c in df.columns if c != TARGET_COLUMN]\n", + "\n", + "train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)\n", + "\n", + "print(f'Train set: {len(train_df):,} rows')\n", + "print(f'Test set: {len(test_df):,} rows')\n", + "print(f'Features: {feature_columns}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6 — Train a simple model\n", + "\n", + "Train a gradient boosting classifier on the training set. This is a small, fast model that's well-suited to a quickstart. 
The goal here is to produce something documentable end-to-end, not to tune for accuracy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.ensemble import GradientBoostingClassifier\n", + "\n", + "model = GradientBoostingClassifier(n_estimators=100, random_state=42)\n", + "model.fit(train_df[feature_columns], train_df[TARGET_COLUMN])\n", + "\n", + "train_accuracy = model.score(train_df[feature_columns], train_df[TARGET_COLUMN])\n", + "test_accuracy = model.score(test_df[feature_columns], test_df[TARGET_COLUMN])\n", + "\n", + "print(f'Train accuracy: {train_accuracy:.4f}')\n", + "print(f'Test accuracy: {test_accuracy:.4f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 7 — Register datasets and model with ValidMind\n", + "\n", + "Before you can run tests, ValidMind needs to know about your datasets and your model. Wrap the training and test DataFrames with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) and the trained classifier with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model). Each call returns a ValidMind object that the test functions accept as input.\n", + "\n", + "The `input_id` you pass identifies each input when results are sent to the ValidMind Platform." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "vm_train_ds = vm.init_dataset(\n", + " dataset=train_df,\n", + " input_id=\"train_dataset\",\n", + " target_column=TARGET_COLUMN,\n", + ")\n", + "\n", + "vm_test_ds = vm.init_dataset(\n", + " dataset=test_df,\n", + " input_id=\"test_dataset\",\n", + " target_column=TARGET_COLUMN,\n", + ")\n", + "\n", + "vm_model = vm.init_model(\n", + " model=model,\n", + " input_id=\"gradient_boosting_model\",\n", + ")\n", + "\n", + "print('Datasets and model registered with ValidMind.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 8 — Assign predictions to datasets\n", + "\n", + "Many tests compare predicted values against actual values, so ValidMind needs the model's predictions attached to each dataset. The [`assign_predictions()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#assign_predictions) computes predictions from your model and links them to the dataset object, once for the training set and once for the test set:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "vm_train_ds.assign_predictions(model=vm_model)\n", + "vm_test_ds.assign_predictions(model=vm_model)\n", + "\n", + "print('Predictions assigned.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 9 — Run individual tests\n", + "\n", + "Run a few individual tests against your registered datasets and model to get familiar with how ValidMind tests work before running the full suite. 
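Tests are identified by their full test ID, and tests that define parameters accept an optional `params` dictionary alongside `inputs`, for example (a sketch only; `min_percent_threshold` is an assumed parameter name, so check the test's documentation before relying on it):\n",
+ "\n",
+ "```python\n",
+ "# Same pattern as the cells below, with an explicit parameter added.\n",
+ "result = vm.tests.run_test(\n",
+ "    \"validmind.data_validation.ClassImbalance\",\n",
+ "    inputs={\"dataset\": vm_train_ds},\n",
+ "    params={\"min_percent_threshold\": 10},  # assumed parameter name\n",
+ ")\n",
+ "```\n",
+ "\n",
+ "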
Each [`vm.tests.run_test()`](https://docs.validmind.ai/validmind/validmind/tests.html#run_test) call executes one test, renders the result inline in this notebook, and `result.log()` sends the result to the ValidMind Platform:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Dataset statistics — validates data documentation capability\n", + "result = vm.tests.run_test(\n", + " \"validmind.data_validation.DatasetDescription\",\n", + " inputs={\"dataset\": vm_train_ds},\n", + ")\n", + "result.log()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Class imbalance check\n", + "result = vm.tests.run_test(\n", + " \"validmind.data_validation.ClassImbalance\",\n", + " inputs={\"dataset\": vm_train_ds},\n", + ")\n", + "result.log()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Confusion matrix — validates model performance visualization\n", + "result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.ConfusionMatrix\",\n", + " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", + ")\n", + "result.log()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# ROC curve\n", + "result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.ROCCurve\",\n", + " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", + ")\n", + "result.log()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Feature importance\n", + "result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.FeatureImportance\",\n", + " inputs={\"dataset\": vm_train_ds, \"model\": vm_model},\n", + ")\n", + "result.log()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 10 — Run the full test suite\n", + "\n", + "Run the complete classifier documentation suite. This single call executes every test in the suite and sends all results to the ValidMind Platform, where they populate the corresponding sections of your model documentation:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "test_suite_result = vm.run_test_suite(\n", + " \"classifier_full_suite\",\n", + " inputs={\n", + " \"dataset\": vm_test_ds,\n", + " \"model\": vm_model,\n", + " \"train_dataset\": vm_train_ds,\n", + " \"test_dataset\": vm_test_ds,\n", + " },\n", + ")\n", + "print('Full test suite completed and results sent to ValidMind Platform.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 11 — Verify results on the platform\n", + "\n", + "To see the results of this notebook in the ValidMind Platform:\n", + "\n", + "1. Go to the [ValidMind Platform](https://app.prod.validmind.ai) (or your ValidMind instance).\n", + "2. Navigate to **Model Inventory** and select your model.\n", + "3. Open the **Documentation** tab.\n", + "4. 
Confirm that the test results from this notebook appear in the relevant sections.\n", + "\n", + "After a successful run, you should see the following results in your model's documentation:\n", + "\n", + "- Dataset Description table\n", + "- Class Imbalance chart\n", + "- Confusion Matrix\n", + "- ROC Curve\n", + "- Feature Importance chart\n", + "- Full classifier suite results\n", + "\n", + "---\n", + "\n", + "## Troubleshooting\n", + "\n", + "If you run into any of the issues below, the table lists the likely fix:\n", + "\n", + "| Issue | Fix |\n", + "|-------|-----|\n", + "| `ModuleNotFoundError` after install | Re-run the `dbutils.library.restartPython()` cell. |\n", + "| `ConnectionError` on `vm.init()` | Your workspace may block outbound traffic. Check your network policy, or use a cluster with internet access. |\n", + "| `401 Unauthorized` on `vm.init()` | The API key or secret is incorrect. Copy your credentials again from the ValidMind Platform. |\n", + "| `numpy` version conflict | Pin a compatible version with `%pip install -q validmind \"numpy>=1.23,<2.0.0\"`. |\n", + "| `404` on dataset load | No Databricks table binding was found. Create one in **Settings → Integrations → Databricks**, then wait for the initial sync to complete. |\n", + "| `row_data is empty` after binding created | The initial sync is still running. Wait about 30 seconds and re-run Step 4. |\n", + "| Wrong columns or target not found | Update `TARGET_COLUMN` in Step 4 to match the target column in your Unity Catalog table. |\n", + "| Want to try the notebook without a binding | Set `USE_SYNTHETIC_FALLBACK = True` in Step 4 to use generated data. |" + ] + }, + { + "cell_type": "markdown", + "id": "copyright-08359404300c413f964cfb59cd670f71", + "metadata": {}, + "source": [ + "\n", + "\n", + "\n", + "\n", + "***\n", + "\n", + "Copyright © 2023-2026 ValidMind Inc. All rights reserved.
\n", + "Refer to [LICENSE](https://github.com/validmind/validmind-library/blob/main/LICENSE) for details.
\n", + "SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial
" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } diff --git a/notebooks/how_to/tests/run_tests/2_run_comparison_tests.ipynb b/notebooks/how_to/tests/run_tests/2_run_comparison_tests.ipynb index 1ba4627bc..a8fe3701c 100644 --- a/notebooks/how_to/tests/run_tests/2_run_comparison_tests.ipynb +++ b/notebooks/how_to/tests/run_tests/2_run_comparison_tests.ipynb @@ -1,1094 +1,1095 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "1d29276f", - "metadata": {}, - "source": [ - "# Run comparison tests\n", - "\n", - "Learn how to use the ValidMind Library to run comparison tests that take any datasets or models as inputs. Identify comparison tests to run, initialize ValidMind dataset and model objects in preparation for passing them to tests, and then run tests — generating outputs automatically logged to your model's documentation in the ValidMind Platform.\n", - "\n", - "
We recommend that you first complete our introductory notebook on running tests.\n",
- "\n",
- "Run dataset-based tests
" - ] - }, - { - "cell_type": "markdown", - "id": "61065444", - "metadata": {}, - "source": [ - "::: {.content-hidden when-format=\"html\"}\n", - "## Contents \n", - "- [About ValidMind](#toc1__) \n", - " - [Before you begin](#toc1_1__) \n", - " - [New to ValidMind?](#toc1_2__) \n", - " - [Key concepts](#toc1_3__) \n", - "- [Setting up](#toc2__) \n", - " - [Install the ValidMind Library](#toc2_1__) \n", - " - [Initialize the ValidMind Library](#toc2_2__) \n", - " - [Register sample model](#toc2_2_1__) \n", - " - [Apply documentation template](#toc2_2_2__) \n", - " - [Get your code snippet](#toc2_2_3__) \n", - " - [Preview the documentation template](#toc2_3__) \n", - " - [Initialize the Python environment](#toc2_4__) \n", - "- [Explore a ValidMind test](#toc3__) \n", - "- [Working with ValidMind datasets](#toc4__) \n", - " - [Import the sample dataset](#toc4_1__) \n", - " - [Split the dataset](#toc4_2__) \n", - " - [Initialize the ValidMind dataset](#toc4_3__) \n", - "- [Working with ValidMind models](#toc5__) \n", - " - [Train a sample model](#toc5_1__) \n", - " - [Initialize the ValidMind model](#toc5_2__) \n", - " - [Assign predictions](#toc5_3__) \n", - "- [Running ValidMind tests](#toc6__) \n", - " - [Run classifier performance test with one model](#toc6_1__) \n", - " - [Run comparison tests](#toc6_2__) \n", - " - [Run classifier performance test with multiple models](#toc6_2_1__) \n", - " - [Run classifier performance test with multiple parameter values](#toc6_2_2__) \n", - " - [Run comparison test with multiple datasets](#toc6_2_3__) \n", - "- [Work with test results](#toc7__) \n", - "- [Next steps](#toc8__) \n", - " - [Discover more learning resources](#toc8_1__) \n", - "- [Upgrade ValidMind](#toc9__) \n", - "\n", - ":::\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "id": "67a4d9dc", - "metadata": {}, - "source": [ - "\n", - "\n", - "## About ValidMind\n", - "\n", - "ValidMind is a suite of tools for managing model risk, including risk associated with AI and statistical models. \n", - "\n", - "You use the ValidMind Library to automate documentation and validation tests, and then use the ValidMind Platform to collaborate on model documentation. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators." - ] - }, - { - "cell_type": "markdown", - "id": "eeb30df8", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Before you begin\n", - "\n", - "This notebook assumes you have basic familiarity with Python, including an understanding of how functions work. If you are new to Python, you can still run the notebook but we recommend further familiarizing yourself with the language. \n", - "\n", - "If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html)." - ] - }, - { - "cell_type": "markdown", - "id": "293c3f98", - "metadata": {}, - "source": [ - "\n", - "\n", - "### New to ValidMind?\n", - "\n", - "If you haven't already seen our documentation on the [ValidMind Library](https://docs.validmind.ai/developer/validmind-library.html), we recommend you begin by exploring the available resources in this section. There, you can learn more about documenting models and running tests, as well as find code samples and our Python Library API reference.\n", - "\n", - "
For access to all features available in this notebook, you'll need access to a ValidMind account.\n", - "

\n", - "Register with ValidMind
" - ] - }, - { - "cell_type": "markdown", - "id": "4fc836d0", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Key concepts\n", - "\n", - "**Model documentation**: A structured and detailed record pertaining to a model, encompassing key components such as its underlying assumptions, methodologies, data sources, inputs, performance metrics, evaluations, limitations, and intended uses. It serves to ensure transparency, adherence to regulatory requirements, and a clear understanding of potential risks associated with the model’s application.\n", - "\n", - "**Documentation template**: Functions as a test suite and lays out the structure of model documentation, segmented into various sections and sub-sections. Documentation templates define the structure of your model documentation, specifying the tests that should be run, and how the results should be displayed.\n", - "\n", - "**Tests**: A function contained in the ValidMind Library, designed to run a specific quantitative test on the dataset or model. Tests are the building blocks of ValidMind, used to evaluate and document models and datasets, and can be run individually or as part of a suite defined by your model documentation template.\n", - "\n", - "**Metrics**: A subset of tests that do not have thresholds. In the context of this notebook, metrics and tests can be thought of as interchangeable concepts.\n", - "\n", - "**Custom metrics**: Custom metrics are functions that you define to evaluate your model or dataset. These functions can be registered with the ValidMind Library to be used in the ValidMind Platform.\n", - "\n", - "**Inputs**: Objects to be evaluated and documented in the ValidMind Library. They can be any of the following:\n", - "\n", - " - **model**: A single model that has been initialized in ValidMind with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model).\n", - " - **dataset**: Single dataset that has been initialized in ValidMind with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset).\n", - " - **models**: A list of ValidMind models - usually this is used when you want to compare multiple models in your custom metric.\n", - " - **datasets**: A list of ValidMind datasets - usually this is used when you want to compare multiple datasets in your custom metric. (Learn more: [Run tests with multiple datasets](https://docs.validmind.ai/notebooks/how_to/tests/run_tests/configure_tests/run_tests_that_require_multiple_datasets.html))\n", - "\n", - "**Parameters**: Additional arguments that can be passed when running a ValidMind test, used to pass additional information to a metric, customize its behavior, or provide additional context.\n", - "\n", - "**Outputs**: Custom metrics can return elements like tables or plots. Tables may be a list of dictionaries (each representing a row) or a pandas DataFrame. 
Plots may be matplotlib or plotly figures.\n", - "\n", - "**Test suites**: Collections of tests designed to run together to automate and generate model documentation end-to-end for specific use-cases.\n", - "\n", - "Example: the [`classifier_full_suite`](https://docs.validmind.ai/validmind/validmind/test_suites/classifier.html#ClassifierFullSuite) test suite runs tests from the [`tabular_dataset`](https://docs.validmind.ai/validmind/validmind/test_suites/tabular_datasets.html) and [`classifier`](https://docs.validmind.ai/validmind/validmind/test_suites/classifier.html) test suites to fully document the data and model sections for binary classification model use-cases." - ] - }, - { - "cell_type": "markdown", - "id": "8d52b6e0", - "metadata": {}, - "source": [ - "\n", - "\n", - "## Setting up" - ] - }, - { - "cell_type": "markdown", - "id": "e0d2daaf", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Install the ValidMind Library\n", - "\n", - "
Recommended Python versions\n", - "

\n", - "Python 3.8 <= x <= 3.11
\n", - "\n", - "To install the library:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "fc97888f", - "metadata": {}, - "outputs": [], - "source": [ - "%pip install -q validmind" - ] - }, - { - "cell_type": "markdown", - "id": "1ff56571", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Initialize the ValidMind Library" - ] - }, - { - "cell_type": "markdown", - "id": "c4d9f164", - "metadata": {}, - "source": [ - "\n", - "\n", - "#### Register sample model\n", - "\n", - "Let's first register a sample model for use with this notebook.\n", - "\n", - "1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).\n", - "\n", - "2. In the left sidebar, navigate to **Inventory** and click **+ Register Model**.\n", - "\n", - "3. Enter the model details and click **Next >** to continue to assignment of model stakeholders. ([Need more help?](https://docs.validmind.ai/guide/model-inventory/register-models-in-inventory.html))\n", - "\n", - "4. Select your own name under the **MODEL OWNER** drop-down.\n", - "\n", - "5. Click **Register Model** to add the model to your inventory." - ] - }, - { - "cell_type": "markdown", - "id": "852392e5", - "metadata": {}, - "source": [ - "\n", - "\n", - "#### Apply documentation template\n", - "\n", - "Once you've registered your model, let's select a documentation template. A template predefines sections for your model documentation and provides a general outline to follow, making the documentation process much easier.\n", - "\n", - "1. In the left sidebar that appears for your model, click **Documents** and select **Development**.\n", - "\n", - "2. Under **TEMPLATE**, select `Binary classification`.\n", - "\n", - "3. Click **Use Template** to apply the template." - ] - }, - { - "cell_type": "markdown", - "id": "6490e991", - "metadata": {}, - "source": [ - "\n", - "\n", - "#### Get your code snippet\n", - "\n", - "Initialize the ValidMind Library with the *code snippet* unique to each model per document, ensuring your test results are uploaded to the correct model and automatically populated in the right document in the ValidMind Platform when you run this notebook.\n", - "\n", - "1. On the left sidebar that appears for your model, select **Getting Started** and select `Development` from the **DOCUMENT** drop-down menu.\n", - "2. Click **Copy snippet to clipboard**.\n", - "3. 
Next, [load your model identifier credentials from an `.env` file](https://docs.validmind.ai/developer/model-documentation/store-credentials-in-env-file.html) or replace the placeholder with your own code snippet::" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c51ae01c", - "metadata": {}, - "outputs": [], - "source": [ - "# Load your model identifier credentials from an `.env` file\n", - "\n", - "%load_ext dotenv\n", - "%dotenv .env\n", - "\n", - "# Or replace with your code snippet\n", - "\n", - "import validmind as vm\n", - "\n", - "vm.init(\n", - " # api_host=\"...\",\n", - " # api_key=\"...\",\n", - " # api_secret=\"...\",\n", - " # model=\"...\",\n", - " document=\"documentation\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "99e9d14f", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Preview the documentation template\n", - "\n", - "Let's verify that you have connected the ValidMind Library to the ValidMind Platform and that the appropriate *template* is selected for your model.\n", - "\n", - "You will upload documentation and test results unique to your model based on this template later on. For now, **take a look at the default structure that the template provides with [the `vm.preview_template()` function](https://docs.validmind.ai/validmind/validmind.html#preview_template)** from the ValidMind library and note the empty sections:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "fd332a9d", - "metadata": {}, - "outputs": [], - "source": [ - "vm.preview_template()" - ] - }, - { - "cell_type": "markdown", - "id": "f805ec38", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Initialize the Python environment\n", - "\n", - "Next, let's import the necessary libraries and set up your Python environment for data analysis:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8e2127cd", - "metadata": {}, - "outputs": [], - "source": [ - "import xgboost as xgb\n", - "\n", - "%matplotlib inline" - ] - }, - { - "cell_type": "markdown", - "id": "1783e13c", - "metadata": {}, - "source": [ - "\n", - "\n", - "## Explore a ValidMind test\n", - "\n", - "Before we run a test, use [the `vm.tests.list_tests()` function](https://docs.validmind.ai/validmind/validmind/tests.html#list_tests) to return information on out-of-the-box tests available in the ValidMind Library.\n", - "\n", - "Let's assume you want to evaluate *classifier performance* for a model. 
Classifier performance measures how well a classification model correctly predicts outcomes, using metrics like [precision, recall, and F1 score](https://en.wikipedia.org/wiki/Precision_and_recall).\n", - "\n", - "We'll pass in a `filter` to the `list_tests` function to find the test ID for classifier performance:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a6a6f715", - "metadata": {}, - "outputs": [], - "source": [ - "vm.tests.list_tests(filter=\"ClassifierPerformance\")" - ] - }, - { - "cell_type": "markdown", - "id": "96a56e4b", - "metadata": {}, - "source": [ - "We've identified from the output that the test ID for the classifier performance test is `validmind.model_validation.ClassifierPerformance`.\n", - "\n", - "Use this ID combined with [the `describe_test()` function](https://docs.validmind.ai/validmind/validmind/tests.html#describe_test) to retrieve more information about the test, including its **Required Inputs**:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f8a46c7d", - "metadata": {}, - "outputs": [], - "source": [ - "test_id = \"validmind.model_validation.sklearn.ClassifierPerformance\"\n", - "vm.tests.describe_test(test_id)" - ] - }, - { - "cell_type": "markdown", - "id": "97053f50", - "metadata": {}, - "source": [ - "Since this test requires a dataset and a model, you can expect it to throw an error when we run it without passing in either as input:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f853c272", - "metadata": {}, - "outputs": [], - "source": [ - "try:\n", - " vm.tests.run_test(test_id)\n", - "except Exception as e:\n", - " print(e)" - ] - }, - { - "cell_type": "markdown", - "id": "1a3115ed", - "metadata": {}, - "source": [ - "
Learn more about the individual tests available in the ValidMind Library\n", - "

\n", - "Check out our Explore tests notebook for more code examples and usage of key functions.
" - ] - }, - { - "cell_type": "markdown", - "id": "89da851b", - "metadata": {}, - "source": [ - "\n", - "\n", - "## Working with ValidMind datasets" - ] - }, - { - "cell_type": "markdown", - "id": "50bfdb1b", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Import the sample dataset\n", - "\n", - "Since we need a dataset to run tests, let's import the public [Bank Customer Churn Prediction](https://www.kaggle.com/datasets/shantanudhakadd/bank-customer-churn-prediction) dataset from Kaggle so that we have something to work with.\n", - "\n", - "In our below example, note that:\n", - "\n", - "- The target column, `Exited` has a value of `1` when a customer has churned and `0` otherwise.\n", - "- The ValidMind Library provides a wrapper to automatically load the dataset as a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object. A Pandas Dataframe is a two-dimensional tabular data structure that makes use of rows and columns." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "3ef2dfbb", - "metadata": {}, - "outputs": [], - "source": [ - "# Import the sample dataset from the library\n", - "\n", - "from validmind.datasets.classification import customer_churn\n", - "\n", - "print(\n", - " f\"Loaded demo dataset with: \\n\\n\\t• Target column: '{customer_churn.target_column}' \\n\\t• Class labels: {customer_churn.class_labels}\"\n", - ")\n", - "\n", - "raw_df = customer_churn.load_data()\n", - "raw_df.head()" - ] - }, - { - "cell_type": "markdown", - "id": "a5a8212f", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Split the dataset\n", - "\n", - "Let's first split our dataset to help assess how well the model generalizes to unseen data.\n", - "\n", - "Use [`preprocess()`](https://docs.validmind.ai/validmind/validmind/datasets/classification/customer_churn.html#preprocess) to split our dataset into three subsets:\n", - "\n", - "1. **train_df** — Used to train the model.\n", - "2. **validation_df** — Used to evaluate the model's performance during training.\n", - "3. **test_df** — Used later on to asses the model's performance on new, unseen data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "88c87d4a", - "metadata": {}, - "outputs": [], - "source": [ - "train_df, validation_df, test_df = customer_churn.preprocess(raw_df)" - ] - }, - { - "cell_type": "markdown", - "id": "2ae225d7", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Initialize the ValidMind dataset\n", - "\n", - "The next step is to connect your data with a ValidMind `Dataset` object. **This step is always necessary every time you want to connect a dataset to documentation and produce test results through ValidMind,** but you only need to do it once per dataset.\n", - "\n", - "ValidMind dataset objects provide a wrapper to any type of dataset (NumPy, Pandas, Polars, etc.) so that tests can run transparently regardless of the underlying library.\n", - "\n", - "Initialize a ValidMind dataset object using the [`init_dataset` function](https://docs.validmind.ai/validmind/validmind.html#init_dataset) from the ValidMind (`vm`) module. For this example, we'll pass in the following arguments:\n", - "\n", - "- **`dataset`** — The raw dataset that you want to provide as input to tests.\n", - "- **`input_id`** — A unique identifier that allows tracking what inputs are used when running each individual test.\n", - "- **`target_column`** — A required argument if tests require access to true values. 
This is the name of the target column in the dataset." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "bf0ec747", - "metadata": {}, - "outputs": [], - "source": [ - "vm_train_ds = vm.init_dataset(\n", - " dataset=train_df,\n", - " input_id=\"train_dataset\",\n", - " target_column=customer_churn.target_column,\n", - ")\n", - "\n", - "vm_test_ds = vm.init_dataset(\n", - " dataset=test_df,\n", - " input_id=\"test_dataset\",\n", - " target_column=customer_churn.target_column,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "6d26f65b", - "metadata": {}, - "source": [ - "\n", - "\n", - "## Working with ValidMind models" - ] - }, - { - "cell_type": "markdown", - "id": "6d1677f6", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Train a sample model\n", - "\n", - "To train the model, we need to provide it with:\n", - "\n", - "1. **Inputs** — Features such as customer age, usage, etc.\n", - "2. **Outputs (Expected answers/labels)** — in our case, we would like to know whether the customer churned or not.\n", - "\n", - "Here, we'll use `x_train` and `x_val` to hold the input data (features), and `y_train` and `y_val` to hold the answers (the target we want to predict):" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "39e8c7ea", - "metadata": {}, - "outputs": [], - "source": [ - "x_train = train_df.drop(customer_churn.target_column, axis=1)\n", - "y_train = train_df[customer_churn.target_column]\n", - "x_val = validation_df.drop(customer_churn.target_column, axis=1)\n", - "y_val = validation_df[customer_churn.target_column]" - ] - }, - { - "cell_type": "markdown", - "id": "4ac628eb", - "metadata": {}, - "source": [ - "Next, let's create an *XGBoost classifier model* that will automatically stop training if it doesn't improve after 10 tries. XGBoost is a gradient-boosted tree ensemble that builds trees sequentially, with each tree correcting the errors of the previous ones — typically known for strong predictive performance and built-in regularization to reduce overfitting.\n", - "\n", - "Setting an explicit threshold avoids wasting time and helps prevent further overfitting by stopping training when further improvement isn't happening. We'll also set three evaluation metrics to get a more complete picture of model performance:\n", - "\n", - "1. **error** — Measures how often the model makes incorrect predictions.\n", - "2. **logloss** — Indicates how confident the predictions are.\n", - "3. **auc** — Evaluates how well the model distinguishes between churn and not churn." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "255e3583", - "metadata": {}, - "outputs": [], - "source": [ - "model = xgb.XGBClassifier(early_stopping_rounds=10)\n", - "model.set_params(\n", - " eval_metric=[\"error\", \"logloss\", \"auc\"],\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "f6430312", - "metadata": {}, - "source": [ - "Finally, our actual training step — where the model learns patterns from the data, so it can make predictions later:\n", - "\n", - "- The model is trained on `x_train` and `y_train`, and evaluates its performance using `x_val` and `y_val` to check if it’s learning well.\n", - "- To turn off printed output while training, we'll set `verbose` to `False`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e3aa3657", - "metadata": {}, - "outputs": [], - "source": [ - "model.fit(\n", - " x_train,\n", - " y_train,\n", - " eval_set=[(x_val, y_val)],\n", - " verbose=False,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "c303a046", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Initialize the ValidMind model\n", - "\n", - "You'll also need to initialize a ValidMind model object (`vm_model`) that can be passed to other functions for analysis and tests on the data for our model.\n", - "\n", - "You simply initialize this model object with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model):" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "4b2be11f", - "metadata": {}, - "outputs": [], - "source": [ - "vm_model_xgb = vm.init_model(\n", - " model,\n", - " input_id=\"xgboost\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "2fa83857", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Assign predictions\n", - "\n", - "Once the model has been registered, you can assign model predictions to the training and testing datasets.\n", - "\n", - "- The [`assign_predictions()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#assign_predictions) from the `Dataset` object can link existing predictions to any number of models.\n", - "- This method links the model's class prediction values and probabilities to our `vm_train_ds` and `vm_test_ds` datasets.\n", - "\n", - "If no prediction values are passed, the method will compute predictions automatically:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "229185fd", - "metadata": {}, - "outputs": [], - "source": [ - "vm_train_ds.assign_predictions(model=vm_model_xgb)\n", - "vm_test_ds.assign_predictions(model=vm_model_xgb)" - ] - }, - { - "cell_type": "markdown", - "id": "d0b3312e", - "metadata": {}, - "source": [ - "\n", - "\n", - "## Running ValidMind tests\n", - "\n", - "Now that we know how to initialize ValidMind `dataset` and `model` objects, we're ready to run some tests!\n", - "\n", - "You run individual tests by calling [the `run_test` function](https://docs.validmind.ai/validmind/validmind/tests.html#run_test) provided by the `validmind.tests` module. For the examples below, we'll pass in the following arguments:\n", - "\n", - "- **`test_id`** — The ID of the test to run, as seen in the `ID` column when you run `list_tests`.\n", - "- **`inputs`** — A dictionary of test inputs, such as `dataset`, `model`, `datasets`, or `models`. These are ValidMind objects initialized with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) or [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model)." 
- ] - }, - { - "cell_type": "markdown", - "id": "96c89f32", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Run classifier performance test with one model\n", - "\n", - "Run `validmind.data_validation.ClassifierPerformance` test with the testing dataset (`vm_test_ds`) and model (`vm_model_xgb`) as inputs:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "85189af9", - "metadata": {}, - "outputs": [], - "source": [ - "result = vm.tests.run_test(\n", - " \"validmind.model_validation.sklearn.ClassifierPerformance\",\n", - " inputs={\n", - " \"dataset\": vm_test_ds,\n", - " \"model\": vm_model_xgb,\n", - " },\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "676dff89", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Run comparison tests\n", - "\n", - "To evaluate which models might be a better fit for a use case based on their performance on selected criteria, we can run the same test with multiple models. We'll train three additional models and run the classifier performance test with for all four models using a single `run_test()` call.\n", - "\n", - "
ValidMind helps streamline your documentation and testing.\n", - "

\n", - "You could call run_test() multiple times passing in different inputs, but you can also pass an input_grid object — a dictionary of test input keys and values that allow you to run a single test for a combination of models and datasets.\n", - "

\n", - "With input_grid, run comparison tests for multiple datasets, or even multiple datasets and models simultaneously — input_grid can be used with run_test() for all possible combinations of inputs, generating a cohesive and comprehensive single output.\n", - "
" - ] - }, - { - "cell_type": "markdown", - "id": "3d9912dc", - "metadata": {}, - "source": [ - "*Random forest classifier* models use an ensemble method that builds multiple decision trees and averages their predictions. Random forest is robust to overfitting and handles non-linear relations well, but is typically less interpretable than simpler models:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "1976b7e8", - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.ensemble import RandomForestClassifier\n", - "\n", - "# Train the random forest classifer model\n", - "model_rf = RandomForestClassifier()\n", - "model_rf.fit(x_train, y_train)\n", - "\n", - "# Initialize the ValidMind model object for the random forest classifer model\n", - "vm_model_rf = vm.init_model(\n", - " model_rf,\n", - " input_id=\"random_forest\",\n", - ")\n", - "\n", - "# Assign predictions to the test dataset for the random forest classifer model\n", - "vm_test_ds.assign_predictions(model=vm_model_rf)" - ] - }, - { - "cell_type": "markdown", - "id": "a259927c", - "metadata": {}, - "source": [ - "*Logistic regression* models are linear models that estimate class probabilities via a logistic (sigmoid) function. Logistic regression is highly interpretable with fast training, establishing a strong baseline — however, they struggle when relationships are non-linear as real-world relationships often are:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "90bbf148", - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.linear_model import LogisticRegression\n", - "from sklearn.preprocessing import StandardScaler\n", - "from sklearn.pipeline import Pipeline\n", - "\n", - "# Scaling features ensures the lbfgs solver converges reliably\n", - "model_lr = Pipeline([\n", - " (\"scaler\", StandardScaler()),\n", - " (\"lr\", LogisticRegression()),\n", - "])\n", - "model_lr.fit(x_train, y_train)\n", - "\n", - "# Initialize the ValidMind model object for the logistic regression model\n", - "vm_model_lr = vm.init_model(\n", - " model_lr,\n", - " input_id=\"logistic_regression\",\n", - ")\n", - "\n", - "# Assign predictions to the test dataset for the logistic regression model\n", - "vm_test_ds.assign_predictions(model=vm_model_lr)" - ] - }, - { - "cell_type": "markdown", - "id": "9a666b41", - "metadata": {}, - "source": [ - "*Decision tree classifier* models are a single tree with data split on feature thresholds. 
Useful as an explanability benchmark, decision trees are easy to visualize and interpret — but are prone to overfitting without pruning or ensemble techniques:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "bfa1e17d", - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.tree import DecisionTreeClassifier\n", - "\n", - "# Train the decision tree classifer model\n", - "model_dt = DecisionTreeClassifier()\n", - "model_dt.fit(x_train, y_train)\n", - "\n", - "# Initialize the ValidMind model object for the decision tree classifier model\n", - "vm_model_dt = vm.init_model(\n", - " model_dt,\n", - " input_id=\"decision_tree\",\n", - ")\n", - "\n", - "# Assign predictions to the test dataset for the decision tree classifiermodel\n", - "vm_test_ds.assign_predictions(model=vm_model_dt)" - ] - }, - { - "cell_type": "markdown", - "id": "2c8f3268", - "metadata": {}, - "source": [ - "\n", - "\n", - "#### Run classifier performance test with multiple models\n", - "\n", - "Now, we'll use the `input_grid` to run the [`ClassifierPerformance` test](https://docs.validmind.ai/tests/model_validation/sklearn/ClassifierPerformance.html) on all four models using the testing dataset (`vm_test_ds`).\n", - "\n", - "When running individual tests, you can use a custom `result_id` to tag the individual result with a unique identifier by appending this `result_id` to the `test_id` with a `:` separator. We'll append an identifier to signify that this test was run on `all_models` to differentiate this test run from other runs:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2e48ce1e", - "metadata": {}, - "outputs": [], - "source": [ - "perf_comparison_result = vm.tests.run_test(\n", - " \"validmind.model_validation.sklearn.ClassifierPerformance:all_models\",\n", - " input_grid={\n", - " \"dataset\": [vm_test_ds],\n", - " \"model\": [vm_model_xgb, vm_model_rf, vm_model_lr, vm_model_dt],\n", - " },\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "81cbf144", - "metadata": {}, - "source": [ - "Our output indicates that the XGBoost and random forest classification models provide the strongest overall classification performance, so we'll continue our testing with those two models as input only." - ] - }, - { - "cell_type": "markdown", - "id": "3d3fb6ec", - "metadata": {}, - "source": [ - "\n", - "\n", - "#### Run classifier performance test with multiple parameter values\n", - "\n", - "Next, let's run the classifier performance test with the `param_grid` object, which runs the same test multiple times with different parameter values. 
We'll append an identifier to signify that this test was run with our `parameter_grid` configuration:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d0ad94c9", - "metadata": {}, - "outputs": [], - "source": [ - "parameter_comparison_result = vm.tests.run_test(\n", - " \"validmind.model_validation.sklearn.ClassifierPerformance:parameter_grid\",\n", - " input_grid={\n", - " \"dataset\": [vm_test_ds],\n", - " \"model\": [vm_model_xgb,vm_model_rf]\n", - " },\n", - " param_grid={\n", - " \"average\": [\"macro\", \"micro\"]\n", - " },\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "508c7546", - "metadata": {}, - "source": [ - "\n", - "\n", - "#### Run comparison test with multiple datasets\n", - "\n", - "Let's also run the [ROCCurve test](https://docs.validmind.ai/tests/model_validation/sklearn/ROCCurve.html) using `input_grid` to iterate through multiple datasets, which plots the ROC curves for the training (`vm_train_ds`) and test (`vm_test_ds`) datasets side by side — a common scenario when you want to compare the performance of a model on the training and test datasets and visually assess how much performance is lost in the test dataset.\n", - "\n", - "We'll also need to assign predictions to the training dataset for the random forest classifier model, since we didn't do that in our earlier setup:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "96c3b426", - "metadata": {}, - "outputs": [], - "source": [ - "vm_train_ds.assign_predictions(model=vm_model_rf)" - ] - }, - { - "cell_type": "markdown", - "id": "2be82bae", - "metadata": {}, - "source": [ - "We'll append an identifier to signify that this test was run with our `train_vs_test` dataset comparison configuration:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "4056aa1e", - "metadata": {}, - "outputs": [], - "source": [ - "roc_curve_result = vm.tests.run_test(\n", - " \"validmind.model_validation.sklearn.ROCCurve:train_vs_test\",\n", - " input_grid={\n", - " \"dataset\": [vm_train_ds, vm_test_ds],\n", - " \"model\": [vm_model_xgb,vm_model_rf],\n", - " },\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "a05570d5", - "metadata": {}, - "source": [ - "\n", - "\n", - "## Work with test results\n", - "\n", - "Every test result returned by the `run_test()` function has a [`.log()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#TestResult.log) that can be used to send the test results to the ValidMind Platform. When logging individual test results to the platform, you'll need to manually add those results to the desired section of the model documentation.\n", - "\n", - "You can do this through the ValidMind Platform interface after logging your test results ([Learn more ...](https://docs.validmind.ai/developer/model-documentation/work-with-test-results.html)), or directly via the ValidMind Library when calling `.log()` by providing an optional `section_id`. 
The `section_id` should be a string that matches the title of a section in the documentation template in `snake_case`.\n", - "\n", - "Let's log the results of the classifier performance test (`perf_comparison_result`) and the ROCCurve (`roc_curve_result`) test in the `model_evaluation` section of the documentation — present in the template we previewed in the beginning of this notebook:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e119bf1e", - "metadata": {}, - "outputs": [], - "source": [ - "perf_comparison_result.log(section_id=\"model_evaluation\")\n", - "roc_curve_result.log(section_id=\"model_evaluation\")" - ] - }, - { - "cell_type": "markdown", - "id": "ab5205ee", - "metadata": {}, - "source": [ - "Finally, let's head to the model we connected to at the beginning of this notebook and view our inserted test results in the updated documentation ([Need more help?](https://docs.validmind.ai/guide/model-documentation/working-with-model-documentation.html)):\n", - "\n", - "1. From the **Inventory** in the ValidMind Platform, go to the model you connected to earlier.\n", - "\n", - "2. In the left sidebar that appears for your model, click **Development** under Documents.\n", - "\n", - "3. Expand the **3.2. Model Evaluation** section.\n", - "\n", - "4. Confirm that `perf_comparison_result` and `roc_curve_result` display in this section as expected." - ] - }, - { - "cell_type": "markdown", - "id": "eb196aac", - "metadata": {}, - "source": [ - "\n", - "\n", - "## Next steps\n", - "\n", - "Now that you know how to run comparison tests with the ValidMind Library, you’re ready to take the next step. Extend the functionality of `run_test()` with your own custom test functions that can be incorporated into documentation templates just like any default out-of-the-box ValidMind test.\n", - "\n", - "
Learn how to implement custom tests with the ValidMind Library.\n", - "

\n", - "Check out our Implement comparison tests notebook for code examples and usage of key functions.
" - ] - }, - { - "cell_type": "markdown", - "id": "083c1d8d", - "metadata": {}, - "source": [ - "\n", - "\n", - "### Discover more learning resources\n", - "\n", - "We offer many interactive notebooks to help you automate testing, documenting, validating, and more:\n", - "\n", - "- [Run tests & test suites](https://docs.validmind.ai/developer/how-to/testing-overview.html)\n", - "- [Use ValidMind Library features](https://docs.validmind.ai/developer/how-to/feature-overview.html)\n", - "- [Code samples by use case](https://docs.validmind.ai/guide/samples-jupyter-notebooks.html)\n", - "\n", - "Or, visit our [documentation](https://docs.validmind.ai/) to learn more about ValidMind." - ] - }, - { - "cell_type": "markdown", - "id": "efba0f57", - "metadata": {}, - "source": [ - "\n", - "\n", - "## Upgrade ValidMind\n", - "\n", - "
After installing ValidMind, you’ll want to periodically make sure you are on the latest version to access any new features and other enhancements.
\n", - "\n", - "Retrieve the information for the currently installed version of ValidMind:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "0d35972c", - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, - "outputs": [], - "source": [ - "%pip show validmind" - ] - }, - { - "cell_type": "markdown", - "id": "abcd07ef", - "metadata": {}, - "source": [ - "If the version returned is lower than the version indicated in our [production open-source code](https://github.com/validmind/validmind-library/blob/prod/validmind/__version__.py), restart your notebook and run:\n", - "\n", - "```bash\n", - "%pip install --upgrade validmind\n", - "```" - ] - }, - { - "cell_type": "markdown", - "id": "5fe70b90", - "metadata": {}, - "source": [ - "You may need to restart your kernel after running the upgrade package for changes to be applied." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "\n", - "\n", - "***\n", - "\n", - "Copyright © 2023-2026 ValidMind Inc. All rights reserved.
\n", - "Refer to [LICENSE](https://github.com/validmind/validmind-library/blob/main/LICENSE) for details.
\n", - "SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial
" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "name": "python", - "version": "3.10" + "cells": [ + { + "cell_type": "markdown", + "id": "1d29276f", + "metadata": {}, + "source": [ + "# Run comparison tests\n", + "\n", + "Learn how to use the ValidMind Library to run comparison tests that take any datasets or models as inputs. Identify comparison tests to run, initialize ValidMind dataset and model objects in preparation for passing them to tests, and then run tests — generating outputs automatically logged to your model's documentation in the ValidMind Platform.\n", + "\n", + "
We recommend that you first complete our introductory notebook on running tests.\n", + "

\n", + "Run dataset-based tests
" + ] + }, + { + "cell_type": "markdown", + "id": "61065444", + "metadata": {}, + "source": [ + "::: {.content-hidden when-format=\"html\"}\n", + "## Contents \n", + "- [About ValidMind](#toc1__) \n", + " - [Before you begin](#toc1_1__) \n", + " - [New to ValidMind?](#toc1_2__) \n", + " - [Key concepts](#toc1_3__) \n", + "- [Setting up](#toc2__) \n", + " - [Install the ValidMind Library](#toc2_1__) \n", + " - [Initialize the ValidMind Library](#toc2_2__) \n", + " - [Register sample model](#toc2_2_1__) \n", + " - [Apply documentation template](#toc2_2_2__) \n", + " - [Get your code snippet](#toc2_2_3__) \n", + " - [Preview the documentation template](#toc2_3__) \n", + " - [Initialize the Python environment](#toc2_4__) \n", + "- [Explore a ValidMind test](#toc3__) \n", + "- [Working with ValidMind datasets](#toc4__) \n", + " - [Import the sample dataset](#toc4_1__) \n", + " - [Split the dataset](#toc4_2__) \n", + " - [Initialize the ValidMind dataset](#toc4_3__) \n", + "- [Working with ValidMind models](#toc5__) \n", + " - [Train a sample model](#toc5_1__) \n", + " - [Initialize the ValidMind model](#toc5_2__) \n", + " - [Assign predictions](#toc5_3__) \n", + "- [Running ValidMind tests](#toc6__) \n", + " - [Run classifier performance test with one model](#toc6_1__) \n", + " - [Run comparison tests](#toc6_2__) \n", + " - [Run classifier performance test with multiple models](#toc6_2_1__) \n", + " - [Run classifier performance test with multiple parameter values](#toc6_2_2__) \n", + " - [Run comparison test with multiple datasets](#toc6_2_3__) \n", + "- [Work with test results](#toc7__) \n", + "- [Next steps](#toc8__) \n", + " - [Discover more learning resources](#toc8_1__) \n", + "- [Upgrade ValidMind](#toc9__) \n", + "\n", + ":::\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "67a4d9dc", + "metadata": {}, + "source": [ + "\n", + "\n", + "## About ValidMind\n", + "\n", + "ValidMind is a suite of tools for managing model risk, including risk associated with AI and statistical models. \n", + "\n", + "You use the ValidMind Library to automate documentation and validation tests, and then use the ValidMind Platform to collaborate on model documentation. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators." + ] + }, + { + "cell_type": "markdown", + "id": "eeb30df8", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Before you begin\n", + "\n", + "This notebook assumes you have basic familiarity with Python, including an understanding of how functions work. If you are new to Python, you can still run the notebook but we recommend further familiarizing yourself with the language. \n", + "\n", + "If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html)." + ] + }, + { + "cell_type": "markdown", + "id": "293c3f98", + "metadata": {}, + "source": [ + "\n", + "\n", + "### New to ValidMind?\n", + "\n", + "If you haven't already seen our documentation on the [ValidMind Library](https://docs.validmind.ai/developer/validmind-library.html), we recommend you begin by exploring the available resources in this section. There, you can learn more about documenting models and running tests, as well as find code samples and our Python Library API reference.\n", + "\n", + "
To access all the features available in this notebook, you'll need a ValidMind account.\n", + "

\n", + "Register with ValidMind
" + ] + }, + { + "cell_type": "markdown", + "id": "4fc836d0", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Key concepts\n", + "\n", + "**Model documentation**: A structured and detailed record pertaining to a model, encompassing key components such as its underlying assumptions, methodologies, data sources, inputs, performance metrics, evaluations, limitations, and intended uses. It serves to ensure transparency, adherence to regulatory requirements, and a clear understanding of potential risks associated with the model’s application.\n", + "\n", + "**Documentation template**: Functions as a test suite and lays out the structure of model documentation, segmented into various sections and sub-sections. Documentation templates define the structure of your model documentation, specifying the tests that should be run, and how the results should be displayed.\n", + "\n", + "**Tests**: A function contained in the ValidMind Library, designed to run a specific quantitative test on the dataset or model. Tests are the building blocks of ValidMind, used to evaluate and document models and datasets, and can be run individually or as part of a suite defined by your model documentation template.\n", + "\n", + "**Metrics**: A subset of tests that do not have thresholds. In the context of this notebook, metrics and tests can be thought of as interchangeable concepts.\n", + "\n", + "**Custom metrics**: Custom metrics are functions that you define to evaluate your model or dataset. These functions can be registered with the ValidMind Library to be used in the ValidMind Platform.\n", + "\n", + "**Inputs**: Objects to be evaluated and documented in the ValidMind Library. They can be any of the following:\n", + "\n", + " - **model**: A single model that has been initialized in ValidMind with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model).\n", + " - **dataset**: Single dataset that has been initialized in ValidMind with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset).\n", + " - **models**: A list of ValidMind models - usually this is used when you want to compare multiple models in your custom metric.\n", + " - **datasets**: A list of ValidMind datasets - usually this is used when you want to compare multiple datasets in your custom metric. (Learn more: [Run tests with multiple datasets](https://docs.validmind.ai/notebooks/how_to/tests/run_tests/configure_tests/run_tests_that_require_multiple_datasets.html))\n", + "\n", + "**Parameters**: Additional arguments that can be passed when running a ValidMind test, used to pass additional information to a metric, customize its behavior, or provide additional context.\n", + "\n", + "**Outputs**: Custom metrics can return elements like tables or plots. Tables may be a list of dictionaries (each representing a row) or a pandas DataFrame. 
Plots may be matplotlib or plotly figures.\n", + "\n", + "**Test suites**: Collections of tests designed to run together to automate and generate model documentation end-to-end for specific use-cases.\n", + "\n", + "Example: the [`classifier_full_suite`](https://docs.validmind.ai/validmind/validmind/test_suites/classifier.html#ClassifierFullSuite) test suite runs tests from the [`tabular_dataset`](https://docs.validmind.ai/validmind/validmind/test_suites/tabular_datasets.html) and [`classifier`](https://docs.validmind.ai/validmind/validmind/test_suites/classifier.html) test suites to fully document the data and model sections for binary classification model use-cases." + ] + }, + { + "cell_type": "markdown", + "id": "8d52b6e0", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Setting up" + ] + }, + { + "cell_type": "markdown", + "id": "e0d2daaf", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Install the ValidMind Library\n", + "\n", + "
Recommended Python versions\n", + "

\n", + "Python 3.8 <= x <= 3.11
\n", + "\n", + "To install the library:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc97888f", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -q validmind" + ] + }, + { + "cell_type": "markdown", + "id": "1ff56571", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Initialize the ValidMind Library" + ] + }, + { + "cell_type": "markdown", + "id": "c4d9f164", + "metadata": {}, + "source": [ + "\n", + "\n", + "#### Register sample model\n", + "\n", + "Let's first register a sample model for use with this notebook.\n", + "\n", + "1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).\n", + "\n", + "2. In the left sidebar, navigate to **Inventory** and click **+ Register Model**.\n", + "\n", + "3. Enter the model details and click **Next >** to continue to assignment of model stakeholders. ([Need more help?](https://docs.validmind.ai/guide/model-inventory/register-models-in-inventory.html))\n", + "\n", + "4. Select your own name under the **MODEL OWNER** drop-down.\n", + "\n", + "5. Click **Register Model** to add the model to your inventory." + ] + }, + { + "cell_type": "markdown", + "id": "852392e5", + "metadata": {}, + "source": [ + "\n", + "\n", + "#### Apply documentation template\n", + "\n", + "Once you've registered your model, let's select a documentation template. A template predefines sections for your model documentation and provides a general outline to follow, making the documentation process much easier.\n", + "\n", + "1. In the left sidebar that appears for your model, click **Documents** and select **Development**.\n", + "\n", + "2. Under **TEMPLATE**, select `Binary classification`.\n", + "\n", + "3. Click **Use Template** to apply the template." + ] + }, + { + "cell_type": "markdown", + "id": "6490e991", + "metadata": {}, + "source": [ + "\n", + "\n", + "#### Get your code snippet\n", + "\n", + "Initialize the ValidMind Library with the *code snippet* unique to each model per document, ensuring your test results are uploaded to the correct model and automatically populated in the right document in the ValidMind Platform when you run this notebook.\n", + "\n", + "1. On the left sidebar that appears for your model, select **Getting Started** and select `Development` from the **DOCUMENT** drop-down menu.\n", + "2. Click **Copy snippet to clipboard**.\n", + "3. 
Next, [load your model identifier credentials from an `.env` file](https://docs.validmind.ai/developer/model-documentation/store-credentials-in-env-file.html) or replace the placeholder with your own code snippet:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c51ae01c", + "metadata": {}, + "outputs": [], + "source": [ + "# Load your model identifier credentials from an `.env` file\n", + "\n", + "%load_ext dotenv\n", + "%dotenv .env\n", + "\n", + "# Or replace with your code snippet\n", + "\n", + "import validmind as vm\n", + "\n", + "vm.init(\n", + "    # api_host=\"...\",\n", + "    # api_key=\"...\",\n", + "    # api_secret=\"...\",\n", + "    # model=\"...\",\n", + "    document=\"documentation\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "99e9d14f", + "metadata": {}, + "source": [ + "<a id='toc2_3__'></a>\n", + "\n", + "### Preview the documentation template\n", + "\n", + "Let's verify that you have connected the ValidMind Library to the ValidMind Platform and that the appropriate *template* is selected for your model.\n", + "\n", + "You will upload documentation and test results unique to your model based on this template later on. For now, **take a look at the default structure that the template provides with [the `vm.preview_template()` function](https://docs.validmind.ai/validmind/validmind.html#preview_template)** from the ValidMind library and note the empty sections:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fd332a9d", + "metadata": {}, + "outputs": [], + "source": [ + "vm.preview_template()" + ] + }, + { + "cell_type": "markdown", + "id": "f805ec38", + "metadata": {}, + "source": [ + "<a id='toc2_4__'></a>\n", + "\n", + "### Initialize the Python environment\n", + "\n", + "Next, let's import the necessary libraries and set up your Python environment for data analysis:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8e2127cd", + "metadata": {}, + "outputs": [], + "source": [ + "import xgboost as xgb\n", + "\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "id": "1783e13c", + "metadata": {}, + "source": [ + "<a id='toc3__'></a>\n", + "\n", + "## Explore a ValidMind test\n", + "\n", + "Before we run a test, use [the `vm.tests.list_tests()` function](https://docs.validmind.ai/validmind/validmind/tests.html#list_tests) to return information on out-of-the-box tests available in the ValidMind Library.\n", + "\n", + "Let's assume you want to evaluate *classifier performance* for a model. 
Classifier performance measures how well a classification model correctly predicts outcomes, using metrics like [precision, recall, and F1 score](https://en.wikipedia.org/wiki/Precision_and_recall).\n", + "\n", + "We'll pass in a `filter` to the `list_tests` function to find the test ID for classifier performance:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a6a6f715", + "metadata": {}, + "outputs": [], + "source": [ + "vm.tests.list_tests(filter=\"ClassifierPerformance\")" + ] + }, + { + "cell_type": "markdown", + "id": "96a56e4b", + "metadata": {}, + "source": [ + "We've identified from the output that the test ID for the classifier performance test is `validmind.model_validation.sklearn.ClassifierPerformance`.\n", + "\n", + "Use this ID combined with [the `describe_test()` function](https://docs.validmind.ai/validmind/validmind/tests.html#describe_test) to retrieve more information about the test, including its **Required Inputs**:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f8a46c7d", + "metadata": {}, + "outputs": [], + "source": [ + "test_id = \"validmind.model_validation.sklearn.ClassifierPerformance\"\n", + "vm.tests.describe_test(test_id)" + ] + }, + { + "cell_type": "markdown", + "id": "97053f50", + "metadata": {}, + "source": [ + "Since this test requires a dataset and a model, you can expect it to throw an error when we run it without passing in either as input:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f853c272", + "metadata": {}, + "outputs": [], + "source": [ + "try:\n", + "    vm.tests.run_test(test_id)\n", + "except Exception as e:\n", + "    print(e)" + ] + }, + { + "cell_type": "markdown", + "id": "1a3115ed", + "metadata": {}, + "source": [ + "<div class=\"alert alert-block alert-info\" style=\"background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;\">
Learn more about the individual tests available in the ValidMind Library\n", + "

\n", + "Check out our Explore tests notebook for more code examples and usage of key functions.
" + ] + }, + { + "cell_type": "markdown", + "id": "89da851b", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Working with ValidMind datasets" + ] + }, + { + "cell_type": "markdown", + "id": "50bfdb1b", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Import the sample dataset\n", + "\n", + "Since we need a dataset to run tests, let's import the public [Bank Customer Churn Prediction](https://www.kaggle.com/datasets/shantanudhakadd/bank-customer-churn-prediction) dataset from Kaggle so that we have something to work with.\n", + "\n", + "In our below example, note that:\n", + "\n", + "- The target column, `Exited` has a value of `1` when a customer has churned and `0` otherwise.\n", + "- The ValidMind Library provides a wrapper to automatically load the dataset as a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object. A Pandas Dataframe is a two-dimensional tabular data structure that makes use of rows and columns." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3ef2dfbb", + "metadata": {}, + "outputs": [], + "source": [ + "# Import the sample dataset from the library\n", + "\n", + "from validmind.datasets.classification import customer_churn\n", + "\n", + "print(\n", + " f\"Loaded demo dataset with: \\n\\n\\t• Target column: '{customer_churn.target_column}' \\n\\t• Class labels: {customer_churn.class_labels}\"\n", + ")\n", + "\n", + "raw_df = customer_churn.load_data()\n", + "raw_df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "a5a8212f", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Split the dataset\n", + "\n", + "Let's first split our dataset to help assess how well the model generalizes to unseen data.\n", + "\n", + "Use [`preprocess()`](https://docs.validmind.ai/validmind/validmind/datasets/classification/customer_churn.html#preprocess) to split our dataset into three subsets:\n", + "\n", + "1. **train_df** — Used to train the model.\n", + "2. **validation_df** — Used to evaluate the model's performance during training.\n", + "3. **test_df** — Used later on to asses the model's performance on new, unseen data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "88c87d4a", + "metadata": {}, + "outputs": [], + "source": [ + "train_df, validation_df, test_df = customer_churn.preprocess(raw_df)" + ] + }, + { + "cell_type": "markdown", + "id": "2ae225d7", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Initialize the ValidMind dataset\n", + "\n", + "The next step is to connect your data with a ValidMind `Dataset` object. **This step is always necessary every time you want to connect a dataset to documentation and produce test results through ValidMind,** but you only need to do it once per dataset.\n", + "\n", + "ValidMind dataset objects provide a wrapper to any type of dataset (NumPy, Pandas, Polars, etc.) so that tests can run transparently regardless of the underlying library.\n", + "\n", + "Initialize a ValidMind dataset object using the [`init_dataset` function](https://docs.validmind.ai/validmind/validmind.html#init_dataset) from the ValidMind (`vm`) module. For this example, we'll pass in the following arguments:\n", + "\n", + "- **`dataset`** — The raw dataset that you want to provide as input to tests.\n", + "- **`input_id`** — A unique identifier that allows tracking what inputs are used when running each individual test.\n", + "- **`target_column`** — A required argument if tests require access to true values. 
This is the name of the target column in the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bf0ec747", + "metadata": {}, + "outputs": [], + "source": [ + "vm_train_ds = vm.init_dataset(\n", + "    dataset=train_df,\n", + "    input_id=\"train_dataset\",\n", + "    target_column=customer_churn.target_column,\n", + ")\n", + "\n", + "vm_test_ds = vm.init_dataset(\n", + "    dataset=test_df,\n", + "    input_id=\"test_dataset\",\n", + "    target_column=customer_churn.target_column,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "6d26f65b", + "metadata": {}, + "source": [ + "<a id='toc5__'></a>\n", + "\n", + "## Working with ValidMind models" + ] + }, + { + "cell_type": "markdown", + "id": "6d1677f6", + "metadata": {}, + "source": [ + "<a id='toc5_1__'></a>\n", + "\n", + "### Train a sample model\n", + "\n", + "To train the model, we need to provide it with:\n", + "\n", + "1. **Inputs** — Features such as customer age, usage, etc.\n", + "2. **Outputs (Expected answers/labels)** — In our case, we would like to know whether the customer churned or not.\n", + "\n", + "Here, we'll use `x_train` and `x_val` to hold the input data (features), and `y_train` and `y_val` to hold the answers (the target we want to predict):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "39e8c7ea", + "metadata": {}, + "outputs": [], + "source": [ + "x_train = train_df.drop(customer_churn.target_column, axis=1)\n", + "y_train = train_df[customer_churn.target_column]\n", + "x_val = validation_df.drop(customer_churn.target_column, axis=1)\n", + "y_val = validation_df[customer_churn.target_column]" + ] + }, + { + "cell_type": "markdown", + "id": "4ac628eb", + "metadata": {}, + "source": [ + "Next, let's create an *XGBoost classifier model* that will automatically stop training if it doesn't improve after 10 consecutive rounds. XGBoost is a gradient-boosted tree ensemble that builds trees sequentially, with each tree correcting the errors of the previous ones — typically known for strong predictive performance and built-in regularization to reduce overfitting.\n", + "\n", + "Setting an explicit stopping threshold avoids wasted training time and helps prevent overfitting by halting training once improvement stalls. We'll also set three evaluation metrics to get a more complete picture of model performance:\n", + "\n", + "1. **error** — Measures how often the model makes incorrect predictions.\n", + "2. **logloss** — Indicates how confident the predictions are.\n", + "3. **auc** — Evaluates how well the model distinguishes between churn and not churn." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "255e3583", + "metadata": {}, + "outputs": [], + "source": [ + "model = xgb.XGBClassifier(early_stopping_rounds=10)\n", + "model.set_params(\n", + "    eval_metric=[\"error\", \"logloss\", \"auc\"],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "f6430312", + "metadata": {}, + "source": [ + "Finally, our actual training step — where the model learns patterns from the data, so it can make predictions later:\n", + "\n", + "- The model is trained on `x_train` and `y_train`, and evaluates its performance using `x_val` and `y_val` to check if it’s learning well.\n", + "- To turn off printed output while training, we'll set `verbose` to `False`." 
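Once the `model.fit()` call in the next cell completes, you can sanity-check that early stopping behaved as intended. A brief sketch, assuming the attributes exposed by XGBoost's scikit-learn API (`best_iteration` and `evals_result()`) when early stopping is used:

```python
# Run after the model.fit(...) cell below. Assumes xgboost's sklearn API,
# which exposes best_iteration and evals_result() when early stopping is used.
print(f"Stopped at iteration: {model.best_iteration}")

history = model.evals_result()["validation_0"]  # metrics for our eval_set
print(f"Final validation AUC: {history['auc'][-1]:.4f}")
```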
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e3aa3657", + "metadata": {}, + "outputs": [], + "source": [ + "model.fit(\n", + " x_train,\n", + " y_train,\n", + " eval_set=[(x_val, y_val)],\n", + " verbose=False,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "c303a046", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Initialize the ValidMind model\n", + "\n", + "You'll also need to initialize a ValidMind model object (`vm_model`) that can be passed to other functions for analysis and tests on the data for our model.\n", + "\n", + "You simply initialize this model object with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b2be11f", + "metadata": {}, + "outputs": [], + "source": [ + "vm_model_xgb = vm.init_model(\n", + " model,\n", + " input_id=\"xgboost\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "2fa83857", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Assign predictions\n", + "\n", + "Once the model has been registered, you can assign model predictions to the training and testing datasets.\n", + "\n", + "- The [`assign_predictions()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#assign_predictions) from the `Dataset` object can link existing predictions to any number of models.\n", + "- This method links the model's class prediction values and probabilities to our `vm_train_ds` and `vm_test_ds` datasets.\n", + "\n", + "If no prediction values are passed, the method will compute predictions automatically:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "229185fd", + "metadata": {}, + "outputs": [], + "source": [ + "vm_train_ds.assign_predictions(model=vm_model_xgb)\n", + "vm_test_ds.assign_predictions(model=vm_model_xgb)" + ] + }, + { + "cell_type": "markdown", + "id": "d0b3312e", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Running ValidMind tests\n", + "\n", + "Now that we know how to initialize ValidMind `dataset` and `model` objects, we're ready to run some tests!\n", + "\n", + "You run individual tests by calling [the `run_test` function](https://docs.validmind.ai/validmind/validmind/tests.html#run_test) provided by the `validmind.tests` module. For the examples below, we'll pass in the following arguments:\n", + "\n", + "- **`test_id`** — The ID of the test to run, as seen in the `ID` column when you run `list_tests`.\n", + "- **`inputs`** — A dictionary of test inputs, such as `dataset`, `model`, `datasets`, or `models`. These are ValidMind objects initialized with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) or [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model)." 
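Some tests also accept parameters. A short sketch of how they would be passed, assuming the `params` argument of `run_test()`; the `average` parameter here is the same one swept by the `param_grid` comparison example later in this notebook:

```python
# Sketch: pass test parameters alongside inputs. `average` is the parameter
# the param_grid example later sweeps; here we pin a single value.
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.ClassifierPerformance",
    inputs={"dataset": vm_test_ds, "model": vm_model_xgb},
    params={"average": "macro"},
)
```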
+ { + "cell_type": "markdown", + "id": "c303a046", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Initialize the ValidMind model\n", + "\n", + "You'll also need to initialize a ValidMind model object (`vm_model`) that can be passed to other functions for analysis and tests on our model's data.\n", + "\n", + "You simply initialize this model object with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model):" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "id": "4b2be11f", + "metadata": {}, + "outputs": [], + "source": [ + "vm_model_xgb = vm.init_model(\n", + " model,\n", + " input_id=\"xgboost\",\n", + ")" + ] + },
+ { + "cell_type": "markdown", + "id": "2fa83857", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Assign predictions\n", + "\n", + "Once the model has been registered, you can assign model predictions to the training and testing datasets.\n", + "\n", + "- The [`assign_predictions()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#assign_predictions) from the `Dataset` object can link existing predictions to any number of models.\n", + "- This method links the model's class prediction values and probabilities to our `vm_train_ds` and `vm_test_ds` datasets.\n", + "\n", + "If no prediction values are passed, the method will compute predictions automatically:" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "id": "229185fd", + "metadata": {}, + "outputs": [], + "source": [ + "vm_train_ds.assign_predictions(model=vm_model_xgb)\n", + "vm_test_ds.assign_predictions(model=vm_model_xgb)" + ] + },
+ { + "cell_type": "markdown", + "id": "d0b3312e", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Running ValidMind tests\n", + "\n", + "Now that we know how to initialize ValidMind `dataset` and `model` objects, we're ready to run some tests!\n", + "\n", + "You run individual tests by calling [the `run_test` function](https://docs.validmind.ai/validmind/validmind/tests.html#run_test) provided by the `validmind.tests` module. For the examples below, we'll pass in the following arguments:\n", + "\n", + "- **`test_id`** — The ID of the test to run, as seen in the `ID` column when you run `list_tests`.\n", + "- **`inputs`** — A dictionary of test inputs, such as `dataset`, `model`, `datasets`, or `models`. These are ValidMind objects initialized with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) or [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model)." + ] + },
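+ { + "cell_type": "markdown", + "id": "7b8c9d0e", + "metadata": {}, + "source": [ + "If you're unsure which `test_id` to use, you can browse the test catalog directly from the library. A minimal sketch, assuming the `filter` argument matches against test IDs as described in the ValidMind docs:\n", + "\n", + "```python\n", + "# List available tests whose IDs match the filter string\n", + "vm.tests.list_tests(filter=\"ClassifierPerformance\")\n", + "```" + ] + },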
+ { + "cell_type": "markdown", + "id": "96c89f32", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Run classifier performance test with one model\n", + "\n", + "Run the `validmind.model_validation.sklearn.ClassifierPerformance` test with the testing dataset (`vm_test_ds`) and model (`vm_model_xgb`) as inputs:" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "id": "85189af9", + "metadata": {}, + "outputs": [], + "source": [ + "result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.ClassifierPerformance\",\n", + " inputs={\n", + " \"dataset\": vm_test_ds,\n", + " \"model\": vm_model_xgb,\n", + " },\n", + ")" + ] + },
+ { + "cell_type": "markdown", + "id": "676dff89", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Run comparison tests\n", + "\n", + "To evaluate which model is a better fit for a use case based on its performance against selected criteria, we can run the same test with multiple models. We'll train three additional models and run the classifier performance test for all four models using a single `run_test()` call.\n", + "\n", + "ValidMind helps streamline your documentation and testing: you could call `run_test()` multiple times passing in different inputs, but you can also pass an `input_grid` object — a dictionary of test input keys and values that lets you run a single test for a combination of models and datasets.\n", + "\n", + "With `input_grid`, you can run comparison tests for multiple datasets, or even multiple datasets and models simultaneously — `input_grid` can be used with `run_test()` for all possible combinations of inputs, generating a cohesive and comprehensive single output." + ] + },
+ { + "cell_type": "markdown", + "id": "3d9912dc", + "metadata": {}, + "source": [ + "*Random forest classifier* models use an ensemble method that builds multiple decision trees and averages their predictions. Random forest is robust to overfitting and handles non-linear relations well, but is typically less interpretable than simpler models:" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "id": "1976b7e8", + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.ensemble import RandomForestClassifier\n", + "\n", + "# Train the random forest classifier model\n", + "model_rf = RandomForestClassifier()\n", + "model_rf.fit(x_train, y_train)\n", + "\n", + "# Initialize the ValidMind model object for the random forest classifier model\n", + "vm_model_rf = vm.init_model(\n", + " model_rf,\n", + " input_id=\"random_forest\",\n", + ")\n", + "\n", + "# Assign predictions to the test dataset for the random forest classifier model\n", + "vm_test_ds.assign_predictions(model=vm_model_rf)" + ] + },
+ { + "cell_type": "markdown", + "id": "a259927c", + "metadata": {}, + "source": [ + "*Logistic regression* models are linear models that estimate class probabilities via a logistic (sigmoid) function. Logistic regression is highly interpretable and fast to train, making it a strong baseline — however, it struggles when relationships are non-linear, as real-world relationships often are:" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "id": "90bbf148", + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.preprocessing import StandardScaler\n", + "from sklearn.pipeline import Pipeline\n", + "\n", + "# Scaling features ensures the lbfgs solver converges reliably\n", + "model_lr = Pipeline([\n", + " (\"scaler\", StandardScaler()),\n", + " (\"lr\", LogisticRegression()),\n", + "])\n", + "model_lr.fit(x_train, y_train)\n", + "\n", + "# Initialize the ValidMind model object for the logistic regression model\n", + "vm_model_lr = vm.init_model(\n", + " model_lr,\n", + " input_id=\"logistic_regression\",\n", + ")\n", + "\n", + "# Assign predictions to the test dataset for the logistic regression model\n", + "vm_test_ds.assign_predictions(model=vm_model_lr)" + ] + },
+ { + "cell_type": "markdown", + "id": "9a666b41", + "metadata": {}, + "source": [ + "A *decision tree classifier* is a single tree that splits the data on feature thresholds. Useful as an explainability benchmark, decision trees are easy to visualize and interpret — but are prone to overfitting without pruning or ensemble techniques:" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "id": "bfa1e17d", + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.tree import DecisionTreeClassifier\n", + "\n", + "# Train the decision tree classifier model\n", + "model_dt = DecisionTreeClassifier()\n", + "model_dt.fit(x_train, y_train)\n", + "\n", + "# Initialize the ValidMind model object for the decision tree classifier model\n", + "vm_model_dt = vm.init_model(\n", + " model_dt,\n", + " input_id=\"decision_tree\",\n", + ")\n", + "\n", + "# Assign predictions to the test dataset for the decision tree classifier model\n", + "vm_test_ds.assign_predictions(model=vm_model_dt)" + ] + },
+ { + "cell_type": "markdown", + "id": "2c8f3268", + "metadata": {}, + "source": [ + "\n", + "\n", + "#### Run classifier performance test with multiple models\n", + "\n", + "Now, we'll use the `input_grid` to run the [`ClassifierPerformance` test](https://docs.validmind.ai/tests/model_validation/sklearn/ClassifierPerformance.html) on all four models using the testing dataset (`vm_test_ds`).\n", + "\n", + "When running individual tests, you can use a custom `result_id` to tag the individual result with a unique identifier by appending this `result_id` to the `test_id` with a `:` separator. We'll append an identifier to signify that this test was run on `all_models` to differentiate this test run from other runs:" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "id": "2e48ce1e", + "metadata": {}, + "outputs": [], + "source": [ + "perf_comparison_result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.ClassifierPerformance:all_models\",\n", + " input_grid={\n", + " \"dataset\": [vm_test_ds],\n", + " \"model\": [vm_model_xgb, vm_model_rf, vm_model_lr, vm_model_dt],\n", + " },\n", + ")" + ] + },
+ { + "cell_type": "markdown", + "id": "81cbf144", + "metadata": {}, + "source": [ + "Our output indicates that the XGBoost and random forest classification models provide the strongest overall classification performance, so we'll continue our testing with only those two models as inputs." + ] + },
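+ { + "cell_type": "markdown", + "id": "5a6b7c8d", + "metadata": {}, + "source": [ + "Conceptually, `input_grid` and `param_grid` expand into a cross-product of the values you supply, and the test runs once per combination. This pure-Python illustration (not the ValidMind implementation) shows the combinations the upcoming `param_grid` run will cover:\n", + "\n", + "```python\n", + "from itertools import product\n", + "\n", + "# Conceptual illustration only; ValidMind expands the grid internally\n", + "models = [\"xgboost\", \"random_forest\"]\n", + "averages = [\"macro\", \"micro\"]\n", + "for model_id, average in product(models, averages):\n", + "    print(f\"run ClassifierPerformance with model={model_id}, average={average}\")\n", + "```" + ] + },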
+ { + "cell_type": "markdown", + "id": "3d3fb6ec", + "metadata": {}, + "source": [ + "\n", + "\n", + "#### Run classifier performance test with multiple parameter values\n", + "\n", + "Next, let's run the classifier performance test with the `param_grid` object, which runs the same test multiple times with different parameter values. We'll append an identifier to signify that this test was run with our `parameter_grid` configuration:" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "id": "d0ad94c9", + "metadata": {}, + "outputs": [], + "source": [ + "parameter_comparison_result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.ClassifierPerformance:parameter_grid\",\n", + " input_grid={\n", + " \"dataset\": [vm_test_ds],\n", + " \"model\": [vm_model_xgb, vm_model_rf],\n", + " },\n", + " param_grid={\n", + " \"average\": [\"macro\", \"micro\"],\n", + " },\n", + ")" + ] + },
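+ { + "cell_type": "markdown", + "id": "3e4f5a6b", + "metadata": {}, + "source": [ + "The `average` values we just swept mirror scikit-learn's standard averaging strategies: `macro` takes the unweighted mean of per-class scores, while `micro` aggregates over all predictions. That mapping is an assumption based on the parameter values; check the test's reference page for specifics. A standalone sketch of the difference:\n", + "\n", + "```python\n", + "from sklearn.metrics import f1_score\n", + "\n", + "# Imbalanced toy labels make the two averaging strategies diverge\n", + "y_true = [0, 0, 0, 0, 1]\n", + "y_pred = [0, 0, 0, 1, 1]\n", + "print(f1_score(y_true, y_pred, average=\"macro\"))  # unweighted mean of per-class F1\n", + "print(f1_score(y_true, y_pred, average=\"micro\"))  # global F1 over all predictions\n", + "```" + ] + },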
+ { + "cell_type": "markdown", + "id": "508c7546", + "metadata": {}, + "source": [ + "\n", + "\n", + "#### Run comparison test with multiple datasets\n", + "\n", + "Let's also run the [ROCCurve test](https://docs.validmind.ai/tests/model_validation/sklearn/ROCCurve.html) using `input_grid` to iterate through multiple datasets, which plots the ROC curves for the training (`vm_train_ds`) and test (`vm_test_ds`) datasets side by side — a common scenario when you want to compare the performance of a model on the training and test datasets and visually assess how much performance is lost in the test dataset.\n", + "\n", + "We'll also need to assign predictions to the training dataset for the random forest classifier model, since we didn't do that in our earlier setup:" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "id": "96c3b426", + "metadata": {}, + "outputs": [], + "source": [ + "vm_train_ds.assign_predictions(model=vm_model_rf)" + ] + },
+ { + "cell_type": "markdown", + "id": "2be82bae", + "metadata": {}, + "source": [ + "We'll append an identifier to signify that this test was run with our `train_vs_test` dataset comparison configuration:" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "id": "4056aa1e", + "metadata": {}, + "outputs": [], + "source": [ + "roc_curve_result = vm.tests.run_test(\n", + " \"validmind.model_validation.sklearn.ROCCurve:train_vs_test\",\n", + " input_grid={\n", + " \"dataset\": [vm_train_ds, vm_test_ds],\n", + " \"model\": [vm_model_xgb, vm_model_rf],\n", + " },\n", + ")" + ] + },
+ { + "cell_type": "markdown", + "id": "a05570d5", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Work with test results\n", + "\n", + "Every test result returned by the `run_test()` function has a [`.log()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#TestResult.log) that can be used to send the test results to the ValidMind Platform. When logging individual test results to the platform, you'll need to manually add those results to the desired section of the model documentation.\n", + "\n", + "You can do this through the ValidMind Platform interface after logging your test results ([Learn more ...](https://docs.validmind.ai/developer/model-documentation/work-with-test-results.html)), or directly via the ValidMind Library when calling `.log()` by providing an optional `section_id`. The `section_id` should be a string that matches the title of a section in the documentation template in `snake_case`.\n", + "\n", + "Let's log the results of the classifier performance test (`perf_comparison_result`) and the ROC curve test (`roc_curve_result`) in the `model_evaluation` section of the documentation — present in the template we previewed at the beginning of this notebook:" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "id": "e119bf1e", + "metadata": {}, + "outputs": [], + "source": [ + "perf_comparison_result.log(section_id=\"model_evaluation\")\n", + "roc_curve_result.log(section_id=\"model_evaluation\")" + ] + },
+ { + "cell_type": "markdown", + "id": "ab5205ee", + "metadata": {}, + "source": [ + "Finally, let's head to the model we connected to at the beginning of this notebook and view our inserted test results in the updated documentation ([Need more help?](https://docs.validmind.ai/guide/model-documentation/working-with-model-documentation.html)):\n", + "\n", + "1. From the **Inventory** in the ValidMind Platform, go to the model you connected to earlier.\n", + "\n", + "2. In the left sidebar that appears for your model, click **Development** under Documents.\n", + "\n", + "3. Expand the **3.2. Model Evaluation** section.\n", + "\n", + "4. Confirm that `perf_comparison_result` and `roc_curve_result` display in this section as expected." + ] + },
+ { + "cell_type": "markdown", + "id": "eb196aac", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Next steps\n", + "\n", + "Now that you know how to run comparison tests with the ValidMind Library, you’re ready to take the next step. Extend the functionality of `run_test()` with your own custom test functions that can be incorporated into documentation templates just like any default out-of-the-box ValidMind test.\n", + "
Learn how to implement custom tests with the ValidMind Library.\n", + "\n", + "Check out our Implement custom tests notebook for code examples and usage of key functions.
" + ] + }, + { + "cell_type": "markdown", + "id": "083c1d8d", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Discover more learning resources\n", + "\n", + "We offer many interactive notebooks to help you automate testing, documenting, validating, and more:\n", + "\n", + "- [Run tests & test suites](https://docs.validmind.ai/developer/how-to/testing-overview.html)\n", + "- [Use ValidMind Library features](https://docs.validmind.ai/developer/how-to/feature-overview.html)\n", + "- [Code samples by use case](https://docs.validmind.ai/guide/samples-jupyter-notebooks.html)\n", + "\n", + "Or, visit our [documentation](https://docs.validmind.ai/) to learn more about ValidMind." + ] + }, + { + "cell_type": "markdown", + "id": "efba0f57", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Upgrade ValidMind\n", + "\n", + "
After installing ValidMind, you’ll want to periodically make sure you are on the latest version to access any new features and other enhancements.
\n", + "\n", + "Retrieve the information for the currently installed version of ValidMind:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0d35972c", + "metadata": { + "vscode": { + "languageId": "plaintext" } + }, + "outputs": [], + "source": [ + "%pip show validmind" + ] + }, + { + "cell_type": "markdown", + "id": "abcd07ef", + "metadata": {}, + "source": [ + "If the version returned is lower than the version indicated in our [production open-source code](https://github.com/validmind/validmind-library/blob/prod/validmind/__version__.py), restart your notebook and run:\n", + "\n", + "```bash\n", + "%pip install --upgrade validmind\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "5fe70b90", + "metadata": {}, + "source": [ + "You may need to restart your kernel after running the upgrade package for changes to be applied." + ] + }, + { + "cell_type": "markdown", + "id": "copyright-89579e57ed9b466892b9340ec948b137", + "metadata": {}, + "source": [ + "\n", + "\n", + "\n", + "\n", + "***\n", + "\n", + "Copyright © 2023-2026 ValidMind Inc. All rights reserved.
\n", + "Refer to [LICENSE](https://github.com/validmind/validmind-library/blob/main/LICENSE) for details.
\n", + "SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial
" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" }, - "nbformat": 4, - "nbformat_minor": 5 + "language_info": { + "name": "python", + "version": "3.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 }