From 5f7cc865b3a5147d41109fd1454071f419095700 Mon Sep 17 00:00:00 2001
From: Charlotte Wickham
Date: Fri, 29 May 2026 10:51:34 -0700
Subject: [PATCH 1/3] Add strip-html-blank-lines Quarto filter for embedded tables

Pointblank and great-tables emit HTML output with blank lines inside the
wrapper <div>. Hugo's Goldmark closes a CommonMark type 6 HTML block at
the first blank line, which causes the rest of the table to be parsed as
markdown -- wrapping CSS in <p> tags and turning indented SVG content into
<pre><code> blocks. The result is a visibly broken table on the rendered
page.
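
A rough Python model of that end condition. This is a sketch of the
CommonMark rule, not Goldmark's code. first_html_block() is a hypothetical
helper, and TYPE6_START covers only an illustrative subset of the spec's
type 6 tag names (the real list is much longer):

```python
import re

TYPE6_START = re.compile(
    r"^ {0,3}</?(div|table|thead|tbody|tr|td|th|p)(\s|/?>|$)", re.IGNORECASE
)


def first_html_block(lines):
    """Return the lines a CommonMark type 6 HTML block would capture:
    starting at a line that opens with one of TYPE6_START's tags and
    ending just before the first blank line."""
    if not lines or not TYPE6_START.match(lines[0]):
        return []
    block = []
    for line in lines:
        if line.strip() == "":
            break  # end condition 6: a blank line closes the HTML block
        block.append(line)
    return block


html = """<div id="pb_tbl">
<style>
.gt_table { border-style: none; }

.gt_table p { margin: 0; }
</style>
<table></table>
</div>""".splitlines()

kept = first_html_block(html)
print(f"{len(kept)} of {len(html)} lines stay HTML")  # prints "3 of 8 lines stay HTML"
```

With the blank line inside <style>, only the first three lines stay HTML.
The next CSS rule is handed back to the markdown parser, which is where the
stray <p> tags come from.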

This adds a Quarto extension that collapses blank lines inside raw
HTML blocks before Hugo sees them, wires it up across the ported
pointblank posts via _metadata.yml, and re-renders those posts. A
note in the authoring guide explains when to opt in.
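
The core of the filter is collapse_blanks(). For a quick check outside
Quarto, here is a Python port of that loop. It is an illustration only and
is not shipped with the extension. The repeat-until matters because each
substitution pass consumes both newlines of a match, so a single pass can
leave a blank line behind when blank lines are consecutive:

```python
import re


def collapse_blanks(text: str) -> str:
    """Port of the Lua filter's collapse_blanks(): replace a newline,
    optional spaces/tabs, and another newline with a single newline,
    repeating until the text stops changing."""
    while True:
        collapsed = re.sub(r"\n[ \t]*\n", "\n", text)
        if collapsed == text:
            return collapsed
        text = collapsed


raw = "<div>\n<style>\n.gt_table { border-style: none; }\n\n  \n</style>\n</div>"
print(collapse_blanks(raw))
```

Here the first pass removes only one of the two blank lines after the CSS
rule; the second pass removes the whitespace-only one. One side effect to
keep in mind: the same substitution also drops blank lines inside any <pre>
or <textarea> that sits in the same raw HTML block.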

Also fixes a pre-existing YAML strictness issue in lets-workshop-together
(toc: no -> toc: false) that was blocking re-render.
---
 .../strip-html-blank-lines/_extension.yml     |   7 +
 .../strip-html-blank-lines.lua                |  30 +++
 content/blog/_authoring-guide.md              |  18 ++
 content/blog/ported/pointblank/_metadata.yml  |   2 +
 .../pointblank/all-about-actions/index.md     |  84 ++-----
 .../pointblank/intro-pointblank/index.md      |  68 +-----
 .../lets-workshop-together/index.md           |  12 +-
 .../lets-workshop-together/index.qmd          |   2 +-
 .../pointblank/overhauled-user-guide/index.md |  12 +-
 .../pointblank/validation-libs-2025/index.md  | 217 ++++++++----------
 10 files changed, 190 insertions(+), 262 deletions(-)
 create mode 100644 content/_extensions/strip-html-blank-lines/_extension.yml
 create mode 100644 content/_extensions/strip-html-blank-lines/strip-html-blank-lines.lua
 create mode 100644 content/blog/ported/pointblank/_metadata.yml

diff --git a/content/_extensions/strip-html-blank-lines/_extension.yml b/content/_extensions/strip-html-blank-lines/_extension.yml
new file mode 100644
index 000000000..475c1fc5b
--- /dev/null
+++ b/content/_extensions/strip-html-blank-lines/_extension.yml
@@ -0,0 +1,7 @@
+title: Strip HTML Blank Lines
+author: Posit
+version: 1.0.0
+contributes:
+  filters:
+    - path: strip-html-blank-lines.lua
+      at: pre-ast
diff --git a/content/_extensions/strip-html-blank-lines/strip-html-blank-lines.lua b/content/_extensions/strip-html-blank-lines/strip-html-blank-lines.lua
new file mode 100644
index 000000000..1a92c4427
--- /dev/null
+++ b/content/_extensions/strip-html-blank-lines/strip-html-blank-lines.lua
@@ -0,0 +1,30 @@
+-- Collapse blank lines inside raw HTML blocks so Hugo's Goldmark parser keeps
+-- the HTML block contiguous.
+--
+-- Background: Hugo's Goldmark closes a CommonMark "type 6" HTML block (one
+-- opened by <div>, <table>, etc.) at the first blank line. Pointblank and
+-- great-tables emit tables whose <div> wrappers contain blank lines, which
+-- causes Goldmark to drop out of HTML mode mid-table and re-parse the rest as
+-- markdown -- wrapping CSS in <p> tags and turning indented SVG payloads into
+-- <pre><code> blocks. Stripping the blank lines keeps everything inside one
+-- HTML block.
+
+if not quarto.doc.is_format("hugo-md") and not quarto.doc.is_format("gfm") then
+  return {}
+end
+
+local function collapse_blanks(text)
+  local previous
+  repeat
+    previous = text
+    text = text:gsub("\n[ \t]*\n", "\n")
+  until text == previous
+  return text
+end
+
+function RawBlock(el)
+  if el.format == "html" then
+    el.text = collapse_blanks(el.text)
+    return el
+  end
+end
diff --git a/content/blog/_authoring-guide.md b/content/blog/_authoring-guide.md
index 58f5177d9..76938f9ba 100644
--- a/content/blog/_authoring-guide.md
+++ b/content/blog/_authoring-guide.md
@@ -319,6 +319,24 @@ Quarto sends your `.qmd` through Pandoc, which parses inline HTML and can rewrit
 ```
 ````
 
+#### HTML widgets with blank lines (pointblank, great-tables)
+
+Hugo's Goldmark parser closes a CommonMark "type 6" HTML block (one opened by `<div>`, `<table>`, etc.) at the first blank line. Some Python objects, notably pointblank validation reports and great-tables tables, emit their HTML output with blank lines inside the wrapper `<div>`. The blank line tells Goldmark to stop parsing as HTML and resume parsing as markdown mid-table, which wraps CSS in `<p>` tags and turns indented SVG content into `<pre><code>` blocks.
+
+Symptom: a table renders with broken styling, or fragments of raw HTML (for example, escaped tags) appear as text on the page.
+
+Fix: opt the post into the `strip-html-blank-lines` Quarto filter, which collapses blank lines inside raw HTML blocks before Hugo sees them.
+
+```yaml
+---
+title: My Post
+filters:
+  - strip-html-blank-lines
+---
+```
+
+You only need this on posts that embed HTML from libraries known to emit blank lines. If a whole subdirectory of posts needs it (e.g. all `pointblank` posts), add the filter once in a `_metadata.yml` next to those posts instead of repeating it in each frontmatter.
+
 #### Linking to other blog posts
 
 Use the **permalink URL** — the `/blog/YYYY-MM-DD_slug/` path you see in the browser:
diff --git a/content/blog/ported/pointblank/_metadata.yml b/content/blog/ported/pointblank/_metadata.yml
new file mode 100644
index 000000000..e835e0d45
--- /dev/null
+++ b/content/blog/ported/pointblank/_metadata.yml
@@ -0,0 +1,2 @@
+filters:
+  - strip-html-blank-lines
diff --git a/content/blog/ported/pointblank/all-about-actions/index.md b/content/blog/ported/pointblank/all-about-actions/index.md
index 6e4c0e1e6..da253f34a 100644
--- a/content/blog/ported/pointblank/all-about-actions/index.md
+++ b/content/blog/ported/pointblank/all-about-actions/index.md
@@ -1,15 +1,17 @@
 ---
 title: Level Up Your Data Validation with `Actions` and `FinalActions`
-description: "Automate responses to bad data with Pointblank's Actions and FinalActions."
+description: Automate responses to bad data with Pointblank's Actions and FinalActions.
 auto-description: true
 people:
   - Rich Iannone
-date: '2025-05-02T00:00:00.000Z'
+date: '2025-05-02'
 ported_from: pointblank
 source: pointblank
 port_status: in-progress
-software: ["pointblank"]
-languages: ["Python"]
+software:
+  - pointblank
+languages:
+  - Python
 topics:
   - Data Wrangling
 tags:
@@ -83,7 +85,6 @@ validation_1
           -webkit-font-smoothing: antialiased;
           -moz-osx-font-smoothing: grayscale;
         }
-
 #pb_tbl thead, tbody, tfoot, tr, td, th { border-style: none; }
  tr { background-color: transparent; }
 #pb_tbl p { margin: 0; padding: 0; }
@@ -125,7 +126,6 @@ validation_1
  #pb_tbl .gt_super { font-size: 65%; }
  #pb_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; }
  #pb_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; }
- 
 
 
@@ -144,14 +144,12 @@ validation_1 - - - + @@ -190,7 +188,6 @@ validation_1
col_vals_gt()
-
@@ -213,10 +210,7 @@ validation_1 - -
Pointblank Validation
2026-03-13|19:09:28
Polars
2026-05-29|17:44:13
Polars
d 1000
-
In this example: @@ -287,7 +281,6 @@ validation_2 -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_tbl p { margin: 0; padding: 0; } @@ -329,7 +322,6 @@ validation_2 #pb_tbl .gt_super { font-size: 65%; } #pb_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -348,14 +340,12 @@ validation_2 - - - + @@ -397,7 +387,6 @@ validation_2
col_vals_regex()
- @@ -438,7 +427,6 @@ validation_2
col_vals_gt()
- @@ -461,10 +449,7 @@ validation_2 - -
Pointblank Validation
2026-03-13|19:09:28
PolarsWARNING0.05ERROR0.1CRITICAL0.15
2026-05-29|17:44:13
PolarsWARNING0.05ERROR0.1CRITICAL0.15
player_id [A-Z]{12}\d{3} item_revenue 0.1
-
In this example, we're using a simple function that prints a generic message whenever any threshold @@ -521,7 +506,6 @@ validation_3 -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_tbl p { margin: 0; padding: 0; } @@ -563,7 +547,6 @@ validation_3 #pb_tbl .gt_super { font-size: 65%; } #pb_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -582,14 +565,12 @@ validation_3 - - - + @@ -628,7 +609,6 @@ validation_3
col_vals_lt()
- @@ -651,10 +631,7 @@ validation_3 - -
Pointblank Validation
2026-03-13|19:09:28
PolarsWARNING1ERROR4CRITICAL10
2026-05-29|17:44:13
PolarsWARNING1ERROR4CRITICAL10
d 3000
- This templating approach is a great way to create context-aware notifications that adapt to the @@ -708,7 +685,7 @@ validation_4 Column: d Validation type: col_vals_gt Severity: critical (level 50) - Time: 2026-03-13 19:09:29.030877+00:00 + Time: 2026-05-29 17:44:13.937852+00:00 Explanation: Exceedance of failed test units where values in `d` should have been > `5000`. @@ -722,7 +699,6 @@ validation_4 -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_tbl p { margin: 0; padding: 0; } @@ -764,7 +740,6 @@ validation_4 #pb_tbl .gt_super { font-size: 65%; } #pb_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -783,14 +758,12 @@ validation_4 - - - + @@ -829,7 +802,6 @@ validation_4
col_vals_gt()
- @@ -852,10 +824,7 @@ validation_4 - -
Pointblank Validation
2026-03-13|19:09:29
PolarsWARNINGERRORCRITICAL1
2026-05-29|17:44:13
PolarsWARNINGERRORCRITICAL1
d 5000
- The metadata dictionary contains essential fields for a given validation step, including the step @@ -941,7 +910,6 @@ validation_5 -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_tbl p { margin: 0; padding: 0; } @@ -983,7 +951,6 @@ validation_5 #pb_tbl .gt_super { font-size: 65%; } #pb_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -1002,14 +969,12 @@ validation_5 - - - + @@ -1048,7 +1013,6 @@ validation_5
col_vals_gt()
- @@ -1089,7 +1053,6 @@ validation_5
col_vals_lt()
- @@ -1112,10 +1075,7 @@ validation_5 - -
Pointblank Validation
2026-03-13|19:09:29
Polarssmall_tableWARNING1ERROR5CRITICAL10
2026-05-29|17:44:13
Polarssmall_tableWARNING1ERROR5CRITICAL10
a 1 d 10000
- The [`get_validation_summary()`](https://posit-dev.github.io/pointblank/reference/get_validation_summary.html) @@ -1192,7 +1152,6 @@ validation_6 -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_tbl p { margin: 0; padding: 0; } @@ -1234,7 +1193,6 @@ validation_6 #pb_tbl .gt_super { font-size: 65%; } #pb_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -1253,14 +1211,12 @@ validation_6 - - - + @@ -1299,7 +1255,6 @@ validation_6
col_vals_gt()
- @@ -1340,7 +1295,6 @@ validation_6
col_vals_lt()
- @@ -1363,10 +1317,7 @@ validation_6 - -
Pointblank Validation
2026-03-13|19:09:29
PolarsWARNING1ERROR5CRITICAL10
2026-05-29|17:44:14
PolarsWARNING1ERROR5CRITICAL10
a 5 d 1000
- This approach allows you to log individual step failures during the validation process using @@ -1460,7 +1411,6 @@ validation_7 -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_tbl p { margin: 0; padding: 0; } @@ -1502,7 +1452,6 @@ validation_7 #pb_tbl .gt_super { font-size: 65%; } #pb_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -1521,14 +1470,12 @@ validation_7 - - - + @@ -1678,10 +1625,7 @@ validation_7 - -
Pointblank Validation
2026-03-13|19:09:29
Polarsgame_revenueWARNING0.05ERROR0.1CRITICAL0.15
2026-05-29|17:44:14
Polarsgame_revenueWARNING0.05ERROR0.1CRITICAL0.15
- ## Wrapping Up: from Passive Validation to Active Data Quality Management diff --git a/content/blog/ported/pointblank/intro-pointblank/index.md b/content/blog/ported/pointblank/intro-pointblank/index.md index 15dae42ef..889fc6a48 100644 --- a/content/blog/ported/pointblank/intro-pointblank/index.md +++ b/content/blog/ported/pointblank/intro-pointblank/index.md @@ -1,6 +1,8 @@ --- title: Introducing Pointblank -description: "Get started with Pointblank for data validation using Polars, Pandas, or DuckDB." +description: >- + Get started with Pointblank for data validation using Polars, Pandas, or + DuckDB. auto-description: true people: - Rich Iannone @@ -114,7 +116,6 @@ validation_1 -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_tbl p { margin: 0; padding: 0; } @@ -156,7 +157,6 @@ validation_1 #pb_tbl .gt_super { font-size: 65%; } #pb_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -175,9 +175,7 @@ validation_1 - - @@ -221,7 +219,6 @@ validation_1
col_vals_lt()
- @@ -262,7 +259,6 @@ validation_1
col_vals_between()
- @@ -303,7 +299,6 @@ validation_1
col_vals_in_set()
- @@ -347,7 +342,6 @@ validation_1
col_vals_regex()
- @@ -371,15 +365,11 @@ validation_1 - - + - -
Pointblank Validation
a 10 d [0, 5000] f low, mid, high b ^[0-9]-[a-z]{3}-[0-9]{3}$
2026-04-02 19:07:32 UTC< 1 s2026-04-02 19:07:32 UTC
2026-05-29 17:43:26 UTC< 1 s2026-05-29 17:43:26 UTC
- There's a lot to take in here so let's break down the code first! Note these three key pieces: @@ -462,7 +452,6 @@ validation_2 -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_tbl p { margin: 0; padding: 0; } @@ -504,7 +493,6 @@ validation_2 #pb_tbl .gt_super { font-size: 65%; } #pb_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -523,9 +511,7 @@ validation_2 - - @@ -572,7 +558,6 @@ validation_2
col_vals_regex()
- @@ -613,7 +598,6 @@ validation_2
col_vals_gt()
- @@ -654,7 +638,6 @@ validation_2
col_vals_ge()
- @@ -695,7 +678,6 @@ validation_2
col_vals_in_set()
- @@ -736,7 +718,6 @@ validation_2
col_vals_in_set()
- @@ -779,7 +760,6 @@ validation_2
col_vals_not_in_set()
- @@ -820,7 +800,6 @@ validation_2
col_vals_between()
- @@ -867,7 +846,6 @@ validation_2
rows_distinct()
- @@ -913,7 +891,6 @@ validation_2
row_count_match()
- @@ -955,7 +932,6 @@ validation_2
col_exists()
- @@ -979,22 +955,16 @@ validation_2 - - + - - - -
Pointblank Validation
player_id ^[A-Z]{12}[0-9]{3}$ session_duration 5 item_revenue 0.02 item_type iap, ad acquisition google, facebook, organic, crosspromo, other_campaign country Mongolia, Germany session_duration [10, 50] player_id, session_id, time 2000 start_day
2026-04-02 19:07:32 UTC< 1 s2026-04-02 19:07:32 UTC
2026-05-29 17:43:26 UTC< 1 s2026-05-29 17:43:26 UTC

Notes

Step 7 (pre_applied) Precondition applied: table dimensions [2,000 rows, 11 columns][1 row, 1 column].

- This data validation makes use of the many @@ -1036,7 +1006,6 @@ validation_2.get_step_report(i=2) -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_preview_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_preview_tbl p { margin: 0; padding: 0; } @@ -1078,7 +1047,6 @@ validation_2.get_step_report(i=2) #pb_preview_tbl .gt_super { font-size: 65%; } #pb_preview_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_preview_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -1095,9 +1063,7 @@ validation_2.get_step_report(i=2) - - @@ -1258,10 +1224,7 @@ validation_2.get_step_report(i=2) - -
Report for Validation Step 2
ASSERTION session_duration > 5
18 / 2000 TEST UNIT FAILURES IN COLUMN 8
EXTRACT OF FIRST 10 ROWS (WITH TEST UNIT FAILURES IN RED):
Philippines
- This report provides the 18 rows where the failure occurred. If you scroll the table to the right @@ -1317,7 +1280,6 @@ validation_3 -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_tbl p { margin: 0; padding: 0; } @@ -1359,7 +1321,6 @@ validation_3 #pb_tbl .gt_super { font-size: 65%; } #pb_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -1378,9 +1339,7 @@ validation_3 - - @@ -1429,7 +1388,6 @@ validation_3
col_schema_match()
- @@ -1453,12 +1411,9 @@ validation_3 - - + - - - -
Pointblank Validation
SCHEMA
2026-04-02 19:07:33 UTC< 1 s2026-04-02 19:07:33 UTC
2026-05-29 17:43:28 UTC< 1 s2026-05-29 17:43:28 UTC

Notes @@ -1647,11 +1602,8 @@ tr { background-color: transparent; }
- This step fails, but the validation report table doesn't tell us how (or where). Using @@ -1671,7 +1623,6 @@ validation_3.get_step_report(i=1) -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_step_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_step_tbl p { margin: 0; padding: 0; } @@ -1713,7 +1664,6 @@ validation_3.get_step_report(i=1) #pb_step_tbl .gt_super { font-size: 65%; } #pb_step_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_step_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -1726,9 +1676,7 @@ validation_3.get_step_report(i=1) - - @@ -1834,15 +1782,11 @@ validation_3.get_step_report(i=1) - - -
Report for Validation Step 1
COLUMN SCHEMA MATCH
COMPLETE
IN ORDER
COLUMN ≠ column
DTYPE ≠ dtype
float ≠ float64
Supplied Column Schema:
[('date_time', 'timestamp(6)'), ('dates', 'date'), ('a', 'int64'), ('b',), ('c',), ('d', 'float64'), ('e', ['bool', 'boolean']), ('f', 'str')]
- The step report here shows the target table's schema on the left side and the expectation of the diff --git a/content/blog/ported/pointblank/lets-workshop-together/index.md b/content/blog/ported/pointblank/lets-workshop-together/index.md index 4c2467fa6..721db10e4 100644 --- a/content/blog/ported/pointblank/lets-workshop-together/index.md +++ b/content/blog/ported/pointblank/lets-workshop-together/index.md @@ -1,16 +1,18 @@ --- title: 'C''mon C''mon: Let''s Do a Pointblank Workshop!' -description: "Want a free Pointblank workshop for your data team? Here's how to set one up." +description: Want a free Pointblank workshop for your data team? Here's how to set one up. auto-description: true people: - Rich Iannone -date: '2025-06-03T00:00:00.000Z' -toc: no +date: '2025-06-03' +toc: false ported_from: pointblank source: pointblank port_status: in-progress -software: ["pointblank"] -languages: ["Python"] +software: + - pointblank +languages: + - Python topics: - Data Wrangling tags: diff --git a/content/blog/ported/pointblank/lets-workshop-together/index.qmd b/content/blog/ported/pointblank/lets-workshop-together/index.qmd index 098a060b8..8623e4431 100644 --- a/content/blog/ported/pointblank/lets-workshop-together/index.qmd +++ b/content/blog/ported/pointblank/lets-workshop-together/index.qmd @@ -5,7 +5,7 @@ auto-description: true people: - Rich Iannone date: '2025-06-03' -toc: no +toc: false ported_from: pointblank source: pointblank port_status: in-progress diff --git a/content/blog/ported/pointblank/overhauled-user-guide/index.md b/content/blog/ported/pointblank/overhauled-user-guide/index.md index 7f38afaf5..10b5d3a5b 100644 --- a/content/blog/ported/pointblank/overhauled-user-guide/index.md +++ b/content/blog/ported/pointblank/overhauled-user-guide/index.md @@ -1,16 +1,20 @@ --- title: Overhauling Pointblank's User Guide -description: "Pointblank's revamped user guide: spiral learning, clearer examples, and full API coverage." 
+description: >- + Pointblank's revamped user guide: spiral learning, clearer examples, and full + API coverage. auto-description: true people: - Rich Iannone - Michael Chow -date: '2025-05-20T00:00:00.000Z' +date: '2025-05-20' ported_from: pointblank source: pointblank port_status: in-progress -software: ["pointblank"] -languages: ["Python"] +software: + - pointblank +languages: + - Python topics: - Data Wrangling tags: diff --git a/content/blog/ported/pointblank/validation-libs-2025/index.md b/content/blog/ported/pointblank/validation-libs-2025/index.md index 7876abc31..b54acb441 100644 --- a/content/blog/ported/pointblank/validation-libs-2025/index.md +++ b/content/blog/ported/pointblank/validation-libs-2025/index.md @@ -1,6 +1,8 @@ --- title: Data Validation Libraries for Polars (2025 Edition) -description: "Choosing a data validation library for Polars? We compare Pandera, Patito, Pointblank, Validoopsie, and Dataframely." +description: >- + Choosing a data validation library for Polars? We compare Pandera, Patito, + Pointblank, Validoopsie, and Dataframely. auto-description: true people: - Rich Iannone @@ -52,56 +54,54 @@ the inside baseball. Here are the unique strengths for each library: -
+
| Library | ⭐ | Best Features | @@ -116,56 +116,54 @@ Here are the unique strengths for each library: Based on these strengths, here are my recommendations for which libraries to use according to use case: -
+
| Use Case | Best Libraries | Description | @@ -421,7 +419,6 @@ validation -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_tbl p { margin: 0; padding: 0; } @@ -463,7 +460,6 @@ validation #pb_tbl .gt_super { font-size: 65%; } #pb_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -482,9 +478,7 @@ validation - - @@ -528,7 +522,6 @@ validation
col_vals_gt()
- @@ -569,7 +562,6 @@ validation
col_vals_between()
- @@ -613,7 +605,6 @@ validation
col_vals_regex()
- @@ -654,7 +645,6 @@ validation
col_vals_between()
- @@ -700,7 +690,6 @@ validation
col_schema_match()
- @@ -724,12 +713,9 @@ validation - - + - - - -
Pointblank Validation
user_id 0 age [18, 80] email ^[^@]+@[^@]+\.[^@]+$ score [0, 100] SCHEMA
2026-04-02 19:07:38 UTC< 1 s2026-04-02 19:07:38 UTC
2026-05-29 17:44:28 UTC< 1 s2026-05-29 17:44:28 UTC

Notes @@ -878,11 +864,8 @@ tr { background-color: transparent; }
-
This example demonstrates Pointblank's chainable validation approach where each validation step is @@ -984,18 +967,14 @@ validation.validate() print("Validation results:", validation.results) ``` -
2026-04-02 12:07:38.191 | INFO     | validoopsie.validate:validate:414 - Passed validation: {'validation': 'ColumnValuesToBeBetween', 'impact': 'high', 'timestamp': '2026-04-02T12:07:38.179694-07:00', 'column': 'user_id', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 5, 'threshold': 0.0}}
-
-2026-04-02 12:07:38.191 | ERROR    | validoopsie.validate:validate:406 - Failed validation: ColumnValuesToBeBetween_age - The column 'age' has values that are not between 18 and 80.
-
-2026-04-02 12:07:38.192 | WARNING  | validoopsie.validate:validate:408 - Failed validation: PatternMatch_email - The column 'email' has entries that do not match the pattern '^[^@]+@[^@]+\.[^@]+$'.
-
-2026-04-02 12:07:38.192 | INFO     | validoopsie.validate:validate:414 - Passed validation: {'validation': 'ColumnValuesToBeBetween', 'impact': 'medium', 'timestamp': '2026-04-02T12:07:38.182269-07:00', 'column': 'score', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 5, 'threshold': 0.0}}
-
-2026-04-02 12:07:38.192 | INFO     | validoopsie.validate:validate:414 - Passed validation: {'validation': 'TypeCheck', 'impact': 'high', 'timestamp': '2026-04-02T12:07:38.182676-07:00', 'column': 'DataTypeColumnValidation', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 4, 'threshold': 0.0}}
+
2026-05-29 10:44:28.536 | INFO     | validoopsie.validate:validate:414 - Passed validation: {'validation': 'ColumnValuesToBeBetween', 'impact': 'high', 'timestamp': '2026-05-29T10:44:28.495055-07:00', 'column': 'user_id', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 5, 'threshold': 0.0}}
+2026-05-29 10:44:28.537 | ERROR    | validoopsie.validate:validate:406 - Failed validation: ColumnValuesToBeBetween_age - The column 'age' has values that are not between 18 and 80.
+2026-05-29 10:44:28.537 | WARNING  | validoopsie.validate:validate:408 - Failed validation: PatternMatch_email - The column 'email' has entries that do not match the pattern '^[^@]+@[^@]+\.[^@]+$'.
+2026-05-29 10:44:28.537 | INFO     | validoopsie.validate:validate:414 - Passed validation: {'validation': 'ColumnValuesToBeBetween', 'impact': 'medium', 'timestamp': '2026-05-29T10:44:28.499332-07:00', 'column': 'score', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 5, 'threshold': 0.0}}
+2026-05-29 10:44:28.537 | INFO     | validoopsie.validate:validate:414 - Passed validation: {'validation': 'TypeCheck', 'impact': 'high', 'timestamp': '2026-05-29T10:44:28.499875-07:00', 'column': 'DataTypeColumnValidation', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 4, 'threshold': 0.0}}
 
- Validation results: {'Summary': {'passed': False, 'validations': ['ColumnValuesToBeBetween_user_id', 'ColumnValuesToBeBetween_age', 'PatternMatch_email', 'ColumnValuesToBeBetween_score', 'TypeCheck_DataTypeColumnValidation'], 'failed_validation': ['ColumnValuesToBeBetween_age', 'PatternMatch_email']}, 'ColumnValuesToBeBetween_user_id': {'validation': 'ColumnValuesToBeBetween', 'impact': 'high', 'timestamp': '2026-04-02T12:07:38.179694-07:00', 'column': 'user_id', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 5, 'threshold': 0.0}}, 'ColumnValuesToBeBetween_age': {'validation': 'ColumnValuesToBeBetween', 'impact': 'medium', 'timestamp': '2026-04-02T12:07:38.181019-07:00', 'column': 'age', 'result': {'status': 'Fail', 'threshold_pass': False, 'message': "The column 'age' has values that are not between 18 and 80.", 'failing_items': [95], 'failed_number': 1, 'frame_row_number': 5, 'threshold': 0.1, 'failed_percentage': 0.2}}, 'PatternMatch_email': {'validation': 'PatternMatch', 'impact': 'low', 'timestamp': '2026-04-02T12:07:38.181596-07:00', 'column': 'email', 'result': {'status': 'Fail', 'threshold_pass': False, 'message': "The column 'email' has entries that do not match the pattern '^[^@]+@[^@]+\\.[^@]+$'.", 'failing_items': ['invalid-email'], 'failed_number': 1, 'frame_row_number': 5, 'threshold': 0.05, 'failed_percentage': 0.2}}, 'ColumnValuesToBeBetween_score': {'validation': 'ColumnValuesToBeBetween', 'impact': 'medium', 'timestamp': '2026-04-02T12:07:38.182269-07:00', 'column': 'score', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 5, 'threshold': 0.0}}, 'TypeCheck_DataTypeColumnValidation': {'validation': 'TypeCheck', 'impact': 'high', 'timestamp': '2026-04-02T12:07:38.182676-07:00', 'column': 'DataTypeColumnValidation', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed 
the validation.', 'frame_row_number': 4, 'threshold': 0.0}}} + Validation results: {'Summary': {'passed': False, 'validations': ['ColumnValuesToBeBetween_user_id', 'ColumnValuesToBeBetween_age', 'PatternMatch_email', 'ColumnValuesToBeBetween_score', 'TypeCheck_DataTypeColumnValidation'], 'failed_validation': ['ColumnValuesToBeBetween_age', 'PatternMatch_email']}, 'ColumnValuesToBeBetween_user_id': {'validation': 'ColumnValuesToBeBetween', 'impact': 'high', 'timestamp': '2026-05-29T10:44:28.495055-07:00', 'column': 'user_id', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 5, 'threshold': 0.0}}, 'ColumnValuesToBeBetween_age': {'validation': 'ColumnValuesToBeBetween', 'impact': 'medium', 'timestamp': '2026-05-29T10:44:28.497442-07:00', 'column': 'age', 'result': {'status': 'Fail', 'threshold_pass': False, 'message': "The column 'age' has values that are not between 18 and 80.", 'failing_items': [95], 'failed_number': 1, 'frame_row_number': 5, 'threshold': 0.1, 'failed_percentage': 0.2}}, 'PatternMatch_email': {'validation': 'PatternMatch', 'impact': 'low', 'timestamp': '2026-05-29T10:44:28.498322-07:00', 'column': 'email', 'result': {'status': 'Fail', 'threshold_pass': False, 'message': "The column 'email' has entries that do not match the pattern '^[^@]+@[^@]+\\.[^@]+$'.", 'failing_items': ['invalid-email'], 'failed_number': 1, 'frame_row_number': 5, 'threshold': 0.05, 'failed_percentage': 0.2}}, 'ColumnValuesToBeBetween_score': {'validation': 'ColumnValuesToBeBetween', 'impact': 'medium', 'timestamp': '2026-05-29T10:44:28.499332-07:00', 'column': 'score', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 5, 'threshold': 0.0}}, 'TypeCheck_DataTypeColumnValidation': {'validation': 'TypeCheck', 'impact': 'high', 'timestamp': '2026-05-29T10:44:28.499875-07:00', 'column': 'DataTypeColumnValidation', 'result': {'status': 
'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 4, 'threshold': 0.0}}} This example showcases Validoopsie's key differentiators: modular validation categories (`ValuesValidation`, `StringValidation`, `TypeValidation`) combined with *impact levels* that @@ -1036,11 +1015,9 @@ validation = ( validation.validate() ``` -
2026-04-02 12:07:38.199 | INFO     | validoopsie.validate:validate:414 - Passed validation: {'validation': 'ColumnNotBeNull', 'impact': 'high', 'timestamp': '2026-04-02T12:07:38.197433-07:00', 'column': 'user_id', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 5, 'threshold': 0.0}}
-
-2026-04-02 12:07:38.199 | ERROR    | validoopsie.validate:validate:406 - Failed validation: PatternMatch_email - The column 'email' has entries that do not match the pattern '^[^@]+@[^@]+\.[^@]+$'.
-
-2026-04-02 12:07:38.199 | INFO     | validoopsie.validate:validate:414 - Passed validation: {'validation': 'ColumnValuesToBeBetween', 'impact': 'low', 'timestamp': '2026-04-02T12:07:38.198631-07:00', 'column': 'score', 'result': {'status': 'Success', 'threshold_pass': True, 'message': "The column 'score' has values that are not between 90 and 100.", 'failing_items': [78.3, 85.5, 88.7], 'failed_number': 3, 'frame_row_number': 5, 'threshold': 0.8, 'failed_percentage': 0.6}}
+
2026-05-29 10:44:28.547 | INFO     | validoopsie.validate:validate:414 - Passed validation: {'validation': 'ColumnNotBeNull', 'impact': 'high', 'timestamp': '2026-05-29T10:44:28.545540-07:00', 'column': 'user_id', 'result': {'status': 'Success', 'threshold_pass': True, 'message': 'All items passed the validation.', 'frame_row_number': 5, 'threshold': 0.0}}
+2026-05-29 10:44:28.548 | ERROR    | validoopsie.validate:validate:406 - Failed validation: PatternMatch_email - The column 'email' has entries that do not match the pattern '^[^@]+@[^@]+\.[^@]+$'.
+2026-05-29 10:44:28.548 | INFO     | validoopsie.validate:validate:414 - Passed validation: {'validation': 'ColumnValuesToBeBetween', 'impact': 'low', 'timestamp': '2026-05-29T10:44:28.547029-07:00', 'column': 'score', 'result': {'status': 'Success', 'threshold_pass': True, 'message': "The column 'score' has values that are not between 90 and 100.", 'failing_items': [78.3, 85.5, 88.7], 'failed_number': 3, 'frame_row_number': 5, 'threshold': 0.8, 'failed_percentage': 0.6}}
 
 Validoopsie strikes a unique balance between operational flexibility and production reliability,
From 65fe0e14b74f79620bfc6fb196445f1e0733bf1a Mon Sep 17 00:00:00 2001
From: Charlotte Wickham
Date: Fri, 29 May 2026 11:06:14 -0700
Subject: [PATCH 2/3] Apply filter to pointblank-intro (great-tables dir)

This post showcases pointblank validation tables (with the SVG-icon-in-cell
content that triggers the visible <pre>/escaped-HTML breakage), so it
needs the strip-html-blank-lines filter even though it lives under the
great-tables directory. Other great-tables posts only exhibit the subtle
CSS-paragraph variant of the bug, which doesn't visibly degrade table
rendering, so they don't need the filter.
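
Background on both variants: CommonMark, and therefore Goldmark, ends a
type 6 HTML block (for example one opened by `<div>`) at the first blank
or whitespace-only line, so everything after a blank line inside the
pointblank wrapper is re-parsed as Markdown. The real fix is the Lua
filter added earlier in this series and is not reproduced here. The
Python below is only a hypothetical sketch of the same idea, and
`strip_blank_lines` is an illustrative name, not part of any library:

```python
def strip_blank_lines(html: str) -> str:
    """Remove blank (whitespace-only) lines from a raw HTML block.

    CommonMark closes a type 6 HTML block (e.g. one starting with <div>)
    at the first blank line, so dropping those lines keeps the whole
    block as raw HTML.
    """
    return "\n".join(line for line in html.splitlines() if line.strip())


# A trimmed-down stand-in for pointblank output: the blank line inside
# <style> and the whitespace-only line before <table> would otherwise
# end the HTML block early.
raw = (
    '<div id="pb_tbl">\n'
    "<style>\n"
    "#pb_tbl p { margin: 0; padding: 0; }\n"
    "\n"
    "#pb_tbl .gt_super { font-size: 65%; }\n"
    "</style>\n"
    "   \n"
    "<table></table>\n"
    "</div>\n"
)

print(strip_blank_lines(raw))
```

One trade-off of this naive version: it would also delete blank lines
inside a nested <pre> or <textarea>, where whitespace is significant;
the sketch does not special-case those.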
---
 .../great-tables/pointblank-intro/index.md    | 50 ++++++-------------
 .../great-tables/pointblank-intro/index.qmd   |  2 +
 2 files changed, 17 insertions(+), 35 deletions(-)

diff --git a/content/blog/ported/great-tables/pointblank-intro/index.md b/content/blog/ported/great-tables/pointblank-intro/index.md
index 75532897b..768c6cfe4 100644
--- a/content/blog/ported/great-tables/pointblank-intro/index.md
+++ b/content/blog/ported/great-tables/pointblank-intro/index.md
@@ -1,19 +1,25 @@
 ---
 title: How We Used Great Tables to Supercharge Reporting in Pointblank
-description: "See how Great Tables powers Pointblank's beautiful, shareable validation reports."
+description: >-
+  See how Great Tables powers Pointblank's beautiful, shareable validation
+  reports.
 auto-description: true
 people:
   - Rich Iannone
-date: '2025-02-11T00:00:00.000Z'
+date: '2025-02-11'
 ported_from: great_tables
 source: great_tables
 port_status: in-progress
-software: ["great-tables"]
-languages: ["Python"]
+software:
+  - great-tables
+languages:
+  - Python
 topics:
   - Visualization
 tags:
   - Great Tables
+filters:
+  - strip-html-blank-lines
 ---
 
 
@@ -59,13 +65,13 @@ validation
 
 
 
-    /Users/charlottewickham/Documents/posit/open-source-website/content/blog/great-tables/pointblank-intro/.venv/lib/python3.13/site-packages/pointblank/column.py:990: SyntaxWarning: invalid escape sequence '\d'
+    /Users/charlottewickham/Documents/posit/open-source-website/.claude/worktrees/filter-ported-pointblank/content/blog/ported/great-tables/pointblank-intro/.venv/lib/python3.13/site-packages/pointblank/column.py:990: SyntaxWarning: invalid escape sequence '\d'
       """
-    /Users/charlottewickham/Documents/posit/open-source-website/content/blog/great-tables/pointblank-intro/.venv/lib/python3.13/site-packages/pointblank/thresholds.py:295: SyntaxWarning: invalid escape sequence '\d'
+    /Users/charlottewickham/Documents/posit/open-source-website/.claude/worktrees/filter-ported-pointblank/content/blog/ported/great-tables/pointblank-intro/.venv/lib/python3.13/site-packages/pointblank/thresholds.py:295: SyntaxWarning: invalid escape sequence '\d'
       """
-    /Users/charlottewickham/Documents/posit/open-source-website/content/blog/great-tables/pointblank-intro/.venv/lib/python3.13/site-packages/pointblank/validate.py:112: SyntaxWarning: invalid escape sequence '\d'
+    /Users/charlottewickham/Documents/posit/open-source-website/.claude/worktrees/filter-ported-pointblank/content/blog/ported/great-tables/pointblank-intro/.venv/lib/python3.13/site-packages/pointblank/validate.py:112: SyntaxWarning: invalid escape sequence '\d'
       """Access step-level metadata when authoring custom actions.
-    /Users/charlottewickham/Documents/posit/open-source-website/content/blog/great-tables/pointblank-intro/.venv/lib/python3.13/site-packages/pointblank/validate.py:8866: SyntaxWarning: invalid escape sequence '\d'
+    /Users/charlottewickham/Documents/posit/open-source-website/.claude/worktrees/filter-ported-pointblank/content/blog/ported/great-tables/pointblank-intro/.venv/lib/python3.13/site-packages/pointblank/validate.py:8866: SyntaxWarning: invalid escape sequence '\d'
       """
 
 
@@ -77,7 +83,6 @@ validation -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_tbl p { margin: 0; padding: 0; } @@ -116,7 +121,6 @@ validation #pb_tbl .gt_super { font-size: 65%; } #pb_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -135,9 +139,7 @@ validation - - @@ -181,7 +183,6 @@ validation
col_vals_gt()
- @@ -222,7 +223,6 @@ validation
col_vals_le()
- @@ -264,7 +264,6 @@ validation
col_exists()
- @@ -306,7 +305,6 @@ validation
col_exists()
- @@ -330,15 +328,11 @@ validation - - + - -
Pointblank Validation
d 1000 c 5 date date_time
2026-03-13 19:07:32 UTC< 1 s2026-03-13 19:07:32 UTC
2026-05-29 18:04:56 UTC< 1 s2026-05-29 18:04:56 UTC
-
The first validation step (`cols_val_gt()`) checks the `d` column in the data, to ensure each value is greater than `1000`. Notice that the red bar on the left indicates it failed, and the `FAIL` column says it has 6 failing values out of 13 `UNITS`. @@ -394,7 +388,6 @@ validation.get_step_report(i=1) -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_preview_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_preview_tbl p { margin: 0; padding: 0; } @@ -433,7 +426,6 @@ validation.get_step_report(i=1) #pb_preview_tbl .gt_super { font-size: 65%; } #pb_preview_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_preview_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -447,9 +439,7 @@ validation.get_step_report(i=1) - - @@ -533,10 +523,7 @@ validation.get_step_report(i=1) - -
Report for Validation Step 1
ASSERTION d > 1000
6 / 13 TEST UNIT FAILURES IN COLUMN 6
EXTRACT OF ALL 6 ROWS WITH TEST UNIT FAILURES IN RED:
low
-
The use of a table for reporting is ideal here! The main features of this step report table include: @@ -568,7 +555,6 @@ pb.preview(pb.load_dataset(dataset="game_revenue", tbl_type="duckdb")) -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } - #pb_preview_tbl thead, tbody, tfoot, tr, td, th { border-style: none; } tr { background-color: transparent; } #pb_preview_tbl p { margin: 0; padding: 0; } @@ -607,7 +593,6 @@ pb.preview(pb.load_dataset(dataset="game_revenue", tbl_type="duckdb")) #pb_preview_tbl .gt_super { font-size: 65%; } #pb_preview_tbl .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #pb_preview_tbl .gt_asterisk { font-size: 100%; vertical-align: 0; } - @@ -624,9 +609,7 @@ pb.preview(pb.load_dataset(dataset="game_revenue", tbl_type="duckdb")) - - @@ -787,10 +770,7 @@ pb.preview(pb.load_dataset(dataset="game_revenue", tbl_type="duckdb")) - -
DuckDBRows2,000Columns11
United States
-
 Notice that the table displays only 10 rows by default, 5 from the top and 5 from the bottom. The grey text on the left of the table indicates the row number, and a blue line helps demarcate the top and bottom rows.
diff --git a/content/blog/ported/great-tables/pointblank-intro/index.qmd b/content/blog/ported/great-tables/pointblank-intro/index.qmd
index 482863639..d67111eac 100644
--- a/content/blog/ported/great-tables/pointblank-intro/index.qmd
+++ b/content/blog/ported/great-tables/pointblank-intro/index.qmd
@@ -14,6 +14,8 @@ topics:
   - Visualization
 tags:
   - Great Tables
+filters:
+  - strip-html-blank-lines
 ---

 The Great Tables package allows you to make tables, and they're really great when part of a report, a book, or a web page. The API is meant to be easy to work with so DataFrames could be made into publication-quality tables without a lot of hassle. And having nice-looking tables in the mix elevates the quality of the medium you're working in.
From a3c2031b72ff180d02323b9bf804cce06d6eaaf5 Mon Sep 17 00:00:00 2001
From: Charlotte Wickham
Date: Fri, 29 May 2026 11:17:07 -0700
Subject: [PATCH 3/3] Hide pointblank SyntaxWarning output in pointblank-intro

The installed pointblank package has four docstrings, across three
modules, that contain \d without an r-prefix, so importing pointblank
emits SyntaxWarning at parse time. Those warnings were leaking into the
rendered .md as visible output above the validation table. Adding
warning: false on the first import chunk suppresses them.
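
Where the warnings come from: in a regular (non-raw) string literal, \d
is not a recognised escape, so the compiler warns while turning the
module source into bytecode, not when the docstring is later used.
Python 3.12 and later report this as SyntaxWarning; 3.6 to 3.11 used
DeprecationWarning, which is usually hidden. A minimal sketch that
reproduces it with the built-in compile() (the helper `escape_warnings`
is illustrative, not part of pointblank):

```python
from __future__ import annotations

import warnings


def escape_warnings(source: str) -> list[type[Warning]]:
    """Compile `source` and return the categories of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record even normally-hidden warnings
        compile(source, "<example>", "exec")
    return [w.category for w in caught]


# A docstring containing a bare \d, the pattern behind the pointblank warnings.
plain = 'def f():\n    """Match one or more digits with \\d+."""\n'
# The same docstring with an r-prefix: the backslash is kept literally.
raw = 'def f():\n    r"""Match one or more digits with \\d+."""\n'

print(escape_warnings(plain))  # SyntaxWarning on 3.12+, DeprecationWarning on 3.6-3.11
print(escape_warnings(raw))    # []
```

The raw-string case is the upstream fix; warning: false in this patch
only hides the output in the rendered post.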
--- .../ported/great-tables/pointblank-intro/index.md | 12 +----------- .../ported/great-tables/pointblank-intro/index.qmd | 1 + 2 files changed, 2 insertions(+), 11 deletions(-) diff --git a/content/blog/ported/great-tables/pointblank-intro/index.md b/content/blog/ported/great-tables/pointblank-intro/index.md index 768c6cfe4..5ca37c140 100644 --- a/content/blog/ported/great-tables/pointblank-intro/index.md +++ b/content/blog/ported/great-tables/pointblank-intro/index.md @@ -64,16 +64,6 @@ validation ``` - - /Users/charlottewickham/Documents/posit/open-source-website/.claude/worktrees/filter-ported-pointblank/content/blog/ported/great-tables/pointblank-intro/.venv/lib/python3.13/site-packages/pointblank/column.py:990: SyntaxWarning: invalid escape sequence '\d' - """ - /Users/charlottewickham/Documents/posit/open-source-website/.claude/worktrees/filter-ported-pointblank/content/blog/ported/great-tables/pointblank-intro/.venv/lib/python3.13/site-packages/pointblank/thresholds.py:295: SyntaxWarning: invalid escape sequence '\d' - """ - /Users/charlottewickham/Documents/posit/open-source-website/.claude/worktrees/filter-ported-pointblank/content/blog/ported/great-tables/pointblank-intro/.venv/lib/python3.13/site-packages/pointblank/validate.py:112: SyntaxWarning: invalid escape sequence '\d' - """Access step-level metadata when authoring custom actions. - /Users/charlottewickham/Documents/posit/open-source-website/.claude/worktrees/filter-ported-pointblank/content/blog/ported/great-tables/pointblank-intro/.venv/lib/python3.13/site-packages/pointblank/validate.py:8866: SyntaxWarning: invalid escape sequence '\d' - """ -