
PSPDFKit-labs/openclaw-nutrient-pdf


Nutrient PDF Plugin for OpenClaw

Explicit, on-demand Nutrient PDF extraction for OpenClaw — structured Markdown output with tables, headings, and reading order preserved.

Table comparison: flat-text word soup vs Nutrient structured markdown

What this plugin does

It adds an explicit Nutrient extraction surface you can call on demand:

  • nutrient_pdf_extract — an agent tool to extract a specific PDF to structured Markdown
  • openclaw nutrient-pdf extract <file.pdf> — a CLI command for direct extraction from your terminal
  • openclaw nutrient-pdf status — check CLI availability and version

Use it when you want Nutrient's table and heading fidelity on a particular document, requested explicitly by the agent or from the command line.

What it does not do

It does not change OpenClaw's built-in pdf tool. As of OpenClaw 2026.6, the built-in tool does its own extraction through the bundled document-extract plugin (the clawpdf engine), and OpenClaw does not currently expose a hook for an external plugin to substitute its own extractor there. So this plugin is a supplement for explicit extraction, not a drop-in replacement for the default engine.

Note for users on OpenClaw 2026.4 – 2026.5: earlier versions had an agents.defaults.pdfExtraction.engine setting that routed the built-in tool through Nutrient. That configuration was removed in 2026.6 when extraction moved into the bundled document-extract plugin. This plugin no longer references it.

Why Nutrient

Plain-text PDF extractors produce word soup: across the 200 real documents of the opendataloader-bench corpus, they score 0.000 on table structure and 0.000 on heading preservation. That includes clawpdf, the PDFium-based extractor that OpenClaw bundles as its default in 2026.6.

When an agent asks "what's in row 3, column 4?" it needs structure, not a flat text dump. Nutrient produces Markdown with proper table rows and columns that agents can look up directly.
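To make the "row 3, column 4" lookup concrete, here is a minimal sketch that parses a Markdown pipe table and reads one cell. It assumes Nutrient's table output uses GitHub-style pipe tables (this README only says "proper table rows and columns"), and the sample table, `parsePipeTable`, and `cell` are hypothetical illustrations, not part of the plugin. A flat-text dump of the same data ("Region Q1 Q2 Q3 Q4 North 10 12 ...") has no cell boundaries to index into.

```typescript
// Hypothetical sample table; assumes GitHub-style pipe-table output.
const markdown = `
| Region | Q1 | Q2 | Q3 | Q4 |
|--------|----|----|----|----|
| North  | 10 | 12 | 14 | 16 |
| South  | 20 | 22 | 24 | 26 |
| East   | 30 | 32 | 34 | 36 |
`;

/** Parse the first pipe table in a Markdown string into rows of cells.
 *  Row 0 is the header; the |---| delimiter row is skipped.
 *  Simplification: escaped pipes (\|) inside cells are not handled. */
function parsePipeTable(md: string): string[][] {
  const rows: string[][] = [];
  for (const raw of md.split("\n")) {
    const line = raw.trim();
    if (!line.startsWith("|")) {
      if (rows.length > 0) break; // first non-table line after the table ends it
      continue;
    }
    // Drop the outer pipes, then split on the inner ones.
    const cells = line.replace(/^\|/, "").replace(/\|$/, "").split("|").map((c) => c.trim());
    // Skip the header/body delimiter row, e.g. |---|:--:|
    if (cells.every((c) => /^:?-+:?$/.test(c))) continue;
    rows.push(cells);
  }
  return rows;
}

/** Look up a body cell by 1-based row and column (header row excluded). */
function cell(rows: string[][], row: number, col: number): string | undefined {
  return rows[row]?.[col - 1];
}

const table = parsePipeTable(markdown);
console.log(cell(table, 3, 4)); // "34" (East, Q3)
```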

Benchmark scores: clawpdf vs Nutrient across 200 documents

Benchmark (200 documents, opendataloader-bench)

Metric             clawpdf   Nutrient   Change
Overall accuracy   0.580     0.889      +53%
Table structure    0.000     0.739      --
Heading fidelity   0.000     0.824      --
Reading order      0.874     0.926      +6%

Scored with NID (reading order), TEDS (table structure), and MHS (heading fidelity), against clawpdf — OpenClaw's bundled 2026.6 default extractor (PDFium WASM, in the document-extract plugin).

Reproducibility: clawpdf 0.3.0 (the version OpenClaw 2026.6 bundles) vs the Nutrient document CLI pdf-to-markdown 1.1.0, on the opendataloader-bench 200-document corpus, measured 2026-06-16. clawpdf is a flat-text extractor and scores 0.000 on both table and heading structure, clustering with other text-only extractors (pypdf, pdf.js) at ~0.58 overall. Earlier versions of this README compared against pdf.js, OpenClaw's pre-2026.6 default; clawpdf scores nearly identically (0.580 vs 0.578 overall), so the gap is unchanged by the switch.
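The Change column in the benchmark table is a relative change from the clawpdf baseline. The table-structure and heading rows show "--" because clawpdf scores 0.000 there, and a percentage change from zero is undefined. The short sketch below reproduces that arithmetic from the published scores and also prints the absolute gain, which stays meaningful when the baseline is zero. `relativeChange` is an illustrative helper, not part of the benchmark tooling.

```typescript
// Benchmark scores from this README: [metric, clawpdf baseline, Nutrient].
const rows: Array<[metric: string, clawpdf: number, nutrient: number]> = [
  ["Overall accuracy", 0.580, 0.889],
  ["Table structure", 0.000, 0.739],
  ["Heading fidelity", 0.000, 0.824],
  ["Reading order", 0.874, 0.926],
];

/** Relative change from baseline to candidate, rounded to a whole percent.
 *  Returns "--" when the baseline is 0, where a percentage is undefined. */
function relativeChange(baseline: number, candidate: number): string {
  if (baseline === 0) return "--";
  const pct = Math.round(((candidate - baseline) / baseline) * 100);
  return `${pct >= 0 ? "+" : ""}${pct}%`;
}

for (const [metric, clawpdf, nutrient] of rows) {
  // Absolute gain in score points, shown with an explicit sign.
  const d = nutrient - clawpdf;
  const delta = (d >= 0 ? "+" : "") + d.toFixed(3);
  console.log(`${metric}: ${relativeChange(clawpdf, nutrient)} (absolute ${delta})`);
}
```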

Install

openclaw plugins install @nutrient-sdk/openclaw-nutrient-pdf

Verify the bundled pdf-to-markdown CLI is reachable:

openclaw nutrient-pdf status

Then use the tool from an agent, or extract directly:

openclaw nutrient-pdf extract ./report.pdf
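For scripting extraction from Node.js, the documented CLI command can be wrapped with `execFileSync`. This is a minimal sketch, not part of the plugin: `extractPdf` is a hypothetical helper, and it assumes `openclaw nutrient-pdf extract <file>` prints the Markdown to stdout, which this README does not state. The 30000 ms default simply mirrors the example `timeoutMs` value in the configuration and is independent of the plugin's own setting.

```typescript
import { execFileSync } from "node:child_process";

/**
 * Hypothetical helper: run `openclaw nutrient-pdf extract <file>` and return stdout.
 * Assumption (not confirmed by the README): the Markdown is written to stdout.
 */
function extractPdf(file: string, timeoutMs = 30_000, command = "openclaw"): string {
  // execFileSync runs the binary directly (no shell), so paths with spaces are safe.
  return execFileSync(command, ["nutrient-pdf", "extract", file], {
    encoding: "utf8",
    timeout: timeoutMs,           // child is killed and execFileSync throws if exceeded
    maxBuffer: 64 * 1024 * 1024,  // raise the 1 MiB default for long documents
  });
}

// Usage (requires OpenClaw with this plugin installed):
// const markdown = extractPdf("./report.pdf");
```

Using `execFile` rather than `exec` avoids shell quoting issues with user-supplied file names; the larger `maxBuffer` matters because the synchronous API throws if output exceeds its 1 MiB default.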

Configuration

Optional settings under plugins.entries.nutrient-pdf.config. These affect only this plugin's tool and CLI:

{
  plugins: {
    entries: {
      "nutrient-pdf": {
        config: {
          command: "pdf-to-markdown",  // path to the CLI binary (auto-resolves by default)
          timeoutMs: 30000,            // extraction timeout per document
        }
      }
    }
  }
}

All processing runs locally. No cloud uploads, no API keys.

Free tier

The pdf-to-markdown CLI includes 1,000 free documents per month. See nutrient.io for higher-volume licensing.

License

MIT -- see LICENSE for details and the third-party dependency notice.

About

Nutrient-powered PDF extraction plugin for OpenClaw. 53% higher overall accuracy than clawpdf on a 200-document benchmark.
