Explicit, on-demand Nutrient PDF extraction for OpenClaw — structured Markdown output with tables, headings, and reading order preserved.
It adds an explicit Nutrient extraction surface you can call on demand:
- `nutrient_pdf_extract`: an agent tool to extract a specific PDF to structured Markdown
- `openclaw nutrient-pdf extract <file.pdf>`: a CLI command for direct extraction from your terminal
- `openclaw nutrient-pdf status`: check CLI availability and version

Use it when you want Nutrient's table and heading fidelity on a particular document, requested explicitly by the agent or from the command line.
It does not change OpenClaw's built-in pdf tool. As of OpenClaw 2026.6, the built-in tool does its own extraction through the bundled document-extract plugin (the clawpdf engine), and OpenClaw does not currently expose a hook for an external plugin to substitute its own extractor there. So this plugin is a supplement for explicit extraction, not a drop-in replacement for the default engine.
Note for users on OpenClaw 2026.4 – 2026.5: earlier versions had an `agents.defaults.pdfExtraction.engine` setting that routed the built-in tool through Nutrient. That configuration was removed in 2026.6 when extraction moved into the bundled `document-extract` plugin. This plugin no longer references it.
Plain-text PDF extractors produce word soup: they score 0.000 on table structure and 0.000 on heading preservation across 200 real documents. That includes clawpdf — the PDFium-based extractor OpenClaw bundles as its default in 2026.6.
When an agent asks "what's in row 3, column 4?" it needs structure, not a flat text dump. Nutrient produces Markdown with proper table rows and columns that agents can look up directly.
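
To illustrate why table structure matters for that kind of question, here is a minimal sketch of a cell lookup over Markdown table output. The helper names (`parse_markdown_table`, `cell_at`) and the sample table are hypothetical and not part of the plugin; the sketch also assumes simple pipe tables with no escaped `|` characters inside cells.

```python
def parse_markdown_table(markdown: str) -> list[list[str]]:
    """Return the table's rows as lists of cell strings, header row first."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row (e.g. |---|:---:|), which holds only dashes and colons.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


def cell_at(markdown: str, row: int, column: int) -> str:
    """Look up a body cell by 1-based row and column (the header is not counted)."""
    table = parse_markdown_table(markdown)
    return table[row][column - 1]  # table[0] is the header row


# Hypothetical sample data, standing in for extracted Markdown.
sample = """
| Quarter | Region | Units | Revenue |
|---|---|---|---|
| Q1 | North | 120 | 4800 |
| Q2 | North | 135 | 5400 |
| Q3 | South | 98 | 3920 |
"""

print(cell_at(sample, row=3, column=4))  # prints 3920
```

With flat text output, the same lookup would first have to guess where one cell ends and the next begins.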
| Metric | clawpdf | Nutrient | Change |
|---|---|---|---|
| Overall accuracy | 0.580 | 0.889 | +53% |
| Table structure | 0.000 | 0.739 | -- |
| Heading fidelity | 0.000 | 0.824 | -- |
| Reading order | 0.874 | 0.926 | +6% |
Scored with NID (reading order), TEDS (table structure), and MHS (heading fidelity), against clawpdf — OpenClaw's bundled 2026.6 default extractor (PDFium WASM, in the document-extract plugin).
Reproducibility: clawpdf `0.3.0` (the version OpenClaw 2026.6 bundles) vs the Nutrient document CLI `pdf-to-markdown` `1.1.0`, on the opendataloader-bench 200-document corpus, measured 2026-06-16. clawpdf is a flat-text extractor and scores 0.000 on both table and heading structure, clustering with other text-only extractors (pypdf, pdf.js) at ~0.58 overall. Earlier versions of this README compared against pdf.js, OpenClaw's pre-2026.6 default; clawpdf scores nearly identically (0.580 vs 0.578 overall), so the gap is unchanged by the switch.
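
The Change column in the table above appears to be a relative change over the clawpdf score. The sketch below recomputes it from the published numbers; the function name `relative_change` is illustrative only. A baseline of 0.000 makes the relative change undefined (division by zero), which is presumably why the table and heading rows show `--`.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Relative improvement of candidate over baseline, as a fraction."""
    return (candidate - baseline) / baseline


# (clawpdf score, Nutrient score) taken from the table above.
scores = {
    "Overall accuracy": (0.580, 0.889),
    "Reading order": (0.874, 0.926),
}

for metric, (clawpdf, nutrient) in scores.items():
    print(f"{metric}: {relative_change(clawpdf, nutrient):+.0%}")
# Overall accuracy: +53%
# Reading order: +6%
```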
Install the plugin:

    openclaw plugins install @nutrient-sdk/openclaw-nutrient-pdf

Verify the bundled pdf-to-markdown CLI is reachable:

    openclaw nutrient-pdf status

Then use the tool from an agent, or extract directly:

    openclaw nutrient-pdf extract ./report.pdf

Optional settings under plugins.entries.nutrient-pdf.config. These affect only this plugin's tool and CLI:


    {
      plugins: {
        entries: {
          "nutrient-pdf": {
            config: {
              command: "pdf-to-markdown", // path to the CLI binary (auto-resolves by default)
              timeoutMs: 30000, // extraction timeout per document
            }
          }
        }
      }
    }

All processing runs locally. No cloud uploads, no API keys.
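
For scripted use, the `openclaw nutrient-pdf extract` command can be wrapped in a small helper. This is a sketch under stated assumptions rather than a documented interface: it assumes the command writes Markdown to stdout and exits non-zero on failure, and the function names are hypothetical. The timeout here is enforced by the calling script and merely mirrors the 30000 ms `timeoutMs` value from the config; it does not read or change the plugin setting.

```python
import subprocess
from pathlib import Path


def build_extract_command(pdf_path: Path) -> list[str]:
    """Argument list for `openclaw nutrient-pdf extract <file.pdf>`."""
    return ["openclaw", "nutrient-pdf", "extract", str(pdf_path)]


def extract_markdown(pdf_path: Path, timeout_ms: int = 30000, run=subprocess.run) -> str:
    """Run the extract command and return whatever it printed to stdout.

    Assumption: the command writes Markdown to stdout and exits non-zero on failure.
    The `run` parameter exists so the call can be replaced in tests.
    """
    completed = run(
        build_extract_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
        timeout=timeout_ms / 1000,  # subprocess timeouts are given in seconds
    )
    return completed.stdout
```

Calling `extract_markdown(Path("./report.pdf"))` would then return the extracted text, or raise `subprocess.TimeoutExpired` if the command takes longer than the timeout.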
The pdf-to-markdown CLI includes 1,000 free documents per month. See nutrient.io for higher-volume licensing.
MIT -- see LICENSE for details and third-party dependency notice.

