Fast document processing powered by Rust. One API. Every document format.

| Format | Read | Write | Extract Text | Extract Tables | Convert |
|---|---|---|---|---|---|
| PDF | Yes | Yes | Yes | Yes | Yes |
| DOCX | Yes | Yes | Yes | Yes | Yes |
| XLSX | Yes | Yes | Yes | Yes | Yes |
| PPTX | Yes | Yes | Yes | Yes | Yes |
| HTML | Yes | Yes | Yes | Yes | Yes |
| EPUB | Yes | Yes | Yes | - | Yes |
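
The support matrix above can be encoded as data so that a script checks a capability before calling into the library. The following is a minimal sketch: `OPERATIONS`, `SUPPORT`, and `supports` are hypothetical helper names, not part of the paperjam API, and the PDF row assumes the full support described in the feature list.

```python
from pathlib import Path

# Capability matrix transcribed from the table above.
# Hypothetical helper for illustration; not part of paperjam itself.
OPERATIONS = ("read", "write", "extract_text", "extract_tables", "convert")

SUPPORT = {
    "pdf": set(OPERATIONS),
    "docx": set(OPERATIONS),
    "xlsx": set(OPERATIONS),
    "pptx": set(OPERATIONS),
    "html": set(OPERATIONS),
    # EPUB is the only format without table extraction.
    "epub": set(OPERATIONS) - {"extract_tables"},
}


def supports(path: str, operation: str) -> bool:
    """Return True if the file's extension supports the given operation."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    ext = Path(path).suffix.lower().lstrip(".")
    # Unknown extensions support nothing.
    return operation in SUPPORT.get(ext, set())


print(supports("book.epub", "extract_tables"))
print(supports("data.xlsx", "extract_tables"))
```

A check like this lets a batch script skip table extraction for EPUB inputs instead of discovering the gap at run time.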

Python package:

    pip install paperjam

CLI tool (Rust):

    cargo install paperjam-cli

Open any supported format:

    import paperjam

    doc = paperjam.open("report.pdf")
    docx = paperjam.open("document.docx")
    xlsx = paperjam.open("data.xlsx")
    pptx = paperjam.open("slides.pptx")

Extract text, tables, and Markdown:

    doc = paperjam.open("report.pdf")
    text = doc.pages[0].extract_text()
    tables = doc.pages[0].extract_tables()
    md = doc.to_markdown(layout_aware=True)

Convert between formats:

    paperjam.convert("report.pdf", "report.docx")
    paperjam.convert("data.xlsx", "data.pdf")
    paperjam.convert("page.html", "page.epub")

Define a multi-step pipeline in YAML:

    # pipeline.yaml
    steps:
      - open: "reports/*.pdf"
      - extract_tables:
          strategy: auto
          output: tables.csv
      - convert:
          format: docx
          output: "converted/"

Run the pipeline:

    paperjam pipeline run pipeline.yaml

Command-line usage:

    paperjam extract text report.pdf
    paperjam extract tables data.pdf --format csv
    paperjam convert report.pdf report.docx
    paperjam info document.pdf

MCP server:

    pip install paperjam-mcp

Add to your MCP client configuration (Claude Code, Claude Desktop, Cursor):

    {
      "mcpServers": {
        "paperjam": {
          "command": "uvx",
          "args": ["paperjam-mcp", "--working-dir", "."]
        }
      }
    }

- Multi-format support -- PDF, DOCX, XLSX, PPTX, HTML, EPUB through one unified API
- Text extraction -- plain text, positioned lines, spans with font info
- Table extraction -- lattice and stream strategies with CSV/DataFrame export
- Format conversion -- convert between any of the supported formats
- Pipeline engine -- define multi-step document workflows in YAML
- MCP server -- expose document operations as tools for AI agents
- PDF manipulation -- split, merge, reorder, rotate, delete, insert blank pages
- Metadata & bookmarks -- read and edit document properties and outline
- Annotations & watermarks -- add, read, remove annotations; text watermarks
- Forms -- inspect, fill, create, and modify form fields
- Security -- encryption (AES-128/256, RC4), sanitization, true content-stream redaction
- Digital signatures -- sign, verify, and inspect with LTV timestamp support
- PDF/A & PDF/UA -- validation and conversion, accessibility checks
- Native async -- powered by Rust and tokio, no Python thread pools
- CLI tool -- full-featured command-line interface for scripting and automation
- WASM playground -- try it in the browser at docs.byteveda.org/paperjam
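
The format-conversion feature listed above lends itself to batch jobs. The sketch below plans output paths for a set of inputs and then calls a converter for each pair. `plan_conversions` and `convert_all` are hypothetical helper names. The converter is passed in as a parameter, on the assumption that it takes a source and a destination path in the way the README's `paperjam.convert(src, dst)` examples do.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def plan_conversions(
    sources: Iterable[str], out_dir: str, target_ext: str
) -> List[Tuple[Path, Path]]:
    """Pair each source file with an output path in out_dir using target_ext."""
    out = Path(out_dir)
    ext = "." + target_ext.lstrip(".")
    return [(Path(s), out / (Path(s).stem + ext)) for s in sources]


def convert_all(
    sources: Iterable[str],
    out_dir: str,
    target_ext: str,
    convert: Callable[[str, str], object],
) -> List[Path]:
    """Call convert(src, dst) for every planned pair and return the outputs."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    outputs = []
    for src, dst in plan_conversions(sources, out_dir, target_ext):
        convert(str(src), str(dst))  # e.g. paperjam.convert (assumed signature)
        outputs.append(dst)
    return outputs


# Dry run: show the plan without converting anything.
for src, dst in plan_conversions(["reports/q1.pdf", "reports/q2.pdf"], "converted", "docx"):
    print(src, "->", dst)
```

To run real conversions under these assumptions, pass `paperjam.convert` as the `convert` argument, for example `convert_all(paths, "converted", "docx", paperjam.convert)`. Keeping the converter injectable also makes the planning logic testable without the library installed.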
Full docs, API reference, and interactive playground at docs.byteveda.org/paperjam.
License: MIT
