ByteVeda/paperjam

Fast document processing powered by Rust. One API. Every document format.

Supported Formats

Format	Read	Write	Extract Text	Extract Tables	Convert
PDF	Yes	Yes	Yes	Yes	Yes
DOCX	Yes	Yes	Yes	Yes	Yes
XLSX	Yes	Yes	Yes	Yes	Yes
PPTX	Yes	Yes	Yes	Yes	Yes
HTML	Yes	Yes	Yes	Yes	Yes
EPUB	Yes	Yes	Yes	-	Yes

Installation

pip install paperjam

CLI tool (Rust):

cargo install paperjam-cli

Quick Start

Open any format

import paperjam

doc = paperjam.open("report.pdf")
docx = paperjam.open("document.docx")
xlsx = paperjam.open("data.xlsx")
pptx = paperjam.open("slides.pptx")
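
When file paths come from user input, it can help to reject unknown extensions before handing them to `paperjam.open`. The sketch below assumes that checking the file suffix against the Supported Formats table is good enough; `is_supported` and `open_if_supported` are hypothetical helpers, not part of paperjam, and the library may well do its own format detection.

```python
from pathlib import Path

# Extensions taken from the Supported Formats table above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx", ".html", ".epub"}


def is_supported(path: str) -> bool:
    """Return True if the file extension appears in the Supported Formats table."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def open_if_supported(path: str):
    """Open a document with paperjam.open, rejecting unknown extensions early."""
    if not is_supported(path):
        raise ValueError(f"unsupported format: {Path(path).suffix or '(none)'}")
    # Imported lazily so the extension check works without paperjam installed.
    import paperjam

    return paperjam.open(path)
```

The check is case-insensitive, so `SLIDES.PPTX` passes and `notes.txt` raises `ValueError` before paperjam is ever imported.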

Extract text and tables

doc = paperjam.open("report.pdf")

text = doc.pages[0].extract_text()
tables = doc.pages[0].extract_tables()
md = doc.to_markdown(layout_aware=True)
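
A common follow-up is to save the Markdown rendering to disk. This is a minimal sketch that assumes `to_markdown` returns a `str`, which the example above suggests but does not state; `export_markdown` and its `open_doc` parameter are hypothetical names introduced here so the helper can be exercised without a real document.

```python
from pathlib import Path


def export_markdown(src: str, dst: str, open_doc=None) -> int:
    """Write a layout-aware Markdown rendering of src to dst.

    Returns the number of characters in the rendered Markdown.
    """
    if open_doc is None:
        # Default to the real library; imported lazily.
        import paperjam

        open_doc = paperjam.open
    doc = open_doc(src)
    md = doc.to_markdown(layout_aware=True)  # assumption: returns a str
    Path(dst).write_text(md, encoding="utf-8")
    return len(md)
```

Calling `export_markdown("report.pdf", "report.md")` would use `paperjam.open` by default.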

Convert between formats

paperjam.convert("report.pdf", "report.docx")
paperjam.convert("data.xlsx", "data.pdf")
paperjam.convert("page.html", "page.epub")
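
For more than a handful of files, the same two-argument `paperjam.convert(src, dst)` call can be driven from a small batch helper. The sketch below is one possible shape, not a paperjam feature: `plan_conversions` and `convert_all` are hypothetical names, and output files are assumed to keep the source file's stem.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Tuple


def plan_conversions(
    sources: Iterable[str], target_ext: str, out_dir: str
) -> List[Tuple[str, str]]:
    """Pair each source path with an output path in out_dir using target_ext."""
    ext = target_ext if target_ext.startswith(".") else "." + target_ext
    return [(src, str(Path(out_dir) / (Path(src).stem + ext))) for src in sources]


def convert_all(
    sources: Iterable[str],
    target_ext: str,
    out_dir: str,
    convert: Optional[Callable[[str, str], object]] = None,
) -> List[Tuple[str, str]]:
    """Convert every source file; `convert` defaults to paperjam.convert."""
    if convert is None:
        import paperjam  # imported lazily so planning works without the package

        convert = paperjam.convert
    plan = plan_conversions(sources, target_ext, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan:
        convert(src, dst)
    return plan
```

Two sources with the same stem (for example `a/report.pdf` and `b/report.pdf`) would map to the same output path, so a real batch job would need its own naming rule.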

Run a pipeline

# pipeline.yaml
steps:
  - open: "reports/*.pdf"
  - extract_tables:
      strategy: auto
      output: tables.csv
  - convert:
      format: docx
      output: "converted/"

paperjam pipeline run pipeline.yaml
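
If the same three-step pipeline needs to run against different inputs, the YAML file can be generated instead of edited by hand. This sketch uses only the keys shown in the example above (`open`, `extract_tables`, `convert`, `strategy`, `format`, `output`); `render_pipeline` is a hypothetical helper, and it makes no claim about which other keys the pipeline engine accepts.

```python
import json


def render_pipeline(
    input_glob: str, tables_csv: str, convert_format: str, convert_dir: str
) -> str:
    """Render a pipeline file with the same three steps as the example above."""
    # JSON string quoting is also valid YAML double-quoted syntax,
    # which keeps characters such as '*' or ':' from being misread.
    q = json.dumps
    return "\n".join(
        [
            "steps:",
            f"  - open: {q(input_glob)}",
            "  - extract_tables:",
            "      strategy: auto",
            f"      output: {q(tables_csv)}",
            "  - convert:",
            f"      format: {q(convert_format)}",
            f"      output: {q(convert_dir)}",
            "",
        ]
    )
```

The returned text can be written to `pipeline.yaml` and run with `paperjam pipeline run pipeline.yaml` as shown above.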

CLI usage

paperjam extract text report.pdf
paperjam extract tables data.pdf --format csv
paperjam convert report.pdf report.docx
paperjam info document.pdf
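
The CLI can also be called from a script. The wrapper below is a sketch that assumes the commands above print their results to standard output, which this README does not confirm; `paperjam_argv` and `run_paperjam` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List


def paperjam_argv(*args: str) -> List[str]:
    """Build the argument vector for a paperjam CLI invocation."""
    return ["paperjam", *args]


def run_paperjam(*args: str) -> str:
    """Run the CLI and return its stdout.

    Raises FileNotFoundError if the binary is not on PATH and
    subprocess.CalledProcessError if the command exits non-zero.
    """
    if shutil.which("paperjam") is None:
        raise FileNotFoundError("paperjam CLI not found on PATH; see Installation")
    result = subprocess.run(
        paperjam_argv(*args), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `run_paperjam("extract", "tables", "data.pdf", "--format", "csv")` runs the second command listed above. Passing arguments as a list avoids shell quoting problems with file names that contain spaces.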

MCP server

pip install paperjam-mcp

Add to your MCP client configuration (Claude Code, Claude Desktop, Cursor):

{
  "mcpServers": {
    "paperjam": {
      "command": "uvx",
      "args": ["paperjam-mcp", "--working-dir", "."]
    }
  }
}
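
If the client configuration file already lists other servers, the entry can be merged in rather than pasted over them. This is a minimal sketch using only the standard library; `add_paperjam_server` is a hypothetical helper, and the location of the configuration file depends on the client, so it is left as a parameter.

```python
import json
from pathlib import Path


def add_paperjam_server(config_path: str) -> dict:
    """Add the paperjam entry to an MCP client config, keeping other servers."""
    path = Path(config_path)
    # Start from the existing config if there is one, else from an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Same entry as in the JSON snippet above.
    config.setdefault("mcpServers", {})["paperjam"] = {
        "command": "uvx",
        "args": ["paperjam-mcp", "--working-dir", "."],
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

The helper overwrites any existing `paperjam` entry and rewrites the file with two-space indentation, so comments or custom formatting in the original file are not preserved.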

Features

Multi-format support -- PDF, DOCX, XLSX, PPTX, HTML, EPUB through one unified API
Text extraction -- plain text, positioned lines, spans with font info
Table extraction -- lattice and stream strategies with CSV/DataFrame export
Format conversion -- convert between any two supported formats
Pipeline engine -- define multi-step document workflows in YAML
MCP server -- expose document operations as tools for AI agents
PDF manipulation -- split, merge, reorder, rotate, delete, insert blank pages
Metadata & bookmarks -- read and edit document properties and outline
Annotations & watermarks -- add, read, remove annotations; text watermarks
Forms -- inspect, fill, create, and modify form fields
Security -- encryption (AES-128/256, RC4), sanitization, true content-stream redaction
Digital signatures -- sign, verify, and inspect with LTV timestamp support
PDF/A & PDF/UA -- validation and conversion, accessibility checks
Native async -- powered by Rust and tokio, no Python thread pools
CLI tool -- full-featured command-line interface for scripting and automation
WASM playground -- try it in the browser at docs.byteveda.org/paperjam

Documentation

Full docs, API reference, and interactive playground at docs.byteveda.org/paperjam.

License

MIT
