chore(deps): update dependency pdf2json to v4 #42

Open
renovate[bot] wants to merge 1 commit into master from renovate/pdf2json-4.x

@renovate renovate bot commented Oct 12, 2025

This PR contains the following updates:

Package   Change          Age      Confidence
pdf2json  1.2.1 → 4.0.2   (badge)  (badge)

Release Notes

modesty/pdf2json (pdf2json)

v4.0.2: Stable Build v4.0.2

Compare Source

Add support for transparent groups and ensure endGroup merges sub-canvas text, lines, etc. back into the primary output data. This completes the fix for #418.

v4.0.1: Stable Build v4.0.1

Compare Source

Bug fixes

  1. fix: correct circular dependency without dup (PR #415)
  2. fix: issue #418

v4.0.0: Stable Build v4.0.0 [Breaking Changes]

Compare Source

v4.0.0 Release Notes

This release includes critical fixes for text encoding, space preservation, and text positioning, along with improved error handling. It contains breaking changes that require attention when upgrading from v3.x.

🚨 Breaking Changes

Text Encoding Change (Issue #​385, PR #​410)

What Changed: Text in JSON output is no longer URI-encoded. All text now outputs as UTF-8 directly.

Why: To properly support Chinese, Japanese, Korean, and other multi-byte Unicode characters. The previous URI encoding caused issues with CJK text display and partial character extraction.

Migration Required: If your code expects URI-encoded text, you must update it to handle plain UTF-8 text.
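
One hazard during the upgrade is leaving an old decodeURIComponent call in place: plain v4 text that contains a literal % either throws a URIError or is silently altered. The sketch below is one possible way to keep a single code path while both formats are still in play. The names runText and legacyEncoded are hypothetical and not part of pdf2json; the caller is assumed to know which pdf2json major version produced the JSON.

```javascript
// Hypothetical helper, not a pdf2json API: returns the text of one run object
// (an entry of Texts[].R in the JSON output).
// `legacyEncoded` is an assumption: the caller sets it to true only for JSON
// produced by pdf2json versions before 4.0.0.
function runText(run, legacyEncoded) {
  return legacyEncoded ? decodeURIComponent(run.T) : run.T;
}

// Pre-4.0.0 output still needs decoding.
const legacy = runText({ T: "Added%20Text%20from%20Acrobat" }, true);

// v4 output is used as-is, so a literal percent sign is safe.
const current = runText({ T: "100% done" }, false);

// Decoding v4 text by mistake fails on the same input.
let decodeError = null;
try {
  decodeURIComponent("100% done");
} catch (err) {
  decodeError = err; // URIError: "% d" is not a valid escape sequence
}

console.log(legacy, "|", current, "|", decodeError && decodeError.name);
```

Once every stored JSON file has been regenerated with v4, the flag and the decoding branch can be dropped.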

JSON Output Examples

Before v4.0.0 (URI-encoded):

{
  "Pages": [{
    "Texts": [{
      "R": [{
        "T": "Added%20Text%20from%20Acrobat"
      }]
    }]
  }]
}

After v4.0.0 (UTF-8):

{
  "Pages": [{
    "Texts": [{
      "R": [{
        "T": "Added Text from Acrobat"
      }]
    }]
  }]
}

Code Migration

Before v4.0.0:

// Had to decode URI components
const text = decodeURIComponent(textObj.R[0].T);
// Output: "Added Text from Acrobat"

After v4.0.0:

// Direct text access, no decoding needed
const text = textObj.R[0].T;
// Output: "Added Text from Acrobat"

CJK Character Support

Before v4.0.0:

{
  "T": "%E4%B8%AD%E6%96%87"
}

After v4.0.0:

{
  "T": "中文"
}
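
To make the partial-character problem concrete, the sketch below uses only standard JavaScript (encodeURIComponent and decodeURIComponent), not pdf2json itself. It shows that the two characters in the example take 18 characters when URI-encoded, so a consumer that slices or truncates the encoded string by length can cut through the middle of a character. The variable names are illustrative.

```javascript
const plain = "中文";
const encoded = encodeURIComponent(plain); // the pre-4.0.0 style of "T" value

// Each of the two characters is three UTF-8 bytes, i.e. three %XX escapes.
console.log(encoded, encoded.length, plain.length);

// Truncating the encoded form by length can cut a character in half.
const truncated = encoded.slice(0, 6); // two of the three bytes of the first character
let truncationError = null;
try {
  decodeURIComponent(truncated);
} catch (err) {
  truncationError = err; // incomplete UTF-8 sequence
}

// The same kind of operation on the plain v4 form keeps whole characters.
const firstChar = plain.slice(0, 1);
console.log(firstChar, truncationError && truncationError.name);
```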

✨ Features & Enhancements

Accurate Space Preservation (Issues #​355, #​361, #​319, PR #​411)

Complete overhaul of space detection and preservation in text extraction (to try it, run the CLI with the -c command-line option):

  • Glyph-based width calculation - Uses actual font metrics instead of estimates
  • Proper coordinate system handling - Correctly processes scaled positions with unscaled widths
  • Text scale support - Applies textHScale for compressed/expanded text
  • Dynamic Y-tolerance - Font size-aware vertical positioning (fontSize × 0.15)

Impact: Spaces in extracted text (both content.txt and JSON output) now accurately reflect the original PDF layout. Multi-word phrases, tables, and formatted text preserve proper spacing.
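
As a rough illustration of the font-size-aware tolerance mentioned above, two text fragments can be treated as lying on the same line when their vertical distance is within fontSize × 0.15. This is a sketch of the idea only, not pdf2json's internal code: the sameLine function, the { y, fontSize } fragment shape, and the choice to use the larger of the two font sizes are all assumptions.

```javascript
// Illustrative only; not taken from pdf2json's source.
const Y_TOLERANCE_FACTOR = 0.15; // the factor quoted in the release notes

// Assumed fragment shape: { y: number, fontSize: number }, both in the same units.
function sameLine(a, b) {
  // Assumption: use the larger font so mixed-size text on one baseline still groups.
  const tolerance = Math.max(a.fontSize, b.fontSize) * Y_TOLERANCE_FACTOR;
  return Math.abs(a.y - b.y) <= tolerance;
}

// 12-unit text: the tolerance is about 1.8 units.
const close = sameLine({ y: 100, fontSize: 12 }, { y: 101.5, fontSize: 12 });
const far = sameLine({ y: 100, fontSize: 12 }, { y: 103, fontSize: 12 });

// The same 3-unit offset is within tolerance for 24-unit text (about 3.6 units).
const largeFont = sameLine({ y: 100, fontSize: 24 }, { y: 103, fontSize: 24 });

console.log(close, far, largeFont);
```

The point of scaling the tolerance is that a fixed threshold would either split large headings across lines or merge adjacent lines of small text.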

Example Output Improvement

Before v4.0.0:

Name:JohnDoeSSN:123-45-6789

After v4.0.0:

Name: John Doe    SSN: 123-45-6789

🐛 Bug Fixes

Text Block Coordinate Accuracy (Issue #408, PR #409)

  • Fixed text block coordinate calculations for proper positioning
  • Added comprehensive coordinate tests
  • Ensures accurate x/y values in JSON output

Character Extraction Completeness (Issue #385, PR #410)

  • Fixed missing character extraction for glyphs marked as "disabled"
  • Moved text extraction outside glyph.disabled check
  • All visible characters now properly extracted

CLI Error Handling (Issue #414)

  • Unified error and exception handling for CLI operations
  • Better error messages for invalid input parameters
  • Auto-creates output directory when not specified (removed unnecessary validation)
  • Improved stack trace display

More related issues should also have been fixed (test PDFs are needed to confirm):

  • #352: unexpected space
  • #291: problem with sentences broken into 1 word
  • #272: unrecognized Text
  • #220: two TEXTs unexpectedly joined together in one RUN
  • #212: content is being randomly split into multiple lines
  • #177: heading level of text is not captured
  • #156: extracting table content
  • #94: parser not handling some spaces between words

📦 Dependencies

  • Maintained zero runtime dependencies (since v3.1.6)
  • Updated development dependencies for build tooling

v3.2.2: Stable Build v3.2.2

Compare Source

  • fix #​406
  • refactor: separate out logger functionality from nodeUtil

v3.2.1: Stable build: V3.2.1

Compare Source

  • types update:
    • fix #​392
    • update types for root pdfparser.js
  • feat: add type3 glyph font test support
    • issue fixed: #​389, #​377, #​332
    • architectural compliance, separate the type3 glyph fonts processing from rendering, use standard canvas text rendering pipeline for glyph, tested with /test/pdf/misc/i389_type3_glyph.pdf
  • chores: update README, bump dev dependencies versions while keeping zero dependency

v3.2.0: Stable build v3.2.0

Compare Source

  1. add support for Deno and Bun, plus tests
    • fix: issues #68 and #396
    • add the node: protocol prefix to imports to make them explicit when running in environments other than Node, including Deno and Bun
  2. moved root pdfparser source and types to ./src and ./src/types respectively; please double-check your import paths, since all exports now come from ./dist
  3. reduce distributed package size to 2.1 MB, improve pack and build
  4. feat: enable reading multiple pdf files with a single PDFParser object, credit @​nicolabaesso
  5. other chores, including tests, jest upgrade, readme update, etc.

v3.1.6: Stable build v3.1.6

Compare Source

What's Changed

  • zero dependency: remove dependency on @​xmldom/xmldom to make pdf2json zero dependency
  • fix: correct link for open code of conduct #​204
  • Fixed radio/checkbox return values in getAllFieldsTypes(), thanks @​bogie for #​383
  • fix: move package manager version from engines to devEngines, thanks @​styfle for #​387

Full Changelog: modesty/pdf2json@v3.1.5...v3.1.6

v3.1.5: Stable build v3.1.5

Compare Source

feature added:

  1. add commonjs type definition file generation, thanks @​grainrigi
  2. add 'types' to package.json 'exports' root, thanks @​jeremybanka

Issues addressed:

  1. fix #​165: check and make buffer before parse
  2. fix #373: handle bad encoding exception by starting page rendering after the page operator list is resolved
  3. fix #306: infinite loop on invalid stream
  4. fix #​369: handle object value for field's rectangle coordinates
  5. other maintenance, eslint, tsconfig, dependency version bumps, etc.

v3.1.4: Stable Build v3.1.4

Compare Source

  • dev-dependency updates for braces
  • correct import for typescript type to fix #349: Cannot compile project with 3.1.3
  • plus issues addressed in v3.1.4:
    • #350: replace nodeUtil.warn with nodeUtil.p2jwarn
    • #274: Invalid XRef stream
    • #216: stream must have data, verified fix

v3.1.3: Stable build v3.1.3

Compare Source

  • eslint is configured and enabled
  • typescript: configured and part of build
  • typescript: updated pdfparser.d.ts with more types
  • typescript: previous lib/p2jcmd*.js are replaced with src/cli/p2jcli*.ts
  • maint: previous root/pdf2json.js is removed, favor bin/pdf2json.js
  • tests: Jest test's Page content are validated with test/data/xxx.json
  • error and exception handling: address the following issues and also added associated test PDFs:
    • ENOENT: no such file or directory, open '/var/task/../package.json' #343
    • Node.js Server got stuck when parsing specific PDF while it is working for other PDFs #321
    • TypeError: Cannot read property 'free' of undefined #318
    • parserError: 'bad XRef entry' #277
    • params.get is not a function #262
    • Error: Requesting object that isn't resolved yet #255

v3.1.2: Stable build v3.1.2

Compare Source

  • add conditional export for both esm and cjs
  • remove unused dev dependency
  • more tests

v3.1.1: Stable build v3.1.1

Compare Source

This v3.1.1 release replaces pdf2json@​3.1.0.

  • output to both esm and commonJS bundles and source map with rollup
  • bundle outputs directory: ./dist
  • note: previous pdfparser.cjs from root is moved to ./dist/pdfparser.cjs
  • note: previous output bundles are now minified
  • note: previous vows tests are removed, test suites are rewritten in Jest, currently 23 test cases
  • note: npm build is required to run command line, output from build step is not tracked by git
  • more README.md updates and type corrections, thanks @​gladykov @​mkrishnan-codes
  • add env option to disable debugging logs, thanks @​AyresMonteiro

v3.1.0

Compare Source

v3.0.5: Stable build v3.0.5

Compare Source

  1. Add more exported types for TypeScript & update pdfparser.d.ts. thanks to @​adufr and @​tuffstuff9
  2. Add JSDocs to pdfparser.js, thanks to @​nql5161
  3. Add rollup for better ES module and commonJS handling, thanks to @​isimisi
  4. Issue #​313 : enhance parsebuffer logging with default level, thanks to @​nql5161

v3.0.4: Stable Build v3.0.4

Compare Source

  1. issue #​296: correct bin value with .js file extension
  2. make v3.0.4 the latest in npm

v3.0.3: Stable build v3.0.3

Compare Source

Enhancement:

  • Issue #​278: add initial Typescript typings

v3.0.2: Stable Build v3.0.2

Compare Source

Bug fixes:

  • issue #289: make sure the LTS Node version is set in package.json and the deprecated MozBlobBuilder is replaced with Blob
  • issue #293: fix text color for both color indexes and hex codes that are not in the color dictionary

v3.0.1: Stable build v3.0.1

Compare Source

dependency update xmldom

v3.0.0: Stable build v3.0.0: ES Module

Compare Source

Breaking changes: converted from CommonJS to ES Module, see README for details.
Also includes a dependency upgrade for a security patch and other minor bug fixes.

v2.1.0

Compare Source

v2.0.2: Stable build v2.0.2

Compare Source

release/version2 branch: patch security issues in 2.x line. issue #​300

v2.0.1: Stable Build v2.0.1

Compare Source

Patch release, fix value of checkbox and add support for signature field.

  • Better exception handling, including empty page exception. Report exception error via event and avoid crash
  • Add checked value to checkboxes. thanks @​andrewinge
  • Add signature field type and fixed NO_OPS_RANGE. thanks @​cmmcneill

v2.0.0: Stable build v2.0.0 (w/ breaking changes)

Compare Source

Major refactoring since 2015. Full meta support, minimal dependencies, improved exception handling and performance, better stream support, and more tests. See the README for details on breaking changes to the output JSON format.

v1.3.1

Compare Source

v1.3.0

Compare Source

v1.2.5: Stable build v1.2.5

Compare Source

Better error handling. README updates.

v1.2.4: Stable build v1.2.4

Compare Source

bug fixes and security updates

v1.2.3

Compare Source

v1.2.2

Compare Source


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot added the renovate label Oct 12, 2025
@renovate renovate bot force-pushed the renovate/pdf2json-4.x branch from ba3ef24 to 5957ddc on January 7, 2026 21:03
@renovate renovate bot force-pushed the renovate/pdf2json-4.x branch from 5957ddc to 5506003 on January 17, 2026 05:10