Added functionality to more easily extract plain text by TheOriginalBytePlayer · Pull Request #3 · omonien/DX.Pdfium4D

TheOriginalBytePlayer · 2026-01-13T21:38:44Z

This pull request introduces several improvements to the PDF document API, focusing on enhancing the usability and functionality of the TPdfDocument class and exposing additional PDFium text extraction capabilities. The most important changes include updating the way PDF documents are loaded, adding indexed page access, and exposing a new function for retrieving character bounding boxes.

Enhancements to PDF document loading and access:

Changed TPdfDocument.LoadFromFile from a procedure to a function that returns a boolean indicating success, and updated its implementation to set the result based on loading status and page count.
Added an indexed Pages property to TPdfDocument for direct access to pages by index.

PDFium API exposure:

Added a new external procedure FPDFText_GetCharBox to expose PDFium's character bounding box retrieval functionality.
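
For context, a Delphi import of this function might look like the sketch below. It follows the C prototype in PDFium's public `fpdf_text.h` header, where recent versions return `FPDF_BOOL` (this description calls it a procedure, so the declaration in the PR may differ). The unit name, the library file names, the parameter names, and the type aliases are assumptions for illustration, not the declarations from this PR.

```pascal
unit PdfiumCharBoxSketch; // hypothetical unit name

interface

const
  // Assumed library file names; the project may bind PDFium differently.
  {$IFDEF MSWINDOWS}
  PDFIUM_LIB = 'pdfium.dll';
  {$ELSE}
  PDFIUM_LIB = 'libpdfium.so';
  {$ENDIF}

type
  FPDF_TEXTPAGE = Pointer; // opaque text page handle
  FPDF_BOOL = Integer;     // non-zero means success

// Retrieves the bounding box of the character at AIndex, in page coordinates.
// The four C "double*" parameters are mapped to "out" parameters.
function FPDFText_GetCharBox(ATextPage: FPDF_TEXTPAGE; AIndex: Integer;
  out ALeft, ARight, ABottom, ATop: Double): FPDF_BOOL;
  {$IFDEF MSWINDOWS} stdcall {$ELSE} cdecl {$ENDIF};
  external PDFIUM_LIB;

implementation

end.
```

A caller would typically check the result before using the four coordinates, since an out-of-range character index leaves them undefined.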

omonien · 2026-03-31T15:20:41Z

Hi @TheOriginalBytePlayer, thanks for your interest in the project and for taking the time to submit this PR!

After careful review, I've decided not to merge this in its current form for the following reasons:

1. FPDFText_GetText is already in master - this declaration would be a duplicate.
2. Breaking API change: Changing LoadFromFile from procedure to function: Boolean breaks all existing callers. The current exception-based error handling via EPdfLoadException is the intended pattern and provides richer error information than a boolean return value.
3. Pages[] indexed property: Returning a TPdfPage that the caller must free from an indexed property is a memory-leak risk - users typically don't expect ownership transfer from property getters. I do like the idea though and will implement a cached version where the document manages the page lifecycle internally.
4. .gitignore: The /lib entry was added twice.

I'll be incorporating a properly designed Pages[] accessor with internal caching in an upcoming commit. Thanks again for your contribution!
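
To illustrate the two design points above (exception-based loading and a document-owned page cache), here is a minimal, self-contained sketch. It is not the implementation planned for the project: `TSketchDocument`, `TSketchPage`, and `ESketchLoadException` are hypothetical stand-ins for `TPdfDocument`, `TPdfPage`, and `EPdfLoadException`, and the page count is a placeholder rather than a value read from PDFium.

```pascal
program CachedPagesSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  // Hypothetical stand-in for EPdfLoadException.
  ESketchLoadException = class(Exception);

  // Hypothetical stand-in for TPdfPage; a real page would wrap a PDFium handle.
  TSketchPage = class
  private
    FIndex: Integer;
  public
    constructor Create(AIndex: Integer);
    property Index: Integer read FIndex;
  end;

  TSketchDocument = class
  private
    FPageCount: Integer;
    // Owns its values: clearing or freeing the dictionary frees the pages.
    FPageCache: TObjectDictionary<Integer, TSketchPage>;
    function GetPage(AIndex: Integer): TSketchPage;
  public
    constructor Create;
    destructor Destroy; override;
    // Stays a procedure: failure is reported by raising, not by a Boolean.
    procedure LoadFromFile(const AFileName: string);
    property PageCount: Integer read FPageCount;
    // The document keeps ownership; callers must not free the returned page.
    property Pages[AIndex: Integer]: TSketchPage read GetPage; default;
  end;

constructor TSketchPage.Create(AIndex: Integer);
begin
  inherited Create;
  FIndex := AIndex;
end;

constructor TSketchDocument.Create;
begin
  inherited Create;
  FPageCache := TObjectDictionary<Integer, TSketchPage>.Create([doOwnsValues]);
end;

destructor TSketchDocument.Destroy;
begin
  FPageCache.Free; // releases every cached page
  inherited;
end;

procedure TSketchDocument.LoadFromFile(const AFileName: string);
begin
  if not FileExists(AFileName) then
    raise ESketchLoadException.CreateFmt('Cannot load "%s"', [AFileName]);
  FPageCache.Clear; // drop pages that belonged to a previously loaded file
  FPageCount := 3;  // placeholder; a real implementation would ask PDFium
end;

function TSketchDocument.GetPage(AIndex: Integer): TSketchPage;
begin
  if (AIndex < 0) or (AIndex >= FPageCount) then
    raise EArgumentOutOfRangeException.CreateFmt(
      'Page index %d out of range', [AIndex]);
  // Create the page on first access, then reuse the cached instance.
  if not FPageCache.TryGetValue(AIndex, Result) then
  begin
    Result := TSketchPage.Create(AIndex);
    FPageCache.Add(AIndex, Result);
  end;
end;

var
  Doc: TSketchDocument;
begin
  Doc := TSketchDocument.Create;
  try
    try
      Doc.LoadFromFile(ParamStr(0)); // the running executable always exists
      Assert(Doc.Pages[0] = Doc.Pages[0]); // same instance on repeated access
      Writeln('Page count: ', Doc.PageCount);
    except
      on E: ESketchLoadException do
        Writeln('Load failed: ', E.Message);
    end;
  finally
    Doc.Free; // also frees the cached pages
  end;
end.
```

Because the getter never hands ownership to the caller, code such as `Doc.Pages[0].Index` does not leak, and existing callers of `LoadFromFile` keep their `try..except` handling unchanged.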

Updates

fe01677

Copilot AI review requested due to automatic review settings January 13, 2026 21:38

Copilot started reviewing on behalf of TheOriginalBytePlayer January 13, 2026 21:39

Copilot AI reviewed Jan 13, 2026

TheOriginalBytePlayer and others added 5 commits January 13, 2026 14:59

Apply suggestion from @Copilot

93d2f88

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Apply suggestion from @Copilot

ccbfbc5

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Apply suggestion from @Copilot

f6e1d7c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Apply suggestion from @Copilot

22f97d7

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Apply suggestion from @Copilot

4bb0625

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

omonien closed this Mar 31, 2026

TheOriginalBytePlayer wants to merge 6 commits into omonien:master from TheOriginalBytePlayer:master
