(I'm aware that #16 already exists; I thought it would be nice to lay out a few reasons in an organized fashion.)
This PDF library is, in my experience, the best in the business. PDFMiner, with all due respect, is slow, inaccurate, and inconsistent, making it impossible to use reliably in some cases. Other XPDF/Poppler bindings are outdated and abandoned. Other workarounds (such as those mentioned in #16) are plagued with some of the same issues (mainly inaccuracy).
This is where pdftotext comes in handy. It's fast and gives accurate results. The only problem is that there's a pretty high barrier to using this package. Developers on Linux must install a few system packages before this package can be built and installed. Windows users, on the other hand, are left with no clue how to install it. This could all be mitigated with prebuilt binaries for Windows, and for other platforms as well.