Skip to content

arorahrsh/Pdf2Dom

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

145 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Pdf2Dom

Build Status

Pdf2Dom is a PDF parser that converts the documents to a HTML DOM representation. The obtained DOM tree may be then serialized to a HTML file or further processed. The inline CSS definitions contained in the resulting document are used for making the HTML page as similar as possible to the PDF input. A command-line utility for converting the PDF documents to HTML is included in the distribution package. Pdf2Dom may be also used as an independent Java library with a standard DOM interface for your DOM-based applications or as an alternative parser for the CSSBox rendering engine in order to add the PDF processing capability to CSSBox.

Pdf2Dom is based on the Apache PDFBox™ library.

See the project page for more information and downloads: http://cssbox.sourceforge.net/pdf2dom

About

Pdf2Dom is a PDF parser that converts the documents to a HTML DOM representation. The obtained DOM tree may be then serialized to a HTML file or further processed. A command-line utility and GUI application for converting the PDF documents to HTML is included in the distribution package. Pdf2Dom may be also used as an independent Java library w…

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages

  • Java 99.8%
  • Shell 0.2%