feat: integrate docling for document data extraction #76
Draft: Abdeali099 wants to merge 27 commits into version-15 from use-docling-to-extract-data
Commits
ef24085 feat: integrate docling for document data extraction (karm1000)
e521e36 chore: ensure developer mode is respected in transaction parsing queue (Abdeali099)
093955f refactor: streamline file processing methods and enhance readability (Abdeali099)
cd16721 refactor: enhance type annotations and improve method signatures in F… (Abdeali099)
fa12e4d refactor: streamline PDF processing methods and enhance file handling (Abdeali099)
9628d83 fix: add page limit check for PDF processing in FileProcessor (Abdeali099)
ae1ea00 feat: add PDF processor selection and integration for enhanced docume… (Abdeali099)
5910f00 fix: update default PDF processor to Docling and refine description (Abdeali099)
002b433 fix: enhance DoclingPDFProcessor with converter setup and add PDF pip… (Abdeali099)
3d43a25 feat: implement PDF processor selection and default setting for enhan… (Abdeali099)
7002d84 Merge branch 'version-15' into use-docling-to-extract-data (Abdeali099)
04fedc3 fix: refactor DoclingPDFProcessor to import necessary modules locally (Abdeali099)
c0ebb89 revert: remove unnecessary request cache decorator from get_pdf_proce… (Abdeali099)
51b54c5 chore: add comment to clarify processor resolution order (Abdeali099)
d15a87b fix: optimize DoclingPDFProcessor to use a singleton converter instance (Abdeali099)
1e9825d fix: reset file pointer before returning in OCRMyPDFProcessor (Abdeali099)
5b5d7f1 fix: enhance error handling for unsupported spreadsheet file types (Abdeali099)
e9004cd chore: fix typo (Abdeali099)
505f7f1 fix: improve formatting and clarity in PDFProcessor documentation (Abdeali099)
fef2bb1 fix: add check for existing PDF processor setting before setting default (Abdeali099)
fd8d700 fix: enhance error handling in DoclingPDFProcessor for conversion status (Abdeali099)
426332e fix: improve CSV content decoding by refining fallback encodings (Abdeali099)
a47f59a chore: minor fix (Abdeali099)
b5690d1 fix: reset file pointer in trim_pages method for proper PDF processing (Abdeali099)
984bf6f chore: minor change (Abdeali099)
22b35bc fix: improve formatting and readability in PDFProcessor documentation (Abdeali099)
9f02aaf chore: minor change (Abdeali099)
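Two of the commits listed here (1e9825d and b5690d1) reset a file pointer during PDF processing. Their diffs are not shown on this page, so the following is only a minimal sketch of why such a reset matters. The `read_header` helper is hypothetical, and `io.BytesIO` stands in for whatever file object the real processors handle.

```python
import io


def read_header(stream: io.BytesIO, size: int = 5) -> bytes:
    """Hypothetical helper: read the first bytes, then rewind the stream."""
    header = stream.read(size)
    # Without this seek, the next reader would start after the header
    # and see a truncated document.
    stream.seek(0)
    return header


pdf_like = io.BytesIO(b"%PDF-1.7 example payload")
header = read_header(pdf_like)
# Because the pointer was reset, a later read still sees the whole content.
remaining = pdf_like.read()
```

If the `seek(0)` call were dropped, `remaining` would begin at byte 5 rather than at the start of the content, which is the kind of bug these commit messages appear to describe.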
@@ -0,0 +1,13 @@
+import frappe
+
+from transaction_parser.transaction_parser.utils.pdf_processor import (
+    DEFAULT_PDF_PROCESSOR,
+)
+
+
+def execute():
+    DOCTYPE = "Transaction Parser Settings"
+    FIELD = "pdf_processor"
+
+    if not frappe.db.get_single_value(DOCTYPE, FIELD):
+        frappe.db.set_single_value(DOCTYPE, FIELD, DEFAULT_PDF_PROCESSOR)

Abdeali099 marked this conversation as resolved.
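The patch writes the default only when no processor is stored yet, so re-running it should not overwrite a value an administrator has already chosen. A minimal sketch of that behaviour, runnable without a Frappe site, might look like the following. `FakeSingles` is a hypothetical in-memory stand-in for `frappe.db`, and the values `"Docling"` and `"OCRMyPDF"` are assumptions; the real constant is defined in `pdf_processor` and is not shown in this diff.

```python
DEFAULT_PDF_PROCESSOR = "Docling"  # assumed value, not taken from the diff


class FakeSingles:
    """Hypothetical in-memory stand-in for frappe.db single-value access."""

    def __init__(self):
        self._values = {}

    def get_single_value(self, doctype, field):
        return self._values.get((doctype, field))

    def set_single_value(self, doctype, field, value):
        self._values[(doctype, field)] = value


def execute(db):
    doctype = "Transaction Parser Settings"
    field = "pdf_processor"
    # Only write the default when nothing is stored yet.
    if not db.get_single_value(doctype, field):
        db.set_single_value(doctype, field, DEFAULT_PDF_PROCESSOR)


db = FakeSingles()
execute(db)  # first run: nothing stored, so the default is written
first = db.get_single_value("Transaction Parser Settings", "pdf_processor")

# Simulate an administrator picking a different processor, then re-run.
db.set_single_value("Transaction Parser Settings", "pdf_processor", "OCRMyPDF")
execute(db)  # second run: existing value is left untouched
second = db.get_single_value("Transaction Parser Settings", "pdf_processor")
```

Passing the database object in as a parameter is a choice made for this sketch only; the actual patch calls `frappe.db` directly.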