Memory problem in v5.3.3 #227

@teowave

I get these memory errors on documents that work fine with v4.7.3:

[13:33:02 WRN] [PdfProcessor.Docling] [batch] stderr: [info] Using docling default PDF backend (auto-selected): class=DoclingParseDocumentBackend, module=docling.backend.docling_parse_backend, docling-parse=5.3.3 {}
[13:33:02 INF] [PdfProcessor.Docling] [OK] 'redacted1.pdf' passed validation (size=2851361 bytes, ext=.pdf) {}
[13:33:02 WRN] [PdfProcessor.Docling] [batch] Processing 1 files... {}
[13:33:02 WRN] [PdfProcessor.Docling] [batch][progress] (1/1, 0%) starting: C:\demo\basedir1\redacted1.pdf {}
[13:33:02 WRN] [PdfProcessor.Docling] [batch][debug] Processing: input=C:\customer\redacted1.pdf {}
[13:33:02 WRN] [PdfProcessor.Docling] [batch][debug] ocr_root=C:\ocr\basedir1, rel_path=redacted1.pdf {}
[13:33:02 WRN] [PdfProcessor.Docling] [batch][debug] target_dir=C:\ocr\basedir1, base_stem=redacted1 {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [9]: std::bad_alloc {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [10]: std::bad_alloc {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [11]: std::bad_alloc {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [12]: std::bad_alloc {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [13]: std::bad_alloc {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [14]: std::bad_alloc {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [15]: std::bad_alloc {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [16]: std::bad_alloc {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [17]: std::bad_alloc {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [18]: std::bad_alloc {}
[13:33:02 INF] [PdfProcessor.Docling] [DatabaseLogger] LogFile status='processing' statusLower='processing' runGuid=300eb86c-e2da-4b1b-8624-c7d46fd0f2d4 {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [19]: std::bad_alloc {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [20]: std::bad_alloc {}
[13:33:02 INF] [PdfProcessor.Docling] [DatabaseLogger] STAMPING GUID for status 'processing' {}
[13:33:02 INF] [PdfProcessor.Docling] Updated processing log for: redacted1.pdf with status: processing {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [21]: std::bad_alloc {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [22]: std::bad_alloc {}
[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [23]: std::bad_alloc {}

The first pages of the document seem to process fine, but the failure starts at a different page from run to run: sometimes at page 6, sometimes at page 9, and so on.
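
To compare where the failure starts across runs, the failed page numbers can be pulled out of the log. This is a minimal sketch, assuming the log lines keep the `Stage ... failed for run N, pages [...]: <error>` format shown in the excerpt above; `FAILURE_RE` and `failed_pages` are hypothetical names for illustration and are not part of docling or docling-parse.

```python
import re

# Matches lines such as:
#   ... Stage preprocess failed for run 1, pages [9]: std::bad_alloc {}
# The format is an assumption taken from the log excerpt in this issue.
FAILURE_RE = re.compile(
    r"Stage (?P<stage>\w+) failed for run (?P<run>\d+), "
    r"pages \[(?P<pages>[\d,\s]*)\]: (?P<error>\S+)"
)


def failed_pages(log_text, error="std::bad_alloc"):
    """Return the sorted page numbers that failed with the given error."""
    pages = set()
    for line in log_text.splitlines():
        match = FAILURE_RE.search(line)
        if match is None or match.group("error") != error:
            continue
        # The bracketed list may hold one page or several comma-separated pages.
        for token in match.group("pages").split(","):
            token = token.strip()
            if token:
                pages.add(int(token))
    return sorted(pages)


if __name__ == "__main__":
    sample = "\n".join([
        "[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [9]: std::bad_alloc {}",
        "[13:33:02 INF] [PdfProcessor.Docling] Updated processing log for: redacted1.pdf with status: processing {}",
        "[13:33:02 WRN] [PdfProcessor.Docling] Stage preprocess failed for run 1, pages [10]: std::bad_alloc {}",
    ])
    pages = failed_pages(sample)
    print("first failing page:", pages[0] if pages else None)
    print("failed pages:", pages)
```

Running this over the logs of several attempts on the same file would show whether the first failing page really moves between runs, which helps separate a page-specific parsing problem from a general allocation problem.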
