Description
On Termux (Android aarch64, Python 3.13), docling-parse crashes with a segmentation fault when encountering a PDF with JPEG 2000 (JPX) images. The crash occurs in the native C++ code during stream decoding.
Environment
- OS: Android 11 (Termux)
- Arch: aarch64
- Python: 3.13
- Library Version: 1.0.0 (current source build)
Root Cause
The crash happens in src/parse/pdf_resources/page_xobject_image.h within init_stream_data() when qpdf_xobject.getStreamData() is called on a stream filtered with /JPXDecode. This appears to be related to ABI conflicts or instability in the OpenJPEG backend when running in the Termux environment.
Log Snippet
2026-04-12 19:41:46.035 ( 0.053s) [ FC6C500] page_xobject_image.h:427 INFO| filter: /JPXDecode
2026-04-12 19:41:46.035 ( 0.053s) [ FC6C500] page_xobject_image.h:433 INFO| init_stream_data
2026-04-12 19:41:46.035 ( 0.053s) [ FC6C500] page_xobject_image.h:444 INFO| raw stream size: 21435 bytes
Fatal Python error: Segmentation fault
Proposed Fix
JPX decoding is unstable on mobile/Android platforms and is not required for every use case. A WITHOUT_JPX build option lets the library skip the crashing decode path for /JPXDecode streams while the rest of the parser keeps working.
1. CMakeLists.txt
Add a toggle to make JPX support optional:
option(WITHOUT_JPX "Disable JPX support" OFF)
if(WITHOUT_JPX)
    add_definitions(-DWITHOUT_JPX)
endif()
2. src/parse/pdf_resources/page_xobject_image.h
Implement a check and guard the decoding call:
bool has_jpx_filter() const {
    for(auto const& f : image_filters) {
        if(f == "/JPXDecode") return true;
    }
    return false;
}

// In init_stream_data()
try {
    bool skip_decoding = false;
#ifdef WITHOUT_JPX
    if (has_jpx_filter()) skip_decoding = true;
#endif
    if (!skip_decoding) {
        decoded_stream_data = to_shared_ptr(qpdf_xobject.getStreamData());
    } else {
        LOG_S(WARNING) << "skipping decoding due to WITHOUT_JPX and /JPXDecode filter";
        decoded_stream_data = nullptr;
    }
} catch(...) { ... }
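Because the guarded path leaves decoded_stream_data as nullptr, any code that later reads the buffer presumably needs a null check. The following is a minimal standalone sketch of the guard logic, not the library's actual class: ImageStreamSketch, should_skip_decoding, decoded_size, and the jpx_disabled parameter (standing in for the compile-time WITHOUT_JPX flag) are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the image XObject; only the fields needed
// to illustrate the guard are modelled here.
struct ImageStreamSketch {
    std::vector<std::string> image_filters;
    std::shared_ptr<std::string> decoded_stream_data;  // placeholder buffer type

    // True if any filter on the stream is /JPXDecode.
    bool has_jpx_filter() const {
        return std::any_of(image_filters.begin(), image_filters.end(),
                           [](const std::string& f) { return f == "/JPXDecode"; });
    }

    // jpx_disabled mirrors the WITHOUT_JPX define as a runtime value (assumption).
    bool should_skip_decoding(bool jpx_disabled) const {
        return jpx_disabled && has_jpx_filter();
    }
};

// Consumers must tolerate a null buffer when decoding was skipped.
inline std::size_t decoded_size(const ImageStreamSketch& img) {
    return img.decoded_stream_data ? img.decoded_stream_data->size() : 0;
}
```

With jpx_disabled set, a stream carrying /JPXDecode is skipped and decoded_size reports 0, while streams with other filters are unaffected.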
This fix has been verified locally on Termux and allows the rest of the document content (text, fonts, vectors) to parse successfully without crashing.