Segmentation fault on /JPXDecode images in Termux (Android aarch64) - Proposed WITHOUT_JPX build option #251

@Manamama-Gemini-Cloud-AI-01

Description

On Termux (Android aarch64, Python 3.13), docling-parse crashes with a segmentation fault when encountering a PDF with JPEG 2000 (JPX) images. The crash occurs in the native C++ code during stream decoding.

Environment

  • OS: Android 11 (Termux)
  • Arch: aarch64
  • Python: 3.13
  • Library Version: 1.0.0 (current source build)

Root Cause

The crash happens in init_stream_data() in src/parse/pdf_resources/page_xobject_image.h, when qpdf_xobject.getStreamData() is called on a stream whose filter is /JPXDecode. The underlying cause is not confirmed; it appears to be related to ABI conflicts or instability in the OpenJPEG backend when running under Termux.

Log Snippet

2026-04-12 19:41:46.035 (   0.053s) [         FC6C500]   page_xobject_image.h:427   INFO| filter: /JPXDecode
2026-04-12 19:41:46.035 (   0.053s) [         FC6C500]   page_xobject_image.h:433   INFO| init_stream_data
2026-04-12 19:41:46.035 (   0.053s) [         FC6C500]   page_xobject_image.h:444   INFO| raw stream size: 21435 bytes
Fatal Python error: Segmentation fault

Proposed Fix

JPX decoding is unstable in this environment and is not required for every use case. A WITHOUT_JPX build option would let the library skip decoding of /JPXDecode streams instead of crashing, so the rest of the document can still be parsed.

1. CMakeLists.txt
Add a toggle to make JPX support optional:

# Off by default; configure with -DWITHOUT_JPX=ON to compile out JPX decoding
option(WITHOUT_JPX "Disable JPX support" OFF)
if(WITHOUT_JPX)
    add_definitions(-DWITHOUT_JPX)
endif()

2. src/parse/pdf_resources/page_xobject_image.h
Implement a check and guard the decoding call:

bool has_jpx_filter() const {
    for(auto const& f : image_filters) {
        if(f == "/JPXDecode") return true;
    }
    return false;
}

// In init_stream_data()
try {
    bool skip_decoding = false;
#ifdef WITHOUT_JPX
    if (has_jpx_filter()) skip_decoding = true;
#endif
    if (!skip_decoding) {
        decoded_stream_data = to_shared_ptr(qpdf_xobject.getStreamData());
    } else {
        LOG_S(WARNING) << "skipping decoding due to WITHOUT_JPX and /JPXDecode filter";
        decoded_stream_data = nullptr;
    }
} catch(...) { ... }

This fix has been verified locally on Termux and allows the rest of the document content (text, fonts, vectors) to parse successfully without crashing.
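
For reviewers who want to try the guard logic outside the parser, here is a minimal standalone sketch of the same compile-time switch. The `image_stub` struct and the `should_decode()` helper are hypothetical stand-ins invented for illustration; they are not part of docling-parse, and the real class holds far more state. Only `image_filters`, `has_jpx_filter()`, and the `WITHOUT_JPX` macro mirror the proposal above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the image object in page_xobject_image.h.
// It carries only the filter list needed to demonstrate the guard.
struct image_stub {
    std::vector<std::string> image_filters;

    // True if any filter on the stream is /JPXDecode.
    bool has_jpx_filter() const {
        return std::find(image_filters.begin(), image_filters.end(),
                         "/JPXDecode") != image_filters.end();
    }

    // Hypothetical helper: decides whether the stream should be decoded.
    // When built with -DWITHOUT_JPX, JPX streams are skipped;
    // otherwise every stream is decoded as before.
    bool should_decode() const {
#ifdef WITHOUT_JPX
        if (has_jpx_filter()) return false;
#endif
        return true;
    }
};
```

Compiling this with and without `-DWITHOUT_JPX` shows that non-JPX images are unaffected by the option; only streams carrying /JPXDecode change behavior. Any code that later reads the decoded data would presumably need to tolerate a null pointer for skipped images.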
