Converted document does not include all options from dropdown list or scrollable list in original PDF document #1

@aaronbnb

PDF Form Conversion Drops <select> Options in Accessible HTML Output

Description

When a PDF form is converted to accessible HTML, each generated <select> element contains only the first option, the one visibly displayed in the original PDF.

The original PDF contains additional selectable options embedded within the form field metadata, but these options are not being extracted into the resulting HTML.

As a result, the generated accessible form is incomplete and functionally inaccurate.


Expected Behavior

All options associated with the PDF form field should be extracted and represented in the generated HTML <select> element.

Example:

Original PDF form dropdown contains:

  • Option A
  • Option B
  • Option C

Generated HTML should be:

<select>
  <option>Option A</option>
  <option>Option B</option>
  <option>Option C</option>
</select>

Actual Behavior

Only the first visible option is included:

<select>
  <option>Option A</option>
</select>

Technical Notes

The missing dropdown options appear to be stored in the PDF's form metadata, specifically the AcroForm field definitions.

The converter already appears to query PDF metadata for other interactive elements (for example, detecting embedded links). A similar metadata extraction approach should be used for form field option lists.

Potential areas to inspect:

  • AcroForm field dictionaries
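  • /Opt entries for choice fields (/FT /Ch); each entry is either a text string or a two-element array of export value and display text
  • Combo box / list box field definitions (the Combo flag in /Ff distinguishes the two)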
  • XFDF/FDF-derived option values if applicable
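
As a rough illustration of what reading /Opt could look like, here is a minimal sketch. It assumes the converter's PDF library exposes each field dictionary as a plain Python dict keyed by PDF names such as "/FT", "/Ff" and "/Opt"; that representation and the names read_choice_options, is_combo_box and ChoiceOption are hypothetical and not taken from the converter. It also does not follow /Parent links for inherited entries.

```python
from typing import Any, NamedTuple

# /Ff bit position 18 (1-based) marks a choice field as a combo box.
COMBO_FLAG = 1 << 17


class ChoiceOption(NamedTuple):
    export_value: str
    display_text: str


def read_choice_options(field: dict[str, Any]) -> list[ChoiceOption]:
    """Return every option of a choice field, or [] for other field types.

    Assumption: `field` is a plain dict mirroring the PDF field dictionary.
    """
    if field.get("/FT") != "/Ch":
        return []
    options = []
    for entry in field.get("/Opt", []):
        if isinstance(entry, (list, tuple)) and len(entry) == 2:
            # Two-element form: [export value, display text]
            export_value, display_text = entry
        else:
            # Single string: used as both export value and display text
            export_value = display_text = entry
        options.append(ChoiceOption(str(export_value), str(display_text)))
    return options


def is_combo_box(field: dict[str, Any]) -> bool:
    """True for a combo box (dropdown), False for a list box."""
    return bool(int(field.get("/Ff", 0)) & COMBO_FLAG)
```

Returning both the export value and the display text keeps enough information to fill the value attribute and the visible label of each <option> separately.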

Suggested Fix

Enhance the PDF form parsing logic to:

  1. Detect choice/list/combo form fields
  2. Read all available option values from the PDF metadata
  3. Populate the generated HTML <select> with the complete option set
  4. Preserve selected/default values where available (in PDF terms, the field's /V and /DV entries)
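
Steps 3 and 4 could be sketched roughly as follows. This is not the converter's actual rendering code: render_select and its parameters are hypothetical, the options are assumed to arrive as (export value, display text) pairs, and `selected` is assumed to hold the field's current value if one was found.

```python
from html import escape


def render_select(name, options, selected=None):
    """Build a <select> with one <option> per (export_value, display_text) pair.

    `selected` is the export value to mark as selected, or None.
    """
    lines = [f'<select name="{escape(name)}">']
    for value, text in options:
        sel = " selected" if value == selected else ""
        lines.append(
            f'  <option value="{escape(value)}"{sel}>{escape(text)}</option>'
        )
    lines.append("</select>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_select(
        "choice",
        [("a", "Option A"), ("b", "Option B"), ("c", "Option C")],
        selected="b",
    ))
```

Escaping both the values and the labels matters here because option text comes straight from the PDF and may contain characters such as < or &.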

Reproduction Steps

  1. Open a PDF containing a dropdown form field with multiple options
  2. Convert the PDF to accessible HTML
  3. Inspect the generated <select> element
  4. Observe that only the first option is present
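
Steps 3 and 4 can also be checked automatically. The sketch below uses only Python's standard html.parser module to list the option texts of every <select> in the generated HTML; collect_select_options and SelectOptionCollector are hypothetical names, and how the HTML output is obtained from the converter is left open.

```python
from html.parser import HTMLParser


class SelectOptionCollector(HTMLParser):
    """Collect the text of every <option>, grouped by enclosing <select>."""

    def __init__(self):
        super().__init__()
        self.selects = []        # one list of option texts per <select>
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self.selects.append([])
        elif tag == "option" and self.selects:
            self.selects[-1].append("")
            self._in_option = True

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._in_option = False

    def handle_data(self, data):
        if self._in_option:
            self.selects[-1][-1] += data


def collect_select_options(html_text):
    """Return a list of option-text lists, one per <select> in html_text."""
    parser = SelectOptionCollector()
    parser.feed(html_text)
    parser.close()
    return [[text.strip() for text in options] for options in parser.selects]
```

Comparing the returned lists against the options known to be in the source PDF would turn the manual inspection into a regression check.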

Environment

  • PDF-to-accessible-HTML conversion tool
  • Affected element type: PDF form dropdown / combo box fields
  • Generated output: HTML <select> elements
