PDF Form Conversion Drops <select> Options in Accessible HTML Output
Description
When converting a PDF form into accessible HTML, <select> dropdown fields in the generated HTML only contain the first visually displayed option from the original PDF.
The original PDF contains additional selectable options embedded within the form field metadata, but these options are not being extracted into the resulting HTML.
As a result, the generated accessible form is incomplete and functionally inaccurate.
Expected Behavior
All options associated with the PDF form field should be extracted and represented in the generated HTML <select> element.
Example:
Original PDF form dropdown contains:
- Option A
- Option B
- Option C
Generated HTML should be:
<select>
<option>Option A</option>
<option>Option B</option>
<option>Option C</option>
</select>
Actual Behavior
Only the first visible option is included:
<select>
<option>Option A</option>
</select>
Technical Notes
The missing dropdown options appear to exist within the PDF form metadata / AcroForm field definitions.
The converter already appears to query PDF metadata for other interactive elements (for example, detecting embedded links). A similar metadata extraction approach should be used for form field option lists.
Potential areas to inspect:
- AcroForm field dictionaries
- /Opt entries for choice fields
- Combo box / list box field definitions
- XFDF/FDF-derived option values if applicable
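As a rough illustration of what reading /Opt involves: in AcroForm choice fields, each /Opt entry is either a single text string or a two-element array of export value and display text. The sketch below assumes the /Opt array has already been parsed into plain Python strings and lists. The function name `normalize_options` and that input shape are hypothetical and do not come from the converter's code.

```python
from typing import List, Sequence, Tuple, Union

# Assumed input shape: each /Opt entry is either a plain string or a
# two-element [export_value, display_text] sequence.
OptEntry = Union[str, Sequence[str]]


def normalize_options(opt: Sequence[OptEntry]) -> List[Tuple[str, str]]:
    """Return an (export_value, display_text) pair for every /Opt entry."""
    pairs: List[Tuple[str, str]] = []
    for entry in opt:
        if isinstance(entry, str):
            # A bare string serves as both the export value and the label.
            pairs.append((entry, entry))
        elif len(entry) == 2:
            export_value, display_text = entry
            pairs.append((str(export_value), str(display_text)))
        else:
            raise ValueError(f"unexpected /Opt entry: {entry!r}")
    return pairs
```

Handling both entry shapes matters because a converter that only expects plain strings could silently skip options stored as pairs.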
Suggested Fix
Enhance the PDF form parsing logic to:
- Detect choice/list/combo form fields
- Read all available option values from the field definition (for choice fields, the /Opt array)
- Populate the generated HTML <select> with the complete option set
- Preserve selected/default values where available (the /V and /DV entries of the field)
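The steps above could be sketched roughly as follows. This is a minimal illustration, not the converter's actual implementation: it assumes a choice field has already been parsed into a plain dict keyed by PDF entry names (/FT, /T, /Ff, /Opt, /V, /DV). The function name `render_select` and the dict representation are hypothetical.

```python
from html import escape
from typing import Any, Dict

MULTISELECT_FLAG = 1 << 21  # /Ff bit 22: more than one item may be selected


def render_select(field: Dict[str, Any]) -> str:
    """Render a parsed choice field (assumed dict form) as an HTML <select>."""
    if field.get("/FT") != "/Ch":
        raise ValueError("not a choice field")

    # Use the current value (/V), falling back to the default value (/DV).
    value = field.get("/V", field.get("/DV"))
    if value is None:
        selected = set()
    elif isinstance(value, str):
        selected = {value}
    else:
        selected = set(value)  # multi-select fields may store a list

    attrs = f' name="{escape(str(field.get("/T", "")), quote=True)}"'
    if field.get("/Ff", 0) & MULTISELECT_FLAG:
        attrs += " multiple"

    lines = [f"<select{attrs}>"]
    for entry in field.get("/Opt", []):
        # An entry is either a string or an [export_value, display_text] pair.
        export, label = (entry, entry) if isinstance(entry, str) else entry
        sel = " selected" if export in selected else ""
        lines.append(
            f'<option value="{escape(export, quote=True)}"{sel}>{escape(label)}</option>'
        )
    lines.append("</select>")
    return "\n".join(lines)
```

For example, a field with /Opt set to the three options from the Expected Behavior section and /V set to "Option B" would yield three <option> elements, with the second one marked selected.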
Reproduction Steps
- Open a PDF containing a dropdown form field with multiple options
- Convert the PDF to accessible HTML
- Inspect the generated <select> element
- Observe that only the first option is present
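The inspection step can also be done with a small script instead of by eye. The sketch below uses only Python's standard-library HTML parser to count the <option> elements in each <select> of the generated output. The helper names `SelectOptionCounter` and `count_options` are made up for this illustration.

```python
from html.parser import HTMLParser
from typing import List


class SelectOptionCounter(HTMLParser):
    """Count the <option> elements inside each <select> in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: List[int] = []
        self._in_select = False

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self._in_select = True
            self.counts.append(0)
        elif tag == "option" and self._in_select:
            self.counts[-1] += 1

    def handle_endtag(self, tag):
        if tag == "select":
            self._in_select = False


def count_options(html_text: str) -> List[int]:
    """Return one option count per <select> found in html_text."""
    parser = SelectOptionCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts
```

Run against the output shown under Actual Behavior, this would report a single <select> with one option, whereas the expected output would report three.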
Environment
- PDF-to-accessible-HTML conversion tool
- Affected element type: PDF form dropdown / combo box fields
- Generated output: HTML <select> elements