working with xfa forms by vision10 · Pull Request #129 · cantoo-scribe/pdf-lib

vision10 · 2025-12-01T15:25:37Z

What?

First attempt at adding support for preserving XFA (XML Forms Architecture) forms and for extracting and modifying the JavaScript embedded in XFA templates.

Why?

By default, pdf-lib strips XFA data when loading and saving PDFs, so these forms lose all functionality. There was also no way to programmatically access or modify the JavaScript code embedded in XFA templates.

How?

**Preserve XFA forms**
- Added preserveXFA option to PDFDocument.load() and PDFDocument.save()
- When enabled, preserves the entire XFA array structure from the AcroForm dictionary
- Prevents XFA data loss during PDF modification

**XFA JavaScript Extraction (getXFAJavaScripts())**
- New method that extracts all JavaScript from XFA template XML
- Parses compressed PDF streams and XML structure
- Returns array of {field: string, event: string, script: string} objects
- Handles XFA's unusual XML formatting (whitespace, including newlines, inside closing tags like </script\n>)
- Uses backward search to determine field and event context for each script

**XFA JavaScript Modification (setXFAJavaScript(field, event, script))**
- New method to modify specific scripts by field name and event name
- Finds matching field/event in XML, replaces script content
- Creates new compressed stream with modified XML
- Returns boolean indicating success/failure
- Preserves XFA structure and all other scripts
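
The extraction approach described above (a regex that tolerates whitespace in closing tags, plus a backward search for context) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's actual code: `extractScripts` and `lastAttr` are hypothetical helpers, and the sketch assumes the event name comes from the `activity` attribute of the enclosing `<event>` element.

```typescript
// Hypothetical sketch; names and the `activity` assumption are illustrative only.
interface XfaScript {
  field: string;
  event: string;
  script: string;
}

// Returns the value of `attr` on the last `<tag ...>` that opens in `xml`,
// or 'unknown' when no such tag carries the attribute.
function lastAttr(xml: string, tag: string, attr: string): string {
  const re = new RegExp(`<${tag}\\b[^>]*?\\b${attr}="([^"]*)"`, 'gi');
  let found = 'unknown';
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) found = m[1];
  return found;
}

function extractScripts(templateXml: string): XfaScript[] {
  const results: XfaScript[] = [];
  // `<\/script\s*>` accepts closing tags written as `</script\n>`.
  const scriptRe = /<script\b[^>]*>([\s\S]*?)<\/script\s*>/gi;
  let m: RegExpExecArray | null;
  while ((m = scriptRe.exec(templateXml)) !== null) {
    // Backward search: look only at the XML that precedes this script.
    const before = templateXml.slice(0, m.index);
    results.push({
      field: lastAttr(before, 'field', 'name'),
      event: lastAttr(before, 'event', 'activity'),
      script: m[1].trim(),
    });
  }
  return results;
}

// Made-up template fragment with a newline inside the closing tag.
const sampleTemplate = [
  '<template>',
  '<field name="Total">',
  '<event activity="click">',
  '<script contentType="application/x-javascript">app.alert(1);</script',
  '></event>',
  '</field>',
  '</template>',
].join('\n');

console.log(extractScripts(sampleTemplate));
```

Note that a backward search picks the most recently opened field, so a script that sits outside any field (for example at subform level) would be attributed to whichever field happened to precede it.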

Technical details:

- XFA data is stored as alternating name/stream pairs in a PDFArray
- The template section contains the JavaScript in XML <script> elements
- Uses a regex pattern that tolerates whitespace inside closing tags (<\/script\s*> instead of <\/script>); such tags are unusual but still well-formed XML
- Added PDFRef dereferencing for the XFA array lookup
- Uses decodePDFRawStream to handle compressed streams
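
To illustrate the array layout and the reference handling listed above, here is a toy model. The `Ref`, `Stream`, and `Obj` types and the `resolve` and `findPacket` helpers are hypothetical stand-ins, not pdf-lib classes; they only mimic the idea of indirect references that must be followed before the alternating pairs can be read.

```typescript
// Toy stand-ins for PDF objects. These are NOT pdf-lib classes.
type Ref = { kind: 'ref'; id: number };
type Stream = { kind: 'stream'; bytes: Uint8Array };
type Obj = Ref | Stream | string | Obj[];

function isRef(o: Obj | undefined): o is Ref {
  return typeof o === 'object' && !Array.isArray(o) && o.kind === 'ref';
}

function isStream(o: Obj | undefined): o is Stream {
  return typeof o === 'object' && !Array.isArray(o) && o.kind === 'stream';
}

// Follow indirect references until a direct object (or nothing) is reached.
function resolve(o: Obj | undefined, objects: Map<number, Obj>): Obj | undefined {
  const seen = new Set<number>();
  let current = o;
  while (isRef(current)) {
    if (seen.has(current.id)) return undefined; // reference cycle
    seen.add(current.id);
    current = objects.get(current.id);
  }
  return current;
}

// The XFA entry is modelled as [name0, stream0, name1, stream1, ...].
// Both the array itself and each element may be indirect references.
function findPacket(
  xfa: Obj | undefined,
  name: string,
  objects: Map<number, Obj>,
): Stream | undefined {
  const arr = resolve(xfa, objects);
  if (!Array.isArray(arr)) return undefined;
  for (let i = 0; i + 1 < arr.length; i += 2) {
    if (resolve(arr[i], objects) === name) {
      const value = resolve(arr[i + 1], objects);
      return isStream(value) ? value : undefined;
    }
  }
  return undefined;
}
```

Resolving the entry before the array check is the step that matters here: a check made directly on the raw entry would reject an XFA value that is stored as a reference.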

Testing?

**Unit Tests (7 tests in PDFDocumentXFA.spec.ts)**
- ✅ Extract XFA JavaScript from template (29 scripts from test PDF)
- ✅ Returns empty array for non-XFA PDFs
- ✅ Can modify XFA JavaScript
- ✅ Returns false when modifying non-existent field
- ✅ Preserves XFA structure after modification
- ✅ Can save and reload PDF with modified XFA JavaScript
- ✅ Extracts scripts from multiple events on same field
- Uses assets/pdfs/with_xfa_fields.pdf (included in repo)

**Integration Testing**
- Tested with my own complex PDF and with the ready-made one used by the unit tests
- Save/reload cycle preserves all modifications

New Dependencies?

No new production dependencies. The implementation uses existing dependencies:

- pako (already in dependencies) - for stream compression/decompression
- All XFA functionality is built using existing pdf-lib core modules

Screenshots

Anything Else?

Documentation updates

Sharcoux · 2026-02-17T14:18:37Z

README.md

+// Make modifications...
+
+// Save with XFA preservation
+const pdfBytes = await pdfDoc.save({ 


Unless I missed it, this is not yet implemented in this PR.

Sharcoux · 2026-02-17T14:26:28Z

src/api/PDFDocument.ts

+   * @param newScript The new JavaScript code to set
+   * @returns True if the script was found and updated, false otherwise
+   */
+  setXFAJavaScript(


You will probably need to XML-escape newScript.
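
A minimal sketch of the escaping suggested here, assuming the new script is written as plain element text rather than inside a CDATA section. `escapeXmlText` and `unescapeXmlText` are hypothetical helper names, not part of pdf-lib.

```typescript
// Hypothetical helpers for illustration; not part of pdf-lib.

// Escape the characters that are significant in XML element text.
// '&' must be replaced first so the entities added afterwards stay intact.
function escapeXmlText(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Reverse of escapeXmlText. '&amp;' is handled last so that an escaped
// literal such as '&amp;lt;' decodes to '&lt;' and not to '<'.
function unescapeXmlText(text: string): string {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

const source = 'if (a < b && c > d) x = "1";';
const escaped = escapeXmlText(source);
console.log(escaped);
console.log(unescapeXmlText(escaped) === source);
```

If scripts are escaped on write, the extraction side would need the matching unescape step so that callers get back the code they passed in.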

Sharcoux · 2026-02-17T14:27:09Z

src/api/PDFDocument.ts

+
+      // Find and replace the script
+      // Note: XFA uses newlines in closing tags like </script\n>
+      const fieldPattern = new RegExp(


You should be able to use XML parsing instead of a regex.

Sharcoux · 2026-02-17T14:29:01Z

src/api/PDFDocument.ts

+        : acroForm;
+
+    const xfa = formDict.get(PDFName.of('XFA'));
+    if (!xfa || !(xfa instanceof PDFArray)) {


This will return false if xfa is a PDFRef.

Sharcoux · 2026-02-17T14:30:16Z

src/api/PDFDocument.ts

+
+      // Create new stream with modified XML
+      const newXmlBytes = new TextEncoder().encode(xmlString);
+      const newStream = this.context.stream(newXmlBytes);


If the stream is compressed, you'll need to call flateStream instead of stream.

Sharcoux · 2026-02-17T14:36:36Z

src/api/PDFDocument.ts

+        const fieldNameMatch = beforeScript.match(
+          /<field[^>]*name="([^"]*)"[^>]*>/gi,
+        );
+        const fieldName = fieldNameMatch


Returning unknown for multiple fields might cause a problem when using setXFAJavaScript, no?

Sharcoux · 2026-02-17T14:37:30Z

src/api/PDFDocument.ts

+   * ```
+   * @returns An array of objects containing script names and their JavaScript code.
+   */
+  getDocumentJavaScripts(): Array<{ name: string; script: string }> {


This doesn't appear in the README.

Sharcoux · 2026-02-17T14:39:03Z

src/api/PDFJavaScriptAction.ts

+        ? nameStr.substring(1)
+        : nameStr;
+      // Decode hex sequences like #28 -> (
+      return withoutSlash.replace(/#([\dA-Fa-f]{2})/g, (_, hex) =>


Hmm... Seems fragile.
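
One way this decoding could be made more robust, assuming the concern is about multi-byte characters and stray `#` signs: collect the decoded bytes first, then interpret them as UTF-8. `decodePdfName` is a hypothetical helper for illustration only, and it assumes the raw name contains only single-byte characters.

```typescript
// Hypothetical helper for illustration; not the PR's implementation.
function decodePdfName(raw: string): string {
  const body = raw.startsWith('/') ? raw.slice(1) : raw;
  const bytes: number[] = [];
  for (let i = 0; i < body.length; i++) {
    const hex = body.slice(i + 1, i + 3);
    if (body[i] === '#' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      // '#xx' encodes one byte; skip the two hex digits just consumed.
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      // Anything else, including a '#' without two hex digits, is kept as-is.
      bytes.push(body.charCodeAt(i) & 0xff);
    }
  }
  // Decode all bytes together so multi-byte UTF-8 sequences stay intact.
  return new TextDecoder('utf-8').decode(new Uint8Array(bytes));
}

console.log(decodePdfName('/A#28B#29'));
console.log(decodePdfName('/caf#C3#A9'));
```

Decoding each escape into its own character would split a sequence such as `#C3#A9` into two characters, whereas decoding the collected bytes yields a single `é`.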

Sharcoux · 2026-02-17T14:41:01Z

src/api/PDFDocument.ts

+   * ```
+   * @returns An array of objects containing field names, events, and JavaScript code.
+   */
+  getXFAJavaScripts(): Array<{ field: string; event: string; script: string }> {


It is probably possible to share parts of this function with setXFAJavaScript.

Sharcoux · 2026-02-17T14:42:58Z

src/api/PDFDocument.ts

+    fieldName: string,
+    eventName: string,
+    newScript: string,
+  ): boolean {


I wonder if it would not be better to throw an error with details about why it failed instead of returning false. At the very least, we should consider logging it.
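
A sketch of what the throwing variant suggested here could look like. `XFAScriptError`, its `reason` codes, and `requireScriptIndex` are hypothetical names introduced for illustration; none of them are part of the PR.

```typescript
// Hypothetical error type; the reason codes are illustrative only.
type XfaFailure = 'no-xfa' | 'no-template' | 'script-not-found';

class XFAScriptError extends Error {
  readonly reason: XfaFailure;

  constructor(reason: XfaFailure, detail: string) {
    super(`setXFAJavaScript failed (${reason}): ${detail}`);
    this.name = 'XFAScriptError';
    this.reason = reason;
    // Keeps `instanceof` working when compiled to older targets.
    Object.setPrototypeOf(this, XFAScriptError.prototype);
  }
}

// Returns the index of the matching script or throws with context,
// instead of signalling failure through a bare `false`.
function requireScriptIndex(
  scripts: Array<{ field: string; event: string }>,
  field: string,
  event: string,
): number {
  const index = scripts.findIndex(
    (s) => s.field === field && s.event === event,
  );
  if (index === -1) {
    const known =
      scripts.map((s) => `${s.field}/${s.event}`).join(', ') || 'none';
    throw new XFAScriptError(
      'script-not-found',
      `no script for ${field}/${event}; available: ${known}`,
    );
  }
  return index;
}
```

A caller can then branch on `error.reason` to tell a PDF without XFA data apart from a mistyped field name, which a boolean return value cannot express.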

vision10 · 2026-03-13T21:15:31Z

Thank you for the input; I'm not very familiar with the PDF spec.
I did some refactoring and hope it's better now.

github-actions bot added the needs-triage label Dec 1, 2025

Sharcoux requested changes Feb 17, 2026


Sorin-nightz added 3 commits March 13, 2026 23:01

xfa forms

6e05c95

xfa forms

341b29e

addresed xfa forms review

80674a2

vision10 force-pushed the master branch from 16e2595 to 80674a2 Compare March 13, 2026 21:01

vision10 wants to merge 3 commits into cantoo-scribe:master from vision10:master

3 participants
