
inplace atomizer: emit fragmented fields per ECMA-376 Part 4 (split <w:ins>/<w:del> at field-character boundaries) #217

@stevenobiajulu


Context

Surfaced during PR #208 peer review (Codex + Gemini) and an ECMA-376 Part 4 deep-research report. Tracked separately so it doesn't block #208 or #209.

The gap

The current inplace atomizer wraps an entire field sequence in a single track-change wrapper, even when only the field's instruction text changes. Verified at:

  • inPlaceModifier.ts:717 (getAtomRuns) returns all field runs as one logical unit.
  • inPlaceModifier.ts:938 — deleted field replay clones the entire field-run sequence inside a single <w:del>.
  • inPlaceModifier.ts:1505, 1671 — run-split logic explicitly skips collapsed fields and field characters, preventing fragmentation.
  • inPlaceModifier.ts:1957, 2300 — insertion/move-destination handlers wrap the whole atom-run set.
  • collapsed-field-inplace.test.ts:211 — existing tests assert multi-run complete field wrappers, not partial-wrapper fragmentation.

Why this is non-conformant

ECMA-376 Part 4 (DeletedFieldCode and fldChar topics):

  • w:fldChar is strictly barred from <w:del> — Microsoft Word treats violations as fatal and discards the field state machine, falling back to literal-text rendering of the field instructions.
  • w:delInstrText MUST reside inside <w:del>.
  • When a field's instruction text is modified under track changes, a conformant emitter must therefore fragment the field across wrapper boundaries: the w:fldChar runs stay unwrapped at run-sibling level, while <w:ins>/<w:del> wrap only the changed w:instrText / w:delInstrText payloads. The canonical fixture from ECMA-376 (FORMFIELDTEXT→FORMCHECKBOX transition: the old instruction is deleted, the new one inserted):
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:ins><w:r><w:instrText>FORMCHECKBOX</w:instrText></w:r></w:ins>
<w:del><w:r><w:delInstrText>FORMFIELDTEXT</w:delInstrText></w:r></w:del>
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
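A minimal TypeScript sketch of what a conformant emitter could produce for an instruction-text rewrite, following the rule as stated above. The helper names (xmlEscape, fldCharRun, emitInstrTextChange) are hypothetical and not existing safe-docx APIs; the result run is assumed unchanged, and the revision attributes that real <w:ins>/<w:del> elements carry (w:id, w:author, w:date) are omitted for brevity.

```typescript
// Hypothetical helpers, not existing safe-docx APIs. Revision attributes
// (w:id, w:author, w:date) on <w:ins>/<w:del> are omitted for brevity.

// Escape the characters that are unsafe inside XML text content.
function xmlEscape(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// A field character always sits in its own bare run, never in a wrapper.
function fldCharRun(type: "begin" | "separate" | "end"): string {
  return `<w:r><w:fldChar w:fldCharType="${type}"/></w:r>`;
}

// Emit a field whose instruction changed from oldInstr to newInstr.
// Only the instruction payloads are wrapped; the result run is assumed
// unchanged and is emitted bare.
function emitInstrTextChange(
  oldInstr: string,
  newInstr: string,
  resultText: string,
): string {
  return [
    fldCharRun("begin"),
    `<w:ins><w:r><w:instrText>${xmlEscape(newInstr)}</w:instrText></w:r></w:ins>`,
    `<w:del><w:r><w:delInstrText>${xmlEscape(oldInstr)}</w:delInstrText></w:r></w:del>`,
    fldCharRun("separate"),
    `<w:r><w:t>${xmlEscape(resultText)}</w:t></w:r>`,
    fldCharRun("end"),
  ].join("\n");
}

// Usage: the fixture's transition, with an illustrative result run.
console.log(emitInstrTextChange("FORMFIELDTEXT", "FORMCHECKBOX", "result"));
```

Keeping each w:fldChar in its own bare run means no wrapper ever has to span a field-character boundary, which is exactly the property the fixture encodes.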

LibreOffice and docx4j enforce this. Pandoc has tracked the equivalent splitting gap for hyperlinks (jgm/pandoc#4609). safe-docx today is in the same non-conformant cohort.

What needs to change

  1. inPlaceModifier.ts run-split logic — remove the explicit field-char and collapsed-field skip at 1505 and 1671. Allow splitting at field-character boundaries when wrapping for ins/del.
  2. Insertion/move-destination handlers (1957, 2300) — when the atom group crosses a field-char boundary, emit sibling-level fldChar runs with <w:ins> / <w:del> wrapping only the changed instrText/result.
  3. Deletion handler (938) — for a deleted complete field, leave the fldChars unwrapped at run-sibling level (or convert to a different representation, e.g., wrap each w:fldChar inside <w:ins> of the post-deletion state) since <w:del> cannot contain w:fldChar. Note: the semantics of "a deleted field" under ECMA-376 are ambiguous (since fldChars can't be deleted); research the right answer per the ECMA-376 follow-up research prompt before implementing.
  4. Tests — add fixtures for: (a) FORMCHECKBOX-style field modification (instruction text rewrite); (b) hyperlink fragmentation; (c) bookmarked field modification.
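Steps 1 to 3 amount to letting field characters break wrapper groups instead of being folded into them. A minimal sketch, assuming a hypothetical flat run model (Run, Segment, and fragmentAtFieldChars are illustrative names, not safe-docx's real types): consecutive changed runs with the same change type share one <w:ins>/<w:del>, while every field-character run is emitted bare and closes any open group. For a wholly deleted field this applies the first option from item 3 (leave fldChars unwrapped), which the issue itself says still needs research.

```typescript
// Hypothetical run model (not safe-docx's real types): each run carries
// the change the diff assigned to it.
type RunKind = "fldBegin" | "fldSeparate" | "fldEnd" | "instrText" | "text";
type Change = "ins" | "del" | "none";

interface Run {
  kind: RunKind;
  change: Change;
  text?: string;
}

// An emitted unit: a bare run (wrapper === null) or one <w:ins>/<w:del>
// wrapper around consecutive runs that share the same change.
interface Segment {
  wrapper: "ins" | "del" | null;
  runs: Run[];
}

function isFieldChar(kind: RunKind): boolean {
  return kind === "fldBegin" || kind === "fldSeparate" || kind === "fldEnd";
}

// Split a field's runs into wrapper groups. Field characters are always
// emitted bare, whatever their change flag, so they also close any open
// wrapper; this is where the fragmentation happens. (At render time a
// "del" instrText run would be written as w:delInstrText.)
function fragmentAtFieldChars(runs: Run[]): Segment[] {
  const segments: Segment[] = [];
  for (const run of runs) {
    let wrapper: Segment["wrapper"] = null;
    if (!isFieldChar(run.kind) && run.change !== "none") {
      wrapper = run.change;
    }
    const last = segments[segments.length - 1];
    if (wrapper !== null && last !== undefined && last.wrapper === wrapper) {
      last.runs.push(run); // extend the open <w:ins>/<w:del> group
    } else {
      segments.push({ wrapper, runs: [run] });
    }
  }
  return segments;
}
```

For the FORMCHECKBOX fixture this yields the wrapper sequence bare, ins, del, bare; for a deleted PAGE field it yields bare begin/separate/end runs with <w:del> around the instruction runs and around the result run separately.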

Acceptance criteria

  • Output of compareDocumentsAtomizer for a field-modification scenario contains unwrapped w:fldChar markers at run-sibling level.
  • Output never contains w:fldChar inside <w:del> (PR #209, "validateFieldStructure: add w:delInstrText placement check + reject w:fldChar inside <w:del>", already added the runtime check for this; this issue makes the engine never emit it).
  • Microsoft Word renders the field result correctly after accept on a modified-field output.
  • LibreOffice round-trips the modified-field DOCX without discarding the field.
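The first two criteria can be checked structurally in a test. The sketch that follows is not the validateFieldStructure check from #209; it is a hypothetical standalone version over a toy element tree (El, el, and fieldPlacementViolations are illustrative names), where a real test would walk the parsed document.xml instead. Rule 3 (fldChar runs not inside any wrapper) encodes acceptance criterion 1 and is meant for field-modification outputs only.

```typescript
// Toy element tree (hypothetical); a real test would walk the parsed DOM.
interface El {
  name: string;
  children: El[];
}

function el(name: string, ...children: El[]): El {
  return { name, children };
}

// Collect violations of the placement rules stated in this issue:
//   1. no w:fldChar anywhere inside <w:del>
//   2. every w:delInstrText inside <w:del>
//   3. every w:fldChar sits in a w:r that is not inside <w:ins>/<w:del>
//      (run-sibling level; intended for field-modification outputs)
function fieldPlacementViolations(root: El): string[] {
  const problems: string[] = [];
  const walk = (node: El, ancestors: string[]): void => {
    const inDel = ancestors.includes("w:del");
    if (node.name === "w:fldChar") {
      if (inDel) problems.push("w:fldChar inside <w:del>");
      const parent = ancestors[ancestors.length - 1];
      const grandparent = ancestors[ancestors.length - 2];
      if (parent !== "w:r" || grandparent === "w:ins" || grandparent === "w:del") {
        problems.push("w:fldChar not at run-sibling level");
      }
    }
    if (node.name === "w:delInstrText" && !inDel) {
      problems.push("w:delInstrText outside <w:del>");
    }
    for (const child of node.children) walk(child, [...ancestors, node.name]);
  };
  walk(root, []);
  return problems;
}

// Usage: the fragmented fixture shape passes with no violations.
const fixture = el(
  "w:p",
  el("w:r", el("w:fldChar")),
  el("w:ins", el("w:r", el("w:instrText"))),
  el("w:del", el("w:r", el("w:delInstrText"))),
  el("w:r", el("w:fldChar")),
);
console.log(fieldPlacementViolations(fixture).length);
```

A fldChar wrapped in <w:del> trips rules 1 and 3 at once, which is intended: it is both non-conformant and not at run-sibling level.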

Downstream effects on the Lean proof (PR #208 / PR-B)

When this lands, the Lean fieldContextNeutral ∀ ctx predicate will no longer be satisfied by the engine output — fragmented <w:ins>/<w:del> wrappers containing only w:instrText/w:delInstrText are not neutral under empty ctx. The Lean refactor to weaken to a document-level preservationFriendly property (PR-B in the safe-docx planning) becomes load-bearing at that point. Sequence the two: this issue first, then PR-B if PR-A has merged, OR PR-B in parallel and rebase after this lands.

Sources

Ref: #208, #209, #213.
