Context
The recent peer review of #208 surfaced two distinct ECMA-376 conformance gaps in validateFieldStructure (#209) that had gone unnoticed because no automated schema validation runs against engine output. The Lean proof work has shown that hand-rolled conformance checks (the validateFieldStructure family) tend to under-specify the actual schema. A general-purpose schema validator catches whole classes of these issues at once.
Real-world precedents:
- docx4j uses a JAXB JaxbValidationEventHandler against the schema during parsing.
- LibreOffice runs its export through XML attribute-output checks before emission.
- Microsoft Word uses the schema-derived state machine and discards malformed structures.
What this would add
- A CI job that runs every emitted document.xml (from the integration test corpus) through an ECMA-376 / Open Packaging Conventions validator.
- Candidate validators to evaluate: officevalidator (Microsoft's tool), xmlstarlet val against the published XSDs, openpackaging-validator (Java).
- The validator runs against the outputs of: lean-spec-bridge.test.ts, collapsed-field-inplace.test.ts, inPlaceModifier.test.ts, and round-trip-inplace.test.ts.
- Failures fail the CI job and block merge.
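A minimal sketch of how the gate could be wired, assuming xmlstarlet val is the validator that ends up being chosen and that it exits non-zero on invalid input. All names here (ValidatorRun, Runner, xmlstarletArgs, validateAll, failures) are hypothetical and do not exist in the repo. The process runner is injected so the pass/fail logic can be unit-tested without the binary installed.

```typescript
// Hypothetical sketch: none of these names come from the codebase.

/** Outcome of one validator invocation on one emitted document.xml. */
interface ValidatorRun {
  file: string;
  exitCode: number | null;
  stderr: string;
}

/** Injected process runner (e.g. a thin wrapper around child_process.spawnSync). */
type Runner = (cmd: string, args: string[]) => { status: number | null; stderr: string };

/** Arguments for `xmlstarlet val`: -e prints error details, -s selects the XSD. */
function xmlstarletArgs(xsdPath: string, xmlPath: string): string[] {
  return ["val", "-e", "-s", xsdPath, xmlPath];
}

/** Run the validator once per emitted file and collect the raw results. */
function validateAll(files: string[], xsdPath: string, run: Runner): ValidatorRun[] {
  return files.map((file) => {
    const { status, stderr } = run("xmlstarlet", xmlstarletArgs(xsdPath, file));
    return { file, exitCode: status, stderr };
  });
}

/** A file fails if the validator exited non-zero or did not exit cleanly (null). */
function failures(runs: ValidatorRun[]): ValidatorRun[] {
  return runs.filter((r) => r.exitCode !== 0);
}
```

In CI, the runner would wrap the real process call, and the job would exit non-zero whenever failures(...) is non-empty, printing each stderr so the offending element is visible in the log.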
Why this is worth doing
- The current hand-rolled checks live in pipeline.ts (balance, instrText/delInstrText placement, fldChar/del nesting after #209). A schema validator catches: invalid attribute values, structural cardinality violations, undeclared namespaces, mc-preprocessor compatibility violations, etc.
Notes
- Schema validation against the full ECMA-376 schema is non-trivial: the schema is huge and validators differ in strictness. Investigate which validator gives the right strictness/false-positive balance before wiring it into required CI.
- This is a defense-in-depth measure, not a substitute for validateFieldStructure and the per-wrapper neutrality checks. Field-context placement is a semantic constraint the schema can't express on its own.
Ref: #208, #209.