
[SPARK-57540][SQL] Instantiate Woodstox WstxOutputFactory directly in StaxXmlGenerator #56600

Open: gaogaotiantian wants to merge 1 commit into apache:master from gaogaotiantian:SPARK-wstx-output-factory

@gaogaotiantian (Contributor)

What changes were proposed in this pull request?

In StaxXmlGenerator, the StAX XMLOutputFactory was obtained via XMLOutputFactory.newInstance() and then configured with Woodstox-specific properties from the shaded Hadoop Woodstox (org.apache.hadoop.shaded.com.ctc.wstx.api.WstxOutputProperties).

This PR constructs the shaded Woodstox factory directly:

```scala
val factory = new org.apache.hadoop.shaded.com.ctc.wstx.stax.WstxOutputFactory()
```

instead of relying on XMLOutputFactory.newInstance().

Why are the changes needed?

XMLOutputFactory.newInstance() resolves a concrete StAX implementation through the JAXP/service-loader lookup. If another (unshaded) StAX provider is on the classpath, it can win that lookup and return a factory that is not the shaded Woodstox. That factory does not recognize the shaded WstxOutputProperties.P_OUTPUT_VALIDATE_STRUCTURE / P_OUTPUT_VALIDATE_NAMES keys that are set immediately afterwards, and XMLOutputFactory.setProperty throws IllegalArgumentException for any property the implementation does not support.
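The failure mode can be reproduced with JDK classes alone. The sketch below is an illustration, not code from this PR: it uses the JDK's built-in factory (obtained via `XMLOutputFactory.newDefaultFactory()`, JDK 9+) as a stand-in for "some other StAX provider", and `ShadedStyleKey` is a hypothetical key shaped like a relocated Woodstox property name, not the real value of the shaded constant. The object and method names are likewise illustrative.

```scala
import javax.xml.stream.XMLOutputFactory

object UnsupportedKeyDemo {
  // Hypothetical key in the shape of a relocated Woodstox property name; it is
  // NOT claimed to be the real value of WstxOutputProperties.P_OUTPUT_VALIDATE_STRUCTURE.
  val ShadedStyleKey = "org.apache.hadoop.shaded.com.ctc.wstx.outputValidateStructure"

  /** Attempts factory.setProperty and reports a rejection instead of throwing. */
  def trySet(factory: XMLOutputFactory, key: String, value: AnyRef): Either[String, Unit] =
    try {
      factory.setProperty(key, value)
      Right(())
    } catch {
      // setProperty is specified to throw IllegalArgumentException for unsupported properties.
      case e: IllegalArgumentException => Left(String.valueOf(e.getMessage))
    }
}

// The JDK's built-in factory stands in for "some other StAX provider" winning the lookup.
val jdkFactory = XMLOutputFactory.newDefaultFactory()
println(jdkFactory.isPropertySupported(UnsupportedKeyDemo.ShadedStyleKey))
println(UnsupportedKeyDemo.trySet(jdkFactory, UnsupportedKeyDemo.ShadedStyleKey, java.lang.Boolean.FALSE))
```

The rejection comes from whichever factory the lookup happened to return, so whether the write path fails depends on classpath contents rather than on the code in StaxXmlGenerator.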

By instantiating the shaded WstxOutputFactory directly, the factory implementation and the property keys are guaranteed to come from the same shaded Woodstox, eliminating the potential conflict regardless of what else is on the classpath.
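The "regardless of what else is on the classpath" point can also be sketched with JDK classes only. This is an analogy, not the PR's code: it simulates a foreign provider taking over the lookup by pointing the standard `javax.xml.stream.XMLOutputFactory` system property at a nonexistent class (the hypothetical `com.example.DoesNotExist`), and uses `XMLOutputFactory.newDefaultFactory()` (JDK 9+) as a stand-in for constructing a specific implementation directly, as the PR does with the shaded WstxOutputFactory. Object and method names are illustrative.

```scala
import javax.xml.stream.{FactoryConfigurationError, XMLOutputFactory}

object LookupIndependence {
  // Hypothetical class name, used only to simulate a foreign provider winning the lookup.
  val BogusProvider = "com.example.DoesNotExist"
  private val LookupProperty = "javax.xml.stream.XMLOutputFactory"

  /** Runs `body` while the JAXP lookup property points at `provider`, then restores it. */
  def withLookupOverride[T](provider: String)(body: => T): T = {
    val previous = System.getProperty(LookupProperty)
    System.setProperty(LookupProperty, provider)
    try body
    finally {
      if (previous == null) System.clearProperty(LookupProperty)
      else System.setProperty(LookupProperty, previous)
    }
  }

  /** Lookup-based creation: returns the factory class name, or the error name if lookup fails. */
  def viaLookup(): String =
    try XMLOutputFactory.newInstance().getClass.getName
    catch { case e: FactoryConfigurationError => e.getClass.getSimpleName }

  /** Creation that does not consult the lookup configuration at all. */
  def direct(): String = XMLOutputFactory.newDefaultFactory().getClass.getName
}

val (lookedUp, direct) = LookupIndependence.withLookupOverride(LookupIndependence.BogusProvider) {
  (LookupIndependence.viaLookup(), LookupIndependence.direct())
}
println(s"newInstance(): $lookedUp, newDefaultFactory(): $direct")
```

Only the lookup-based path is affected by the override; the directly created factory is unchanged, which is the property the PR relies on.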

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing to_xml / XML write tests in XmlSuite cover this code path. No behavior change is expected; the factory used is the same shaded Woodstox that the configured properties already targeted.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

…in `StaxXmlGenerator`

Construct the shaded `WstxOutputFactory` directly instead of resolving a StAX
implementation via `XMLOutputFactory.newInstance()`, so the factory always
matches the shaded `WstxOutputProperties` keys it is configured with.

Co-authored-by: Isaac