Fix blocks containing character references#921
Conversation
When converting wordpress blocks to data-wp-block divs using WP_HTML_Tag_Processor, use htmlspecialchars instead of esc_attr to encode the characters for an attribute, because esc_attr mangles existing HTML entities. Internally, WP_HTML_Tag_Processor->set_attribute also calls exc_attr but this should be a no-op when it is already encoded for an attribute. When converting wordpress blocks to data-wp-block divs using DomDocument, call DOMElement->setAttribute instead of setting DOMAttr->value, because DOMAttr->value setter mangles strings containing HTML entities (parses the entities and re-encodes them).
🦋 Changeset detectedLatest commit: 3832361 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
|
@yonran is attempting to deploy a commit to the tenup-internal Team on Vercel. A member of the Team first needs to authorize it. |
|
Thanks for the Puill Request. I'll have a look at the issue and the proposed solution tomorrow. |
|
This seems to be inline with WordPress docs: https://developer.wordpress.org/apis/security/escaping/#escaping-arbitrary-variable-within-html-attribute-for-use-by-javascript So I need to check whether this is a breaking change and also check breaking tests. |
There was a problem hiding this comment.
There's a failing test that just requires updating the expected markup
--- a/wp/headless-wp/tests/php/tests/TestGutenbergIntegration.php
+++ b/wp/headless-wp/tests/php/tests/TestGutenbergIntegration.php
@@ -167,10 +167,10 @@ RESULT;
MARKUP
);
$dom_expected = <<<RESULT
- <p data-wp-block='{"dropCap":false}' data-wp-block-name="core/paragraph">The temperature is 23°C ☀️ (sun emoji) and © (copyright symbol). HTML entity for Degrees: °.</p>
+ <p data-wp-block-name="core/paragraph" data-wp-block='{"dropCap":false}'>The temperature is 23°C ☀️ (sun emoji) and © (copyright symbol). HTML entity for Degrees: °.</p>
RESULT;
$html_tag_api_expected = <<<RESULT
- <p data-wp-block="{"dropCap":false}" data-wp-block-name="core/paragraph">The temperature is 23°C ☀️ (sun emoji) and © (copyright symbol). HTML entity for Degrees: °.</p>
+ <p data-wp-block-name="core/paragraph" data-wp-block="{"dropCap":false}">The temperature is 23°C ☀️ (sun emoji) and © (copyright symbol). HTML entity for Degrees: °.</p>
RESULT;
$dom_output = $this->parser->render_block( $html, $block, $instance );
|
bugbot run |
| $use_html_tag_api = apply_filters( 'tenup_headless_wp_render_block_use_tag_processor', false ); | ||
|
|
||
| if ( class_exists( WP_HTML_Tag_Processor::class ) && $use_html_tag_api ) { | ||
| if ( $use_html_tag_api ) { |
There was a problem hiding this comment.
Bug: WP HTML Tag Processor Class Check Removal
The removal of the class_exists( WP_HTML_Tag_Processor::class ) check can cause a fatal error on WordPress versions older than 6.2. If the $use_html_tag_api filter is enabled, the code will attempt to instantiate WP_HTML_Tag_Processor without verifying its existence, leading to a fatal error instead of gracefully falling back to the DomDocument implementation.
| use Exception; | ||
| use WP_Block; | ||
| use WP_HTML_Tag_Processor; | ||
| use HeadlessWP\Fixed_WP_HTML_Tag_Processor; |
There was a problem hiding this comment.
Bug: Unused Import Causes Potential Fatal Error
The HeadlessWP\Fixed_WP_HTML_Tag_Processor class is imported but never used. The code continues to use WP_HTML_Tag_Processor for processing, and a separate workaround method (set_block_attributes_tag_api) was implemented instead. This unused import is dead code, likely a remnant of an incomplete fix, and could cause a fatal error if the class does not exist.
|
bugbot run |
|
Thanks for fixing the tests and making the solution neater. Looks good. |
Description of the Change
Fix serialization of blocks containing HTML character references/entities. If the block had HTML within the JSON, the value was mangled and turned into either incorrect JSON or incorrect/insecure HTML.
Previously, the code used wordpress’s
esc_attr()to escape the block JSON as an attribute.esc_attrtakes a maybe-attribute-escaped string and does a lossy conversion into a valid attribute, encoding special characters such as&but not encoding existing attributes such as&. One should never useesc_attrwhich guesses the encoding of the source string. Instead, one should know the encoding of the string and either encode to wrap it or decode and encode to convert to a different encoding (see also in the context of character encodings: Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)).API Change: the
tenup_headless_wp_render_blocks_attrs_serializedfilter now takes as argument the JSON-encodedwp_json_encode( $block_attrs )rather than HTML mangled-JSON-encodedesc_attr( wp_json_encode( $block_attrs ) ).Implementation notes:
WP_HTML_Tag_Processor->set_attributeis not safe to use because it internally callsesc_attr. So I had to copy the class to change that one line tohtmlspecialcharsinstead.DomDocument, callDOMElement->setAttributeinstead of settingDOMAttr->value, becauseDOMAttr->valuesetter mangles strings containing HTML entities (parses the entities and re-encodes them) (attr.c).How to test the Change
Changelog Entry
Credits
Props @yonran
Checklist: