Skip to content

Fix blocks containing character references#921

Merged
nicholasio merged 7 commits into10up:developfrom
yonran:fix-blocks-containing-character-references
Jul 29, 2025
Merged

Fix blocks containing character references#921
nicholasio merged 7 commits into10up:developfrom
yonran:fix-blocks-containing-character-references

Conversation

@yonran
Copy link
Contributor

@yonran yonran commented Jul 16, 2025

Description of the Change

Fix serialization of blocks containing HTML character references/entities. If the block had HTML within the JSON, the value was mangled and turned into either incorrect JSON or incorrect/insecure HTML.

Previously, the code used wordpress’s esc_attr() to escape the block JSON as an attribute. esc_attr takes a maybe-attribute-escaped string and does a lossy conversion into a valid attribute, encoding special characters such as & but not encoding existing attributes such as &. One should never use esc_attr which guesses the encoding of the source string. Instead, one should know the encoding of the string and either encode to wrap it or decode and encode to convert to a different encoding (see also in the context of character encodings: Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)).

API Change: the tenup_headless_wp_render_blocks_attrs_serialized filter now takes as argument the JSON-encoded wp_json_encode( $block_attrs ) rather than HTML mangled-JSON-encoded esc_attr( wp_json_encode( $block_attrs ) ).

Implementation notes:

  • When converting wordpress blocks to data-wp-block divs using WP_HTML_Tag_Processor, WP_HTML_Tag_Processor->set_attribute is not safe to use because it internally calls esc_attr. So I had to copy the class to change that one line to htmlspecialchars instead.
  • When converting wordpress blocks to data-wp-block divs using DomDocument, call DOMElement->setAttribute instead of setting DOMAttr->value, because DOMAttr->value setter mangles strings containing HTML entities (parses the entities and re-encodes them) (attr.c).

How to test the Change

Changelog Entry

Fixed - Bug fix render_block: don’t mangle blocks containing character references

Credits

Props @yonran

Checklist:

yonran added 3 commits July 15, 2025 18:11
When converting wordpress blocks to data-wp-block divs using WP_HTML_Tag_Processor, use htmlspecialchars instead of esc_attr to encode the characters for an attribute, because esc_attr mangles existing HTML entities. Internally, WP_HTML_Tag_Processor->set_attribute also calls exc_attr but this should be a no-op when it is already encoded for an attribute.

When converting wordpress blocks to data-wp-block divs using DomDocument, call DOMElement->setAttribute instead of setting DOMAttr->value, because DOMAttr->value setter mangles strings containing HTML entities (parses the entities and re-encodes them).
@changeset-bot
Copy link

changeset-bot bot commented Jul 16, 2025

🦋 Changeset detected

Latest commit: 3832361

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@headstartwp/headstartwp Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link

vercel bot commented Jul 16, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
headstartwp-app-router ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 28, 2025 11:44pm
headstarwp ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 28, 2025 11:44pm

@vercel
Copy link

vercel bot commented Jul 16, 2025

@yonran is attempting to deploy a commit to the tenup-internal Team on Vercel.

A member of the Team first needs to authorize it.

@nicholasio
Copy link
Member

Thanks for the Puill Request.

I'll have a look at the issue and the proposed solution tomorrow.

@nicholasio
Copy link
Member

nicholasio commented Jul 21, 2025

This seems to be inline with WordPress docs: https://developer.wordpress.org/apis/security/escaping/#escaping-arbitrary-variable-within-html-attribute-for-use-by-javascript

So wp_json_encode should be enough.

I need to check whether this is a breaking change and also check breaking tests.

Copy link
Member

@nicholasio nicholasio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a failing test that just requires updating the expected markup

--- a/wp/headless-wp/tests/php/tests/TestGutenbergIntegration.php
+++ b/wp/headless-wp/tests/php/tests/TestGutenbergIntegration.php
@@ -167,10 +167,10 @@ RESULT;
                                MARKUP
                        );
                $dom_expected          = <<<RESULT
-                       <p data-wp-block='{"dropCap":false}' data-wp-block-name="core/paragraph">The temperature is 23&deg;C &#9728;&#65039; (sun emoji) and &copy; (copyright symbol). HTML entity for Degrees: &deg;.</p>
+                       <p data-wp-block-name="core/paragraph" data-wp-block='{"dropCap":false}'>The temperature is 23&deg;C &#9728;&#65039; (sun emoji) and &copy; (copyright symbol). HTML entity for Degrees: &deg;.</p>
                        RESULT;
                $html_tag_api_expected = <<<RESULT
-                       <p data-wp-block="{&quot;dropCap&quot;:false}" data-wp-block-name="core/paragraph">The temperature is 23&deg;C &#9728;&#65039; (sun emoji) and &copy; (copyright symbol). HTML entity for Degrees: &deg;.</p>
+                       <p data-wp-block-name="core/paragraph" data-wp-block="{&quot;dropCap&quot;:false}">The temperature is 23&deg;C &#9728;&#65039; (sun emoji) and &copy; (copyright symbol). HTML entity for Degrees: &deg;.</p>
                        RESULT;

                $dom_output = $this->parser->render_block( $html, $block, $instance );

@nicholasio
Copy link
Member

bugbot run

$use_html_tag_api = apply_filters( 'tenup_headless_wp_render_block_use_tag_processor', false );

if ( class_exists( WP_HTML_Tag_Processor::class ) && $use_html_tag_api ) {
if ( $use_html_tag_api ) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: WP HTML Tag Processor Class Check Removal

The removal of the class_exists( WP_HTML_Tag_Processor::class ) check can cause a fatal error on WordPress versions older than 6.2. If the $use_html_tag_api filter is enabled, the code will attempt to instantiate WP_HTML_Tag_Processor without verifying its existence, leading to a fatal error instead of gracefully falling back to the DomDocument implementation.

Locations (1)
Fix in Cursor Fix in Web

use Exception;
use WP_Block;
use WP_HTML_Tag_Processor;
use HeadlessWP\Fixed_WP_HTML_Tag_Processor;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Unused Import Causes Potential Fatal Error

The HeadlessWP\Fixed_WP_HTML_Tag_Processor class is imported but never used. The code continues to use WP_HTML_Tag_Processor for processing, and a separate workaround method (set_block_attributes_tag_api) was implemented instead. This unused import is dead code, likely a remnant of an incomplete fix, and could cause a fatal error if the class does not exist.

Locations (1)
Fix in Cursor Fix in Web

@nicholasio
Copy link
Member

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no bugs!


@yonran
Copy link
Contributor Author

yonran commented Jul 29, 2025

Thanks for fixing the tests and making the solution neater. Looks good.

@nicholasio nicholasio merged commit 2a70173 into 10up:develop Jul 29, 2025
16 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants