Text chunk handlers are deceptively difficult to use correctly #255

A single run of text can be split into smaller chunks at the input boundaries of `rewriter.write()` calls and at the boundaries of the buffer in `TextDecoder`. Our own tests incorrectly assumed this never happens (https://github.com/cloudflare/lol-html/pull/256).

This arbitrary splitting makes text chunk handlers much more complicated to use than they seem, because a handler doesn't receive the equivalent of a single DOM text node. It may be invoked many times on arbitrarily small pieces of text, which could be as small as a single code point.

Mutations like `.before()` and `.after()` are performed for each arbitrary fragment the handler is invoked on, not before/after the full run of text between tags. Similarly, `.replace()` replaces each individual fragment, not the whole run of text, so simply calling `chunk.replace("new text")` is insufficient and incorrect: the new text is inserted once per fragment. You need a stateful handler that inserts the new text for exactly one fragment and calls `chunk.replace("")` on all the others.
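To make the difference concrete, here is a minimal sketch that models the handler outside of lol-html. `ReplaceRun`, `handle`, `naive_replace` and `stateful_replace` are hypothetical names invented for this illustration, chunks are plain `&str` values, and the `last_in_run` flag is an assumption standing in for whatever end-of-text-run signal the real handler gets.

```rust
/// Hypothetical handler state for illustration; not part of lol-html.
#[derive(Default)]
struct ReplaceRun {
    /// True once the replacement has been emitted for the current run of text.
    emitted: bool,
}

impl ReplaceRun {
    /// Returns the text that should be written in place of this chunk.
    /// `last_in_run` is an assumed signal that the run of text ends here.
    fn handle(&mut self, _chunk: &str, last_in_run: bool, replacement: &str) -> String {
        let out = if self.emitted {
            // Every later fragment of the same run is replaced with "".
            String::new()
        } else {
            self.emitted = true;
            replacement.to_string()
        };
        if last_in_run {
            // Reset so the next run of text gets its own replacement.
            self.emitted = false;
        }
        out
    }
}

/// Naive version: replaces every fragment, so the text is duplicated.
fn naive_replace(chunks: &[&str], replacement: &str) -> String {
    chunks.iter().map(|_| replacement).collect()
}

/// Stateful version: one replacement per run, regardless of how it was split.
fn stateful_replace(chunks: &[&str], replacement: &str) -> String {
    let mut state = ReplaceRun::default();
    let last = chunks.len().saturating_sub(1);
    chunks
        .iter()
        .enumerate()
        .map(|(i, chunk)| state.handle(chunk, i == last, replacement))
        .collect()
}
```

With the run `"hello"` delivered as `["he", "llo"]`, the naive version yields the replacement twice, while the stateful version yields it once.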

Splits make text search very tricky. You can't use `chunk.as_str().contains("needle")`, because the handler could be invoked separately on `"n"`, `"ee"`, `"dle"`. Search can't be done efficiently with just a state machine either, because by the time you find the needle, you may have already "handled" the earlier chunks. So text search requires buffering the text and proactively removing all text chunks until the match.
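The buffering approach can be sketched as follows, again as a model rather than real lol-html code. `BufferedReplace`, `naive_found` and `buffered_rewrite` are hypothetical names, and `last_in_run` is an assumed end-of-run signal. For simplicity this version buffers the entire run instead of stopping at the first match, so its memory use grows with the length of the text run.

```rust
/// Hypothetical buffering handler for illustration; not part of lol-html.
#[derive(Default)]
struct BufferedReplace {
    /// Text of the current run collected so far.
    buffered: String,
}

impl BufferedReplace {
    /// Returns the text to write in place of this chunk.
    /// Every chunk is swallowed (replaced with "") until the run ends,
    /// then the whole buffered run is searched and emitted at once.
    fn handle(
        &mut self,
        chunk: &str,
        last_in_run: bool,
        needle: &str,
        replacement: &str,
    ) -> String {
        self.buffered.push_str(chunk);
        if !last_in_run {
            return String::new();
        }
        // Take the buffer, leaving it empty for the next run of text.
        let text = std::mem::take(&mut self.buffered);
        text.replace(needle, replacement)
    }
}

/// Naive per-chunk search: misses a needle that spans chunk boundaries.
fn naive_found(chunks: &[&str], needle: &str) -> bool {
    chunks.iter().any(|chunk| chunk.contains(needle))
}

/// Feeds one run of text through the buffering handler.
fn buffered_rewrite(chunks: &[&str], needle: &str, replacement: &str) -> String {
    let mut state = BufferedReplace::default();
    let last = chunks.len().saturating_sub(1);
    chunks
        .iter()
        .enumerate()
        .map(|(i, chunk)| state.handle(chunk, i == last, needle, replacement))
        .collect()
}
```

For the split `["a n", "ee", "dle b"]`, the per-chunk search reports no match for `"needle"`, while the buffered version sees the full text `"a needle b"` and can rewrite it.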

This behavior makes text chunk handlers quite different from comment and element handlers.
