Description
Text chunks can be split into smaller pieces at the input boundaries of rewriter.write() calls and by the buffer in TextDecoder. Our own tests incorrectly assumed this never happens (#256).
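To illustrate how a text node can arrive as codepoint-sized pieces, here is a simplified model of a streaming decoder in Rust. It is not lol-html's actual TextDecoder: StreamingDecoder and fragments are hypothetical names, and the input is assumed to be valid UTF-8. Each write() returns only the complete codepoints received so far and holds back a partial multi-byte sequence for the next call, so the text a handler sees depends entirely on where the input happened to be cut.

```rust
/// Simplified model (not lol-html's TextDecoder): emits only complete
/// codepoints from each write() and keeps an incomplete multi-byte
/// sequence buffered until the next write. Assumes valid UTF-8 input.
struct StreamingDecoder {
    pending: Vec<u8>,
}

impl StreamingDecoder {
    fn new() -> Self {
        StreamingDecoder { pending: Vec::new() }
    }

    /// Feeds one input buffer and returns the text decodable so far.
    fn write(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        // Length of the longest valid UTF-8 prefix of the pending bytes.
        let valid = match std::str::from_utf8(&self.pending) {
            Ok(s) => s.len(),
            Err(e) => e.valid_up_to(),
        };
        let text = String::from_utf8(self.pending[..valid].to_vec()).unwrap();
        // Keep only the incomplete tail for the next write.
        self.pending.drain(..valid);
        text
    }
}

/// Hypothetical helper: writes `input` in pieces of `piece_len` bytes and
/// collects the non-empty text fragments a handler would be invoked on.
fn fragments(input: &str, piece_len: usize) -> Vec<String> {
    let mut decoder = StreamingDecoder::new();
    input
        .as_bytes()
        .chunks(piece_len)
        .map(|piece| decoder.write(piece))
        .filter(|s| !s.is_empty())
        .collect()
}
```

Feeding "héllo" one byte at a time yields "h", "é", "l", "l", "o"; feeding it three bytes at a time yields "hé" and "llo". The text is the same either way; only the fragmentation differs.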
This arbitrary splitting makes text chunk handlers much more complicated to use than they appear, because a handler doesn't receive the equivalent of a single DOM text node. It may be invoked many times on arbitrarily small pieces of text, as small as a single codepoint.
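One way to recover something close to a DOM text node is to buffer fragments until the end of the text node. The sketch below assumes each fragment carries a flag marking the last piece of its text node, modeled on lol-html's TextChunk::last_in_text_node(); Fragment and TextNodeCollector are hypothetical types, not the library's API.

```rust
/// Hypothetical stand-in for a text chunk: the fragment's text plus a flag
/// modeled on lol-html's `TextChunk::last_in_text_node()`.
struct Fragment<'a> {
    text: &'a str,
    last_in_text_node: bool,
}

/// Stateful handler that reassembles fragments into whole runs of text,
/// the closest equivalent of a DOM text node.
#[derive(Default)]
struct TextNodeCollector {
    buffer: String,
    nodes: Vec<String>,
}

impl TextNodeCollector {
    fn handle(&mut self, fragment: &Fragment) {
        self.buffer.push_str(fragment.text);
        if fragment.last_in_text_node {
            // The run of text between tags is complete: move it out and
            // leave an empty buffer for the next text node.
            self.nodes.push(std::mem::take(&mut self.buffer));
        }
    }
}
```

Collecting the text this way does not change the output: by the time the last fragment arrives, the earlier ones have already been handled, so any mutation still has to be applied fragment by fragment.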
Mutations like .before() and .after() are applied to each arbitrary fragment the handler was invoked on, not before/after the full run of text between tags. Similarly, .replace() replaces only the individual fragment, not the whole run of text, so calling chunk.replace("new text") in the handler is incorrect on its own: the new text ends up inserted once per fragment. Instead, the handler has to be stateful, replacing one piece with the new text and calling chunk.replace("") on all the other pieces.
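A minimal sketch of such a stateful replacement, using a mock Chunk type (hypothetical, standing in for lol-html's text chunk) whose replace() overwrites only that fragment's output. The first fragment of each text node gets the new text, every later fragment is replaced with an empty string, and the state resets when the fragment marked as last in its text node is seen. naive_rewrite shows the incorrect version that calls replace() on every fragment.

```rust
/// Hypothetical mock of a text chunk; `replace` overwrites only this
/// fragment's contribution to the output.
struct Chunk {
    text: String,
    last_in_text_node: bool,
}

impl Chunk {
    fn replace(&mut self, content: &str) {
        self.text = content.to_string();
    }
}

/// Replaces each whole run of text with `replacement`.
struct ReplaceTextNode {
    replacement: String,
    replaced_first_piece: bool,
}

impl ReplaceTextNode {
    fn new(replacement: &str) -> Self {
        ReplaceTextNode { replacement: replacement.to_string(), replaced_first_piece: false }
    }

    fn handle(&mut self, chunk: &mut Chunk) {
        if self.replaced_first_piece {
            // The new text was already emitted for this node: blank the rest.
            chunk.replace("");
        } else {
            chunk.replace(&self.replacement);
            self.replaced_first_piece = true;
        }
        if chunk.last_in_text_node {
            // Reset so the next text node gets its own replacement.
            self.replaced_first_piece = false;
        }
    }
}

/// Runs the stateful handler over (text, last_in_text_node) pieces.
fn rewrite(pieces: &[(&str, bool)], replacement: &str) -> String {
    let mut handler = ReplaceTextNode::new(replacement);
    pieces
        .iter()
        .map(|&(text, last)| {
            let mut chunk = Chunk { text: text.to_string(), last_in_text_node: last };
            handler.handle(&mut chunk);
            chunk.text
        })
        .collect()
}

/// The incorrect approach: replace every fragment unconditionally.
fn naive_rewrite(pieces: &[(&str, bool)], replacement: &str) -> String {
    pieces
        .iter()
        .map(|&(text, last)| {
            let mut chunk = Chunk { text: text.to_string(), last_in_text_node: last };
            chunk.replace(replacement);
            chunk.text
        })
        .collect()
}
```

For a text node split into "Hel" and "lo", rewrite produces "Bye" while naive_rewrite produces "ByeBye".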
Splits make text search very tricky. You can't use chunk.as_str().contains("needle"), because the handler could be invoked on "n", "ee", "dle" separately. Search can't be done efficiently with just a state machine either, because by the time you find the needle, the earlier chunks may already have been "handled" and passed on to the output. So text search requires buffering the text and proactively removing all text chunks until the match is found.
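A sketch of the buffering approach, again with a hypothetical mock Chunk rather than lol-html's real type. This is the simplest variant: it removes every fragment of a text node, buffers its text, and emits the rewritten text in place of the last fragment. A more refined version could flush earlier any buffered prefix that cannot be the start of a match; that is not shown here. naive_contains shows why checking each fragment separately misses a split needle.

```rust
/// Hypothetical mock of a text chunk.
struct Chunk {
    text: String,
    last_in_text_node: bool,
}

impl Chunk {
    fn replace(&mut self, content: &str) {
        self.text = content.to_string();
    }
    fn remove(&mut self) {
        self.text.clear();
    }
}

struct SearchAndReplace {
    needle: String,
    replacement: String,
    buffer: String,
}

impl SearchAndReplace {
    fn handle(&mut self, chunk: &mut Chunk) {
        // Buffer the fragment and suppress its output: it may hold the
        // start of a match that only completes in a later fragment.
        self.buffer.push_str(&chunk.text);
        chunk.remove();
        if chunk.last_in_text_node {
            // The whole run of text is available; search it as one string.
            let rewritten = self.buffer.replace(self.needle.as_str(), &self.replacement);
            chunk.replace(&rewritten);
            self.buffer.clear();
        }
    }
}

/// Runs the handler over the fragments of a single text node.
fn search_replace(fragments: &[&str], needle: &str, replacement: &str) -> String {
    let mut handler = SearchAndReplace {
        needle: needle.to_string(),
        replacement: replacement.to_string(),
        buffer: String::new(),
    };
    let mut out = String::new();
    for (i, text) in fragments.iter().enumerate() {
        let last = i + 1 == fragments.len();
        let mut chunk = Chunk { text: text.to_string(), last_in_text_node: last };
        handler.handle(&mut chunk);
        out.push_str(&chunk.text);
    }
    out
}

/// The incorrect approach: search each fragment on its own.
fn naive_contains(fragments: &[&str], needle: &str) -> bool {
    fragments.iter().any(|f| f.contains(needle))
}
```

With the fragments "hay n", "ee", "dle hay", naive_contains finds nothing, while search_replace(..., "needle", "pin") yields "hay pin hay".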
This behavior makes text chunk handlers quite different from comment and element handlers.