Make constrainByteLength work #751

Open
tsdko wants to merge 1 commit into ynoproject:master from tsdko:fix-constrain-byte-length

tsdko commented Feb 9, 2026

This might require more testing: I have run the tests below in Firefox and Chromium and tested input manually in Firefox (including IME input with Mozc on Linux), but I have not tested on other platforms.


This should hopefully prevent overly long non-ASCII messages from getting eaten when they are sent.

It seems the current implementation in master could have worked with two changes: a bit more space in `buf` (enough to fit the next largest UTF-8 character), and comparing `written` instead of `read`, since `read` is measured in UTF-16 code units rather than bytes. Even so, it would still behave oddly when the caret is not at the end of the string: on regular input the caret is forced to the end, and if you paste something that makes the entire string too long, the existing text at the end gets cut off.
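
To illustrate the `read` versus `written` distinction, here is a minimal sketch using the standard `TextEncoder.encodeInto` API. The helper name `encodeIntoBuffer` and the 4-byte headroom are assumptions made for this example; this is not the code in master or in this PR.

```javascript
// Hypothetical helper for illustration; not code from this PR.
function encodeIntoBuffer(text, size) {
  const buf = new Uint8Array(size);
  // encodeInto stops before the first character that does not fit entirely.
  // It returns { read, written }: read counts UTF-16 code units consumed,
  // written counts UTF-8 bytes produced.
  return new TextEncoder().encodeInto(text, buf);
}

const limit = 7;
const text = "abcd🐱"; // 6 UTF-16 code units, 8 UTF-8 bytes

// Buffer of exactly `limit` bytes: the emoji does not fit, so it is left out.
const tight = encodeIntoBuffer(text, limit);
console.log(tight.read, tight.written); // 4 4

// Assumed headroom of 4 bytes (the longest UTF-8 sequence for one character):
// the overflow now shows up in `written`.
const roomy = encodeIntoBuffer(text, limit + 4);
console.log(roomy.read, roomy.written); // 6 8
console.log(roomy.written > limit); // true
```

With the tight buffer the overflow can only be inferred from `read` being shorter than the string; with headroom, `written` can be compared against the byte limit directly.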

The behavior of the built-in maxlength attribute is not very consistent across browsers: if the user attempts to replace currently selected text and not even one character from the replacement string fits, Firefox preserves the selection while Chromium discards it instead. This implementation discards the selection.

Shortcoming: hitting the length limit breaks undo (undo then does nothing). (This is a problem with the current implementation as well; it's just a bit more hidden because ASCII inputs are properly constrained via the HTML maxlength attribute.)

Test code
// "|" is the caret
const tests = [
  // below the byte limit, unchanged
  "🐱|",       "🐱",
  "あい|",     "あい",
  "abc🐱|",    "abc🐱",
  "abcdあ|",   "abcdあ",
  "abcdefg|",  "abcdefg",
  // above the byte limit, truncated
  "abcdefgh|", "abcdefg",
  "あabcde|",  "あabcd",
  "abcdeあ|",  "abcde",
  "abcd🐱|",   "abcd",
  "あいう|",   "あい",
  "🐱🦈|",     "🐱",
  // above the byte limit, caret in the middle of the string
  "abcd|efgh", "abcefgh",
  "あb|cdef",  "あcdef",
  "abc|deあ",  "abdeあ",
  "abc|d🐱",   "abd🐱",
  "あい|う",   "あう",
  "🐱|🦈",     "🦈",
];
const cbl = constrainByteLength(7);
for (let i = 0; i < tests.length; i += 2) {
  const sel = tests[i].indexOf('|');
  console.assert(sel >= 0, `no caret in ${tests[i]}`);
  const inVal = tests[i].substring(0, sel) + tests[i].substring(sel+1);
  const event = {target: {value: inVal, selectionStart: sel, selectionEnd: sel}};
  cbl(event);
  const actual = event.target.value, expected = tests[i+1];
  console.assert(expected === actual, `expected ${expected}, got ${actual}`);
}
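
For readers following along, the behavior these tests pin down can be sketched roughly as below: drop whole characters immediately before the caret until the string fits the byte limit. This is only an illustration under assumptions, not the implementation in this PR; `trimBeforeCaret` is a hypothetical name, it works on a plain string instead of an input event, and it assumes the caret does not split a surrogate pair.

```javascript
// Hypothetical sketch, not the PR's constrainByteLength.
// Removes code points just before the caret until the UTF-8 length fits.
function trimBeforeCaret(value, caret, maxBytes) {
  const encoder = new TextEncoder();
  const byteLength = (s) => encoder.encode(s).length;

  // Array.from splits by code point, so surrogate pairs stay together.
  const head = Array.from(value.slice(0, caret));
  const tail = value.slice(caret);

  let total = byteLength(value);
  while (total > maxBytes && head.length > 0) {
    total -= byteLength(head.pop());
  }

  const newHead = head.join("");
  // If the text after the caret alone exceeds the limit, the result is
  // still too long; this sketch does not handle that case.
  return { value: newHead + tail, caret: newHead.length };
}

console.log(trimBeforeCaret("abcdefgh", 4, 7).value); // "abcefgh"
console.log(trimBeforeCaret("🐱🦈", 2, 7).value); // "🦈"
```

Trimming before the caret rather than at the end of the string is what keeps pre-existing trailing text intact when something is pasted into the middle.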


zebraed commented Feb 10, 2026

I can help test this on another platform; please give me a little while.
