Description
09878c7 made me aware that the functions `latin1toString` and `latin1fromString` propagate the confusion between "latin1" and "isomorphic" that we see in a lot of the JavaScript ecosystem, and that I have tried to help combat in whatwg/encoding@36fb4e7.
In short, the "latin1" encoding specified in the ISO-8859-1 spec does not assign any characters to the bytes 0x00 to 0x1F or 0x7F to 0x9F. So a proper latin1 encoder would never produce those bytes, and a proper latin1 decoder would throw when given them.
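To make the strict reading concrete, here is a minimal TypeScript sketch of a codec that follows ISO/IEC 8859-1 to the letter, as described above. The names `strictLatin1Decode` and `strictLatin1Encode` are hypothetical and do not come from any library. Assigned bytes map to the Unicode code point with the same value, since U+0000 to U+00FF mirror Latin-1.

```typescript
// Hypothetical strict ISO/IEC 8859-1 codec: bytes 0x00–0x1F and 0x7F–0x9F
// have no assigned characters, so they are rejected in both directions.

/** True if ISO/IEC 8859-1 assigns a graphic character to this byte value. */
function isAssignedIso88591(b: number): boolean {
  return (b >= 0x20 && b <= 0x7e) || (b >= 0xa0 && b <= 0xff);
}

/** Decodes bytes, throwing on any byte the standard leaves unassigned. */
function strictLatin1Decode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (!isAssignedIso88591(b)) {
      throw new RangeError("byte 0x" + ("0" + b.toString(16)).slice(-2) + " is unassigned in ISO/IEC 8859-1");
    }
    // Assigned bytes map to the code point with the same numeric value.
    out += String.fromCharCode(b);
  }
  return out;
}

/** Encodes a string, refusing anything that has no assigned byte. */
function strictLatin1Encode(input: string): Uint8Array {
  const out = new Uint8Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const cu = input.charCodeAt(i);
    // Code units above 0xFF (including surrogate halves) are out of range;
    // code units in the unassigned control ranges are rejected too.
    if (cu > 0xff || !isAssignedIso88591(cu)) {
      throw new RangeError("U+" + ("000" + cu.toString(16).toUpperCase()).slice(-4) + " cannot be encoded in ISO/IEC 8859-1");
    }
    out[i] = cu;
  }
  return out;
}
```

Note that even a newline (0x0A) falls in the unassigned range, so this decoder rejects most multi-line text.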
In practice, nobody does this, and we have either:
- Libraries following the windows-1252 mapping (`TextEncoder`/`TextDecoder`, the entire web platform, Node.js's modern standard library); or
- Libraries following isomorphic decoding/encoding (a lot of C++ code, Node.js's old `Buffer` API).
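The two behaviours in the list can be contrasted with a small TypeScript sketch. It relies on the WHATWG Encoding Standard mapping the label "latin1" to windows-1252, which differs from isomorphic decoding only for bytes 0x80 to 0x9F. The helper names are hypothetical, and the table reproduces the standard's windows-1252 index for that range.

```typescript
// windows-1252 index entries for bytes 0x80–0x9F (WHATWG Encoding Standard).
// All other bytes decode to the code point with the same value.
const WINDOWS_1252_80_9F: readonly number[] = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

/** What `new TextDecoder("latin1")` does: decode the bytes as windows-1252. */
function windows1252Decode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    // Only 0x80–0x9F go through the table; everything else is identity.
    out += String.fromCharCode(b >= 0x80 && b <= 0x9f ? WINDOWS_1252_80_9F[b - 0x80] : b);
  }
  return out;
}

/** What Node.js's `buf.toString("latin1")` does: each byte becomes the same code point. */
function isomorphicDecode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) out += String.fromCharCode(bytes[i]);
  return out;
}

const sample = new Uint8Array([0x80, 0x93, 0x41]);
// windows1252Decode(sample) === "\u20AC\u201CA"  (euro sign, left curly quote, "A")
// isomorphicDecode(sample)  === "\u0080\u0093A"  (two C1 control characters, "A")
```

For bytes 0x00 to 0x7F and 0xA0 to 0xFF the two functions agree, which is why the mismatch often goes unnoticed until the input contains a byte such as 0x93, which windows-1252 reads as a left curly quote and isomorphic decoding reads as a C1 control character.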
This creates a lot of confusion when people expect one of these interpretations and get the other.
My strong suggestion is to never mention the terms "latin1" or "iso-8859-1" in public APIs: to people who read the standards they mean windows-1252, while to people coming from certain C++ codebases they mean something else (usually isomorphic encoding). (I think V8 is the original source of the confusion, at least in the Node.js ecosystem.) Instead, use the standard, non-overloaded term "isomorphic".
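As an illustration of the suggested naming, the sketch below shows what renamed functions might look like, following the WHATWG Infra standard's "isomorphic decode" and "isomorphic encode" definitions. `isomorphicDecode` and `isomorphicEncode` are hypothetical names, not this library's current API, and throwing on code points above U+00FF is a design choice assumed here (Node.js's `Buffer` instead truncates such characters).

```typescript
// Hypothetical replacements for latin1toString / latin1fromString, named after
// the Infra "isomorphic decode" / "isomorphic encode" operations.

/** Each byte becomes the code point with the same value (U+0000–U+00FF). */
function isomorphicDecode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) out += String.fromCharCode(bytes[i]);
  return out;
}

/**
 * Each code point becomes the byte with the same value. Infra only defines this
 * for isomorphic strings (every code point <= U+00FF), so this sketch throws
 * instead of silently truncating.
 */
function isomorphicEncode(input: string): Uint8Array {
  const out = new Uint8Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const cu = input.charCodeAt(i);
    // Surrogate halves are > 0xFF too, so non-BMP characters are rejected here.
    if (cu > 0xff) {
      throw new RangeError("not an isomorphic string: code unit 0x" + cu.toString(16) + " at index " + i);
    }
    out[i] = cu;
  }
  return out;
}
```

Because each byte maps to a distinct code point and back, `isomorphicEncode(isomorphicDecode(bytes))` returns the original bytes for any input; the reverse direction holds for any string whose code points are all at most U+00FF.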
I realize this is a breaking change and might not be one you want to take on, but I thought I should file it, in the interest of making this the best encoding/decoding library for JS.