Avoid using the terms "latin1" and "iso-8859-1" for isomorphic encoding/decoding #55

@domenic

09878c7 made me aware that the functions latin1toString and latin1fromString propagate the confusion between "latin1" and "isomorphic" that we see in a lot of the JavaScript ecosystem, and that I have tried to help combat in whatwg/encoding@36fb4e7.

In short, the "latin1" encoding specified in the ISO-8859-1 spec does not assign any characters to the bytes 0x00 to 0x1F or 0x7F to 0x9F. So a proper latin1 encoder would never produce those bytes, and a proper latin1 decoder would throw when given those bytes.
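
To make the strict reading concrete, here is a minimal sketch of what such a decoder might look like. The name strictLatin1Decode is hypothetical and not part of this library or any standard API; it only illustrates the "throw on unassigned bytes" behavior described above.

```javascript
// Hypothetical helper illustrating the strict ISO-8859-1 reading:
// bytes 0x00-0x1F and 0x7F-0x9F have no assigned character, so decoding fails.
function strictLatin1Decode(bytes) {
  let out = "";
  for (const b of bytes) {
    const unassigned = b <= 0x1f || (b >= 0x7f && b <= 0x9f);
    if (unassigned) {
      const hex = b.toString(16).padStart(2, "0");
      throw new RangeError(`byte 0x${hex} has no ISO-8859-1 character`);
    }
    // Every remaining byte maps to the code point with the same value.
    out += String.fromCharCode(b);
  }
  return out;
}
```

Under this reading even a newline (0x0A) is rejected, which is a large part of why nobody implements it this way.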

In practice, nobody does this, and we have either:

  • Libraries following the windows-1252 mapping (TextEncoder/TextDecoder, the entire web platform, Node.js's modern standard library);
  • or libraries following the isomorphic decoding / encoding (a lot of C++ code, Node.js's old Buffer API).

This creates a lot of confusion when people expect one of these interpretations and get the other.
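
As a rough illustration of where the two interpretations diverge, here is a sketch of isomorphic decoding and encoding. The function names isomorphicDecode and isomorphicEncode are assumptions chosen for this example, not this library's current API. For comparison, the Encoding Standard's windows-1252 table maps byte 0x80 to U+20AC (the euro sign), whereas the isomorphic decode below maps it to U+0080.

```javascript
// Isomorphic decode: each byte 0x00-0xFF becomes the code point with the same value.
function isomorphicDecode(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// Isomorphic encode: each code unit must be <= 0xFF and becomes the byte of the same value.
function isomorphicEncode(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError(`code unit U+${code.toString(16)} at index ${i} is above U+00FF`);
    }
    bytes[i] = code;
  }
  return bytes;
}
```

With these helpers, isomorphicDecode(new Uint8Array([0x80])) yields a one-character string whose code point is 0x80, and isomorphicEncode rejects "€" because U+20AC is above U+00FF. A caller expecting windows-1252 behavior would instead expect the euro sign to round-trip through byte 0x80.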

My strong suggestion is to never mention the terms latin1 or iso-8859-1 in public APIs, since they mean windows-1252 for people who read standards and mean something else (usually isomorphic encoding) for people who are coming from certain C++ codebases. (I think V8 is the original source of the confusion, at least in the Node.js ecosystem.) Instead, use the standard and non-overloaded term "isomorphic".

I realize this is a breaking change and might not be one you want to take on, but I thought I should file it, in the interest of making this the best encoding/decoding library for JS.