[format.string.escaped] does not specify boundary conditions for sequences of ill-formed code units #80

@tahonermann

[format.string.escaped]p2.2 states:

> For each code unit sequence X in S that either encodes a single character, is a shift sequence, or is a sequence of ill-formed code units, processing is in order as follows:

The wording does not specify what constitutes a "sequence of ill-formed code units", and in particular where one such sequence ends and the next begins. Leaving this unspecified is acceptable for implementation-defined encodings, but a precise definition could be given for UTF-8, UTF-16, and UTF-32.

Unicode PR-121 provides a definition for "entire ill-formed subsequence" that is a good candidate for how a "sequence of ill-formed code units" might be defined:

> In these policy statements, "entire ill-formed subsequence" refers to all code units in the ill-formed subsequence up to but not including the start of the next well-formed code unit sequence.
