Incorrect properties for characters that are range boundaries in UnicodeData.txt #30

Full background: https://unix.stackexchange.com/questions/805290/why-doesnt-%e4%b8%80-have-a-numeric-value-in-debians-unicode-utility/805291#805291

As noted there, this command prints nothing:

```
$ unicode --format='{numeric_desc}' U+4E00
```

even though U+4E00 has a numeric value of 1, as Python confirms:

```
$ python3 -c 'import unicodedata; print(unicodedata.numeric("\u4e00"))'
1.0
```

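The name is also incorrect. For characters in the ranges that `UnicodeData.txt` abbreviates with a pair of `First` and `Last` entries, the name is meant to be derived automatically rather than read from the file: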

```
$ unicode --format='{name}\n' U+4E00
<CJK Ideograph, First>
$ python3 -c 'import unicodedata; print(unicodedata.name("\u4e00"))'
CJK UNIFIED IDEOGRAPH-4E00
```
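For the CJK unified ideograph ranges, the derived name is the prefix `CJK UNIFIED IDEOGRAPH-` followed by the code point in uppercase hexadecimal. A minimal Python sketch of that one rule follows; `derived_cjk_name` is a hypothetical helper, not part of the `unicode` utility, and other ranges (Hangul syllables, for example) follow different rules that are not covered here:

```python
import unicodedata


def derived_cjk_name(codepoint: int) -> str:
    """Hypothetical helper: derive the name of a CJK unified ideograph."""
    return f"CJK UNIFIED IDEOGRAPH-{codepoint:04X}"


# U+4E00 is the <CJK Ideograph, First> boundary from this report;
# U+4E01 is an ordinary character inside the same range.
for cp in (0x4E00, 0x4E01):
    # Cross-check against Python's own Unicode database.
    assert derived_cjk_name(cp) == unicodedata.name(chr(cp))
    print(f"U+{cp:04X} {derived_cjk_name(cp)}")
```

The boundary character should be named the same way as any other character in its range.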

This affects every character whose second field (the name) in `UnicodeData.txt` has the form `<*, First>` or `<*, Last>`.

Sounds like an easy fix would be to not cache characters whose second field matches `^<.*(First|Last)>$`.
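A rough sketch of that check in Python, under the assumption that records are split on `;` and the pattern is applied to the second field. `is_range_boundary` is a hypothetical name, the sample records are only illustrative lines in the `UnicodeData.txt` layout, and how the utility's cache would actually consume the result is not shown:

```python
import re

# Pattern quoted in this report, applied to the second field (the name).
RANGE_BOUNDARY = re.compile(r"^<.*(First|Last)>$")


def is_range_boundary(line: str) -> bool:
    """Hypothetical check: is this record a <..., First> or <..., Last> marker?"""
    fields = line.rstrip("\n").split(";")
    return len(fields) > 1 and RANGE_BOUNDARY.match(fields[1]) is not None


# Illustrative records in the semicolon-separated UnicodeData.txt layout.
samples = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0000;<control>;Cc;0;BN;;;;;N;NULL;;;;",
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
    "D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;",
]

for record in samples:
    code, name = record.split(";")[:2]
    action = "skip cache" if is_range_boundary(record) else "cache"
    print(f"U+{code} {name}: {action}")
```

Note that `<control>` also uses angle brackets but does not match, so ordinary control characters would still be cached.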

