Full background: https://unix.stackexchange.com/questions/805290/why-doesnt-%e4%b8%80-have-a-numeric-value-in-debians-unicode-utility/805291#805291
As noted there:
$ unicode --format='{numeric_desc}' U+4E00
Outputs nothing even though:
$ python3 -c 'import unicodedata; print(unicodedata.numeric("\u4e00"))'
1.0
The name is also incorrect. For characters in ranges such as the CJK Unified Ideographs, the name is meant to be derived automatically:
$ unicode --format='{name}\n' U+4E00
<CJK Ideograph, First>
$ python3 -c 'import unicodedata; print(unicodedata.name("\u4e00"))'
CJK UNIFIED IDEOGRAPH-4E00
That would affect all those characters whose second field in UnicodeData.txt is <* First> or <* Last>.
Sounds like an easy fix would be not to cache those characters where that second field matches ^<.*(First|Last)>$.
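As a rough sketch of that check (not the utility's actual code), the filter could look something like the following in Python. The function name `is_range_marker` and the sample lines are made up for illustration; the sample lines only follow the general UnicodeData.txt layout and are not copied from a specific Unicode version. Only the regular expression comes from the suggestion above.

```python
import re

# Pattern from the suggestion above: matches range markers such as
# "<CJK Ideograph, First>" but not ordinary names or "<control>".
RANGE_MARKER = re.compile(r"^<.*(First|Last)>$")


def is_range_marker(line: str) -> bool:
    """Return True if the second field of a UnicodeData.txt-style line
    is a <..., First> or <..., Last> range marker."""
    fields = line.rstrip("\n").split(";")
    return len(fields) > 1 and RANGE_MARKER.match(fields[1]) is not None


# Illustrative sample lines (hypothetical excerpt, not authoritative data).
SAMPLE_LINES = [
    "0000;<control>;Cc;0;BN;;;;;N;NULL;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
    "9FFF;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
]

# Entries that would still be cached under the proposed fix.
cacheable = [line for line in SAMPLE_LINES if not is_range_marker(line)]

# Code points that would be skipped and looked up some other way.
skipped = [line.split(";")[0] for line in SAMPLE_LINES if is_range_marker(line)]
```

With this filter, `<control>` entries are still cached, since they do not end in `First>` or `Last>`; only the two boundary entries of each range are skipped.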