List the Chinese characters in Unicode? #605

Open
xfq opened this issue Feb 13, 2024 · 3 comments
Labels
i:encoding Characters & encoding

Comments

@xfq
Member

xfq commented Feb 13, 2024

It might be useful to list the Chinese characters in Unicode, like klreq and alreq:

  • The basic set (U+4E00-U+9FA5), i.e., ISO/IEC 10646:1993
  • CJK Unified Ideographs Extension A, i.e., U+3400-U+4DB5 in ISO/IEC 10646:1999
  • U+3400-U+9FFF (BMP Chinese characters)
  • U+20000-U+2FFFF, i.e., CJK Unified Ideographs Extension B to Extension F, plus Extension I (added in September 2023), commonly known as the Supplementary Ideographic Plane (SIP)
  • U+30000-U+3FFFF, i.e., CJK Unified Ideographs Extension G to Extension H, commonly known as the Tertiary Ideographic Plane (TIP)
  • CJK Compatibility Ideographs in the Basic Multilingual Plane (U+F900-U+FAFF)
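A list like the one above could be turned into a simple lookup. The following is a minimal sketch, assuming the ranges exactly as quoted in this comment; the `RANGES` table, the `classify` helper, and the short labels are hypothetical names made up for illustration, not Unicode block names.

```python
from typing import Optional

# Ranges copied from the list above, most specific first.
# The labels are informal shorthand chosen for this sketch.
RANGES = [
    (0x4E00, 0x9FA5, "basic set"),
    (0x3400, 0x4DB5, "Extension A"),
    (0x3400, 0x9FFF, "other BMP"),
    (0xF900, 0xFAFF, "BMP compatibility ideographs"),
    (0x20000, 0x2FFFF, "SIP"),
    (0x30000, 0x3FFFF, "TIP"),
]


def classify(ch: str) -> Optional[str]:
    """Return the label of the first listed range containing ch, or None."""
    cp = ord(ch)
    for first, last, label in RANGES:
        if first <= cp <= last:
            return label
    return None


for ch in ["中", "㐀", "\U00020000", "\U00030000", "\uF900", "A"]:
    print(f"U+{ord(ch):04X} -> {classify(ch)}")
```

Note that the plane-wide ranges also cover unassigned code points, so a match here only means "inside a listed range", not "an encoded Chinese character".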
@yisibl

yisibl commented Apr 18, 2024

Should CJK Compatibility Ideographs be abandoned?

@xfq
Member Author

xfq commented Apr 21, 2024

> Should CJK Compatibility Ideographs be abandoned?

There seem to be some standard Chinese characters in CJK Compatibility Ideographs. @eisoch?
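For reference, the Unicode Standard treats twelve code points in the CJK Compatibility Ideographs block as unified ideographs rather than compatibility duplicates. One rough way to find them, sketched below, is to look for assigned code points in U+F900-U+FAFF that have no decomposition mapping. This is a heuristic built on Python's standard `unicodedata` module, and the helper name `unified_in_compat_block` is made up for this sketch; the authoritative source is the Unified_Ideograph property in the UCD.

```python
import unicodedata


def unified_in_compat_block() -> list:
    """Assigned code points in U+F900..U+FAFF with no decomposition mapping."""
    found = []
    for cp in range(0xF900, 0xFB00):
        ch = chr(cp)
        # Skip unassigned code points, which have no character name.
        if not unicodedata.name(ch, ""):
            continue
        # True compatibility ideographs decompose to a unified ideograph.
        if unicodedata.decomposition(ch) == "":
            found.append(cp)
    return found


print([f"U+{cp:04X}" for cp in unified_in_compat_block()])
```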

@xfq xfq added the i:encoding Characters & encoding label Apr 22, 2024
@AmeroHan
Contributor

U+3007 (〇) IDEOGRAPHIC NUMBER ZERO in CJK Symbols and Punctuation (U+3000..U+303F) is also considered a hanzi by standards, dictionaries and UCS according to 「〇」算不算汉字? - 知乎 (Is “〇” a hanzi? - Zhihu).

Additionally, outside the list @xfq provided above, there are some other characters with script property “Han” in UCD, such as U+3005 (々) IDEOGRAPHIC ITERATION MARK and Suzhou numerals (U+3021..U+3029). Should they be listed?
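To see how these characters differ from ordinary ideographs, here is a small sketch that prints the General_Category and name of each one next to U+4E2D (中) for comparison. Python's standard `unicodedata` module does not expose the Script property, so this only illustrates the category difference; checking `Script=Han` itself would need `Scripts.txt` from the UCD or a third-party library. The `describe` helper is a hypothetical name used for illustration.

```python
import unicodedata


def describe(cp: int) -> str:
    """Format a code point with its General_Category and character name."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.category(ch)} {unicodedata.name(ch)}"


# U+4E2D is an ordinary unified ideograph, included as a baseline.
for cp in [0x4E2D, 0x3005, 0x3007, *range(0x3021, 0x302A)]:
    print(describe(cp))
```

Ordinary ideographs are category Lo, whereas U+3005 is Lm and U+3007 and the Suzhou numerals (named HANGZHOU NUMERAL in Unicode) are Nl, so a filter based on category or on the ideograph blocks alone would miss them.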


3 participants