Add Mac OS encodings #5

@joachimmetz

Defined by Unicode.org - http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/

  • MacArabic
    • Digits 0x30 - 0x39 are currently not adjusted for use in Arabic-script context
  • MacCeltic
  • MacCentralEurRoman
  • MacChineseSimp (MBC encoding)
  • MacChineseTrad (MBC encoding)
  • MacCroatian
  • MacCyrillic (MacOS >= 9.0)
  • MacDevanagari (MBC encoding)
  • MacDingbats
  • MacFarsi
    • Digits 0x30 - 0x39 are currently not adjusted for use in Arabic-script context
  • MacGaelic
  • MacGreek
  • MacGujarati (MBC encoding)
  • MacGurmukhi (MBC encoding)
  • MacHebrew (+/-)
    • Unicode private-use characters are currently not supported
    • 0x81 maps to the sequence U+05F2 U+05B7, though Unicode defines the precomposed U+FB1F
    • 0xDE maps to U+05C7; HEBREW.TXT specifies 0x05B8+0xF87F but also hints at the alternate form "qamats qatan"
  • MacIcelandic
  • MacInuit
  • MacJapanese (MBC encoding)
  • MacKorean (MBC encoding)
  • MacRoman (MacOS >= 8.5)
  • MacRomanian
  • MacRussian (MacCyrillicCurrSignStdVariant) - superseded by MacCyrillic (MacOS >= 9.0)
  • MacSymbol (+/-)
    • Unicode private-use characters are currently not supported
  • MacThai (+/-)
    • Unicode private-use characters are currently not supported
  • MacTurkish
  • MacUkrainian (MacCyrillicCurrSignUkrVariant) - superseded by MacCyrillic (MacOS >= 9.0)
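
Several entries above are marked as not supporting private-use characters. A minimal sketch of how a decoded mapping value could be checked for them, assuming Python and its standard `unicodedata` module; the helper name `has_private_use` is hypothetical and not part of any existing code in this project:

```python
import unicodedata


def has_private_use(text: str) -> bool:
    """Return True if any character in text has general category Co (private use)."""
    return any(unicodedata.category(character) == "Co" for character in text)


# The HEBREW.TXT value 0x05B8+0xF87F contains a private-use character.
print(has_private_use("\u05b8\uf87f"))
# The substitute mapping U+05C7 does not.
print(has_private_use("\u05c7"))
```

A check like this could flag mapping entries that need a substitute before a table is generated.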

Notes:

  • <LR> <CHAR> is encoded as U+202D (left-to-right override) <CHAR> U+202C (pop directional formatting)
  • <RL> <CHAR> is encoded as U+202E (right-to-left override) <CHAR> U+202C (pop directional formatting)
  • MacOS seems to use Unicode private-use characters to tag different "variants" of the same character
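
The direction-tag rule in the notes above can be sketched as follows. This assumes mapping values are written the way HEBREW.TXT writes them, with `+` joining the parts (for example `0x05B8+0xF87F`), and that a direction tag appears as the first part; the function name `parse_mapping_value` is hypothetical:

```python
LEFT_TO_RIGHT_OVERRIDE = "\u202d"
RIGHT_TO_LEFT_OVERRIDE = "\u202e"
POP_DIRECTIONAL_FORMATTING = "\u202c"


def parse_mapping_value(value: str) -> str:
    """Convert a value such as '<RL>+0x0021' or '0x05F2+0x05B7' to a Unicode string."""
    parts = value.strip().split("+")
    prefix = suffix = ""
    if parts[0] == "<LR>":
        prefix, suffix = LEFT_TO_RIGHT_OVERRIDE, POP_DIRECTIONAL_FORMATTING
        parts = parts[1:]
    elif parts[0] == "<RL>":
        prefix, suffix = RIGHT_TO_LEFT_OVERRIDE, POP_DIRECTIONAL_FORMATTING
        parts = parts[1:]
    # int() with base 16 accepts the leading "0x".
    characters = "".join(chr(int(part, 16)) for part in parts)
    return prefix + characters + suffix


print(repr(parse_mapping_value("<RL>+0x0021")))
print(repr(parse_mapping_value("0x05F2+0x05B7")))
```

Wrapping the whole sequence in a single override and pop pair is an assumption; the notes only describe the single-character case.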

TODO: unknown encodings, not defined by Unicode.org and with no mapping data generated by https://github.com/dfirlabs/macos-measurements/blob/main/maccodepage.sh

  • MacArmenian
  • MacBengali
  • MacBurmese
  • MacEthiopic
  • MacExtArabic
  • MacGeorgian
  • MacKannada
  • MacKhmer
  • MacLaotian
  • MacMalayalam
  • MacMongolian
  • MacOriya
  • MacSinhalese
  • MacTamil
  • MacTelugu
  • MacTibetan
  • MacVietnamese

Notes:

  • MacOS does not seem to encode stand-alone versions of Unicode characters into the unknown encodings

TODO: consider adding support for:

  • The pre-Mac OS 8.5 mapping, in which code page byte 0xDB maps to Unicode character U+00A4 (currency sign).
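
One way the older mapping could be offered, sketched under two assumptions: that MacRoman is the code page meant (the list above ties MacRoman to MacOS >= 8.5), and that Python's built-in `mac_roman` codec, which decodes 0xDB as U+20AC (euro sign), is an acceptable stand-in for the current table. The function name and the `pre_8_5` parameter are hypothetical:

```python
EURO_SIGN = "\u20ac"
CURRENCY_SIGN = "\u00a4"


def decode_mac_roman(data: bytes, pre_8_5: bool = False) -> str:
    """Decode MacRoman bytes, optionally using the pre-Mac OS 8.5 meaning of 0xDB."""
    text = data.decode("mac_roman")
    if pre_8_5:
        # In this single-byte codec only 0xDB decodes to the euro sign,
        # so a plain replacement restores the older currency sign.
        text = text.replace(EURO_SIGN, CURRENCY_SIGN)
    return text


print(repr(decode_mac_roman(b"\xdb")))
print(repr(decode_mac_roman(b"\xdb", pre_8_5=True)))
```

A dedicated pre-8.5 table would be the more direct design; the post-decode substitution is only a compact illustration of the single differing byte.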

Keeping notes here: https://forensicswiki.xyz/wiki/index.php?title=Mac_OS_X#Codepages
