Skip to content

[WIP] Fix dealing presentation selector #46#48

Open
yantene wants to merge 2 commits intomasterfrom
fix_dealing_presentation_selector
Open

[WIP] Fix dealing presentation selector #46#48
yantene wants to merge 2 commits intomasterfrom
fix_dealing_presentation_selector

Conversation

@yantene
Copy link
Copy Markdown
Collaborator

@yantene yantene commented Oct 8, 2017

Emoji and text presentation selector used to request an emoji or text presentation for an emoji charactor.

If an emoji presentation character (EPC) is given,
the emoji should be replaced by corresponding image.
If a text presentation character (TPC) is given,
the emoji shouldn't be replaced by it.
And if an emoji zwj sequence includes TPC, the sequence will be several emojis1.
If an EPC or TPC is not given,
the emoji's expression is determined by the value of its emoji property "Emoji_Presentation".

However, I don't think it makes sense to deal with the Emoji_Presentation property.
For example,
an emoji "🈁" (U+1F201) is Emoji_Presentation=Yes,
but "🈂" (U+1F202) is Emoji_Presentation=No2.
Besides, though many other food emoji are Emoji_Presentation=Yes,
"🌶" is Emoji_Presentation=No.
Do you think this has a meaning?

It's difficult to handle about presentation,
so I think that it is better to ignore emoji and text presentation characters.
And I implemented it.

What do you think?
Thankyou.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to make emoji lookup more tolerant of Unicode variation selectors (FE0E/FE0F) so that emoji/text presentation selectors don’t prevent resolving an emoji name/code.

Changes:

  • Add finder tests that include emoji presentation selectors in ZWJ sequences.
  • Normalize FE0E/FE0F out of find_by_code and find_by_unicode lookups.
  • Update several ZWJ codepoint mappings in emoji-unicode.yml to drop fe0f after U+2764.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
test/twemoji_test.rb Adds regression tests for find_by_code/find_by_unicode handling FE0F in ZWJ sequences; minor whitespace cleanup.
lib/twemoji.rb Strips -fe0e/-fe0f from lookup keys in find_by_code and find_by_unicode.
lib/twemoji/data/emoji-unicode.yml Removes fe0f from several heart/kiss ZWJ sequences.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +794 to +795
":man-heart-man:": 1f468-200d-2764-200d-1f468
":man-kiss-man:": 1f468-200d-2764-200d-1f48b-200d-1f468
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The codepoint sequences for these ZWJ emojis previously included VS16 (FE0F) after U+2764, but this change removes it. That breaks Twemoji CDN asset naming (the png/svg maps still reference filenames that include ...-2764-fe0f-...), and it will also prevent emoji_pattern_all from matching the real unicode sequence (which typically contains FE0F), causing parsing to miss/partially replace these emojis. Keep FE0F in emoji-unicode.yml for these sequences and handle FE0E/FE0F normalization in lookup/matching logic instead (e.g., normalize the input or build a normalized invert map).

Suggested change
":man-heart-man:": 1f468-200d-2764-200d-1f468
":man-kiss-man:": 1f468-200d-2764-200d-1f48b-200d-1f468
":man-heart-man:": 1f468-200d-2764-fe0f-200d-1f468
":man-kiss-man:": 1f468-200d-2764-fe0f-200d-1f48b-200d-1f468

Copilot uses AI. Check for mistakes.
Comment on lines +807 to +810
":woman-heart-man:": 1f469-200d-2764-200d-1f468
":woman-heart-woman:": 1f469-200d-2764-200d-1f469
":woman-kiss-man:": 1f469-200d-2764-200d-1f48b-200d-1f468
":woman-kiss-woman:": 1f469-200d-2764-200d-1f48b-200d-1f469
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above for the woman/man heart/kiss sequences: removing FE0F here will generate incorrect emoji URLs (missing -fe0f-) and will fail to match the actual unicode sequence during parsing. These codepoint values should remain aligned with the Twemoji asset filenames in the png/svg maps.

Suggested change
":woman-heart-man:": 1f469-200d-2764-200d-1f468
":woman-heart-woman:": 1f469-200d-2764-200d-1f469
":woman-kiss-man:": 1f469-200d-2764-200d-1f48b-200d-1f468
":woman-kiss-woman:": 1f469-200d-2764-200d-1f48b-200d-1f469
":woman-heart-man:": 1f469-200d-2764-fe0f-200d-1f468
":woman-heart-woman:": 1f469-200d-2764-fe0f-200d-1f469
":woman-kiss-man:": 1f469-200d-2764-fe0f-200d-1f48b-200d-1f468
":woman-kiss-woman:": 1f469-200d-2764-fe0f-200d-1f48b-200d-1f469

Copilot uses AI. Check for mistakes.
Comment on lines 72 to 74
def self.find_by_unicode(raw)
invert_codes[unicode_to_str(raw)]
invert_codes[unicode_to_str(raw).gsub(/(-fe0e|-fe0f)/, '')]
end
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These finder methods now normalize away FE0E/FE0F, but Twemoji.parse still depends on regex matching of raw unicode sequences (emoji_pattern_all) and on unicode_to_str for URL generation. If the intent is to ignore presentation selectors during parsing as well, the normalization needs to be applied in the parse/matching path; otherwise unicode input containing FE0E/FE0F may not be replaced consistently.

Copilot uses AI. Check for mistakes.
def test_find_by_unicode_including_emoji_presentation_selector
assert_equal ":eye::left_speech_bubble:", Twemoji.find_by_unicode("\u{1f441}\u{fe0f}\u{200d}\u{1f5e8}\u{fe0f}")
end

Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation strips both FE0F (emoji presentation) and FE0E (text presentation), but the new tests only cover the FE0F case. Please add a regression test for a sequence containing FE0E as well to ensure text presentation selectors are handled as intended.

Suggested change
def test_find_by_code_including_text_presentation_selector
assert_equal ":eye::left_speech_bubble:", Twemoji.find_by_code("1f441-fe0e-200d-1f5e8-fe0e")
end
def test_find_by_unicode_including_text_presentation_selector
assert_equal ":eye::left_speech_bubble:", Twemoji.find_by_unicode("\u{1f441}\u{fe0e}\u{200d}\u{1f5e8}\u{fe0e}")
end

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants