
How To Check The Emoji Property Of A Character In Python?

In Unicode, a character can have an Emoji property. Is there a standard way in Python to determine whether a character is an emoji? I know of unicodedata, but it doesn't appear to expose the Emoji property.
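For illustration, the nearest field unicodedata does offer is the general category, and it lumps emoji together with ordinary symbols. In this small sketch a grinning face and a degree sign both come back as So (Symbol, other), so the category alone cannot tell you whether a character is an emoji.

```python
import unicodedata

# General category is "So" (Symbol, other) for both characters,
# even though only the first one is an emoji.
for ch in ["\U0001F600", "\u00B0"]:  # GRINNING FACE, DEGREE SIGN
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```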

Solution 1:

This is the code I ended up writing to load the emoji information. The get_emoji function downloads the Unicode emoji data file, parses it, and calls the enumeration callback for each code point that has the requested property. The rest of the code uses this to print a JavaScript module (a list of JSON objects) with the information I needed.

#!/usr/bin/env python3
# Generates a list of emoji characters and names in JS format
import json
import re
import unicodedata
import urllib.request


def get_emoji(on_emoji, attribute):
    '''
    Enumerates the characters that have a given property in the Unicode emoji data file.

    @param on_emoji  A callback that is called with each found character. Signature `on_emoji(code_point_value)`
    @param attribute The property that is desired, such as `Emoji` or `Emoji_Presentation`
    '''
    with urllib.request.urlopen('http://www.unicode.org/Public/emoji/5.0/emoji-data.txt') as f:
        # Fall back to UTF-8 if the server does not declare a charset
        content = f.read().decode(f.headers.get_content_charset() or 'utf-8')

    # Data lines look like "231A..231B ; Emoji # comment" or "00A9 ; Emoji # comment"
    line_re = re.compile(r'^([0-9A-F]+)(\.\.([0-9A-F]+))?([^;]*);([^#]*)#(.*)$')
    for line in content.splitlines():
        m = line_re.match(line)
        if m is None:
            continue  # comment or blank line

        line_attribute = m.group(5).strip()
        if line_attribute != attribute:
            continue

        # A line lists either a single code point or an inclusive range "first..last"
        code_point = int(m.group(1), 16)
        to_code_point = int(m.group(3), 16) if m.group(3) else code_point
        for i in range(code_point, to_code_point + 1):
            on_emoji(i)


# Dumps the values into a JSON format
def print_emoji(value):
    c = chr(value)
    try:
        name = unicodedata.name(c)
    except ValueError:
        # No name: the Unicode database in the installed Python is likely older than the emoji data
        return
    print(json.dumps({'code': value, 'name': name.lower()}), ',')


print("module.exports = [")
get_emoji(print_emoji, "Emoji_Presentation")
print("]")

That solved my original problem. To answer the question itself, it would just be a matter of collecting the results into a set (or dictionary) and doing a lookup.
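A minimal sketch of that lookup, assuming the data file text is already in memory. To keep it runnable offline, SAMPLE_DATA holds a few illustrative lines in the emoji-data.txt layout (comments shortened), not the real file; load_property and has_property are hypothetical helper names. A set is enough here because only membership matters.

```python
import re

# Illustrative lines in the emoji-data.txt layout (not the full file, comments shortened)
SAMPLE_DATA = """\
# emoji-data.txt (excerpt)
00A9          ; Emoji                #  1.1  [1] copyright
231A..231B    ; Emoji                #  1.1  [2] watch..hourglass
1F600         ; Emoji                #  6.1  [1] grinning face
231A..231B    ; Emoji_Presentation   #  1.1  [2] watch..hourglass
"""

# Code point or inclusive range, then the property name, then a "#" comment
LINE_RE = re.compile(r'^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*([^#]*?)\s*#')


def load_property(text, prop):
    """Return the set of code points listed under `prop` in emoji-data formatted text."""
    code_points = set()
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None or m.group(3) != prop:
            continue  # comment line or a different property
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        code_points.update(range(first, last + 1))
    return code_points


def has_property(ch, code_points):
    """True if the single character's code point is in the set."""
    return ord(ch) in code_points


emoji = load_property(SAMPLE_DATA, "Emoji")
print(has_property("\u231B", emoji))  # HOURGLASS, inside the 231A..231B range
print(has_property("A", emoji))
```

With the full file the same two calls work unchanged; only the text passed to load_property differs.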


Solution 2:

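I have used the following regex pattern successfully before. It matches runs of characters from four blocks that contain many common emoji, so it is a practical filter rather than a complete test of the Emoji property: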

import re

emoji_pattern = re.compile("["
                           "\U0001F600-\U0001F64F"  # emoticons
                           "\U0001F300-\U0001F5FF"  # symbols & pictographs
                           "\U0001F680-\U0001F6FF"  # transport & map symbols
                           "\U0001F1E0-\U0001F1FF"  # regional indicator symbols (pairs form flags)
                           "]+")
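To test a single character with this pattern, re.fullmatch is enough; is_in_pattern below is a hypothetical helper name. The sketch also shows the gap in the ranges: THINKING FACE (U+1F914) is an emoji, but it sits in the Supplemental Symbols and Pictographs block (U+1F900 to U+1F9FF), which the pattern does not cover.

```python
import re

# Same character ranges as the pattern above
emoji_pattern = re.compile("["
                           "\U0001F600-\U0001F64F"  # emoticons
                           "\U0001F300-\U0001F5FF"  # symbols & pictographs
                           "\U0001F680-\U0001F6FF"  # transport & map symbols
                           "\U0001F1E0-\U0001F1FF"  # regional indicator symbols
                           "]+")


def is_in_pattern(ch):
    """True if the single character falls in one of the ranges above."""
    return emoji_pattern.fullmatch(ch) is not None


print(is_in_pattern("\U0001F600"))  # GRINNING FACE, in the emoticons range
print(is_in_pattern("\U0001F914"))  # THINKING FACE, its block is not covered
print(emoji_pattern.sub("", "ok \U0001F680!"))  # removes ROCKET from the string
```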

Also check out this question: removing emojis from a string in Python

