How To Check The Emoji Property Of A Character In Python?
In unicode a character can have an Emoji property. Is there a standard way in Python to determine if a character is an Emoji? I know of unicodedata, but it doesn't appear to expos
Solution 1:
This is the code I ended up creating to load the Emoji information. The get_emoji
function gets the data file, parses it, and calls the enumeraton callback. The rest of the code uses this to produce a JSON file of the information I needed.
#!/usr/bin/env python3
# Generates a list of emoji characters and names in JS format
import urllib.request
import unicodedata
import re, json
'''
Enumerates the Emoji characters that match an attributes from the Unicode standard (the Emoji list).
@param on_emoji A callback that is called with each found character. Signature `on_emoji( code_point_value )`
@param attribute The attribute that is desired, such as `Emoji` or `Emoji_Presentation`
'''
def get_emoji(on_emoji, attribute):
with urllib.request.urlopen('http://www.unicode.org/Public/emoji/5.0/emoji-data.txt') as f:
content = f.read().decode(f.headers.get_content_charset())
cldr = re.compile('^([0-9A-F]+)(..([0-9A-F]+))?([^;]*);([^#]*)#(.*)$')
for line in content.splitlines():
m = cldr.match(line)
if m == None:
continue
line_attribute = m.group(5).strip()
if line_attribute != attribute:
continue
code_point = int(m.group(1),16)
if m.group(3) == None:
on_emoji(code_point)
else:
to_code_point = int(m.group(3),16)
for i in range(code_point,to_code_point+1):
on_emoji(i)
# Dumps the values into a JSON format
def print_emoji(value):
c = chr(value)
try:
obj = {
'code': value,
'name': unicodedata.name(c).lower(),
}
print(json.dumps(obj),',')
except:
# Unicode DB is likely outdated in installed Python
pass
print( "module.exports = [" )
get_emoji(print_emoji, "Emoji_Presentation")
print( "]" )
That solved my original problem. To answer the question itself it'd just be a matter of sticking the results into a dictionary and doing a lookup.
Solution 2:
I have used the following regex pattern successfully before
import re
emoji_pattern = re.compile("["
u"\U0001F600-\U0001F64F" # emoticons
u"\U0001F300-\U0001F5FF" # symbols & pictographs
u"\U0001F680-\U0001F6FF" # transport & map symbols
u"\U0001F1E0-\U0001F1FF" # flags (iOS)
"]+", flags=re.UNICODE)
Also check out this question: removing emojis from a string in Python
Post a Comment for "How To Check The Emoji Property Of A Character In Python?"