Python And Turkish Capitalization
I have not found a good description of how to handle this problem on Windows, so I am documenting it here. Turkish has two letters, ı (uppercase I) and i (uppercase İ), that Python's default upper() and lower() handle incorrectly.
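To make the problem concrete, here is a minimal sketch of what the built-in case methods do with these letters in Python 3. The variable names are only illustrative and are not part of the original post.

```python
# str.upper() / str.lower() follow the default Unicode case mappings
# and ignore the Turkish dotted/dotless distinction.
dotted_upper = "i".upper()    # Turkish expects "İ" (U+0130)
dotless_lower = "I".lower()   # Turkish expects "ı" (U+0131)

print(dotted_upper)           # I
print(dotless_lower)          # i

# Lowercasing "İ" gives "i" followed by U+0307 COMBINING DOT ABOVE,
# so the result is two code points instead of a plain "i".
lowered = "İ".lower()
print(len(lowered))           # 2
print(lowered == "i\u0307")   # True
```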
Solution 1:
You should use PyICU, which applies locale-aware case mappings (shown here for Python 3):
>>> from icu import UnicodeString, Locale
>>> tr = Locale("TR")
>>> s = UnicodeString("i")
>>> print(str(s.toUpper(tr)))
İ
>>> s = UnicodeString("I")
>>> print(str(s.toLower(tr)))
ı
Solution 2:
You can define your own hard-coded functions for the Turkish character problem.
import re

def tr_upper(text):
    # Convert the Turkish-specific letters first, before the default mapping runs.
    text = re.sub(r"i", "İ", text)
    text = re.sub(r"ı", "I", text)
    text = re.sub(r"ç", "Ç", text)
    text = re.sub(r"ş", "Ş", text)
    text = re.sub(r"ü", "Ü", text)
    text = re.sub(r"ğ", "Ğ", text)
    text = text.upper()  # for the rest, use the default upper
    return text

def tr_lower(text):
    # Convert the Turkish-specific letters first, before the default mapping runs.
    text = re.sub(r"İ", "i", text)
    text = re.sub(r"I", "ı", text)
    text = re.sub(r"Ç", "ç", text)
    text = re.sub(r"Ş", "ş", text)
    text = re.sub(r"Ü", "ü", text)
    text = re.sub(r"Ğ", "ğ", text)
    text = text.lower()  # for the rest, use the default lower
    return text
Regular upper:
>>> print("ulvido".upper())
ULVIDO
Our custom upper:
>>> print(tr_upper("ulvido"))
ULVİDO
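The same idea can also be sketched without regular expressions. Only the two i pairs actually need special handling, because the default upper() and lower() already map ç, ş, ü and ğ correctly. The names tr_upper_fast and tr_lower_fast below are hypothetical and are used only for this illustration.

```python
# Hypothetical helper names, not from the original post.
# Translation tables cover only the letters the default mappings get wrong.
_TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TR_LOWER = str.maketrans({"İ": "i", "I": "ı"})


def tr_upper_fast(text):
    """Uppercase text using Turkish rules for i and ı."""
    return text.translate(_TR_UPPER).upper()


def tr_lower_fast(text):
    """Lowercase text using Turkish rules for İ and I."""
    return text.translate(_TR_LOWER).lower()


print(tr_upper_fast("ulvido"))     # ULVİDO
print(tr_lower_fast("ISPARTA"))    # ısparta
print(tr_upper_fast("çığ şükür"))  # ÇIĞ ŞÜKÜR
```

Translating before calling upper() or lower() matters: once the default mapping has run, "i" and "ı" have both become "I" and can no longer be told apart.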
If you need this conversion a lot, you can put it in its own .py file. For example, save it as trtextstyle.py and import it into your projects.
If trtextstyle.py is in the same directory as your file and both belong to the same package:
from .trtextstyle import tr_upper, tr_lower
Hope this helps.
Solution 3:
import re

def tr_capitalize(param_word):
    # Capitalize every space-separated word using Turkish casing rules.
    word_list = param_word.split(sep=" ")
    new_word = ""
    for word in word_list:
        if not word:
            continue  # consecutive spaces produce empty strings; skip them
        first_letter = word[0]
        last_part = word[1:]
        # Uppercase the first letter with the Turkish mappings.
        first_letter = re.sub(r"i", "İ", first_letter)
        first_letter = re.sub(r"ı", "I", first_letter)
        first_letter = re.sub(r"ç", "Ç", first_letter)
        first_letter = re.sub(r"ş", "Ş", first_letter)
        first_letter = re.sub(r"ü", "Ü", first_letter)
        first_letter = re.sub(r"ğ", "Ğ", first_letter)
        # Lowercase the remaining letters with the Turkish mappings.
        last_part = re.sub(r"İ", "i", last_part)
        last_part = re.sub(r"I", "ı", last_part)
        last_part = re.sub(r"Ç", "ç", last_part)
        last_part = re.sub(r"Ş", "ş", last_part)
        last_part = re.sub(r"Ü", "ü", last_part)
        last_part = re.sub(r"Ğ", "ğ", last_part)
        rebuilt_word = first_letter + last_part
        rebuilt_word = rebuilt_word.capitalize()
        new_word = new_word + " " + rebuilt_word
    new_word = new_word.strip()
    return new_word
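For comparison, here is a shorter sketch of the same per-word capitalization built on translation tables. The name tr_capitalize_words is hypothetical, and unlike the function above it splits on any whitespace rather than only on single spaces, so treat it as an illustration under that assumption rather than a drop-in replacement.

```python
# Hypothetical names; a sketch, not the original author's code.
_TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TR_LOWER = str.maketrans({"İ": "i", "I": "ı"})


def tr_capitalize_words(text):
    """Capitalize each whitespace-separated word using Turkish i/ı rules."""
    words = []
    for word in text.split():  # split() with no argument never yields empty strings
        first = word[0].translate(_TR_UPPER).upper()
        rest = word[1:].translate(_TR_LOWER).lower()
        words.append(first + rest)
    return " ".join(words)


print(tr_capitalize_words("istanbul ve ISPARTA"))  # İstanbul Ve Isparta
print(tr_capitalize_words("ıLGIN  iZMİR"))         # Ilgın İzmir
```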