
Python And Turkish Capitalization

I have not found a good description of how to handle this problem on Windows, so I am writing one here. Turkish has two letters, ı (uppercase I) and i (uppercase İ), which are handled incorrectly.
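To make the problem concrete, here is a small sketch of what Python 3's built-in case methods do with these letters. It uses only `str.upper` and `str.lower`, which follow Unicode's default, locale-independent rules; the variable name `lowered` is just for illustration.

```python
# Python's str methods ignore the locale and apply Unicode's default rules.
print("i".upper())  # I  (Turkish expects İ)
print("I".lower())  # i  (Turkish expects ı)
print("ı".upper())  # I  (this one happens to be right for Turkish)

# Lowercasing İ (U+0130) gives two code points: "i" followed by
# U+0307 (combining dot above), not a plain "i".
lowered = "\u0130".lower()
print(len(lowered))                       # 2
print([hex(ord(ch)) for ch in lowered])   # ['0x69', '0x307']
```

So three of the four conversions come out wrong for Turkish text, which is what the solutions below work around.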

Solution 1:

You should use PyICU, which supports locale-aware case conversion:

>>> from icu import UnicodeString, Locale
>>> tr = Locale("TR")
>>> s = UnicodeString("i")
>>> print(str(s.toUpper(tr)))
İ
>>> s = UnicodeString("I")
>>> print(str(s.toLower(tr)))
ı

Solution 2:

You can define your own hard-coded functions to handle the Turkish characters.

import re

def tr_upper(text):
    # Handle the Turkish-specific letters before calling upper().
    text = re.sub(r"i", "İ", text)
    text = re.sub(r"ı", "I", text)
    text = re.sub(r"ç", "Ç", text)
    text = re.sub(r"ş", "Ş", text)
    text = re.sub(r"ü", "Ü", text)
    text = re.sub(r"ğ", "Ğ", text)
    text = text.upper()  # for the rest, use the default upper
    return text


def tr_lower(text):
    # Handle the Turkish-specific letters before calling lower().
    text = re.sub(r"İ", "i", text)
    text = re.sub(r"I", "ı", text)
    text = re.sub(r"Ç", "ç", text)
    text = re.sub(r"Ş", "ş", text)
    text = re.sub(r"Ü", "ü", text)
    text = re.sub(r"Ğ", "ğ", text)
    text = text.lower()  # for the rest, use the default lower
    return text

Regular upper:

>>> print("ulvido".upper())
ULVIDO

Our custom upper:

>>> print(tr_upper("ulvido"))
ULVİDO

If you need this conversion a lot, you can put it in its own .py file. For example, save it as trtextstyle.py and import it into your projects.

If trtextstyle.py is in the same directory as your file and both are part of a package (for a standalone script, drop the leading dot):

from .trtextstyle import tr_upper, tr_lower

Hope this helps.
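As a possible alternative to the chain of `re.sub` calls, the same idea can be sketched with `str.translate`. This is a different technique from the one above, not a drop-in copy: the names `_TR_UPPER`, `_TR_LOWER`, `tr_upper_tbl` and `tr_lower_tbl` are made up for this example, and it assumes that only the i/ı pairs need special treatment because the default `upper()` and `lower()` already map ç, ş, ü and ğ correctly.

```python
# Translation tables for the two letter pairs that Python gets wrong.
# \u0130 is İ (capital I with dot above), \u0131 is ı (dotless small i).
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
_TR_LOWER = str.maketrans({"\u0130": "i", "I": "\u0131"})


def tr_upper_tbl(text):
    """Upper-case text with Turkish rules for i and ı (hypothetical helper)."""
    # Fix i/ı first, then let the default upper() handle everything else.
    return text.translate(_TR_UPPER).upper()


def tr_lower_tbl(text):
    """Lower-case text with Turkish rules for İ and I (hypothetical helper)."""
    # Fix İ/I first, then let the default lower() handle everything else.
    return text.translate(_TR_LOWER).lower()


print(tr_upper_tbl("ulvido"))    # ULVİDO
print(tr_lower_tbl("IŞIK"))      # ışık
print(tr_lower_tbl("İSTANBUL"))  # istanbul
```

The tables are built once at import time, so each call makes a single pass over the string instead of one pass per letter.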

Solution 3:

import re


def tr_capitalize(param_word):
    word_list = param_word.split(sep=" ")
    new_word = ""
    for word in word_list:
        if not word:
            continue  # skip empty pieces produced by consecutive spaces
        first_letter = word[0]
        last_part = word[1:]

        # Upper-case the first letter using Turkish rules.
        first_letter = re.sub(r"i", "İ", first_letter)
        first_letter = re.sub(r"ı", "I", first_letter)
        first_letter = re.sub(r"ç", "Ç", first_letter)
        first_letter = re.sub(r"ş", "Ş", first_letter)
        first_letter = re.sub(r"ü", "Ü", first_letter)
        first_letter = re.sub(r"ğ", "Ğ", first_letter)

        # Lower-case the rest of the word using Turkish rules.
        last_part = re.sub(r"İ", "i", last_part)
        last_part = re.sub(r"I", "ı", last_part)
        last_part = re.sub(r"Ç", "ç", last_part)
        last_part = re.sub(r"Ş", "ş", last_part)
        last_part = re.sub(r"Ü", "ü", last_part)
        last_part = re.sub(r"Ğ", "ğ", last_part)

        rebuilt_word = first_letter + last_part
        rebuilt_word = rebuilt_word.capitalize()
        new_word = new_word + " " + rebuilt_word

    new_word = new_word.strip()
    return new_word
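For comparison, here is a shorter sketch of the same per-word capitalization built from slicing and `str.replace`. It is a hypothetical variant, not the function above: the name `tr_capitalize_words` is made up, and unlike the original it collapses runs of whitespace and returns an empty string for empty input.

```python
def tr_capitalize_words(text):
    """Capitalize each word using Turkish i/ı rules (hypothetical variant)."""
    result = []
    for word in text.split():  # split() with no argument skips empty pieces
        # First character: apply the Turkish upper-case mapping, then upper().
        head = word[:1].replace("i", "\u0130").replace("\u0131", "I").upper()
        # Remaining characters: apply the Turkish lower-case mapping, then lower().
        tail = word[1:].replace("\u0130", "i").replace("I", "\u0131").lower()
        result.append(head + tail)
    return " ".join(result)


print(tr_capitalize_words("ışık ılık  istanbul"))  # Işık Ilık İstanbul
print(tr_capitalize_words("İZMİR"))                # İzmir
```

Slicing with `word[:1]` rather than indexing with `word[0]` means an empty string can never raise an `IndexError`.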
