How Can I Replace Unicode Characters In Python?
Solution 1:
There seem to be two problems with your program.
Firstly, you are passing the wrong code point to chr(). The hexadecimal code point of the character ’ is 0x2019, but you are passing in the decimal number 2019 (which equates to 0x7e3 in hexadecimal). So you need to do either:
temp = temp.replace(chr(0x2019), "'") # hexadecimal
or:
temp = temp.replace(chr(8217), "'") # decimal
in order to replace the character correctly.
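As a quick check of the point above, here is a minimal Python 3 sketch. The sample text in temp is made up for illustration and is not from the question.

```python
# Hypothetical sample text containing U+2019 (RIGHT SINGLE QUOTATION MARK).
temp = "It\u2019s a test"

# The hexadecimal and decimal spellings name the same character.
same_char = chr(0x2019) == chr(8217) == "\u2019"

# Decimal 2019 is code point U+07E3, which is a different character.
wrong_char = chr(2019) == "\u2019"

fixed = temp.replace(chr(0x2019), "'")
print(same_char, wrong_char, fixed)
```

With chr(2019), replace() finds nothing to replace and returns the string unchanged, which is why the original call appeared to do nothing.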
Secondly, you are getting the error because some other part of your program (probably the database backend) is trying to encode unicode strings using an encoding other than UTF-8. It's hard to be more precise about this, because you did not include the full traceback in your question. However, the reference to "charmap" suggests that a Windows code page is being used (but not cp1252), or an ISO encoding (but not iso8859-1, aka latin1), or possibly KOI8_R.
Anyway, the correct way to deal with this issue is to ensure all parts of your program (and especially the database) use UTF-8. If you do that, you won't have to mess about replacing characters anymore.
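To illustrate the difference, here is a small Python 3 sketch. The choice of cp437 is an assumption made only to reproduce a "charmap" style failure; the encoding actually used by the program in the question is unknown.

```python
text = "It\u2019s"

# cp437 is an assumed stand-in for a charmap-based encoding that has
# no slot for U+2019, so encoding fails.
try:
    text.encode("cp437")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# UTF-8 can represent every Unicode code point, so this always works.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")
print(utf8_bytes, round_trip == text)
```

Once every layer reads and writes UTF-8, the character survives the round trip and no replacement step is needed.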
Solution 2:
You can encode your unicode string to convert it to type str (Python 2):
a = u"dataàçççñññ"
type(a)  # <type 'unicode'>
a.encode('ascii', 'ignore')
This deletes the non-ASCII characters and returns 'data'. (In Python 3, the result is the bytes object b'data'.)
Alternatively, you can use the unicodedata module.
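The answer does not say how unicodedata would be used. One common approach, sketched here for Python 3, is to decompose accented characters with NFKD normalization and then drop whatever is still non-ASCII. The helper name strip_accents is hypothetical.

```python
import unicodedata

def strip_accents(text):
    """Return text with accents removed and other non-ASCII characters dropped."""
    # NFKD splits e.g. "à" into "a" plus a combining grave accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining marks are non-ASCII, so 'ignore' discards them.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(strip_accents("dataàçççñññ"))
```

Unlike a plain encode('ascii', 'ignore'), this keeps the base letters. Characters with no ASCII decomposition, such as ’ (U+2019), are still dropped rather than replaced.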
Solution 3:
# Python 2: decode the byte string as UTF-8 and compare it with the original
unicode_string = unicode(some_string, 'utf-8')
if unicode_string != some_string:
    some_string = 'whatever you want it to be'
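The unicode() built-in does not exist in Python 3. A rough analogue of the check above, assuming the goal is to detect any non-ASCII character and substitute a fallback value, might look like the sketch below. The function name and the fallback parameter are hypothetical.

```python
def replace_if_non_ascii(some_string, fallback="whatever you want it to be"):
    """Return fallback when some_string contains any non-ASCII character."""
    # str.isascii() is available from Python 3.7 onwards.
    if not some_string.isascii():
        return fallback
    return some_string

print(replace_if_non_ascii("plain text"))
print(replace_if_non_ascii("It\u2019s"))
```

Note that this replaces the whole string, as the Python 2 snippet does, rather than only the offending characters.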