
How Can I Replace Unicode Characters In Python?

I'm pulling Twitter data via their API and one of the tweets has a special character (the right apostrophe) and I keep getting an error saying that Python can't map or character ma

Solution 1:

There seem to be two problems with your program.

Firstly, you are passing the wrong code point to chr(). The hexadecimal code point of the character is 0x2019, but you are passing in the decimal number 2019 (which is 0x7e3 in hexadecimal). So you need to do either:

temp = temp.replace(chr(0x2019), "'") # hexadecimal

or:

temp = temp.replace(chr(8217), "'") # decimal

in order to replace the character correctly.
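If more than one typographic character turns up in the tweets, a translation table avoids chaining replace() calls. The following is a minimal sketch: the names PUNCTUATION_MAP and to_ascii_quotes are hypothetical, and the set of characters covered is an assumption rather than anything taken from the question.

```python
# Map typographic punctuation to ASCII equivalents (illustrative, not exhaustive).
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (the character in the question)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def to_ascii_quotes(text):
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(PUNCTUATION_MAP)


# chr(0x2019) and chr(8217) are the same character, written in hex and decimal.
sample = "It" + chr(0x2019) + "s a \u201ctest\u201d"
print(to_ascii_quotes(sample))
```

str.translate() makes a single pass over the string, so extending the table with further characters does not add extra passes.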

Secondly, you are getting the error because some other part of your program (probably the database backend) is trying to encode unicode strings using some encoding other than UTF-8. It's hard to be more precise about this, because you did not include the full traceback in your question. However, the reference to "charmap" suggests that a Windows code page is being used (but not cp1252, which can encode this character); or an ISO encoding (but not iso8859-1, aka latin1); or possibly KOI8-R.

Anyway, the correct way to deal with this issue is to ensure all parts of your program (and especially the database) use UTF-8. If you do that, you won't have to mess about replacing characters anymore.
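The difference between the encodings can be seen directly by encoding a short string that contains the right apostrophe. This is only an illustration: KOI8-R is used as the failing example because it is one of the candidates mentioned above, not because it is known to be the encoding in use in the original program.

```python
text = "It\u2019s"

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = text.encode("utf-8")

# cp1252 happens to contain U+2019 (as byte 0x92), so it succeeds as well.
cp1252_bytes = text.encode("cp1252")

# KOI8-R has no slot for U+2019, so encoding raises UnicodeEncodeError.
try:
    text.encode("koi8_r")
    koi8_failed = False
except UnicodeEncodeError as exc:
    koi8_failed = True
    print(exc)
```

When the failing step is a file write rather than a database call, passing encoding="utf-8" to open() avoids relying on the platform's default encoding.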

Solution 2:

You can encode your unicode string to convert it to type str:

a = u"dataàçççñññ"
type(a)
a.encode('ascii', 'ignore')

This deletes the special characters and returns 'data' (b'data' on Python 3).

Alternatively, you can use the unicodedata module.
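The answer does not say how unicodedata would be used. One common approach, sketched here as an assumption, is to decompose accented letters and drop the combining marks, which keeps the base letters instead of deleting them. The helper name strip_accents is hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Decompose accented letters, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("dataàçççñññ"))  # dataacccnnn
```

Note that this does not help with the right apostrophe itself: U+2019 has no decomposition, so it passes through unchanged and still needs an explicit replacement.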

Solution 3:

# Python 2 only: decode the byte string, then compare it with the original
unicode_string = unicode(some_string, 'utf-8')
if unicode_string != some_string:
    some_string = 'whatever you want it to be'
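The snippet above relies on the Python 2 unicode() built-in, which does not exist in Python 3. Its effect appears to be replacing any string that contains non-ASCII characters, so a rough Python 3 counterpart might look like the sketch below. The function name replace_if_non_ascii and its fallback parameter are hypothetical, and str.isascii() requires Python 3.7 or later.

```python
def replace_if_non_ascii(some_string, fallback="whatever you want it to be"):
    """Return fallback when some_string contains any non-ASCII character."""
    return some_string if some_string.isascii() else fallback


print(replace_if_non_ascii("It's"))        # unchanged: pure ASCII
print(replace_if_non_ascii("It\u2019s"))   # replaced: contains U+2019
```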
