Why Does Python 2.x Throw An Exception With String Formatting + Unicode?
Solution 1:
Python 2 implicitly will try and encode unicode values to strings when you mix unicode and string objects, or it will try and decode byte strings to unicode.
You are mixing unicode, byte strings and a custom object, and you are triggering a sequence of encodings and decodings that doesn't mix.
In this case, your Foo() value is interpolated as a string (str(Foo()) is used), and the u'asdf' interpolation triggers a decode of the template so far (so with the UTF-8 Foo() value) to interpolate the unicode string. This decode fails as the ASCII codec cannot decode the \xe6\x9e\x97 UTF-8 byte sequence already interpolated.
You should always explicitly encode Unicode values to bytestrings or decode byte strings to Unicode before mixing types, as the corner cases are complex.
Explicitly converting to unicode() works:
>>> print"this should break %s %s" % (unicode(Foo()), u'asdf')
this should break 林覺民謝冰心故居 asdf
as the output is turned into a unicode string:
>>> "this should break %s %s" % (unicode(Foo()), u'asdf')
u'this should break \u6797\u89ba\u6c11\u8b1d\u51b0\u5fc3\u6545\u5c45 asdf'while otherwise you'd end up with a byte string:
>>> "this should break %s %s" % (Foo(), 'asdf')
'this should break \xe6\x9e\x97\xe8\xa6\xba\xe6\xb0\x91\xe8\xac\x9d\xe5\x86\xb0\xe5\xbf\x83\xe6\x95\x85\xe5\xb1\x85 asdf'(note that asdf is left a bytestring too).
Alternatively, use a unicode template:
>>> u"this should break %s %s" % (Foo(), u'asdf')
u'this should break \u6797\u89ba\u6c11\u8b1d\u51b0\u5fc3\u6545\u5c45 asdf'
Post a Comment for "Why Does Python 2.x Throw An Exception With String Formatting + Unicode?"