Python: Decoding Base64 Encoded Strings Within An HTML File And Replacing These Strings With Their Decoded Counterpart
Solution 1:
Your input is a bit oddly formatted (with a trailing unmatched single quote, for instance), so make sure you're not doing unnecessary work or parsing content in a weird way.
Anyway, assuming you have your input in the form it's given, you have to decode it using base64 in the way you just did, then decode using the given encoding to get a string rather than a bytestring:
import base64
inp = 'charset=utf-8;base64,I2JhY2tydW5uZXJfUV81c3R7aGVpZ2h0OjkzcHg7fWJhY2tydW5uZXJfUV81c3R7ZGlzcGxheTpibG9jayFpbXBvcnRhbnQ7fQ=="'
head, tail = inp.split(';')   # 'charset=utf-8' and 'base64,<payload>'
_, enc = head.split('=')      # TODO: check if the beginning is "charset"
_, msg = tail.split(',')      # TODO: check that the beginning is "base64"
plaintext_bytes = base64.b64decode(msg)
plaintext_str = plaintext_bytes.decode(enc)
Now the two results are
>>> plaintext_bytes
b'#backrunner_Q_5st{height:93px;}backrunner_Q_5st{display:block!important;}'
>>> plaintext_str
'#backrunner_Q_5st{height:93px;}backrunner_Q_5st{display:block!important;}'
As you can see, the bytes were already readable; that is because the content is pure ASCII. Also note that I didn't remove the trailing quote from your string: with its default validate=False, base64.b64decode discards characters outside the base64 alphabet, so the stray quote after the two equals signs is ignored.
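To see why the ASCII case is the easy one, here is a small sketch with non-ASCII text. The word 'café' is just an illustrative value, not something from your input; the point is that the bytes stop looking like the string, and decoding with the wrong encoding either garbles the text or raises.

```python
# Non-ASCII text: the bytes and the string no longer look alike.
text = 'café'
raw = text.encode('utf-8')               # 'é' becomes the two bytes \xc3\xa9

decoded_right = raw.decode('utf-8')      # round-trips to the original text
decoded_wrong = raw.decode('latin-1')    # no error, but the text is garbled

# A stricter codec refuses the bytes outright instead of garbling them.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

print(raw)            # b'caf\xc3\xa9'
print(decoded_right)  # café
print(decoded_wrong)  # cafÃ©
print(type(ascii_error).__name__)  # UnicodeDecodeError
```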
In a nutshell, strings are a somewhat abstract representation of text in Python 3, and you need a specific encoding if you want to represent the text as a stream of ones and zeros (which you need when you transfer data from one place to another). When you receive text as bytes, you have to know how it was encoded in order to decode it and obtain a proper string. If the text is pure ASCII, the common encodings (UTF-8, Latin-1 and so on) all produce the same bytes, so the choice hardly matters; once more general characters appear, using the wrong encoding will either raise an error or silently give you garbled text.
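To apply the same steps to a whole HTML document, one option is a regular expression with a replacement function. This is only a sketch under assumptions: the names B64_PATTERN, decode_match and replace_base64 are made up for illustration, and the pattern assumes every chunk looks like "charset=<encoding>;base64,<payload>" as in your sample. Real files may use a different layout, so adjust the pattern to what your HTML actually contains.

```python
import base64
import binascii
import re

# Assumed layout: charset=<encoding>;base64,<payload>, as in the sample input.
B64_PATTERN = re.compile(r'charset=([\w-]+);base64,([A-Za-z0-9+/]+={0,2})')


def decode_match(match):
    """Return the decoded text for one match, or the original text on failure."""
    encoding, payload = match.group(1), match.group(2)
    try:
        return base64.b64decode(payload, validate=True).decode(encoding)
    except (binascii.Error, UnicodeDecodeError, LookupError):
        # Bad padding, bytes that don't fit the charset, or an unknown
        # charset name: leave the chunk untouched rather than guess.
        return match.group(0)


def replace_base64(html):
    """Replace every matched base64 chunk in html with its decoded text."""
    return B64_PATTERN.sub(decode_match, html)


sample = 'charset=utf-8;base64,aGVsbG8='
print(replace_base64(sample))  # hello
```

Because the pattern stops at the padding, a stray quote after the payload is simply left in place in the output. Leaving undecodable chunks unchanged is a design choice; you may prefer to log or raise instead.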