What Is The Deal About Https When Using Lxml?
I am using lxml to parse html files given urls. For example: link = 'https://abc.com/def' htmltree = lxml.html.parse(link) My code is working well for most of the cases, the ones
Solution 1:
I don't know what's happening, but I get the same errors. HTTPS is probably not supported. You can easily work around this with urllib2
, though:
from lxml import html
from urllib2 import urlopen
html.parse(urlopen('https://duckduckgo.com'))
Solution 2:
From the lxml
documentation:
lxml can parse from a local file, an HTTP URL or an FTP URL
I don't see HTTPS in that sentence anywhere, so I assume it is not supported.
An easy workaround would be to retrieve the file using some other library that does support HTTPS, such as urllib2
, and pass the retrieved document as a string to lxml
.
Post a Comment for "What Is The Deal About Https When Using Lxml?"