
What Is The Deal About Https When Using Lxml?

I am using lxml to parse HTML files given URLs. For example:

import lxml.html

link = 'https://abc.com/def'
htmltree = lxml.html.parse(link)

My code is working well for most of the cases, the ones …

Solution 1:

I don't know exactly what's happening, but I get the same errors. HTTPS is probably not supported. You can easily work around this with urllib2 (urllib.request in Python 3), though:

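from lxml import html
from urllib.request import urlopen  # Python 3; on Python 2: from urllib2 import urlopen

# urlopen() handles the HTTPS connection; lxml parses the file-like response it returns
tree = html.parse(urlopen('https://duckduckgo.com'))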
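A slightly fuller sketch of the same workaround, assuming Python 3 (where urllib2 became urllib.request). The helper name parse_url and the 10-second timeout are illustrative choices, not part of lxml. The trick relies on lxml.html.parse() accepting any file-like object, which is also why an in-memory io.BytesIO buffer can stand in for the network response when trying it offline.

```python
import io
from urllib.request import urlopen

from lxml import html


def parse_url(url, timeout=10):
    """Hypothetical helper: fetch url with urllib (which handles HTTPS)
    and parse the response with lxml.

    lxml.html.parse() accepts file-like objects, and urlopen() returns one,
    so the response can be handed over directly. base_url keeps relative
    links resolvable against the original URL.
    """
    with urlopen(url, timeout=timeout) as response:
        return html.parse(response, base_url=url)


# The same parse call works on any file-like object, e.g. an in-memory buffer,
# which makes it easy to check the parsing side without network access.
offline_tree = html.parse(io.BytesIO(b"<html><body><p>hello</p></body></html>"))
print(offline_tree.getroot().findtext(".//p"))  # hello
```

Usage would look like parse_url('https://duckduckgo.com').getroot(), returning the same kind of tree that lxml.html.parse() gives for an HTTP URL.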

Solution 2:

From the lxml documentation:

lxml can parse from a local file, an HTTP URL or an FTP URL

HTTPS is not mentioned in that sentence, so I assume it is not supported (lxml's URL loading goes through libxml2's built-in network client, which does not handle HTTPS).

An easy workaround is to retrieve the file with some other library that does support HTTPS, such as urllib2 (urllib.request in Python 3), and pass the retrieved document to lxml as a string, e.g. with lxml.html.fromstring().
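A minimal sketch of that download-then-parse approach, assuming Python 3's urllib.request as the HTTPS-capable library. The function names fetch_bytes and parse_document are hypothetical and only serve to separate the two steps.

```python
from urllib.request import urlopen

from lxml import html


def fetch_bytes(url, timeout=10):
    """Hypothetical helper: download the raw document bytes.
    urllib takes care of the HTTPS connection."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_document(data, base_url=None):
    """Hypothetical helper: parse an already-downloaded HTML document
    (bytes or str) with lxml.html.fromstring()."""
    return html.fromstring(data, base_url=base_url)


# Offline demonstration: parse a document that is already in memory.
root = parse_document(b"<html><head><title>Example</title></head><body></body></html>")
print(root.findtext(".//title"))  # Example
```

For a real page this would be parse_document(fetch_bytes(url), base_url=url). Keeping the download and the parsing apart means any HTTPS-capable client can supply the bytes, and passing bytes rather than str generally lets lxml use the encoding declared in the page itself.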
