
Beautifulsoup Httpresponse Has No Attribute Encode

I'm trying to get BeautifulSoup working with a URL, like the following:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://proxies.org')
soup =

Solution 1:

It's not working because urlopen returns an HTTPResponse object, and you were treating that as if it were the HTML itself. Call the .read() method on the response to get the HTML as bytes:

from urllib.request import urlopen
from bs4 import BeautifulSoup

response = urlopen("http://proxies.org")
html = response.read()  # bytes, not str
soup = BeautifulSoup(html.decode("utf-8"), "html.parser")
print(soup.find_all('a'))

Note that read() returns bytes, so the call you want is html.decode("utf-8"), not html.encode("utf-8"). In Python 3, bytes objects have a decode method but no encode method.
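To see why decode is the right direction, here is a minimal sketch that needs no network access. The variable raw is only a stand-in (an assumption for illustration) for the bytes that response.read() would return:

```python
# Stand-in for the bytes that response.read() would return (no network needed).
raw = "<a href='/menu'>caf\u00e9</a>".encode("utf-8")

# bytes can be decoded to str, but they cannot be encoded again.
print(type(raw).__name__)      # bytes
print(hasattr(raw, "decode"))  # True
print(hasattr(raw, "encode"))  # False

# Decoding turns the raw bytes into text (str).
text = raw.decode("utf-8")
print(type(text).__name__)     # str
```

Calling .encode on the response object or on its bytes fails for the same reason: encode goes from str to bytes, and neither of those is a str.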

Solution 2:

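Read the response and decode it before passing it to BeautifulSoup. Here html is the object returned by urlopen; note that it must be decode, because the bytes returned by read() have no encode method:

soup = BeautifulSoup(html.read().decode('utf-8'), "html.parser")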

Solution 3:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://proxies.org")
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all('a'))

  1. First, urlopen returns a file-like object.
  2. BeautifulSoup accepts a file-like object and decodes it automatically, so you do not need to worry about the encoding yourself.
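If you do need the decoded text yourself (for example, to log it), the charset does not have to be hardcoded as UTF-8. The sketch below assumes the server declares a charset in its Content-Type header. decode_body is a hypothetical helper, not part of urllib or BeautifulSoup, and a plain email.message.Message stands in for response.headers, which exposes the same get_content_charset() method:

```python
from email.message import Message


def decode_body(raw: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode raw bytes using the charset from the Content-Type header.

    Hypothetical helper for illustration; falls back to `default`
    when the header declares no charset.
    """
    charset = headers.get_content_charset() or default
    return raw.decode(charset, errors="replace")


# Stand-ins for response.headers and response.read() (no network needed).
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
raw = "<p>caf\u00e9</p>".encode("iso-8859-1")

print(decode_body(raw, headers))
```

With a real response this would be called as decode_body(response.read(), response.headers). Passing the response object straight to BeautifulSoup, as in the solution above, remains the simpler option.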

From the BeautifulSoup documentation:

To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle:

from bs4 import BeautifulSoup

soup = BeautifulSoup(open("index.html"))

soup = BeautifulSoup("<html>data</html>")

First, the document is converted to Unicode, and HTML entities are converted to Unicode characters.
