Extracting Information From A Table On A Website Using Python, Lxml & Xpath
After a lot of hard work, I managed to extract some information that I needed from a table on this website: http://gbgfotboll.se/serier/?scr=table&ftid=57108 From the table 'K
Solution 1:
I've just done it:
from io import BytesIO
import urllib2 as net
from lxml import etree
import lxml.html

request = net.Request("http://gbgfotboll.se/serier/?scr=table&ftid=57108")
response = net.urlopen(request)
data = response.read()

collected = []  # list of tuples: [(col1, col2, ...), (col1, col2, ...)]

dom = lxml.html.parse(BytesIO(data))
# all table rows
xpatheval = etree.XPathDocumentEvaluator(dom)
rows = xpatheval('//div[@id="content-primary"]/table[1]/tbody/tr')

for row in rows:
    columns = row.findall("td")
    collected.append((
        columns[0].find("a").text.encode("utf8"),  # Lag
        columns[1].text,  # S
        columns[5].text,  # GM-IM
        columns[7].text,  # P - last column
    ))

for i in collected:
    print i
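The code above is Python 2 (urllib2 and the print statement are gone in Python 3). For readers on Python 3, the same row-and-column extraction can be tried offline. This is only a minimal sketch, not the answer's original code: the HTML string is a made-up stand-in for the real page (team names and numbers are invented, and only the div/table/tbody/tr/td structure mirrors the XPath above), and extract_rows is a hypothetical helper name.

```python
import lxml.html

# Invented sample data standing in for the real page. Each row has eight
# cells so that indexes 0, 1, 5 and 7 exist, as in the answer's code.
SAMPLE_HTML = """
<html><body><div id="content-primary">
<table>
  <tbody>
    <tr><td><a href="#">Team A</a></td><td>10</td><td>7</td><td>2</td>
        <td>1</td><td>20-8</td><td>12</td><td>23</td></tr>
    <tr><td><a href="#">Team B</a></td><td>10</td><td>5</td><td>3</td>
        <td>2</td><td>15-10</td><td>5</td><td>18</td></tr>
  </tbody>
</table>
</div></body></html>
"""


def extract_rows(root):
    """Return (Lag, S, GM-IM, P) tuples from the first table's body rows."""
    collected = []
    for row in root.xpath('//div[@id="content-primary"]/table[1]/tbody/tr'):
        columns = row.findall("td")
        if len(columns) < 8:
            continue  # skip rows that do not have all eight cells
        link = columns[0].find("a")
        if link is None:
            continue  # skip rows whose first cell has no link
        collected.append((
            link.text,        # Lag
            columns[1].text,  # S
            columns[5].text,  # GM-IM
            columns[7].text,  # P - last column
        ))
    return collected


root = lxml.html.fromstring(SAMPLE_HTML)
rows = extract_rows(root)
for row in rows:
    print(row)
```

In Python 3 the cell text is already a str, so the .encode("utf8") call is not needed for printing. The two guards are an addition of this sketch; the original code assumes every row has eight cells and a link in the first one.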
You could also pass the URL to lxml.html.parse() directly rather than calling urllib2. You could also select the target table by its class attribute, like this:
# new version
from lxml import etree
import lxml.html

collected = []  # list of tuples: [(col1, col2, ...), (col1, col2, ...)]

dom = lxml.html.parse("http://gbgfotboll.se/serier/?scr=table&ftid=57108")
# all table rows
xpatheval = etree.XPathDocumentEvaluator(dom)
rows = xpatheval("""//div[@id="content-primary"]/table[
    contains(concat(" ", @class, " "), " clTblStandings ")]/tbody/tr""")

for row in rows:
    columns = row.findall("td")
    collected.append((
        columns[0].find("a").text.encode("utf8"),  # Lag
        columns[1].text,  # S
        columns[5].text,  # GM-IM
        columns[7].text,  # P - last column
    ))

for i in collected:
    print i
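The contains(concat(" ", @class, " "), " clTblStandings ") predicate matches the class as a whole space-separated token, whereas a plain contains(@class, "clTblStandings") would also match longer class names that merely contain that string. A small Python 3 sketch of the difference, using invented markup (the table ids and the class names grid, clTblStandingsOld and other are hypothetical, chosen only to show the effect):

```python
import lxml.html

# Invented markup: three tables whose class attributes differ slightly.
HTML = """
<div>
  <table id="t1" class="grid clTblStandings"></table>
  <table id="t2" class="clTblStandingsOld"></table>
  <table id="t3" class="other"></table>
</div>
"""
root = lxml.html.fromstring(HTML)

# Pad @class with spaces so the class name must appear as a whole token.
TOKEN_MATCH = '//table[contains(concat(" ", @class, " "), " clTblStandings ")]'
# Plain substring test: also matches "clTblStandingsOld".
SUBSTRING_MATCH = '//table[contains(@class, "clTblStandings")]'

token_ids = [t.get("id") for t in root.xpath(TOKEN_MATCH)]
substring_ids = [t.get("id") for t in root.xpath(SUBSTRING_MATCH)]

print(token_ids)      # only t1
print(substring_ids)  # t1 and t2
```

The padded form still finds the table if other classes are later added before or after clTblStandings, which is why it is often preferred over comparing @class to an exact string.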