Scrapy: Scraping A List Of Links
This question is somewhat of a follow-up to a question I asked previously. I am trying to scrape a website which contains some links on the first page. Something similar to th
Solution 1:
I'm assuming that the urls you want to follow lead to pages with the same or similar structure. If that's the case, you should do something like this:
from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import Selector
from scrapy.http import Request

class YourCrawler(CrawlSpider):
    name = 'yourCrawler'
    allowed_domains = ['domain.com']
    start_urls = ["http://www.domain.com/example/url"]

    def parse(self, response):
        # Parse any elements you need from the start_urls and, optionally,
        # store them as Items. See http://doc.scrapy.org/en/latest/topics/items.html
        s = Selector(response)
        urls = s.xpath('//div[@id="example"]//a/@href').extract()
        for url in urls:
            # If the extracted hrefs are relative, join them with response.url first.
            yield Request(url, callback=self.parse_following_urls, dont_filter=True)

    def parse_following_urls(self, response):
        # Parsing rules go here
        pass

Otherwise, if the urls you want to follow lead to pages with a different structure, you can define specific methods for them (something like parse1, parse2, parse3...).
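As a rough illustration of that second case, here is a minimal sketch of how parse could route each link to a different callback based on its URL; the /products/ and /reviews/ patterns and the callback names are hypothetical and would need to match your site:

    def parse(self, response):
        s = Selector(response)
        urls = s.xpath('//div[@id="example"]//a/@href').extract()
        for url in urls:
            # Pick a callback per link based on a (hypothetical) URL pattern.
            if '/products/' in url:
                yield Request(url, callback=self.parse_product, dont_filter=True)
            elif '/reviews/' in url:
                yield Request(url, callback=self.parse_review, dont_filter=True)
            else:
                yield Request(url, callback=self.parse_other, dont_filter=True)

Each of parse_product, parse_review and parse_other would then hold the extraction logic for its own page type.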
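Going back to the first case, the parse_following_urls stub is where the actual extraction goes. A minimal sketch, assuming a hypothetical item with title and price fields and hypothetical selectors for the followed pages:

    from scrapy.item import Item, Field

    class ExampleItem(Item):
        # Hypothetical fields; declare whatever data you actually want to keep.
        title = Field()
        price = Field()

    # Inside YourCrawler:
    def parse_following_urls(self, response):
        s = Selector(response)
        item = ExampleItem()
        # xpath(...).extract() returns a list of all matching nodes.
        item['title'] = s.xpath('//h1/text()').extract()
        item['price'] = s.xpath('//span[@class="price"]/text()').extract()
        yield item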