Scrapy: Scraping A List Of Links
This question is a follow-up to one I asked previously. I am trying to scrape a website that contains some links on its first page.
Solution 1:
I'm assuming that the URLs you want to follow lead to pages with the same or a similar structure. If that's the case, you should do something like this:
from scrapy.contrib.spiders import CrawlSpider  # in newer Scrapy versions: from scrapy.spiders import CrawlSpider
from scrapy.selector import Selector
from scrapy.http import Request

class YourCrawler(CrawlSpider):
    name = 'yourCrawler'
    allowed_domains = ['domain.com']  # must be a list of domains
    start_urls = ["http://www.domain.com/example/url"]

    def parse(self, response):
        # Parse any elements you need from the start_urls and, optionally,
        # store them as Items.
        # See http://doc.scrapy.org/en/latest/topics/items.html
        s = Selector(response)
        urls = s.xpath('//div[@id="example"]//a/@href').extract()
        for url in urls:
            yield Request(url, callback=self.parse_following_urls, dont_filter=True)

    def parse_following_urls(self, response):
        # Parsing rules go here
        pass
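One caveat with the loop above: the hrefs extracted from the page are often relative (e.g. "/example/page1"), and a Request needs an absolute URL. In recent Scrapy versions you can call response.urljoin(url) before yielding the Request; the underlying behavior is the same as the standard library's urljoin, sketched here (the base and hrefs are illustrative, not from the original answer):

```python
from urllib.parse import urljoin

base = "http://www.domain.com/example/url"
hrefs = ["/example/page1", "page2", "http://www.domain.com/example/page3"]

# urljoin resolves relative hrefs against the page URL
# and leaves already-absolute URLs untouched.
absolute = [urljoin(base, href) for href in hrefs]
```

So inside parse() you would yield Request(response.urljoin(url), ...) instead of Request(url, ...) whenever the site uses relative links.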
Otherwise, if the URLs you want to follow lead to pages with different structures, you can define a specific callback method for each of them (something like parse1, parse2, parse3...).
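One plain-Python way to implement that dispatch is to match each URL against a pattern and pick the right callback name before yielding the Request. The patterns and method names below are hypothetical, just to show the shape of the routing:

```python
import re

# Hypothetical routing table: URL pattern -> name of the parse method.
# Neither the patterns nor the callback names come from the original answer.
ROUTES = [
    (re.compile(r'/product/'), 'parse_product'),
    (re.compile(r'/review/'), 'parse_review'),
]

def choose_callback(url, default='parse_other'):
    """Return the name of the parse method that should handle this URL."""
    for pattern, callback in ROUTES:
        if pattern.search(url):
            return callback
    return default
```

Inside parse() you would then yield Request(url, callback=getattr(self, choose_callback(url))), so each page type lands in its own parsing method.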