Setting Sticky Cookie In Scrapy
The website I am scraping has javascript that sets a cookie and checks it in the backend to make sure js is enabled. Extracting the cookie from the html code is simple enough, but how do I then make Scrapy send it along with every subsequent request, i.e. make it sticky?
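(To make the setup concrete, here is a minimal sketch of that extract-and-attach step; the URL, cookie name and regex are hypothetical, since the actual page isn't shown. Passing cookies= to a Request hands the value to Scrapy's built-in CookiesMiddleware, which stores it in the cookiejar and re-sends it on later requests:)

    import re
    from scrapy.spider import BaseSpider
    from scrapy.http import Request

    class JsCookieSpider(BaseSpider):
        name = 'js_cookie'
        start_urls = ['http://www.example.com/']

        def parse(self, response):
            # Hypothetical: the page sets the check cookie in inline JS,
            # e.g.  document.cookie = 'jstest=abc123';
            m = re.search(r"document\.cookie\s*=\s*'jstest=(\w+)'", response.body)
            if not m:
                return
            # cookies= goes into the cookiejar via CookiesMiddleware, so the
            # cookie is re-sent automatically on every following request
            yield Request('http://www.example.com/listing',
                          cookies={'jstest': m.group(1)},
                          callback=self.parse_listing)

        def parse_listing(self, response):
            self.log('fetched %s with the JS cookie set' % response.url)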
Solution 1:
I haven't used InitSpider before. Looking at the code in scrapy.contrib.spiders.init.InitSpider, I see:
    def initialized(self, response=None):
        """This method must be set as the callback of your last initialization
        request. See self.init_request() docstring for more info.
        """
        self._init_complete = True
        reqs = self._postinit_reqs[:]
        del self._postinit_reqs
        return reqs

    def init_request(self):
        """This function should return one initialization request, with the
        self.initialized method as callback. When the self.initialized method
        is called this spider is considered initialized. If you need to perform
        several requests for initializing your spider, you can do so by using
        different callbacks. The only requirement is that the final callback
        (of the last initialization request) must be self.initialized.

        The default implementation calls self.initialized immediately, and
        means that no initialization is needed. This method should be
        overridden only when you need to perform requests to initialize your
        spider
        """
        return self.initialized()
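So the spider holds back its normal crawl until self.initialized() is called, and init_request starts the handshake that gets you there. The docstring also allows a chain of several requests; here is a sketch of such a chain, with hypothetical URLs and form fields (my reading of the API, not code from the question):

    from scrapy.http import Request, FormRequest
    from scrapy.contrib.spiders.init import InitSpider

    class ChainedInitSpider(InitSpider):
        name = 'chained_init'
        start_urls = ['http://www.example.com/listing']

        def init_request(self):
            # First initialization request; start_urls are held back for now
            return Request('http://www.example.com/login', callback=self.do_login)

        def do_login(self, response):
            # Second request in the chain; the form field names are hypothetical
            return FormRequest.from_response(response,
                                             formdata={'user': 'me', 'pass': 'secret'},
                                             callback=self.after_login)

        def after_login(self, response):
            # Final callback of the chain: it must return self.initialized(),
            # which releases the queued post-initialization requests
            if 'Welcome' in response.body:
                return self.initialized()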
You wrote:
I can see that the content is available in check_test_page, the cookie works perfectly. But it never even gets to parse_page since CrawlSpider without the right cookie doesn't see any links.
I think parse_page is not called because you didn't make a Request with self.initialized as the callback.
I think this should work:
    def check_test_page(self, response):
        # Note: on Python 3 / modern Scrapy, response.body is bytes;
        # use response.text there instead
        if 'Welcome' in response.body:
            return self.initialized()
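For completeness, the init_request that pairs with this callback might look like the following (the test URL is hypothetical):

    def init_request(self):
        # The callback chain started here has to end in self.initialized
        return Request(url='http://www.example.com/test_page',
                       callback=self.check_test_page)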
Solution 2:
It turned out that InitSpider is a BaseSpider, not a CrawlSpider. So it looks like 1) there's no way to use CrawlSpider in this situation, and 2) there's no way to set a sticky cookie.
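(For what it's worth, a workaround sometimes used, though not from the original thread, is to skip InitSpider and bootstrap the cookie in start_requests of a plain CrawlSpider; CookiesMiddleware keeps re-sending whatever is passed via cookies=, which is effectively the sticky behaviour asked about. A sketch with hypothetical URLs and cookie name:)

    import re
    from scrapy.http import Request
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

    class StickyCrawlSpider(CrawlSpider):
        name = 'sticky_crawl'
        start_urls = ['http://www.example.com/listing']
        rules = (Rule(SgmlLinkExtractor(), callback='parse_page', follow=True),)

        def start_requests(self):
            # Fetch the page whose JS sets the cookie before crawling anything
            return [Request('http://www.example.com/', callback=self.bootstrap_cookie)]

        def bootstrap_cookie(self, response):
            # Hypothetical extraction of the JS-set cookie value
            m = re.search(r"document\.cookie\s*=\s*'jstest=(\w+)'", response.body)
            if not m:
                return
            for url in self.start_urls:
                # No explicit callback: these go through CrawlSpider's own
                # parse(), so the rules still apply; CookiesMiddleware keeps
                # the cookie sticky from here on
                yield Request(url, cookies={'jstest': m.group(1)})

        def parse_page(self, response):
            self.log('crawled %s' % response.url)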