
Setting Sticky Cookie In Scrapy

The website I am scraping has JavaScript that sets a cookie and checks it in the backend to make sure JavaScript is enabled. Extracting the cookie from the HTML code is simple enough, but …
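Here is a minimal sketch of the extraction step the question calls simple. It assumes the page sets the cookie with a literal `document.cookie = "name=value; ..."` assignment inside a script tag. The regex, the function name `extract_js_cookie`, and the `js_check` cookie name in the usage are hypothetical and would need adjusting to the real page.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: assumes a literal assignment such as
#   document.cookie = "js_check=abc123; path=/";
# Captures the cookie name and the value up to the first ';' or closing quote.
COOKIE_RE = re.compile(r'document\.cookie\s*=\s*["\']([^=;"\']+)=([^;"\']*)')


def extract_js_cookie(html: str) -> Optional[Tuple[str, str]]:
    """Return (name, value) from the first document.cookie assignment, or None."""
    match = COOKIE_RE.search(html)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


if __name__ == "__main__":
    page = '<script>document.cookie = "js_check=abc123; path=/";</script>'
    print(extract_js_cookie(page))
```

In a spider, the pair could then be sent with `scrapy.Request(url, cookies={name: value}, callback=...)`; Scrapy's cookies middleware should then keep it in its cookie jar and send it again on later requests to the same domain.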

Solution 1:

I haven't used InitSpider before.

Looking at the code in scrapy.contrib.spiders.init.InitSpider, I see:

def initialized(self, response=None):
    """This method must be set as the callback of your last initialization
    request. See self.init_request() docstring for more info.
    """
    self._init_complete = True
    reqs = self._postinit_reqs[:]
    del self._postinit_reqs
    return reqs

def init_request(self):
    """This function should return one initialization request, with the
    self.initialized method as callback. When the self.initialized method
    is called this spider is considered initialized. If you need to perform
    several requests for initializing your spider, you can do so by using
    different callbacks. The only requirement is that the final callback
    (of the last initialization request) must be self.initialized. 

    The default implementation calls self.initialized immediately, and
    means that no initialization is needed. This method should be
    overridden only when you need to perform requests to initialize your
    spider
    """
    return self.initialized()
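To make the hand-off concrete, here is a toy, Scrapy-free model of what those two methods imply. It assumes, as InitSpider's `start_requests()` does as far as I can tell, that the spider's normal start requests are parked in `_postinit_reqs` and only released when `initialized()` runs. `ToyInitSpider`, `LoginFirst`, and the plain-string "requests" are hypothetical stand-ins; a real spider works with `scrapy.Request` objects and callbacks.

```python
from typing import Iterable, List, Optional


class ToyInitSpider:
    """Toy model of InitSpider's hand-off (not Scrapy's code).

    Requests are plain strings here; a real spider yields scrapy.Request objects.
    """

    def __init__(self, start_urls: Iterable[str]):
        self.start_urls = list(start_urls)
        self._init_complete = False
        self._postinit_reqs: List[str] = []

    def start_requests(self) -> List[str]:
        # Park the normal start requests until initialization finishes
        self._postinit_reqs = list(self.start_urls)
        return self.init_request()

    def init_request(self) -> List[str]:
        # Default behaviour: nothing to do, so release the parked requests now
        return self.initialized()

    def initialized(self, response: Optional[str] = None) -> List[str]:
        # Like the excerpt: flag completion, then hand back (and clear) parked requests
        self._init_complete = True
        reqs = self._postinit_reqs[:]
        self._postinit_reqs = []
        return reqs


class LoginFirst(ToyInitSpider):
    """Hypothetical subclass whose init step is a JavaScript/cookie check page."""

    def init_request(self) -> List[str]:
        # Stand-in for Request(test_url, callback=self.check_test_page)
        return ["GET /js-check"]

    def check_test_page(self, body: str) -> List[str]:
        # The parked requests only run once this returns self.initialized()
        if "Welcome" in body:
            return self.initialized()
        return []
```

The point the model makes: if no callback ever returns `self.initialized()`, the parked requests (and therefore any later parsing callbacks) never run.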

You wrote:

I can see that the content is available in check_test_page; the cookie works perfectly. But it never even gets to parse_page, since CrawlSpider without the right cookie doesn't see any links.

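I think parse_page is never called because nothing ever invokes self.initialized: it is not the callback of your last initialization request, and check_test_page doesn't call it either. Until it runs, the spider is not considered initialized and its regular requests stay queued.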

I think this should work:

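def check_test_page(self, response):
    # The JS/cookie check passed: mark the spider as initialized so the
    # queued requests are released (response.text is the decoded body)
    if 'Welcome' in response.text:
        return self.initialized()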
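A note for Python 3: in current Scrapy, `response.body` is `bytes`, so testing a `str` against it raises `TypeError`, while `response.text` (the body decoded with the response's encoding) or a bytes literal such as `b'Welcome'` works. The snippet below shows the difference with a plain bytes value standing in for the response body; the variable names are illustrative only.

```python
# A plain bytes value standing in for response.body
body = b"<h1>Welcome back</h1>"

try:
    found_str = "Welcome" in body  # str tested against bytes
    raised = False
except TypeError:
    # bytes.__contains__ rejects a str operand in Python 3
    found_str = None
    raised = True

# Two working alternatives
found_bytes = b"Welcome" in body                # compare bytes with bytes
found_text = "Welcome" in body.decode("utf-8")  # decode first, as response.text does

if __name__ == "__main__":
    print(raised, found_bytes, found_text)
```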

Solution 2:

It turned out that InitSpider is a BaseSpider (not a CrawlSpider), so it looks like 1) there's no way to use CrawlSpider in this situation, and 2) there's no way to set a sticky cookie.

