
Use Scrapy Parse Function To Parse A Specific Url

I have a scrapy crawler which works fine. I now want to use its 'parse' function to parse a given url. While there exists a command line utility to do so for a single url (the scrapy parse command), I want to do this from within my own Python code.
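
For reference, the command line route mentioned above looks like this; the spider name and callback are placeholders:

scrapy parse --spider=myspider -c parse_item http://www.mywebsite.com/mypage.html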

Solution 1:

Managed to figure it out.

Essentially, I just needed to pass the response body, the URL and the Scrapy request to create the response object.

import httplib  # Python 2; use http.client on Python 3
from scrapy.spider import BaseSpider
from scrapy.http import TextResponse

bs = BaseSpider('some')
head = 'www.mywebsite.com'
tail = '/mypage.html'
link = 'http://' + head + tail

# Fetch the raw page body, then wrap it in a Scrapy response object
httpcon = httplib.HTTPConnection(head)
httpcon.request('GET', tail)
sreq = bs.make_requests_from_url(link)
sresp = TextResponse(url=link, status=200, body=httpcon.getresponse().read(), encoding='utf-8')
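
With the response object built, the spider's callback can then be driven by hand. A minimal sketch, assuming your spider subclass actually defines a parse method that yields items (BaseSpider itself does not implement one):

# Hypothetical usage: run the hand-built response through the callback
for item in bs.parse(sresp):
    print item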

Solution 2:

A quick kludge (with pieces from here and here) in case subprocess is an option for you, unlike for the OP.

import subprocess

# scrapy fetch writes the page body to stdout and the crawl log to stderr
bashCommand = "scrapy fetch http://www.testsite.com/testpage.html"
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
page, scrapy_meta_info = process.communicate()
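
To get back to the original goal of running the spider's parse function over that page, the fetched body can be wrapped in a response object just as in Solution 1. A minimal sketch; MySpider and its trivial parse method are hypothetical stand-ins for your own spider:

from scrapy.spider import BaseSpider
from scrapy.http import TextResponse

class MySpider(BaseSpider):
    # Hypothetical minimal spider; substitute your real spider class
    name = 'test'
    def parse(self, response):
        yield {'url': response.url}

response = TextResponse(url='http://www.testsite.com/testpage.html',
                        status=200, body=page, encoding='utf-8')
for item in MySpider().parse(response):
    print item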
